author : s.aparajith@live.com
date : 14/5/2021
- requires MSVC 15 or above compiler.
- requires a recent version of CMake.
- the build can be triggered by the following commands:

```
mkdir build
cd build
cmake .. -G "Visual Studio 16 2019" -A x64
cmake --build . --config Release
cd Release
GNBClassifier.exe
```
- requires GNU g++ 5.4 or above compiler.
- requires a recent version of CMake.
- the build can be triggered by the following commands:

```
mkdir build
cd build
cmake ..
make
./GNBClassifier
```
This project deals with the theory of the Gaussian Naive Bayes classifier
and its implementation in C++. It uses example data of a vehicle making lane changes.
The Gaussian NB classifier predicts the behavior of the vehicle on the highway given its Frenet coordinates s and d
and their first-order derivatives.
The Gaussian Naive Bayes classifier is an extension of the naive Bayes classifier.
Abstractly, naive Bayes is a conditional probability model:
given a problem instance to be classified, represented by a vector x = (x1, x2, ..., xn)
of n features (independent variables),
it assigns to this instance the probabilities

p(Ck | x1, ..., xn)

for each of the K possible outcomes or classes Ck.
The problem with the above formulation is that if the number of features n is large or if a feature takes on a large number of values, then basing such a model on probability tables is infeasible. The model must therefore be reformulated to make it more tractable.
Using Bayes' theorem, the conditional probability can be decomposed as

p(Ck | x) = p(Ck) p(x | Ck) / p(x)

which is nothing but

posterior = (prior × likelihood) / evidence

In practice, only the numerator is of interest, as the denominator does not depend on C
and the values xi are given, which makes the denominator effectively constant.
The numerator is equivalent to the joint probability model p(Ck, x1, ..., xn), which can be rewritten using the chain rule for repeated application of conditional probability:

p(Ck, x1, ..., xn) = p(x1 | x2, ..., xn, Ck) p(x2 | x3, ..., xn, Ck) ... p(xn | Ck) p(Ck)
Now, making the naive assumption that all features xi
are mutually independent, conditional on the category Ck,
i.e. assuming

p(xi | x(i+1), ..., xn, Ck) = p(xi | Ck),

the joint model can be expressed as:

p(Ck, x1, ..., xn) = p(Ck) p(x1 | Ck) p(x2 | Ck) ... p(xn | Ck)
Thus, with the above independence assumptions, the conditional distribution over the class variable C is:

p(Ck | x1, ..., xn) = (1/Z) p(Ck) p(x1 | Ck) ... p(xn | Ck)

where Z = p(x) is a scaling factor that depends only on the known feature values x1, ..., xn.
For a feature x and label C,
with mean μ and standard deviation σ (computed over the training data),
the conditional probability can be computed using the formula

p(x = v | C) = (1 / sqrt(2πσ²)) · exp(−(v − μ)² / (2σ²))
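The formula above can be sketched as a small C++ helper; `gaussianProb` is a hypothetical name for illustration, not necessarily the function used in `src/classifier.h`:

```cpp
#include <cmath>

// Gaussian likelihood p(x = v | C) for one feature, given the
// per-class mean mu and standard deviation sigma of that feature.
double gaussianProb(double v, double mu, double sigma) {
    const double kPi = 3.14159265358979323846;
    const double num = std::exp(-(v - mu) * (v - mu) / (2.0 * sigma * sigma));
    const double den = std::sqrt(2.0 * kPi * sigma * sigma);
    return num / den;
}
```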
where v would be used in the prediction step.
v is the observed state of the vehicle, which is used to find the conditional probability of x given C
so that C given x can be found. The predicted label is

y = argmax_k p(Ck) · p(x1 = v1 | Ck) ... p(xn = vn | Ck)

In this formula, the argmax is taken over all possible labels Ck
and the product is taken over all features xi with observed values vi.
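The prediction rule above can be sketched as follows. This is an illustrative, self-contained version; the struct layout and function names are assumptions, not the exact interface of the GNB class in `src/classifier.h`:

```cpp
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Per-class parameters learned during training: a prior p(Ck) and,
// for each feature, a Gaussian (mu, sigma) fitted to the training data.
struct ClassParams {
    std::string label;
    double prior;
    std::vector<double> mu;
    std::vector<double> sigma;
};

// Gaussian likelihood p(x = v | C) for one feature.
static double gaussianProb(double v, double mu, double sigma) {
    const double kPi = 3.14159265358979323846;
    return std::exp(-(v - mu) * (v - mu) / (2.0 * sigma * sigma))
         / std::sqrt(2.0 * kPi * sigma * sigma);
}

// argmax over classes of p(Ck) * prod_i p(x_i = v_i | Ck)
std::string predictLabel(const std::vector<ClassParams>& classes,
                         const std::vector<double>& obs) {
    std::string best;
    double bestScore = -1.0;
    for (const ClassParams& c : classes) {
        double score = c.prior;
        for (std::size_t i = 0; i < obs.size(); ++i) {
            score *= gaussianProb(obs[i], c.mu[i], c.sigma[i]);
        }
        if (score > bestScore) {
            bestScore = score;
            best = c.label;
        }
    }
    return best;
}
```

In a production implementation the products would usually be accumulated as sums of logarithms to avoid floating-point underflow when the number of features grows.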
`src/classifier.h` contains the class GNB,
which creates an instance of a Gaussian Naive Bayes classifier object.
- using the `void train(...)` method, the model is trained using the previously presented theory.
- using the `string predict(...)` method, prediction can be done using the trained model.

Note: the member `possible_labels`
would need to be extended/changed if the data files have more/different labels.
In the image below, the possible behaviors on a 3-lane highway (with lanes of 4 meter width) are shown. The dots represent the d (y axis) and s (x axis) coordinates of vehicles as they either...
- change lanes left (shown in blue)
- keep lane (shown in black)
- or change lanes right (shown in red)
each coordinate sample contains the following four features:
- s
- d
- d(s)/dt
- d(d)/dt
the lane width is given as 4 m.