# Q2 Naive-Bayes Classification (30 Points)
## Definition
Naive Bayes is a relatively simple probabilistic classification algorithm that applies Bayes Theorem with an independence assumption among the features in the data. Its fundamental idea is to compute the probability of every class we want to predict from the probabilities of the individual features in the data.

Under the Naive Bayes assumption, every feature contributes independently to the probability of each class. Let's assume that we are classifying cars and have data such as:

| buying   | maint    | doors    | persons  | lug-boot | safety   | class    |
| :------- | :------- | :------- | :------- | :------- | :------- | :------- |
| vvhigh   | vhigh    | 2        | 2        | small    | low      | unacc    |

**Description of dataset:**
* CAR : car acceptability
    * PRICE : overall price
        * _buying_ : buying price
        * _maint_ : price of the maintenance
    * TECH : technical characteristics
        * COMFORT : comfort
            * _doors_ : number of doors
            * _persons_ : capacity in terms of persons to carry
            * _lug-boot_ : the size of luggage boot
        * _safety_ : estimated safety of the car
   
Naive Bayes assumes that the features listed above are independent of each other given the class.

In machine learning, Naive Bayes has advantages over other commonly used classification algorithms: it is simple, fast, and accurate on small datasets, and it can still classify an observation when some information is missing. Naive Bayes is a supervised learning algorithm because it must be trained on a labeled dataset.

## Bayes Theorem
Consider two events, $A$ and $B$. For example, $A$ is a set of car features, $A = \{ vvhigh, vhigh, 2, 2, small, low \}$, and $B$ is one of the car classes, $B \in \{ unacc, acc, good, vgood \}$.


* $A \cap B$ means the intersection of $A$ and $B$.
* $P(A \mid B)$ is read as the probability of $A$ given $B$.

When $B$ is given (event $B$ has occurred), our sample space is reduced to $B$, as shown in the right figure. We now want the probability that $A$ also occurs (the conditional probability of $A$). In other words, we are looking for the probability of $A \cap B$ within the space of $B$.

\begin{equation}
P(A \mid B) = \frac{P(A \cap B)}{P(B)}
\end{equation}

We can rewrite $P(A \cap B)$ as $P(A, B)$. Both notations denote the probability that $A$ and $B$ occur together. So the new form of the equation is:

\begin{equation}
P(A \mid B) = \frac{P(A, B)}{P(B)}
\end{equation}

For the joint probability of $A$ and $B$, we can deduce the equations below from the figure above.

\begin{align}
& P(A, B) = P(B, A) = P(A \mid B)P(B) \\
& P(A, B) = P(B, A) = P(B \mid A)P(A)
\end{align}

Let's substitute the second form of $P(A, B)$ into the equation:

\begin{equation}
P(A \mid B) = \frac{P(B \mid A)P(A)}{P(B)}
\end{equation}

This equation is known as **Bayes Theorem**.
* $P(A \mid B)$ : posterior, the probability of $A$ given that $B$ has occurred
* $P(B)$ : evidence, the marginal probability of $B$
* $P(B \mid A)$ : likelihood, the probability of $B$ given $A$
* $P(A)$ : prior, the marginal probability of $A$
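As a small numeric check of the theorem, the sketch below estimates each term by counting and compares the definition of $P(A \mid B)$ with the Bayes form. The rows are hypothetical toy data (not taken from the car dataset), and the helper `prob` is an illustrative name. Following the convention above, $A$ is a feature event and $B$ is a class event.

```python
from fractions import Fraction

# Hypothetical toy observations of (safety, class); NOT taken from the car dataset.
rows = [
    ("low", "unacc"), ("low", "unacc"), ("low", "unacc"),
    ("high", "unacc"),
    ("high", "acc"), ("high", "acc"),
    ("low", "acc"),
    ("high", "unacc"),
]

def prob(predicate):
    """Fraction of rows for which predicate(row) is true."""
    return Fraction(sum(1 for r in rows if predicate(r)), len(rows))

# A: safety is "low";  B: class is "unacc".
p_a = prob(lambda r: r[0] == "low")                            # prior P(A)
p_b = prob(lambda r: r[1] == "unacc")                          # evidence P(B)
p_a_and_b = prob(lambda r: r[0] == "low" and r[1] == "unacc")  # joint P(A, B)

p_b_given_a = p_a_and_b / p_a        # likelihood P(B | A)
direct = p_a_and_b / p_b             # P(A | B) from the definition
via_bayes = p_b_given_a * p_a / p_b  # P(A | B) from Bayes Theorem

print(direct, via_bayes)  # 3/5 3/5
```

Exact fractions are used so that the two results can be compared without floating-point rounding.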

## Naive-Bayes Formulation
Suppose we have a dataset in which each observation belongs to a class from the finite set $C = \{ c_1, c_2, \dots, c_n \}$ and consists of a set of features $F = \{ f_1, f_2, \dots, f_b \}$. If we could compute the probabilities $P(c_1 \mid F), P(c_2 \mid F), \dots, P(c_n \mid F)$, then we could predict the class of a new observation to be the one with the highest probability.

To compute these conditional probabilities, we can use Bayes Theorem:

\begin{equation}
P(c_i \mid f_1, f_2, \dots ,f_b) = \frac{P(f_1, f_2, \dots ,f_b \mid c_i)P(c_i)}{P(f_1, f_2, \dots ,f_b)} 
\end{equation}

Naive Bayes assumes that all features are conditionally independent given the class, so we can rewrite this equation as:

\begin{equation}
P(c_i \mid f_1, f_2, \dots ,f_b) = \frac{P(f_1 \mid c_i)P(f_2 \mid c_i) \dots P(f_b \mid c_i)P(c_i)}{P(f_1, f_2, \dots ,f_b)} 
\end{equation}
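
As an illustration only, the factorization above can be written out for the example car row from the Definition section and the class $unacc$. The row has $b = 6$ features; the feature names are taken from the table header, and no probability values are implied.

\begin{align}
& P(\text{unacc} \mid \text{vvhigh}, \text{vhigh}, 2, 2, \text{small}, \text{low}) = \frac{N}{P(\text{vvhigh}, \text{vhigh}, 2, 2, \text{small}, \text{low})}, \quad \text{where} \\
& N = P(\text{buying} = \text{vvhigh} \mid \text{unacc}) \, P(\text{maint} = \text{vhigh} \mid \text{unacc}) \, P(\text{doors} = 2 \mid \text{unacc}) \\
& \qquad \times P(\text{persons} = 2 \mid \text{unacc}) \, P(\text{lug-boot} = \text{small} \mid \text{unacc}) \, P(\text{safety} = \text{low} \mid \text{unacc}) \, P(\text{unacc})
\end{align}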

The final form of the equation is:

\begin{align}
& \text{for} \; i = 1, 2, \dots , n \\
& P(c_i \mid f_1, f_2, \dots ,f_b) = P(c_i) \frac{\prod_{j=1}^b P(f_j \mid c_i)}{P(f_1, f_2, \dots ,f_b)} 
\end{align}

Since $P(f_1, f_2, \dots ,f_b)$ does not depend on the class $c_i$, it is the same constant for every class, and we can use the classification rule below.

\begin{align}
& P(c_i \mid f_1, f_2, \dots ,f_b) \propto P(c_i) \prod_{j=1}^b P(f_j \mid c_i)
\end{align}
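
The classification rule can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not a reference solution: the function names `train`, `score`, and `predict` and the toy rows are hypothetical, probabilities are estimated by simple counting, and no smoothing is applied, so a feature value never seen with a class gives that class a score of 0.

```python
from collections import Counter, defaultdict

def train(rows, labels):
    """Estimate P(c) and the counts needed for P(f_j = v | c) by counting."""
    class_counts = Counter(labels)
    # value_counts[(j, c)][v] = number of class-c rows whose j-th feature equals v
    value_counts = defaultdict(Counter)
    for features, c in zip(rows, labels):
        for j, v in enumerate(features):
            value_counts[(j, c)][v] += 1
    priors = {c: n / len(labels) for c, n in class_counts.items()}
    return priors, value_counts, class_counts

def score(features, c, model):
    """Unnormalized posterior: P(c) * prod_j P(f_j | c)."""
    priors, value_counts, class_counts = model
    s = priors[c]
    for j, v in enumerate(features):
        s *= value_counts[(j, c)][v] / class_counts[c]
    return s

def predict(features, model):
    """Return the class with the highest unnormalized posterior."""
    priors = model[0]
    return max(priors, key=lambda c: score(features, c, model))

# Hypothetical toy data with two features (safety, persons); for illustration only.
rows = [("low", "2"), ("low", "4"), ("med", "2"),
        ("high", "4"), ("high", "4"), ("med", "4")]
labels = ["unacc", "unacc", "unacc", "acc", "acc", "acc"]

model = train(rows, labels)
print(predict(("med", "4"), model))  # acc
```

For the query `("med", "4")`, the score of `unacc` is 1/2 · 1/3 · 1/3 = 1/18 and the score of `acc` is 1/2 · 1/3 · 1 = 1/6, so `acc` is chosen. The evidence term is never computed because it is the same for both classes.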



# Task. 
- Use the 'car_eval.csv' data set. Train a Naive Bayes model using odd-indexed rows. Test the accuracy of your model using even-indexed rows of the dataset.
- Display and explain the likelihood (class-conditional probabilities) for each input variable.
- Discuss one misclassified case for each category in terms of class-conditional probabilities. Why do you think it was misclassified?
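
The evaluation step in the task could be sketched as below, assuming the true and predicted labels are already available as two lists of equal length. The helper names `accuracy` and `first_misclassified` and the label lists are hypothetical and only illustrate the bookkeeping; they do not reproduce any result on the real dataset.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def first_misclassified(y_true, y_pred):
    """Map each true class to the position of its first misclassified row, if any."""
    found = {}
    for i, (t, p) in enumerate(zip(y_true, y_pred)):
        if t != p and t not in found:
            found[t] = i
    return found

# Hypothetical labels, for illustration only.
y_true = ["unacc", "acc", "acc", "good", "unacc"]
y_pred = ["unacc", "unacc", "acc", "acc", "unacc"]

print(accuracy(y_true, y_pred))             # 0.6
print(first_misclassified(y_true, y_pred))  # {'acc': 1, 'good': 3}
```

The positions returned by `first_misclassified` can then be used to look up the corresponding test rows and compare their class-conditional probabilities across classes.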



In [4]:
import pandas as pd

In [5]:
# Rows whose index is divisible by test_indis (the even-indexed rows)
# form the test dataset; all remaining (odd-indexed) rows
# form the train dataset.

dataset = pd.read_csv('car-eval.csv')
test_indis = 2

train_dataset = dataset[dataset.index % test_indis != 0]
test_dataset = dataset[dataset.index % test_indis == 0]

# total number of samples in the train dataset (size of the sample space)
total = len(train_dataset)

In [6]:
### Your code here
# Expected accuracy 81%