**Practicum: Naive Bayes**

In this exercise you will implement Naive Bayes classification in Python. 

## Background

- Naive Bayes relies primarily on Bayes' theorem:

  $$p(y|x) = \frac{p(x|y) \times p(y)}{p(x)}$$

  <br>

  where 

  - $p(y|x)$ is the probability of observing a particular label / class given the data (posterior)
  - $p(x|y)$ is the probability of observing the data given a particular label / class (likelihood)
  - $p(y)$ is the probability of observing a particular label / class (prior)
  - $p(x)$ is the probability of observing the data (evidence)

  <br>

- For a given data point, $p(x)$ does not depend on the class $y$, so it is the same constant for every class. We can therefore ignore the term and rewrite the formulation for Naive Bayes as:

  $$p(y|x) \propto p(x|y) \times p(y)$$

  <br>

- Naive Bayes assumes that the features are conditionally independent given the class, so the likelihood of observing the data can be expressed as the product of the per-feature likelihoods:

  $$p(x|y) = p(x_1|y) \cdot p(x_2|y) \cdot \ldots \cdot p(x_n|y)$$
  
  <br>
  
- We compute the likelihood from existing data and set the prior based on the class distribution.
- Based on the likelihood and prior, we can then compute the probability of observing a certain class given that we have observed feature $i$ two times and feature $i+1$ three times:

  $$p(y|x) \propto p(x_i|y)^2 \times p(x_{i+1}|y)^3 \times p(y)$$

  <br>

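- Taking the log of the above formulation gives the following, where $\propto$ is used loosely: the dropped $p(x)$ term now contributes an additive constant, which does not affect which class scores highest:

  $$\log(p(y|x)) \propto 2\log(p(x_i|y)) + 3\log(p(x_{i+1}|y)) + \log(p(y))$$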
  
  <br>
  
- In general, with $x_i$ denoting the number of times feature $i$ was observed, the log posterior is:

  $$\log(p(y|x)) \propto \sum_{i=1}^n  x_i \log(p(x_i|y)) + \log(p(y))$$

  <br>
  
- To compute the likelihood of observing a certain feature given a class, $p(x_i|y)$:

  $$p(x_i|y) = \frac{S_{y,i} + \alpha}{S_y + \alpha p}$$
  
  where 
  - $p$ is the number of features (the $n$ in the sum above)
  - $\alpha$ is a smoothing term that prevents zero probabilities (whose logarithm is undefined), usually set to 1
  - $S_{y,i}$ is the sum of the $i^{th}$ feature over all the data points in class $y$
  - $S_y$ is the sum of all the features over all the data points in class $y$
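
A worked example may make the smoothing and log-posterior formulas above more concrete. The numbers are hypothetical and chosen only for illustration: a class $y$ with $p = 3$ features, per-feature totals $S_{y,1} = 5$, $S_{y,2} = 1$, $S_{y,3} = 1$ (so $S_y = 7$), $\alpha = 1$, and an assumed prior $p(y) = 0.5$.

$$p(x_1|y) = \frac{5 + 1}{7 + 1 \cdot 3} = 0.6, \qquad p(x_2|y) = p(x_3|y) = \frac{1 + 1}{7 + 1 \cdot 3} = 0.2$$

The three likelihoods sum to 1, as they should. For a data point with counts $x = (2, 3, 0)$, the log score for this class (using natural logarithms) is then:

$$2\log(0.6) + 3\log(0.2) + 0 \cdot \log(0.2) + \log(0.5) \approx -6.54$$

The same calculation is repeated for every class, and the class with the largest score is the prediction.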

## Instructions

You are given some starter code in `naive_bayes_student.py`. You have 4 functions to fill in:

- `compute_prior`
- `compute_likelihood`
- `predict`
- `score`

You are also provided with instance variables which you will use in the 4 functions mentioned above.

<br>

1. Fill in the function `compute_prior` by looping through `y` and keeping count of each of the classes in `y` by adding to the dictionary `self.prior`.
   
   <br>

2. Fill in the function `compute_likelihood` by looping through pairs of `y` and `X` (Hint: look up `zip` in Python).

   **In the first loop, you will:**
      
   - Populate the dictionary `self.per_feature_per_label` by adding each class as a key and the corresponding row of features as a value. If the key already exists, add the current row element-wise to the existing row. This operation computes the $S_{y,i}$ term as stated above.

   <br>
   
   - Populate the dictionary `self.feature_sum_per_label` by adding each class as a key and the sum of the corresponding row of features as a value. If the key already exists, add the current row sum to the existing row sum. This operation computes the $S_y$ term as stated above.

   <br>
   
   **In a second loop, you will:**
   
   - Populate the dictionary `self.likelihood` by applying the formula $p(x_i|y) = \frac{S_{y,i} + \alpha}{S_y + \alpha p}$
   - The key to the dictionary will be the class label, and the value will be a NumPy array with one entry per feature, containing the likelihood of each feature given the class label

   <br>
   
3. Fill in the `predict` function. Assume `self.likelihood` and `self.prior` are already populated, and apply the following formula to compute the log posterior score of each class for each data point.

   $$\log(p(y|x)) \propto \sum_{i=1}^n  x_i \log(p(x_i|y)) + \log(p(y))$$
   
   For each data point, compute the score of every class, select the class with the highest score, and append it to a list that is initialized before the loops. Return the list at the end of the function.
   
   <br>

4. Fill in the `score` function by using the `predict` function to generate predicted labels for the `X` argument. Then compute the accuracy of the predictions, that is, the fraction of predicted classes that match the actual labels in the `y` argument.

   <br>
   
5. Finally, run `python run_naive_bayes` to check if your results match those of sklearn's implementation.
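
The four functions described above could take roughly the following shape. This is only a sketch under stated assumptions, not the reference solution: the class name `TinyNaiveBayes`, the `alpha` attribute, and the toy data are hypothetical; plain Python lists stand in for the NumPy arrays of the starter code; and normalising the class counts into probabilities inside `compute_prior` is an assumption, since the starter code may handle that step elsewhere.

```python
import math


class TinyNaiveBayes:
    """Hypothetical stand-in for the starter class (plain lists, no NumPy)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                  # smoothing term (assumed name)
        self.prior = {}                     # label -> p(y)
        self.per_feature_per_label = {}     # label -> [S_{y,i} for each i]
        self.feature_sum_per_label = {}     # label -> S_y
        self.likelihood = {}                # label -> [p(x_i|y) for each i]

    def compute_prior(self, y):
        # Count how often each class appears in y.
        for label in y:
            self.prior[label] = self.prior.get(label, 0) + 1
        # Assumption: counts are normalised into probabilities here.
        for label in self.prior:
            self.prior[label] /= len(y)

    def compute_likelihood(self, X, y):
        # First loop: accumulate S_{y,i} and S_y per class.
        for row, label in zip(X, y):
            if label in self.per_feature_per_label:
                old = self.per_feature_per_label[label]
                self.per_feature_per_label[label] = [a + b for a, b in zip(old, row)]
            else:
                self.per_feature_per_label[label] = list(row)
            self.feature_sum_per_label[label] = (
                self.feature_sum_per_label.get(label, 0) + sum(row)
            )
        # Second loop: apply the smoothed likelihood formula.
        p = len(X[0])  # number of features
        for label, s_yi in self.per_feature_per_label.items():
            denom = self.feature_sum_per_label[label] + self.alpha * p
            self.likelihood[label] = [(s + self.alpha) / denom for s in s_yi]

    def predict(self, X):
        predictions = []  # initialized before the loops
        for row in X:
            scores = {
                label: sum(x * math.log(q) for x, q in zip(row, self.likelihood[label]))
                + math.log(self.prior[label])
                for label in self.prior
            }
            # Keep the class with the highest log posterior score.
            predictions.append(max(scores, key=scores.get))
        return predictions

    def score(self, X, y):
        # Accuracy: fraction of predictions that match the actual labels.
        predicted = self.predict(X)
        return sum(a == b for a, b in zip(predicted, y)) / len(y)


# Hypothetical toy data: 4 data points, 3 count features, 2 classes.
X_train = [[2, 0, 1], [3, 1, 0], [0, 2, 2], [1, 3, 1]]
y_train = [0, 0, 1, 1]

model = TinyNaiveBayes()
model.compute_prior(y_train)
model.compute_likelihood(X_train, y_train)
accuracy = model.score(X_train, y_train)
```

A quick sanity check to apply to your own implementation before comparing against sklearn: the likelihoods stored for each class should sum to 1.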