# Naive Bayes



---


### Law of Total Probability

#### Definition
`The total probability rule (also called the Law of Total Probability) breaks up probability calculations into distinct parts. It's used to find the probability of an event, A, when you don't know enough about A's probabilities to calculate it directly. Instead, you take a related event, B, and use that to calculate the probability for A.`

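$$P(A)=\sum_{i=1}^{n}P(A|B_i)*P(B_i)$$

where the events $B_1, \dots, B_n$ are mutually exclusive and together cover all possible outcomes (they form a partition of the sample space).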

#### Example
We have two bags:
* Bag 1 has 2 red triangles, 2 green squares and 1 blue circle
* Bag 2 has 1 red triangle, 1 green square and 2 blue circles

If we choose one of the bags at random and then draw one object from it, also at random, what is the probability of drawing a red triangle?

![](https://drive.google.com/uc?export=view&id=1KxmicbEdMXBHVzP_WzASSK_hgOnnFOSd)

#### Solution
- let $B_a$ and $B_b$ be the events of choosing Bag 1 and Bag 2, respectively, with probabilities $P(B_a)$ and $P(B_b)$
- let R be the event of drawing a red triangle

$P(B_a) = P(B_b) = \frac{1}{2} = 0.5$

$P(R|B_a) = \frac{2}{5} = 0.4$

$P(R|B_b) = \frac{1}{4} = 0.25$


By applying the Law of Total Probability, we can compute the final probability:

$P(R) = P(R|B_a)*P(B_a) + P(R|B_b)*P(B_b) = 0.4*0.5 + 0.25*0.5 = 0.325$
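
The same computation can be checked with exact fractions. This is a minimal sketch; the dictionary names are illustrative, and the bag contents are the ones listed in the example.

```python
from fractions import Fraction

# P(B_i): each bag is chosen with probability 1/2
p_bag = {"a": Fraction(1, 2), "b": Fraction(1, 2)}

# P(R | B_i): red triangles divided by the number of objects in the bag
p_red_given_bag = {"a": Fraction(2, 5), "b": Fraction(1, 4)}

# Law of Total Probability: P(R) = sum_i P(R | B_i) * P(B_i)
p_red = sum(p_red_given_bag[b] * p_bag[b] for b in p_bag)
print(p_red, float(p_red))  # 13/40 0.325
```

Using `Fraction` keeps the result exact (13/40), so the decimal 0.325 is not affected by floating-point rounding.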



---




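### Dependent vs Independent Events

- two events A and B are independent if the occurrence of one does not change the probability of the other, i.e. $P(A|B) = P(A)$, or equivalently $P(A \cap B) = P(A) \cdot P(B)$ (e.g. coin tosses: the previous toss does not affect the current one)
- events are dependent if the occurrence of one changes the probability of the other (e.g. drawing two Kings from a deck without putting the first card back: after one King is drawn, only 3 of the remaining 51 cards are Kings)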
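
The card example can be checked by brute force. In this minimal sketch (the `(rank, suit)` deck representation is an assumption made for illustration), every ordered pair of draws is enumerated, with and without putting the first card back, and the probability of drawing two Kings is compared with the product rule for independent events.

```python
from fractions import Fraction
from itertools import permutations, product

ranks = list("A23456789") + ["10", "J", "Q", "K"]
deck = list(product(ranks, "SHDC"))  # 52 (rank, suit) cards


def both_kings(pair):
    # pair = (first card, second card); a card is (rank, suit)
    return pair[0][0] == "K" and pair[1][0] == "K"


p_king = Fraction(4, 52)

# Without replacement: ordered pairs of two different cards (dependent draws)
no_repl = list(permutations(deck, 2))
p_no_repl = Fraction(sum(map(both_kings, no_repl)), len(no_repl))

# With replacement: the first card is put back before the second draw (independent draws)
repl = list(product(deck, repeat=2))
p_repl = Fraction(sum(map(both_kings, repl)), len(repl))

print(p_no_repl, p_king * Fraction(3, 51))  # P(K1) * P(K2 | K1): the draws are dependent
print(p_repl, p_king ** 2)                  # P(K1) * P(K2): the draws are independent
```

Only the "with replacement" case satisfies $P(A \cap B) = P(A) \cdot P(B)$; without replacement the second factor must be the conditional probability 3/51.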


---



### Bayes' Theorem

- $\color{green}{P(A|B)} = \frac{\color{blue}{P(B|A)} * \color{orange}{P(A)}}{\color{LightSeaGreen}{P(B)}}$ 
<br><br>
- $\color{green}{posterior} = \frac{\color{blue}{likelihood} \times \color{orange}{prior}}{\color{LightSeaGreen}{evidence}}$
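
As an illustration built on the two bags from the Law of Total Probability example, Bayes' theorem answers the reverse question: given that a red triangle was drawn, how likely is it that it came from Bag 1 ($B_a$)? The evidence is the total probability $P(R)$ computed there. This is a minimal sketch with exact fractions.

```python
from fractions import Fraction

prior = Fraction(1, 2)        # P(B_a): Bag 1 is chosen
likelihood = Fraction(2, 5)   # P(R | B_a): a red triangle is drawn from Bag 1
# evidence P(R) via the Law of Total Probability over both bags
evidence = Fraction(2, 5) * Fraction(1, 2) + Fraction(1, 4) * Fraction(1, 2)

posterior = likelihood * prior / evidence  # P(B_a | R)
print(posterior, round(float(posterior), 4))
```

The posterior 8/13 (about 0.615) is higher than the prior 1/2 because Bag 1 has a larger share of red triangles.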



---






### Target

The objective is to build a classifier that predicts a class from the given inputs (given a sample $x_i$, predict the class it belongs to).


---



### Concepts
- $X=\{x_1, x_2, \dots, x_n\}$: dataset with **n** elements
- $A=\{A_1, A_2, \dots, A_m\}$: each $x_i$ is described by **m** features/attributes, thus $x_i=\{a_1^{(i)}, a_2^{(i)}, \dots, a_m^{(i)}\}$
- $C=\{c_1, c_2, \dots, c_k\}$: set of classes
- $P(C = c_j|x = x_i)$: the probability that sample $x_i$ belongs to class $c_j$
- for simplicity $P(A_1=a_1^{(i)}, A_2=a_2^{(i)}, \dots, A_m=a_m^{(i)}|C=c_j)$ will be written as $P(a_1^{(i)}, a_2^{(i)}, \dots, a_m^{(i)}|C=c_j)$ 
<br><br>



---




### Naive Bayes Classifier Formalism


*   Objective function to compute: $\color{green}{P(C = c_j | x = x_i)}$
*   Apply Bayes' theorem to compute the above probability:
$$\color{green}{P(C=c_j|x=x_i)} = \frac{\color{blue}{P(x=x_i|C=c_j)}\times \color{orange}{P(C=c_j)}}{\color{LightSeaGreen }{P(x=x_i)}}$$
*   Now, we focus only on the likelihood $ \color{blue}{P(x=x_i|C=c_j)}$, which can be rewritten with the chain rule of probability:
$$
\begin{aligned}
\color{blue}{P(x=x_i|C=c_j)} &= P(a_1^{(i)}, a_2^{(i)}, \dots, a_m^{(i)}|C=c_j) \\
&= P(a_1^{(i)} | a_2^{(i)}, \dots, a_m^{(i)}, C=c_j) \times P(a_2^{(i)}, \dots, a_m^{(i)} | C=c_j) \\
&= P(a_1^{(i)} | a_2^{(i)}, \dots, a_m^{(i)}, C=c_j) \times P(a_2^{(i)}| a_3^{(i)}, \dots, a_m^{(i)}, C=c_j) \times P(a_3^{(i)}, \dots, a_m^{(i)} | C=c_j) \\
&= P(\color{red}{a_1^{(i)}} | a_2^{(i)}, \dots, a_m^{(i)}, C=c_j) \times P(\color{red}{a_2^{(i)}} | a_3^{(i)}, \dots, a_m^{(i)}, C=c_j) \times \dots \times P(\color{red}{a_{m-1}^{(i)}} |a_m^{(i)}, C=c_j) \times P(\color{red}{a_m^{(i)}} | C=c_j)
\end{aligned}
$$
<br><br>
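- Apply the Naive Bayes assumption (the attributes are conditionally independent given the class, so $P(a_\alpha^{(i)} | a_{\alpha+1}^{(i)}, \dots, a_m^{(i)}, C=c_j) = P(a_\alpha^{(i)} | C=c_j)$) and rewrite $P(x=x_i|C=c_j)$:
$$
\begin{aligned}
\color{blue}{P(x=x_i|C=c_j)} &= P(\color{red}{a_1^{(i)}} | C=c_j) \times P(\color{red}{a_2^{(i)}} | C=c_j) \times \dots \times P(\color{red}{a_m^{(i)}} | C=c_j) \\
&= \prod\limits_{\alpha=1}^m P(\color{red}{a_\alpha^{(i)}} | C=c_j)
\end{aligned}
$$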
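
A minimal sketch of this factorization, assuming the conditional probabilities of each observed value are already known. The `cond_probs` values below are made up for illustration and are not estimated from any dataset; the tokens are taken from the news-sentence example used further down. Because a product does not depend on the order of its factors, shuffling the words gives the same likelihood (up to floating-point rounding): the model ignores word order, which is exactly the information the naive assumption throws away.

```python
from math import prod

# Hypothetical values of P(word | C = politics); NOT estimated from real data
cond_probs = {"president": 0.02, "white": 0.01, "house": 0.015, "candy": 0.001}


def naive_likelihood(tokens, cond_probs):
    """P(x | C = c_j) under the naive assumption: product of P(a_alpha | C = c_j)."""
    return prod(cond_probs[token] for token in tokens)


doc = ["president", "white", "house", "candy"]
shuffled = ["candy", "house", "president", "white"]

print(naive_likelihood(doc, cond_probs))
print(naive_likelihood(shuffled, cond_probs))
```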

![](https://drive.google.com/uc?export=view&id=1kSmOVBKaMmkpgs6eXs1wcYdFuHWklWU4)

- The main idea of Naive Bayes is the assumption that all attributes/features ($a_1^{(i)},\dots,a_m^{(i)}$) are conditionally independent given the class. A more concrete example is classifying a collection of documents into topics such as politics, science, etc. Given a document **D** composed of words/tokens and the set of classes C = {politics, science}, one would choose politics as the main topic of the document D below. Naive Bayes assumes that the words in D are conditionally independent given the topic, and that is why the algorithm is called naive: in English, as in any other language, there are rules such as ``adjectives usually come before nouns`` or ``sentences start with a capital letter``, so in reality each word depends on the previous ones.<br>
``D=President Trump and first lady Melania Trump handed out commemorative Halloween candy to trick-or-treaters at the White House Monday``

- Now we focus on the denominator (the evidence) $\color{LightSeaGreen}{P(x=x_i)}$ and apply the Law of Total Probability:<br>
$\color{LightSeaGreen}{P(x=x_i)} = \sum\limits_{\beta=1}^{k} \color{blue}{P(x=x_i|C=c_\beta)} \times P(C=c_\beta)$
- Since we have already derived $\color{blue}{P(x=x_i|C=c_\beta)}$, the evidence $\color{LightSeaGreen}{P(x=x_i)}$ becomes:<br>
$\color{LightSeaGreen}{P(x=x_i)} = \sum\limits_{\beta=1}^{k} \left[ \prod\limits_{\alpha=1}^m   P(\color{red}{a_\alpha^{(i)}} | C=c_\beta) \times P(C=c_\beta)  \right] $

- Putting everything together, we obtain:
<br>
$\color{green}{P(C=c_j|x=x_i)} = \frac{\color{blue}{P(x=x_i|C=c_j)}\times \color{orange}{P(C=c_j)}}{\color{LightSeaGreen }{P(x=x_i)}} = 
\frac{\left[ \prod\limits_{\alpha=1}^m   P(\color{red}{a_\alpha^{(i)}} | C=c_j) \right] \times \color{orange}{P(C=c_j)}}{ \sum\limits_{\beta=1}^{k} \left[ \prod\limits_{\alpha=1}^m   P(\color{red}{a_\alpha^{(i)}} | C=c_\beta) \times P(C=c_\beta)  \right]}
$



---



### Naive Bayes Classifier

<a id='naive_bayes_cell'></a>

- When only classification is needed, the denominator of the above expression can be ignored (it is the same for all $c_j$), and the predicted class is obtained by maximizing the numerator:<br>
$Class(x_i) = argmax_{c_j} \Bigg\{ \left[ \prod\limits_{\alpha=1}^m   P(\color{red}{a_\alpha^{(i)}} | C=c_j) \right] \times \color{orange}{P(C=c_j)} \Bigg\}$
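
In practice, a product of many probabilities smaller than 1 can underflow to 0.0 in floating point, after which every class gets the same score. A common remedy (a suggested variant, not part of the formula above) is to maximize the sum of logarithms instead; since log is increasing, it selects the same class. This minimal sketch uses made-up probabilities for 400 attributes and requires every probability to be strictly positive (log(0) is undefined), which is one more reason for smoothing.

```python
import math


def argmax_class(priors, cond_probs, sample):
    """argmax_c [log P(c) + sum_alpha log P(a_alpha | c)]; all probabilities must be > 0."""
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior) + sum(math.log(cond_probs[c][a]) for a in sample)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical model: 400 observed attribute values, each with probability 0.01 under c1
# and 0.02 under c2 (made-up numbers chosen to trigger underflow)
sample = [f"a{k}" for k in range(400)]
priors = {"c1": 0.5, "c2": 0.5}
cond_probs = {"c1": {a: 0.01 for a in sample}, "c2": {a: 0.02 for a in sample}}

# Direct product: both scores underflow to 0.0, so they can no longer be compared
direct = {c: priors[c] * math.prod(cond_probs[c][a] for a in sample) for c in priors}
print(direct)                                    # {'c1': 0.0, 'c2': 0.0}
print(argmax_class(priors, cond_probs, sample))  # c2
```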


---



## Example

<table width="100%" style="table-layout:fixed;" >
<thead>
<tr>
<th style="padding:8px;background-color:#4CB96B;text-align:center;"> </th>
<th style="padding:8px;background-color:#4CB96B;text-align:center;">Outlook</th>
<th style="padding:8px;background-color:#4CB96B;text-align:center;">Temperature</th>
<th style="padding:8px;background-color:#4CB96B;text-align:center;">Humidity</th>
<th style="padding:8px;background-color:#4CB96B;text-align:center;">Windy</th>
<th style="padding:8px;background-color:#4CB96B;text-align:center;">Play Golf</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:center;">0</td>
<td style="text-align:center;">Sunny</td>
<td style="text-align:center;">Hot</td>
<td style="text-align:center;">High</td>
<td style="text-align:center;">Weak</td>
<td style="text-align:center;">No</td>
</tr>
<tr>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">1</td>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">Sunny</td>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">Hot</td>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">High</td>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">Strong</td>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">No</td>
</tr>
<tr>
<td style="text-align:center;">2</td>
<td style="text-align:center;">Overcast</td>
<td style="text-align:center;">Hot</td>
<td style="text-align:center;">High</td>
<td style="text-align:center;">Weak</td>
<td style="text-align:center;">Yes</td>
</tr>
<tr>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">3</td>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">Rain</td>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">Mild</td>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">High</td>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">Weak</td>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">Yes</td>
</tr>
<tr>
<td style="text-align:center;">4</td>
<td style="text-align:center;">Rain</td>
<td style="text-align:center;">Cool</td>
<td style="text-align:center;">Normal</td>
<td style="text-align:center;">Weak</td>
<td style="text-align:center;">Yes</td>
</tr>
<tr>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">5</td>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">Rain</td>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">Cool</td>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">Normal</td>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">Strong</td>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">No</td>
</tr>
<tr>
<td style="text-align:center;">6</td>
<td style="text-align:center;">Overcast</td>
<td style="text-align:center;">Cool</td>
<td style="text-align:center;">Normal</td>
<td style="text-align:center;">Strong</td>
<td style="text-align:center;">Yes</td>
</tr>
<tr>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">7</td>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">Sunny</td>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">Mild</td>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">High</td>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">Weak</td>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">No</td>
</tr>
<tr>
<td style="text-align:center;">8</td>
<td style="text-align:center;">Sunny</td>
<td style="text-align:center;">Cool</td>
<td style="text-align:center;">Normal</td>
<td style="text-align:center;">Weak</td>
<td style="text-align:center;">Yes</td>
</tr>
<tr>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">9</td>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">Rain</td>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">Mild</td>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">Normal</td>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">Weak</td>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">Yes</td>
</tr>
<tr>
<td style="text-align:center;">10</td>
<td style="text-align:center;">Sunny</td>
<td style="text-align:center;">Mild</td>
<td style="text-align:center;">Normal</td>
<td style="text-align:center;">Strong</td>
<td style="text-align:center;">Yes</td>
</tr>
<tr>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">11</td>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">Overcast</td>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">Mild</td>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">High</td>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">Strong</td>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">Yes</td>
</tr>
<tr>
<td style="text-align:center;">12</td>
<td style="text-align:center;">Overcast</td>
<td style="text-align:center;">Hot</td>
<td style="text-align:center;">Normal</td>
<td style="text-align:center;">Weak</td>
<td style="text-align:center;">Yes</td>
</tr>
<tr>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">13</td>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">Rain</td>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">Mild</td>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">High</td>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">Strong</td>
<td style="background-color:rgba(183,223,182,0.4);text-align:center;">No</td>
</tr>
</tbody>
</table>

*   We have two classes: [<font color="green">Yes</font>, <font color="red">No</font>]
  *   P(Play=<font color="green">Yes</font>) = 9/14
  *   P(Play=<font color="red">No</font>) = 5/14

1. Training $\rightarrow$ compute all probabilities from the training data
<br><br>
<table width="400" border="1" cellspacing="0" cellpadding="2">
<tbody>
<tr>

<td valign="top" width="133"><b>OUTLOOK</b></td>
<td valign="top" width="133"><b>Play = Yes</b></td>
<td valign="top" width="133"><b>Play = No</b></td>
<td valign="top" width="133"><b>Total</b></td>
</tr>
<tr>
<td valign="top" width="133">Sunny</td>
<td valign="top" width="133">2/9</td>
<td valign="top" width="133">3/5</td>
<td valign="top" width="133">5/14</td>
</tr>
<tr>
<td valign="top" width="133">Overcast</td>
<td valign="top" width="133">4/9</td>
<td valign="top" width="133">0/5</td>
<td valign="top" width="133">4/14</td>
</tr>
<tr>
<td valign="top" width="133">Rain</td>
<td valign="top" width="133">3/9</td>
<td valign="top" width="133">2/5</td>
<td valign="top" width="133">5/14</td>
</tr>
</tbody>
</table>
<br><br>
<table width="400" border="1" cellspacing="0" cellpadding="2">
<tbody>

<tr>
<td valign="top" width="133"><b>TEMPERATURE</b></td>
<td valign="top" width="133"><b>Play = Yes</b></td>
<td valign="top" width="133"><b>Play = No</b></td>
<td valign="top" width="133"><b>Total</b></td>
</tr>
<tr>
<td valign="top" width="133">Hot</td>
<td valign="top" width="133">2/9</td>
<td valign="top" width="133">2/5</td>
<td valign="top" width="133">4/14</td>
</tr>
<tr>
<td valign="top" width="133">Mild</td>
<td valign="top" width="133">4/9</td>
<td valign="top" width="133">2/5</td>
<td valign="top" width="133">6/14</td>
</tr>
<tr>
<td valign="top" width="133">Cool</td>
<td valign="top" width="133">3/9</td>
<td valign="top" width="133">1/5</td>
<td valign="top" width="133">4/14</td>
</tr>
</tbody>
</table>
<br><br>
<table width="400" border="1" cellspacing="0" cellpadding="2">
<tbody>
<tr>
<td valign="top" width="133"><b>HUMIDITY</b></td>
<td valign="top" width="133"><b>Play = Yes</b></td>
<td valign="top" width="133"><b>Play = No</b></td>
<td valign="top" width="133"><b>Total</b></td>
</tr>
<tr>
<td valign="top" width="133">High</td>
<td valign="top" width="133">3/9</td>
<td valign="top" width="133">4/5</td>
<td valign="top" width="133">7/14</td>
</tr>
<tr>
<td valign="top" width="133">Normal</td>
<td valign="top" width="133">6/9</td>
<td valign="top" width="133">1/5</td>
<td valign="top" width="133">7/14</td>
</tr>
</tbody>
</table>
<br><br>
<table width="400" border="1" cellspacing="0" cellpadding="2">
<tbody>
<tr>
<td valign="top" width="133"><b>WIND</b></td>
<td valign="top" width="133"><b>Play = Yes</b></td>
<td valign="top" width="133"><b>Play = No</b></td>
<td valign="top" width="133"><b>Total</b></td>
</tr>
<tr>
<td valign="top" width="133">Strong</td>
<td valign="top" width="133">3/9</td>
<td valign="top" width="133">3/5</td>
<td valign="top" width="133">6/14</td>
</tr>
<tr>
<td valign="top" width="133">Weak</td>
<td valign="top" width="133">6/9</td>
<td valign="top" width="133">2/5</td>
<td valign="top" width="133">8/14</td>
</tr>
</tbody>
</table>
<br>
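
The tables above can be reproduced by counting. This sketch re-types the 14 rows of the Play Golf table (so `rows` is a transcription, and `cond_prob` is a hypothetical helper name) and estimates the priors and the unsmoothed conditional probabilities with exact fractions.

```python
from collections import Counter
from fractions import Fraction

# The 14 rows of the Play Golf table: (Outlook, Temperature, Humidity, Windy, Play Golf)
rows = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
features = ["Outlook", "Temperature", "Humidity", "Windy"]

# class counts and priors P(Play = c)
class_counts = Counter(row[-1] for row in rows)
priors = {c: Fraction(n, len(rows)) for c, n in class_counts.items()}

# value_counts[(feature, value, class)] = number of rows with that value and that class
value_counts = Counter(
    (feature, value, row[-1])
    for row in rows
    for feature, value in zip(features, row[:-1])
)


def cond_prob(feature, value, c):
    """Unsmoothed estimate P(feature = value | Play = c) = count / class count."""
    return Fraction(value_counts[(feature, value, c)], class_counts[c])


print(priors["Yes"], priors["No"])                # 9/14 5/14
print(cond_prob("Outlook", "Sunny", "Yes"))       # 2/9
print(cond_prob("Outlook", "Overcast", "No"))     # 0
```

A missing key in a `Counter` counts as 0, which is how the zero entry P(Outlook=Overcast | Play=No) appears without special handling.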



2. Testing $\rightarrow$ compute class for sample **X** <br>

- formula: $Class(X) = argmax_{c_j} \Bigg\{ \left[ \prod\limits_{\alpha=1}^m   P(\color{red}{a_\alpha^{(i)}} | C=c_j) \right] \times \color{orange}{P(C=c_j)} \Bigg\}$
- sample: ```X = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)```
- probabilities for class=<font color="green">Yes</font>:
<ul>
<li>P(Outlook=Sunny | Play=<font color="green">Yes</font>) = 2/9</li>
<li>P(Temperature=Cool | Play=<font color="green">Yes</font>) = 3/9</li>
<li>P(Humidity=High | Play=<font color="green">Yes</font>) = 3/9</li>
<li>P(Wind=Strong | Play=<font color="green">Yes</font>) = 3/9</li>
<li>P(Play=<font color="green">Yes</font>) = 9/14</li>
</ul>

- probabilities for class=<font color="red">No</font>:
<ul>
<li>P(Outlook=Sunny | Play=<font color="red">No</font>) = 3/5</li>
<li>P(Temperature=Cool | Play=<font color="red">No</font>) = 1/5</li>
<li>P(Humidity=High | Play=<font color="red">No</font>) = 4/5</li>
<li>P(Wind=Strong | Play=<font color="red">No</font>) = 3/5</li>
<li>P(Play=<font color="red">No</font>) = 5/14</li>
</ul>
<br>
- P(X|Play=<font color="green">Yes</font>)P(Play=<font color="green">Yes</font>) = <br>
= P(Outlook=Sunny | Play=<font color="green">Yes</font>) * P(Temperature=Cool | Play=<font color="green">Yes</font>) * P(Humidity=High | Play=<font color="green">Yes</font>) * P(Wind=Strong | Play=<font color="green">Yes</font>) * P(Play=<font color="green">Yes</font>) <br>
= $\frac{2}{9} * \frac{3}{9} * \frac{3}{9} * \frac{3}{9} * \frac{9}{14}$ = 0.0053
<br><br>
- P(X|Play=<font color="red">No</font>)P(Play=<font color="red">No</font>) = <br>
= P(Outlook=Sunny | Play=<font color="red">No</font>) * P(Temperature=Cool | Play=<font color="red">No</font>) * P(Humidity=High | Play=<font color="red">No</font>) * P(Wind=Strong | Play=<font color="red">No</font>) * P(Play=<font color="red">No</font>) <br>
= $\frac{3}{5} * \frac{1}{5} * \frac{4}{5} * \frac{3}{5} * \frac{5}{14} = 0.0206$
<br><br>
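- P(X) is computed with the Law of Total Probability, i.e. as the sum of the two numerators above: <br>
P(X) = P(X|Play=<font color="green">Yes</font>)P(Play=<font color="green">Yes</font>) + P(X|Play=<font color="red">No</font>)P(Play=<font color="red">No</font>) = 0.0053 + 0.0206 = 0.0259 <br>
Multiplying the marginals instead, P(Outlook=Sunny) * P(Temperature=Cool) * P(Humidity=High) * P(Wind=Strong) = $\frac{5}{14} * \frac{4}{14} * \frac{7}{14} * \frac{6}{14} = 0.0219$, would be inconsistent with the model: the naive assumption is independence *given the class*, not independence of the attributes, and with that value the two posteriors below would add up to more than 1.

- P(Play=<font color="green">Yes</font>|X) = $\frac{P(X|Play=\color{green}{Yes}) * P(Play=\color{green}{Yes})}{P(X) }$ = $\frac{0.0053}{0.0259}$ = 0.205

- P(Play=<font color="red">No</font>|X) = $\frac{P(X|Play=\color{red}{No}) * P(Play=\color{red}{No})}{P(X) }$ = $\frac{0.0206}{0.0259}$ = 0.795

- Since 0.795 is greater than 0.205, the class will be **No**. The comparison does not depend on P(X), which is the same for both classes.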
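
The arithmetic of the testing step can be checked with exact fractions. In this sketch the evidence P(X) is the sum of the two numerators (Law of Total Probability), so the two posteriors add up to 1; the probability values are the ones listed above for X = (Sunny, Cool, High, Strong).

```python
from fractions import Fraction
from math import prod

# P(attribute value | class) for X = (Sunny, Cool, High, Strong), read from the tables
likelihoods = {
    "Yes": [Fraction(2, 9), Fraction(3, 9), Fraction(3, 9), Fraction(3, 9)],
    "No": [Fraction(3, 5), Fraction(1, 5), Fraction(4, 5), Fraction(3, 5)],
}
priors = {"Yes": Fraction(9, 14), "No": Fraction(5, 14)}

# numerator of Bayes' theorem: P(X | c) * P(c)
numerators = {c: prod(likelihoods[c]) * priors[c] for c in priors}
# evidence P(X) via the Law of Total Probability
evidence = sum(numerators.values())
posteriors = {c: num / evidence for c, num in numerators.items()}
predicted = max(posteriors, key=posteriors.get)

print({c: round(float(p), 4) for c, p in posteriors.items()})  # {'Yes': 0.2046, 'No': 0.7954}
print(predicted)  # No
```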


---





## Smoothing
<img src="https://drive.google.com/uc?export=view&id=1XBorHfwK_1oHyB18l-4NmiV252ES4mY3"></img>

Imagine that you flip a coin and get HEADS 100 times in a row. Estimating the probabilities from these counts gives P(HEAD)=100% and P(TAIL)=0%. If the coin is actually fair, it is clearly wrong to conclude that a toss can never produce TAIL.

The same problem appears in Naive Bayes. If you try to compute the class for the sample ```X = (Outlook=Overcast, Temperature=Cool, Humidity=High, Wind=Strong)```, the Naive Bayes product for class <font color="red">No</font> is 0 because P(Outlook=Overcast|Play=No) = 0/5 = 0, no matter what the other attributes say.

To solve this problem, we can use probability smoothing, which `is a language modeling technique that assigns some non-zero probability to events that were unseen in the training data. This has the effect that the probability mass is divided over more events, hence the probability distribution becomes more smooth.` 

$$P(A_\alpha=a_\alpha^{(i)}|C=c_j) = \frac{n_{\alpha j}+\lambda}{n_j+\lambda*m_\alpha}$$

- $n_{\alpha j}$ - number of training samples that have both $A_\alpha=a_\alpha^{(i)}$ and $C=c_j$
- $n_j$ - total number of training samples that have $C=c_j$
- $m_\alpha$ - number of distinct values of attribute $A_\alpha$
- $\lambda$ - the smoothing constant, usually chosen in $[0,1]$: with $\lambda = 0$ the estimate is unchanged (no smoothing) and $\lambda = 1$ gives Laplace (add-one) smoothing; in our case we set it to $\frac{1}{n}$, where n is the number of training samples
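
A minimal sketch of the smoothing formula (the function name `smoothed_prob` is illustrative), applied to the zero-probability case P(Outlook=Overcast | Play=No) from the Play Golf tables: none of the 5 "No" samples is Overcast, and Outlook has 3 distinct values. With λ = 1/n = 1/14 the estimate becomes (0 + 1/14) / (5 + 3/14) = 1/73; with λ = 1 it becomes 1/8.

```python
from fractions import Fraction


def smoothed_prob(n_value_class, n_class, n_values, lam):
    """Additive smoothing: (count of value within class + lam) / (class count + lam * distinct values)."""
    return Fraction(n_value_class + lam) / (n_class + lam * n_values)


# Outlook counts among the 5 "No" samples: Sunny 3, Overcast 0, Rain 2 (3 distinct values)
no_counts = {"Sunny": 3, "Overcast": 0, "Rain": 2}
n_no, n_outlook_values = 5, 3

overcast_no = {}
for lam in (Fraction(0), Fraction(1, 14), Fraction(1)):
    probs = {v: smoothed_prob(k, n_no, n_outlook_values, lam) for v, k in no_counts.items()}
    overcast_no[lam] = probs["Overcast"]
    # the smoothed values of one attribute still sum to 1 within the class
    print(lam, probs["Overcast"], sum(probs.values()))
```

Adding λ to every count and λ·m to the denominator keeps each conditional distribution normalized while removing the zeros that would otherwise wipe out the whole product.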


---



## Observations

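- The Naive Bayes classifier presented in this lab works only with categorical data. With continuous data, counting frequencies breaks down: a specific value such as Humidity = 74 may never appear in the training set, so its estimated probability would be 0. (Note also that estimating the full joint distribution of the attributes, without the naive assumption, would require an amount of training data that grows exponentially with the number of attributes; Naive Bayes avoids this by estimating each attribute separately.) You have the following options: either transform numerical values into categorical data (discretization), or model each attribute with a PDF (probability density function), such as the Gaussian (normal) density: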

$$f(x) = \frac{1}{\sqrt{2*\pi}*\sigma}*e^{-\frac{(x-\mu)^2}{2*\sigma^2}}$$

<br>

| Play Golf | Humidity | Mean | Std (sample) |
|------|------|------|------|
|Yes | 86, 96, 80, 65, 70, 80, 70, 90, 75 | 79.1 | 10.2 |
|No  | 85, 90, 70, 95, 91 | 86.2 | 9.7 |

<br>

$$P(Humidity=74 | Play=Yes) = \frac{1}{\sqrt{2*\pi}*10.2}*e^{-\frac{(74-79.1)^2}{2*10.2^2}} \approx 0.0345$$

$$P(Humidity=74 | Play=No) = \frac{1}{\sqrt{2*\pi}*9.7}*e^{-\frac{(74-86.2)^2}{2*9.7^2}} \approx 0.0187$$

<br>
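
The Gaussian likelihoods above can be reproduced from the raw humidity values. This sketch uses Python's `statistics` module: `statistics.stdev` is the sample standard deviation (dividing by n - 1), which matches the Std column, and `gaussian_pdf` is an illustrative helper. The result is a density value rather than a probability, but it plays the role of P(a|C) in the Naive Bayes product; it agrees with the values above to about four decimals.

```python
import math
import statistics


def gaussian_pdf(x, mu, sigma):
    """Normal density f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) * sigma)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


humidity = {
    "Yes": [86, 96, 80, 65, 70, 80, 70, 90, 75],
    "No": [85, 90, 70, 95, 91],
}

likelihood_74 = {}
for play, values in humidity.items():
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)  # sample standard deviation (n - 1)
    likelihood_74[play] = gaussian_pdf(74, mu, sigma)
    print(play, round(mu, 1), round(sigma, 1), round(likelihood_74[play], 4))
```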

- Nice to read: <a href="https://math.stackexchange.com/questions/469974/why-would-i-use-bayes-theorem-if-i-can-directly-compute-the-posterior-probabili">Why would I use Bayes' Theorem if I can directly compute the posterior probability?</a> 



# Exercises

## E1. Naive Bayes

Implement a classifier on the **Play Tennis** dataset using the formulas from the **Naive Bayes Classifier** section.

### Download Dataset

```
!wget -O play_tennis.csv "https://drive.google.com/uc?export=download&id=1NT1iJNj3HrPNtiLCb-myrY0XHaJ8_jtf"
```

### Process Dataset

```python
from IPython.display import display, HTML
import pandas as pd

def train_test_split(df, train_percent=.85):
  """Split df without shuffling: the first train_percent of the rows are used for training,
  the remaining rows for testing. The last column is the target (class)."""
  no_samples = len(df)
  no_train_samples = int(train_percent * no_samples)
  train_df = df.iloc[:no_train_samples, :]
  test_df = df.iloc[no_train_samples:, :]

  X_train = train_df.iloc[:, :-1]
  y_train = train_df.iloc[:, -1]
  X_test = test_df.iloc[:, :-1]
  y_test = test_df.iloc[:, -1]
  return X_train, X_test, y_train, y_test


tennis_df = pd.read_csv("play_tennis.csv", header=0)

# this column is dropped because it brings no meaningful information (it's only a day id)
del tennis_df["day"]

print("Dataset before split")
display(HTML(tennis_df.to_html()))
print("\n\n\n")

X_train, X_test, y_train, y_test = train_test_split(tennis_df)

print("Dataset used for training")
display(HTML(X_train.to_html()))
print("\n\n\n")

print("Dataset used for testing")
display(HTML(X_test.to_html()))
```

### Implement Classifier

* Tasks for **self.fit** method

  * **TODO 1** - compute `self.count_classes`

  * **TODO 2** - compute `self.prob_classes`

  * **TODO 3** - compute `self.count_values`

  * **TODO 4** - compute `self.prob_values`

  * **TODO 5** - smooth probabilities

* Tasks for **self.predict** method:
  * **TODO 6** - compute probability of each class for each sample
  * **TODO 7** - predict the class with the maximum probability

  * $Class(X) = argmax_{c_j} \Bigg\{ \left[ \prod\limits_{\alpha=1}^m   P(\color{red}{a_\alpha^{(i)}} | C=c_j) \right] \times \color{orange}{P(C=c_j)} \Bigg\}$

```python
import numpy as np

class NaiveBayesClassifier:
  """Categorical Naive Bayes with additive smoothing (see the Smoothing section)."""

  # class constructor
  def __init__(self, smoothing=None):
    # smoothing constant lambda; None means lambda = 1/n (n = number of training samples)
    self.smoothing = smoothing
    self.lam = None

    # dictionary to store the counts of a class
    # e.g. {'No': 5, 'Yes': 9}
    self.count_classes = dict()

    # dictionary to store the probability of a class
    # e.g. {'No': 0.35714285714285715, 'Yes': 0.6428571428571429}
    self.prob_classes = dict()

    # dictionary to store the counts of attribute value A given class C; the key also holds
    # the column index, so equal values in different columns cannot collide
    # e.g. {(0, 'Overcast', 'No'): 0, ..., (1, 'Mild', 'Yes'): 4}
    self.count_values = dict()

    # dictionary to store the (smoothed) probability of each attribute value A given class C
    # e.g. {(1, 'Hot', 'No'): ..., ..., (0, 'Sunny', 'Yes'): ...}
    self.prob_values = dict()

    # distinct values seen in training for each column (used by the smoothing formula)
    self.column_values = dict()

  # train method of Naive Bayes Classifier
  def fit(self, x, y):
    x = np.asarray(x, dtype=object)
    y = np.asarray(y, dtype=object)
    n_samples, n_columns = x.shape
    self.lam = 1.0 / n_samples if self.smoothing is None else self.smoothing

    # [TODO 1] count the samples of each class
    classes, counts = np.unique(y, return_counts=True)
    self.count_classes = {c: int(k) for c, k in zip(classes, counts)}

    # [TODO 2] class priors P(C = c_j)
    self.prob_classes = {c: k / n_samples for c, k in self.count_classes.items()}

    # [TODO 3] count the samples that have value A = a in this column and class C = c_j
    for col in range(n_columns):
      self.column_values[col] = sorted(set(x[:, col]))
      for value in self.column_values[col]:
        for c in self.count_classes:
          self.count_values[(col, value, c)] = int(np.sum((x[:, col] == value) & (y == c)))

    # [TODO 4, TODO 5] smoothed conditional probabilities:
    # (count + lambda) / (class count + lambda * number of distinct values of the column)
    for (col, value, c), count in self.count_values.items():
      n_values = len(self.column_values[col])
      self.prob_values[(col, value, c)] = (count + self.lam) / (self.count_classes[c] + self.lam * n_values)
    return self

  def _cond_prob(self, col, value, c):
    # a value never seen in training for this column has count 0, so only lambda remains
    default = self.lam / (self.count_classes[c] + self.lam * len(self.column_values[col]))
    return self.prob_values.get((col, value, c), default)

  # posterior P(C = c_j | x) for each sample, as a list of {class: probability} dicts
  def predict_proba(self, X):
    posteriors = []
    for sample in np.asarray(X, dtype=object):
      # [TODO 6] numerator of Bayes' theorem: P(C = c_j) * prod_alpha P(a_alpha | C = c_j)
      scores = dict()
      for c, prior in self.prob_classes.items():
        score = prior
        for col, value in enumerate(sample):
          score *= self._cond_prob(col, value, c)
        scores[c] = score
      # evidence P(x) via the Law of Total Probability (sum of the numerators);
      # "or 1.0" only guards the all-zero case that can happen with smoothing=0
      evidence = sum(scores.values()) or 1.0
      posteriors.append({c: s / evidence for c, s in scores.items()})
    return posteriors

  # test method of Naive Bayes Classifier
  def predict(self, X):
    # [TODO 7] argmax over the class posteriors of each sample
    return [max(p, key=p.get) for p in self.predict_proba(X)]


naive = NaiveBayesClassifier()
naive.fit(X_train.values, y_train.values)
predictions = naive.predict(X_test.values)

# print sample -> predicted class (e.g. ['Sunny' 'Hot' 'High' 'Strong'] -> No)
for sample, predicted, proba in zip(X_test.values, predictions, naive.predict_proba(X_test.values)):
  print(sample, "->", predicted, proba)
print("accuracy:", np.mean(np.array(predictions, dtype=object) == y_test.values))
```