# Bayes Theorem

<img src="imgs/probability.png" width=150 align = "right">
<br><br>


1. Bayes theorem is one of the most important concepts in probability theory.
1. Bayes theorem is widely used for text classification.
1. Naive Bayes was one of the earliest algorithms used for email spam filtering.

<img src="imgs/bayes.png">

# Derivation of Bayes theorem
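
A short derivation, sketched from the definition of conditional probability and assuming $P(A) > 0$ and $P(B) > 0$. Both conditionals share the same joint probability:

$$P(A|B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B|A) = \frac{P(A \cap B)}{P(A)}$$

Solving each for the joint probability gives

$$P(A \cap B) = P(A|B)\,P(B) = P(B|A)\,P(A)$$

and dividing by $P(B)$ yields Bayes theorem:

$$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}$$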

# Naive Bayes Question


**Q. The probability that it rains on Saturday is 25%. If it rains on Saturday, the probability that it rains on Sunday is 50%. If it does not rain on Saturday, the probability that it rains on Sunday is 25%. 
Given that it rained on Sunday, what is the probability that it rained on Saturday?**

Let A = rain on Saturday. B = rain on Sunday.

Given:
1. P(A) = 25%
1. P(B|A) = 50%
1. P(B|$A^c$) = 25%

Find:
 P(A|B)

`P(A|B) = P(B|A).P(A) / P(B)`
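
The denominator `P(B)` is not given directly, but it follows from the law of total probability: `P(B) = P(B|A).P(A) + P(B|A^c).P(A^c)`. A minimal sketch of the arithmetic, with variable names chosen here for illustration:

```python
# Probabilities given in the question
p_a = 0.25              # P(A): rain on Saturday
p_b_given_a = 0.50      # P(B|A): rain on Sunday given rain on Saturday
p_b_given_not_a = 0.25  # P(B|A^c): rain on Sunday given no rain on Saturday

# Law of total probability gives P(B)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes theorem
p_a_given_b = p_b_given_a * p_a / p_b

print(p_b, p_a_given_b)  # 0.3125 0.4
```

So, given that it rained on Sunday, the probability that it rained on Saturday is 40%.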


# Spam or Non Spam Emails?

<br>
<img src="imgs/email.png" width=200>

**Problem Statement**: Given the content X of an email, predict whether the email is spam or non-spam.

**Solution**:

Find the posterior probability of both classes, Y=1 (spam) and Y=0 (non-spam):

$$ P(Y=1|X)$$ and $$ P(Y=0|X)$$

Then predict the class with the larger posterior:

$$ \hat{y} = \arg\max_i  P(Y=i|X)$$

How to find $P(Y=i|X)$?

**Prior** :

$$P(Y=1) = \frac {\text{number of spam emails}}{\text{total number of emails}}$$

X = `get unlimited mobile data at 95% discounted price`

$P(X|Y = 1)$: how often these words occur in spam emails. Count the frequency of each word in the spam emails.

$P(X|Y = 0)$: how often these words occur in non-spam emails. Count the frequency of each word in the non-spam emails.

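**The marginal likelihood $P(X)$ only normalizes the posteriors so that they sum to 1. It is the same for both classes, so it can be ignored when comparing them.**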
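
Putting the prior, likelihood and marginal likelihood together, Bayes theorem for this problem can be written as below. The factorization over individual words $x_k$ is the "naive" conditional-independence assumption; the indices $j$ and $k$ are notation introduced here for illustration.

$$P(Y=i|X) = \frac{P(X|Y=i)\,P(Y=i)}{P(X)}, \qquad P(X) = \sum_{j \in \{0,1\}} P(X|Y=j)\,P(Y=j)$$

$$P(X|Y=i) = \prod_{k} P(x_k|Y=i)$$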

# Naive Bayes Classification

**Prior Formula**

**Likelihood Formula**
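
Both quantities can be estimated by counting, which is what the `prior_prob` and `cond_prob` functions in the code section do. A sketch of the two formulas, where $N$ (number of training examples), $x_j$ (the $j$-th feature) and $v$ (one of its values) are notation assumed here:

$$P(Y=c) = \frac{count(Y=c)}{N}$$

$$P(x_j = v \,|\, Y=c) = \frac{count(x_j = v,\; Y=c)}{count(Y=c)}$$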

# Play Golf Dataset

<table>
    <tr>
        <td>        
            <img src="imgs/golf.png" width=500 align=left>
        </td>
        <td>
            <img src="https://media.giphy.com/media/l2JdTikgn9Y6HPNmg/giphy.gif" width=300 align=right>
        </td>
    </tr>
</table>


### Test Data
`today = (Sunny, Hot, Normal, False)`

# Naive Bayes Code

In [1]:
import pandas as pd
import numpy as np

In [3]:
golf = pd.read_csv("golf.csv")

In [4]:
golf

Unnamed: 0,Outlook,Temperature,Humidity,Windy,Play
0,sunny,hot,high,False,no
1,sunny,hot,high,True,no
2,overcast,hot,high,False,yes
3,rainy,mild,high,False,yes
4,rainy,cool,normal,False,yes
5,rainy,cool,normal,True,no
6,overcast,cool,normal,True,yes
7,sunny,mild,high,False,no
8,sunny,cool,normal,False,yes
9,rainy,mild,normal,False,yes


In [72]:
def prior_prob(golf, label):
    
    total_examples = golf.shape[0]
    class_examples = np.sum(golf['Play']==label)
    
    return (class_examples)/float(total_examples)

In [73]:
PRIOR = {
    'yes' : prior_prob(golf, 'yes'),
    'no' : prior_prob(golf, 'no')
}
print(PRIOR)

{'yes': 0.6428571428571429, 'no': 0.35714285714285715}


In [74]:
def cond_prob(golf, feature_col, feature_val, label):
    
    filtered_data = golf[golf['Play']==label]
    numerator = np.sum(filtered_data[feature_col]==feature_val)
    denominator = np.sum(golf['Play']==label)
    
    return numerator/float(denominator)

In [75]:
features = list(golf.columns)[:-1]
COND_PROB = {}

for label in golf['Play'].unique():
    COND_PROB[label] = {}
    for feature in features:
        COND_PROB[label][feature] = {}
        
        feat_values = golf[feature].unique()
        for value in feat_values:
            # e.g. feature='Outlook', value='sunny', label='yes'
            prob = round(cond_prob(golf, feature, value, label), 2)
            COND_PROB[label][feature][value] = prob
            print(feature,"-",value, "-",label,":", prob)

    print()

Outlook - sunny - no : 0.6
Outlook - overcast - no : 0.0
Outlook - rainy - no : 0.4
Temperature - hot - no : 0.4
Temperature - mild - no : 0.4
Temperature - cool - no : 0.2
Humidity - high - no : 0.8
Humidity - normal - no : 0.2
Windy - False - no : 0.4
Windy - True - no : 0.6

Outlook - sunny - yes : 0.22
Outlook - overcast - yes : 0.44
Outlook - rainy - yes : 0.33
Temperature - hot - yes : 0.22
Temperature - mild - yes : 0.44
Temperature - cool - yes : 0.33
Humidity - high - yes : 0.33
Humidity - normal - yes : 0.67
Windy - False - yes : 0.67
Windy - True - yes : 0.33



In [76]:
COND_PROB

{'no': {'Outlook': {'sunny': 0.6, 'overcast': 0.0, 'rainy': 0.4},
  'Temperature': {'hot': 0.4, 'mild': 0.4, 'cool': 0.2},
  'Humidity': {'high': 0.8, 'normal': 0.2},
  'Windy': {False: 0.4, True: 0.6}},
 'yes': {'Outlook': {'sunny': 0.22, 'overcast': 0.44, 'rainy': 0.33},
  'Temperature': {'hot': 0.22, 'mild': 0.44, 'cool': 0.33},
  'Humidity': {'high': 0.33, 'normal': 0.67},
  'Windy': {False: 0.67, True: 0.33}}}

## Prediction

In [77]:
X_test = ["sunny", "hot", "normal", False]
print(X_test)

['sunny', 'hot', 'normal', False]


In [68]:
features

['Outlook', 'Temperature', 'Humidity', 'Windy']

In [83]:
PRIOR['yes']*COND_PROB['yes']['Outlook']['sunny']*COND_PROB['yes']['Temperature']['hot']*COND_PROB['yes']['Humidity']['normal']*COND_PROB['yes']['Windy'][False]


0.013967202857142858

In [80]:
PRIOR['no']*COND_PROB['no']['Outlook']['sunny']*COND_PROB['no']['Temperature']['hot']*COND_PROB['no']['Humidity']['normal']*COND_PROB['no']['Windy'][False]


0.006857142857142858

In [82]:
for label in golf['Play'].unique():
    
    prob  = PRIOR[label]
    # Multiply in one likelihood term per (feature, test value) pair
    for feature, fea_value in zip(features, X_test):
        prob *= COND_PROB[label][feature][fea_value]
        
    print(label, prob)

no 0.006857142857142858
yes 0.013967202857142858
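
The two numbers above are unnormalized scores, `P(X|Y).P(Y)`, not probabilities. Dividing each by their sum (which plays the role of the marginal likelihood) turns them into posteriors. A minimal sketch using the printed values; the names `scores`, `evidence` and `posterior` are chosen here for illustration:

```python
# Unnormalized scores copied from the output above
scores = {"no": 0.006857142857142858, "yes": 0.013967202857142858}

# The sum over classes plays the role of P(X)
evidence = sum(scores.values())
posterior = {label: s / evidence for label, s in scores.items()}

# Predict the class with the largest posterior
prediction = max(posterior, key=posterior.get)
print(prediction, round(posterior[prediction], 2))  # yes 0.67
```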


# Naive Bayes Sklearn


In [2]:
golf = pd.read_csv("golf.csv")

In [3]:
golf

Unnamed: 0,Outlook,Temperature,Humidity,Windy,Play
0,sunny,hot,high,False,no
1,sunny,hot,high,True,no
2,overcast,hot,high,False,yes
3,rainy,mild,high,False,yes
4,rainy,cool,normal,False,yes
5,rainy,cool,normal,True,no
6,overcast,cool,normal,True,yes
7,sunny,mild,high,False,no
8,sunny,cool,normal,False,yes
9,rainy,mild,normal,False,yes


In [4]:
from sklearn.preprocessing import LabelEncoder

In [5]:
le1 = LabelEncoder()
golf['Outlook'] = le1.fit_transform(golf['Outlook'])

In [6]:
le2 = LabelEncoder()
golf['Temperature'] = le2.fit_transform(golf['Temperature'])

In [7]:
le3 = LabelEncoder()
golf['Humidity'] = le3.fit_transform(golf['Humidity'])

In [8]:
le4 = LabelEncoder()
golf['Windy'] = le4.fit_transform(golf['Windy'])

In [9]:
le5 = LabelEncoder()
golf['Play'] = le5.fit_transform(golf['Play'])

In [10]:
golf.head()

Unnamed: 0,Outlook,Temperature,Humidity,Windy,Play
0,2,1,0,0,0
1,2,1,0,1,0
2,0,1,0,0,1
3,1,2,0,0,1
4,1,0,1,0,1


In [11]:
from sklearn.naive_bayes import CategoricalNB

In [12]:
model = CategoricalNB()

In [13]:
model.fit(golf.iloc[:, :-1], golf.iloc[:, -1])

CategoricalNB()

In [16]:
X_test = ["sunny", "hot", "normal", False]

In [17]:
le1.transform(['sunny'])

array([2])

In [18]:
le2.transform(['hot'])

array([1])

In [19]:
le3.transform(['normal'])

array([1])

In [24]:
le4.transform([False])

array([0])

In [28]:
X_test = np.array([[2,1,1,0]])

In [29]:
model.predict(X_test)

array([1])

In [30]:
model.predict_proba(X_test)

array([[0.33508723, 0.66491277]])
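
These probabilities are not plain frequency ratios, because `CategoricalNB` applies Laplace smoothing with `alpha=1.0` by default. The sketch below reproduces the numbers by hand. It assumes the class counts implied by the earlier `PRIOR` output (9 `yes` and 5 `no` rows out of 14) and the feature counts implied by `COND_PROB` (for example 0.22 ≈ 2/9 and 0.6 = 3/5); the `counts` structure is made up for this illustration.

```python
from fractions import Fraction

alpha = 1  # default smoothing parameter of CategoricalNB

# For each class: number of rows, then one pair per feature of X_test:
# (rows in that class matching the test value, number of categories of the feature)
# Order: Outlook=sunny, Temperature=hot, Humidity=normal, Windy=False
counts = {
    "no":  {"n": 5, "features": [(3, 3), (2, 3), (1, 2), (2, 2)]},
    "yes": {"n": 9, "features": [(2, 3), (2, 3), (6, 2), (6, 2)]},
}
total = sum(c["n"] for c in counts.values())

scores = {}
for label, c in counts.items():
    score = Fraction(c["n"], total)  # prior (not smoothed)
    for matches, n_categories in c["features"]:
        # Laplace-smoothed likelihood for a categorical feature
        score *= Fraction(matches + alpha, c["n"] + alpha * n_categories)
    scores[label] = score

evidence = sum(scores.values())
posterior = {label: float(s / evidence) for label, s in scores.items()}
print(posterior)  # values round to 0.3351 (no) and 0.6649 (yes)
```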

# Naive Bayes Classifier for Text Data

<img src="https://media.giphy.com/media/dQpUkK59l5Imxsh8jN/giphy.gif" width=300 > 

#### Multinomial Naive Bayes
- The key step is computing the likelihood of each word $x_i$ given the class $c$:

$$P(x_i|Y_i = c) = \frac {count(x_i, Y_i = c)} {\sum_{w \in V}{count(w, Y_i=c)}} $$


# Laplace Smoothing

<img src="https://media.giphy.com/media/SqmkZ5IdwzTP2/giphy.gif" width=300 align=left>

$$P(x_i|Y_i = c) = \frac {count(x_i, Y_i = c) + \alpha} {\sum_{w \in V}{count(w, Y_i=c)} + \alpha |V|} $$
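
Without smoothing, a word that never appears in a class gets likelihood 0 and wipes out the whole product. A minimal sketch of the formula above, using counts from the training table in the next section (the word `Tokyo` occurs 0 times among the 8 words of class `yes`, and the vocabulary has 6 distinct words). The function name is hypothetical, and `alpha` is kept an integer so that `Fraction` can be used:

```python
from fractions import Fraction

def smoothed_likelihood(word_count, class_total, vocab_size, alpha):
    """Laplace-smoothed estimate of P(word | class)."""
    return Fraction(word_count + alpha, class_total + alpha * vocab_size)

# alpha = 0 is the unsmoothed estimate, alpha = 1 is Laplace (add-one) smoothing
print(smoothed_likelihood(0, 8, 6, alpha=0))  # 0
print(smoothed_likelihood(0, 8, 6, alpha=1))  # 1/14
```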

# A Practical Example of Multinomial Naive Bayes

<table align=left>
    <tr style="font-weight:bold;">
        <td></td>
        <td>docID</td>
        <td>words in document</td>
        <td>c = China?</td>
    </tr> 
    <tr>
        <td  rowspan=4> training set </td>
        <td> 1 </td>
        <td> Chinese Beijing Chinese </td>
        <td> yes </td>
    </tr>
    <tr>
        <td> 2 </td>
        <td> Chinese Chinese Shanghai </td>
        <td> yes </td>
    </tr>
    <tr>
        <td> 3 </td>
        <td> Chinese Macao </td>
        <td> yes </td>
    </tr>
    <tr>
        <td> 4 </td>
        <td> Tokyo Japan Chinese </td>
        <td> no </td>
    </tr>
    <tr>
        <td> test set </td>
        <td> 5 </td>
        <td> Chinese Chinese Chinese Tokyo Japan </td>
        <td> ? </td>
    </tr>
</table>
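
A from-scratch sketch of how the table above could be classified with Multinomial Naive Bayes and Laplace smoothing (`alpha = 1`). Splitting documents on whitespace and all variable names are assumptions made for illustration:

```python
from collections import Counter
from fractions import Fraction

train = [
    ("Chinese Beijing Chinese", "yes"),
    ("Chinese Chinese Shanghai", "yes"),
    ("Chinese Macao", "yes"),
    ("Tokyo Japan Chinese", "no"),
]
test_doc = "Chinese Chinese Chinese Tokyo Japan"
alpha = 1

vocab = {w for doc, _ in train for w in doc.split()}
class_docs = Counter(label for _, label in train)

# Word frequencies per class
word_counts = {label: Counter() for label in class_docs}
for doc, label in train:
    word_counts[label].update(doc.split())

scores = {}
for label in class_docs:
    score = Fraction(class_docs[label], len(train))  # prior
    total_words = sum(word_counts[label].values())
    for word in test_doc.split():  # one factor per token, so repeats count
        score *= Fraction(word_counts[label][word] + alpha,
                          total_words + alpha * len(vocab))
    scores[label] = score

for label, score in scores.items():
    print(label, float(score))
prediction = max(scores, key=scores.get)
print("prediction:", prediction)  # prediction: yes
```

The unnormalized scores are roughly 0.0003 for `yes` and 0.0001 for `no`, so the test document is assigned to `yes`.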

# Bernoulli Naive Bayes

<img src="https://media.giphy.com/media/xTiN0h0Kh5gH7yQYUw/giphy.gif" width=300 align=right>
<br><br>

1. Bernoulli Naive Bayes ignores how often a feature/word occurs.
1. It only considers whether a word is present or not (1 or 0).

**Likelihood :**

$$P(x_i|Y_i = c) = \frac {count(d_i \; contains \; x_i, Y_i = c) + \alpha} {{count(Y_i=c)} + 2\alpha} $$

**Prediction:**

$$P(Y=1|X) \propto P(Y=1) \prod_{i=1}^{|V|} { P(x_i|Y=1)^{b_i} \, \big(1 - P(x_i|Y=1)\big)^{1-b_i}}$$

where $b_i = 1$ if word $x_i$ is present in the document and $b_i = 0$ otherwise.

## Example of Bernoulli Naive Bayes

<table align=left>
    <tr style="font-weight:bold;">
        <td></td>
        <td>docID</td>
        <td>words in document</td>
        <td>c = China?</td>
    </tr> 
    <tr>
        <td  rowspan=4> training set </td>
        <td> 1 </td>
        <td> Chinese Beijing Chinese </td>
        <td> yes </td>
    </tr>
    <tr>
        <td> 2 </td>
        <td> Chinese Chinese Shanghai </td>
        <td> yes </td>
    </tr>
    <tr>
        <td> 3 </td>
        <td> Chinese Macao </td>
        <td> yes </td>
    </tr>
    <tr>
        <td> 4 </td>
        <td> Tokyo Japan Chinese </td>
        <td> no </td>
    </tr>
    <tr>
        <td> test set </td>
        <td> 5 </td>
        <td> Chinese Chinese Chinese Tokyo Japan </td>
        <td> ? </td>
    </tr>
</table>
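
A from-scratch sketch of Bernoulli Naive Bayes on the table above. It assumes `alpha = 1` and the standard two-outcome smoothing, where the likelihood denominator is the number of documents in the class plus `2 * alpha`. Every vocabulary word contributes a factor, whether or not it occurs in the test document; variable names are chosen here for illustration:

```python
from fractions import Fraction

train = [
    ("Chinese Beijing Chinese", "yes"),
    ("Chinese Chinese Shanghai", "yes"),
    ("Chinese Macao", "yes"),
    ("Tokyo Japan Chinese", "no"),
]
test_doc = "Chinese Chinese Chinese Tokyo Japan"
alpha = 1

vocab = sorted({w for doc, _ in train for w in doc.split()})
labels = sorted({label for _, label in train})
test_words = set(test_doc.split())  # presence only, repeats are ignored

scores = {}
for label in labels:
    docs = [set(doc.split()) for doc, y in train if y == label]
    score = Fraction(len(docs), len(train))  # prior
    for word in vocab:
        docs_with_word = sum(word in d for d in docs)
        p = Fraction(docs_with_word + alpha, len(docs) + 2 * alpha)
        # p if the word is present in the test document, (1 - p) if absent
        score *= p if word in test_words else 1 - p
    scores[label] = score

for label, score in scores.items():
    print(label, float(score))
prediction = max(scores, key=scores.get)
print("prediction:", prediction)  # prediction: no
```

The unnormalized scores are roughly 0.005 for `yes` and 0.022 for `no`, so under these assumptions the Bernoulli model assigns the test document to `no`.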

# Bias Variance Tradeoff   <img src="imgs/alpha.png" width=100 >

# Gaussian Naive Bayes
<img src="imgs/gaussian.png" width=500>
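
For continuous features, Gaussian Naive Bayes models each feature within a class as a normal distribution and uses its density as the likelihood $P(x_i|Y=c)$. A minimal sketch, assuming the mean and variance are estimated per class and per feature from the training data; the function name and the numbers are hypothetical:

```python
import math

def gaussian_likelihood(x, mean, var):
    """Density of a normal distribution N(mean, var) evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical values of one continuous feature observed for one class
values = [21.0, 24.0, 27.0]
mean = sum(values) / len(values)
var = sum((v - mean) ** 2 for v in values) / len(values)

# Likelihood of a new observation under that class
print(gaussian_likelihood(25.0, mean, var))
```

The prediction step is unchanged: multiply the prior by one such likelihood per feature and pick the class with the largest product.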

# Scikit Learn code for Naive Bayes
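
A minimal sketch of how the text example from the earlier sections might be run with scikit-learn, assuming `CountVectorizer` for turning documents into word counts. Note that `CountVectorizer` lowercases tokens by default, and `BernoulliNB` binarizes the counts (any count above 0 becomes 1) by default:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

train_docs = [
    "Chinese Beijing Chinese",
    "Chinese Chinese Shanghai",
    "Chinese Macao",
    "Tokyo Japan Chinese",
]
train_labels = ["yes", "yes", "yes", "no"]
test_docs = ["Chinese Chinese Chinese Tokyo Japan"]

# Bag-of-words counts; the vocabulary is learned from the training documents
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_docs)
X_test = vectorizer.transform(test_docs)

# alpha=1.0 is Laplace smoothing
mnb = MultinomialNB(alpha=1.0).fit(X_train, train_labels)
bnb = BernoulliNB(alpha=1.0).fit(X_train, train_labels)

print("Multinomial:", mnb.predict(X_test), mnb.predict_proba(X_test))
print("Bernoulli:  ", bnb.predict(X_test), bnb.predict_proba(X_test))
```

On this tiny corpus the two models disagree: the multinomial model predicts `yes`, while the Bernoulli model predicts `no` because it ignores how many times `Chinese` is repeated.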