### Reference 

https://www.saedsayad.com/naive_bayesian.htm

### New Package Installation

```pip install mixed-naive-bayes```

In [None]:
import pandas as pd
from mixed_naive_bayes import MixedNB

In [None]:
GolfData=pd.read_csv('Data/golfplay.csv')

In [3]:
GolfData

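| Unnamed: 0 | Rain | Temp | Windy | Play |
|---|---|---|---|---|
| 0 | 1 | 2 | 0 | 0 |
| 1 | 1 | 2 | 1 | 0 |
| 2 | 0 | 1 | 0 | 1 |
| 3 | 0 | 1 | 0 | 1 |
| 4 | 0 | 0 | 0 | 1 |
| 5 | 0 | 0 | 1 | 0 |
| 6 | 1 | 1 | 0 | 1 |
| 7 | 0 | 2 | 0 | 1 |
| 8 | 0 | 2 | 1 | 1 |
| 9 | 0 | 1 | 0 | 0 |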


### Naive Bayes Classifier
$X_1$: Rain, $X_2$: Temp, $X_3$: Windy

$P(Y|X_1,X_2, X_3)=\frac{P(X_1, X_2, X_3|Y)P(Y)}{P(X_1, X_2, X_3)}=\frac{P(X_1|Y)P(X_2|Y)P(X_3|Y)P(Y)}{P(X_1, X_2, X_3)}$

The second equality follows from the naive Bayes assumption that the features $X_1$, $X_2$, $X_3$ are conditionally independent given $Y$.
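To see why the independence assumption matters on a dataset this small, here is a dependency-free sketch (plain Python, with the ten rows copied from the `GolfData` output above) that estimates $P(X_1, X_2, X_3|Y=1)$ for the first sample (Rain=1, Temp=2, Windy=0) in two ways: by counting the full feature combination, and by multiplying the three per-feature likelihoods. The helper name `likelihood` is illustrative only.

```python
from fractions import Fraction

# Rows copied from the GolfData table: (Rain, Temp, Windy, Play)
ROWS = [
    (1, 2, 0, 0), (1, 2, 1, 0), (0, 1, 0, 1), (0, 1, 0, 1), (0, 0, 0, 1),
    (0, 0, 1, 0), (1, 1, 0, 1), (0, 2, 0, 1), (0, 2, 1, 1), (0, 1, 0, 0),
]

def likelihood(rows, y, feature_idx, value):
    """Estimate P(X_i = value | Play = y) by counting (no smoothing)."""
    in_class = [r for r in rows if r[3] == y]
    hits = sum(1 for r in in_class if r[feature_idx] == value)
    return Fraction(hits, len(in_class))

x = (1, 2, 0)  # Rain=1, Temp=2, Windy=0

# Joint estimate: count Play=1 rows that match all three features at once
in_class = [r for r in ROWS if r[3] == 1]
joint = Fraction(sum(1 for r in in_class if r[:3] == x), len(in_class))

# Naive Bayes estimate: product of the three per-feature likelihoods
naive = 1
for i, v in enumerate(x):
    naive *= likelihood(ROWS, 1, i, v)

print(joint)  # 0
print(naive)  # 5/108
```

With only six `Play=1` rows, that exact combination never occurs, so the joint estimate is zero; the factorized estimate still gives a usable non-zero value.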

### Calculate $P(X_1|Y)$
$X_1$: Rain

We need the following likelihoods:

$P(sunny|y=0)=\frac{CNT(sunny|y=0)}{CNT(y=0)}$

$P(rainy|y=0)=\frac{CNT(rainy|y=0)}{CNT(y=0)}$


$P(sunny|y=1)=\frac{CNT(sunny|y=1)}{CNT(y=1)}$


$P(rainy|y=1)=\frac{CNT(rainy|y=1)}{CNT(y=1)}$

|   | $X_1=0$ (sunny) | $X_1=1$ (rainy) |
|---|---|---|
| y=0 | CNT(sunny\|y=0) | CNT(rainy\|y=0) |
| y=1 | CNT(sunny\|y=1) | CNT(rainy\|y=1) |
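The four counts in this table can be gathered in a single pass. The sketch below uses plain Python rather than pandas (the function name `conditional_table` is hypothetical): it counts `(Play, Rain)` pairs with `collections.Counter` and divides each count by the class size, which should reproduce the `P_rainy_y1`, `P_rainy_y0`, ... values computed in the cells below.

```python
from collections import Counter

# (Rain, Play) pairs copied from the GolfData table; Rain: 0 = sunny, 1 = rainy
PAIRS = [(1, 0), (1, 0), (0, 1), (0, 1), (0, 1),
         (0, 0), (1, 1), (0, 1), (0, 1), (0, 0)]

def conditional_table(pairs):
    """Return {(y, x): P(X = x | Y = y)} estimated by counting.

    Combinations that never occur are simply absent (zero count).
    """
    class_counts = Counter(y for _, y in pairs)       # CNT(y)
    joint_counts = Counter((y, x) for x, y in pairs)  # CNT(x | y)
    return {key: cnt / class_counts[key[0]] for key, cnt in joint_counts.items()}

table = conditional_table(PAIRS)
print(table[(1, 1)])  # P(rainy | y=1) -> 0.16666666666666666
print(table[(0, 1)])  # P(rainy | y=0) -> 0.5
```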


In [6]:
GolfData_y1=GolfData[GolfData["Play"]==1]
GolfData_y0=GolfData[GolfData["Play"]==0]
CNT_y1=GolfData_y1.shape[0]
CNT_y0=GolfData_y0.shape[0]
P_y1=CNT_y1/(CNT_y1+CNT_y0)
P_y0=CNT_y0/(CNT_y1+CNT_y0)
print(P_y1)
print(P_y0)

0.6
0.4


In [7]:
CNT_rainy_y1=sum(GolfData_y1["Rain"]==1)
CNT_sunny_y1=sum(GolfData_y1["Rain"]==0)
CNT_rainy_y0=sum(GolfData_y0["Rain"]==1)
CNT_sunny_y0=sum(GolfData_y0["Rain"]==0)
P_rainy_y1=CNT_rainy_y1/CNT_y1
P_sunny_y1=CNT_sunny_y1/CNT_y1
P_rainy_y0=CNT_rainy_y0/CNT_y0
P_sunny_y0=CNT_sunny_y0/CNT_y0

In [8]:
print(P_rainy_y1)
print(P_sunny_y1)
print(P_rainy_y0)
print(P_sunny_y0)

0.16666666666666666
0.8333333333333334
0.5
0.5


### Calculate $P(X_2|Y)$

$X_2$: Temp

We need the following likelihoods:

$P(Low|y=0)=\frac{CNT(Low|y=0)}{CNT(y=0)}$

$P(Med|y=0)=\frac{CNT(Med|y=0)}{CNT(y=0)}$

$P(High|y=0)=\frac{CNT(High|y=0)}{CNT(y=0)}$



$P(Low|y=1)=\frac{CNT(Low|y=1)}{CNT(y=1)}$

$P(Med|y=1)=\frac{CNT(Med|y=1)}{CNT(y=1)}$

$P(High|y=1)=\frac{CNT(High|y=1)}{CNT(y=1)}$



|   | $X_2=0$ (Low) | $X_2=1$ (Med) | $X_2=2$ (High) |
|---|---|---|---|
| y=0 | CNT(Low\|y=0) | CNT(Med\|y=0) | CNT(High\|y=0) |
| y=1 | CNT(Low\|y=1) | CNT(Med\|y=1) | CNT(High\|y=1) |



In [9]:
CNT_Low_y1=sum(GolfData_y1["Temp"]==0)
CNT_Med_y1=sum(GolfData_y1["Temp"]==1)
CNT_High_y1=sum(GolfData_y1["Temp"]==2)
CNT_Low_y0=sum(GolfData_y0["Temp"]==0)
CNT_Med_y0=sum(GolfData_y0["Temp"]==1)
CNT_High_y0=sum(GolfData_y0["Temp"]==2)

P_Low_y1=CNT_Low_y1/CNT_y1
P_Med_y1=CNT_Med_y1/CNT_y1
P_High_y1=CNT_High_y1/CNT_y1

P_Low_y0=CNT_Low_y0/CNT_y0
P_Med_y0=CNT_Med_y0/CNT_y0
P_High_y0=CNT_High_y0/CNT_y0

In [22]:
GolfData_y0["Temp"]

0    2
1    2
5    0
9    1
Name: Temp, dtype: int64

In [23]:
print(P_Low_y1)
print(P_Med_y1)
print(P_High_y1)

print(P_Low_y0)
print(P_Med_y0)
print(P_High_y0)

0.16666666666666666
0.5
0.3333333333333333
0.25
0.25
0.5


### Calculate $P(X_3|Y)$
$X_3$: Windy

We need the following likelihoods:

$P(wind|y=0)=\frac{CNT(wind|y=0)}{CNT(y=0)}$

$P(nowind|y=0)=\frac{CNT(nowind|y=0)}{CNT(y=0)}$


$P(wind|y=1)=\frac{CNT(wind|y=1)}{CNT(y=1)}$


$P(nowind|y=1)=\frac{CNT(nowind|y=1)}{CNT(y=1)}$

|   | $X_3=0$ (nowind) | $X_3=1$ (wind) |
|---|---|---|
| y=0 | CNT(nowind\|y=0) | CNT(wind\|y=0) |
| y=1 | CNT(nowind\|y=1) | CNT(wind\|y=1) |
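The Rain, Temp and Windy sections all follow the same counting pattern, so a single loop can build every likelihood table at once. This plain-Python sketch (the names `likelihoods` and `FEATURES` are illustrative, and the rows are copied from the `GolfData` output) stores exact fractions and checks that each conditional distribution $P(X_i|Y=y)$ sums to 1.

```python
from collections import Counter, defaultdict
from fractions import Fraction

FEATURES = ("Rain", "Temp", "Windy")
# Rows copied from the GolfData table: (Rain, Temp, Windy, Play)
ROWS = [
    (1, 2, 0, 0), (1, 2, 1, 0), (0, 1, 0, 1), (0, 1, 0, 1), (0, 0, 0, 1),
    (0, 0, 1, 0), (1, 1, 0, 1), (0, 2, 0, 1), (0, 2, 1, 1), (0, 1, 0, 0),
]

class_counts = Counter(r[3] for r in ROWS)  # CNT(y=0), CNT(y=1)

# likelihoods[feature][y][value] = P(feature = value | Play = y)
likelihoods = {}
for i, name in enumerate(FEATURES):
    counts = Counter((r[3], r[i]) for r in ROWS)  # CNT(value | y)
    table = defaultdict(dict)
    for (y, value), cnt in counts.items():
        table[y][value] = Fraction(cnt, class_counts[y])
    likelihoods[name] = dict(table)

# Sanity check: each conditional distribution P(X_i | Y = y) sums to 1
for name, by_class in likelihoods.items():
    for y, dist in by_class.items():
        assert sum(dist.values()) == 1, (name, y)

print(likelihoods["Temp"][0][2])  # P(High | y=0) -> 1/2
```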

In [24]:
CNT_wind_y1=sum(GolfData_y1["Windy"]==1)
CNT_nowind_y1=sum(GolfData_y1["Windy"]==0)
CNT_wind_y0=sum(GolfData_y0["Windy"]==1)
CNT_nowind_y0=sum(GolfData_y0["Windy"]==0)

P_wind_y1=CNT_wind_y1/CNT_y1
P_nowind_y1=CNT_nowind_y1/CNT_y1
P_wind_y0=CNT_wind_y0/CNT_y0
P_nowind_y0=CNT_nowind_y0/CNT_y0

In [25]:
GolfData_y0["Windy"]

0    0
1    1
5    1
9    0
Name: Windy, dtype: int64

In [26]:
print(P_wind_y1)
print(P_nowind_y1)
print(P_wind_y0)
print(P_nowind_y0)

0.16666666666666666
0.8333333333333334
0.5
0.5


In [27]:
GolfData[["Rain", "Temp", "Windy"]].iloc[0]  # first sample: Rain=1, Temp=2, Windy=0

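For the first sample (Rain=1, Temp=2, Windy=0), the shared denominator $P(X_1, X_2, X_3)$ can be dropped, so each posterior is proportional to a product of the likelihoods computed above:

$P(y=1|Rain=1, Temp=2, Windy=0) \propto P(Rain=1|y=1)P(Temp=2|y=1)P(Windy=0|y=1)P(y=1)$

$P(y=0|Rain=1, Temp=2, Windy=0) \propto P(Rain=1|y=0)P(Temp=2|y=0)P(Windy=0|y=0)P(y=0)$

Normalizing the two scores so that they sum to 1 gives the posterior probabilities.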
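With the likelihoods printed above ($P(rainy|y=1)=1/6$, $P(High|y=1)=1/3$, $P(nowind|y=1)=5/6$, $P(y=1)=3/5$, and $1/2$, $1/2$, $1/2$, $2/5$ for $y=0$), the arithmetic can be done exactly in fractions:

$$
\begin{aligned}
P(Rain=1|y=1)P(Temp=2|y=1)P(Windy=0|y=1)P(y=1) &= \tfrac{1}{6}\cdot\tfrac{1}{3}\cdot\tfrac{5}{6}\cdot\tfrac{3}{5} = \tfrac{1}{36}\\
P(Rain=1|y=0)P(Temp=2|y=0)P(Windy=0|y=0)P(y=0) &= \tfrac{1}{2}\cdot\tfrac{1}{2}\cdot\tfrac{1}{2}\cdot\tfrac{2}{5} = \tfrac{1}{20}\\
P(y=1|Rain=1, Temp=2, Windy=0) &= \frac{1/36}{1/36 + 1/20} = \frac{5}{14} \approx 0.357
\end{aligned}
$$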

In [28]:
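# Unnormalized scores: product of likelihoods and prior for each class
P_y1_x=P_rainy_y1*P_High_y1*P_nowind_y1*P_y1
P_y0_x=P_rainy_y0*P_High_y0*P_nowind_y0*P_y0
# Normalize so the two posteriors sum to 1; this prints P(y=1|Rain=1, Temp=2, Windy=0)
print(P_y1_x/(P_y1_x+P_y0_x))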

0.3571428571428571


In [None]:
GolfData[["Rain", "Temp", "Windy"]].iloc[2]  # third sample: Rain=0, Temp=1, Windy=0

In [None]:
# Third sample: sunny (Rain=0), Med (Temp=1), no wind (Windy=0)
P_y1_x=P_sunny_y1*P_Med_y1*P_nowind_y1*P_y1
P_y0_x=P_sunny_y0*P_Med_y0*P_nowind_y0*P_y0
print(P_y1_x/(P_y1_x+P_y0_x))  # about 0.893 (exactly 25/28)
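The two hand computations generalize to every row. The sketch below is a minimal count-based naive Bayes in plain Python, with no smoothing, and the names `fit` and `posterior_y1` are hypothetical. It multiplies the prior by the per-feature likelihoods for each class, normalizes, and returns $P(y=1|x)$ for all ten training rows.

```python
from collections import Counter
from fractions import Fraction

# Rows copied from the GolfData table: (Rain, Temp, Windy, Play)
ROWS = [
    (1, 2, 0, 0), (1, 2, 1, 0), (0, 1, 0, 1), (0, 1, 0, 1), (0, 0, 0, 1),
    (0, 0, 1, 0), (1, 1, 0, 1), (0, 2, 0, 1), (0, 2, 1, 1), (0, 1, 0, 0),
]
N_FEATURES = 3

def fit(rows):
    """Count-based estimates of P(Y) and P(X_i = v | Y), without smoothing."""
    class_counts = Counter(r[-1] for r in rows)
    priors = {y: Fraction(c, len(rows)) for y, c in class_counts.items()}
    feature_counts = Counter((i, r[i], r[-1]) for r in rows for i in range(N_FEATURES))

    def lik(i, v, y):
        # Unseen (feature, value, class) combinations get probability 0
        return Fraction(feature_counts[(i, v, y)], class_counts[y])

    return priors, lik

def posterior_y1(x, priors, lik):
    """P(y=1 | x): multiply prior and per-feature likelihoods, then normalize."""
    scores = {}
    for y, prior in priors.items():
        score = prior
        for i, v in enumerate(x):
            score *= lik(i, v, y)
        scores[y] = score
    return scores[1] / sum(scores.values())

priors, lik = fit(ROWS)
probs = [posterior_y1(r[:N_FEATURES], priors, lik) for r in ROWS]
print(probs[0], probs[2])  # 5/14 25/28
```

Rows 0 and 2 reproduce the two values computed above (about 0.357 and 0.893).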

## Using Package mixed_naive_bayes

In [None]:
from mixed_naive_bayes import MixedNB
X=GolfData[["Rain", "Temp", "Windy"]]
Y=GolfData["Play"]

In [None]:
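# categorical_features='all' treats every column as categorical;
# a tiny alpha (additive smoothing) keeps the estimates essentially equal to the raw count ratios computed by hand above
clf = MixedNB(categorical_features='all', alpha=1e-9)
clf.fit(X, Y)
clf.predict_proba(X)  # per-row class probabilities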
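`alpha` is, as we understand the package, an additive (Laplace/Lidstone) smoothing parameter; we have not checked the exact formula in the `mixed_naive_bayes` source. The sketch below illustrates the standard additive-smoothing estimate $\frac{CNT(x|y)+\alpha}{CNT(y)+\alpha K}$, where $K$ is the number of categories of the feature, using the Temp counts for $y=0$ (Low 1, Med 1, High 2). The function name `smoothed_likelihood` is hypothetical.

```python
def smoothed_likelihood(count, class_count, n_categories, alpha):
    """Additive (Laplace/Lidstone) estimate of P(X_i = v | Y = y).

    Assumed formula: (count + alpha) / (class_count + alpha * n_categories).
    """
    return (count + alpha) / (class_count + alpha * n_categories)

# P(High | y=0): CNT(High | y=0) = 2, CNT(y=0) = 4, Temp has 3 categories
for alpha in (1e-9, 0.5, 1.0):
    print(alpha, smoothed_likelihood(2, 4, 3, alpha))

# With alpha > 0, a category never seen with a class no longer gets probability 0
print(smoothed_likelihood(0, 4, 3, 1.0))  # 1/7, about 0.143
```

With `alpha=1e-9` the estimate is indistinguishable from the unsmoothed $2/4$, which is why the package output should be close to the hand-computed probabilities; larger values pull every estimate toward the uniform $1/K$.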