# Naive Bayes

## Exercise 1

Given the following dataset with input attributes $A$, $B$, and $C$ and target attribute $Y$, predict the class of the instance $A=0, B=0, C=1$ using `BernoulliNB(alpha=1e-10)` and `predict_proba()`, then verify the result by computing the probabilities manually with the Naive Bayes formula.

In [1]:
from sklearn.naive_bayes import BernoulliNB
import pandas as pd
d = pd.DataFrame({'A': [0, 0, 1, 0, 1, 1, 1],
                  'B': [0, 1, 1, 0, 1, 0, 1],
                  'C': [1, 0, 0, 1, 1, 0, 0],
                  'Y': [0, 0, 0, 1, 1, 1, 1]})
X = d[['A', 'B', 'C']]
Y = d['Y']
X_predict = pd.DataFrame({'A': [0], 'B': [0], 'C': [1]})

cl = BernoulliNB(alpha=1e-10).fit(X, Y)
print(cl.predict(X_predict))
print(cl.predict_proba(X_predict))
print(cl.classes_)

print("""
For Y=0:
P(Y=0) * P(A=0|Y=0) * P(B=0|Y=0) * P(C=1|Y=0)
= (3/7) * (2/3) * (1/3) * (1/3) = 0.03175
For Y=1:
P(Y=1) * P(A=0|Y=1) * P(B=0|Y=1) * P(C=1|Y=1)
= (4/7) * (1/4) * (2/4) * (2/4) = 0.03571

Normalize these probabilities:
Total = 0.03175 + 0.03571 = 0.06746

P(Y=0|A=0,B=0,C=1) = 0.03175/0.06746 = 0.4706
P(Y=1|A=0,B=0,C=1) = 0.03571/0.06746 = 0.5294
""")

[1]
[[0.47058824 0.52941176]]
[0 1]

For Y=0:
P(Y=0) * P(A=0|Y=0) * P(B=0|Y=0) * P(C=1|Y=0)
= (3/7) * (2/3) * (1/3) * (1/3) = 0.03175
For Y=1:
P(Y=1) * P(A=0|Y=1) * P(B=0|Y=1) * P(C=1|Y=1)
= (4/7) * (1/4) * (2/4) * (2/4) = 0.03571

Normalize these probabilities:
Total = 0.03175 + 0.03571 = 0.06746

P(Y=0|A=0,B=0,C=1) = 0.03175/0.06746 = 0.4706
P(Y=1|A=0,B=0,C=1) = 0.03571/0.06746 = 0.5294

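The manual calculation for Exercise 1 can also be checked with exact fractions. The following is a minimal sketch in plain Python, assuming no smoothing (which `alpha=1e-10` approximates); the names `rows`, `query`, and `joint` are illustrative and not part of scikit-learn.

```python
from fractions import Fraction

# Dataset from Exercise 1: one tuple (A, B, C, Y) per row.
rows = [
    (0, 0, 1, 0), (0, 1, 0, 0), (1, 1, 0, 0),
    (0, 0, 1, 1), (1, 1, 1, 1), (1, 0, 0, 1), (1, 1, 0, 1),
]
query = (0, 0, 1)  # the instance A=0, B=0, C=1


def joint(rows, query, label):
    """Return P(Y=label) * prod_i P(X_i=query_i | Y=label), without smoothing."""
    in_class = [r for r in rows if r[-1] == label]
    score = Fraction(len(in_class), len(rows))  # prior P(Y=label)
    for i, value in enumerate(query):
        matches = sum(1 for r in in_class if r[i] == value)
        score *= Fraction(matches, len(in_class))  # P(X_i=value | Y=label)
    return score


# Unnormalized scores per class, then normalize so they sum to 1.
joints = {label: joint(rows, query, label) for label in (0, 1)}
total = sum(joints.values())
posterior = {label: score / total for label, score in joints.items()}

for label in (0, 1):
    print(f"Y={label}: joint = {joints[label]}, "
          f"posterior = {posterior[label]} (about {float(posterior[label]):.4f})")
```

Working with `Fraction` avoids rounding in the intermediate steps: the joint scores are 2/63 and 1/28, and the normalized posteriors are exactly 8/17 and 9/17, which agree with the `predict_proba()` output to the printed precision.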


## Exercise 2

Consider two random variables $X_1$ and $X_2$ and a label $Y$ assigned to each instance as in the dataset `d` created below.

1. Classify the instance $X_1=0,X_2=0$ using Naive Bayes.

1. According to Naive Bayes, what is the probability of this classification?

1. How many probabilities are estimated by the model (check the `class_log_prior_` and `feature_log_prob_` attributes)?

1. How many probabilities would be estimated by the model if there were $n$ features instead of 2?

In [3]:
import pandas as pd
from sklearn.naive_bayes import BernoulliNB
from tools.pd_helpers import apply_counts

d_grouped = pd.DataFrame({
    'X1': [0, 0, 1, 1, 0, 0, 1, 1],
    'X2': [0, 0, 0, 0, 1, 1, 1, 1],
    'C' : [2, 18, 4, 1, 4, 1, 2, 18],
    'Y' : [0, 1, 0, 1, 0, 1, 0, 1]})
d = apply_counts(d_grouped, 'C')

X = d[['X1', 'X2']]
Y = d['Y']
X_predict = pd.DataFrame({'X1': [0], 'X2': [0]})

cl = BernoulliNB().fit(X, Y)
# 1 & 2
print("\nPoint 1 & 2:")
print(f'Prediction: {cl.predict(X_predict)}')
print(f'Prediction probabilities: {cl.predict_proba(X_predict)}')
print(f'Prediction classes: {cl.classes_}')
# 3
print("\nPoint 3:")
print(f'class_log_prior_: {cl.class_log_prior_}')
print(f'feature_log_prob_: {cl.feature_log_prob_}')
# 4
print("\nPoint 4:")
print("We need P(Xᵢ=1|Y=0) and P(Xᵢ=1|Y=1) for each feature Xᵢ, so feature_log_prob_ holds 2n probabilities for n features.\n"
      "For a binary label, one prior, P(Y=1), is enough because P(Y=0) = 1 - P(Y=1).\n"
      "Total: 2n + 1 probabilities must be estimated. class_log_prior_ stores both P(Y=0) and P(Y=1), so the fitted model holds 2n + 2 values (6 for n = 2).")


Point 1 & 2:
Prediction: [1]
Prediction probabilities: [[0.24 0.76]]
Prediction classes: [0 1]

Point 3:
class_log_prior_: [-1.42711636 -0.27443685]
feature_log_prob_: [[-0.69314718 -0.69314718]
 [-0.69314718 -0.69314718]]

Point 4:
We need P(Xᵢ=1|Y=0) and P(Xᵢ=1|Y=1) for each feature Xᵢ, so feature_log_prob_ holds 2n probabilities for n features.
For a binary label, one prior, P(Y=1), is enough because P(Y=0) = 1 - P(Y=1).
Total: 2n + 1 probabilities must be estimated. class_log_prior_ stores both P(Y=0) and P(Y=1), so the fitted model holds 2n + 2 values (6 for n = 2).

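The Exercise 2 output can be reproduced by hand, which also shows why every entry of `feature_log_prob_` equals log(0.5) ≈ -0.693. The sketch below is plain Python and assumes the `BernoulliNB` default `alpha=1`, with smoothing applied to the feature likelihoods as (count + alpha) / (class count + 2·alpha) and the priors left unsmoothed. The names `counts`, `p_feature_on`, and `joint` are illustrative.

```python
from fractions import Fraction

# Grouped data from Exercise 2: (X1, X2, Y) -> number of instances.
counts = {
    (0, 0, 0): 2, (0, 0, 1): 18,
    (1, 0, 0): 4, (1, 0, 1): 1,
    (0, 1, 0): 4, (0, 1, 1): 1,
    (1, 1, 0): 2, (1, 1, 1): 18,
}
ALPHA = 1        # assumed: the BernoulliNB default (Laplace smoothing)
n_features = 2
total = sum(counts.values())


def class_count(label):
    """Number of instances with Y = label."""
    return sum(c for key, c in counts.items() if key[-1] == label)


def p_feature_on(i, label):
    """Smoothed estimate of P(X_i = 1 | Y = label)."""
    on = sum(c for key, c in counts.items() if key[-1] == label and key[i] == 1)
    return Fraction(on + ALPHA, class_count(label) + 2 * ALPHA)


def joint(x, label):
    """Return P(Y=label) * prod_i P(X_i=x_i | Y=label)."""
    score = Fraction(class_count(label), total)  # unsmoothed prior
    for i, value in enumerate(x):
        p = p_feature_on(i, label)
        score *= p if value == 1 else 1 - p
    return score


x = (0, 0)
joints = {label: joint(x, label) for label in (0, 1)}
norm = sum(joints.values())
posterior = {label: score / norm for label, score in joints.items()}

# Values stored by the fitted model: n per class for the features, plus 2 priors.
n_parameters = 2 * n_features + 2

print({label: float(p) for label, p in posterior.items()})
print(f"Stored probabilities for n={n_features}: {n_parameters}")
```

Because each smoothed likelihood is exactly 1/2 for both classes, the features carry no information under the Naive Bayes assumption, and the posterior reduces to the class prior (6/25 and 19/25). The raw counts for $X_1=0, X_2=0$ are 2 versus 18, so the empirical fraction of $Y=1$ in that cell is 0.9; the gap to 0.76 comes from the conditional independence assumption.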