# Naive Bayes Classifier:

Naive Bayes is a family of probabilistic algorithms that can be used in a wide variety of classification tasks. Typical applications include spam filtering, document classification, and sentiment prediction. It is based on Bayes' theorem, named after the Reverend Thomas Bayes.

The name **Naive** is used because the model assumes that the features are independent of each other given the class. That is, changing the value of one feature does not directly influence or change the value of any other feature used in the algorithm.

Why is it popular? It is easy to code and easy to use for prediction.


# Conditional Probability:
Conditional probability is the probability of a certain event (A) given that another event (B) has occurred.

The conditional probability of A given B is defined as the quotient of the joint probability of events A and B and the probability of B:

$$P(A | B)=\frac{P(A \cap B)}{P(B)}$$

where  $P(A \cap B)$  is the probability that both events A and B occur.
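
As a quick sanity check of the definition, here is a minimal sketch that counts the outcomes of one fair six-sided die. The events chosen (A: the roll is even, B: the roll is greater than 3) are illustrative assumptions, not part of the example used later in this notebook.

```python
from fractions import Fraction

# Sample space of one fair six-sided die (illustrative assumption).
outcomes = {1, 2, 3, 4, 5, 6}
A = {o for o in outcomes if o % 2 == 0}   # event A: the roll is even
B = {o for o in outcomes if o > 3}        # event B: the roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

p_a_and_b = prob(A & B)          # P(A ∩ B): outcomes {4, 6}
p_b = prob(B)                    # P(B): outcomes {4, 5, 6}
p_a_given_b = p_a_and_b / p_b    # P(A | B) = P(A ∩ B) / P(B)

print(p_a_given_b)  # 2/3
```

Knowing that the roll is greater than 3 raises the probability of an even roll from 1/2 to 2/3.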

# Bayes Rule:
Bayes' rule is a way of going from P(B|A), which can be estimated from the training dataset, to P(A|B):

$$P(A | B)=\frac{P(B | A) P(A)}{P(B)}$$
This is called Bayes' rule. In words:
$$
\text{posterior} =\frac{\text { likelihood } \times \text { prior }}{\text { evidence }}
$$
where $\frac{P(B | A)}{P(B)}$ is the likelihood ratio.

**Posterior probability P(A|B):** The probability of the hypothesis after seeing the evidence. It equals the prior probability times the likelihood ratio.

**Prior probability P(A):** The probability of the hypothesis before seeing the evidence.

**Likelihood P(B|A):** The conditional probability of observing the evidence B given that the hypothesis A (for example, a particular class) is true. Note the direction: P(A|B), by contrast, is the probability of event A occurring given that B is true.

**Marginal likelihood P(B):** The total probability of observing the evidence, also called the **evidence**, the **normalizing constant** (partition function), or the **total probability**.

# Bayes Theorem:
It is a way of finding a probability when we know certain other probabilities.

It tells us how often A happens given that B happens, P(A|B),

when we know how often B happens given that A happens, P(B|A), and how likely A and B are on their own, P(A) and P(B).

So the formula tells us the **forward** probability P(A|B) when we know the **backward** probability P(B|A).

*Bayes theorem is derived from conditional probabilities*

*Bayes theorem is an extension of conditional probability and it helps to use one conditional probability to calculate another one.*

Bayes' theorem is how to flip conditional probability. If you know P(X|Y) (the probability of X given Y), Bayes' theorem tells you how to calculate P(Y|X).
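
To make the "flip" concrete, here is a minimal sketch with made-up numbers for a spam filter. Every probability below is a hypothetical value chosen for illustration; A stands for "the message is spam" and B for "the message contains a certain word".

```python
from fractions import Fraction

# Hypothetical numbers for a spam filter (assumptions for illustration only).
p_spam = Fraction(1, 5)              # prior P(A): 20% of mail is spam
p_word_given_spam = Fraction(1, 2)   # likelihood P(B | A)
p_word_given_ham = Fraction(1, 20)   # P(B | not A)

# Evidence P(B) via the law of total probability.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Bayes' rule: P(A | B) = P(B | A) * P(A) / P(B)
p_spam_given_word = p_word_given_spam * p_spam / p_word

print(p_word)             # 7/50
print(p_spam_given_word)  # 5/7
```

Seeing the word moves the probability of spam from the prior of 1/5 to a posterior of 5/7.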


When the features are conditionally independent given the class, we can extend Bayes' rule to what is called **Naive Bayes**:

$$
P\left(A | B_{1},B_{2}, \ldots, B_{n}\right)=\frac{P\left(B_{1} | A\right) P\left(B_{2} | A\right) \ldots P\left(B_{n} | A\right) P(A)}{P\left(B_{1}, B_{2}, \ldots, B_{n}\right)}
$$

For a given input, the denominator is the same for every class A. Therefore, the denominator can be dropped and a proportionality introduced:

$$
P\left(A | B_{1},B_{2}, \ldots, B_{n}\right) \propto P(A) \prod_{i=1}^{n} P\left(B_{i} | A\right)
$$
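
The proportionality can be sketched in a few lines: score each class by its prior times the product of per-feature likelihoods, then divide by the sum of the scores to recover posteriors. The function name `naive_bayes_posteriors`, the class labels, and all numbers below are hypothetical and only illustrate the formula.

```python
from math import prod

def naive_bayes_posteriors(priors, likelihoods):
    """Turn priors P(A) and per-feature likelihoods P(B_i | A) into posteriors."""
    # Unnormalized score for each class: P(A) * prod_i P(B_i | A)
    scores = {c: priors[c] * prod(likelihoods[c]) for c in priors}
    # The sum of the scores plays the role of the dropped denominator.
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

# Hypothetical priors and likelihoods for two features.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {"spam": [0.5, 0.5], "ham": [0.25, 0.25]}

posteriors = naive_bayes_posteriors(priors, likelihoods)
predicted = max(posteriors, key=posteriors.get)
print(predicted)  # spam
```

Here "ham" has the larger prior, but the features are more likely under "spam", so the posterior favors "spam" (8/11 versus 3/11).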


## Types of Naive Bayes Classifier:
### Multinomial Naive Bayes:
This is mostly used for document classification problems, i.e. deciding whether a document belongs to the category of sports, politics, technology, etc. The features/predictors used by the classifier are the frequencies of the words present in the document.

### Bernoulli Naive Bayes:
This is similar to multinomial Naive Bayes, but the predictors are boolean variables. The features that we use to predict the class variable take only the values yes or no, for example whether a word occurs in the text or not.
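
The difference between the two feature types can be sketched with a toy document. The vocabulary and the document below are hypothetical and exist only to show word counts versus word presence.

```python
from collections import Counter

# Hypothetical vocabulary and document, for illustration only.
vocabulary = ["goal", "match", "vote", "election"]
document = "goal goal match vote".split()

counts = Counter(document)

# Multinomial features: how many times each vocabulary word occurs.
multinomial_features = [counts[w] for w in vocabulary]

# Bernoulli features: whether each vocabulary word occurs at all.
bernoulli_features = [int(counts[w] > 0) for w in vocabulary]

print(multinomial_features)  # [2, 1, 1, 0]
print(bernoulli_features)    # [1, 1, 1, 0]
```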

### Gaussian Naive Bayes:
When the predictors take continuous values rather than discrete ones, we assume that these values are sampled from a Gaussian distribution.

$$
P\left(x_{i} | y\right)=\frac{1}{\sqrt{2 \pi \sigma_{y}^{2}}} \exp \left(-\frac{\left(x_{i}-\mu_{y}\right)^{2}}{2 \sigma_{y}^{2}}\right)
$$


# Let's write some code

In [1]:
import pandas as pd
import numpy as np


In [20]:
training_data = pd.DataFrame()

# Create our feature variables
training_data['Height'] = [6,5.92,5.58,5.92,5,5.5,5.42,5.75]
training_data['Weight'] = [180,190,170,165,100,150,130,150]
training_data['Foot_Size'] = [12,11,12,10,6,8,7,9]
# Create our target variable
training_data['Gender'] = ['male','male','male','male','female','female','female','female']
training_data

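| | Height | Weight | Foot_Size | Gender |
|---|---|---|---|---|
| 0 | 6.0 | 180 | 12 | male |
| 1 | 5.92 | 190 | 11 | male |
| 2 | 5.58 | 170 | 12 | male |
| 3 | 5.92 | 165 | 10 | male |
| 4 | 5.0 | 100 | 6 | female |
| 5 | 5.5 | 150 | 8 | female |
| 6 | 5.42 | 130 | 7 | female |
| 7 | 5.75 | 150 | 9 | female |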


In [22]:
testing_data = pd.DataFrame()
testing_data['Height'] = [6]
testing_data['Weight'] = [130]
testing_data['Foot_Size'] = [8] 
testing_data

| | Height | Weight | Foot_Size |
|---|---|---|---|
| 0 | 6 | 130 | 8 |


#### Now our task is to predict whether this person is male or female

$$
\begin{array}{c}
P(\text{Male} | \text{Height} \cap \text{Weight} \cap \text{FootSize})=\frac{P(\text{Height} \cap \text{Weight} \cap \text{FootSize} | \text{Male}) \, P(\text{Male})}{P(\text{Height} \cap \text{Weight} \cap \text{FootSize})} \\
P(\text{Female} | \text{Height} \cap \text{Weight} \cap \text{FootSize})=\frac{P(\text{Height} \cap \text{Weight} \cap \text{FootSize} | \text{Female}) \, P(\text{Female})}{P(\text{Height} \cap \text{Weight} \cap \text{FootSize})}
\end{array}
$$

#### Using Gaussian Naive Bayes

$$
\begin{array}{c}{\text { posterior (male } )=\frac{P(\text { male }) p(\text { height } | \text { male }) p(\text { weight } | \text { male }) p(\text { foot size } | \text { male })}{\text { marginal probability }}} \\ {\text { posterior (female) }=\frac{P(\text { female }) p(\text { height } | \text { female }) p(\text { weight } | \text { female }) p(\text { foot size } | \text { female })}{\text { marginal probability }}}\end{array}
$$

In [183]:
n_male = training_data['Gender'][training_data['Gender'] == 'male'].count()
n_female = training_data['Gender'][training_data['Gender'] == 'female'].count()
total_ppl = training_data['Gender'].count()

# group data by classes
data_means = training_data.groupby('Gender').mean()
data_means

| Gender | Height | Weight | Foot_Size |
|---|---|---|---|
| female | 5.4175 | 132.5 | 7.5 |
| male | 5.855 | 176.25 | 11.25 |


In [184]:
# Group the data by gender and calculate the variance of each feature
data_variance = training_data.groupby('Gender').var()

data_variance

| Gender | Height | Weight | Foot_Size |
|---|---|---|---|
| female | 0.097225 | 558.333333 | 1.666667 |
| male | 0.035033 | 122.916667 | 0.916667 |
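
The values in this table are sample variances: pandas `var()` divides by n - 1 by default (`ddof=1`). As a small check using only the standard library, the sketch below recomputes the male height variance both ways; other Naive Bayes implementations may divide by n instead, which gives a different number.

```python
import statistics

# Male heights from the training data above.
male_heights = [6, 5.92, 5.58, 5.92]

sample_var = statistics.variance(male_heights)       # divides by n - 1
population_var = statistics.pvariance(male_heights)  # divides by n

print(round(sample_var, 6))      # 0.035033
print(round(population_var, 6))  # 0.026275
```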


In [185]:
# Means for male (rows are indexed by gender, columns by feature)
male_height_mean = data_means.loc['male', 'Height']
male_weight_mean = data_means.loc['male', 'Weight']
male_footsize_mean = data_means.loc['male', 'Foot_Size']

# Variance for male
male_height_variance = data_variance.loc['male', 'Height']
male_weight_variance = data_variance.loc['male', 'Weight']
male_footsize_variance = data_variance.loc['male', 'Foot_Size']

# Means for female
female_height_mean = data_means.loc['female', 'Height']
female_weight_mean = data_means.loc['female', 'Weight']
female_footsize_mean = data_means.loc['female', 'Foot_Size']

# Variance for female
female_height_variance = data_variance.loc['female', 'Height']
female_weight_variance = data_variance.loc['female', 'Weight']
female_footsize_variance = data_variance.loc['female', 'Foot_Size']

male_height_variance

0.0350333333333333

$$
P\left(x_{i} | y\right)=\frac{1}{\sqrt{2 \pi \sigma_{y}^{2}}} \exp \left(-\frac{\left(x_{i}-\mu_{y}\right)^{2}}{2 \sigma_{y}^{2}}\right)
$$

#### Now we calculate the probability using this function

In [187]:
def p_x_given_y(x, mean_y, variance_y):
    """Gaussian probability density of x for a class with the given mean and variance."""

    # Normalization constant times the exponential term of the Gaussian density
    p = 1/(np.sqrt(2*np.pi*variance_y)) * np.exp((-( x - mean_y)**2) / (2 * variance_y))

    return p

# Prior P(male)
P_male = n_male/total_ppl

# Multiply the prior by the likelihood of each feature of the test person
P_male = P_male * \
p_x_given_y(testing_data['Height'][0], male_height_mean, male_height_variance) * \
p_x_given_y(testing_data['Weight'][0], male_weight_mean, male_weight_variance) * \
p_x_given_y(testing_data['Foot_Size'][0], male_footsize_mean, male_footsize_variance)

P_male

6.197071843878078e-09

In [246]:
# Prior P(female)
P_female = n_female/total_ppl

# Multiply the prior by the likelihood of each feature of the test person
P_female = P_female * \
p_x_given_y(testing_data['Height'][0], female_height_mean, female_height_variance) * \
p_x_given_y(testing_data['Weight'][0], female_weight_mean, female_weight_variance) * \
p_x_given_y(testing_data['Foot_Size'][0], female_footsize_mean, female_footsize_variance)
P_female

0.0005377909183630018

In [199]:
if (P_female > P_male):
    print('Person is Female')
else:
    print('Person is male')

Person is Female
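
The two numbers compared above are unnormalized scores (prior times likelihoods), and products of many small densities can underflow when there are more features. A common remedy, sketched here as an optional variant rather than as part of the original walkthrough, is to add log densities and normalize at the end. The helper `log_gaussian` and the `params` dictionary are hypothetical names; the means, variances, and priors are copied (rounded) from the outputs above.

```python
import math

def log_gaussian(x, mean, variance):
    """Log of the Gaussian density computed by p_x_given_y."""
    return -0.5 * math.log(2 * math.pi * variance) - (x - mean) ** 2 / (2 * variance)

# Per-class (mean, variance) for Height, Weight, Foot_Size, copied from the tables above.
params = {
    "male": [(5.855, 0.035033), (176.25, 122.916667), (11.25, 0.916667)],
    "female": [(5.4175, 0.097225), (132.5, 558.333333), (7.5, 1.666667)],
}
log_prior = {"male": math.log(4 / 8), "female": math.log(4 / 8)}
person = [6, 130, 8]  # Height, Weight, Foot_Size of the test person

# Log of (prior * product of likelihoods) for each class.
log_scores = {
    c: log_prior[c] + sum(log_gaussian(x, m, v) for x, (m, v) in zip(person, params[c]))
    for c in params
}

# Normalize with the log-sum-exp trick so the posteriors sum to 1.
max_log = max(log_scores.values())
evidence = sum(math.exp(s - max_log) for s in log_scores.values())
posteriors = {c: math.exp(s - max_log) / evidence for c, s in log_scores.items()}

prediction = max(posteriors, key=posteriors.get)
print(prediction)  # female
```

Exponentiating `log_scores` recovers the same scores as `P_male` and `P_female`, up to rounding of the copied variances, and the normalized posterior for "female" is greater than 0.999.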
