# Naive Bayes Classifier

The Naive Bayes classifier is a supervised machine learning algorithm based on Bayes' theorem and used for solving classification problems. It is a probabilistic classifier: given an input, it estimates the probability of each class and predicts the most probable one.

It is called Naive because it assumes that each feature occurs independently of the other features, given the class.

It is called Bayes because it relies on Bayes' theorem.
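
Written out for a class $y$ and features $x_1, \dots, x_n$ (symbols introduced here for illustration only), the naive assumption says that the joint likelihood of the features factors into one term per feature:

$${\displaystyle P(x_1,\dots ,x_n\mid y)=\prod _{i=1}^{n}P(x_i\mid y)}$$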

# Bayes Theorem

Bayes' theorem, named after Thomas Bayes, describes the probability of an event, based on prior knowledge of conditions that might be related to the event. It works on conditional probability. 

Conditional probability is the probability that something will happen, given that something else has already occurred. 

Bayes Theorem is stated mathematically as:

$${\displaystyle P(A\mid B)={\frac {P(B\mid A)\,P(A)}{P(B)}}}$$
where $A$ and $B$ are events and $P(B)\neq{0}$.

$P(A\mid B)$ is the posterior probability: the probability of event $A$ occurring given that $B$ is true.

$P(B\mid A)$ is the likelihood: the probability of event $B$ occurring given that $A$ is true.

$P(A)$ is the prior probability: the probability of $A$ before any evidence is taken into account.

$P(B)$ is the marginal probability: the probability of $B$ regardless of whether $A$ occurs.
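
As a quick illustration of the formula, here is a minimal sketch of Bayes' theorem as a function. The function name `posterior` and the sample probabilities are assumptions made up for this example.

```python
def posterior(likelihood: float, prior: float, marginal: float) -> float:
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if marginal == 0:
        raise ValueError("P(B) must be non-zero")
    return likelihood * prior / marginal


# Hypothetical inputs: P(B|A) = 0.9, P(A) = 0.2, P(B) = 0.3
p = posterior(likelihood=0.9, prior=0.2, marginal=0.3)
print(round(p, 2))  # 0.6
```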

## An Example

Suppose we have a dataset of **Weather Conditions** with a corresponding target variable **Play**. Using this dataset, we need to decide whether or not to play on a particular day according to the weather conditions.

Consider the following table and, using the Naive Bayes algorithm, predict whether a player should play if the weather is Sunny.

| Sl. No. | Weather | Play |
| :- | :-: | -: |
| 0 | Rainy	| No |
| 1	| Rainy	| No |
| 2	| Overcast | Yes |
| 3	| Sunny	| Yes |
| 4	| Sunny	| Yes |
| 5	| Sunny | No |
| 6	| Overcast | Yes |
| 7	| Rainy	| No |
| 8	| Rainy	| Yes |
| 9	| Sunny | Yes |
| 10 | Rainy | Yes |
| 11 | Overcast	| Yes |
| 12 | Overcast	| Yes |
| 13 | Sunny | No |

Below is the Frequency Table for weather conditions.

| Weather | Yes | No |
| :-: | :-: | :-: |
| Overcast | 4 | 0 |
| Rainy | 2 | 3 |
| Sunny | 3 | 2 |
| | | |
| **Total** | 9 | 5 |
| **Probability** | P(Yes) = 9/14 = 0.64 | P(No) = 5/14 = 0.36 |
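
The frequency table can be rebuilt directly from the raw rows. The sketch below uses only the standard library; the variable names are chosen here for illustration, and the data is copied from the dataset table.

```python
from collections import Counter

# (weather, play) pairs copied from the dataset table, rows 0 to 13
data = [
    ("Rainy", "No"), ("Rainy", "No"), ("Overcast", "Yes"), ("Sunny", "Yes"),
    ("Sunny", "Yes"), ("Sunny", "No"), ("Overcast", "Yes"), ("Rainy", "No"),
    ("Rainy", "Yes"), ("Sunny", "Yes"), ("Rainy", "Yes"), ("Overcast", "Yes"),
    ("Overcast", "Yes"), ("Sunny", "No"),
]

pair_counts = Counter(data)                        # counts per (weather, play)
class_counts = Counter(play for _, play in data)   # counts per class

for weather in ("Overcast", "Rainy", "Sunny"):
    print(weather, pair_counts[(weather, "Yes")], pair_counts[(weather, "No")])
print("Total", class_counts["Yes"], class_counts["No"])
```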

Here is the Likelihood Table for weather conditions with all the calculated probabilities. 

| Weather | Yes | No | $P(Weather\mid Yes)$ | $P(Weather\mid No)$ | $P(Weather)$ |
| :-: | :-: | :-: | :-: | :-: | :-: |
| Overcast | 4 | 0 | 4/9 = 0.44 | 0/5 = 0 | P(Overcast) = 4/14 = 0.29 |
| Rainy | 2 | 3 | 2/9 = 0.22 | 3/5 = 0.6 | P(Rainy) = 5/14 = 0.36 |
| Sunny | 3 | 2 | 3/9 = 0.33 | 2/5 = 0.4 | P(Sunny) = 5/14 = 0.36 |

Now, applying Bayes Theorem and calculating the posterior probabilities, we get the following:

The probability of playing on a Sunny day is:
$${P(Yes\mid Sunny) = {\frac {P(Sunny\mid Yes)}{P(Sunny)}}\cdot P(Yes) = {\frac {3/9}{5/14}}\times {\frac {9}{14}} = {\frac {3}{5}} = 0.60}$$

The probability of not playing on a Sunny day is:
$${P(No\mid Sunny) = {\frac {P(Sunny\mid No)}{P(Sunny)}}\cdot P(No) = {\frac {2/5}{5/14}}\times {\frac {5}{14}} = {\frac {2}{5}} = 0.40}$$

Since $P(Yes\mid Sunny) > P(No\mid Sunny)$, the prediction for whether the player would play on a Sunny day is **Yes**.
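
The same calculation can be reproduced with exact fractions, which avoids any drift from rounding intermediate values to two decimals. This is a minimal sketch: the variable names are chosen here for illustration, and the counts are those of the frequency table.

```python
from fractions import Fraction

# Counts taken from the frequency table
n_total = 14
n_yes, n_no = 9, 5
sunny_yes, sunny_no = 3, 2

p_yes = Fraction(n_yes, n_total)                    # prior P(Yes)
p_no = Fraction(n_no, n_total)                      # prior P(No)
p_sunny = Fraction(sunny_yes + sunny_no, n_total)   # marginal P(Sunny)
p_sunny_given_yes = Fraction(sunny_yes, n_yes)      # likelihood P(Sunny | Yes)
p_sunny_given_no = Fraction(sunny_no, n_no)         # likelihood P(Sunny | No)

# Bayes' theorem for each class
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
p_no_given_sunny = p_sunny_given_no * p_no / p_sunny

prediction = "Yes" if p_yes_given_sunny > p_no_given_sunny else "No"
print(p_yes_given_sunny, p_no_given_sunny, prediction)  # 3/5 2/5 Yes
```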

# Types of Naive Bayes Classifier

There are three types of Naive Bayes Model:

**Gaussian:** The Gaussian classifier assumes that the features follow a normal distribution.

**Multinomial:** The Multinomial classifier is used when the data is multinomially distributed, i.e. for discrete counts.

**Bernoulli:** The Bernoulli classifier is useful if the feature vectors are binary, i.e. zeros and ones.
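
To make the Gaussian case concrete, here is a minimal sketch of the per-feature density that a Gaussian model evaluates for a given class. The function name `gaussian_likelihood` and the class statistics are hypothetical, chosen only for illustration.

```python
import math


def gaussian_likelihood(x: float, mean: float, var: float) -> float:
    """Density of a normal distribution N(mean, var) at x, used as P(x | class)."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


# Hypothetical statistics of one feature within one class: mean 30, variance 25
likelihood_at_mean = gaussian_likelihood(30.0, mean=30.0, var=25.0)
likelihood_far = gaussian_likelihood(50.0, mean=30.0, var=25.0)

# A value near the class mean is far more likely than one four standard deviations away
print(likelihood_at_mean > likelihood_far)  # True
```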

## Gaussian Naive Bayes example

In [1]:
import numpy as np
import pandas as pd

In [2]:
# dataset of social network users and whether each one purchased a particular product
soc = pd.read_csv('Social_Network_Ads.csv')
soc

Unnamed: 0,User ID,Gender,Age,EstimatedSalary,Purchased
0,15624510,Male,19,19000,0
1,15810944,Male,35,20000,0
2,15668575,Female,26,43000,0
3,15603246,Female,27,57000,0
4,15804002,Male,19,76000,0
...,...,...,...,...,...
395,15691863,Female,46,41000,1
396,15706071,Male,51,23000,1
397,15654296,Female,50,20000,1
398,15755018,Male,36,33000,0


In [3]:
# converting categorical column into indicator variables 
gen = pd.get_dummies(soc.Gender, drop_first=True)
soc1 = pd.concat([soc.Age, gen, soc['EstimatedSalary'], soc.Purchased], axis=1)
soc1

Unnamed: 0,Age,Male,EstimatedSalary,Purchased
0,19,1,19000,0
1,35,1,20000,0
2,26,0,43000,0
3,27,0,57000,0
4,19,1,76000,0
...,...,...,...,...
395,46,0,41000,1
396,51,1,23000,1
397,50,0,20000,1
398,36,1,33000,0


In [4]:
# extracting feature variables
X = soc1.iloc[:,:-1].values
X

array([[   19,     1, 19000],
       [   35,     1, 20000],
       [   26,     0, 43000],
       ...,
       [   50,     0, 20000],
       [   36,     1, 33000],
       [   49,     0, 36000]], dtype=int64)

In [5]:
# extracting the target variable (last column)
Y = soc1.iloc[:, -1].values
Y

array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1,
       0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0,
       1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0,
       1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1,
       0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1,

In [6]:
# splitting the dataset into train data and test data
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.25, random_state=2)

In [7]:
# standardizing the features by removing the mean and scaling to unit variance
from sklearn.preprocessing import StandardScaler

scale = StandardScaler()
x_train = scale.fit_transform(x_train)
x_test = scale.transform(x_test)
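
The standardization step can be sketched with plain NumPy to show what happens to each column. The tiny feature matrix below is made up for illustration. The key point is that the test data is scaled with the mean and standard deviation computed on the training data.

```python
import numpy as np

# Made-up features: columns are age and salary
train_demo = np.array([[20.0, 20000.0], [30.0, 40000.0], [40.0, 60000.0]])
test_demo = np.array([[30.0, 50000.0]])

mean = train_demo.mean(axis=0)
std = train_demo.std(axis=0)  # population standard deviation (ddof=0)

train_scaled = (train_demo - mean) / std
test_scaled = (test_demo - mean) / std  # reuse the training statistics

print(train_scaled.mean(axis=0))  # approximately 0 for every column
print(train_scaled.std(axis=0))   # 1 for every column
print(test_scaled)
```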

In [8]:
# training the Gaussian Naive Bayes model
from sklearn.naive_bayes import GaussianNB

clf = GaussianNB()
clf.fit(x_train, y_train)

GaussianNB()

In [9]:
# predicting the output of test data through our model
y_pred = clf.predict(x_test)
y_pred

array([0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0,
       0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1,
       0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0,
       1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0], dtype=int64)

In [10]:
# finding the accuracy of our model
from sklearn import metrics

metrics.accuracy_score(y_test, y_pred)

0.86

We have an accuracy of 86%.

In [11]:
# tabular summary of the number of correct and incorrect predictions of our model
metrics.confusion_matrix(y_test, y_pred)

array([[57,  5],
       [ 9, 29]], dtype=int64)

In [12]:
# performance evaluation metrics report
print(metrics.classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.86      0.92      0.89        62
           1       0.85      0.76      0.81        38

    accuracy                           0.86       100
   macro avg       0.86      0.84      0.85       100
weighted avg       0.86      0.86      0.86       100
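
The accuracy and the class 1 scores in the report can be derived by hand from the confusion matrix. The sketch below copies the matrix from the output above and assumes the scikit-learn layout, where rows are actual classes and columns are predicted classes.

```python
import numpy as np

# Confusion matrix copied from the output above
cm = np.array([[57, 5],
               [9, 29]])
tn, fp, fn, tp = cm.ravel()

accuracy = float((tp + tn) / cm.sum())
precision_1 = float(tp / (tp + fp))  # of all predicted purchases, how many were real
recall_1 = float(tp / (tp + fn))     # of all real purchases, how many were found

print(round(accuracy, 2), round(precision_1, 2), round(recall_1, 2))  # 0.86 0.85 0.76
```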



In [13]:
# predicting probabilities of both target classes for the test data
clf.predict_proba(x_test)

array([[0.9372846 , 0.0627154 ],
       [0.98552985, 0.01447015],
       [0.89713695, 0.10286305],
       [0.96366878, 0.03633122],
       [0.15484576, 0.84515424],
       [0.39774544, 0.60225456],
       [0.97909431, 0.02090569],
       [0.97650321, 0.02349679],
       [0.77411121, 0.22588879],
       [0.15017162, 0.84982838],
       [0.98540917, 0.01459083],
       [0.98824089, 0.01175911],
       [0.47228175, 0.52771825],
       [0.48814957, 0.51185043],
       [0.92445587, 0.07554413],
       [0.94976377, 0.05023623],
       [0.9946957 , 0.0053043 ],
       [0.05739065, 0.94260935],
       [0.17766136, 0.82233864],
       [0.9819141 , 0.0180859 ],
       [0.79649932, 0.20350068],
       [0.56736486, 0.43263514],
       [0.95926616, 0.04073384],
       [0.9015874 , 0.0984126 ],
       [0.97877096, 0.02122904],
       [0.58919161, 0.41080839],
       [0.519942  , 0.480058  ],
       [0.82216098, 0.17783902],
       [0.00848171, 0.99151829],
       [0.81161804, 0.18838196],
       [0.
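
Each row of `predict_proba` holds one probability per class, and `predict` returns the class with the largest one. The sketch below checks this on a small synthetic dataset; the data and the `demo_` names are made up for illustration.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Small synthetic two-class dataset with a single feature
X_demo = np.array([[1.0], [1.2], [0.8], [3.0], [3.2], [2.8]])
y_demo = np.array([0, 0, 0, 1, 1, 1])

demo_clf = GaussianNB().fit(X_demo, y_demo)
proba = demo_clf.predict_proba(X_demo)

# Pick the most probable class for each row
labels_from_proba = demo_clf.classes_[np.argmax(proba, axis=1)]

print(np.allclose(proba.sum(axis=1), 1.0))                            # rows sum to 1
print(np.array_equal(labels_from_proba, demo_clf.predict(X_demo)))    # same labels
```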

#### Making Predictions with New Data

In [14]:
clf.predict(scale.transform([[19,1,10000]]))

array([0], dtype=int64)

Age: 19, Gender: Male, Salary: 10000 -> Product Purchased: **No**

In [15]:
clf.predict(scale.transform([[50,0,11500]]))

array([1], dtype=int64)

Age: 50, Gender: Female, Salary: 11500 -> Product Purchased: **Yes**

## Multinomial Naive Bayes example

In [16]:
# collection of newsgroup documents from 20 different newsgroups.
from sklearn.datasets import fetch_20newsgroups

text = fetch_20newsgroups()
text.target_names

['alt.atheism',
 'comp.graphics',
 'comp.os.ms-windows.misc',
 'comp.sys.ibm.pc.hardware',
 'comp.sys.mac.hardware',
 'comp.windows.x',
 'misc.forsale',
 'rec.autos',
 'rec.motorcycles',
 'rec.sport.baseball',
 'rec.sport.hockey',
 'sci.crypt',
 'sci.electronics',
 'sci.med',
 'sci.space',
 'soc.religion.christian',
 'talk.politics.guns',
 'talk.politics.mideast',
 'talk.politics.misc',
 'talk.religion.misc']

In [17]:
# extracting the train data and test data
train = fetch_20newsgroups(subset='train')
test = fetch_20newsgroups(subset='test')

In [18]:
print(train.data[10])

From: irwin@cmptrc.lonestar.org (Irwin Arnstein)
Subject: Re: Recommendation on Duc
Summary: What's it worth?
Distribution: usa
Expires: Sat, 1 May 1993 05:00:00 GMT
Organization: CompuTrac Inc., Richardson TX
Keywords: Ducati, GTS, How much? 
Lines: 13

I have a line on a Ducati 900GTS 1978 model with 17k on the clock.  Runs
very well, paint is the bronze/brown/orange faded out, leaks a bit of oil
and pops out of 1st with hard accel.  The shop will fix trans and oil 
leak.  They sold the bike to the 1 and only owner.  They want $3495, and
I am thinking more like $3K.  Any opinions out there?  Please email me.
Thanks.  It would be a nice stable mate to the Beemer.  Then I'll get
a jap bike and call myself Axis Motors!

-- 
-----------------------------------------------------------------------
"Tuba" (Irwin)      "I honk therefore I am"     CompuTrac-Richardson,Tx
irwin@cmptrc.lonestar.org    DoD #0826          (R75/6)
-------------------------------------------------------------------

In [19]:
train.target

array([7, 4, 4, ..., 3, 1, 8])

In [20]:
# transforming the text into a meaningful representation of numbers and fitting into the Multinomial Naive Bayes model
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train.data, train.target)

Pipeline(steps=[('tfidfvectorizer', TfidfVectorizer()),
                ('multinomialnb', MultinomialNB())])
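
The same kind of pipeline is easier to inspect on a toy corpus. The sketch below uses four made-up sentences with hypothetical labels; it is only meant to show how raw strings go in and class labels come out.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up training sentences and labels
demo_docs = [
    "car engine repair",
    "new car tyres and engine oil",
    "rocket launch into orbit",
    "space station orbit rocket",
]
demo_labels = ["autos", "autos", "space", "space"]

# The vectorizer turns text into TF-IDF weights, which the classifier consumes
demo_model = make_pipeline(TfidfVectorizer(), MultinomialNB())
demo_model.fit(demo_docs, demo_labels)

demo_pred = demo_model.predict(["engine oil change", "rocket orbit"])
print(list(demo_pred))
```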

In [21]:
# predicting the test data output through our model
pred = model.predict(test.data)
pred

array([ 7, 11,  0, ...,  9,  3, 15])

In [22]:
metrics.accuracy_score(test.target, pred)

0.7738980350504514

We have an accuracy score of 77.4%.

In [23]:
metrics.confusion_matrix(test.target, pred)

array([[166,   0,   0,   1,   0,   1,   0,   0,   1,   1,   1,   3,   0,
          6,   3, 123,   4,   8,   0,   1],
       [  1, 252,  15,  12,   9,  18,   1,   2,   1,   5,   2,  41,   4,
          0,   6,  15,   4,   1,   0,   0],
       [  0,  14, 258,  45,   3,   9,   0,   2,   1,   3,   2,  25,   1,
          0,   6,  23,   2,   0,   0,   0],
       [  0,   5,  11, 305,  17,   1,   3,   6,   1,   0,   2,  19,  13,
          0,   5,   3,   1,   0,   0,   0],
       [  0,   3,   8,  23, 298,   0,   3,   8,   1,   3,   1,  16,   8,
          0,   2,   8,   3,   0,   0,   0],
       [  1,  21,  17,  13,   2, 298,   1,   0,   1,   1,   0,  23,   0,
          1,   4,  10,   2,   0,   0,   0],
       [  0,   1,   3,  31,  12,   1, 271,  19,   4,   4,   6,   5,  12,
          6,   3,   9,   3,   0,   0,   0],
       [  0,   1,   0,   3,   0,   0,   4, 364,   3,   2,   2,   4,   1,
          1,   3,   3,   4,   0,   1,   0],
       [  0,   0,   0,   1,   0,   0,   2,  10, 371,   0,   0,  

In [24]:
print(metrics.classification_report(test.target, pred))

              precision    recall  f1-score   support

           0       0.80      0.52      0.63       319
           1       0.81      0.65      0.72       389
           2       0.82      0.65      0.73       394
           3       0.67      0.78      0.72       392
           4       0.86      0.77      0.81       385
           5       0.89      0.75      0.82       395
           6       0.93      0.69      0.80       390
           7       0.85      0.92      0.88       396
           8       0.94      0.93      0.93       398
           9       0.92      0.90      0.91       397
          10       0.89      0.97      0.93       399
          11       0.59      0.97      0.74       396
          12       0.84      0.60      0.70       393
          13       0.92      0.74      0.82       396
          14       0.84      0.89      0.87       394
          15       0.44      0.98      0.61       398
          16       0.64      0.94      0.76       364
          17       0.93    

#### Making Predictions with New Data

In [25]:
# user-defined function to make predictions through our model
def predict(string, train=train, model=model):
    y = model.predict([string])
    return train.target_names[y[0]]

In [26]:
predict('let us hire an auto to go')

'rec.autos'

In [27]:
predict('I do cycling everyday')

'rec.motorcycles'

In [28]:
predict('what is the screen size')

'comp.windows.x'

## Bernoulli Naive Bayes example

In [29]:
# collection of SMS messages tagged as spam or legitimate
data = pd.read_csv('spam.csv', encoding='latin-1')
data

Unnamed: 0,v1,v2,Unnamed: 2,Unnamed: 3,Unnamed: 4
0,ham,"Go until jurong point, crazy.. Available only ...",,,
1,ham,Ok lar... Joking wif u oni...,,,
2,spam,Free entry in 2 a wkly comp to win FA Cup fina...,,,
3,ham,U dun say so early hor... U c already then say...,,,
4,ham,"Nah I don't think he goes to usf, he lives aro...",,,
...,...,...,...,...,...
5567,spam,This is the 2nd time we have tried 2 contact u...,,,
5568,ham,Will Ì_ b going to esplanade fr home?,,,
5569,ham,"Pity, * was in mood for that. So...any other s...",,,
5570,ham,The guy did some bitching but I acted like i'd...,,,


In [30]:
# dropping unnecessary columns
data.drop(['Unnamed: 2','Unnamed: 3','Unnamed: 4'], axis=1, inplace=True)

In [31]:
# renaming columns
data.rename(columns = {"v1": "class", "v2":"message"}, inplace=True)
data

Unnamed: 0,class,message
0,ham,"Go until jurong point, crazy.. Available only ..."
1,ham,Ok lar... Joking wif u oni...
2,spam,Free entry in 2 a wkly comp to win FA Cup fina...
3,ham,U dun say so early hor... U c already then say...
4,ham,"Nah I don't think he goes to usf, he lives aro..."
...,...,...
5567,spam,This is the 2nd time we have tried 2 contact u...
5568,ham,Will Ì_ b going to esplanade fr home?
5569,ham,"Pity, * was in mood for that. So...any other s..."
5570,ham,The guy did some bitching but I acted like i'd...


In [32]:
# separating the messages (features) from the class labels (target)
x1 = np.array(data["message"])
y1 = np.array(data["class"])

In [33]:
# transforming the given text into vectors on the basis of the frequency of each word that occurs in the entire text
from sklearn.feature_extraction.text import CountVectorizer

cv = CountVectorizer()
x1 = cv.fit_transform(x1)
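
To see what the vectorizer produces, here is a minimal sketch on three made-up messages. With the default settings, text is lowercased and tokens shorter than two characters (such as "a") are dropped.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up messages for illustration
demo_messages = ["win a free prize", "free free entry", "see you at home"]

demo_cv = CountVectorizer()
demo_counts = demo_cv.fit_transform(demo_messages)  # sparse matrix: messages x words

print(sorted(demo_cv.vocabulary_))  # learned vocabulary, alphabetical
print(demo_counts.toarray())        # one row of word counts per message
```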

In [34]:
# splitting dataset for model training
xtrain, xtest, ytrain, ytest = train_test_split(x1, y1, test_size=0.3, random_state=42)

In [35]:
# fitting train data into Bernoulli Naive Bayes model
from sklearn.naive_bayes import BernoulliNB

mod = BernoulliNB(binarize=0.0)
mod.fit(xtrain, ytrain)

BernoulliNB()
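
The `binarize=0.0` setting means every count greater than zero is treated as 1, so the model only sees whether a word is present. The sketch below checks this on a made-up count matrix by comparing against a model fitted on manually binarized data.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Made-up word-count matrix: 4 messages x 3 vocabulary words
X_counts = np.array([[2, 0, 1], [3, 1, 0], [0, 0, 4], [0, 2, 1]])
y_labels = np.array(["spam", "spam", "ham", "ham"])

# binarize=0.0 thresholds the counts internally before fitting
nb_counts = BernoulliNB(binarize=0.0).fit(X_counts, y_labels)

# Equivalent: binarize by hand and switch off the internal thresholding
X_binary = (X_counts > 0).astype(int)
nb_binary = BernoulliNB(binarize=None).fit(X_binary, y_labels)

same = bool(np.allclose(nb_counts.feature_log_prob_, nb_binary.feature_log_prob_))
print(same)  # True
```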

In [36]:
# predicting test data output with our model
ypred = mod.predict(xtest)
ypred

array(['ham', 'ham', 'ham', ..., 'ham', 'ham', 'ham'], dtype='<U4')

In [37]:
metrics.accuracy_score(ytest, ypred)

0.9784688995215312

We have a whopping accuracy of 97.8%.

In [38]:
metrics.confusion_matrix(ytest, ypred)

array([[1451,    2],
       [  34,  185]], dtype=int64)

In [39]:
print(metrics.classification_report(ytest, ypred))

              precision    recall  f1-score   support

         ham       0.98      1.00      0.99      1453
        spam       0.99      0.84      0.91       219

    accuracy                           0.98      1672
   macro avg       0.98      0.92      0.95      1672
weighted avg       0.98      0.98      0.98      1672



### References

https://www.javatpoint.com/machine-learning-naive-bayes-classifier 

https://jakevdp.github.io/PythonDataScienceHandbook/05.05-naive-bayes.html