# Bernoulli Naive Bayes
- It is a variant of the Naive Bayes classifier that models each feature with a Bernoulli distribution
- It is generally used for discrete data whose features are in binary form

### Bernoulli distribution
Since we deal with binary values, let **z** be the probability of success, so that **1-z** is the probability of failure. Then, for the random variable **X**, the Bernoulli distribution is as follows:

$$ p(x) = P[X=x] = \begin{cases}1-z & x = 0\\
z & x = 1\end{cases}$$

Thus the **Bernoulli Naive Bayes** rule for a single binary feature becomes as follows:

$$P(x_i \mid y) = P(i \mid y)\,x_i + \bigl(1 - P(i \mid y)\bigr)(1 - x_i)$$
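
The formula above can be evaluated by hand for a small feature vector. The following is a minimal sketch, assuming made-up per-class feature probabilities (`p_given_y` is hypothetical and not taken from any dataset) and assuming the features are independent given the class, so the per-feature terms are multiplied:

```python
import math

def bernoulli_log_likelihood(x, p):
    """Sum over features of log[p_i * x_i + (1 - p_i) * (1 - x_i)] for binary x."""
    total = 0.0
    for x_i, p_i in zip(x, p):
        # Each factor is p_i when the feature is present and 1 - p_i when it is absent.
        total += math.log(p_i * x_i + (1.0 - p_i) * (1 - x_i))
    return total

# Hypothetical values of P(i | y) for three features under one class y.
p_given_y = [0.8, 0.1, 0.5]
x = [1, 0, 1]  # features 1 and 3 are present, feature 2 is absent

likelihood = math.exp(bernoulli_log_likelihood(x, p_given_y))
print(round(likelihood, 4))  # 0.8 * 0.9 * 0.5 = 0.36
```

Note that the absent feature still contributes a factor (0.9 here), which is what distinguishes the Bernoulli model from one that only looks at the features that occur.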


### Advantages of Bernoulli Naive Bayes:

- It is extremely fast compared to other classification models and can also be used for real-time prediction
- It handles irrelevant features well, and it explicitly penalizes the non-occurrence of features that are important for predicting the output y.
- With a small amount of data or short documents (for example, in text classification), Bernoulli Naive Bayes often gives more accurate results than other models.



### Disadvantages of Bernoulli Naive Bayes:

- Since it only records whether a feature is present or absent, it ignores how often the feature occurs, which can be a strong simplification depending on the data
- Like all Naive Bayes models, it assumes the features are independent, so dependent features can affect the predictions and accuracy of the model.
- If a feature never appears with a class in the training dataset, its estimated probability is zero, which is known as the zero-frequency problem. This problem can be solved by Laplace smoothing.
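
The zero-frequency problem and its fix can be illustrated with a small calculation. This is a sketch under assumptions: the helper `feature_probability` and the counts are hypothetical, and the estimator adds `alpha` to the count and `2 * alpha` to the denominator because a binary feature has two outcomes (present or absent):

```python
def feature_probability(n_docs_with_feature, n_docs_in_class, alpha=1.0):
    """Smoothed estimate of P(i | y) for a binary feature."""
    # alpha = 0 gives the raw relative frequency; alpha = 1 is Laplace smoothing.
    return (n_docs_with_feature + alpha) / (n_docs_in_class + 2.0 * alpha)

# A word that never occurs in the 50 training documents of a class.
unsmoothed = feature_probability(0, 50, alpha=0.0)
smoothed = feature_probability(0, 50, alpha=1.0)

print(unsmoothed)          # 0.0, which would zero out the whole product
print(round(smoothed, 4))  # 1 / 52 = 0.0192
```

With smoothing, an unseen word lowers the class score instead of eliminating the class outright.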

Theory alone is not much fun, so let us implement a Bernoulli Naive Bayes model.

# Example of a Model

In [30]:
# some imports for data 
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

# some imports for model
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.naive_bayes import BernoulliNB

# some imports for test
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

The dataset for the model was taken from Kaggle: [Fake and real news dataset](https://www.kaggle.com/clmentbisaillon/fake-and-real-news-dataset). Now, let us load the data using the *read_csv()* function.

In [31]:
true = pd.read_csv("True.csv")
fake = pd.read_csv("Fake.csv")
fake.head()

| | title | text | subject | date |
|---|---|---|---|---|
| 0 | Donald Trump Sends Out Embarrassing New Year’... | Donald Trump just couldn t wish all Americans ... | News | December 31, 2017 |
| 1 | Drunk Bragging Trump Staffer Started Russian ... | House Intelligence Committee Chairman Devin Nu... | News | December 31, 2017 |
| 2 | Sheriff David Clarke Becomes An Internet Joke... | On Friday, it was revealed that former Milwauk... | News | December 30, 2017 |
| 3 | Trump Is So Obsessed He Even Has Obama’s Name... | On Christmas day, Donald Trump announced that ... | News | December 29, 2017 |
| 4 | Pope Francis Just Called Out Donald Trump Dur... | Pope Francis used his annual Christmas Day mes... | News | December 25, 2017 |


Since the fake and true news are in separate data frames, we label each one with a **target** column and combine them into the **all_news** data frame using the *concat()* function. We also rebuild the index of the combined data frame using *reset_index()*.

In [37]:
fake['target'] = 'fake'
true['target'] = 'true'
all_news = pd.concat([fake, true]).reset_index(drop = True)
all_news.head()

| | title | text | subject | date | target |
|---|---|---|---|---|---|
| 0 | Donald Trump Sends Out Embarrassing New Year’... | Donald Trump just couldn t wish all Americans ... | News | December 31, 2017 | fake |
| 1 | Drunk Bragging Trump Staffer Started Russian ... | House Intelligence Committee Chairman Devin Nu... | News | December 31, 2017 | fake |
| 2 | Sheriff David Clarke Becomes An Internet Joke... | On Friday, it was revealed that former Milwauk... | News | December 30, 2017 | fake |
| 3 | Trump Is So Obsessed He Even Has Obama’s Name... | On Christmas day, Donald Trump announced that ... | News | December 29, 2017 | fake |
| 4 | Pope Francis Just Called Out Donald Trump Dur... | Pope Francis used his annual Christmas Day mes... | News | December 25, 2017 | fake |
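
To see why *reset_index()* is applied after *concat()*, here is a small sketch on two toy data frames (`fake_toy` and `true_toy` are made-up stand-ins for the real data). Without the reset, the combined frame keeps each source frame's own row labels, so labels repeat:

```python
import pandas as pd

# Toy stand-ins for the two news data frames.
fake_toy = pd.DataFrame({"text": ["a", "b"]})
true_toy = pd.DataFrame({"text": ["c", "d"]})
fake_toy["target"] = "fake"
true_toy["target"] = "true"

stacked = pd.concat([fake_toy, true_toy])
print(list(stacked.index))   # [0, 1, 0, 1] -> duplicate row labels

combined = stacked.reset_index(drop=True)
print(list(combined.index))  # [0, 1, 2, 3] -> one unique label per row
```

Passing `drop=True` discards the old labels instead of keeping them as an extra column.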


Now, we randomly split **all_news** into training and testing data in a 70:30 ratio.

In [38]:
x_train, x_test, y_train, y_test = train_test_split(all_news['text'], all_news.target, test_size=0.3, random_state=49)

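Since the data is in text format, we need to transform the text into numeric vectors before applying Bernoulli Naive Bayes. Note that **BernoulliNB()** binarizes its input by default (any value greater than 0 becomes 1), so the model effectively sees only whether each word is present in a document. To make our work easier, we will use **Pipeline()** to feed the input to the model.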

In [39]:
# creating the pipeline
pipe = Pipeline([('vect', CountVectorizer()),
                 ('tfidf', TfidfTransformer()),
                 ('model', BernoulliNB())])

# training the model
model = pipe.fit(x_train, y_train)
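
Because `BernoulliNB` thresholds its input at `binarize=0.0` by default, the TF-IDF weights in the pipeline above are reduced to presence indicators before the model sees them. The sketch below checks this on a tiny made-up corpus (the documents and labels are invented for illustration): a pipeline with `TfidfTransformer` and one that only uses `CountVectorizer(binary=True)` should give the same class probabilities.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import Pipeline

# Tiny made-up corpus, for illustration only.
docs = ["breaking shocking news", "official statement released",
        "shocking secret revealed", "government releases official report"]
labels = ["fake", "true", "fake", "true"]

# Same structure as the pipeline above.
with_tfidf = Pipeline([("vect", CountVectorizer()),
                       ("tfidf", TfidfTransformer()),
                       ("model", BernoulliNB())]).fit(docs, labels)

# Word-presence features only, no TF-IDF step.
presence_only = Pipeline([("vect", CountVectorizer(binary=True)),
                          ("model", BernoulliNB())]).fit(docs, labels)

new_docs = ["shocking official news", "secret report"]
same = np.allclose(with_tfidf.predict_proba(new_docs),
                   presence_only.predict_proba(new_docs))
print(same)
```

If the two agree, the TF-IDF step can be dropped without changing the predictions. It would only matter if `binarize` were set to a positive threshold or to `None`.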

Now, let us test the model on the testing data to check its accuracy.

In [40]:
prediction = model.predict(x_test)
print("accuracy: {}%".format(round(accuracy_score(y_test, prediction)*100,2)))

accuracy: 94.36%


Hurray! That is a good accuracy. However, let us look at how our model performed in detail using a confusion matrix.

In [35]:
print(confusion_matrix(y_test, prediction))

[[6444  531]
 [ 229 6266]]


Let us also generate the detailed report of the model.

In [36]:
print(classification_report(y_test, prediction))

              precision    recall  f1-score   support

        fake       0.97      0.92      0.94      6975
        true       0.92      0.96      0.94      6495

    accuracy                           0.94     13470
   macro avg       0.94      0.94      0.94     13470
weighted avg       0.94      0.94      0.94     13470
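
The numbers in the report can be reproduced from the confusion matrix printed earlier. The sketch below assumes the usual scikit-learn layout, with rows as true labels and columns as predicted labels, both in the order `fake`, `true`. The helper `precision_recall_f1` is written here for illustration.

```python
# Confusion matrix copied from the output above:
# rows are true labels, columns are predicted labels, order ["fake", "true"].
cm = [[6444, 531],
      [229, 6266]]

total = sum(sum(row) for row in cm)
accuracy = (cm[0][0] + cm[1][1]) / total

def precision_recall_f1(cm, k):
    """Precision, recall and F1 for class index k."""
    tp = cm[k][k]
    predicted_k = sum(row[k] for row in cm)  # column sum: everything predicted as k
    actual_k = sum(cm[k])                    # row sum: everything that truly is k
    precision = tp / predicted_k
    recall = tp / actual_k
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

for k, name in enumerate(["fake", "true"]):
    p, r, f = precision_recall_f1(cm, k)
    print(name, round(p, 2), round(r, 2), round(f, 2))
print("accuracy", round(accuracy * 100, 2))
```

The row sums (6975 and 6495) match the `support` column, and the off-diagonal entries show that 531 fake articles were labelled true while 229 true articles were labelled fake.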

