# Naive Bayes

### Naive Bayes Basic Concept
<br>
**John**: Hey, I think I have won the lottery!
<br>
**Ben**: Why do you only *think* you have won? Why aren't you sure?
<br>
**John**: Because the machine where I scanned my ticket to check the result is not perfectly accurate: it gives the correct result 99% of the time. That's why I am only 99% sure.
<br>
**Ben**: I don't think you can be 99% sure. Let's calculate. How many lottery tickets were sold, and how many prizes were there?
<br>
**John**: 100,000 tickets were sold and there were 100 prizes.
<br>
**Ben**: So before checking the result you had a 100/100,000 = 0.001 chance of winning the lottery. That's your prior probability.
<br>
At the end there will be 100 winners and 99,900 losers.
<br>
Assume everybody checks their result on the machine. How many people would get a positive result?
<br>
100 × 0.99 (not every actual winner gets a positive result) + 99,900 × 0.01 (some losers also get a positive result)
<br>
= 99 + 999 = 1,098
<br>
This means the machine would give a positive result to 1,098 people, but only 99 of them would be real winners.
<br>
Therefore, if you get a positive result, your chance of being a real winner is 99/1,098 ≈ 0.09, or about 9%.
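The counting argument above can be reproduced in a few lines of Python. This is a minimal sketch that assumes, as Ben does, that the machine is right 99% of the time for winners and losers alike; the variable names are illustrative, not from the original notebook.

```python
# A sketch of Ben's counting argument, assuming the machine is right 99% of the
# time for winners and losers alike (so it wrongly says "winner" 1% of the time).
tickets = 100_000
prizes = 100
accuracy = 0.99

winners = prizes                  # 100 real winners
losers = tickets - prizes         # 99,900 losers

true_positives = winners * accuracy          # winners the machine confirms (99)
false_positives = losers * (1 - accuracy)    # losers wrongly told they won (999)
all_positives = true_positives + false_positives   # everyone who sees "winner" (1,098)

# Among everyone who sees "winner", the share who really won
p_winner_given_positive = true_positives / all_positives
print(f"{true_positives:.0f} real winners among {all_positives:.0f} positive results")
print(f"P(winner given a positive result) = {p_winner_given_positive:.4f}")
```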
<br>
This is Bayes' theorem, the rule that Naive Bayes is built on. It is defined as:
<br>
*The formula image below is taken from Wikipedia.*
<img src="nb3.png">
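In case the image does not display, here is the expanded form of Bayes' theorem, assuming the image shows the standard version that the lottery calculation below follows. It is written with the conventional `|` for "given", which this notebook writes as `/`:

```latex
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid \lnot A)\,P(\lnot A)}
```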
<br>
**Explanation of the formula**
<br>
Let A = "you are a real winner" and B = "the machine shows a winning result".
<br>
P(A) = prior probability, the probability of A before observing B: the probability of being a lottery winner before checking your result = 0.001
<br>
P(notA) = 1 - P(A) = 0.999
<br>
P(B/A) = probability of B given A: the probability that the machine shows a winning result when you really are a winner = 0.99
<br>
P(B/notA) = probability of B given not A: the probability that the machine shows a winning result when you are not a winner = 0.01
<br>
P(A/B) = posterior probability: the probability of being a real winner given that the machine shows a winning result ≈ 0.09
<br>
Plugging in the numbers:
<br>
P(A/B) = (0.001 × 0.99) / ((0.001 × 0.99) + (0.999 × 0.01)) ≈ 0.09
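The same formula can be written as a small Python function. This is a sketch whose function and parameter names are our own; keeping P(B/notA) as a separate argument mirrors the definitions above, and it also shows the second-machine update where the first posterior becomes the new prior.

```python
def posterior(p_a, p_b_given_a, p_b_given_not_a):
    """Return P(A/B) using the expanded form of Bayes' theorem:
    P(B/A)P(A) / (P(B/A)P(A) + P(B/notA)P(notA))."""
    p_not_a = 1 - p_a
    numerator = p_b_given_a * p_a
    p_b = numerator + p_b_given_not_a * p_not_a   # total probability of a positive result
    return numerator / p_b

# First machine: prior 0.001, P(B/A) = 0.99, P(B/notA) = 0.01
p_first = posterior(0.001, 0.99, 0.01)
# Second machine: the first posterior becomes the new prior
p_second = posterior(p_first, 0.99, 0.01)
print(f"after one positive result:  {p_first:.4f}")
print(f"after two positive results: {p_second:.4f}")
```

Because P(B/notA) is its own argument, the same function would also handle a machine whose false-positive rate is not simply 1 minus its accuracy.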
<br>
<br>
**Further on Naive Bayes**
<br>
**John**: So I have only a 9% chance of being a real winner. What can I do to confirm it?
<br>
**Ben**: Is there any other machine to check the lottery result?
<br>
**John**: Yes, actually I checked my result again on a different machine just before coming to you, and it also said that I am a winner.
<br>
**Ben**: Wow! Let's calculate your chances again with the same formula. Now the prior probability is 0.09, because before checking the result on the second machine your chance of being a winner was 0.09. Everything else is the same as in the previous calculation.
<br>
P(Winner/result is winner) = P(A/B) = (0.09 × 0.99) / ((0.09 × 0.99) + (0.91 × 0.01)) ≈ 0.907
<br>
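**Ben**: Congratulations, now we are 90.7% sure that you are a real winner. This is what Naive Bayes is all about: the more evidence you have, the more sure you can be. Reusing the same formula this way assumes the two machines make their mistakes independently of each other, which is exactly the "naive" independence assumption.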

```python
# prior probability of being a winner, P(A)
PA = .001
# accuracy of the machine, P(B/A); the false-positive rate P(B/notA) is taken as 1 - PBA
PBA = .99
# number of times the result is checked
n = 4
# posterior probability after n pieces of evidence
for i in range(n):
    PAB = (PA*PBA)/((PA*PBA)+((1-PA)*(1-PBA)))
    PA = PAB  # the posterior becomes the prior for the next check
    print("Your probability of being a winner after getting a winner result {} time(s) is {}".format(i+1, PAB))
```

```
Your probability of being a winner after getting a winner result 1 time(s) is 0.09016393442622944
Your probability of being a winner after getting a winner result 2 time(s) is 0.9074999999999999
Your probability of being a winner after getting a winner result 3 time(s) is 0.9989714794017902
Your probability of being a winner after getting a winner result 4 time(s) is 0.9999896003148012
```


### Let's take a hypothetical example to understand how Naive Bayes works in scikit-learn
**Once you understand this, you will be able to write your own Naive Bayes in any programming language**
I have done all the calculations in a spreadsheet as well; the spreadsheet and the IPython notebook can be found in the following GitHub repository:
<br>
[Github Link devksingh](http://www.github.com/devksingh/ML_Naive_Bayes)

```python
# import the Python libraries required for Naive Bayes modelling
import numpy as np
# we use BernoulliNB because our data is in (0,1) format
from sklearn.naive_bayes import BernoulliNB
# used to check the accuracy of the model's predictions
from sklearn.metrics import accuracy_score
```

**Naive Bayes for choosing the best ingredients for a new dish** <br>
A chef is going to use Naive Bayes to design a new dish. He has 4 ingredients; he will choose and mix a few of them and then either fry or bake the dish. Every dish is tasted by random people, who tell the chef whether they liked the food or not.<br>Here is the training data:
<br>
<img src="nb4.png">
<br>
Here are the tasting results:
<br>
<img src="nb5.png">

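```python
# define the training data (features) and the tasting results (labels)
X = np.array([[1,1,0,0,0],[1,1,0,0,1],[1,0,0,1,1],[1,0,0,1,0],[1,0,1,0,1],
              [1,0,1,0,1],[0,0,0,1,0],[0,0,1,1,0],[0,1,0,1,0],[0,1,0,1,1]])
# one label per row: 1 = "Like", 0 = "Don't Like"
# (a 1-D array avoids scikit-learn's column-vector warning when fitting)
y = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 0])
```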


To predict, we use the Naive Bayes formula: for the given data we calculate P(A/B) for class "Like" and for class "Don't Like", and whichever is greater is the prediction. In other words, we check which is more probable for the given data, "Like" or "Don't Like". <br><br>
P(A/B) = (P(B/A) × P(A)) / P(B)<br>
P(A) is the probability of a particular class: class 1 ("Like") appears 6 times out of 10, so P(A) for class "Like" is 0.6.<br>
P(B/A) is how often an ingredient was present within that class; for example, Ingredient 1 is present in 4 of the 6 "Like" tastings.<br>
We can ignore P(B), the fraction of all 10 tastings in which a particular ingredient was present, because it is the same for both classes. For example, Ingredient 1 is present 6 times out of 10 regardless of which class we are scoring.
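These quantities can be checked by counting directly from the tasting data. This sketch re-creates `X` with a 1-D label array `y` (1 = "Like", 0 = "Don't Like") and assumes column 0 is Ingredient 1; the variable names are illustrative.

```python
import numpy as np

# Tasting data from this notebook; labels as a 1-D array (1 = "Like", 0 = "Don't Like")
X = np.array([[1,1,0,0,0],[1,1,0,0,1],[1,0,0,1,1],[1,0,0,1,0],[1,0,1,0,1],
              [1,0,1,0,1],[0,0,0,1,0],[0,0,1,1,0],[0,1,0,1,0],[0,1,0,1,1]])
y = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 0])

# P(A): share of tastings in each class, indexed by label (0, 1)
class_counts = np.bincount(y)
priors = class_counts / len(y)

# P(B/A) for Ingredient 1 (column 0), before any smoothing
ing1_like = X[y == 1, 0].sum() / class_counts[1]
ing1_dislike = X[y == 0, 0].sum() / class_counts[0]

# P(B) for Ingredient 1: the same number whichever class we score,
# so dividing by it cannot change which class wins
ing1_overall = X[:, 0].sum() / len(y)

print("P(Like), P(Don't Like):", priors[1], priors[0])
print("Ingredient 1 given Like:", ing1_like, " given Don't Like:", ing1_dislike)
print("Ingredient 1 overall:", ing1_overall)
```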

```python
# define the classifier
clf = BernoulliNB()
# fit the classifier with the training data
clf.fit(X, y)
```

```
BernoulliNB(alpha=1.0, binarize=0.0, class_prior=None, fit_prior=True)
```

#### To use Naive Bayes for choosing ingredients, we first find P(B/A) for both classes ("Like" and "Don't Like")
<br>
In the image below there are two counts, "Actual" and "Modified". <br>
"Actual" is how many times the ingredient was actually present for that class; for example, Ingredient 1 is present 4 times out of the 6 "Like" tastings.<br>
"Modified" adds one to each ingredient count and two to the total class count (add-one, or Laplace, smoothing). Without it, an ingredient that never appears in a class would get a probability of zero, and because the class probability is the product of all the individual feature (ingredient) probabilities, that single zero would make the whole product zero.
**scikit-learn does the same addition (its default is `alpha=1.0`), as you can see in the result below**
<img src="nb6.png">
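The "Modified" column can be reproduced by hand and compared with scikit-learn. This sketch assumes `BernoulliNB`'s default smoothing `alpha=1.0`, which adds `alpha` to each feature count and `2 * alpha` to each class count, so `np.exp(clf.feature_log_prob_)` should match the hand-computed values.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

X = np.array([[1,1,0,0,0],[1,1,0,0,1],[1,0,0,1,1],[1,0,0,1,0],[1,0,1,0,1],
              [1,0,1,0,1],[0,0,0,1,0],[0,0,1,1,0],[0,1,0,1,0],[0,1,0,1,1]])
y = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 0])   # 1 = "Like", 0 = "Don't Like"

# "Actual" counts: how often each ingredient appears within each class
counts_like = X[y == 1].sum(axis=0)
counts_dislike = X[y == 0].sum(axis=0)
n_like, n_dislike = (y == 1).sum(), (y == 0).sum()

# "Modified" counts: +1 to each ingredient count, +2 to the class total
smoothed_like = (counts_like + 1) / (n_like + 2)
smoothed_dislike = (counts_dislike + 1) / (n_dislike + 2)

clf = BernoulliNB(alpha=1.0).fit(X, y)
sk_probs = np.exp(clf.feature_log_prob_)   # row 0 = class 0, row 1 = class 1

print("by hand, Like:        ", smoothed_like)
print("scikit-learn, Like:   ", sk_probs[1])
print("by hand, Don't Like:  ", smoothed_dislike)
print("scikit-learn, Don't Like:", sk_probs[0])
```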


```python
feature_prob = np.exp(clf.feature_log_prob_)
print("***Feature probability for Like***")
print(feature_prob[1])
print("***Feature probability for Don't Like***")
print(feature_prob[0])
```

```
***Feature probability for Like***
[ 0.625  0.5    0.25   0.5    0.375]
***Feature probability for Don't Like***
[ 0.5         0.33333333  0.5         0.66666667  0.66666667]
```


### Prediction
To predict, we calculate the probability of each class and compare them; whichever is higher is the predicted class for the given data.<br>
For given data, the score of each class is (the product of P(B/A) over all features (ingredients)) × P(A). P(A) is 0.6 (6 out of 10) for class "Like" and 0.4 (4 out of 10) for class "Don't Like". <br> When multiplying, if an ingredient is 1 (present) use its P(B/A); if it is 0 (absent) use 1 - P(B/A), because the probability of it being absent is 1 - p.<br>
If the data is 1,1,1,0,0, the score for class "Like" is (0.625 × 0.5 × 0.25 × (1 - 0.5) × (1 - 0.375)) × 0.6 = 0.0146484375 <br>
and for class "Don't Like" it is (0.5 × 1/3 × 0.5 × (1 - 2/3) × (1 - 2/3)) × 0.4 ≈ 0.0037037 <br>
Dividing each score by their sum (which plays the role of P(B)), class "Don't Like" has probability 0.0037037 / (0.0146484375 + 0.0037037) ≈ 0.20181317 <br>
and class "Like" has probability 0.0146484375 / (0.0146484375 + 0.0037037) ≈ 0.79818683<br>
<br>
**Prediction is Class "Like"**
<img src="nb7.png">
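The hand calculation above can be checked with a short NumPy sketch: multiply P(B/A) for present ingredients and 1 - P(B/A) for absent ones, weight by the class prior, and normalize. It reads the fitted model's `feature_log_prob_` and `class_log_prior_`; the other variable names are illustrative.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

X = np.array([[1,1,0,0,0],[1,1,0,0,1],[1,0,0,1,1],[1,0,0,1,0],[1,0,1,0,1],
              [1,0,1,0,1],[0,0,0,1,0],[0,0,1,1,0],[0,1,0,1,0],[0,1,0,1,1]])
y = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 0])   # 1 = "Like", 0 = "Don't Like"
clf = BernoulliNB().fit(X, y)

x = np.array([1, 1, 1, 0, 0])
p = np.exp(clf.feature_log_prob_)       # P(B/A) per class (rows: class 0, class 1)
priors = np.exp(clf.class_log_prior_)   # P(A) per class

# Use p where the ingredient is present and 1 - p where it is absent, then multiply
likelihood = np.prod(np.where(x == 1, p, 1 - p), axis=1)
joint = likelihood * priors             # unnormalized score for each class
posterior = joint / joint.sum()         # dividing by the sum plays the role of P(B)

print("scores [Don't Like, Like]:", joint)
print("by hand:      ", posterior)
print("scikit-learn: ", clf.predict_proba(x.reshape(1, -1))[0])
```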

```python
test = np.array([[1,1,1,0,0]])
print(clf.predict_proba(test))
print(clf.predict(test))
```

```
[[ 0.20181317  0.79818683]]
[1]
```


### Calculate accuracy on training data
<img src="nb8.png">

```python
pred_train = clf.predict(X)
acc = accuracy_score(y, pred_train)
print("The accuracy on training data is :  ", acc)
```

```
The accuracy on training data is :   0.8
```
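To see which tastings the model gets wrong, we can compare predictions with labels row by row. In this data, rows 4 and 5 (counting from 0) have identical ingredients but opposite verdicts, so no classifier that looks only at these features can get both right. A minimal sketch, with illustrative variable names:

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

X = np.array([[1,1,0,0,0],[1,1,0,0,1],[1,0,0,1,1],[1,0,0,1,0],[1,0,1,0,1],
              [1,0,1,0,1],[0,0,0,1,0],[0,0,1,1,0],[0,1,0,1,0],[0,1,0,1,1]])
y = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 0])   # 1 = "Like", 0 = "Don't Like"
clf = BernoulliNB().fit(X, y)
pred = clf.predict(X)

# Training accuracy is just the fraction of rows predicted correctly
accuracy = np.mean(pred == y)
wrong = np.flatnonzero(pred != y)   # indices of misclassified tastings

print("accuracy:", accuracy)
for i in wrong:
    print(f"row {i}: features {X[i]}, actual {y[i]}, predicted {pred[i]}")
```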


## Use of Naive Bayes
We generally use Naive Bayes when we need to update our belief in something after observing evidence that is supposed to affect it. <br>As in the lottery example, the chance of winning was 0.001 at first, became 0.09 after the first result, and 0.907 after the second. It is a probabilistic approach, and it is called *naive* because it assumes that all the features are independent of each other given the class.
<br> For example, if we use Naive Bayes to detect spam, it predicts based only on which words (or collections of words) are present in a message; the order of the words has no effect on the result.
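A small sketch of the word-order point, using scikit-learn's `CountVectorizer(binary=True)` to turn each message into word-presence features for `BernoulliNB`. The messages and labels are made up purely for illustration; the point is that two messages with the same words in a different order get identical features and therefore identical probabilities.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

# Hypothetical toy messages, invented for illustration (1 = spam, 0 = not spam)
messages = [
    "win money now",
    "claim your free prize now",
    "free money offer",
    "meeting at noon",
    "lunch with the team",
    "project meeting notes",
]
labels = np.array([1, 1, 1, 0, 0, 0])

# binary=True records only whether each word is present, not how often or where
vectorizer = CountVectorizer(binary=True)
X_words = vectorizer.fit_transform(messages)
clf = BernoulliNB().fit(X_words, labels)

# The same words in a different order
a = vectorizer.transform(["win free money now"])
b = vectorizer.transform(["now money free win"])

same_features = np.array_equal(a.toarray(), b.toarray())
proba_a = clf.predict_proba(a)
proba_b = clf.predict_proba(b)
print("identical features:", same_features)
print(proba_a, proba_b)
```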