In [6]:
from sklearn.datasets import load_iris
import pandas as pd
import numpy as np


# Bayes Theorem
Naive Bayes classifiers are a collection of classification algorithms based on Bayes’ Theorem. Naive Bayes is not a single algorithm but a family of algorithms that share a common principle: every pair of features is assumed to be conditionally independent of each other, given the class.<br>
Bayes’ Theorem finds the probability of an event occurring given the probability of another event that has already occurred. Bayes’ theorem is stated mathematically as the following equation:<br>
<img src="Images/naivebayes-1.png">
where A and B are events and P(B) ≠ 0.<br>

Basically, we are trying to find the probability of event A, given that event B is true. Event B is also called the evidence.
P(A) is the prior of A (the prior probability, i.e. the probability of the event before the evidence is seen). The evidence is an attribute value of an unknown instance (here, it is event B).<br>

P(A|B) is the posterior probability of A, i.e. the probability of the event after the evidence is seen.<br>
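
To make the roles of prior, evidence, and posterior concrete, here is a minimal numeric sketch of Bayes’ theorem. The three input probabilities are hypothetical values chosen only for illustration; they do not come from any dataset used in this notebook.

```python
# Hypothetical numbers, chosen only for illustration.
p_a = 0.01              # P(A): prior probability of event A
p_b_given_a = 0.90      # P(B|A): probability of the evidence when A is true
p_b_given_not_a = 0.05  # P(B|not A): probability of the evidence when A is false

# The law of total probability gives the evidence term P(B).
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b

print(f"P(B)   = {p_b:.4f}")
print(f"P(A|B) = {p_a_given_b:.4f}")
```

With these numbers the posterior is larger than the prior: seeing the evidence B raises our belief in A.
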
Now, with regard to our dataset, we can apply Bayes’ theorem in the following way:<br>
<img src="Images/naivebayes-2.png">
where y is the class variable and X is a feature vector (of size n) given by: <br>
<img src="Images/naivebayes-3.png">

# Naive assumption
Now it is time to add the naive assumption to Bayes’ theorem: independence among the features. So we split the evidence into its independent parts.<br>

Now, if any two events A and B are independent, then,<br>

<b>P(A,B) = P(A)P(B)</b><br>
Hence, we arrive at the result:<br>
<img src="Images/naivebayes-4.png">
which can be expressed as:
<img src="Images/naivebayes-5.png">
Now, as the denominator remains constant for a given input, we can remove that term:
<img src="Images/naivebayes-6.png">
Now, we need to create a classifier model. For this, we compute the probability of the given set of inputs for every possible value of the class variable y and pick the class with the maximum probability. This can be expressed mathematically as:
<img src="Images/naivebayes-7.png">
So, finally, we are left with the task of calculating P(y) and P(xi | y).<br>

Please note that P(y) is also called class probability and P(xi | y) is called conditional probability.<br>

The different naive Bayes classifiers differ mainly by the assumptions they make regarding the distribution of P(xi | y).<br>
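
The decision rule described above (estimate P(y) and P(xi | y), multiply them, and pick the class with the largest product) can be sketched from scratch for categorical features. This is only an illustrative sketch: the toy data, the feature meanings, and the helper names `score` and `predict` are hypothetical, and the probabilities are plain relative frequencies with no smoothing.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: each row is (outlook, windy); the label says whether we play.
X = [("sunny", "no"), ("sunny", "yes"), ("rainy", "yes"),
     ("rainy", "no"), ("sunny", "yes"), ("rainy", "no")]
y = ["yes", "no", "no", "yes", "yes", "no"]

# P(y): class probabilities estimated as relative frequencies.
class_counts = Counter(y)
n_rows = len(y)
prior = {c: class_counts[c] / n_rows for c in class_counts}

# Counts behind P(x_i | y): feature_counts[(i, c)][value]
feature_counts = defaultdict(Counter)
for row, label in zip(X, y):
    for i, value in enumerate(row):
        feature_counts[(i, label)][value] += 1

def score(row, c):
    """Unnormalised posterior: P(y) times the product of P(x_i | y)."""
    s = prior[c]
    for i, value in enumerate(row):
        s *= feature_counts[(i, c)][value] / class_counts[c]
    return s

def predict(row):
    """Return the class with the largest unnormalised posterior."""
    return max(prior, key=lambda c: score(row, c))

query = ("sunny", "no")
print({c: score(query, c) for c in prior})
print(predict(query))
```

The scores are not normalised, because the denominator is the same for every class and does not change which class wins.
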

# Types of Naive Bayes

## 1) Gaussian Naive Bayes
In Gaussian Naive Bayes, the continuous values associated with each feature are assumed to follow a Gaussian distribution within each class. A Gaussian distribution is also called a Normal distribution. When plotted, it gives a bell-shaped curve that is symmetric about the mean of the feature values, as shown below:
<img src="https://media.geeksforgeeks.org/wp-content/uploads/naive-bayes-classification-1.png" width=30% height=30%>
The likelihood of the features is assumed to be Gaussian, hence, conditional probability is given by:
<img src="https://www.geeksforgeeks.org/wp-content/ql-cache/quicklatex.com-7fb78d7323fcbade0cb664161a8e84c4_l3.svg">
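
As a rough illustration of the Gaussian likelihood, the sketch below estimates a mean and variance per class for a single feature and evaluates the density at a new value. The data and the class names are hypothetical, `gaussian_pdf` is a helper written for this example, and the variance is the population variance (dividing by the number of samples).

```python
import math

def gaussian_pdf(x, mean, var):
    """Gaussian density with the given mean and variance, evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# Hypothetical one-feature data (for example, a length in cm) for two classes.
samples = {
    "short": [1.3, 1.5, 1.4, 1.6, 1.2],
    "long": [4.5, 4.9, 5.1, 4.7, 4.8],
}

# Per-class mean and population variance of the feature.
params = {}
for label, values in samples.items():
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    params[label] = (mean, var)

# Likelihood P(x | y) of a new value under each class.
x_new = 1.7
likelihoods = {label: gaussian_pdf(x_new, m, v) for label, (m, v) in params.items()}
print(likelihoods)
```

With several features, the per-feature likelihoods would be multiplied together and then by the class probability, following the naive independence assumption.
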


In [19]:
# getting independent (x) and dependent (y) features from our data
iris = load_iris()  # load the dataset once and reuse it
x = pd.DataFrame(iris['data'], columns=iris['feature_names'])
y = iris['target']

# getting our train & test data
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=42)

In [49]:
from sklearn.naive_bayes import GaussianNB 
gnb = GaussianNB() 
gnb.fit(x_train, y_train) 
  
# making predictions on the testing set 
pred = gnb.predict(x_test) 

In [22]:
gnb.get_params()

{'priors': None, 'var_smoothing': 1e-09}

In [66]:
# printing the predicted class probabilities, rounded to 0 or 1, in a readable format
print('[',end=" ")
for i in gnb.predict_proba(x_test):
    print('[',end='')
    for j in i:
        print("{0:.0f},".format(j),end=' ') 
    print('],',end=" ")
print(']')

[ [0, 1, 0, ], [1, 0, 0, ], [0, 0, 1, ], [0, 1, 0, ], [0, 1, 0, ], [1, 0, 0, ], [0, 1, 0, ], [0, 0, 1, ], [0, 1, 0, ], [0, 1, 0, ], [0, 0, 1, ], [1, 0, 0, ], [1, 0, 0, ], [1, 0, 0, ], [1, 0, 0, ], [0, 0, 1, ], [0, 0, 1, ], [0, 1, 0, ], [0, 1, 0, ], [0, 0, 1, ], [1, 0, 0, ], [0, 0, 1, ], [1, 0, 0, ], [0, 0, 1, ], [0, 0, 1, ], [0, 0, 1, ], [0, 0, 1, ], [0, 0, 1, ], [1, 0, 0, ], [1, 0, 0, ], [1, 0, 0, ], [1, 0, 0, ], [0, 1, 0, ], [1, 0, 0, ], [1, 0, 0, ], [0, 0, 1, ], [0, 1, 0, ], [1, 0, 0, ], [1, 0, 0, ], [1, 0, 0, ], [0, 0, 1, ], [0, 1, 0, ], [0, 1, 0, ], [1, 0, 0, ], [1, 0, 0, ], [0, 1, 0, ], [0, 1, 0, ], [0, 0, 1, ], [0, 1, 0, ], [0, 0, 1, ], ]


In [59]:
from sklearn.metrics import confusion_matrix
confusion_matrix(y_test, pred)

array([[19,  0,  0],
       [ 0, 14,  1],
       [ 0,  1, 15]], dtype=int64)
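
The confusion matrix above can be summarised with a few simple ratios. The sketch below copies the matrix by hand into a plain list (rows are the true classes, columns are the predicted classes) and derives the overall accuracy and the per-class recall; the variable names are chosen for this example only.

```python
# Confusion matrix copied from the output above
# (rows: true class, columns: predicted class).
cm = [[19, 0, 0],
      [0, 14, 1],
      [0, 1, 15]]

# Accuracy: correctly classified samples (the diagonal) over all samples.
total = sum(sum(row) for row in cm)
correct = sum(cm[i][i] for i in range(len(cm)))
accuracy = correct / total

# Per-class recall: diagonal entry divided by its row total.
recall = [cm[i][i] / sum(cm[i]) for i in range(len(cm))]

print(accuracy)
print(recall)
```

Here 48 of the 50 test samples lie on the diagonal, so the accuracy is 0.96.
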

### Advantages
<ul>
    <li>It is not only a simple approach but also a fast and accurate method for prediction.</li>
    <li>Naive Bayes has very low computation cost.</li>
    <li>It can efficiently work on a large dataset.</li>
    <li>It is better suited to a discrete response variable than to a continuous one.</li>
    <li>It can be used with multiple class prediction problems.</li>
    <li>It also performs well in the case of text analytics problems.</li>
    <li>When the assumption of independence holds, a Naive Bayes classifier can perform better than other models such as logistic regression.</li>
</ul>

### Disadvantages

<ul>
    <li>The assumption of independent features. In practice, it is almost impossible for a model to get a set of predictors that are entirely independent.</li>
    <li>If a feature value never occurs together with a particular class in the training data, its estimated conditional probability is zero, which makes the whole posterior for that class zero. In this case, the model cannot make a sensible prediction. This is known as the Zero Probability/Frequency Problem.</li>
</ul>
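
A common remedy for the zero-frequency problem is additive (Laplace) smoothing, which adds a small constant to every count. The sketch below is a minimal illustration with hypothetical counts; `conditional_prob` is a helper written for this example, not a library function.

```python
def conditional_prob(count, class_total, n_values, alpha=0.0):
    """Estimate P(x_i = v | y) with additive smoothing of strength alpha.

    count:       times value v was seen together with class y
    class_total: number of training rows in class y
    n_values:    number of distinct values the feature can take
    """
    return (count + alpha) / (class_total + alpha * n_values)

# Hypothetical counts: the value never co-occurs with the class in 10 rows,
# and the feature can take 3 distinct values.
unsmoothed = conditional_prob(0, 10, 3)
smoothed = conditional_prob(0, 10, 3, alpha=1.0)
print(unsmoothed, smoothed)
```

With `alpha=1.0` the estimate becomes 1/13 instead of 0, so a single unseen combination no longer wipes out the whole product.
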

### When to use Naive Bayes?
Naive Bayes classifiers tend to perform especially well in one of the following situations: <br>
<ol>
    <li>When the naive assumptions actually match the data (very rare in practice)</li>
    <li>For very well-separated categories, when model complexity is less important</li>
    <li>For very high-dimensional data, when model complexity is less important</li>
</ol>