# Naive Bayes

Naive Bayes is a probabilistic machine learning algorithm based on **Bayes' theorem**, with the **naive assumption** that the features are conditionally independent given the class. **Bayes' theorem** is a formula for computing conditional probabilities. A conditional probability is the probability of an event occurring given that another event has (by assumption, presumption, assertion, or evidence) occurred.

<p style="text-align: center;">P(A|B) = P(B|A) * P(A) / P(B)</p>
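As a quick numeric check of the formula, here is a minimal sketch that computes P(A|B) from a prior and two likelihoods. The function name `posterior` and all the numbers are made up for illustration; P(B) is obtained from the law of total probability.

```python
def posterior(prior_a, likelihood_b_given_a, likelihood_b_given_not_a):
    """Return P(A|B) from P(A), P(B|A) and P(B|not A) using Bayes' theorem."""
    # P(B) = P(B|A) * P(A) + P(B|not A) * P(not A)
    evidence = (likelihood_b_given_a * prior_a
                + likelihood_b_given_not_a * (1.0 - prior_a))
    return likelihood_b_given_a * prior_a / evidence


# Hypothetical numbers: 20% of reviews are negative (A); the word "bad" (B)
# appears in 30% of negative reviews and in 2% of all other reviews.
p = posterior(prior_a=0.2, likelihood_b_given_a=0.3, likelihood_b_given_not_a=0.02)
print(round(p, 4))  # 0.7895
```

Even though only 20% of reviews are negative in this made-up setting, seeing the word "bad" raises the probability of a negative review to roughly 79%.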

The different naive Bayes classifiers differ mainly in the assumptions they make about the distribution of the likelihood P(B|A), where A is the class and B the observed features. Here are some popular naive Bayes classifiers:

1. Gaussian Naive Bayes
2. Multinomial Naive Bayes
3. Complement Naive Bayes
4. Bernoulli Naive Bayes
5. Categorical Naive Bayes
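The sketch below illustrates what these variants have in common and where they differ. Under the naive assumption the class score is the log prior plus a sum of per-feature log likelihoods, and only the per-feature likelihood function changes from one variant to another. The helper names, word probabilities and priors are all hypothetical and chosen for illustration; this is not how scikit-learn implements the classifiers internally.

```python
import math


def bernoulli_likelihood(x, p):
    """P(x | class) for a binary feature that is present with probability p."""
    return p if x == 1 else 1.0 - p


def gaussian_likelihood(x, mean, var):
    """P(x | class) for a continuous feature modelled as a normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def log_score(prior, likelihoods):
    """Unnormalised log posterior: log P(class) + sum of log P(feature | class)."""
    return math.log(prior) + sum(math.log(value) for value in likelihoods)


# Hypothetical parameters: probability that each word is present, per class.
word_probs = {
    "negative": {"bad": 0.6, "great": 0.1},
    "positive": {"bad": 0.05, "great": 0.5},
}
priors = {"negative": 0.5, "positive": 0.5}
document = {"bad": 1, "great": 0}  # "bad" occurs, "great" does not

scores = {
    label: log_score(
        priors[label],
        [bernoulli_likelihood(document[word], p) for word, p in probs.items()],
    )
    for label, probs in word_probs.items()
}

# Normalise the scores so they sum to one across classes.
total = sum(math.exp(score) for score in scores.values())
posteriors = {label: math.exp(score) / total for label, score in scores.items()}
predicted = max(posteriors, key=posteriors.get)
print(predicted, posteriors)
```

Replacing `bernoulli_likelihood` with `gaussian_likelihood` (and word probabilities with per-class means and variances) would turn the same recipe into a Gaussian-style classifier.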

### Data Ingestion

In [1]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from utils import clean

In [2]:
df = pd.read_csv('./../data/reviews.csv')
df.head()

Unnamed: 0,Id,Review,Label
0,103868,Very bad course.,1
1,15884,"Creativity without a reason, without a real pr...",1
2,25381,Hopeless ! Less clear and understandable than ...,1
3,64220,If you are considering this specialization I w...,1
4,52846,Week 4 does not give enough explanation or ext...,1


### Data Preprocessing

In [3]:
df['Review'] = df['Review'].apply(clean)
df.head()

Unnamed: 0,Id,Review,Label
0,103868,very bad course,1
1,15884,creativity without a reason without a real pr...,1
2,25381,hopeless less clear and understandable than s...,1
3,64220,if you are considering this specialization i w...,1
4,52846,week 4 does not give enough explanation or ext...,1
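The `clean` helper is imported from a local `utils` module that is not shown here. Judging only from the output above (lowercased text, punctuation removed, digits kept), it might look roughly like the following sketch. This is an assumption, not the actual implementation.

```python
import re
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean(text):
    """Hypothetical stand-in for utils.clean: lowercase, drop punctuation, squeeze spaces."""
    text = text.lower().translate(_PUNCT_TABLE)
    # Removing punctuation can leave double spaces ("hopeless  less"); collapse them.
    return re.sub(r"\s+", " ", text).strip()


print(clean("Very bad course."))
print(clean("Hopeless ! Less clear"))
```

The real helper may do more (for example, stop-word removal or stemming), so treat this only as a way to reproduce the visible behaviour.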


In [5]:
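vectorizer = TfidfVectorizer()
# fit_transform returns a SciPy sparse matrix (one row per review, one column per term).
# It cannot be assigned to a single DataFrame column, so it is kept in its own variable
# and df['Review'] stays as cleaned text.
X = vectorizer.fit_transform(df['Review'])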
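`TfidfVectorizer.fit_transform` returns a SciPy sparse matrix rather than a column of values, which is why the features are usually kept in a separate matrix instead of being written back into the DataFrame. The sketch below shows this on three made-up documents. The variable names are illustrative.

```python
from scipy import sparse
from sklearn.feature_extraction.text import TfidfVectorizer

# Three made-up, already-cleaned documents.
docs = ["very bad course", "great course", "bad bad explanation"]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)

print(sparse.issparse(X))              # True
print(X.shape)                         # (3, 5): 3 documents, 5 distinct terms
print(sorted(vectorizer.vocabulary_))  # the learned terms

# A sparse matrix has no well-defined len(), which is what pandas trips over
# when the matrix is assigned to a DataFrame column.
try:
    len(X)
    length_error = None
except TypeError as err:
    length_error = str(err)
print(length_error)
```

Use `X.shape[0]` for the number of documents, and `X.toarray()` only when a dense array is really needed, since it can be very large for a real corpus.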

### 1. Gaussian Naive Bayes

In [None]:
from sklearn.naive_bayes import GaussianNB

In [None]:
gnb = GaussianNB()
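# GaussianNB needs a dense array, so the sparse TF-IDF features are converted with toarray() (memory-hungry on large corpora)
model = gnb.fit(vectorizer.transform(df['Review']).toarray(), df['Label'])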

### 2. Multinomial Naive Bayes

In [None]:
from sklearn.naive_bayes import MultinomialNB

In [None]:
mnb = MultinomialNB()
# MultinomialNB accepts the sparse TF-IDF features directly
model = mnb.fit(vectorizer.transform(df['Review']), df['Label'])

### 3. Complement Naive Bayes

In [None]:
from sklearn.naive_bayes import ComplementNB

In [None]:
cnb = ComplementNB()
# ComplementNB accepts the sparse TF-IDF features directly
model = cnb.fit(vectorizer.transform(df['Review']), df['Label'])

### 4. Bernoulli Naive Bayes

In [None]:
from sklearn.naive_bayes import BernoulliNB

In [None]:
bnb = BernoulliNB()
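# BernoulliNB binarizes its input (default threshold 0.0), so every nonzero TF-IDF weight counts as "term present"
model = bnb.fit(vectorizer.transform(df['Review']), df['Label'])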

### 5. Categorical Naive Bayes

In [None]:
from sklearn.naive_bayes import CategoricalNB

In [None]:
catnb = CategoricalNB()
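# CategoricalNB expects dense, non-negative integer category codes, so the TF-IDF weights
# are reduced to present/absent (1/0) here; this encoding is one possible choice, not the only one
X_cat = (vectorizer.transform(df['Review']) > 0).astype(int).toarray()
model = catnb.fit(X_cat, df['Label'])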
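To see several of the classifiers above working end to end, the following sketch fits them on a tiny made-up corpus and predicts the class of one new review. The texts, the string labels and the variable names are all hypothetical, and a corpus this small says nothing about real accuracy. `CategoricalNB` is left out because it needs integer-coded categories rather than TF-IDF weights.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import BernoulliNB, ComplementNB, GaussianNB, MultinomialNB

# Made-up, already-cleaned reviews with made-up labels.
texts = [
    "very bad course",
    "confusing lectures and bad explanation",
    "waste of time",
    "great course and clear lectures",
    "excellent explanation",
    "really useful and well structured",
]
labels = ["negative", "negative", "negative", "positive", "positive", "positive"]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)  # sparse TF-IDF matrix
new_review = vectorizer.transform(["bad course with confusing lectures"])

models = {
    "GaussianNB": GaussianNB(),
    "MultinomialNB": MultinomialNB(),
    "ComplementNB": ComplementNB(),
    "BernoulliNB": BernoulliNB(),
}

predictions = {}
for name, model in models.items():
    # GaussianNB only accepts dense input; the other three accept sparse matrices.
    needs_dense = name == "GaussianNB"
    model.fit(X.toarray() if needs_dense else X, labels)
    features = new_review.toarray() if needs_dense else new_review
    predictions[name] = model.predict(features)[0]

for name, label in predictions.items():
    print(f"{name}: {label}")
```

On a real dataset the same loop could be combined with a held-out test set or cross-validation to compare the variants.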