# Naive Bayes Classifier

## Theoretical Understanding
1. Tutorial 48: https://www.youtube.com/watch?v=jS1CKhALUBQ
2. Tutorial 49: https://www.youtube.com/watch?v=temQ8mHpe3k

### What Is the Basic Assumption?
Features are independent of each other given the class (conditional independence). This is the "naive" part of Naive Bayes.
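Written out with Bayes' theorem, the assumption lets the likelihood of a feature vector factor into one term per feature. Here x_1, ..., x_n are the features and y is the class. This notation is ours and is not taken from the tutorials above.

$$
P(y \mid x_1,\dots,x_n) = \frac{P(y)\,\prod_{i=1}^{n} P(x_i \mid y)}{P(x_1,\dots,x_n)} \;\propto\; P(y)\prod_{i=1}^{n} P(x_i \mid y)
$$

$$
\hat{y} = \arg\max_{y}\; P(y)\prod_{i=1}^{n} P(x_i \mid y)
$$

The denominator is the same for every class, so prediction only needs the numerator.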

### Advantages
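1. Works very well with a large number of features
2. Works well with large training datasets
3. Trains fast: fitting only requires estimating per-class statistics for each feature (counts, or means and variances), with no iterative optimization
4. Also performs well with categorical features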

### Disadvantages
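1. Correlated features hurt performance: the independence assumption counts the same evidence once per correlated feature, which makes the predicted probabilities overconfident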
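Here is a small sketch of why correlation hurts. It uses scikit-learn's `GaussianNB` on made-up one-feature data, so the numbers are illustrative only. Copying the same feature three times adds no new information, but the model multiplies its likelihood in three times. As a result, the predicted probability becomes more extreme.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Made-up data: one informative feature, three samples per class.
X = np.array([[1.0], [1.5], [2.0], [4.0], [4.5], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])
x_new = np.array([[2.9]])  # a point between the two classes

# Model 1: the feature appears once.
single = GaussianNB().fit(X, y)
p_single = single.predict_proba(x_new)[0]

# Model 2: the same feature copied three times (perfectly correlated columns).
X_dup = np.hstack([X, X, X])
dup = GaussianNB().fit(X_dup, y)
p_dup = dup.predict_proba(np.hstack([x_new, x_new, x_new]))[0]

# The copies add no information, but the likelihood is multiplied in three times,
# so the probability of the winning class is pushed further toward 1.
print("one copy:    ", p_single.round(3))
print("three copies:", p_dup.round(3))
```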

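### Is Feature Scaling Required?
No. Naive Bayes does not rely on distances between samples or on gradient-based optimization. Each feature's likelihood is estimated on its own (for Gaussian Naive Bayes, as a per-class mean and variance).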
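As a quick check on made-up data, the sketch below fits `GaussianNB` twice: once on raw features with very different ranges, and once after `StandardScaler`. The predicted classes should come out the same, because the per-class mean and variance simply rescale along with the feature. The predicted probabilities can differ very slightly, since `GaussianNB` adds a small variance-smoothing term (`var_smoothing`).

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler

# Made-up data whose two features live on very different scales.
X = np.array([[1.0, 200.0], [1.2, 220.0], [0.9, 210.0],
              [3.0, 800.0], [3.2, 780.0], [2.9, 820.0]])
y = np.array([0, 0, 0, 1, 1, 1])
X_new = np.array([[1.1, 230.0], [2.8, 760.0]])

# Fit on the raw features.
raw_pred = GaussianNB().fit(X, y).predict(X_new)

# Fit on standardized features (same scaler applied to the new points).
scaler = StandardScaler().fit(X)
scaled_pred = GaussianNB().fit(scaler.transform(X), y).predict(scaler.transform(X_new))

print(raw_pred, scaled_pred)
```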
### Impact of Missing Values?
Naive Bayes can handle missing data in principle. The algorithm treats each attribute separately, both when building the model and when predicting. So if an instance is missing a value for an attribute, that attribute can simply be ignored while preparing the model, and ignored when the probability of a class value is calculated.
Tutorial: https://www.youtube.com/watch?v=EqjyLfpv5oA
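scikit-learn's Naive Bayes estimators generally expect complete input. For that reason, the sketch below is a hypothetical from-scratch categorical Naive Bayes. The functions `fit_categorical_nb` and `predict_proba_nb` and the toy rows are our own, for illustration only. `None` marks a missing value: it is skipped when counting during training, and its factor is left out of the product at prediction time. Laplace smoothing (`alpha`) is an added assumption, used to avoid zero probabilities.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy rows (outlook, temp) -> play; None marks a missing value.
rows = [("Sunny", "Hot"), ("Sunny", None), ("Rainy", "Cool"),
        ("Overcast", "Mild"), (None, "Mild"), ("Rainy", "Mild")]
labels = ["No", "No", "Yes", "Yes", "Yes", "No"]


def fit_categorical_nb(rows, labels):
    """Count class frequencies and per-(feature, class) value frequencies."""
    class_counts = Counter(labels)
    counts = defaultdict(Counter)  # counts[(i, c)][v]
    values = defaultdict(set)      # distinct observed values of feature i
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            if v is None:          # missing: skip this attribute for this row only
                continue
            counts[(i, c)][v] += 1
            values[i].add(v)
    return class_counts, counts, values


def predict_proba_nb(model, row, alpha=1.0):
    """Posterior per class, leaving out any attribute that is missing in `row`."""
    class_counts, counts, values = model
    n = sum(class_counts.values())
    log_scores = {}
    for c, n_c in class_counts.items():
        score = math.log(n_c / n)                # log prior
        for i, v in enumerate(row):
            if v is None:                        # missing at prediction time
                continue
            seen = sum(counts[(i, c)].values())  # non-missing values of feature i in class c
            score += math.log((counts[(i, c)][v] + alpha) / (seen + alpha * len(values[i])))
        log_scores[c] = score
    # Turn the log scores into probabilities that sum to 1.
    top = max(log_scores.values())
    exp = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(exp.values())
    return {c: e / total for c, e in exp.items()}


model = fit_categorical_nb(rows, labels)
probs = predict_proba_nb(model, ("Sunny", None))  # temperature unknown
print(probs)
```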
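### Impact of Outliers?
It is usually robust to outliers, especially with categorical or count features. Gaussian Naive Bayes is more sensitive, because extreme values shift the estimated per-class mean and variance.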

### Problems You Can Solve Using Naive Bayes
1. Sentiment analysis
2. Spam classification
3. Twitter sentiment analysis
4. Document categorization

---

## Types of Naive Bayes

### Bernoulli Naive Bayes

It assumes that all features are binary, i.e. they take only two values. In text classification, for example, 0 can represent "word does not occur in the document" and 1 "word occurs in the document".
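Here is a minimal sketch, assuming scikit-learn and a tiny made-up corpus. `CountVectorizer(binary=True)` turns each document into 0/1 word-presence features, which is the input `BernoulliNB` expects. Bernoulli Naive Bayes also uses the *absence* of a word as evidence.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

# Tiny made-up corpus: 1 = spam, 0 = not spam.
docs = ["win money now", "win a free prize", "meeting at noon", "see you at lunch"]
labels = [1, 1, 0, 0]

# binary=True stores 1 if a word occurs in the document and 0 otherwise (no counts).
vec = CountVectorizer(binary=True)
X = vec.fit_transform(docs)

clf = BernoulliNB().fit(X, labels)
pred = clf.predict(vec.transform(["free money", "lunch meeting"]))
print(pred)
```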

### Multinomial Naive Bayes (good for NLP)

It is used with discrete data (e.g. movie ratings from 1 to 5, where each rating value has a certain frequency). In text classification, the count of each word is used to predict the class or label.

### Gaussian Naive Bayes

Because it assumes each feature follows a normal distribution within each class, Gaussian Naive Bayes is used when the features are continuous. For example, the Iris dataset's features are sepal length, sepal width, petal length and petal width. These measurements can take any real value, so they cannot be represented as occurrence counts. Hence we use Gaussian Naive Bayes here.
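Here is a short sketch of Gaussian Naive Bayes on the Iris dataset that ships with scikit-learn (`load_iris`). The 70/30 split and `random_state=0` are arbitrary choices. `theta_` holds the per-class mean of each feature that the model learned.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)  # 4 continuous features, 3 classes
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

gnb = GaussianNB().fit(X_train, y_train)
print(gnb.theta_.shape)            # one mean per (class, feature)
print(gnb.score(X_test, y_test))   # accuracy on the held-out 30%
```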

---

![My Explanation](https://raw.githubusercontent.com/Mhnd-DS/ML-Models-Journey/main/Naive.png)

## Implementation Part

A simple dataset for deciding whether to play or not.

### Creating the dataset

In [1]:
import pandas as pd

In [2]:
outlook = ['Sunny','Sunny','Sunny','Sunny','Sunny' , 'overcast' , 'overcast', 'overcast', 'overcast',
           'Rainy', 'Rainy', 'Rainy', 'Rainy', 'Rainy']

In [3]:
Temp = ['Hot', 'Hot', 'Hot', 'Hot', 'Mild', 'Mild', 'Mild', 'Mild', 'Mild', 'Mild',
        'Cool', 'Cool', 'Cool', 'Cool']

In [4]:
play = ['Yes', 'Yes', 'No', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes',
        'Yes', 'Yes', 'No', 'No']

In [5]:
df = pd.DataFrame({'outlook' : outlook , 'Temp' : Temp , 'play' : play})

### Dummies
Before feeding the data to the model, we need to convert the object (categorical) columns to numeric data using one-hot encoded dummy columns.

In [18]:
df_dummies = pd.get_dummies(df[['outlook','Temp']])

In [23]:
df = pd.concat([df,df_dummies] , axis = 1)

In [28]:
df.drop(['outlook', 'Temp'] , axis = 1 , inplace = True)

In [33]:
df = df[['outlook_Rainy', 'outlook_Sunny', 'outlook_overcast',
         'Temp_Cool', 'Temp_Hot', 'Temp_Mild', 'play']]

In [35]:
X = df.drop(['play'], axis = 1)
y = df['play']

### Bernoulli Naive Bayes

In [10]:
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import train_test_split

In [36]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.05, random_state=0)

In [37]:
BNaive = BernoulliNB()

In [38]:
y_pred = BNaive.fit(X_train, y_train).predict(X_test)

In [43]:
BNaive.predict(X_test)

array(['Yes'], dtype='<U3')

In [42]:
BNaive.predict_proba(X_test)

array([[0.04722148, 0.95277852]])

In [46]:
# SUNNY - HOT
BNaive.predict([[0,1,0,0,1,0]])

array(['No'], dtype='<U3')

In [47]:
BNaive.predict_proba([[0,1,0,0,1,0]])

array([[0.72961048, 0.27038952]])
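To see where `predict_proba` comes from, the sketch below rebuilds the same toy table and recomputes Bernoulli Naive Bayes by hand. It assumes scikit-learn's defaults: `alpha=1.0` Laplace smoothing, and class priors taken from class frequencies. It fits on all 14 rows rather than the training split, so its numbers will not match the output above exactly. The point is that the manual computation and `predict_proba` agree.

```python
import numpy as np
import pandas as pd
from sklearn.naive_bayes import BernoulliNB

# Rebuild the toy "play" table from above (all 14 rows, no train/test split).
outlook = ['Sunny'] * 5 + ['overcast'] * 4 + ['Rainy'] * 5
temp = ['Hot'] * 4 + ['Mild'] * 6 + ['Cool'] * 4
play = ['Yes', 'Yes', 'No', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes',
        'Yes', 'Yes', 'No', 'No']

X = pd.get_dummies(pd.DataFrame({'outlook': outlook, 'Temp': temp}), dtype=int)
y = np.array(play)
clf = BernoulliNB(alpha=1.0).fit(X, y)

x = np.array([0, 1, 0, 0, 1, 0])  # Sunny, Hot (same column order as X)

# Manual Bernoulli NB: log P(c) + sum_i [x_i log p_ic + (1 - x_i) log(1 - p_ic)]
log_joint = []
for c in clf.classes_:                        # ['No', 'Yes']
    Xc = X.to_numpy()[y == c]
    p = (Xc.sum(axis=0) + 1) / (len(Xc) + 2)  # Laplace-smoothed P(x_i = 1 | c)
    prior = len(Xc) / len(y)                  # class frequency
    log_joint.append(np.log(prior) + np.sum(x * np.log(p) + (1 - x) * np.log(1 - p)))

log_joint = np.array(log_joint)
manual = np.exp(log_joint - log_joint.max())
manual /= manual.sum()                        # normalize to probabilities

sk = clf.predict_proba(pd.DataFrame([x], columns=X.columns))[0]
print(manual.round(4), sk.round(4))
```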

## NLP Example

### Importing the dataset and taking a first look

In [49]:
df = pd.read_csv('https://raw.githubusercontent.com/codebasics/py/master/ML/14_naive_bayes/spam.csv')

# 'ham' = legitimate message (not spam)
# 'spam' = spam :)

df[:5]

|   | Category | Message |
|---|----------|---------|
| 0 | ham | Go until jurong point, crazy.. Available only ... |
| 1 | ham | Ok lar... Joking wif u oni... |
| 2 | spam | Free entry in 2 a wkly comp to win FA Cup fina... |
| 3 | ham | U dun say so early hor... U c already then say... |
| 4 | ham | Nah I don't think he goes to usf, he lives aro... |


In [51]:
df['Category'].value_counts()

ham     4825
spam     747
Name: Category, dtype: int64

In [55]:
df.groupby('Category').describe()

| Category | Message count | Message unique | Message top | Message freq |
|----------|---------------|----------------|-------------|--------------|
| ham | 4825 | 4516 | Sorry, I'll call later | 30 |
| spam | 747 | 641 | Please call our customer service representativ... | 4 |


In [56]:
df['spam'] = df['Category'].apply(lambda x: 1 if x== 'spam' else 0)

### Split the data into training and test sets

In [65]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(df.Message,df.spam, test_size=0.15, random_state=0) # X = Messages , Y= SPAM OR NOT

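#### Handling the text (object) column
We have to convert the messages to numbers. One-hot encoding or dummies are impractical here: encoding the whole message would give one column per distinct message, which tells the model nothing about new messages. So we use CountVectorizer instead.

It traverses all the messages and builds a vocabulary of the unique words. Each message then becomes a vector that holds, for every vocabulary word, the number of times that word occurs in the message (0 if it does not occur).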

In [73]:
from sklearn.feature_extraction.text import CountVectorizer
v = CountVectorizer()

In [85]:
X_train_count = v.fit_transform(X_train.values)

X_train_count[:2].toarray()

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])

#### We use Multinomial Naive Bayes because the features are word counts (values can be greater than 1)

We use a Pipeline because it makes the process much easier and more readable. It chains two steps:

- Vectorize the messages into word counts (CountVectorizer)
- Fit Multinomial Naive Bayes on those counts
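Here is a self-contained sketch of the same two-step pipeline on a hypothetical six-message training set. The messages and labels are invented for illustration. Because the vectorizer lives inside the pipeline, raw strings can be passed straight to `predict`.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical mini training set (1 = spam, 0 = ham), for illustration only.
train_msgs = ["win free prize now", "free cash prize claim now", "claim your free reward",
              "are we meeting for lunch", "see you at the game tomorrow", "call me when you get home"]
train_labels = [1, 1, 1, 0, 0, 0]

pipe = Pipeline([
    ('vectorizer', CountVectorizer()),  # step 1: raw text -> word counts
    ('nb', MultinomialNB()),            # step 2: multinomial NB on the counts
])
pipe.fit(train_msgs, train_labels)

# Raw strings go straight in; the fitted vocabulary is reused for vectorizing.
pred = pipe.predict(["free prize", "lunch tomorrow"])
print(pred)
```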

In [76]:
from sklearn.naive_bayes import MultinomialNB
model = MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)
model.fit(X_train_count,y_train)

MultinomialNB()

In [101]:
from sklearn.pipeline import Pipeline
MultiNB = Pipeline([
    ('vectorizer', CountVectorizer()),
    ('nb', MultinomialNB())
])
MultiNB.fit(X_train,y_train)
MultiNB.score(X_test, y_test)

0.9868421052631579

In [102]:
ypred = MultiNB.predict(X_test)

#### Compare predictions with the ground truth

In [104]:
d = {'Truth' :y_test , 'Predicted' : ypred }
Observe_df = pd.DataFrame(data = d)
print(Observe_df.shape)

(836, 2)


In [108]:
Observe_df[:5]

|      | Truth | Predicted |
|------|-------|-----------|
| 4456 | 0 | 0 |
| 690 | 1 | 1 |
| 944 | 0 | 0 |
| 3768 | 0 | 0 |
| 1189 | 0 | 0 |


In [109]:
len(Observe_df[Observe_df['Truth'] != Observe_df['Predicted']])

11

In [110]:
Observe_df.loc[Observe_df['Truth'] != Observe_df['Predicted']]

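|      | Truth | Predicted |
|------|-------|-----------|
| 684 | 1 | 0 |
| 731 | 1 | 0 |
| 2575 | 1 | 0 |
| 1940 | 1 | 0 |
| 991 | 0 | 1 |
| 751 | 1 | 0 |
| 4213 | 1 | 0 |
| 4298 | 1 | 0 |
| 4382 | 0 | 1 |
| 1290 | 0 | 1 |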
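Counting mismatches does not show which way the errors go. In the table above, most errors are spam predicted as ham. scikit-learn's `confusion_matrix` breaks this down, with rows for the true label and columns for the predicted one. The sketch uses small hypothetical vectors. On this notebook's data, the same call would be `confusion_matrix(Observe_df['Truth'], Observe_df['Predicted'])`.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels, 0 = ham, 1 = spam.
y_true = [0, 1, 0, 1, 0]
y_pred = [0, 0, 0, 1, 1]

cm = confusion_matrix(y_true, y_pred)  # rows: true label, columns: predicted label
tn, fp, fn, tp = cm.ravel()
print(cm)                              # [[2 1]
                                       #  [1 1]]
print("ham flagged as spam:", fp, "| spam missed:", fn)
```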


### Test the model on new messages

In [96]:
emails = [
    'Hey mohan, can we get together to watch footbal game tomorrow?', #NOT SPAM
    'Upto 20% discount on parking, exclusive offer just for you. Dont miss this reward!' #SPAM
]
emails_count = v.transform(emails)
model.predict(emails_count)

array([0, 1])