# Naive Baye's Classifier

### Bayer's Formula
#### $P(A|B)=\frac{P(B|A)\times P(A)}{P(B)}$
#### In General, 
#### $$P(y|x_{1},x_{2},x_{3}...,x_{n})=\frac{P(y)\times\prod\limits_{i=1}{n}P(x_{i}|y)}{p(x_{1},x_{2},x_{3}...,x_{n})}$$

### Complement Bayer's
#### The formula for Complement Naive Bayes is as follows:
#### $$P(class | document) = \frac{( P(class) \times \prod\limits_{i}( 1 - P(word_i | class) ) )}{Z}$$
#### ---->P(class | document) is the probability of the document belonging to a particular class
#### ----> P(class) is the prior probability of the class
#### ---->P(word_i | class) is the probability of the i-th word given the class
#### ---->Z is a normalization constant
#### $$P(word_i | class) = \frac{count(word_i, not class)}{\sum\limits_{j}(count(word_j, not class))}$$

In [30]:
import numpy as np
import pandas as pd
import sklearn
import seaborn as sns
import matplotlib.pyplot as plt

In [2]:
from sklearn.datasets import load_wine

In [3]:
wine=load_wine()
wine.DESCR



In [5]:
wine.target_names

array(['class_0', 'class_1', 'class_2'], dtype='<U7')

In [6]:
wine.feature_names

['alcohol',
 'malic_acid',
 'ash',
 'alcalinity_of_ash',
 'magnesium',
 'total_phenols',
 'flavanoids',
 'nonflavanoid_phenols',
 'proanthocyanins',
 'color_intensity',
 'hue',
 'od280/od315_of_diluted_wines',
 'proline']

In [10]:
wine.data

array([[1.423e+01, 1.710e+00, 2.430e+00, ..., 1.040e+00, 3.920e+00,
        1.065e+03],
       [1.320e+01, 1.780e+00, 2.140e+00, ..., 1.050e+00, 3.400e+00,
        1.050e+03],
       [1.316e+01, 2.360e+00, 2.670e+00, ..., 1.030e+00, 3.170e+00,
        1.185e+03],
       ...,
       [1.327e+01, 4.280e+00, 2.260e+00, ..., 5.900e-01, 1.560e+00,
        8.350e+02],
       [1.317e+01, 2.590e+00, 2.370e+00, ..., 6.000e-01, 1.620e+00,
        8.400e+02],
       [1.413e+01, 4.100e+00, 2.740e+00, ..., 6.100e-01, 1.600e+00,
        5.600e+02]])

In [12]:
wine.target

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2])

In [13]:
X=wine.data
y=wine.target

In [14]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y,test_size=0.25)

In [15]:
from sklearn.naive_bayes import GaussianNB, MultinomialNB
model = GaussianNB()
model.fit(X_train,y_train)

In [16]:
model.score(X_train,y_train)

0.9924812030075187

In [17]:
model1=MultinomialNB()
model1.fit(X_test,y_test)

In [19]:
model1.score(X_train,y_train)

0.7518796992481203

In [24]:
from sklearn.metrics import classification0_report
pred=model.predict(X_test)
pred1=model1.predict(X_test)

In [27]:
print(classification_report(y_test,pred))

              precision    recall  f1-score   support

           0       1.00      0.94      0.97        16
           1       0.94      1.00      0.97        16
           2       1.00      1.00      1.00        13

    accuracy                           0.98        45
   macro avg       0.98      0.98      0.98        45
weighted avg       0.98      0.98      0.98        45



In [28]:
print(classification_report(y_test,pred1))

              precision    recall  f1-score   support

           0       0.93      0.81      0.87        16
           1       0.93      0.81      0.87        16
           2       0.71      0.92      0.80        13

    accuracy                           0.84        45
   macro avg       0.85      0.85      0.84        45
weighted avg       0.86      0.84      0.85        45



### Email Spam Detection 

In [31]:
df=pd.read_csv('email.csv')
df.head()

Unnamed: 0,Category,Message
0,ham,"Go until jurong point, crazy.. Available only ..."
1,ham,Ok lar... Joking wif u oni...
2,spam,Free entry in 2 a wkly comp to win FA Cup fina...
3,ham,U dun say so early hor... U c already then say...
4,ham,"Nah I don't think he goes to usf, he lives aro..."


In [32]:
df.groupby('Category').describe()

Unnamed: 0_level_0,Message,Message,Message,Message
Unnamed: 0_level_1,count,unique,top,freq
Category,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2
ham,4825,4516,"Sorry, I'll call later",30
spam,747,641,Please call our customer service representativ...,4


In [33]:
df['spam']=df['Category'].apply(lambda x: 1 if x=='spam' else 0)

In [34]:
df['spam']

0       0
1       0
2       1
3       0
4       0
       ..
5567    1
5568    0
5569    0
5570    0
5571    0
Name: spam, Length: 5572, dtype: int64

In [35]:
X=df['Message']
y=df['spam']

In [36]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.25)

In [37]:
from sklearn.feature_extraction.text import CountVectorizer
v=CountVectorizer()
X_train_count=v.fit_transform(X_train.values)
X_train_count.toarray()[:3]

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=int64)

In [38]:
from sklearn.naive_bayes import MultinomialNB
model=MultinomialNB()
model.fit(X_train_count,y_train)

In [44]:
email=['enter your otp for getting cash','click this url to for getting rewards','Good morning dear','select 12500 to make this ringtone','Hello Rohith ,you have won 10,00 rupees','win a dvd player and cash','5 big chances to win cash and get 50% off on your product ','   50 % off on your productr']
email_count=v.transform(email)
model.predict(email_count)

array([1, 0, 0, 1, 1, 1, 1, 1], dtype=int64)