### Naive Bayes (NB) is called 'naive' because it assumes that the features of a measurement are independent of each other.
### Naive Bayes uses Bayes' Theorem to predict the probability of each class from the attributes of an observation. It is mostly used in text classification and in problems with multiple classes.
### Naive Bayes classifiers are a collection of classification algorithms based on Bayes' Theorem. Naive Bayes is not a single algorithm but a family of algorithms that share a common principle: every pair of features being classified is independent of each other, given the class.
### The fundamental Naive Bayes assumption is that each feature makes an independent and equal contribution to the outcome.


### Bayes’ Theorem finds the probability of an event occurring given the probability of another event that has already occurred. Bayes’ theorem is stated mathematically as the following equation:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

### The method that we discussed above is applicable for discrete data.
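
### As a small worked instance of Bayes' Theorem on discrete events, the sketch below computes the probability that a message is spam given that it contains a particular word. All three input probabilities are hypothetical numbers chosen only for illustration; they are not estimated from the dataset used later.

```python
# Hypothetical numbers, chosen only to illustrate the formula.
p_spam = 0.2               # P(A): prior probability that a message is spam
p_word_given_spam = 0.6    # P(B|A): the word appears in a spam message
p_word_given_ham = 0.05    # P(B|not A): the word appears in a ham message

# P(B) by the law of total probability
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Bayes' Theorem: P(A|B) = P(B|A) P(A) / P(B)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 3))  # 0.75
```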

### In the case of continuous data, we need to make some assumptions regarding the distribution of values of each feature. The different Naive Bayes classifiers differ mainly by the assumptions they make regarding the distribution of $P(x_i \mid y)$.
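
### The term $P(x_i \mid y)$ enters through the standard Naive Bayes decision rule. The notation here is assumed for illustration: $y$ is the class and $x_1, \dots, x_n$ are the features. Conditional independence lets the likelihood factorize, and since $P(x_1, \dots, x_n)$ does not depend on $y$, it can be dropped when picking the most probable class:

```latex
P(y \mid x_1, \dots, x_n)
  = \frac{P(y)\, \prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)}
\qquad\Longrightarrow\qquad
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```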

### Popular Naive Bayes classifiers are:
### Gaussian Naive Bayes: Continuous values associated with each feature are assumed to be distributed according to a Gaussian distribution, also called a Normal distribution.

### Multinomial Naive Bayes: Feature vectors represent the frequencies with which certain events have been generated by a multinomial distribution. This is the event model typically used for document classification.
### Bernoulli Naive Bayes: In the multivariate Bernoulli event model, features are independent booleans (binary variables) describing inputs. Like the multinomial model, this model is popular for document classification tasks, where binary term occurrence features (i.e. whether a word occurs in a document or not) are used rather than term frequencies (i.e. how often a word occurs in the document).
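
### The difference between term frequencies (the multinomial model) and binary term occurrence (the Bernoulli model) can be seen in the feature matrices themselves. The sketch below uses scikit-learn's `CountVectorizer` on three made-up documents; the corpus is hypothetical and is not part of the spam dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Three made-up documents, used only to show the two representations.
docs = ["free free prize", "call me later", "free call"]

# Term frequencies: how often each word occurs (multinomial event model).
counts = CountVectorizer().fit_transform(docs).toarray()

# Binary occurrence: whether each word occurs at all (Bernoulli event model).
binary = CountVectorizer(binary=True).fit_transform(docs).toarray()

# Columns follow the sorted vocabulary: call, free, later, me, prize
print(counts[0])  # [0 2 0 0 1]
print(binary[0])  # [0 1 0 0 1]
```

### In the first document the word "free" occurs twice, so the count is 2 in the frequency matrix but only 1 in the binary matrix.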

### Naive Bayes implementation

In [1]:
import numpy as np
import pandas as pd
from sklearn.naive_bayes import GaussianNB, BernoulliNB, MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectPercentile

In [2]:
data=pd.read_csv('c:/Users/naresh/Documents/GitHub/NB/spam.csv',encoding='latin-1')
data.head()

Unnamed: 0,v1,v2,Unnamed: 2,Unnamed: 3,Unnamed: 4
0,ham,"Go until jurong point, crazy.. Available only ...",,,
1,ham,Ok lar... Joking wif u oni...,,,
2,spam,Free entry in 2 a wkly comp to win FA Cup fina...,,,
3,ham,U dun say so early hor... U c already then say...,,,
4,ham,"Nah I don't think he goes to usf, he lives aro...",,,


In [4]:
data.drop(['Unnamed: 2','Unnamed: 3','Unnamed: 4'], axis=1,inplace=True)

In [5]:
data1 = data['v1']
data.shape

(5572, 2)

In [21]:
X_train, X_test,y_train, y_test = train_test_split(data.v2,data.v1,test_size=0.3)
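
### The split above draws a different random partition on every run, so the accuracies reported further down will vary slightly between runs. A minimal sketch of a reproducible, class-balanced split is shown below. The toy arrays and the names `texts`, `labels`, `X_tr`, `X_te`, `y_tr`, and `y_te` are hypothetical stand-ins, and the `random_state` and `stratify` arguments are an optional addition that the original split does not use.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-ins for the text and label columns (hypothetical values).
texts = np.array([f"message {i}" for i in range(20)])
labels = np.array(["ham"] * 16 + ["spam"] * 4)

def split(seed):
    # random_state fixes the shuffle; stratify keeps the ham/spam ratio
    # the same in the training and test parts.
    return train_test_split(
        texts, labels, test_size=0.25, random_state=seed, stratify=labels
    )

X_tr, X_te, y_tr, y_te = split(42)
X_tr2, X_te2, y_tr2, y_te2 = split(42)

print((X_te == X_te2).all())   # same seed, same test set
print((y_te == "spam").sum())  # spam messages in the 5-item test set
```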

In [7]:
y_test.shape

(1672,)

In [8]:
X_train.head()

3873       No. Did you multimedia message them or e-mail?
4422    alright. Thanks for the advice. Enjoy your nig...
1795    I hope your alright babe? I worry that you mig...
2903    HI DARLIN I HOPE YOU HAD A NICE NIGHT I WISH I...
2738    I sent you the prices and do you mean the  &lt...
Name: v2, dtype: object

In [9]:
vectorizer = TfidfVectorizer()

In [10]:
X_train_transformed = vectorizer.fit_transform(X_train)
X_test_transformed = vectorizer.transform(X_test)

In [11]:
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out() instead
features_names = vectorizer.get_feature_names_out()

In [12]:
len(features_names)

7199

In [13]:
selector = SelectPercentile(percentile=5)
selector.fit(X_train_transformed, y_train)
X_train_transformed = selector.transform(X_train_transformed).toarray()
X_test_transformed = selector.transform(X_test_transformed).toarray()

In [14]:
X_train_transformed

array([[0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.16606813, 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.11566027, 0.        ,
        0.        ],
       ...,
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ]])

### Applying Naive Bayes

In [15]:
m1 = GaussianNB()

m1.fit(X_train_transformed, y_train)
y_predict = m1.predict(X_test_transformed)  # predicted labels; y_test holds the actual labels

# Rows of the confusion matrix are actual classes, columns are predicted classes
confusion_matrix(y_test, y_predict)

array([[1412,   23],
       [  25,  212]], dtype=int64)
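
### Gaussian Naive Bayes, as used above, models each feature within each class with a normal density. The sketch below reimplements that idea with NumPy on a tiny made-up dataset. It is an illustration under stated assumptions, not scikit-learn's implementation: the data, the function name `posterior`, and the small constant added to the variances are all hypothetical choices.

```python
import numpy as np

# Toy continuous data (hypothetical): two features, two classes.
X = np.array([[1.0, 2.1], [1.2, 1.9], [0.8, 2.0],
              [3.0, 0.5], [3.2, 0.7], [2.9, 0.4]])
y = np.array([0, 0, 0, 1, 1, 1])

classes = np.unique(y)
priors = np.array([np.mean(y == c) for c in classes])       # P(y)
means = np.array([X[y == c].mean(axis=0) for c in classes])  # per class, per feature
# Small constant avoids division by zero (an assumption of this sketch).
variances = np.array([X[y == c].var(axis=0) for c in classes]) + 1e-9

def posterior(x):
    """Return P(class | x) for each class under the Gaussian assumption."""
    # Sum of per-feature log densities: log of the product of P(x_i | y)
    log_likelihood = -0.5 * np.sum(
        np.log(2.0 * np.pi * variances) + (x - means) ** 2 / variances, axis=1
    )
    log_joint = np.log(priors) + log_likelihood
    log_joint -= log_joint.max()  # stabilise before exponentiating
    probs = np.exp(log_joint)
    return probs / probs.sum()

probs = posterior(np.array([1.1, 1.8]))
print(classes[np.argmax(probs)])  # the point lies near the class-0 samples
```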

In [17]:
np.mean(y_test == y_predict)

0.9712918660287081

In [19]:
model_bernb = BernoulliNB()

model_bernb.fit(X_train_transformed,y_train)
y_predict = model_bernb.predict(X_test_transformed)


accuracy_score(y_test,y_predict)

0.9814593301435407

In [20]:
model_mulnb = MultinomialNB()

model_mulnb.fit(X_train_transformed,y_train)
y_predict = model_mulnb.predict(X_test_transformed)


accuracy_score(y_test,y_predict)

0.9258373205741627
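
### The vectorize, select, and classify steps used in this notebook can also be chained into a single scikit-learn `Pipeline`, which guarantees that the test data only ever passes through transformers fitted on the training data. The sketch below is one possible arrangement and not the notebook's own code. The tiny corpus is hypothetical, and `chi2` is swapped in as the scoring function (the notebook uses the `SelectPercentile` default) because it is a common choice for non-negative text features. `MultinomialNB` accepts sparse input, so no `.toarray()` call is needed here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectPercentile, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny hypothetical corpus standing in for the text and label columns.
texts = [
    "win a free prize now", "free entry to win cash",
    "claim your free prize", "urgent you have won cash",
    "are we meeting for lunch", "ok see you at home",
    "call me when you are free", "thanks for the lunch today",
]
labels = ["spam"] * 4 + ["ham"] * 4

pipeline = make_pipeline(
    TfidfVectorizer(),                      # text -> TF-IDF features
    SelectPercentile(chi2, percentile=50),  # keep the top-scoring half
    MultinomialNB(),                        # classify on the kept features
)
pipeline.fit(texts, labels)

predictions = pipeline.predict(["free cash prize", "see you at lunch"])
print(predictions)
```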