Naive Bayes Classifier

In [1]:
"""
Naive Bayes is a classification algorithm based on Bayes' theorem. It is called "naive" because it assumes that the features are conditionally independent of each other given the class, which is often not true in real-world data. Despite this simplifying assumption, Naive Bayes is a popular choice for many classification problems because it is simple, fast to train, and often reasonably accurate.

Here we estimate the conditional probability of the target variable given the observed feature values.

Example: we use the Titanic survival dataset and a Naive Bayes classifier to estimate each passenger's probability of survival.

Common uses of the Naive Bayes classifier:
    1. Email Spam Detection
    2. Handwritten Character Recognition
    3. Weather Prediction
    4. Face Detection
    5. News Article Categorization
"""

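To make the idea concrete, here is a minimal sketch of Bayes' theorem combined with the independence assumption. Every probability in it is hypothetical and chosen only to make the arithmetic visible; none of it is estimated from the Titanic data. The names `priors`, `likelihoods`, and `posterior` are made up for this illustration.

```python
# Toy illustration of Bayes' theorem with the "naive" independence assumption.
# All probabilities below are hypothetical, not estimated from any dataset.

priors = {"survived": 0.4, "died": 0.6}  # P(class)

# P(feature value | class), one entry per feature (hypothetical values)
likelihoods = {
    "survived": {"sex=female": 0.7, "pclass=1": 0.4},
    "died": {"sex=female": 0.15, "pclass=1": 0.15},
}

def posterior(observed):
    """Return P(class | observed features) under the independence assumption."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for feature in observed:
            score *= likelihoods[cls][feature]  # multiply per-feature likelihoods
        scores[cls] = score
    total = sum(scores.values())  # P(observed), the normalizing constant
    return {cls: score / total for cls, score in scores.items()}

result = posterior(["sex=female", "pclass=1"])
print(result)
```

With these made-up numbers the unnormalized scores are 0.4 × 0.7 × 0.4 = 0.112 for "survived" and 0.6 × 0.15 × 0.15 = 0.0135 for "died", so the posterior for "survived" is roughly 0.89.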

Titanic Survival Prediction

In [2]:

import pandas as pd 
df = pd.read_csv("titanic.csv")
df.head()

Unnamed: 0,PassengerId,Name,Pclass,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Survived
0,1,"Braund, Mr. Owen Harris",3,male,22.0,1,0,A/5 21171,7.25,,S,0
1,2,"Cumings, Mrs. John Bradley (Florence Briggs Th...",1,female,38.0,1,0,PC 17599,71.2833,C85,C,1
2,3,"Heikkinen, Miss. Laina",3,female,26.0,0,0,STON/O2. 3101282,7.925,,S,1
3,4,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",1,female,35.0,1,0,113803,53.1,C123,S,1
4,5,"Allen, Mr. William Henry",3,male,35.0,0,0,373450,8.05,,S,0


In [3]:
# Drop the columns that are not used as features in this example
df.drop(['PassengerId', 'Name', 'SibSp', 'Parch', 'Ticket', 'Cabin', 'Embarked'], axis="columns", inplace=True)
df.head()

Unnamed: 0,Pclass,Sex,Age,Fare,Survived
0,3,male,22.0,7.25,0
1,1,female,38.0,71.2833,1
2,3,female,26.0,7.925,1
3,1,female,35.0,53.1,1
4,3,male,35.0,8.05,0


In [4]:
inputs = df.drop('Survived', axis="columns")
target = df.Survived

In [5]:
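# One-hot encode Sex; dtype=int keeps 0/1 columns (newer pandas versions default to booleans)
dummies = pd.get_dummies(inputs.Sex, dtype=int)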
dummies.head(3)

Unnamed: 0,female,male
0,0,1
1,1,0
2,1,0


In [6]:
inputs.head()

Unnamed: 0,Pclass,Sex,Age,Fare
0,3,male,22.0,7.25
1,1,female,38.0,71.2833
2,3,female,26.0,7.925
3,1,female,35.0,53.1
4,3,male,35.0,8.05


In [7]:
inputs = pd.concat([inputs, dummies], axis="columns")
inputs.head(3)

Unnamed: 0,Pclass,Sex,Age,Fare,female,male
0,3,male,22.0,7.25,0,1
1,1,female,38.0,71.2833,1,0
2,3,female,26.0,7.925,1,0


In [8]:
inputs.drop('Sex', axis="columns", inplace=True)
inputs.head()

Unnamed: 0,Pclass,Age,Fare,female,male
0,3,22.0,7.25,0,1
1,1,38.0,71.2833,1,0
2,3,26.0,7.925,1,0
3,1,35.0,53.1,1,0
4,3,35.0,8.05,0,1


In [9]:
# List the columns that contain NaN (missing) values
inputs.columns[inputs.isna().any()]

Index(['Age'], dtype='object')

In [10]:
inputs.Age[:10]

0    22.0
1    38.0
2    26.0
3    35.0
4    35.0
5     NaN
6    54.0
7     2.0
8    27.0
9    14.0
Name: Age, dtype: float64

In [11]:
# Fill missing ages with the mean age
inputs.Age = inputs.Age.fillna(inputs.Age.mean())
inputs.head(6)

Unnamed: 0,Pclass,Age,Fare,female,male
0,3,22.0,7.25,0,1
1,1,38.0,71.2833,1,0
2,3,26.0,7.925,1,0
3,1,35.0,53.1,1,0
4,3,35.0,8.05,0,1
5,3,29.699118,8.4583,0,1


In [12]:
inputs.columns[inputs.isna().any()]

Index([], dtype='object')
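In [11] fills the missing ages with the mean over all rows, before the data is split, so rows that later land in the test set contribute to the fill value. A possible alternative, sketched below under stated assumptions, is to compute the mean from the training rows only and reuse it for the test rows. The tiny `toy` frame and the manual split are hypothetical stand-ins, not the notebook's data.

```python
import pandas as pd

# Hypothetical miniature of the `inputs` frame; only Age has missing values.
toy = pd.DataFrame({"Age": [22.0, 38.0, None, 35.0, None, 54.0]})

# Hypothetical manual split: first four rows for training, last two for testing.
train = toy.iloc[:4].copy()
test = toy.iloc[4:].copy()

# Learn the fill value from the training rows only (NaN is skipped by mean())...
train_mean = train["Age"].mean()

# ...and apply that same value to both parts.
train["Age"] = train["Age"].fillna(train_mean)
test["Age"] = test["Age"].fillna(train_mean)

print(train_mean)
print(test)
```

On a dataset of this size the difference in the fill value is likely small, so this is a matter of evaluation hygiene rather than a correction of the result above.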

In [13]:
# Split the data into training (80%) and test (20%) sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(inputs, target, test_size=0.2)

In [14]:
len(X_train)

712

In [15]:
len(X_test)

179

In [16]:
len(inputs)

891
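The sizes above are consistent with rounding the 20% test share up to a whole row and giving the rest to the training set. The sketch below only reproduces that arithmetic; the rounding rule is an assumption that matches these numbers, not a description of scikit-learn's internals.

```python
import math

n_samples = 891   # len(inputs), from the output above
test_size = 0.2   # the value passed to train_test_split

# Assumption: the test share is rounded up to a whole row and
# the training set receives the remaining rows.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)
```

This prints 712 and 179, matching `len(X_train)` and `len(X_test)`.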

Types of Naive Bayes model

In [17]:
""" 
Three commonly used types of Naive Bayes model in the scikit-learn library:

1. Gaussian Naive Bayes (bell curve):
    It assumes that, within each class, every feature follows a normal distribution, so it is used when the features are continuous.

    For example, the Iris dataset has the features sepal width, petal width, sepal length, and petal length. Widths and lengths are measurements that can take any value in a range, so they cannot be represented as occurrence counts. The data is continuous, hence Gaussian Naive Bayes is used there.

2. Multinomial Naive Bayes:
    It is used for discrete count data (e.g. movie ratings ranging from 1 to 5, where each rating occurs with a certain frequency). In text classification, the features are the counts of each word in a document, which are used to predict the class or label.

3. Bernoulli Naive Bayes:
    It assumes that all features are binary, i.e. they take only two values. For example, 0 can represent "word does not occur in the document" and 1 "word occurs in the document".

link: https://qr.ae/pspBNa

"""

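As a rough illustration of the Gaussian variant, the sketch below fits a per-class mean and variance for each feature and classifies a point by the largest log prior plus summed log likelihoods. It is a simplified teaching version under stated assumptions, not scikit-learn's implementation: the function names, the toy `[age, fare]` data, and the small variance floor of `1e-9` are all made up for this example.

```python
import math

def fit_gaussian_nb(X, y):
    """Estimate the class prior and per-feature mean/variance for each class."""
    params = {}
    for cls in set(y):
        rows = [x for x, label in zip(X, y) if label == cls]
        n_features = len(rows[0])
        means = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
        variances = [
            # Small floor (an assumption of this sketch) avoids division by zero.
            sum((r[j] - means[j]) ** 2 for r in rows) / len(rows) + 1e-9
            for j in range(n_features)
        ]
        params[cls] = (len(rows) / len(X), means, variances)
    return params

def log_gaussian(x, mean, var):
    """Log of the normal density with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def predict(params, x):
    """Pick the class with the largest log prior + summed log likelihoods."""
    best_cls, best_score = None, -math.inf
    for cls, (prior, means, variances) in params.items():
        score = math.log(prior) + sum(
            log_gaussian(v, m, s) for v, m, s in zip(x, means, variances)
        )
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls

# Hypothetical toy data: [age, fare], label 1 = survived, 0 = did not survive.
X = [[22.0, 7.0], [35.0, 8.0], [28.0, 9.0], [38.0, 71.0], [26.0, 60.0], [30.0, 80.0]]
y = [0, 0, 0, 1, 1, 1]

params = fit_gaussian_nb(X, y)
print(predict(params, [30.0, 8.5]))   # cheap fare, close to the class-0 fares
print(predict(params, [30.0, 75.0]))  # expensive fare, close to the class-1 fares
```

Summing log probabilities instead of multiplying raw probabilities is a common choice because products of many small densities can underflow.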

In [18]:
from sklearn.naive_bayes import GaussianNB
model = GaussianNB()
model.fit(X_train, y_train)

In [19]:
model.score(X_test, y_test)

0.770949720670391

In [20]:
model.predict(X_test[:10])

array([0, 0, 1, 0, 0, 1, 1, 0, 0, 1], dtype=int64)

In [21]:
X_test[:10]

Unnamed: 0,Pclass,Age,Fare,female,male
227,3,20.5,7.25,0,1
292,2,36.0,12.875,0,1
311,1,18.0,262.375,1,0
69,3,26.0,8.6625,0,1
628,3,26.0,7.8958,0,1
617,3,26.0,16.1,1,0
374,3,3.0,21.075,1,0
163,3,17.0,8.6625,0,1
131,3,20.0,7.05,0,1
779,1,43.0,211.3375,1,0


In [22]:
y_test[:10]

227    0
292    0
311    1
69     0
628    0
617    0
374    0
163    0
131    0
779    1
Name: Survived, dtype: int64
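For a classifier, `model.score` reports accuracy, the share of rows where the prediction equals the true label. The sketch below recomputes that by hand for just the ten rows displayed above, using the values copied from the outputs of In [20] and In [22]; the variable names are made up for this illustration.

```python
# Values copied from the outputs of In [20] (predictions) and In [22] (true labels).
predicted = [0, 0, 1, 0, 0, 1, 1, 0, 0, 1]
actual = [0, 0, 1, 0, 0, 0, 0, 0, 0, 1]

# Accuracy is the share of positions where prediction and label agree.
correct = sum(p == a for p, a in zip(predicted, actual))
accuracy = correct / len(actual)

# Positions (within these ten rows) where the model was wrong.
mistakes = [i for i, (p, a) in enumerate(zip(predicted, actual)) if p != a]

print(accuracy)
print(mistakes)
```

On these ten rows the accuracy is 0.8. The two mistakes are at positions 5 and 6, which are rows 617 and 374 in `X_test[:10]`: both are third-class female passengers whom the model predicted to survive.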

In [23]:
# Probability of each class for each passenger: [P(did not survive), P(survived)]
model.predict_proba(X_test[:10])

array([[9.86933208e-01, 1.30667919e-02],
       [9.79281399e-01, 2.07186011e-02],
       [6.07558856e-13, 1.00000000e+00],
       [9.88631146e-01, 1.13688535e-02],
       [9.88598272e-01, 1.14017282e-02],
       [3.56057778e-02, 9.64394222e-01],
       [1.49796618e-02, 9.85020338e-01],
       [9.85521095e-01, 1.44789054e-02],
       [9.86732102e-01, 1.32678983e-02],
       [3.46346451e-09, 9.99999997e-01]])
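Each row of this output holds the two class probabilities for one passenger, and the predicted label is the class with the larger probability. The sketch below checks that on the ten rows copied from the output above, assuming the columns are ordered as class 0 then class 1; the variable names are made up for this illustration.

```python
# Rows copied from the predict_proba output above: [P(class 0), P(class 1)].
probabilities = [
    [9.86933208e-01, 1.30667919e-02],
    [9.79281399e-01, 2.07186011e-02],
    [6.07558856e-13, 1.00000000e+00],
    [9.88631146e-01, 1.13688535e-02],
    [9.88598272e-01, 1.14017282e-02],
    [3.56057778e-02, 9.64394222e-01],
    [1.49796618e-02, 9.85020338e-01],
    [9.85521095e-01, 1.44789054e-02],
    [9.86732102e-01, 1.32678983e-02],
    [3.46346451e-09, 9.99999997e-01],
]

# The predicted class is the index of the larger probability in each row.
predicted = [row.index(max(row)) for row in probabilities]

# Each row should sum to 1 (up to the rounding of the printed values).
row_sums = [sum(row) for row in probabilities]

print(predicted)
print(all(abs(s - 1.0) < 1e-6 for s in row_sums))
```

The resulting list, `[0, 0, 1, 0, 0, 1, 1, 0, 0, 1]`, matches the output of `model.predict(X_test[:10])` in In [20]. Most of these probabilities are very close to 0 or 1, so they are better read as a ranking of confidence than as calibrated survival chances.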