<center>
    <h1><b>Naive Bayes - Titanic Survival Prediction</b></h1>
    ---------------------------
</center>

This method is named after Thomas Bayes, who formulated the rule for computing a conditional probability. It can be used for tasks such as:
 - Email spam detection
 - Character recognition
 - Weather prediction
 - Face detection
 - Categorization of news articles
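
For reference, the conditional-probability rule mentioned above, together with the "naive" assumption that features are independent given the class, can be written as follows. The notation is chosen here for illustration: $y$ is the class label and $x_1, \dots, x_n$ are the features.

$$
P(y \mid x_1, \dots, x_n) = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}
$$

$$
P(x_1, \dots, x_n \mid y) \approx \prod_{i=1}^{n} P(x_i \mid y)
\quad\Longrightarrow\quad
\hat{y} = \arg\max_{y}\; P(y) \prod_{i=1}^{n} P(x_i \mid y)
$$

The denominator does not depend on $y$, so it can be ignored when only the most probable class is needed.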

There are three common types of Naive Bayes classifier:
 - **Bernoulli Naive Bayes:** It assumes that all features are binary, so each takes only two values. For example, **0** can represent 'word does not occur in the document' and **1** 'word occurs in the document'.
 - **Multinomial Naive Bayes:** It is used when we have **discrete data** (e.g. movie ratings ranging from 1 to 5, where each rating occurs with a certain **frequency**). In text learning, we use the count of each word to predict the class or label.
 -  **Gaussian Naive Bayes:** Because it assumes a **normal distribution**, Gaussian Naive Bayes is used when all our features are **continuous**. For example, in the **Iris dataset** the features are sepal width, petal width, sepal length and petal length. We can't represent these features in terms of their occurrences, because the data is continuous. Hence we use Gaussian Naive Bayes here.
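
As a rough illustration of the three variants listed above, the sketch below fits each scikit-learn class on the kind of data it expects. The data is synthetic and generated only for this example (it is not the Titanic data), and the names `X_binary`, `X_counts`, `X_cont` and `scores` are made up for the sketch.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB

rng = np.random.default_rng(0)
y = np.array([0] * 50 + [1] * 50)
is_pos = y[:, None] == 1  # column vector, broadcasts across features

# Binary features (word present / absent): class 1 has each "word" more often.
X_binary = (rng.random((100, 3)) < np.where(is_pos, 0.8, 0.2)).astype(int)

# Count features (word frequencies): a different "word" dominates in each class.
lam = np.where(is_pos, [1.0, 1.0, 5.0], [5.0, 1.0, 1.0])
X_counts = rng.poisson(lam=lam)

# Continuous features (measurements): the class means differ.
X_cont = rng.normal(loc=np.where(is_pos, 3.0, 0.0), scale=1.0, size=(100, 3))

models = {
    "BernoulliNB": (BernoulliNB(), X_binary),
    "MultinomialNB": (MultinomialNB(), X_counts),
    "GaussianNB": (GaussianNB(), X_cont),
}

# Training accuracy only, to show that each variant fits its data type.
scores = {}
for name, (clf, X) in models.items():
    scores[name] = clf.fit(X, y).score(X, y)
    print(name, round(scores[name], 3))
```

Note that `MultinomialNB` models the relative frequency of each feature within a class, which is why the count data above uses a different dominant feature per class rather than simply larger counts.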

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

%matplotlib inline

In [2]:
df = pd.read_csv('titanic.csv', usecols = ['Pclass', 'Sex', 'Age', 'Fare', 'Survived'])

In [3]:
df.head()

Unnamed: 0,Survived,Pclass,Sex,Age,Fare
0,0,3,male,34.5,7.8292
1,1,3,female,47.0,7.0
2,0,2,male,62.0,9.6875
3,0,3,male,27.0,8.6625
4,1,3,female,22.0,12.2875


In [4]:
target = df.Survived
inputs = df.drop(columns='Survived')

In [5]:
# Converting the Sex column to dummy (indicator) variables
dummies = pd.get_dummies(inputs['Sex']).astype(int)
dummies.head(3)

Unnamed: 0,female,male
0,0,1
1,1,0
2,0,1


In [6]:
inputs = pd.concat([inputs, dummies], axis = 'columns')
inputs.head(3)

Unnamed: 0,Pclass,Sex,Age,Fare,female,male
0,3,male,34.5,7.8292,0,1
1,3,female,47.0,7.0,1,0
2,2,male,62.0,9.6875,0,1


In [7]:
inputs = inputs.drop(columns ='Sex')
inputs.head()

Unnamed: 0,Pclass,Age,Fare,female,male
0,3,34.5,7.8292,0,1
1,3,47.0,7.0,1,0
2,2,62.0,9.6875,0,1
3,3,27.0,8.6625,0,1
4,3,22.0,12.2875,1,0
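
The three steps above (create dummies, concatenate, drop `Sex`) can also be done in a single `pd.get_dummies` call. The sketch below uses a small stand-in frame named `demo` (a hypothetical name) holding the first three rows shown earlier. It also passes `drop_first=True`, which is a design choice not made in this notebook: since `male` is always `1 - female`, one indicator carries all the information.

```python
import pandas as pd

# Stand-in for `inputs` before encoding (first three rows shown above).
demo = pd.DataFrame({
    "Pclass": [3, 3, 2],
    "Sex": ["male", "female", "male"],
    "Age": [34.5, 47.0, 62.0],
    "Fare": [7.8292, 7.0, 9.6875],
})

# Encode `Sex` and drop the original column in one call.
# drop_first=True keeps a single indicator column (Sex_male).
encoded = pd.get_dummies(demo, columns=["Sex"], drop_first=True, dtype=int)
print(encoded)
```

Keeping both `female` and `male`, as this notebook does, still works with `GaussianNB`, but the same information is then counted twice in the likelihood.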


In [8]:
# Checking for NaN values
inputs.columns[inputs.isna().any()]

Index(['Age', 'Fare'], dtype='object')

In [9]:
inputs.isnull().sum()

Pclass     0
Age       86
Fare       1
female     0
male       0
dtype: int64

In [10]:
age_median = inputs['Age'].median()
age_median

27.0

In [11]:
fare_median = inputs['Fare'].median()
fare_median

14.4542

In [12]:
# Replacing NaN with the median in the Age column
inputs['Age'] = inputs['Age'].fillna(age_median)

In [13]:
# Replacing NaN with the median in the Fare column
inputs['Fare'] = inputs['Fare'].fillna(fare_median)

In [14]:
inputs.isnull().sum()

Pclass    0
Age       0
Fare      0
female    0
male      0
dtype: int64
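
The medians above are computed on the full dataset before it is split, so the test rows influence the fill values. One common alternative is to learn the medians from the training split only, for example with `SimpleImputer` inside a pipeline. The sketch below is a minimal version under that assumption. It uses synthetic stand-in data, and names such as `X_demo`, `y_demo` and `pipe` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

# Synthetic stand-in data with some missing ages (not the Titanic file).
rng = np.random.default_rng(0)
n = 200
X_demo = pd.DataFrame({
    "Age": rng.normal(30, 10, n),
    "Fare": rng.exponential(20, n),
})
y_demo = (X_demo["Fare"] > 20).astype(int)
X_demo.loc[rng.choice(n, 30, replace=False), "Age"] = np.nan

Xtr, Xte, ytr, yte = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

# The imputer learns its medians from the training rows only,
# then applies them to whatever data is passed to predict/score.
pipe = make_pipeline(SimpleImputer(strategy="median"), GaussianNB())
pipe.fit(Xtr, ytr)

imputer = pipe.named_steps["simpleimputer"]
print("medians learned from training rows:", imputer.statistics_)
print("test accuracy:", pipe.score(Xte, yte))
```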

#### Model Building

In [15]:
# Splitting the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(inputs, target, test_size = 0.2, random_state = 0)

In [16]:
X_test.shape

(84, 5)

In [17]:
# Creating model object
model = GaussianNB()

In [18]:
# training the model
model.fit(X_train, y_train)

In [19]:
model.score(X_test, y_test)

1.0
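
An accuracy of exactly 1.0 is unusual and worth a sanity check. In every row displayed in this notebook, `Survived` equals the `female` indicator. This suggests, although the displayed rows alone do not prove it, that the label in this file may be fully determined by `Sex`. The sketch below runs the check on the five rows shown by `df.head()`. On the full data, the same comparison could be made between `target` and `inputs['female']`.

```python
import pandas as pd

# The five rows shown by df.head(), with Sex written as the `female` indicator.
sample = pd.DataFrame({
    "Survived": [0, 1, 0, 0, 1],
    "female":   [0, 1, 0, 0, 1],
})

# Fraction of rows where the label equals the `female` indicator.
agreement = (sample["Survived"] == sample["female"]).mean()
print("agreement:", agreement)

# A cross-tabulation makes the relationship easy to read:
# off-diagonal cells count rows where the two columns disagree.
print(pd.crosstab(sample["female"], sample["Survived"]))
```

If the agreement on the full dataset is also 1.0, the score reflects how the labels were constructed rather than how well the model generalizes.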

In [20]:
X_test[:10]

Unnamed: 0,Pclass,Age,Fare,female,male
360,3,14.5,69.55,0,1
170,3,27.0,7.55,0,1
224,1,53.0,27.4458,1,0
358,3,27.0,7.75,0,1
309,3,45.0,14.1083,1,0
308,1,55.0,93.5,0,1
150,1,23.0,83.1583,1,0
10,3,27.0,7.8958,0,1
21,3,9.0,3.1708,0,1
261,3,21.0,7.8542,0,1


In [21]:
y_test[:10]

360    0
170    0
224    1
358    0
309    1
308    0
150    1
10     0
21     0
261    0
Name: Survived, dtype: int64

In [22]:
model.predict(X_test[:10])

array([0, 0, 1, 0, 1, 0, 1, 0, 0, 0], dtype=int64)

In [23]:
model.predict_proba(X_test[:10])

array([[1., 0.],
       [1., 0.],
       [0., 1.],
       [1., 0.],
       [0., 1.],
       [1., 0.],
       [0., 1.],
       [1., 0.],
       [1., 0.],
       [1., 0.]])