## Naive Bayes Classifier: Algorithm

**P(A|B)**<br>
Conditional probability, a value between 0 and 1.<br>
The probability that event A occurs given that event B has already occurred.

**Bayes' Theorem:**<br>
P(A|B) = [P(B|A) * P(A)] / P(B)
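
As a quick check of the formula, here is a minimal worked example. The probabilities are made-up numbers chosen only for illustration, and the variable names are not from the dataset used below. The denominator P(B) is obtained from the law of total probability.

```python
# Hypothetical probabilities, chosen only to illustrate the formula.
p_a = 0.4              # prior probability of A
p_b_given_a = 0.7      # probability of B given A
p_b_given_not_a = 0.2  # probability of B given not A

# Law of total probability: P(B) = P(B given A) P(A) + P(B given not A) P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: probability of A given B
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))  # 0.7
```

Observing B raises the probability of A from the prior 0.4 to 0.7, because B is much more likely when A holds than when it does not.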

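**Naive Bayes:**<br>
Makes the simplifying ("naive") assumption that the features (sex, passenger class, age, etc.) are independent of one another given the class, so the likelihood of a passenger's features is just the product of the per-feature likelihoods.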
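
To make the independence assumption concrete, the following is a rough from-scratch sketch of a Gaussian naive Bayes classifier in plain Python. The helper names (`gaussian_pdf`, `fit`, `posterior`) and the toy data are hypothetical, and this is a simplified illustration rather than how scikit-learn implements it (scikit-learn works in log space and smooths variances differently).

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit(rows, labels):
    """Estimate a prior and per-feature (mean, variance) for every class."""
    params = {}
    for cls in set(labels):
        cls_rows = [r for r, y in zip(rows, labels) if y == cls]
        stats = []
        for col in zip(*cls_rows):  # iterate over feature columns
            mean = sum(col) / len(col)
            # Small floor keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mean, var))
        params[cls] = (len(cls_rows) / len(rows), stats)
    return params

def posterior(params, x):
    """Return the probability of each class given x, assuming independent features."""
    scores = {}
    for cls, (prior, stats) in params.items():
        likelihood = 1.0
        for value, (mean, var) in zip(x, stats):
            likelihood *= gaussian_pdf(value, mean, var)  # product over features
        scores[cls] = prior * likelihood  # numerator of Bayes' theorem
    total = sum(scores.values())          # denominator: sum over classes
    return {cls: s / total for cls, s in scores.items()}

# Toy data (made up): columns are [age, fare]; label 1 = survived.
rows = [[22, 7.0], [35, 15.0], [40, 30.0], [26, 25.0], [38, 60.0], [30, 80.0]]
labels = [0, 0, 0, 1, 1, 1]

params = fit(rows, labels)
probs = posterior(params, [30, 60.0])
print(probs)
```

The inner loop is where the "naive" part happens: each feature contributes its own likelihood factor, with no term for interactions between features.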

## Import dataset. Drop irrelevant columns. Split into inputs and target dataframes. Convert 'Sex' column to dummy variables. Check for 'NA' values and fill them with the column mean.

In [1]:
import pandas as pd
df = pd.read_csv("titanic.csv")
df.drop(["PassengerId", "Name", "SibSp", "Parch", "Ticket", "Cabin", "Embarked"], axis="columns", inplace=True)
df.head()

| | Pclass | Sex | Age | Fare | Survived |
|---|---|---|---|---|---|
| 0 | 3 | male | 22.0 | 7.25 | 0 |
| 1 | 1 | female | 38.0 | 71.2833 | 1 |
| 2 | 3 | female | 26.0 | 7.925 | 1 |
| 3 | 1 | female | 35.0 | 53.1 | 1 |
| 4 | 3 | male | 35.0 | 8.05 | 0 |


In [7]:
target = df[["Survived"]]
inputs = df.drop("Survived", axis="columns")

In [9]:
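# dtype=int keeps the dummy columns as 0/1 (recent pandas versions default to True/False)
inputs = pd.concat([inputs, pd.get_dummies(inputs.Sex, dtype=int)], axis="columns")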

In [11]:
inputs.drop("Sex", axis="columns", inplace=True)
inputs.head()

| | Pclass | Age | Fare | female | male |
|---|---|---|---|---|---|
| 0 | 3 | 22.0 | 7.25 | 0 | 1 |
| 1 | 1 | 38.0 | 71.2833 | 1 | 0 |
| 2 | 3 | 26.0 | 7.925 | 1 | 0 |
| 3 | 1 | 35.0 | 53.1 | 1 | 0 |
| 4 | 3 | 35.0 | 8.05 | 0 | 1 |


In [12]:
inputs.columns[inputs.isna().any()]

Index(['Age'], dtype='object')

In [13]:
inputs.Age = inputs.Age.fillna(inputs.Age.mean())
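
The mean-imputation step can be checked on a tiny example. The sketch below uses a made-up `ages` Series (not the Titanic data) to show that `mean()` skips missing values by default, so only the observed ages determine the fill value.

```python
import numpy as np
import pandas as pd

ages = pd.Series([22.0, np.nan, 26.0, 36.0])  # made-up values with one gap

# mean() ignores NaN by default: (22 + 26 + 36) / 3 = 28
filled = ages.fillna(ages.mean())
print(filled.tolist())  # [22.0, 28.0, 26.0, 36.0]
```

Filling with the mean keeps every row, at the cost of slightly understating the spread of the column.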

## Model using GaussianNB. Gaussian naive Bayes assumes each continuous feature is normally distributed within each class.

In [14]:
from sklearn.model_selection import train_test_split
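# Hold out 20% of the rows for testing. No random_state is set, so the split (and the score below) varies between runs.
x_train, x_test, y_train, y_test = train_test_split(inputs, target, test_size=0.2)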

In [16]:
from sklearn.naive_bayes import GaussianNB
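import warnings
# Suppress scikit-learn warnings, such as the DataConversionWarning raised
# because the target is a one-column DataFrame rather than a 1-D array.
warnings.filterwarnings("ignore")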

model = GaussianNB()

model.fit(x_train, y_train)

GaussianNB()

In [17]:
model.score(x_test, y_test)

0.8379888268156425

In [19]:
# Feature order matches the training columns: Pclass, Age, Fare, female, male
model.predict([[1,20,7.0,1,0]])

array([1], dtype=int64)

In [20]:
# Columns follow model.classes_: [P(Survived = 0), P(Survived = 1)]
model.predict_proba([[1,20,7.0,1,0]])

array([[0.01026951, 0.98973049]])
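
In the output above, the two numbers are the probabilities of class 0 (did not survive) and class 1 (survived), in that order. The small sketch below, on made-up one-feature data with a hypothetical `clf` model, illustrates how `predict_proba` relates to `classes_` and `predict`.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy data (made up): one feature, two well-separated classes.
X = np.array([[1.0], [1.2], [0.8], [5.0], [5.3], [4.9]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = GaussianNB().fit(X, y)
proba = clf.predict_proba([[4.0]])

print(clf.classes_)       # column order used by predict_proba
print(proba.sum(axis=1))  # each row of probabilities sums to 1

# predict() returns the class whose column holds the highest probability
predicted = clf.predict([[4.0]])[0]
print(predicted == clf.classes_[np.argmax(proba[0])])
```

Naive Bayes probabilities are often pushed towards 0 or 1 by the independence assumption, so they are better read as a ranking than as calibrated confidence.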