In [14]:
import pandas as pd

**Loading data**

In [25]:
df = pd.read_csv("/content/sample_data/dataset.csv")
df.head()

Unnamed: 0,Age,Cabin,Embarked,Fare,Name,Parch,PassengerId,Pclass,Sex,SibSp,Survived,Ticket,Title,Family_Size
0,22.0,,S,7.25,"Braund, Mr. Owen Harris",0,1,3,male,1,0.0,A/5 21171,Mr,1
1,38.0,C85,C,71.2833,"Cumings, Mrs. John Bradley (Florence Briggs Th...",0,2,1,female,1,1.0,PC 17599,Mrs,1
2,26.0,,S,7.925,"Heikkinen, Miss. Laina",0,3,3,female,0,1.0,STON/O2. 3101282,Miss,0
3,35.0,C123,S,53.1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",0,4,1,female,1,1.0,113803,Mrs,1
4,35.0,,S,8.05,"Allen, Mr. William Henry",0,5,3,male,0,0.0,373450,Mr,0


**Drop the columns we will not use as features**

In [26]:
df.drop(['PassengerId','Name','SibSp','Parch','Ticket','Cabin','Embarked','Title','Family_Size'],axis='columns',inplace=True)
df.head()

Unnamed: 0,Age,Fare,Pclass,Sex,Survived
0,22.0,7.25,3,male,0.0
1,38.0,71.2833,1,female,1.0
2,26.0,7.925,3,female,1.0
3,35.0,53.1,1,female,1.0
4,35.0,8.05,3,male,0.0


In [27]:
inputs = df.drop('Survived',axis='columns')
target = df.Survived

**Most ML models do not accept text as input, so we convert the Sex column into 0 and 1**

In [28]:
dummies = pd.get_dummies(inputs.Sex)
dummies.head(3)

Unnamed: 0,female,male
0,0,1
1,1,0
2,1,0
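
As a minimal sketch of this encoding step, the snippet below builds a tiny hypothetical frame (the name `toy` and its three rows are made up for illustration) and encodes its `Sex` column. Passing `dtype=int` is an assumption about the desired output: recent pandas versions return boolean columns by default, and this keeps the 0/1 form shown above.

```python
import pandas as pd

# Hypothetical three-row frame standing in for `inputs`
toy = pd.DataFrame({"Sex": ["male", "female", "female"]})

# One indicator column per category; dtype=int forces 0/1 instead of booleans
toy_dummies = pd.get_dummies(toy["Sex"], dtype=int)
print(toy_dummies)
```

The columns come out in sorted category order, so `female` precedes `male`.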


**Adding the dummy columns to the original dataset**

In [29]:
inputs = pd.concat([inputs,dummies],axis='columns')
inputs.head(3)

Unnamed: 0,Age,Fare,Pclass,Sex,female,male
0,22.0,7.25,3,male,0,1
1,38.0,71.2833,1,female,1,0
2,26.0,7.925,3,female,1,0


**I am also dropping the male column because of the dummy variable trap: one column is enough to represent male vs. female.**

In [30]:
inputs.drop(['Sex','male'],axis='columns',inplace=True)
inputs.head(3)

Unnamed: 0,Age,Fare,Pclass,female
0,22.0,7.25,3,0
1,38.0,71.2833,1,1
2,26.0,7.925,3,1
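
The redundancy behind the dummy variable trap can be checked directly. The sketch below uses a hypothetical four-row frame (`toy` is not part of the notebook) to show that the two indicator columns always sum to 1, and also shows the `drop_first=True` option of `pd.get_dummies` as an alternative to dropping a column by hand.

```python
import pandas as pd

# Hypothetical frame, made up for illustration
toy = pd.DataFrame({"Sex": ["male", "female", "female", "male"]})
both = pd.get_dummies(toy["Sex"], dtype=int)

# The two indicators always sum to 1, so either one determines the other
redundant = bool((both["female"] + both["male"] == 1).all())

# drop_first=True removes the first category in sorted order ("female"),
# so this variant keeps `male` rather than `female`
single = pd.get_dummies(toy["Sex"], drop_first=True, dtype=int)
print(redundant, list(single.columns))
```

Either column carries the same information. With Naive Bayes, keeping both would count the same evidence twice, since the model treats features as independent.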


In [31]:
inputs.columns[inputs.isna().any()]


Index([], dtype='object')

**No column contains a Null/NaN value, so there is no need to fill in missing data.**
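
For comparison, here is a sketch of what the check and a fill would look like if a column did contain missing values. The frame `toy` and its values are hypothetical, and filling with the median is just one common choice, not something this notebook needed to do.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing age
toy = pd.DataFrame({"Age": [22.0, np.nan, 26.0], "Fare": [7.25, 71.2833, 7.925]})

# Count missing values per column
missing_counts = toy.isna().sum()

# Fill missing ages with the column median (one option among several)
filled = toy.fillna({"Age": toy["Age"].median()})
print(missing_counts)
print(filled)
```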

**Splitting the dataset into training and test sets in an 80:20 ratio:**

In [34]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(inputs,target,test_size=0.2)
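
The split above is random, so the scores further down will vary from run to run. A hedged sketch of a reproducible variant follows: `random_state` and `stratify` are real parameters of `train_test_split`, while the toy data and the seed value 42 are assumptions made for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for `inputs` and `target`
toy_inputs = pd.DataFrame({"Fare": range(10), "female": [0, 1] * 5})
toy_target = pd.Series([0, 1] * 5, name="Survived")

# random_state fixes the shuffle; stratify keeps the class ratio in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    toy_inputs, toy_target, test_size=0.2, random_state=42, stratify=toy_target
)
print(len(X_tr), len(X_te))
```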

**Using the Gaussian model of Naive Bayes.**
Gaussian Naive Bayes is the variant of Naive Bayes used when the features take continuous values. It assumes that, within each class, every feature follows a Gaussian (i.e. normal) distribution.
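
To make the Gaussian assumption concrete, here is a minimal hand-computed sketch for a single feature. The per-class means, variances and priors are hypothetical numbers made up for illustration; they are not estimated from the Titanic data, and this is not how scikit-learn is implemented internally.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical per-class fare statistics and class priors
stats = {
    0: {"prior": 0.6, "mean": 20.0, "var": 400.0},
    1: {"prior": 0.4, "mean": 50.0, "var": 900.0},
}

fare = 70.0
# Unnormalised posterior: prior times the Gaussian likelihood of the feature
scores = {c: s["prior"] * gaussian_pdf(fare, s["mean"], s["var"]) for c, s in stats.items()}
total = sum(scores.values())
posterior = {c: v / total for c, v in scores.items()}
print(posterior)
```

With several features, the per-feature likelihoods are multiplied together for each class before normalising, which is where the "naive" independence assumption enters.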

In [35]:
from sklearn.naive_bayes import GaussianNB
model = GaussianNB()


In [36]:
model.fit(X_train,y_train)


GaussianNB()

In [37]:
model.score(X_test,y_test)

0.776536312849162

In [38]:
X_test[0:10]


Unnamed: 0,Age,Fare,Pclass,female
201,69.55,69.55,3,0
174,30.6958,30.6958,1,0
569,7.8542,7.8542,3,0
246,7.775,7.775,3,1
40,9.475,9.475,3,1
571,51.4792,51.4792,1,1
76,7.8958,7.8958,3,0
550,110.8833,110.8833,1,0
598,7.225,7.225,3,0
281,7.8542,7.8542,3,0


In [39]:
y_test[0:10]


201    0.0
174    0.0
569    1.0
246    0.0
40     0.0
571    1.0
76     0.0
550    1.0
598    0.0
281    0.0
Name: Survived, dtype: float64

In [40]:
model.predict(X_test[0:10])


array([0., 0., 0., 0., 0., 1., 0., 1., 0., 0.])

**Comparing these predictions with y_test, 9 of the 10 match; only row 569 is misclassified.**
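
The comparison can also be done in code rather than by eye. This sketch copies the ten labels and ten predictions from the outputs above into plain lists (the names `actual` and `predicted` are introduced here for illustration) and counts the matches.

```python
# Values copied from the y_test[0:10] and model.predict(X_test[0:10]) outputs
actual = [0, 0, 1, 0, 0, 1, 0, 1, 0, 0]
predicted = [0, 0, 0, 0, 0, 1, 0, 1, 0, 0]

# Count positions where the prediction equals the true label
matches = sum(a == p for a, p in zip(actual, predicted))
accuracy = matches / len(actual)
print(matches, accuracy)  # 9 0.9
```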

In [43]:
model.predict_proba(X_test[:10])

array([[0.83175978, 0.16824022],
       [0.87428079, 0.12571921],
       [0.98381766, 0.01618234],
       [0.55872514, 0.44127486],
       [0.56253047, 0.43746953],
       [0.05423312, 0.94576688],
       [0.9838245 , 0.0161755 ],
       [0.00376206, 0.99623794],
       [0.98370828, 0.01629172],
       [0.98381766, 0.01618234]])
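
Each row holds the probability of class 0 (did not survive) and class 1 (survived), in the order of `model.classes_`. The sketch below copies the rows above into a plain list (the name `proba` is introduced here) and picks the more probable class per row, which reproduces the labels returned by `model.predict`.

```python
# Rows copied from the predict_proba output above
proba = [
    [0.83175978, 0.16824022],
    [0.87428079, 0.12571921],
    [0.98381766, 0.01618234],
    [0.55872514, 0.44127486],
    [0.56253047, 0.43746953],
    [0.05423312, 0.94576688],
    [0.9838245, 0.0161755],
    [0.00376206, 0.99623794],
    [0.98370828, 0.01629172],
    [0.98381766, 0.01618234],
]

# Pick the column index with the larger probability in each row
labels = [row.index(max(row)) for row in proba]
print(labels)  # [0, 0, 0, 0, 0, 1, 0, 1, 0, 0]
```

Rows such as the fourth and fifth (about 0.56 vs. 0.44) are predictions the model is much less sure about than the others.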

**Calculate the score using 5-fold cross-validation.**

In [44]:
from sklearn.model_selection import cross_val_score
cross_val_score(GaussianNB(),X_train, y_train, cv=5)

array([0.73426573, 0.7972028 , 0.74647887, 0.78169014, 0.77464789])
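
The five fold scores are easier to read as a mean and a spread. The sketch below copies them from the output above into a list (the variable names are introduced here) and summarises them with the standard library.

```python
import statistics

# Fold scores copied from the cross_val_score output above
scores = [0.73426573, 0.7972028, 0.74647887, 0.78169014, 0.77464789]

mean_score = statistics.mean(scores)
spread = statistics.stdev(scores)  # sample standard deviation across folds
print(f"{mean_score:.3f} +/- {spread:.3f}")
```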

**Hence, the model predicts Titanic survival with about 78% accuracy on the test set.**

**Thank You!**