**The Titanic dataset is one of the most popular datasets for learning machine learning basics. It contains information about passengers aboard the RMS Titanic, which sank, and it can be used to predict whether a given passenger survived.**

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import missingno as msno
from sklearn.preprocessing import LabelEncoder
# pandas_profiling has since been renamed ydata-profiling; this import targets the older package
from pandas_profiling import ProfileReport
from IPython.display import Image

In [None]:
titanic = pd.read_csv('/kaggle/input/titanic/train_and_test2.csv')
titanic

In [None]:
titanic.columns

In [None]:
titanic.drop(['sibsp', 'zero', 'zero.1',
       'zero.2', 'zero.3', 'zero.4', 'zero.5', 'zero.6', 'zero.7',
       'zero.8', 'zero.9', 'zero.10', 'zero.11', 'zero.12', 'zero.13',
       'zero.14', 'zero.15', 'zero.16', 'zero.17',
       'zero.18'], axis=1, inplace=True)
titanic

In [None]:
titanic.rename(columns = {'2urvived':'Survived'}, inplace=True)
titanic

# **Data Acquisition**

# **Pandas Profiling for EDA**

In [None]:
profile = ProfileReport(titanic, title='Titanic Dataset Report', html={'style':{'full_width':False}})

In [None]:
profile.to_notebook_iframe()

In [None]:
profile.to_widgets()

# **Missing Values**

In [None]:
titanic.info()

**Embarked has two NaN values.**

In [None]:
titanic.isna().sum()

In [None]:
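# Forward-fill the missing Embarked values; assign back instead of using
# fillna(method=..., inplace=True), which is deprecated in recent pandas
titanic['Embarked'] = titanic['Embarked'].ffill()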

In [None]:
titanic

**Check the null values again: no NaN values should remain.**

In [None]:
titanic.isna().sum()

# **Visualization**

In [None]:
sns.catplot(x ="Sex", hue ="Survived", kind ="count", data = titanic, palette='gist_rainbow_r')

**From the plot, the survival rate is roughly 20% for men and 75% for women, so Sex is an important feature for predicting survival.**
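
The rates read off the plot can also be computed directly. The following is a minimal sketch on a made-up toy frame (the values and the 0/1 coding of `Sex` are assumptions, not the notebook's data); the same `groupby` call should work on `titanic` as long as `Survived` is coded 0/1.

```python
import pandas as pd

# Toy stand-in for the notebook's `titanic` frame (values are invented).
toy = pd.DataFrame({
    "Sex": [0, 0, 0, 0, 1, 1, 1, 1],
    "Survived": [0, 0, 0, 1, 1, 1, 1, 0],
})

# The mean of a 0/1 column is the fraction of ones, i.e. the survival rate.
rate_by_sex = toy.groupby("Sex")["Survived"].mean()
print(rate_by_sex)
```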

In [None]:
group = titanic.groupby(['Pclass', 'Survived'])
pclass_survived = group.size().unstack()
  
sns.heatmap(pclass_survived, annot = True, fmt ="d")

The heatmap above shows that class 1 passengers had a higher chance of survival than those in classes 2 and 3.
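
The heatmap shows raw counts, which makes classes of different sizes hard to compare. One possible way to turn the counts into per-class survival rates is a row-normalized cross-tabulation; the sketch below uses an invented toy frame rather than the real data.

```python
import pandas as pd

# Toy stand-in for the notebook's `titanic` frame (values are invented).
toy = pd.DataFrame({
    "Pclass": [1, 1, 1, 1, 2, 2, 3, 3, 3, 3],
    "Survived": [1, 1, 1, 0, 1, 0, 0, 0, 0, 1],
})

# normalize="index" divides each row by its total, so every row sums to 1
# and the column for Survived == 1 is the survival rate of that class.
rates = pd.crosstab(toy["Pclass"], toy["Survived"], normalize="index")
print(rates)
```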

In [None]:
sns.catplot(x ='Embarked', hue ='Survived', kind ='count', col ='Pclass', data = titanic,palette='gist_rainbow_r' )

The majority of passengers boarded at S, so missing Embarked values can reasonably be filled with S.

Most passengers who boarded at Q were in class 3.

Among passengers who boarded at S, those in classes 1 and 2 appear to have fared better than those in class 3.
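
The first observation suggests filling missing `Embarked` values with the most frequent port rather than forward-filling them. A minimal sketch of that idea follows; the toy frame and its letter codes are assumptions, and the actual column may be encoded differently.

```python
import numpy as np
import pandas as pd

# Toy column with two missing ports (values are invented).
toy = pd.DataFrame({"Embarked": ["S", "C", np.nan, "S", "Q", np.nan, "S"]})

# mode() returns the most frequent value(s); take the first one.
most_common = toy["Embarked"].mode()[0]

# Replace the missing entries with that value.
toy["Embarked"] = toy["Embarked"].fillna(most_common)
print(toy["Embarked"].value_counts())
```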

**Conclusion:**

* PassengerId, Name, Ticket and Cabin can be dropped: they are strings that cannot be categorized and contribute little to the outcome.
* Age and Fare can also be dropped, with their respective range columns retained instead.
* The Titanic data can be analyzed with many more plot types and column correlations than those covered in this article.
* Once the EDA is complete, the resulting dataset can be used for predictions.
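
The notebook does not show how the range columns mentioned above would be built. One common approach, sketched here under assumed bin edges and labels (`Age_Range`, `Fare_Range` and the cut points are hypothetical names and values), is `pd.cut` for fixed bins and `pd.qcut` for quantile bins.

```python
import pandas as pd

# Toy values standing in for the Age and Fare columns (invented).
toy = pd.DataFrame({
    "Age": [2, 15, 22, 38, 54, 71],
    "Fare": [7.25, 8.05, 13.0, 26.55, 71.28, 512.33],
})

# Fixed, hand-picked age bins (illustrative only); intervals are right-closed.
age_bins = [0, 12, 18, 40, 60, 120]
age_labels = ["child", "teen", "adult", "middle_aged", "senior"]
toy["Age_Range"] = pd.cut(toy["Age"], bins=age_bins, labels=age_labels)

# Quantile bins: each label receives roughly the same number of rows.
toy["Fare_Range"] = pd.qcut(toy["Fare"], q=3, labels=["low", "mid", "high"])
print(toy)
```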

# **Model Predictions**

In [None]:
target_e=titanic['Survived']

In [None]:
titanic.columns

In [None]:
titanic['Sex']=LabelEncoder().fit_transform(titanic.Sex)
titanic.drop(['Passengerid', 'Survived', 'Pclass','Parch', 'Embarked'],axis=1,inplace=True)
titanic

In [None]:
from sklearn import tree
model = tree.DecisionTreeClassifier(max_depth=20,max_leaf_nodes=20,random_state=10)

In [None]:
model.fit(titanic,target_e)

In [None]:
# Note: this is accuracy on the same data the model was fitted on (training accuracy)
model.score(titanic,target_e)

# **Prediction**

In [None]:
# The values must follow the column order of the frame passed to fit (check titanic.columns)
model.predict([[0,22.5,23.45]])

In [None]:
model.predict([[0,15,11]])
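
Passing a bare list to `predict` relies on remembering the feature order used during fitting. A possibly safer pattern is to build the input as a DataFrame with named columns and reorder it to match the training frame. The sketch below uses an invented toy training set; the column names `Age`, `Fare`, `Sex` and the 0/1 coding are assumptions.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Toy training data (invented); the target happens to equal Sex here.
train = pd.DataFrame({
    "Age": [22, 38, 26, 35, 54, 2],
    "Fare": [7.25, 71.28, 7.92, 53.1, 51.86, 21.07],
    "Sex": [0, 1, 1, 1, 0, 0],
})
target = pd.Series([0, 1, 1, 1, 0, 0])

clf = DecisionTreeClassifier(random_state=10).fit(train, target)

# Name each value explicitly, then reorder to the training column order.
new_passenger = pd.DataFrame([{"Sex": 1, "Age": 22.5, "Fare": 23.45}])
new_passenger = new_passenger[train.columns]

pred = clf.predict(new_passenger)
print(pred)
```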

In [None]:
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(titanic,target_e,test_size=0.2)
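
The model above was fitted and scored on the full dataset, so the split is not yet used for evaluation. A minimal sketch of a held-out evaluation follows, using synthetic data in place of the real frame (the data generation, `clf`, and the `random_state` and `stratify` choices are assumptions for illustration).

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the three feature columns (not the real data).
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "Age": rng.uniform(1, 80, n),
    "Fare": rng.uniform(5, 300, n),
    "Sex": rng.integers(0, 2, n),
})
# Synthetic target: follows Sex, with about 20% of labels flipped as noise.
y = ((X["Sex"] == 1) ^ (rng.random(n) < 0.2)).astype(int)

# Hold out 20% of the rows; fix the seed so the split is reproducible.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=10, stratify=y
)

# Fit on the training part only, then compare both accuracies.
clf = DecisionTreeClassifier(max_depth=20, max_leaf_nodes=20, random_state=10)
clf.fit(x_train, y_train)
train_acc = clf.score(x_train, y_train)
test_acc = clf.score(x_test, y_test)
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```

A large gap between the two numbers would suggest the tree is overfitting the training rows.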

In [None]:
!pip install graphviz

In [None]:
from sklearn import tree
# export_graphviz writes the DOT file and returns None when out_file is given
tree.export_graphviz(model, out_file='tree.dot', feature_names=list(x_train.columns), max_depth=2, filled=True)

**Decision Tree Plot**

In [None]:
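plt.figure(figsize=(60,60))
# Label the nodes with the feature names and color them by majority class
tree.plot_tree(model, feature_names=list(titanic.columns), filled=True)
plt.show()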
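
When a large figure is hard to read, a plain-text view of the tree can be a useful complement. The sketch below uses scikit-learn's `export_text` on a small synthetic classifier; the data and the feature names passed in are placeholders, not the notebook's fitted model.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic three-feature problem standing in for the real data.
X, y = make_classification(
    n_samples=200, n_features=3, n_informative=2, n_redundant=0, random_state=10
)

# A shallow tree keeps the printed rules short.
clf = DecisionTreeClassifier(max_depth=2, random_state=10).fit(X, y)

# Placeholder names; with the notebook's model these would be the training columns.
rules = export_text(clf, feature_names=["Age", "Fare", "Sex"])
print(rules)
```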