
<h1 style="color:#00C1D4; ">Let's set sail on our journey through this data.</h1>

![](https://faithmag.com/sites/default/files/styles/article_full/public/2018-09/titanic2.jpg?h=6521bd5e&itok=H8td6QVv)

<h2 style="color:#555273; ">Titanic:</h2>
<ul>
    <li> <a href= "#Importing-Libraries"><h3>Importing Libraries</h3></a></li>
    <li><a href= "#Missing-Values"><h3>Missing Values</h3></a></li>
    <li><a href = "#Imputing-Data"><h3>Imputing Data</h3></a></li>
    <li><a href = "#Data-Visualization"><h3>Data Visualization</h3></a></li>
    <li><a href = "#Feature-Engineering"><h3>Feature Engineering</h3></a></li>
    <li><a href = "#Feature-Selection"><h3>Feature Selection</h3></a></li>
    <li><a href = "#Model"><h3>Model</h3></a></li>
    <li><a href = "#Conclusion"><h3>Conclusion</h3></a></li>
</ul>



<h2 style="color:#555273; ">Features:</h2>
<ol>
    <li><h4>Pclass -> Passenger class (1 = first, 2 = second, 3 = third).</h4></li>
    <li><h4>SibSp -> Number of siblings or spouses on board.</h4></li>
    <li><h4>Parch -> Number of parents or children on board.</h4></li>
    <li><h4>Cabin -> Cabin number of the passenger.</h4></li>
    <li><h4>Embarked -> Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton).</h4></li>
    <li><h4>Fare -> Amount paid for the ticket.</h4></li>
    <li><h4>Ticket -> Ticket number.</h4></li>
    <li><h4>Name, Sex, and Age are self-explanatory.</h4></li>
 </ol>

## Questions to wonder about
### 1. Does being young increase survivability? 🧒🧒
### 2. Does passenger class change the chances of surviving?
### 3. Is it better to be alone, or does being with family help? 👪👪
### 4. Does paying more for the ticket affect survival? 🤑🤑
### 5. Does being female increase survivability? 🤯
### 6. Does the boarding location also matter in the game of survival?

# Importing Libraries

In [None]:
import re
import warnings

import numpy as np
import pandas as pd
import matplotlib as mt
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier
from sklearn.preprocessing import OneHotEncoder, LabelEncoder, StandardScaler
from sklearn.metrics import roc_curve, auc, accuracy_score, classification_report
from sklearn.model_selection import StratifiedKFold, train_test_split
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

# plt.style.use('seaborn-notebook')
sns.set_style("darkgrid")
warnings.filterwarnings('ignore')


In [None]:
train_data = pd.read_csv("../input/titanic/train.csv")
test_data = pd.read_csv("../input/titanic/test.csv")

# gender = pd.read_csv("../input/titanic/gender_submission.csv")


In [None]:
display(train_data.head())

In [None]:
test_data.head()

In [None]:
train_data.info()

# Missing Values

In [None]:
train_data.isnull().sum()

In [None]:
test_data.isnull().sum()

Both the train and test datasets contain missing values. We impute each of them in turn below, and later combine the two into a single frame for feature engineering.
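
Before imputing, it can help to see how much of each column is missing, not just the raw counts. The sketch below uses a small made-up frame (`toy` and its values are assumptions for illustration, not the Titanic data) and reports the count and the share of missing values per column.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the Titanic data (values are made up)
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Fare": [7.25, 71.28, 7.92, 8.05],
})

# Count and share of missing values per column
missing = pd.DataFrame({
    "n_missing": toy.isnull().sum(),
    "share": toy.isnull().mean(),
})
print(missing)
```

A column that is mostly missing (like `Cabin` here) usually calls for a placeholder category, while a column with only a few gaps can be filled with a summary statistic.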

# Imputing Data 

### Age

In [None]:
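# Correlate numeric columns only; text columns such as Name or Ticket cannot be correlated
corr = train_data.select_dtypes(include='number').corr()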
corr['Age']

### In the case of Age,
* We fill the missing values with the median rather than the mean, so that outliers have less influence.
* We stratify by Pclass and Sex and fill each gap with the median of the passenger's own group.
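
As a minimal sketch of the stratified-median idea, here it is on a made-up frame (the `toy` data and the `Age_filled` column name are assumptions for illustration). `groupby(...).transform(...)` returns a result with the same index as the input, so it can be assigned straight back to the frame.

```python
import numpy as np
import pandas as pd

# Toy data: two (Sex, Pclass) groups, each with one missing age
toy = pd.DataFrame({
    "Sex":    ["male", "male", "male", "female", "female", "female"],
    "Pclass": [3, 3, 3, 1, 1, 1],
    "Age":    [20.0, 30.0, np.nan, 35.0, np.nan, 45.0],
})

# Fill each gap with the median age of that passenger's own group
toy["Age_filled"] = (
    toy.groupby(["Sex", "Pclass"])["Age"]
       .transform(lambda s: s.fillna(s.median()))
)
print(toy)
```

In this toy example the third-class male gets 25.0 (the median of 20 and 30) and the first-class female gets 40.0 (the median of 35 and 45).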

In [None]:
age_by_pclass_sex = train_data.groupby(['Sex', 'Pclass'])['Age'].median()

for pclass in range(1, 4):
    for sex in ['female', 'male']:
        print('Median age of Pclass {} {}s: {}'.format(pclass, sex, age_by_pclass_sex[sex][pclass]))
print('Median age of all passengers: {}'.format(train_data['Age'].median()))

# Filling the missing values in Age with the medians of Sex and Pclass groups
# (transform keeps the original row index, so the result aligns with train_data)
train_data['Age'] = train_data.groupby(['Sex', 'Pclass'])['Age'].transform(lambda x: x.fillna(x.median()))

In [None]:
age_by_pclass = test_data.groupby(['Pclass','Sex'])['Age'].median()

for i in range(1,4):
    for j in ['female','male']:
        print(f"Median of Passenger class {i} {j} : {age_by_pclass[i][j]}")

# transform keeps the original row index, so the result aligns with test_data
test_data['Age'] = test_data.groupby(['Pclass','Sex'])['Age'].transform(lambda x: x.fillna(x.median()))

### Embarked

#### For Embarked, we fill the missing values with the mode of the feature. Since this is categorical data and we have no other information about these passengers, the mode is a plausible choice.

In [None]:
train_data['Embarked'] = train_data['Embarked'].fillna(train_data['Embarked'].mode()[0])

### Fare

In [None]:
corr['Fare']

#### For Fare, we use the same idea: Pclass has the highest (negative) correlation with Fare, meaning that Fare increases as the Pclass number decreases, so we fill the missing fare with the median fare of the passenger's class.

In [None]:
test_data['Fare'] = test_data.groupby(['Pclass'])['Fare'].transform(lambda x: x.fillna(x.median()))

### Cabin

#### For Cabin, we fill the missing values with U for unidentified.
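
As a small sketch of the idea, a cabin string can be reduced to its leading deck letter once the gaps are filled with `U`. The cabin values below are made up for illustration, and `deck_pattern` is a hypothetical helper name.

```python
import re

import numpy as np
import pandas as pd

# Made-up cabin values, including a missing one and two multi-cabin entries
cabins = pd.Series(["C85", np.nan, "F G73", "B57 B59"])

# First run of letters in the string, e.g. 'C85' -> 'C'
deck_pattern = re.compile(r"[A-Za-z]+")

decks = cabins.fillna("U").map(lambda c: deck_pattern.search(c).group())
print(decks.tolist())
```

Entries that list several cabins keep only the first deck letter, which is a simplification worth keeping in mind when reading the encoded feature.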

In [None]:
# Replace missing values with 'U' (unidentified) for Cabin
train_data['Cabin'] = train_data['Cabin'].fillna('U')
test_data['Cabin'] = test_data['Cabin'].fillna('U')
# Extract the leading deck letter(s), e.g. 'C85' -> 'C'
deck_pattern = re.compile("([a-zA-Z]+)")
train_data['Cabin'] = train_data['Cabin'].map(lambda x: deck_pattern.search(x).group())
test_data['Cabin'] = test_data['Cabin'].map(lambda x: deck_pattern.search(x).group())
# Integer code for each deck letter; 'U' (unidentified) gets the lowest code
cabin_category = {'A':9, 'B':8, 'C':7, 'D':6, 'E':5, 'F':4, 'G':3, 'T':2, 'U':1}
# Mapping 'Cabin' to group
train_data['Cabin'] = train_data['Cabin'].map(cabin_category)
test_data['Cabin'] = test_data['Cabin'].map(cabin_category)

In [None]:
df_train = train_data
df_test = test_data

# Data Visualization 

In [None]:
# color palette for visualizations
colors = ['#EEEEEE','#F8485E','#00C1D4','#512D6D','black']
palette = sns.color_palette( palette = colors)

sns.palplot(palette,size=3)
plt.text(-0.75,-0.75,'Color Palette for this Visualization', {'fontname':'serif', 'size':25, 'weight':'bold'})
plt.text(-0.75,-0.64,'Mostly the same colors will be used throughout this notebook.', {'fontname':'serif', 'size':18, 'weight':'normal'}, alpha = 0.8)
plt.show()

In [None]:

x = pd.DataFrame( df_train.groupby(['Survived'])['Survived'].count())

# plot
fig, ax = plt.subplots(figsize = (7,6), dpi=70 )
ax.barh([0], x.Survived[0], height = 0.7, color = colors[1])
# Text property names are lower-case ('size'); recent Matplotlib versions reject 'Size'
plt.text(-210,-0.08, 'Not Survived',{'fontname': 'Serif','weight':'bold','size': '16','style':'normal', 'color':colors[1]})
plt.text(590,-0.08, '62%',{'fontname':'Serif','weight':'bold' ,'size':'16','color': colors[1]})
ax.barh([1], x.Survived[1], height = 0.7, color = colors[3])
plt.text(-210,1, 'Survived', {'fontname': 'Serif','weight':'bold','size': '16','style':'normal', 'color': colors[3]})
plt.text(390,1, '38%',{'fontname':'Serif', 'weight':'bold','size':'16','color':colors[3]})

fig.patch.set_facecolor('white')
ax.set_facecolor('white')

plt.text(-150,1.77, 'Percentage of People surviving' ,{'fontname': 'Serif', 'size': '25','weight':'bold', 'color':'black'})
plt.text(450,1.55, 'Not Survived ', {'fontname': 'Serif','weight':'bold','size': '14','style':'normal', 'color':colors[1]})
plt.text(610,1.55, '|', {'color':'black' , 'size':'16', 'weight': 'bold'})
plt.text(630,1.55, 'Survived', {'fontname': 'Serif','weight':'bold', 'size': '14','style':'normal','color':colors[3]})

ax.axes.get_xaxis().set_visible(False)
ax.axes.get_yaxis().set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(True)
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)

### Age

In [None]:
df_train.isnull().sum()

In [None]:
data_vis = df_train.drop(['PassengerId'],axis=1)
died = data_vis[data_vis.Survived == 0]
survived = data_vis[data_vis.Survived == 1]

In [None]:
fig = plt.figure(figsize = (24,10))

spec = fig.add_gridspec(10,24)

ax1 = fig.add_subplot(spec[1:4,:8])
ax2 = fig.add_subplot(spec[6:9,:8 ])
ax3 = fig.add_subplot(spec[1:10,13:])

# axes list
axes = [ ax1,ax2, ax3]

# setting of axes; visibility of axes and spines turn off
for ax in axes:
    ax.axes.get_yaxis().set_visible(False)
    ax.set_facecolor(colors[0])
    
    for loc in ['left', 'right', 'top', 'bottom']:
        ax.spines[loc].set_visible(False)

fig.patch.set_facecolor(colors[0])
        
ax3.axes.get_xaxis().set_visible(True)
ax3.axes.get_yaxis().set_visible(True)


# kdeplot for Age feature (fill= replaces the deprecated shade= argument)

sns.kdeplot(data_vis["Age"], fill = True,ax = ax1,color = colors[3],alpha =1,legend=False)
ax1.set_xlabel('Age of a person', fontdict = {'fontname':'Serif', 'color': 'black', 'size': 16,'weight':'bold' })
ax1.text(-17,0.075,'Overall Age Distribution - How skewed is it?', {'fontname':'Serif', 'color': 'black','weight':'bold','size':24}, alpha = 0.9)
ax1.text(-17,0.055, 'Based on Age we have data from infants to elderly people.\nYoung Adult population is the median group.', 
        {'fontname':'Serif', 'size':'16','color': 'black'})

# distribution of Age with respect to Survived feature

sns.kdeplot(died.Age,ax = ax2, fill = True,  alpha = 1, color = colors[3],legend=False)
sns.kdeplot(survived.Age,ax = ax2, fill = True,  alpha = 0.8, color = colors[1],legend=False)
ax2.set_xlabel('Age of a person', fontdict = {'fontname':'Serif', 'color': 'black', 'weight':'bold','size': 16})

ax2.text(-17,0.0525,'Who is more safe - Young or Old?', {'fontname':'Serif', 'weight':'bold','color': 'black', 'size':24}, alpha= 0.9)
ax2.text(80,0.043, 'Not Survived', {'fontname': 'Serif','weight':'bold','size': '16','style':'normal', 'color':colors[3]})
ax2.text(112,0.043, '|', {'color':'black' , 'size':'16', 'weight': 'bold'})
ax2.text(115,0.043, 'Survived', {'fontname': 'Serif','weight':'bold', 'size': '16','style':'normal','color':colors[1]})

# distplot
sns.distplot(died['Age'], label='Not Survived', hist=True, color = colors[3], ax=ax3)
sns.distplot(survived['Age'], label='Survived', hist=True, color = colors[1], ax=ax3)
ax3.text(-10,0.08, 'Data is not skewed and young adults have highest \nchances of surviving ', {'fontname': 'Serif','weight':'bold','size': '25','style':'normal', 'color': 'black'})
ax3.text(80,0.07, 'Not Survived', {'fontname': 'Serif','weight':'bold','size': '16','style':'normal', 'color':colors[3]})
ax3.text(105,0.07, '|', {'color':'black' , 'size':'16', 'weight': 'bold'})
ax3.text(110,0.07, 'Survived', {'fontname': 'Serif','weight':'bold', 'size': '16','style':'normal','color':colors[1]})

fig.text(0.5,1,'Survival of the Youngest ?? True or False?',{'fontname':'Serif', 'weight':'bold','color': 'black', 'size':35})
fig.show()


### Fare

In [None]:
fig = plt.figure(figsize = (24,10))

spec = fig.add_gridspec(10,24)

ax1 = fig.add_subplot(spec[1:4,:8])
ax2 = fig.add_subplot(spec[6:9,:8 ])
ax3 = fig.add_subplot(spec[1:10,13:])

# axes list
axes = [ ax1,ax2, ax3]

# setting of axes; visibility of axes and spines turn off
for ax in axes:
    ax.axes.get_yaxis().set_visible(False)
    ax.set_facecolor(colors[0])
    
    for loc in ['left', 'right', 'top', 'bottom']:
        ax.spines[loc].set_visible(False)

fig.patch.set_facecolor(colors[0])
        
ax3.axes.get_xaxis().set_visible(True)
ax3.axes.get_yaxis().set_visible(True)


# kdeplot for Fare feature (fill= replaces the deprecated shade= argument)

sns.kdeplot(data_vis['Fare'] , fill = True,ax = ax1,color = colors[3],alpha =1,legend=False)
ax1.set_xlabel('Fare', fontdict = {'fontname':'Serif', 'color': 'black', 'size': 16,'weight':'bold' })
ax1.text(-17,0.045,'Overall Fare Distribution - How skewed is it?', {'fontname':'Serif', 'color': 'black','weight':'bold','size':24}, alpha = 0.9)

# distribution of Fare with respect to Survived feature

sns.kdeplot(died['Fare'],ax = ax2, fill = True,  alpha = 1, color = colors[3],legend=False )
sns.kdeplot(survived['Fare'],ax = ax2, fill = True,  alpha = 0.8, color = colors[1],legend=False)
ax2.set_xlabel('Fare of a Person', fontdict = {'fontname':'Serif', 'color': 'black', 'weight':'bold','size': 16})

ax2.text(-17,0.0555,'High Fare or Low Fare?', {'fontname':'Serif', 'weight':'bold','color': 'black', 'size':24}, alpha= 0.9)
ax2.text(150,0.033, 'Not Survived', {'fontname': 'Serif','weight':'bold','size': '16','style':'normal', 'color':colors[3]})
ax2.text(340,0.033, '|', {'color':'black' , 'size':'16', 'weight': 'bold'})
ax2.text(370,0.033, 'Survived', {'fontname': 'Serif','weight':'bold', 'size': '16','style':'normal','color':colors[1]})

# distplot
sns.distplot(died['Fare'], label='Not Survived', hist=True ,color = colors[3], ax=ax3)
sns.distplot(survived['Fare'], label='Survived', hist=True,color = colors[1], ax=ax3)
ax3.text(-10,0.099, 'Highly right-skewed ', {'fontname': 'Serif','weight':'bold','size': '25','style':'normal', 'color': 'black'})
ax3.text(190,0.088, 'Not Survived', {'fontname': 'Serif','weight':'bold','size': '16','style':'normal', 'color':colors[3]})
ax3.text(330,0.088, '|', {'color':'black' , 'size':'16', 'weight': 'bold'})
ax3.text(350,0.088, 'Survived', {'fontname': 'Serif','weight':'bold', 'size': '16','style':'normal','color':colors[1]})

fig.text(0.13,1,'Survival of the Richest',{'fontname':'Serif', 'weight':'bold','color': 'black', 'size':35})
fig.show()

### Categorical features

In [None]:
fig,axes = plt.subplots(5,2,figsize=(24,40))
died.Embarked.value_counts().plot(kind='pie',colors = colors ,ax=axes[0][0], fontsize=10, autopct='%1.0f%%',title="Pie Chart of People Not Surviving based on Embarking place")
survived.Embarked.value_counts().plot(kind='pie',ax=axes[0][1],colors = colors, fontsize=10, autopct='%1.0f%%',title="Pie Chart of People Surviving based on Embarking place")
died.Pclass.value_counts().plot(kind='pie',ax=axes[1][0],colors = colors, fontsize=10, autopct='%1.0f%%',title="Pie Chart of People Not Surviving based on Passenger Class")
survived.Pclass.value_counts().plot(kind='pie',ax=axes[1][1], colors= colors , fontsize=10, autopct='%1.0f%%',title="Pie Chart of People Surviving based on Passenger Class")
died.Parch.value_counts().plot(kind='pie',ax=axes[2][0], colors = colors ,fontsize=10, autopct='%1.0f%%',title="Pie Chart of People Not Surviving based on How many ParentChild boarded")
survived.Parch.value_counts().plot(kind='pie',ax=axes[2][1], colors = colors ,fontsize=10, autopct='%1.0f%%',title="Pie Chart of People Surviving based on How many ParentChild boarded")
died.Sex.value_counts().plot(kind='pie',ax=axes[3][0], colors=colors,fontsize=10, autopct='%1.0f%%',title="Pie Chart of People Not Surviving based on Gender")
survived.Sex.value_counts().plot(kind='pie',ax=axes[3][1], colors=colors,fontsize=10, autopct='%1.0f%%',title="Pie Chart of People Surviving based on Gender")
died.SibSp.value_counts().plot(kind='pie',ax=axes[4][0], colors=colors,fontsize=10, autopct='%1.0f%%',title="Pie Chart of People Not Surviving based on How many Siblings boarded")
survived.SibSp.value_counts().plot(kind='pie',ax=axes[4][1], colors=colors,fontsize=10, autopct='%1.0f%%',title="Pie Chart of People Surviving based on How many Siblings boarded")
fig.show()

In [None]:
fig = plt.figure(figsize=(16,16))
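# Correlate numeric columns only; Name, Sex, Ticket and Embarked are still text here
sns.heatmap(df_train.drop(['PassengerId'], axis=1).select_dtypes(include='number').corr(), annot=True,square=True,cmap = mt.colors.LinearSegmentedColormap.from_list("",colors) , annot_kws={'size': 14})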
plt.title("Correlation")

# Feature Engineering

In [None]:
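# ignore_index gives every row a unique label; otherwise train and test rows share labels 0..417
df_comp = pd.concat([df_train,df_test], ignore_index=True)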

In [None]:
def get_title(name):
    title_search = re.search(r' ([A-Za-z]+)\.', name)
    # If the title exists, extract and return it.
    if title_search:
        return title_search.group(1)
    return ""
df_comp['Title'] = df_comp['Name'].apply(get_title)
df_comp['Title'] = df_comp['Title'].replace(['Lady', 'Countess','Capt', 'Col','Don', 'Dr', 'Major', 'Rev', 'Sir', 'Jonkheer', 'Dona'], 'Rare')
df_comp['Title'] = df_comp['Title'].replace('Mlle', 'Miss')
df_comp['Title'] = df_comp['Title'].replace('Ms', 'Miss')
df_comp['Title'] = df_comp['Title'].replace('Mme', 'Mrs')
df_comp['Title'] = LabelEncoder().fit_transform(df_comp['Title'])

In [None]:
df_comp['Last_Name'] = df_comp['Name'].apply(lambda x: x.split(",")[0])
df_comp['Fare'] = df_comp['Fare'].fillna(df_comp['Fare'].mean())

DEFAULT_SURVIVAL_VALUE = 0.5
df_comp['Family_Survival'] = DEFAULT_SURVIVAL_VALUE

for grp, grp_df in df_comp[['Survived','Name', 'Last_Name', 'Fare', 'Ticket', 'PassengerId',
                           'SibSp', 'Parch', 'Age']].groupby(['Last_Name', 'Fare']):
    
    if (len(grp_df) != 1):
        # A Family group is found.
        for ind, row in grp_df.iterrows():
            smax = grp_df.drop(ind)['Survived'].max()
            smin = grp_df.drop(ind)['Survived'].min()
            passID = row['PassengerId']
            if (smax == 1.0):
                df_comp.loc[df_comp['PassengerId'] == passID, 'Family_Survival'] = 1
            elif (smin==0.0):
                df_comp.loc[df_comp['PassengerId'] == passID, 'Family_Survival'] = 0

print("Number of passengers with family survival information:", 
      df_comp.loc[df_comp['Family_Survival']!=0.5].shape[0])

In [None]:
for _, grp_df in df_comp.groupby('Ticket'):
    if (len(grp_df) != 1):
        for ind, row in grp_df.iterrows():
            if (row['Family_Survival'] == 0) | (row['Family_Survival']== 0.5):
                smax = grp_df.drop(ind)['Survived'].max()
                smin = grp_df.drop(ind)['Survived'].min()
                passID = row['PassengerId']
                if (smax == 1.0):
                    df_comp.loc[df_comp['PassengerId'] == passID, 'Family_Survival'] = 1
                elif (smin==0.0):
                    df_comp.loc[df_comp['PassengerId'] == passID, 'Family_Survival'] = 0
                        
print("Number of passenger with family/group survival information: " 
      +str(df_comp[df_comp['Family_Survival']!=0.5].shape[0]))

In [None]:
df_comp['Senior'] = df_comp['Age'].map(lambda s:1 if s>70 else 0)

In [None]:
# Bin Fare into fixed-width bands to damp the effect of outliers and to categorise the feature
bins_i = [-1, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550]
labels_i = [1,2,3,4,5,6,7,8,9,10,11]

# Bin on the combined frame: df_comp is the frame the later cells actually use
df_comp['stage'] = pd.cut(df_comp['Fare'], bins=bins_i, labels=labels_i)

df_comp['stage'].unique()


In [None]:
df_comp['Fare'] = df_comp['stage'].astype("int64")
df_comp.drop("stage", axis=1, inplace=True)

In [None]:
# Quartile bins for Age, stored as integer codes 0-3
df_comp['Age'] = pd.qcut(df_comp['Age'], 4, labels=False)

In [None]:
# Encoding the categorical features
df_comp['Fare'] = LabelEncoder().fit_transform(df_comp['Fare'])
df_comp['Age'] = LabelEncoder().fit_transform(df_comp['Age'])
df_comp['Sex'] = LabelEncoder().fit_transform(df_comp['Sex'])
# Using highly correlated features to get new features
df_comp['P_fare'] = df_comp['Pclass'] * df_comp['Fare']
df_comp['P_Age'] = df_comp['Pclass'] * df_comp['Age']
df_comp['Fam'] = df_comp['Parch'] + df_comp['SibSp'] + 1
df_comp['Alone'] = [1 if i == 1 else 0 for i in df_comp['Fam']]
# df_comp['SmallF'] = df_comp['Fam'].map(lambda s: 1 if  s == 2  else 0)
# df_comp['MedF']   = df_comp['Fam'].map(lambda s: 1 if 3 <= s <= 4 else 0)
# df_comp['LargeF'] = df_comp['Fam'].map(lambda s: 1 if s >= 5 else 0)
# df_comp['FareCat_Sex'] = df_comp['Fare']*df_comp['Sex']
# df_comp['Pcl_Sex'] = df_comp['Pclass']*df_comp['Sex']
# df_comp['Pcl_Title'] = df_comp['Pclass']*df_comp['Title']
# df_comp['Title_Sex'] = df_comp['Title']*df_comp['Sex']
# One-hot encode the categorical features. A one-hot matrix has one column per category,
# so it cannot be assigned to a single column; pd.get_dummies adds the indicator columns instead.
df_comp = pd.get_dummies(df_comp, columns=['Fam', 'Embarked', 'Pclass', 'Age', 'Fare'], dtype=int)

In [None]:
df = df_comp.drop('Name',axis=1)
df.drop('Ticket',axis=1,inplace=True)
df.drop('Last_Name',axis=1,inplace=True)
d_passenger_id = df.PassengerId.values
df.drop('PassengerId',axis=1,inplace=True)


In [None]:
df_train_f = df.iloc[:891,:].copy()
# Drop the (empty) target column from the test rows without modifying a slice in place
df_test_f = df.iloc[891:,:].drop("Survived",axis=1)

# Feature Selection

##### Select the features whose absolute correlation with the target, `Survived`, is above 0.15
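
The selection rule can be sketched on a tiny made-up frame. The `toy` data, the `Noise` column and the `threshold` name are assumptions for illustration; only the idea of keeping columns by absolute correlation with `Survived` comes from this notebook.

```python
import pandas as pd

# Made-up data: 'Sex' tracks the target, 'Noise' is constructed to be uncorrelated with it
toy = pd.DataFrame({
    "Survived": [0, 0, 0, 1, 1, 1],
    "Sex":      [1, 1, 1, 0, 0, 1],
    "Noise":    [1, 0, 1, 1, 0, 1],
})

threshold = 0.15

# Pearson correlation of every candidate column with the target
correlations = {
    col: toy["Survived"].corr(toy[col])
    for col in toy.columns
    if col != "Survived"
}

# Keep a feature when the magnitude of its correlation clears the threshold
selected = [col for col, r in correlations.items() if abs(r) > threshold]
print(correlations)
print(selected)
```

Taking the absolute value matters: a strongly negative correlation (as for `Sex` here) is just as informative as a strongly positive one. Note that this filter only looks at linear, one-feature-at-a-time relationships.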

In [None]:

c = {}
for i in df_train_f.columns:
    c[i] = df_train_f.Survived.corr(df_train_f[i])

In [None]:
selected_features = [x for x in c.keys() if abs(c[x]) > 0.15 and x != 'Survived']

In [None]:
columns = selected_features
X = df_train_f[columns]
X_t = StandardScaler().fit_transform(X)
Y_t = df_train_f['Survived'].values
X_train,X_val,Y_train,Y_val = train_test_split(X_t,Y_t,test_size = 0.1,random_state =42)


# Model

In [None]:
model_rf = RandomForestClassifier(criterion='gini',
                                           n_estimators=1750,
                                           max_depth=7,
                                           min_samples_split=6,
                                           min_samples_leaf=6,
                                           max_features='sqrt',
                                           oob_score=True,
                                           random_state=42,
                                           n_jobs=-1,
                                           verbose=1)
model_rf.fit(X_train, Y_train)
predictions_rf = model_rf.predict(X_val)
print(accuracy_score(Y_val,predictions_rf))
print(classification_report(Y_val,predictions_rf))

In [None]:
model_ab = AdaBoostClassifier(random_state=42)
model_ab.fit(X_train, Y_train)
predictions_ab = model_ab.predict(X_val)
print(accuracy_score(Y_val,predictions_ab))
print(classification_report(Y_val,predictions_ab))

In [None]:
model_xgb = XGBClassifier(random_state=42)
model_xgb.fit(X_train, Y_train)
predictions_xgb = model_xgb.predict(X_val)
print(accuracy_score(Y_val,predictions_xgb))
print(classification_report(Y_val,predictions_xgb))

In [None]:
model_lgb = LGBMClassifier(random_state=42)
model_lgb.fit(X_train, Y_train)
predictions_lgb = model_lgb.predict(X_val)
print(accuracy_score(Y_val,predictions_lgb))
print(classification_report(Y_val,predictions_lgb))

In [None]:
model_gb = GradientBoostingClassifier(random_state=42)
model_gb.fit(X_train, Y_train)
predictions_gb = model_gb.predict(X_val)
print(accuracy_score(Y_val,predictions_gb))
print(classification_report(Y_val,predictions_gb))

In [None]:
from sklearn.svm import SVC
model_svc = SVC(random_state=42)
model_svc.fit(X_train, Y_train)
predictions_svc = model_svc.predict(X_val)
print(accuracy_score(Y_val,predictions_svc))
print(classification_report(Y_val,predictions_svc))

In [None]:
from sklearn.neighbors import KNeighborsClassifier
model_knn = KNeighborsClassifier(algorithm='auto', leaf_size=26, metric='minkowski', 
                           metric_params=None, n_jobs=1, n_neighbors=6, p=2, 
                           weights='uniform')
model_knn.fit(X_train, Y_train)
predictions_knn = model_knn.predict(X_val)
print(accuracy_score(Y_val,predictions_knn))
print(classification_report(Y_val,predictions_knn))

In [None]:
X_test = df_test_f[columns]
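# Scale the test set with statistics learned from the training features, not refit on the test set
X_test = StandardScaler().fit(df_train_f[columns]).transform(X_test)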
mmo = {897:1,899:1,930:1,1143:1,1152:1,1153:1,1171:1,972:0,1130:0,1138:0,1173:0,1284:0}
predictions = model_gb.predict(X_test)


In [None]:
sub = pd.DataFrame()
sub['PassengerId'] = d_passenger_id[891:]
sub['Survived'] = predictions.astype('int')
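# Apply the manual overrides from mmo; .loc writes into sub itself rather than a temporary copy
for i in mmo.keys():
    sub.loc[sub['PassengerId'] == i, 'Survived'] = mmo[i]
sub['Survived'] = sub['Survived'].apply(lambda x: 1 if x>0.8 else 0)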
sub.head()

In [None]:
sub.to_csv('submission.csv',index=False)

# Conclusion


#### - From Parch and SibSp, passengers who boarded alone were more likely not to survive.
#### - From Sex, females had a higher chance of surviving.
#### - From Passenger Class, class 3 passengers were the most likely not to survive.
#### - From Alone, being alone decreases survivability.
#### - From Fare, we can see that people who paid less were less likely to survive.
#### - From Embarked, people boarding at Southampton had lower chances of surviving.
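
Statements like these can be checked by comparing the mean of `Survived` across groups. The sketch below does this on a small made-up frame (the `toy` values are assumptions and are not results from the Titanic data).

```python
import pandas as pd

# Made-up passengers, for illustration only
toy = pd.DataFrame({
    "Sex":      ["female", "female", "male", "male", "male", "female"],
    "Pclass":   [1, 3, 3, 3, 1, 1],
    "Survived": [1, 0, 0, 0, 1, 1],
})

# The mean of a 0/1 target within a group is that group's survival rate
rate_by_sex = toy.groupby("Sex")["Survived"].mean()
rate_by_class = toy.groupby("Pclass")["Survived"].mean()

print(rate_by_sex)
print(rate_by_class)
```

Running the same two `groupby` lines on `df_train` with `Sex`, `Pclass`, `Embarked` or `Alone` as the key gives the survival rate behind each bullet.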

### Please check out my other notebooks as well:
- [Heart Attack prediction](https://www.kaggle.com/govindsrathore/heart-attack-analysis-prediction-91-acc)
- [Pneumonia Detection Using Chest Xrays](https://www.kaggle.com/govindsrathore/vgg-transfer-learning-data-augmentation-94-acc)