<p style="background-color:#00407A;font-family:newtimeroman;color:#FFF9ED;font-size:100%;text-align:center;border-radius:10px 10px;">
    Titanic Survival Prediction Model
</p>



<div style="text-align: center;">
    <img src="https://i.pinimg.com/564x/03/6e/32/036e32952ebb3e5d64797f0a8d125cc8.jpg">
</div>



# **Information ℹ**

- The Titanic was a British passenger liner that famously sank on its maiden voyage on April 15, 1912.
- It remains one of the most well-known maritime disasters in history, resulting in the loss of more than 1,500 lives.
- The ship struck an iceberg in the North Atlantic Ocean, leading to its tragic sinking.

## Dataset Columns 📁

- **PassengerId**: A unique identifier assigned to each passenger on the Titanic.
- **Survived**: Indicates whether a passenger survived the disaster. It has binary values: 0 for not survived and 1 for survived.
- **Pclass**: Represents the passenger class, categorized as 1, 2, or 3 (1st, 2nd, or 3rd class).
- **Name**: The name of the passenger.
- **Sex**: The gender of the passenger.
- **Age**: The age of the passenger in years.
- **SibSp**: Represents the number of siblings or spouses the passenger had aboard the Titanic.
- **Parch**: Represents the number of parents or children the passenger had aboard the Titanic.
- **Ticket**: The ticket number assigned to the passenger.
- **Fare**: The fare or price paid by the passenger for the ticket.
- **Cabin**: The cabin number assigned to the passenger (if available).
- **Embarked**: Indicates the port of embarkation for the passenger (C = Cherbourg, Q = Queenstown, S = Southampton).

## End-to-End Pipeline for Survival Prediction 📈

In this notebook, we will develop an end-to-end pipeline for predicting passenger survival on the Titanic. The pipeline will include data analysis, preprocessing, feature selection, model training, and evaluation.

By following this pipeline, we aim to gain insights into the factors influencing survival and build a predictive model.

We will use popular libraries such as Pandas, NumPy, and Scikit-learn for data manipulation, analysis, and modeling.

Let's start building the pipeline!


# Importing Libraries 🐍

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV
import pickle

from sklearn import set_config
set_config(display='diagram')
import warnings
warnings.filterwarnings("ignore")

In [2]:
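# ?raw=true makes GitHub serve the CSV file itself rather than the HTML page that wraps it
df = pd.read_csv("https://github.com/I-AdityaGoyal/Titanic-Survival-Prediction-Model/blob/main/CSV%20File/train.csv?raw=true")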

In [3]:
df.head()

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |


# Feature Engineering 👷‍♂️

In [4]:
df['FamilySize'] = df['SibSp'] + df['Parch'] + 1
df['Fare'] = pd.qcut(df['Fare'], q=4, labels=['Low', 'Medium', 'High', 'Very High'])

df.drop(['PassengerId', 'Cabin','Ticket','Name','SibSp','Parch'], axis=1, inplace=True)


In [5]:
df.head(10)

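| | Survived | Pclass | Sex | Age | Fare | Embarked | FamilySize |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | Low | S | 2 |
| 1 | 1 | 1 | female | 38.0 | Very High | C | 2 |
| 2 | 1 | 3 | female | 26.0 | Medium | S | 1 |
| 3 | 1 | 1 | female | 35.0 | Very High | S | 2 |
| 4 | 0 | 3 | male | 35.0 | Medium | S | 1 |
| 5 | 0 | 3 | male | NaN | Medium | Q | 1 |
| 6 | 0 | 1 | male | 54.0 | Very High | S | 1 |
| 7 | 0 | 3 | male | 2.0 | High | S | 5 |
| 8 | 1 | 3 | female | 27.0 | Medium | S | 3 |
| 9 | 1 | 2 | female | 14.0 | High | C | 2 |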
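
The two engineered features can be checked on a small made-up frame. The sketch below uses eight invented passengers (the values are illustrative, not rows from `train.csv`) to show how `FamilySize` counts the passenger plus relatives and how `pd.qcut` with `q=4` splits fares into four equally populated bands.

```python
import pandas as pd

# Invented toy data, chosen so each fare quartile holds exactly two rows
toy = pd.DataFrame({
    "SibSp": [1, 0, 0, 1, 0, 3, 0, 1],
    "Parch": [0, 0, 0, 0, 0, 1, 2, 0],
    "Fare": [5.0, 7.0, 8.0, 10.0, 15.0, 30.0, 60.0, 100.0],
})

# Family size = siblings/spouses + parents/children + the passenger
toy["FamilySize"] = toy["SibSp"] + toy["Parch"] + 1

# Quantile-based binning: bin edges are the fare quartiles of this sample
toy["FareBand"] = pd.qcut(
    toy["Fare"], q=4, labels=["Low", "Medium", "High", "Very High"]
)

print(toy[["Fare", "FareBand", "FamilySize"]])
```

Because the edges come from quantiles of the data at hand, the same fare can land in a different band when the sample changes.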


In [6]:
df.isnull().sum()

Survived        0
Pclass          0
Sex             0
Age           177
Fare            0
Embarked        2
FamilySize      0
dtype: int64

In [7]:
X = df.drop("Survived", axis=1)
y = df["Survived"]


# Handling Missing Values 🔢

In [8]:
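# trf1: Handling Missing Values
# Columns are addressed by position in X: 2 = Age, 4 = Embarked.
# Both are filled with their most frequent value; the other columns pass through.

trf1 = ColumnTransformer(
    transformers=[
        ("imputer_age", SimpleImputer(strategy="most_frequent"), [2]),
        ("imputer_embark", SimpleImputer(strategy="most_frequent"), [4])
    ],
    remainder='passthrough'
)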
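
A `ColumnTransformer` places the columns it transforms first and appends the passthrough columns after them, so the column order changes after this step. The sketch below, on a four-row invented frame with the same column layout as `X`, shows the resulting order `[Age, Embarked, Pclass, Sex, Fare, FamilySize]`, which is why the next step addresses Embarked, Sex, and Fare as positions 1, 3, and 4.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Invented rows; same column order as X in the notebook
toy_X = pd.DataFrame({
    "Pclass": [3, 1, 3, 2],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, np.nan, 22.0, 35.0],
    "Fare": ["Low", "Very High", "Medium", "High"],
    "Embarked": pd.Series(["S", np.nan, "S", "C"], dtype=object),
    "FamilySize": [2, 2, 1, 1],
})

imputer = ColumnTransformer(
    transformers=[
        ("imputer_age", SimpleImputer(strategy="most_frequent"), [2]),
        ("imputer_embark", SimpleImputer(strategy="most_frequent"), [4]),
    ],
    remainder="passthrough",
)

out = imputer.fit_transform(toy_X)

# Row 1 had a missing Age and Embarked; both now hold the most frequent value,
# and they sit in positions 0 and 1 rather than 2 and 4.
print(out[1])
```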

# Handling Categorical Values 🔠

In [9]:
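# trf2: Handling Categorical Values
# After trf1 the column order is [Age, Embarked, Pclass, Sex, Fare, FamilySize],
# so positions 1, 3, and 4 are Embarked, Sex, and Fare.
# Note: scikit-learn 1.2 renamed `sparse` to `sparse_output`, and 1.4 removed `sparse`.

trf2 = ColumnTransformer(
    transformers=[
        ("ohe",OneHotEncoder(sparse=False,dtype=np.int32, drop='first',handle_unknown="ignore"),[1,3,4]),
    ],
    remainder='passthrough'
)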

# Feature Scaling ⚡

In [10]:
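# trf3: Feature Scaling
# trf2 should output 9 columns (6 one-hot columns, then Age, Pclass, FamilySize);
# slice(0, 15) is simply wide enough to cover all of them.

trf3 = ColumnTransformer(
    transformers=[
        ("scale", StandardScaler(), slice(0,15))
    ])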
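
The scaling step has no `remainder='passthrough'`, so any column outside the slice would be dropped rather than passed along. A minimal sketch on an invented 3-column array shows both behaviours: a slice wider than the data scales every column, while a narrower slice silently discards the rest.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

toy = np.array([
    [1.0, 10.0, 100.0],
    [2.0, 20.0, 200.0],
    [3.0, 30.0, 300.0],
])

# Slice wider than the data: all three columns are standardized
scale_all = ColumnTransformer([("scale", StandardScaler(), slice(0, 15))])
scaled = scale_all.fit_transform(toy)

# Slice narrower than the data: the third column is dropped (default remainder='drop')
scale_two = ColumnTransformer([("scale", StandardScaler(), slice(0, 2))])
partial = scale_two.fit_transform(toy)

print(scaled.mean(axis=0), scaled.std(axis=0))
print(partial.shape)
```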

# Feature Selection 🔎

In [11]:
# trf4 Feature Selection
trf4 = RFE(estimator= RandomForestClassifier(), n_features_to_select=3)
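
Recursive feature elimination fits the estimator, removes the least important feature, and repeats until the requested number remains. The sketch below runs the same idea on synthetic data from `make_classification` (an assumption made for illustration; the fixed `random_state` and `n_estimators=50` are not part of the notebook) and inspects which columns survive.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in: 9 features, of which 3 carry signal
X_demo, y_demo = make_classification(
    n_samples=200, n_features=9, n_informative=3, n_redundant=0, random_state=0
)

selector = RFE(
    estimator=RandomForestClassifier(n_estimators=50, random_state=0),
    n_features_to_select=3,
)
selector.fit(X_demo, y_demo)

print(selector.support_)   # boolean mask of the kept columns
print(selector.ranking_)   # 1 = kept; larger numbers were eliminated earlier
X_reduced = selector.transform(X_demo)
```

Within the pipeline, this means the final classifier sees only three of the columns produced by the earlier steps.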

# Model Selection ♏

In [12]:
# trf5: Model
trf5 = RandomForestClassifier()

# Pipeline 🤖

In [13]:
# Creating Pipeline
pipe = Pipeline([
    ("trf1",trf1),
    ("trf2",trf2),
    ("trf3",trf3),
    ("trf4",trf4),
    ("trf5",trf5)
])

In [14]:
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2)

In [15]:
X_train

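| | Pclass | Sex | Age | Fare | Embarked | FamilySize |
|---|---|---|---|---|---|---|
| 451 | 3 | male | NaN | High | S | 2 |
| 661 | 3 | male | 40.0 | Low | C | 1 |
| 580 | 2 | female | 25.0 | High | S | 3 |
| 356 | 1 | female | 22.0 | Very High | S | 2 |
| 113 | 3 | female | 20.0 | Medium | S | 2 |
| ... | ... | ... | ... | ... | ... | ... |
| 203 | 3 | male | 45.5 | Low | C | 1 |
| 350 | 3 | male | 23.0 | Medium | S | 1 |
| 521 | 3 | male | 22.0 | Low | S | 1 |
| 676 | 3 | male | 24.5 | Medium | S | 1 |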
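
The split above is not seeded, so the rows in `X_train` and every score that follows will differ from run to run. If reproducibility matters, one option is to pass `random_state`, and optionally `stratify` to keep the class balance similar in both parts. The sketch below shows this on invented arrays; the seed value 42 is arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Invented data: 20 rows, 12 of class 0 and 8 of class 1
X_demo = np.arange(40).reshape(20, 2)
y_demo = np.array([0] * 12 + [1] * 8)

def seeded_split():
    # Same seed -> same shuffle -> same rows in each part
    return train_test_split(
        X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
    )

X_tr_a, X_te_a, y_tr_a, y_te_a = seeded_split()
X_tr_b, X_te_b, y_tr_b, y_te_b = seeded_split()

print(X_tr_a.shape, X_te_a.shape)
print(np.array_equal(X_te_a, X_te_b))
```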


In [16]:
pipe.fit(X_train, y_train)

In [17]:
pipe.named_steps

{'trf1': ColumnTransformer(remainder='passthrough',
                   transformers=[('imputer_age',
                                  SimpleImputer(strategy='most_frequent'), [2]),
                                 ('imputer_embark',
                                  SimpleImputer(strategy='most_frequent'),
                                  [4])]),
 'trf2': ColumnTransformer(remainder='passthrough',
                   transformers=[('ohe',
                                  OneHotEncoder(drop='first',
                                                dtype=<class 'numpy.int32'>,
                                                handle_unknown='ignore',
                                                sparse=False),
                                  [1, 3, 4])]),
 'trf3': ColumnTransformer(transformers=[('scale', StandardScaler(),
                                  slice(0, 15, None))]),
 'trf4': RFE(estimator=RandomForestClassifier(), n_features_to_select=3),
 'trf5': RandomForestClassifier()}

In [18]:
y_pred = pipe.predict(X_test)

# Accuracy 🎯

In [19]:
accuracy_score(y_test,y_pred)

0.8212290502793296
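
Accuracy is the number of correct predictions divided by the number of test rows. As a worked check, assuming the file is the standard 891-row Kaggle `train.csv` (the notebook does not state the row count), a 20% test split holds 179 rows, and the score above is consistent with 147 of them being predicted correctly.

```python
import math

n_rows = 891                      # assumption: standard Kaggle train.csv
n_test = math.ceil(0.2 * n_rows)  # scikit-learn rounds the test size up
n_correct = 147                   # inferred from the reported score

accuracy = n_correct / n_test
print(n_test, accuracy)
```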

# Cross Validation Score 💯

In [20]:
cross_val_score(pipe, X_train, y_train, cv=5, scoring="accuracy").mean()

0.7753373387176205
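
The mean hides how much the folds disagree. `cross_val_score` returns one score per fold, so the spread can be reported next to the mean. The sketch below uses synthetic data and a standalone random forest as stand-ins for `X_train` and `pipe` (both are assumptions made for illustration).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X_demo, y_demo = make_classification(n_samples=300, n_features=6, random_state=0)

scores = cross_val_score(
    RandomForestClassifier(n_estimators=50, random_state=0),
    X_demo,
    y_demo,
    cv=5,
    scoring="accuracy",
)

print(scores)                      # one accuracy value per fold
print(scores.mean(), scores.std())
```

A gap between a single hold-out score and the cross-validated mean, as seen in this notebook, is often within the fold-to-fold spread on a dataset of this size.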

In [21]:
params = {
    'trf5__max_depth':[1,2,3,4,5,None]
}

# Grid Search CV 🚀

In [22]:
grid = GridSearchCV(pipe, params, cv=5, scoring="accuracy")
grid.fit(X_train, y_train)

In [23]:
grid.best_score_

0.7921205555008372

In [24]:
grid.best_params_

{'trf5__max_depth': 4}
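
`GridSearchCV` works on clones of the estimator it is given, so the original pipeline object keeps its own parameters; the tuned model is available as `best_estimator_`, which is refit on the full training data by default. The sketch below illustrates this with a two-step stand-in pipeline on synthetic data (the step names `scale` and `model` are hypothetical, not the notebook's).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X_demo, y_demo = make_classification(n_samples=300, n_features=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

demo_pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", RandomForestClassifier(n_estimators=50, random_state=0)),
])

demo_grid = GridSearchCV(
    demo_pipe,
    {"model__max_depth": [1, 2, 3, 4, 5, None]},
    cv=5,
    scoring="accuracy",
)
demo_grid.fit(X_tr, y_tr)

# The tuned, refit pipeline lives on the search object
best_model = demo_grid.best_estimator_
test_accuracy = best_model.score(X_te, y_te)

print(demo_grid.best_params_, test_accuracy)
# The original pipeline still has its default max_depth
print(demo_pipe.get_params()["model__max_depth"])
```

Applied to this notebook, that would mean evaluating or saving `grid.best_estimator_` if the tuned `max_depth` is the model you want to keep.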

# Pickle ❕

In [27]:
# The context manager closes the file once the pipeline has been written
with open("pipe_data.pkl", "wb") as f:
    pickle.dump(pipe, f)
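
A quick round trip is a cheap way to confirm that a saved model loads and predicts the same way. The sketch below does this with a small stand-in model in a temporary directory (the file name `demo_model.pkl` and the synthetic data are assumptions for illustration). Note that a pickle should be loaded with the same scikit-learn version that created it.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X_demo, y_demo = make_classification(n_samples=100, n_features=4, random_state=0)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_model.pkl")

    # Save, then load back from disk
    with open(path, "wb") as f:
        pickle.dump(model, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored model should reproduce the original predictions exactly
same_predictions = np.array_equal(model.predict(X_demo), restored.predict(X_demo))
print(same_predictions)
```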

<p style="background-color:#00407A;font-family:newtimeroman;color:#FFF9ED;font-size:100%;text-align:center;border-radius:10px 10px;">
    Model Deployment
</p>

# Streamlit Library 💡

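- Save the script below as a Python file, for example `file_name.py`.
- Install Streamlit from your command prompt: `pip install streamlit`.
- Keep `pipe_data.pkl` in the folder you run the command from, since the script loads it by relative path.
- Start the app with `streamlit run file_name.py`.
- Make amazing predictions!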

In [26]:
'''
import streamlit as st
import pickle
import pandas as pd


# Load the fitted pipeline saved by the notebook
with open("pipe_data.pkl", "rb") as f:
    pipeline = pickle.load(f)


def predict_survival(pclass, sex, age, fare, embarked, family_size):
    # Create a DataFrame with the user input
    data = pd.DataFrame({
        'Pclass': [pclass],
        'Sex': [sex],
        'Age': [age],
        'Fare': [fare],
        'Embarked': [embarked],
        'FamilySize': [family_size]
    })
    
    # Make predictions using the loaded pipeline
    predictions = pipeline.predict(data)
    
    return predictions[0]


def main():
    # Set the title and description of the web app
    st.title('Titanic Survival Prediction')
    st.write('Enter the passenger details to predict survival.')
    
    # Get user input using Streamlit input components
    pclass = st.selectbox('Pclass', [1, 2, 3])
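    # Labels must match the training data exactly ('male'/'female', lowercase);
    # an unseen label such as 'Male' would be silently encoded like 'female'
    sex = st.radio('Sex', ['male', 'female'])
    age = st.number_input('Age', min_value=0.0)
    fare = st.selectbox('Fare', ['Low', 'Medium', 'High', 'Very High'])
    embarked = st.selectbox('Embarked', ['C', 'Q', 'S'])
    # Integer input: family size counts the passenger, so it starts at 1
    family_size = st.number_input('Family Size', min_value=1, step=1)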
    
    # Make predictions on user input
    if st.button('Predict'):
        result = predict_survival(pclass, sex, age, fare, embarked, family_size)
        st.write(f'The predicted survival is: {result}')

if __name__ == '__main__':
    main()
'''

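
Because the pipeline addresses columns by position and the encoder only knows the category spellings seen in training, the single-row frame passed to `predict` has to match the training layout exactly. The helper below is a hypothetical sketch (the name `build_input_row` and its checks are not part of the original app) that normalizes and validates the inputs before building that frame; it needs only pandas, so it can be tested without Streamlit.

```python
import pandas as pd

# Column order must match X in the notebook: the pipeline indexes by position
FEATURE_COLUMNS = ["Pclass", "Sex", "Age", "Fare", "Embarked", "FamilySize"]
FARE_BANDS = ["Low", "Medium", "High", "Very High"]
PORTS = ["C", "Q", "S"]


def build_input_row(pclass, sex, age, fare, embarked, family_size):
    """Return a one-row DataFrame shaped like the training features."""
    sex = str(sex).strip().lower()  # training data uses 'male' / 'female'
    if sex not in ("male", "female"):
        raise ValueError(f"unexpected sex: {sex!r}")
    if int(pclass) not in (1, 2, 3):
        raise ValueError(f"unexpected class: {pclass!r}")
    if fare not in FARE_BANDS:
        raise ValueError(f"unexpected fare band: {fare!r}")
    if embarked not in PORTS:
        raise ValueError(f"unexpected port: {embarked!r}")
    if int(family_size) < 1:
        raise ValueError("family size counts the passenger, so it is at least 1")

    row = [[int(pclass), sex, float(age), fare, embarked, int(family_size)]]
    return pd.DataFrame(row, columns=FEATURE_COLUMNS)


example = build_input_row(3, "Male", 22, "Low", "S", 2)
print(example)
```

In the app, `predict_survival` could call a helper like this instead of building the frame inline, so that bad input raises an error rather than producing a misleading prediction.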

<p style="background-color:#00407A;font-family:newtimeroman;color:#FFF9ED;font-size:100%;text-align:center;border-radius:10px 10px;">
    Conclusion
</p>

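This notebook built an end-to-end scikit-learn pipeline for Titanic survival prediction: imputation, one-hot encoding, scaling, recursive feature elimination, and a random forest classifier. The fitted pipeline reached about 0.82 accuracy on the held-out split and about 0.78 in 5-fold cross-validation, and a grid search over `max_depth` reached about 0.79 with `max_depth=4`. Because the split and the models are not seeded, these figures will vary slightly between runs.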

If you found my work valuable, please consider giving it an upvote. Your support is greatly appreciated and encourages me to continue creating valuable content.

Thank you for your time!