<a href="https://www.kaggle.com/code/zeeshanahmadyar/survival-prediction-on-the-titanic-dataset?scriptVersionId=294493827" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# **🚢 Survival Prediction on the Titanic Dataset**

This notebook is my easy-to-follow solution to the **Titanic: Machine Learning from Disaster** challenge on Kaggle.

I walk through:

* Data loading and basic exploration
* Handling missing values and preprocessing
* Training a classification model
* Evaluating model performance with accuracy on held-out data

The goal of this notebook is to help beginners and **intermediate learners** understand how to build and evaluate a simple ML model step by step.

Feedback and **suggestions** are very welcome!
If you find this notebook useful, an **upvote** ⭐ will help it reach more learners.

# **Importing Libraries**

**📦 Importing Required Libraries**

In this section, we import all the necessary **Python libraries** used for:

* Data manipulation
* Data visualization
* Machine learning model building

Using the right libraries helps keep the workflow clean, readable, and efficient.

In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import pandas as pd
import numpy as np
import seaborn as sns 
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

import warnings
warnings.filterwarnings('ignore')


# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/titanic/train.csv
/kaggle/input/titanic/test.csv
/kaggle/input/titanic/gender_submission.csv


# **Loading the Dataset**
**📂 Loading the Titanic Dataset**

Here, we load the Titanic dataset provided by Kaggle.

The dataset contains passenger information such as:

* Age
* Gender
* Passenger class
* Fare

The target variable is **Survived** (0 = did not survive, 1 = survived), which makes this a binary classification problem.

# **Load Dataset**

In [2]:
df_train = pd.read_csv('/kaggle/input/titanic/train.csv')
df_test = pd.read_csv('/kaggle/input/titanic/test.csv')

In [3]:
df_train.head()

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |


In [4]:
df_train.shape

(891, 12)

In [5]:
df_test.head()

| | PassengerId | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 892 | 3 | Kelly, Mr. James | male | 34.5 | 0 | 0 | 330911 | 7.8292 | NaN | Q |
| 1 | 893 | 3 | Wilkes, Mrs. James (Ellen Needs) | female | 47.0 | 1 | 0 | 363272 | 7.0 | NaN | S |
| 2 | 894 | 2 | Myles, Mr. Thomas Francis | male | 62.0 | 0 | 0 | 240276 | 9.6875 | NaN | Q |
| 3 | 895 | 3 | Wirz, Mr. Albert | male | 27.0 | 0 | 0 | 315154 | 8.6625 | NaN | S |
| 4 | 896 | 3 | Hirvonen, Mrs. Alexander (Helga E Lindqvist) | female | 22.0 | 1 | 1 | 3101298 | 12.2875 | NaN | S |


In [6]:
df_test.shape

(418, 11)

# **Data Preprocessing**
**🧹 Data Preprocessing**

Raw data is rarely ready for machine learning.

In this step, we:
* Handle missing values
* Encode categorical variables
* Prepare features for model training

Proper preprocessing improves model performance and stability.
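
Before filling anything in, it helps to see how much is actually missing. The snippet below is a minimal sketch of such a check. It uses a small hand-made frame named `toy` (an assumption for illustration, not the real `df_train`), so the numbers it prints say nothing about the Titanic data itself.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for df_train; the values are illustrative only.
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0, np.nan],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
    "Embarked": ["S", "C", None, "S", "S"],
})

# Count and share of missing values per column, largest first.
missing = pd.DataFrame({
    "n_missing": toy.isnull().sum(),
    "share": toy.isnull().mean(),
}).sort_values("n_missing", ascending=False)

print(missing)
```

Running the same two lines on `df_train` and `df_test` shows which columns need imputation and which, if any, are too sparse to keep.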

# **Fill Missing Values from Training Data**

In [7]:
# Drop the Cabin and Name columns (neither is used as a feature)
df_train.drop(['Cabin', 'Name'], inplace=True, axis=1)
df_test.drop(['Cabin', 'Name'], inplace=True, axis=1)

In [8]:
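# Fill missing values using statistics computed on the training data only.
# Assign the result back instead of calling fillna(inplace=True) on a column,
# a pattern that recent pandas versions deprecate.
age_median = df_train["Age"].median()
df_train["Age"] = df_train["Age"].fillna(age_median)
df_test["Age"] = df_test["Age"].fillna(age_median)

# Fare is only missing in the test set; reuse the training median
df_test["Fare"] = df_test["Fare"].fillna(df_train["Fare"].median())

# Embarked: fill with the most frequent port in the training set
embarked_mode = df_train["Embarked"].mode()[0]
df_train["Embarked"] = df_train["Embarked"].fillna(embarked_mode)
df_test["Embarked"] = df_test["Embarked"].fillna(embarked_mode)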
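
As an alternative to manual `fillna` calls, scikit-learn's `SimpleImputer` can learn the fill values from the training set and reuse them on the test set. This is only a sketch of that idea, not what the notebook runs. The `train` and `test` frames below are tiny made-up stand-ins for `df_train` and `df_test`.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Made-up stand-ins for df_train and df_test (illustrative values only).
train = pd.DataFrame({"Age": [22.0, 38.0, np.nan, 35.0],
                      "Fare": [7.25, 71.28, 7.93, 53.10]})
test = pd.DataFrame({"Age": [np.nan, 47.0],
                     "Fare": [7.83, np.nan]})

num_cols = ["Age", "Fare"]
imputer = SimpleImputer(strategy="median")

# fit_transform learns the medians from the training rows only...
train[num_cols] = imputer.fit_transform(train[num_cols])
# ...and transform reuses those same medians on the test rows.
test[num_cols] = imputer.transform(test[num_cols])

print(dict(zip(num_cols, imputer.statistics_)))
print(test)
```

Because the medians come from the training rows alone, no information from the test set leaks into preprocessing.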

In [9]:
df_train.isnull().sum()

PassengerId    0
Survived       0
Pclass         0
Sex            0
Age            0
SibSp          0
Parch          0
Ticket         0
Fare           0
Embarked       0
dtype: int64

In [10]:
df_train.head()

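| | PassengerId | Survived | Pclass | Sex | Age | SibSp | Parch | Ticket | Fare | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | S |
| 1 | 2 | 1 | 1 | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C |
| 2 | 3 | 1 | 3 | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | S |
| 3 | 4 | 1 | 1 | female | 35.0 | 1 | 0 | 113803 | 53.1 | S |
| 4 | 5 | 0 | 3 | male | 35.0 | 0 | 0 | 373450 | 8.05 | S |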


# **Convert Sex and Embarked to numeric (Label Encoding)**

In [11]:
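le = LabelEncoder()

# Fit on the training column, then reuse the learned mapping on the test column
df_train["Sex"] = le.fit_transform(df_train["Sex"])
df_test["Sex"] = le.transform(df_test["Sex"])

# Refitting the same encoder replaces the Sex mapping with the Embarked mapping
df_train["Embarked"] = le.fit_transform(df_train["Embarked"])
df_test["Embarked"] = le.transform(df_test["Embarked"])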
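
To see which integer each category received, we can inspect the encoder's `classes_` attribute. `LabelEncoder` sorts the labels, so the codes follow alphabetical order. The sketch below fits two separate encoders on short example lists (the names `sex_encoder` and `embarked_encoder` are illustrative and do not appear in the notebook).

```python
from sklearn.preprocessing import LabelEncoder

# Separate encoders so both mappings stay available after fitting.
sex_encoder = LabelEncoder().fit(["male", "female", "female", "male"])
embarked_encoder = LabelEncoder().fit(["S", "C", "S", "Q"])

# classes_ holds the sorted labels; a label's position is its integer code.
sex_mapping = {str(label): code for code, label in enumerate(sex_encoder.classes_)}
embarked_mapping = {str(label): code for code, label in enumerate(embarked_encoder.classes_)}

print(sex_mapping)       # {'female': 0, 'male': 1}
print(embarked_mapping)  # {'C': 0, 'Q': 1, 'S': 2}
```

These mappings match the encoded values shown in the test-set preview, where male passengers appear as 1 and passengers who embarked at Q appear as 1.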

In [12]:
df_test.head()

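| | PassengerId | Pclass | Sex | Age | SibSp | Parch | Ticket | Fare | Embarked |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 892 | 3 | 1 | 34.5 | 0 | 0 | 330911 | 7.8292 | 1 |
| 1 | 893 | 3 | 0 | 47.0 | 1 | 0 | 363272 | 7.0 | 2 |
| 2 | 894 | 2 | 1 | 62.0 | 0 | 0 | 240276 | 9.6875 | 1 |
| 3 | 895 | 3 | 1 | 27.0 | 0 | 0 | 315154 | 8.6625 | 2 |
| 4 | 896 | 3 | 0 | 22.0 | 1 | 1 | 3101298 | 12.2875 | 2 |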


# **Feature Selection**
**🧠 Feature Selection**

Not all features contribute equally to predictions.

In this section, we select the most relevant **features** that help the model learn **meaningful patterns** related to passenger survival.
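
One simple way to judge whether a feature is worth keeping is to compare survival rates across its values. The sketch below does this for only the five training rows displayed earlier (with the raw, unencoded `Sex` values), so it illustrates the technique and nothing more. Five rows are far too few to draw conclusions. The frame name `head_rows` is an assumption for this example.

```python
import pandas as pd

# The first five training rows shown in the notebook output above.
head_rows = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "Pclass": [3, 1, 3, 1, 3],
    "Sex": ["male", "female", "female", "female", "male"],
})

# Survival rate and row count for each value of a candidate feature.
for column in ["Sex", "Pclass"]:
    rates = head_rows.groupby(column)["Survived"].agg(["mean", "count"])
    print(rates, end="\n\n")

rate_by_sex = head_rows.groupby("Sex")["Survived"].mean()
rate_by_class = head_rows.groupby("Pclass")["Survived"].mean()
```

Applied to the full `df_train`, the same `groupby` gives a quick first look at which features separate survivors from non-survivors.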

# **Select Features and Target**

In [13]:
features = ["Pclass", "Sex", "Age", "Fare", "Embarked"]
X = df_train[features]
y = df_train["Survived"]

# **Train-Test Split**
**🔀 Splitting Data into Train and Test Sets**

The labeled data from train.csv is split into **training and testing** sets, with 20% held out.

This allows us to:

* Train the model on one portion of data
* Evaluate its performance on unseen data

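This step does not prevent **overfitting** on its own, but it makes overfitting visible: a model that scores far better on the training portion than on the held-out portion has memorized rather than generalized. The held-out score is the more **realistic evaluation**.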
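
When the two classes are not equally common, a plain random split can leave the held-out part with a different class ratio than the training part. Passing `stratify` to `train_test_split` keeps the ratio the same in both. The sketch below uses synthetic arrays (`X_demo`, `y_demo`, both made up for this example) rather than the Titanic features.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 100 rows, 40% of them in the positive class.
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([1] * 40 + [0] * 60)

# stratify=y_demo keeps the 40/60 class ratio in both parts of the split.
X_tr, X_val, y_tr, y_val = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=42, stratify=y_demo
)

print("Train positive share:", y_tr.mean())
print("Held-out positive share:", y_val.mean())
```

In this notebook the equivalent change would be adding `stratify=y` to the existing call. That is an optional variation, not something the results reported here depend on.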

# **Train / Test Split (Optional Check)**

In [14]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

# **Model Training**
**🤖 Model Training**

In this section, we train a **machine learning classification** model to predict survival.

The focus here is on:

* Simplicity
* Clear understanding of model behavior

This baseline model helps us evaluate how well our approach works.
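
A single hold-out split gives one accuracy number, which can shift noticeably with a different random seed. k-fold cross-validation averages over several splits and is a common way to get a steadier estimate for a baseline. The sketch below runs on synthetic data from `make_classification`, and uses fewer trees than the notebook's model to keep it fast. Both choices are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for X and y; swap in the real features and target.
X_demo, y_demo = make_classification(
    n_samples=200, n_features=5, n_informative=3, random_state=42
)

# 50 trees instead of the notebook's 200, purely to keep the example quick.
cv_model = RandomForestClassifier(n_estimators=50, random_state=42)

# Five folds: each fold is held out once while the other four train the model.
scores = cross_val_score(cv_model, X_demo, y_demo, cv=5, scoring="accuracy")

print("Fold accuracies:", scores.round(3))
print("Mean accuracy:", scores.mean().round(3))
```

The spread between fold accuracies gives a rough sense of how much a single split's score can be trusted.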

# **Train Model**

In [15]:
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

In [16]:
# Accuracy on the 20% held-out split of train.csv (not Kaggle's test.csv)
y_pred_test = model.predict(X_test)
print("Validation Accuracy:", accuracy_score(y_test, y_pred_test))

Validation Accuracy: 0.7877094972067039
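
Accuracy alone hides which kind of mistakes the model makes. A confusion matrix, together with precision and recall, separates missed survivors from false alarms. The sketch below uses ten hand-written labels (`y_true_demo` and `y_pred_demo` are made up, not the notebook's predictions) to show the calls involved.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

# Hand-written example labels: 1 = survived, 0 = did not survive.
y_true_demo = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred_demo = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true_demo, y_pred_demo)
acc = accuracy_score(y_true_demo, y_pred_demo)
prec = precision_score(y_true_demo, y_pred_demo)  # TP / (TP + FP)
rec = recall_score(y_true_demo, y_pred_demo)      # TP / (TP + FN)

print(cm)
print("Accuracy:", acc, "Precision:", prec, "Recall:", rec)
```

With the notebook's own variables, the same functions can be called as `confusion_matrix(y_test, y_pred_test)` and so on.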


# **Make Predictions on Test.csv**

In [17]:
test_predictions = model.predict(df_test[features])

# **Create Submission File**

In [18]:
submission = pd.DataFrame({
    "PassengerId": df_test["PassengerId"],
    "Survived": test_predictions
})

In [19]:
# Save file
submission.to_csv("submission.csv", index=False)
print("Submission file created successfully!")

Submission file created successfully!
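
Before uploading, a quick sanity check on the submission frame can catch formatting mistakes. The helper below, `check_submission`, is a hypothetical function written for this example. It checks for the two expected columns, 418 rows (the size of the test set shown earlier), unique passenger IDs, and 0/1 predictions.

```python
import pandas as pd

def check_submission(sub, expected_rows=418):
    """Return a list of problems found in a submission frame (empty if none)."""
    problems = []
    if list(sub.columns) != ["PassengerId", "Survived"]:
        problems.append(f"unexpected columns: {list(sub.columns)}")
        return problems  # the remaining checks need both columns
    if len(sub) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(sub)}")
    if sub["PassengerId"].duplicated().any():
        problems.append("duplicate PassengerId values")
    if not sub["Survived"].isin([0, 1]).all():
        problems.append("Survived must contain only 0 or 1")
    return problems

# Demo frame with placeholder predictions (all zeros), not real model output.
demo = pd.DataFrame({"PassengerId": range(892, 892 + 418), "Survived": [0] * 418})
print(check_submission(demo))
```

Calling `check_submission(submission)` on the frame built above should return an empty list if the file is well-formed.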


# **🏁 Conclusion**

This notebook demonstrated a complete **machine learning** workflow using the **Titanic dataset**, from data loading to **model evaluation**.

Key takeaways:

* Data understanding and preprocessing are essential
* A simple Random Forest on five features reached about 78.8% accuracy on the held-out split
* Proper evaluation helps build trustworthy predictions

This project focuses on clarity, learning, and practical understanding, making it useful for anyone starting their journey in machine learning.

Feedback and suggestions are always welcome.
If you found this notebook helpful, consider giving it an **upvote** ⭐ to support and share knowledge with others.