# Titanic Dataset - Exploratory Data Analysis

In this notebook I explore the Titanic dataset from Kaggle to understand passenger demographics, survival rates, and the relationships between features.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## Loading the Dataset
Load the Titanic dataset into a pandas DataFrame.

In [None]:
# Loading and looking at the dataset

df = pd.read_csv("Titanic-Dataset.csv")

print("Shape:", df.shape)  # df.shape returns (rows, columns)
display(df.head())         # df.head() returns the first 5 rows
print("\nColumns:", df.columns.tolist())

## Handling Missing Data
Here I check which columns have missing values and decide how to handle each one.

In [None]:
# count the missing values in each column
df.isna().sum()
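Raw counts are easier to judge as a share of all rows, which is what the drop-or-impute decision below really depends on. A minimal sketch, using a small made-up DataFrame in place of the real data, that puts the count and percentage of missing values side by side:

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for the Titanic data (not the Kaggle values)
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0, np.nan],
    "Cabin": [np.nan, "C85", np.nan, np.nan, np.nan],
    "Embarked": ["S", "C", "S", None, "S"],
})

# isna().sum() counts missing cells; isna().mean() gives the missing fraction per column
missing = pd.DataFrame({
    "n_missing": toy.isna().sum(),
    "pct_missing": (toy.isna().mean() * 100).round(1),
}).sort_values("pct_missing", ascending=False)

print(missing)
```

On the real data the same two lines work unchanged with `df` in place of `toy`.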

In [None]:
df['Embarked'].unique()

In [None]:
df[df['Embarked'].isna()]

In [None]:
df['Embarked'].mode()

In [None]:
# fill the missing ports with the most common one (the mode found above)
df['Embarked'] = df['Embarked'].fillna(df['Embarked'].mode()[0])
df['Embarked'].isnull().sum()               # check that none are left
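`mode()` returns a Series rather than a single value, because several values can tie for most common. A small sketch on made-up port codes showing how to take the first mode and assign the filled column back. Assigning back avoids calling `fillna(..., inplace=True)` on a selected column, a chained-assignment pattern that recent pandas versions warn about:

```python
import pandas as pd

# Made-up port codes with one missing entry (not the real Embarked column)
toy = pd.DataFrame({"Embarked": ["S", "C", "S", None, "Q", "S"]})

modes = toy["Embarked"].mode()  # a Series; holds several values if there is a tie
most_common = modes[0]          # ties come back in sorted order, so this picks the first

# Assign the result back instead of toy["Embarked"].fillna(..., inplace=True)
toy["Embarked"] = toy["Embarked"].fillna(most_common)

print(most_common)                   # S
print(toy["Embarked"].isna().sum())  # 0
```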

In [None]:
df['Age'].isnull().sum()    # check the missing values for Age

In [None]:
df['Age'] = df['Age'].fillna(df['Age'].median())   # fill missing ages with the median age
df['Age'].isnull().sum()                           # check that none are left
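One common reason to fill `Age` with the median rather than the mean is that the median is barely moved by a few unusually large values. A quick comparison on made-up ages (not the real column):

```python
import numpy as np
import pandas as pd

# Made-up ages: one elderly passenger and one missing value
ages = pd.Series([2.0, 20.0, 22.0, 24.0, 26.0, 80.0, np.nan])

print("mean:  ", ages.mean())    # 29.0, pulled up by the 80 (NaN is skipped)
print("median:", ages.median())  # 23.0, the middle of the observed values

filled = ages.fillna(ages.median())
print(filled.tolist())
```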

In [None]:
df.drop(columns=['Cabin'], inplace=True)    # Cabin is mostly missing, so drop it rather than impute it
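The same "mostly missing" reasoning can be written as a rule instead of a hand-picked column name. A sketch with a hypothetical helper, `drop_sparse_columns`, and an assumed cut-off of 50% missing; the notebook itself simply drops `Cabin` by name:

```python
import numpy as np
import pandas as pd

def drop_sparse_columns(frame: pd.DataFrame, max_missing: float = 0.5) -> pd.DataFrame:
    """Hypothetical helper: return a copy without columns whose missing fraction exceeds max_missing."""
    missing_frac = frame.isna().mean()
    to_drop = missing_frac[missing_frac > max_missing].index
    return frame.drop(columns=to_drop)

# Made-up rows: Cabin is 75% missing, Age 25%
toy = pd.DataFrame({
    "Age": [22.0, 38.0, np.nan, 35.0],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Fare": [7.25, 71.28, 7.93, 53.10],
})

trimmed = drop_sparse_columns(toy)
print(trimmed.columns.tolist())  # ['Age', 'Fare']
```

The helper returns a new frame rather than modifying its input, so rerunning the cell does not raise a `KeyError` the way a second `drop(columns=['Cabin'], inplace=True)` would.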

In [None]:
df.describe(include='object')   # summary of the text (categorical) columns
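For text columns, `describe` reports different statistics than for numbers: the non-null count, the number of distinct values (`unique`), the most frequent value (`top`), and how often it occurs (`freq`). Illustrated on a tiny made-up frame; numeric columns such as `Fare` are left out of the result:

```python
import pandas as pd

# Made-up rows, not the Kaggle data
toy = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male"],
    "Embarked": ["S", "C", "S", "S", "Q"],
    "Fare": [7.25, 71.28, 7.93, 53.10, 8.05],
})

summary = toy.describe(include="object")
print(summary)
```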

In [None]:
df.groupby('Sex')['Survived'].mean()    # survival by gender

In [None]:
df.groupby('Pclass')['Survived'].mean()     # survival by passenger class
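Because `Survived` is coded 0/1, the mean of each group is the proportion of that group who survived. Grouping by both columns at once and unstacking gives a class-by-sex table; a sketch on made-up rows:

```python
import pandas as pd

# Made-up passengers, not the Kaggle data
toy = pd.DataFrame({
    "Sex":      ["female", "female", "male", "male", "male", "female"],
    "Pclass":   [1, 3, 1, 3, 3, 1],
    "Survived": [1, 1, 0, 1, 0, 1],
})

# Mean of a 0/1 column = fraction of 1s = survival rate
by_sex = toy.groupby("Sex")["Survived"].mean()

# Two-way breakdown: rows are classes, columns are sexes
by_class_and_sex = toy.groupby(["Pclass", "Sex"])["Survived"].mean().unstack()

print(by_sex)
print(by_class_and_sex)
```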

## Survival Analysis
I visualize how survival varies by gender and passenger class using bar plots, then check the correlations between the numeric columns with a heatmap.

In [None]:
sns.barplot(x='Sex', y='Survived', data=df)
plt.title("Survival Rate by Gender")
plt.show()
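In these bar plots the bar height is the group mean and the line on top of each bar is, by default, a 95% confidence interval that seaborn estimates by bootstrapping. A minimal percentile-bootstrap sketch for one group's survival rate, using made-up outcomes; the helper `bootstrap_ci` is hypothetical, and seaborn's own implementation differs in details such as seeding:

```python
import numpy as np

def bootstrap_ci(values, n_boot=1000, ci=95, seed=0):
    """Hypothetical helper: percentile bootstrap confidence interval for the mean."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample the data with replacement n_boot times and record each resample's mean
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    # Keep the central ci% of the bootstrap means
    tail = (100 - ci) / 2
    low, high = np.percentile(boot_means, [tail, 100 - tail])
    return values.mean(), low, high

# Made-up 0/1 survival outcomes for a single group
outcomes = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
mean, low, high = bootstrap_ci(outcomes)
print(f"survival rate {mean:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

With only ten outcomes the interval is wide; groups with many passengers get much shorter error bars.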

In [None]:
sns.barplot(x='Pclass', y='Survived', data=df)
plt.title("Survival Rate by Passenger Class")
plt.show()

In [None]:
plt.figure(figsize=(8,6))
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')
plt.title("Correlation Heatmap")
plt.show()
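`corr(numeric_only=True)` skips text columns, so `Sex` does not appear in this heatmap even though the bar plots above look at it. One way to include it, sketched on made-up data, is to map it to numbers first; the 0 = male, 1 = female coding is an arbitrary choice and only flips the sign of its correlations:

```python
import pandas as pd

# Made-up passengers, not the Kaggle data
toy = pd.DataFrame({
    "Sex":      ["female", "male", "female", "male", "male", "female"],
    "Pclass":   [1, 3, 2, 3, 1, 3],
    "Survived": [1, 0, 1, 0, 1, 0],
})

print(toy.corr(numeric_only=True).columns.tolist())  # Sex is missing

# Encode Sex as 0/1 so it takes part in the correlation matrix
encoded = toy.assign(Sex=toy["Sex"].map({"male": 0, "female": 1}))
corr = encoded.corr()
print(corr.round(2))
```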