A complete data exploration project using Python, Pandas, Seaborn, and Matplotlib to analyze the Titanic dataset.
This project is part of my Data Analytics Internship at Elevvo Pathways.
The goal was to perform Exploratory Data Analysis (EDA) on the classic Titanic dataset: uncover survival patterns, calculate key metrics, and present the insights through clear visualizations.
- Handled missing values:
  - Age → filled with the median
  - Cabin → dropped due to its high missing rate
  - Embarked → filled with the mode
- Corrected data types and ensured consistency across features.
- Calculated key metrics (KPIs):
  - Overall survival rate
  - Survival rate by gender
  - Survival rate by passenger class
  - Survival rate by embarkation port
  - Average age by survival
- Derived group-based insights to understand survival patterns.
- Created charts to visualize the key insights:
  - Bar plots of survival by gender and by passenger class
  - Horizontal bar plot for a clearer class comparison
  - Stacked histogram with KDE of the age distribution by survival
  - Clustered bar plot (catplot) of survival by class and gender
  - Correlation heatmap of relationships between numeric features
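The cleaning steps and KPIs listed above can be sketched in Pandas roughly as follows. This is a minimal sketch, not the project's actual notebook: it assumes the column names of Kaggle's `train.csv` (`Survived`, `Pclass`, `Sex`, `Age`, `Cabin`, `Embarked`), uses a few made-up rows in place of the real file, and the helper names `clean_titanic` and `survival_kpis` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows using the Kaggle train.csv column names; not real passengers.
raw = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 0, 0, 1, 1],
    "Pclass":   [3, 1, 3, 1, 3, 2, 2, 1],
    "Sex":      ["male", "female", "female", "female", "male", "male", "female", "male"],
    "Age":      [22.0, 38.0, np.nan, 35.0, np.nan, 54.0, 14.0, 40.0],
    "Cabin":    [np.nan, "C85", np.nan, "C123", np.nan, np.nan, np.nan, "E46"],
    "Embarked": ["S", "C", "S", "S", "Q", "S", np.nan, "S"],
})


def clean_titanic(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the missing-value rules (median Age, drop Cabin, mode Embarked) to a copy."""
    out = df.copy()
    out["Age"] = out["Age"].fillna(out["Age"].median())                   # median is robust to outliers
    out = out.drop(columns=["Cabin"])                                     # mostly empty column
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])  # most frequent port
    # Store low-cardinality text columns as categoricals for consistent dtypes.
    for col in ("Sex", "Embarked"):
        out[col] = out[col].astype("category")
    return out


def survival_kpis(df: pd.DataFrame) -> dict:
    """Survival rates are means of the 0/1 Survived flag within each group."""
    return {
        "overall": df["Survived"].mean(),
        "by_sex": df.groupby("Sex", observed=True)["Survived"].mean(),
        "by_class": df.groupby("Pclass")["Survived"].mean(),
        "by_port": df.groupby("Embarked", observed=True)["Survived"].mean(),
        "avg_age_by_survival": df.groupby("Survived")["Age"].mean(),
    }


print(raw.isna().mean().round(3))  # share of missing values per column (Cabin is highest here)
clean = clean_titanic(raw)
kpis = survival_kpis(clean)
print(f"Overall survival rate: {kpis['overall']:.1%}")  # Overall survival rate: 50.0%
print((kpis["by_sex"] * 100).round(1))                  # survival rate by gender, in percent
```

Because `Survived` is coded 0/1, the mean of any group is exactly that group's survival rate, so every rate KPI reduces to a single `groupby(...).mean()`.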
Project objectives:
- Transform raw data into actionable insights
- Communicate patterns and relationships effectively through KPIs and visualizations
- Practice Python, Pandas, Seaborn, and Matplotlib in a professional data analytics workflow
Tools used:
- Python
- Pandas
- NumPy
- Seaborn
- Matplotlib
The dataset is "Titanic: Machine Learning from Disaster" from Kaggle:
🔗 https://www.kaggle.com/c/titanic
Sayed Esmail – Data Analytics Intern at Elevvo Pathways