# Titanic EDA

This notebook runs an exploratory data analysis (EDA) on the Titanic dataset (`train.csv`).

**Generated outputs:** plots and summary observations.


In [None]:
import pandas as pd
train = pd.read_csv(r'/mnt/data/train.csv')
train.shape


In [None]:
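from IPython.display import display

# A cell only auto-displays its last expression, so show each result explicitly.
display(train.head())                   # first five rows
train.info()                            # dtypes and non-null counts (prints directly)
display(train.describe(include='all'))  # summary statistics for every column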


## Quick observations
- Train dataset shape: (891, 12)
- Columns: PassengerId, Survived, Pclass, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, Embarked
- Top missing values: Cabin (687), Age (177), Embarked (2)
- Survival rate: 38.38% (342/891)
- Females have a notably higher survival rate than males (see survival by Sex bar chart).
- Passengers in higher classes (lower Pclass number) generally paid higher fares, and age distributions differ by class (see boxplots).
- The correlation matrix shows 'Pclass' negatively correlated with 'Fare' (Pclass is ordinal with 1 = highest class, so a lower number goes with a higher fare), and 'Age' has low correlation with 'Survived'.
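
The headline numbers above (shape, missing counts, overall survival rate, survival by sex) can be recomputed with a few pandas calls. The sketch below is one possible way to do it; the helper name `quick_summary` and the four-row `toy` frame are made up for illustration and are not real Titanic records. In the notebook you would pass `train` instead.

```python
import pandas as pd


def quick_summary(df: pd.DataFrame) -> dict:
    """Recompute the headline EDA numbers for a Titanic-style frame (hypothetical helper)."""
    # Count NaNs per column, keep only columns that have any, largest first.
    missing = df.isna().sum()
    missing = missing[missing > 0].sort_values(ascending=False)
    return {
        "shape": df.shape,
        "missing": missing.to_dict(),
        # Survived is 0/1, so its mean is the survival rate.
        "survival_rate": df["Survived"].mean(),
        "survival_by_sex": df.groupby("Sex")["Survived"].mean().to_dict(),
    }


# Toy rows for illustration only, NOT taken from train.csv.
toy = pd.DataFrame({
    "Survived": [1, 0, 1, 0],
    "Sex": ["female", "male", "female", "male"],
    "Age": [22.0, None, 30.0, 41.0],
    "Cabin": [None, None, "C85", None],
})

summary = quick_summary(toy)
print(summary)
```

Calling `quick_summary(train)` in the notebook should give the figures listed above, assuming `train` was loaded as in the first cell.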

In [None]:

import matplotlib.pyplot as plt
corr = train.select_dtypes(include=['number']).corr()
plt.figure(figsize=(6,5))
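# Fix the color scale to [-1, 1] with a diverging colormap so color reflects sign and strength.
plt.imshow(corr, cmap='coolwarm', vmin=-1, vmax=1, interpolation='nearest', aspect='auto')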
plt.colorbar()
plt.xticks(range(len(corr.columns)), corr.columns, rotation=45, ha='right')
plt.yticks(range(len(corr.index)), corr.index)
plt.title("Correlation matrix (numeric columns)")
plt.show()
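
A heatmap makes it hard to read off exact values, so it can help to list each numeric column's correlation with one target column, ordered by strength. The following is a minimal sketch under that assumption; `correlations_with` is a hypothetical helper and the `toy` frame is invented for illustration, not taken from `train.csv`.

```python
import pandas as pd


def correlations_with(df: pd.DataFrame, target: str = "Survived") -> pd.Series:
    """Pearson correlation of every numeric column with `target`, strongest first."""
    corr = df.select_dtypes(include="number").corr()
    # Drop the target's correlation with itself, then rank by absolute value.
    return corr[target].drop(target).sort_values(key=lambda s: s.abs(), ascending=False)


# Toy rows for illustration only, NOT real Titanic records.
toy = pd.DataFrame({
    "Survived": [1, 0, 1, 0, 1, 0],
    "Pclass": [1, 3, 1, 3, 2, 3],
    "Fare": [80.0, 7.0, 60.0, 8.0, 20.0, 9.0],
    "Sex": ["female", "male", "female", "male", "female", "male"],
})

ranked = correlations_with(toy)
print(ranked)
```

In the notebook, `correlations_with(train)` would rank the numeric columns of the real data; non-numeric columns such as Sex are skipped because `select_dtypes` filters them out.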
