# Titanic EDA Assignment

This notebook covers the Exploratory Data Analysis (EDA) steps (a–f).

In [None]:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

sns.set(style="whitegrid")

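# Load the training split; train.csv is expected in the working directory.
train = pd.read_csv("train.csv")
train.info()  # info() prints its summary itself and returns None, so no print() is needed
train.describe()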



### Observations
- The dataset has 891 rows and 12 columns.
- Missing values: Age (177, about 20%), Cabin (687, about 77%), Embarked (2).
- Target column: Survived (0 = No, 1 = Yes).
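
The missing-value counts above can be read off `info()`, but a small helper makes them explicit. The following is a minimal sketch: `missing_summary` is a hypothetical helper name, and the toy frame is made up for illustration rather than taken from `train.csv`.

```python
import pandas as pd

def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count missing values per column and express them as a percentage of rows."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(1),
    })
    # Keep only the columns that actually have gaps, largest gap first.
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)

# Toy frame for illustration only (hypothetical values).
toy = pd.DataFrame({
    "Age": [22.0, None, 26.0, None],
    "Cabin": [None, "C85", None, None],
    "Fare": [7.25, 71.28, 7.92, 8.05],
})
print(missing_summary(toy))
```

In the notebook, the call would be `missing_summary(train)`.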


In [None]:

print("Sex counts:\n", train["Sex"].value_counts())
print("\nPclass counts:\n", train["Pclass"].value_counts())
print("\nEmbarked counts:\n", train["Embarked"].value_counts())



### Observations
- There are more males (577) than females (314).
- Most passengers travelled in 3rd class (491 of 891).
- The majority embarked from port 'S' (644 of the 889 passengers with a recorded port).
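
Raw counts are easier to compare as shares. As a sketch, `value_counts(normalize=True)` gives proportions directly; `category_shares` is a hypothetical helper, and the toy data is invented for illustration.

```python
import pandas as pd

def category_shares(df: pd.DataFrame, column: str) -> pd.Series:
    """Percentage of rows in each category, with missing values kept as their own group."""
    shares = df[column].value_counts(normalize=True, dropna=False)
    return (shares * 100).round(1)

# Toy frame for illustration only (hypothetical values).
toy = pd.DataFrame({"Sex": ["male", "male", "female", "male"]})
shares = category_shares(toy, "Sex")
print(shares)
```

Passing `dropna=False` matters for `Embarked`, where the two missing values would otherwise be left out of the denominator.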


In [None]:

sns.pairplot(train[["Survived","Pclass","Age","SibSp","Parch","Fare"]], hue="Survived")
plt.show()

plt.figure(figsize=(8,6))
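# Correlate numeric columns only; text columns such as Name or Sex make corr() raise in pandas 2.x.
sns.heatmap(train.select_dtypes("number").corr(), annot=True, cmap="coolwarm")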
plt.title("Correlation Heatmap")
plt.show()



### Observations
- Fare is positively correlated with survival.
- Pclass is negatively correlated with survival. Pclass 1 is the highest class, so a lower Pclass value goes with higher survival.
- Younger passengers tended to survive slightly more often, but the correlation between Age and survival is weaker.
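
The heatmap shows every pairwise correlation; to check the observations above, only the `Survived` column is needed. This is a minimal sketch with a hypothetical helper `correlation_with_target` and invented toy data, so the printed values do not describe the real dataset.

```python
import pandas as pd

def correlation_with_target(df: pd.DataFrame, target: str = "Survived") -> pd.Series:
    """Pearson correlation of each numeric column with the target, sorted ascending."""
    numeric = df.select_dtypes("number")  # drop text columns before correlating
    return numeric.corr()[target].drop(target).sort_values()

# Toy frame for illustration only (hypothetical values).
toy = pd.DataFrame({
    "Survived": [1, 1, 0, 0, 1, 0],
    "Pclass": [1, 1, 3, 3, 2, 3],
    "Fare": [80.0, 60.0, 8.0, 7.0, 20.0, 9.0],
    "Name": ["a", "b", "c", "d", "e", "f"],
})
corr_toy = correlation_with_target(toy)
print(corr_toy)
```

Because the result is sorted, negatively correlated columns such as `Pclass` appear first and positively correlated ones such as `Fare` last.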


In [None]:

plt.figure(figsize=(8,5))
sns.histplot(train["Age"].dropna(), bins=30, kde=True)
plt.title("Age Distribution")
plt.show()

plt.figure(figsize=(8,5))
sns.histplot(train["Fare"], bins=40, kde=True)
plt.title("Fare Distribution")
plt.show()



### Observations
- Age is slightly right-skewed; most passengers were between 20 and 40 years old.
- Fare is highly right-skewed: most passengers paid low fares, and a few paid very high fares.
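
Skewness can also be quantified rather than judged from the histogram. The sketch below uses `Series.skew()` and compares the raw values with a `log1p` transform, a common (but here assumed, not prescribed by the assignment) way to tame a long right tail. `skew_report` is a hypothetical helper and the fares are made up.

```python
import numpy as np
import pandas as pd

def skew_report(values: pd.Series) -> dict:
    """Sample skewness before and after a log1p transform (missing values ignored)."""
    clean = values.dropna()
    return {
        "raw": clean.skew(),
        "log1p": np.log1p(clean).skew(),  # log1p handles zero fares safely
    }

# Toy fares for illustration only (hypothetical values).
toy_fares = pd.Series([7.0, 8.0, 8.0, 10.0, 13.0, 26.0, 80.0, 512.0])
report = skew_report(toy_fares)
print(report)
```

A positive value indicates a right tail; in the notebook, `skew_report(train["Fare"])` and `skew_report(train["Age"])` would put numbers on the two observations above.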


In [None]:

sns.boxplot(x="Survived", y="Age", data=train)
plt.title("Age vs Survival")
plt.show()

plt.figure(figsize=(8,6))
sns.scatterplot(x="Age", y="Fare", hue="Survived", data=train)
plt.title("Age vs Fare (colored by Survival)")
plt.show()



### Observations
- Younger passengers had slightly higher survival rates.
- Passengers who paid higher fares were more likely to survive.
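
A boxplot compares age distributions, not survival rates. One way to estimate rates is to bin Age with `pd.cut` and take the mean of `Survived` per bin. This is a sketch only: the bin edges, the labels, and the helper name `survival_by_age_group` are assumptions, and the toy data is invented.

```python
import pandas as pd

def survival_by_age_group(
    df: pd.DataFrame,
    bins=(0, 12, 18, 40, 80),
    labels=("child", "teen", "adult", "senior"),
) -> pd.Series:
    """Mean of Survived per age bin; rows with missing Age are left out."""
    # Bins are right-inclusive: (0, 12], (12, 18], (18, 40], (40, 80].
    age_group = pd.cut(df["Age"], bins=list(bins), labels=list(labels))
    return df.groupby(age_group, observed=True)["Survived"].mean()

# Toy frame for illustration only (hypothetical values).
toy = pd.DataFrame({
    "Age": [4.0, 30.0, 35.0, None, 10.0, 50.0],
    "Survived": [1, 0, 1, 1, 1, 0],
})
rates = survival_by_age_group(toy)
print(rates)
```

The same pattern works for Fare by swapping `pd.cut` for `pd.qcut(df["Fare"], 4)` to compare fare quartiles.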


In [None]:

sns.countplot(x="Sex", hue="Survived", data=train)
plt.title("Survival by Sex")
plt.show()

sns.countplot(x="Pclass", hue="Survived", data=train)
plt.title("Survival by Class")
plt.show()

sns.countplot(x="Embarked", hue="Survived", data=train)
plt.title("Survival by Embarked Port")
plt.show()



### Observations
- Females had a much higher survival rate than males.
- 1st class passengers had the highest survival rate, 3rd class the lowest.
- Passengers who embarked at port C had a higher survival rate than those who embarked at port S.
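
Count plots show absolute numbers, so groups of different size are hard to compare by eye. A grouped mean of `Survived` gives the rate directly. The helper `survival_rate_by` below is hypothetical and the toy data is invented for illustration.

```python
import pandas as pd

def survival_rate_by(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Survival rate and group size for each value of a categorical column."""
    return df.groupby(column)["Survived"].agg(rate="mean", passengers="size")

# Toy frame for illustration only (hypothetical values).
toy = pd.DataFrame({
    "Sex": ["female", "female", "male", "male", "male"],
    "Survived": [1, 1, 0, 1, 0],
})
by_sex = survival_rate_by(toy, "Sex")
print(by_sex)
```

In the notebook, calling it with `"Sex"`, `"Pclass"`, and `"Embarked"` on `train` would back each of the three observations with a rate and a group size.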



# 📌 Summary of Findings
- **Sex**: Strongest factor in this analysis; females survived at a much higher rate than males.
- **Pclass**: Higher classes had higher survival rates (1st > 2nd > 3rd).
- **Fare**: Higher fares were linked with higher survival chances.
- **Age**: Younger passengers, especially children, had slightly higher survival rates.
- **Embarked**: Passengers from port 'C' had a higher survival rate than those from 'S'.
- **Cabin**: Too many missing values to be used directly.
