# 🚢 Titanic Dataset - Exploratory Data Analysis (EDA)
This notebook explores the Titanic training set (`train.csv`) with pandas, matplotlib, and seaborn to identify patterns related to passenger survival.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [None]:
df = pd.read_csv("../data/train.csv")

In [None]:
print("--- INFO ---")
df.info()  # prints its report directly and returns None, so print() would add a stray "None"
print("\n--- DESCRIBE ---")
print(df.describe())
print("\n--- VALUE COUNTS (Survived) ---")
print(df['Survived'].value_counts())

In [None]:
sns.pairplot(df[['Survived', 'Pclass', 'Age', 'Fare']], hue='Survived')
plt.suptitle("Pairplot of Numerical Features", y=1.02)
plt.show()

In [None]:
plt.figure(figsize=(10, 6))
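# numeric_only=True (pandas >= 1.5) skips text columns such as Name and Sex,
# which pandas 2.x would otherwise refuse to correlate
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')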
plt.title("Correlation Heatmap")
plt.show()

In [None]:
sns.countplot(x='Sex', hue='Survived', data=df)
plt.title("Survival by Sex")
plt.show()

sns.countplot(x='Pclass', hue='Survived', data=df)
plt.title("Survival by Passenger Class")
plt.show()

In [None]:
df['Age'].hist(bins=30, edgecolor='black')
plt.title("Age Distribution")
plt.xlabel("Age")
plt.ylabel("Frequency")
plt.show()

In [None]:
sns.boxplot(x='Survived', y='Age', data=df)
plt.title("Age vs Survived")
plt.show()

In [None]:
sns.scatterplot(x='Age', y='Fare', hue='Survived', data=df)
plt.title("Fare vs Age (Colored by Survival)")
plt.show()

## 📌 Summary of Findings
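- **Female** passengers and **1st-class** passengers survived at higher rates than male and lower-class passengers.
- `Fare` is positively correlated with survival: survivors tended to have paid higher fares.
- Survivors tended to be slightly younger, but the age difference is much weaker than the fare and class effects.
- Missing values in `Age` and `Cabin` (and a few in `Embarked`) need handling before modeling.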
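
The findings above can be checked numerically rather than read off the plots. The following is a minimal sketch, assuming `Survived` is coded 0/1 as in the Kaggle file; `survival_rate_by` and `missing_counts` are hypothetical helper names and are not part of the original notebook.

```python
import pandas as pd


def survival_rate_by(frame: pd.DataFrame, column: str) -> pd.Series:
    """Survival rate per group: the mean of the 0/1 `Survived` flag."""
    return frame.groupby(column)["Survived"].mean().sort_values(ascending=False)


def missing_counts(frame: pd.DataFrame) -> pd.Series:
    """Number of missing values per column, keeping only columns with gaps."""
    counts = frame.isna().sum()
    return counts[counts > 0].sort_values(ascending=False)
```

In this notebook, `survival_rate_by(df, 'Sex')` and `survival_rate_by(df, 'Pclass')` would give the per-group survival rates, and `missing_counts(df)` would list the columns that need imputation.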

### 🔄 Next Steps
- Impute missing values
- Create new features like `Title`, `FamilySize`
- Build predictive models (Logistic Regression, Random Forest, etc.)
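
The first two next steps could be sketched as follows. This assumes the `Name`, `SibSp`, and `Parch` columns of the Kaggle `train.csv` (none of which are used earlier in this notebook) and that names follow the "Surname, Title. Given names" pattern. `add_basic_features` is a hypothetical helper, and filling `Age` with the median for each title is only one of several reasonable imputation strategies.

```python
import pandas as pd


def add_basic_features(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with Title and FamilySize added and Age imputed."""
    out = frame.copy()

    # Title: the text between the comma and the first period of the name,
    # e.g. "Braund, Mr. Owen Harris" -> "Mr"
    out["Title"] = out["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()

    # FamilySize: siblings/spouses + parents/children + the passenger
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1

    # Fill missing ages with the median age of passengers sharing the same
    # title, then fall back to the overall median for any remaining gaps
    out["Age"] = out["Age"].fillna(out.groupby("Title")["Age"].transform("median"))
    out["Age"] = out["Age"].fillna(out["Age"].median())
    return out
```

Calling `df = add_basic_features(df)` before modeling would leave the original columns in place and add the two new ones. `Cabin` is left untouched here because most of its values are missing and it likely needs a different treatment.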