# Summary of Analysis on Titanic Dataset

## Titanic Dataset: Exploratory Data Analysis (EDA)

Tasks:
- Inspect data types and missing values
- Clean/transform data (handle missing values, convert types, create features)
- Compute summary statistics and group-based insights (e.g., survival by gender and class)
- Visualize key patterns and correlations with **matplotlib** (no seaborn)
- Bonus: visualize survival rates with bar plots and a correlation heatmap

### 1. Setup
Import libraries and load the dataset.

- **Code executed:** `import os...`

- **Code executed:** `df=pd.read_csv("train.csv")...`

### 2. Data Types & Missing Data
Check column types and missing values. This helps decide cleaning/imputation steps.

- **Code executed:** `print("Data types:")...`

- **Code executed:** `print("\nMissing values per column:")...`

- **Code executed:** `df.describe(include=[np.number])...`
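
The executed cells are truncated above, so the following is only a minimal sketch of this kind of inspection, not the notebook's actual code. The toy rows and the names `missing` and `numeric_summary` are assumptions made for illustration; in the notebook `df` would come from `train.csv`.

```python
import numpy as np
import pandas as pd

# Toy rows for illustration only; these are not real Titanic records.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 3, 2],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, np.nan, 35.0],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
})

# Column types: text columns are candidates for category conversion.
print("Data types:")
print(df.dtypes)

# Missing values per column, as a count and as a share of all rows.
missing = pd.DataFrame({
    "n_missing": df.isna().sum(),
    "share_missing": df.isna().mean().round(2),
}).sort_values("n_missing", ascending=False)
print("\nMissing values per column:")
print(missing)

# Summary statistics restricted to numeric columns.
numeric_summary = df.describe(include=[np.number])
print(numeric_summary)
```

Sorting by the missing count puts the columns that need the most attention first, which helps when choosing between imputing a column and deriving a new feature from it.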

### 3. Data Cleaning & Type Conversion
- **Embarked**: fill missing with mode
- **Age**: fill missing with median
- **Fare**: fill missing with median (if any)
- **Cabin**: derive `Deck` from the first letter; missing cabins become "Unknown"
- Convert `Pclass`, `Sex`, `Embarked`, `Deck` to `category`
- Add `FamilySize` and `IsAlone`

- **Code executed:** `df_clean = df.copy()...`
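
The cleaning steps listed above can be sketched as follows. This is a hedged illustration under stated assumptions rather than the notebook's own cell: the rows are toy data, and the definitions `FamilySize = SibSp + Parch + 1` and `IsAlone = (FamilySize == 1)` are a common convention assumed here, since the source does not spell them out.

```python
import numpy as np
import pandas as pd

# Toy rows for illustration only; these are not real Titanic records.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1],
    "Pclass": [3, 1, 3, 2, 1],
    "Sex": ["male", "female", "female", "male", "female"],
    "Age": [22.0, 38.0, np.nan, 35.0, 54.0],
    "SibSp": [1, 1, 0, 0, 0],
    "Parch": [0, 0, 0, 0, 2],
    "Fare": [7.25, 71.28, 7.92, np.nan, 51.86],
    "Cabin": [np.nan, "C85", np.nan, np.nan, "E46"],
    "Embarked": ["S", "C", "S", np.nan, "S"],
})

df_clean = df.copy()

# Embarked: fill with the most frequent value; Age and Fare: fill with the median.
df_clean["Embarked"] = df_clean["Embarked"].fillna(df_clean["Embarked"].mode()[0])
for col in ["Age", "Fare"]:
    df_clean[col] = df_clean[col].fillna(df_clean[col].median())

# Deck is the first letter of Cabin; missing cabins become "Unknown".
df_clean["Deck"] = df_clean["Cabin"].str[0].fillna("Unknown")

# Assumed definitions: the passenger plus siblings/spouses and parents/children.
df_clean["FamilySize"] = df_clean["SibSp"] + df_clean["Parch"] + 1
df_clean["IsAlone"] = (df_clean["FamilySize"] == 1).astype(int)

# Convert low-cardinality columns to the category dtype.
for col in ["Pclass", "Sex", "Embarked", "Deck"]:
    df_clean[col] = df_clean[col].astype("category")
```

Working on a copy keeps the raw frame available for comparison. Note that once `Pclass` is a category it no longer counts as a numeric column, which matters for any later numeric-only correlation.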

### 4. Summary Statistics & Group-Based Insights

- **Code executed:** `overall_survival_rate = df_clean["Survived"].mean()...`

- **Code executed:** `survival_by_sex = df_clean.groupby("Sex")["Survived"].mean().sort_values(ascending=False)...`

- **Code executed:** `survival_by_pclass = df_clean.groupby("Pclass")["Survived"].mean().sort_index()...`

- **Code executed:** `survival_by_embarked = df_clean.groupby("Embarked")["Survived"].mean().sort_values(ascending=False)...`

- **Code executed:** `survival_pclass_sex = df_clean.pivot_table(values="Survived", index="Pclass", columns="Sex", aggfunc="mean")...`

- **Code executed:** `df_clean["FamilySizeBin"] = pd.cut(df_clean["FamilySize"], bins=[0,1,2,4,7,11], labels=["1","2","3-4","5-7","8-11"], include_lowest=True)...`

- **Code executed:** `#Displaying all above...`
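
Because `Survived` is coded 0/1, the mean of a group is its survival rate, which is why every aggregation above uses `.mean()`. A minimal sketch on toy rows (made up for illustration, not real records) shows the pattern; the name `survival_by_family` and the `observed=True` argument are assumptions added here.

```python
import pandas as pd

# Toy rows for illustration only; these are not real Titanic records.
df_clean = pd.DataFrame({
    "Survived":   [0, 1, 1, 0, 1, 0, 0, 1],
    "Pclass":     [3, 1, 3, 2, 1, 3, 2, 1],
    "Sex":        ["male", "female", "female", "male",
                   "female", "male", "female", "male"],
    "FamilySize": [2, 2, 1, 1, 3, 6, 1, 1],
})

# The mean of a 0/1 column is the share of ones, i.e. the survival rate.
overall_survival_rate = df_clean["Survived"].mean()

# Rate per group, highest first.
survival_by_sex = (
    df_clean.groupby("Sex")["Survived"].mean().sort_values(ascending=False)
)

# Two-way breakdown: one row per class, one column per sex.
survival_pclass_sex = df_clean.pivot_table(
    values="Survived", index="Pclass", columns="Sex", aggfunc="mean"
)

# Bin family size into ranges; include_lowest keeps the left edge of the first bin.
df_clean["FamilySizeBin"] = pd.cut(
    df_clean["FamilySize"],
    bins=[0, 1, 2, 4, 7, 11],
    labels=["1", "2", "3-4", "5-7", "8-11"],
    include_lowest=True,
)

# observed=True drops bins that contain no passengers.
survival_by_family = (
    df_clean.groupby("FamilySizeBin", observed=True)["Survived"].mean()
)

print(f"Overall survival rate: {overall_survival_rate:.2f}")
print(survival_by_sex)
print(survival_pclass_sex)
print(survival_by_family)
```

Rates computed on small groups are noisy, so it can help to report the group counts next to the means before drawing conclusions from a bin.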

### 5. Visualizations (Matplotlib)
One chart per figure, no seaborn, no custom color settings.

#### 5.1 Survival by Sex: Bar Plot

- **Code executed:** `rates = df_clean.groupby("Sex")["Survived"].mean().sort_values(ascending=False)...`
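
A single-series bar plot of this kind might look like the sketch below. The two rates are placeholder numbers chosen for illustration, not results from the dataset; in the notebook `rates` would be the groupby result. The title and axis labels are also assumptions.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder rates for illustration; substitute the groupby result from real data.
rates = pd.Series({"female": 0.75, "male": 0.25}, name="Survived")

# One chart per figure, default colors.
fig, ax = plt.subplots()
bars = ax.bar(rates.index.astype(str), rates.values)
ax.set_title("Survival Rate by Sex")
ax.set_xlabel("Sex")
ax.set_ylabel("Survival rate")
ax.set_ylim(0, 1)  # rates are proportions, so fix the axis to [0, 1]

# Write each rate just above its bar.
for bar, value in zip(bars, rates.values):
    ax.annotate(
        f"{value:.2f}",
        (bar.get_x() + bar.get_width() / 2, value),
        ha="center",
        va="bottom",
    )
fig.tight_layout()
```

In a script, call `plt.show()` afterwards to display the figure; a notebook with inline plotting renders it at the end of the cell. The same pattern applies to the per-class and per-port plots by swapping the grouping column.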

#### 5.2 Survival by Pclass: Bar Plot

- **Code executed:** `rates = df_clean.groupby("Pclass")["Survived"].mean().sort_index()...`

#### 5.3 Survival by Pclass and Sex: Grouped Bars

- **Code executed:** `pivot_rates = df_clean.pivot_table(values="Survived", index="Pclass", columns="Sex", aggfunc="mean").sort_index()...`
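
Plain matplotlib has no built-in grouped bar chart, so the bars for each column have to be offset by hand around each tick. The sketch below shows one way to do that; the values in `pivot_rates` are placeholders for illustration, not results from the dataset, and the labels are assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder rates for illustration; substitute the pivot table from real data.
pivot_rates = pd.DataFrame(
    {"female": [0.9, 0.8, 0.5], "male": [0.4, 0.2, 0.1]},
    index=pd.Index([1, 2, 3], name="Pclass"),
)

n_groups = len(pivot_rates.index)
n_series = len(pivot_rates.columns)
x = np.arange(n_groups)   # one tick position per class
width = 0.8 / n_series    # split 80% of each tick's slot among the series

fig, ax = plt.subplots()
for i, col in enumerate(pivot_rates.columns):
    # Center the cluster of bars on the tick.
    offset = (i - (n_series - 1) / 2) * width
    ax.bar(x + offset, pivot_rates[col].values, width, label=str(col))

ax.set_xticks(x)
ax.set_xticklabels(pivot_rates.index.astype(str))
ax.set_xlabel("Pclass")
ax.set_ylabel("Survival rate")
ax.set_ylim(0, 1)
ax.set_title("Survival Rate by Pclass and Sex")
ax.legend(title="Sex")
fig.tight_layout()
```

Computing the width from the number of columns keeps the layout correct if another category is added to the pivot.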

#### 5.4 Survival by Embarked: Bar Plot

- **Code executed:** `rates = df_clean.groupby("Embarked")["Survived"].mean().sort_values(ascending=False)...`

#### 5.5 Correlation Heatmap (Numeric Features)

- **Code executed:** `import numpy as np...`
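
Without seaborn, a correlation heatmap can be built from `DataFrame.corr()` and `Axes.imshow`. The following is a minimal sketch assuming toy rows (not real records); which columns the notebook actually includes is not shown in the truncated cell, so the column selection here is an assumption.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Toy rows for illustration only; these are not real Titanic records.
df_clean = pd.DataFrame({
    "Survived":   [0, 1, 1, 0, 1],
    "Age":        [22.0, 38.0, 26.0, 35.0, 54.0],
    "Fare":       [7.25, 71.28, 7.92, 8.05, 51.86],
    "FamilySize": [2, 2, 1, 1, 3],
    "Sex":        ["male", "female", "female", "male", "female"],
})

# Keep numeric columns only, then compute pairwise Pearson correlations.
numeric = df_clean.select_dtypes(include=[np.number])
corr = numeric.corr()

fig, ax = plt.subplots()
# Fix the scale to [-1, 1] so the colorbar is comparable across datasets.
im = ax.imshow(corr.values, vmin=-1, vmax=1)
fig.colorbar(im, ax=ax)

ticks = range(len(corr.columns))
ax.set_xticks(ticks)
ax.set_xticklabels(corr.columns, rotation=45, ha="right")
ax.set_yticks(ticks)
ax.set_yticklabels(corr.columns)

# Print each coefficient inside its cell.
for i in ticks:
    for j in ticks:
        ax.text(j, i, f"{corr.iloc[i, j]:.2f}", ha="center", va="center")

ax.set_title("Correlation Heatmap (Numeric Features)")
fig.tight_layout()
```

Columns converted to `category` are excluded by `select_dtypes`, so a feature such as `Pclass` appears in the heatmap only if it is kept numeric or cast back before the correlation is computed.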

### 6. Key Takeaways
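- Female passengers show a higher survival rate than male passengers
- First-class passengers had a higher survival rate than second- and third-class passengers
- Traveling alone (`IsAlone`) is associated with a lower survival rate
- Numeric features show the expected relationships (e.g., `Fare` is related to `Pclass`: higher fares go with higher classes, which are coded with lower numbers)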

