# Titanic Data Analysis Project
This notebook analyzes the Titanic dataset with pandas, Matplotlib, and seaborn to explore survival patterns and answer ten questions about the passengers and their survival.

## Problem Space
**What:** Analyze Titanic passenger data to understand survival patterns.

**Why:** The Titanic disaster is a well-known event and provides a classic dataset for survival analysis and machine learning practice.

**Who:** Stakeholders include historians, data scientists, and enthusiasts interested in survival outcomes.

**Where & When:** The data describe passengers aboard the RMS Titanic, which sank in the North Atlantic in April 1912.

## Understanding the Dataset

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

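# Load the Titanic dataset (CSV hosted on GitHub)
df = pd.read_csv('https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv')
df.info()             # column dtypes and non-null counts (prints directly)
print(df.describe())  # summary statistics; printed because a cell only displays its last expression
df.head()             # first five rows, shown as the cell's final expression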
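`df.info()` reports non-null counts, but a per-column missing-value table is often easier to scan. The following is a minimal sketch of one way to build it; the helper name `missing_summary` and the four sample rows are hypothetical and only mimic the Titanic column names, so the block runs without downloading the CSV.

```python
import pandas as pd

# Hypothetical sample rows that mimic a few Titanic columns; not real passenger data.
sample = pd.DataFrame({
    "Age": [22.0, None, 35.0, None],
    "Cabin": [None, "C85", None, None],
    "Embarked": ["S", "C", "S", None],
    "Fare": [7.25, 71.28, 8.05, 8.46],
})


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the count and share of missing values per column, worst first."""
    counts = frame.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "share": counts / len(frame),
    })
    return summary.sort_values("missing", ascending=False)


report = missing_summary(sample)
print(report)
```

Calling `missing_summary(df)` on the loaded dataset should give the same kind of table for every column.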

## Questions to Address
1. What percentage of passengers survived?
2. How did the survival rate differ by passenger class?
3. How did the survival rate differ by gender?
4. What was the average age of survivors compared with non-survivors?
5. How many passengers traveled alone, and how many traveled with family?
6. Did the port of embarkation affect survival?
7. How were fares distributed within each class?
8. Which names were most common among survivors?
9. What is the relationship between age and survival?
10. How many passengers have missing age data?

## Exploratory Data Analysis

In [None]:
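# Number of passengers who died (0) vs. survived (1)
sns.countplot(x='Survived', data=df)
plt.title('Survival Count')
plt.show()

# Survival counts split by passenger class
sns.countplot(x='Pclass', hue='Survived', data=df)
plt.title('Survival by Class')
plt.show()

# Age histogram; rows with a missing Age are skipped
df['Age'].hist(bins=30)
plt.title('Age Distribution')
plt.xlabel('Age')
plt.ylabel('Passengers')
plt.show()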
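The age histogram shows the overall distribution but not how survival varies with age. One possible way to look at that is to bin ages with `pd.cut` and take the mean of `Survived` per band. This is a sketch only: the band edges, the labels, and the sample rows are illustrative assumptions, not values taken from the dataset.

```python
import pandas as pd

# Hypothetical rows for illustration only; not real passenger data.
sample = pd.DataFrame({
    "Age": [4.0, 10.0, 25.0, 30.0, 45.0, 70.0, None],
    "Survived": [1, 1, 0, 1, 0, 0, 1],
})

# Assumed band edges; intervals are right-inclusive, e.g. (0, 12] is "child".
bins = [0, 12, 18, 40, 60, 120]
labels = ["child", "teen", "adult", "middle-aged", "senior"]

# Rows without an age cannot be assigned to a band, so drop them first.
banded = sample.dropna(subset=["Age"]).copy()
banded["AgeBand"] = pd.cut(banded["Age"], bins=bins, labels=labels)

# observed=True keeps only the bands that actually occur in the data.
rate_by_band = banded.groupby("AgeBand", observed=True)["Survived"].mean()
print(rate_by_band)
```

Replacing `sample` with `df` applies the same banding to the full dataset.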

## Data Analysis & Answers

In [None]:
# 1. Percentage survived
print(f"Survival rate: {df['Survived'].mean():.2%}")

# 2. Survival by class
print(df.groupby('Pclass')['Survived'].mean())

# 3. Survival by gender
print(df.groupby('Sex')['Survived'].mean())

# 4. Average age survivors vs. non-survivors
print(df.groupby('Survived')['Age'].mean())

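# 5. Traveling alone vs. with family
df['FamilySize'] = df['SibSp'] + df['Parch']  # siblings/spouses plus parents/children aboard
df['Alone'] = df['FamilySize'] == 0           # True when no relatives were aboard
print(df['Alone'].value_counts())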

# 6. Survival by embarkation port
print(df.groupby('Embarked')['Survived'].mean())

# 7. Fare distribution by class
print(df.groupby('Pclass')['Fare'].describe())

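# 8. Most common surnames among survivors (names are formatted "Surname, Title. Given names")
survivor_surnames = df.loc[df['Survived'] == 1, 'Name'].str.split(',').str[0].str.strip()
print(survivor_surnames.value_counts().head())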

# 9. Age and survival
sns.boxplot(x='Survived', y='Age', data=df)
plt.title('Age vs. Survival')
plt.show()

# 10. Missing age data
print(f"Missing age entries: {df['Age'].isnull().sum()}")
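Questions 2 and 3 look at class and gender separately. A pivot table can show both at once, which helps when the two factors interact. The sketch below uses `DataFrame.pivot_table` on hypothetical rows (not real passenger records) so that it runs on its own.

```python
import pandas as pd

# Hypothetical rows, not real passenger records.
sample = pd.DataFrame({
    "Sex": ["female", "female", "female", "female", "male", "male", "male", "male"],
    "Pclass": [1, 1, 3, 3, 1, 1, 3, 3],
    "Survived": [1, 1, 1, 0, 1, 0, 0, 0],
})

# Mean of the 0/1 Survived flag gives the survival rate in each Sex x Pclass cell.
rates = sample.pivot_table(
    index="Sex", columns="Pclass", values="Survived", aggfunc="mean"
)
print(rates)
```

The same call with `df` in place of `sample` should produce the corresponding table for the full dataset.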

## Presentation of Insights
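- The overall survival rate was about 38%.
- About 74% of women survived, compared with about 19% of men.
- About 63% of first-class passengers survived, compared with about 24% of third-class passengers.
- Survivors were slightly younger on average than non-survivors.
- Passengers who embarked at Cherbourg had the highest survival rate of the three ports.
- First-class passengers paid substantially higher fares than those in second and third class.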
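Group rates like these are easier to judge next to the size of each group, since a high rate in a small group carries less weight. As a sketch, named aggregation can report both in one table. The rows below are hypothetical and do not reflect the real port counts.

```python
import pandas as pd

# Hypothetical rows; the port codes follow the dataset (S, C, Q) but the values are made up.
sample = pd.DataFrame({
    "Embarked": ["S", "S", "S", "S", "C", "C", "Q"],
    "Survived": [1, 0, 0, 0, 1, 0, 1],
})

# Survival rate and passenger count per port, highest rate first.
by_port = (
    sample.groupby("Embarked")["Survived"]
    .agg(rate="mean", passengers="count")
    .sort_values("rate", ascending=False)
)
print(by_port)
```

In this made-up sample the top-ranked port has a single passenger, which is exactly the kind of case the `passengers` column is meant to flag.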