# Titanic Dataset Analysis  
This project analyzes survival patterns using the Titanic dataset and the Pandas library.


In [8]:
import pandas as pd

# Load the Titanic dataset
df = pd.read_csv('titanic.csv')

# Show the first few rows
df.head()


| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |
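
The cells below read `titanic.csv` from disk. To try the same pandas calls without that file, you can build a small frame in memory. This sketch assumes a hand-typed sample made from a subset of the columns in the five rows above. `read_csv` accepts any file-like object, so `io.StringIO` can stand in for the path.

```python
import io

import pandas as pd

# Assumption: a tiny inline sample standing in for titanic.csv,
# using a subset of the columns from the rows shown above.
SAMPLE_CSV = """PassengerId,Survived,Pclass,Sex,Age,SibSp,Parch,Fare,Embarked
1,0,3,male,22.0,1,0,7.25,S
2,1,1,female,38.0,1,0,71.2833,C
3,1,3,female,26.0,0,0,7.925,S
4,1,1,female,35.0,1,0,53.1,S
5,0,3,male,35.0,0,0,8.05,S
"""

# read_csv works on any file-like object, so StringIO replaces the file path
sample = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(sample.shape)   # (5, 9)
print(sample.dtypes)  # Age and Fare are parsed as float64
```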


## Data Cleaning and Preparation

I checked for missing values and handled them as follows:
- Filled missing **Age** values with the median age
- Filled missing **Embarked** values with the most common port
- Dropped any rows missing **Fare** (though there were none in this dataset)


In [9]:
# Check how many missing values are in each column
df.isnull().sum()

| Column | Missing values |
|---|---|
| PassengerId | 0 |
| Survived | 0 |
| Pclass | 0 |
| Name | 0 |
| Sex | 0 |
| Age | 177 |
| SibSp | 0 |
| Parch | 0 |
| Ticket | 0 |
| Fare | 0 |
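
Raw counts are easier to interpret as a share of rows. Age, for example, is missing for 177 of 891 passengers, about 19.9%. Below is a minimal sketch that reports both figures side by side. The toy frame and the `n_missing` / `pct_missing` column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a few gaps, standing in for the real data
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Fare": [7.25, 71.2833, 7.925, 53.1],
    "Embarked": ["S", "C", None, "S"],
})

# isnull().mean() gives the fraction of missing rows per column
missing = pd.DataFrame({
    "n_missing": toy.isnull().sum(),
    "pct_missing": toy.isnull().mean().mul(100).round(1),
})
print(missing)
```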


In [11]:
# Fill missing Age values with the median (assigning the result back instead of using inplace=True, which newer pandas versions warn about)
df['Age'] = df['Age'].fillna(df['Age'].median())

# Fill missing Embarked values with the most common (mode)
df['Embarked'] = df['Embarked'].fillna(df['Embarked'].mode()[0])

# Drop rows with missing Fare values (if any)
df = df.dropna(subset=['Fare'])
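
To check what the three cleaning steps do, you can apply them to a toy frame where the expected result is easy to work out by hand. This is a sketch on hypothetical data, not the real dataset. The median of the known ages 22, 26 and 35 is 26, and "S" is the most common port.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one or two gaps per column
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0, np.nan],
    "Embarked": ["S", "C", None, "S", "S"],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
})

# Same three steps as the cleaning cell above
toy["Age"] = toy["Age"].fillna(toy["Age"].median())                  # median of 22, 26, 35 -> 26.0
toy["Embarked"] = toy["Embarked"].fillna(toy["Embarked"].mode()[0])  # most common port -> "S"
toy = toy.dropna(subset=["Fare"])                                    # no Fare gaps, so nothing is dropped

print(toy)
print(int(toy.isnull().sum().sum()))  # 0
```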

## Question 1: How many passengers survived vs. died?

In [12]:
df['Survived'].value_counts()

| Survived | count |
|---|---|
| 0 | 549 |
| 1 | 342 |
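
The overall survival share quoted in the summary is 342 / (549 + 342). With `normalize=True`, `value_counts` returns proportions rather than counts. Here it is on a hypothetical toy column, followed by the same arithmetic on the real counts.

```python
import pandas as pd

# Hypothetical toy Survived column: 3 deaths (0) and 2 survivors (1)
survived = pd.Series([0, 0, 1, 0, 1], name="Survived")

# normalize=True returns the share of each value instead of raw counts
shares = survived.value_counts(normalize=True)
print(shares)

# The same arithmetic on the real counts shown above
overall_rate = 342 / (549 + 342)
print(f"{overall_rate:.1%}")  # 38.4%
```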


## Question 2: What percentage of males vs. females survived?

In [13]:
df.groupby('Sex')['Survived'].mean()

| Sex | Survived |
|---|---|
| female | 0.742038 |
| male | 0.188908 |
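
`Survived` is coded as 0/1, so the mean within each group equals the fraction of 1s, which is that group's survival rate. The sketch below, on hypothetical toy data, checks this against survivors divided by passengers. It also shows `pd.crosstab` with `normalize="index"`, which reports the survived and died shares for each row side by side.

```python
import pandas as pd

# Hypothetical toy data: three men and three women
toy = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "female", "male"],
    "Survived": [0, 1, 0, 1, 1, 0],
})

# Mean of a 0/1 column = share of 1s = survival rate per group
rates = toy.groupby("Sex")["Survived"].mean()

# Same quantity spelled out: survivors divided by passengers in each group
grouped = toy.groupby("Sex")["Survived"]
by_hand = grouped.sum() / grouped.count()

# normalize="index" turns each row of counts into shares that sum to 1
table = pd.crosstab(toy["Sex"], toy["Survived"], normalize="index")
print(rates)
print(table)
```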


## Question 3: Did survival rates differ by passenger class (1st, 2nd, 3rd)?

In [14]:
df.groupby('Pclass')['Survived'].mean()

| Pclass | Survived |
|---|---|
| 1 | 0.62963 |
| 2 | 0.472826 |
| 3 | 0.242363 |
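
A rate by itself does not tell you how many passengers it rests on. One option is to use named aggregation on the grouped column, which returns the rate and the group size together. The toy data and the `survival_rate` / `passengers` output names are hypothetical.

```python
import pandas as pd

# Hypothetical toy data across the three classes
toy = pd.DataFrame({
    "Pclass": [1, 1, 2, 3, 3, 3, 3],
    "Survived": [1, 0, 1, 0, 0, 1, 0],
})

# Named aggregation: each keyword becomes an output column
summary = toy.groupby("Pclass")["Survived"].agg(
    survival_rate="mean",
    passengers="count",
)
print(summary)
```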


## Question 4: What was the average age of survivors vs. non-survivors?


In [15]:
df.groupby('Survived')['Age'].mean()


| Survived | Age |
|---|---|
| 0 | 30.028233 |
| 1 | 28.291433 |
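
These averages are computed after 177 missing ages were filled with the median. The filled values pull both group means toward that median. One way to check the effect is to record which ages were missing before filling, then compare means over recorded ages only. This is a sketch on hypothetical toy data, and the `AgeWasMissing` column name is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two ages are missing
toy = pd.DataFrame({
    "Age": [20.0, np.nan, 40.0, np.nan, 30.0, 50.0],
    "Survived": [1, 1, 0, 0, 1, 0],
})

# Record which ages were missing *before* filling them
toy["AgeWasMissing"] = toy["Age"].isnull()
toy["Age"] = toy["Age"].fillna(toy["Age"].median())  # median of 20, 30, 40, 50 -> 35.0

# Mean age per outcome using only ages that were actually recorded
observed = toy.loc[~toy["AgeWasMissing"]].groupby("Survived")["Age"].mean()
# Mean age per outcome after imputation (what the cell above computes)
imputed = toy.groupby("Survived")["Age"].mean()

print(observed)  # gap of 20 years between outcomes
print(imputed)   # gap shrinks because both groups gained 35.0 values
```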


## Question 5: Did passengers with family on board survive at a higher rate?


In [16]:
df['HasFamily'] = (df['SibSp'] + df['Parch']) > 0
df.groupby('HasFamily')['Survived'].mean()

| HasFamily | Survived |
|---|---|
| False | 0.303538 |
| True | 0.50565 |
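
`SibSp` counts siblings and spouses aboard, and `Parch` counts parents and children aboard. Their sum is therefore the number of relatives travelling with a passenger. The sketch below applies the same `HasFamily` rule to hypothetical toy data. It also keeps the intermediate count in a `Relatives` column, which is a name introduced here for illustration.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "SibSp": [1, 0, 0, 2, 0],
    "Parch": [0, 0, 1, 1, 0],
    "Survived": [1, 0, 1, 0, 0],
})

# Relatives aboard = siblings/spouses + parents/children
toy["Relatives"] = toy["SibSp"] + toy["Parch"]
toy["HasFamily"] = toy["Relatives"] > 0

# Survival rate for passengers travelling alone (False) vs. with family (True)
rates = toy.groupby("HasFamily")["Survived"].mean()
print(rates)
```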


## Question 6: Where did most passengers embark from?


In [18]:
df['Embarked'].value_counts()

| Embarked | count |
|---|---|
| S | 646 |
| C | 168 |
| Q | 77 |
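
The `Embarked` codes stand for the three ports of call: C = Cherbourg, Q = Queenstown, S = Southampton. Below is a small sketch that relabels the counts above with port names and converts them to shares of all 891 passengers. The counts are copied from the output, and `PORT_NAMES` is a hypothetical helper mapping.

```python
import pandas as pd

# Port codes used in the Embarked column
PORT_NAMES = {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"}

# Counts copied from the output above (after filling missing Embarked values)
embarked_counts = pd.Series({"S": 646, "C": 168, "Q": 77}, name="count")

# Relabel the index with port names, then convert counts to percentages
named = embarked_counts.rename(index=PORT_NAMES)
shares = (named / named.sum()).mul(100).round(1)
print(shares)
```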


## Summary of Insights

Based on my analysis of the Titanic dataset:

- **342 passengers survived** and **549 did not**, meaning only about **38%** survived overall.
- **74% of females survived**, while only **19% of males** did — gender had a huge impact.
- **1st class passengers** had a **63% survival rate**, compared to **47%** in 2nd class and only **24%** in 3rd class.
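- Survivors were slightly **younger on average** (28.3 years) than non-survivors (30.0 years). These means are computed after the 177 missing ages were filled with the median, which pulls both averages toward the same value.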
- Passengers with **family on board** had a **50.6% survival rate**, higher than the **30.4%** for those alone.
- The majority of passengers (646 of 891) **embarked from port 'S' (Southampton)**.
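
The percentages in this summary are the rates from the outputs above, multiplied by 100 and rounded. Recomputing them from the printed values helps keep the summary consistent with the notebook. The helper name `as_percent` is hypothetical, and the numbers are copied from the cell outputs.

```python
import pandas as pd

# Rates and means copied from the outputs above
sex_rates = pd.Series({"female": 0.742038, "male": 0.188908})
class_rates = pd.Series({1: 0.62963, 2: 0.472826, 3: 0.242363})
mean_age = pd.Series({0: 30.028233, 1: 28.291433})  # 0 = died, 1 = survived


def as_percent(rates: pd.Series, decimals: int = 0) -> pd.Series:
    """Turn 0-1 rates into percentages rounded to `decimals` places."""
    return rates.mul(100).round(decimals)


print(as_percent(sex_rates))
print(as_percent(class_rates))
print(mean_age.round(1))
age_gap = round(mean_age[0] - mean_age[1], 1)
print(age_gap)  # years between non-survivor and survivor mean age
```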

These findings suggest that social factors such as gender, passenger class, and family presence were strongly associated with survival.
