# Titanic Data Analysis

This project analyzes a dataset of passengers aboard the Titanic. The goal is to extract meaningful insights and answer key questions about survival rates and passenger characteristics.

---

**Instructor:** Prof. Samadzadeh  
**Contributors:** Parsa Khezli, Amir Omidvar  
**Course:** Data Science, Spring 1404

---

# Importing the required libraries

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Loading the file

In [None]:
df = pd.read_csv('titanic.csv')

**Display the first few rows**

In [None]:
df.head()

**Basic Information**

In [None]:
print("Shape of the dataset:", df.shape)
print("\nData Types:\n", df.dtypes)
print("\nMissing values:\n", df.isnull().sum())
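Raw missing-value counts are easier to compare when expressed as a share of all rows. The following is a minimal sketch of such a summary. The helper name `missing_summary` and the toy frame are hypothetical and stand in for the real `df`.

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return missing-value counts and percentages per column, worst first."""
    counts = frame.isnull().sum()
    percent = (counts / len(frame) * 100).round(2)
    summary = pd.DataFrame({"missing": counts, "percent": percent})
    # Keep only the columns that actually contain gaps.
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Hypothetical toy data, not taken from titanic.csv.
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Fare": [7.25, 71.28, 7.92, 8.05],
})
print(missing_summary(toy))
```

On the real data the call would be `missing_summary(df)`.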

**Statistical Summary**

In [None]:
df.describe(include='all')

**Visualize Missing Data**

In [None]:
plt.figure(figsize=(10,6))
sns.heatmap(df.isnull(), cbar=False, cmap='viridis')
plt.title('Missing Values Heatmap')
plt.show()

**Survival Rate**

In [None]:
survival_rate = df['Survived'].mean()
print(f"Overall Survival Rate: {survival_rate:.2%}")

**Survival by Sex**

In [None]:
plt.figure(figsize=(6,4))
sns.barplot(x='Sex', y='Survived', data=df)
plt.title('Survival Rate by Sex')
plt.show()
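Each bar in the plot is the group mean of `Survived`, so the same numbers can be read from a small table. The sketch below assumes a hypothetical helper `survival_by` and uses toy rows rather than the real dataset.

```python
import pandas as pd


def survival_by(frame: pd.DataFrame, column: str) -> pd.DataFrame:
    """Tabulate group size, survivor count, and survival rate for one column."""
    grouped = frame.groupby(column)["Survived"]
    return pd.DataFrame({
        "passengers": grouped.size(),
        "survivors": grouped.sum(),
        "rate": grouped.mean().round(3),
    })


# Hypothetical toy data, not taken from titanic.csv.
toy = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male", "female"],
    "Survived": [0, 1, 1, 0, 1, 0],
})
print(survival_by(toy, "Sex"))
```

The same helper could be applied to other categorical columns, for example `survival_by(df, "Pclass")`.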

**Survival by Passenger Class**

In [None]:
plt.figure(figsize=(6,4))
sns.barplot(x='Pclass', y='Survived', data=df)
plt.title('Survival Rate by Passenger Class')
plt.show()
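Sex and class can be looked at together, since one factor may mask the other. A pivot table is one possible way to do this. The rows below are hypothetical toy data used only to show the shape of the result.

```python
import pandas as pd

# Hypothetical toy data, not taken from titanic.csv.
toy = pd.DataFrame({
    "Pclass": [1, 1, 2, 2, 3, 3, 3, 3],
    "Sex": ["female", "male", "female", "male",
            "female", "female", "male", "male"],
    "Survived": [1, 1, 1, 0, 1, 0, 0, 0],
})

# Rows are classes, columns are sexes, cells are mean survival rates.
rates = toy.pivot_table(index="Pclass", columns="Sex",
                        values="Survived", aggfunc="mean")
print(rates)
```

With the real data, replacing `toy` with `df` gives the corresponding 3 x 2 table of rates.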

**Age Distribution**

In [None]:
plt.figure(figsize=(8,4))
sns.histplot(df['Age'].dropna(), bins=30, kde=True)
plt.title('Age Distribution')
plt.show()

**Survival by Age**

In [None]:
plt.figure(figsize=(8,4))
sns.histplot(data=df, x='Age', hue='Survived', bins=30, kde=True, multiple='stack')
plt.title('Survival by Age')
plt.show()
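A stacked histogram shows counts, which makes survival rates per age range hard to read off. One option is to bin ages and compute the rate per bin. The band edges and labels below are assumptions chosen for illustration, as are the helper name `survival_by_age_band` and the toy rows.

```python
import numpy as np
import pandas as pd


def survival_by_age_band(frame: pd.DataFrame) -> pd.DataFrame:
    """Return passenger count and survival rate per (assumed) age band."""
    bins = [0, 12, 18, 40, 60, np.inf]  # left-closed: [0, 12), [12, 18), ...
    labels = ["child", "teen", "adult", "middle-aged", "senior"]
    known = frame.dropna(subset=["Age"]).copy()  # rows without Age are skipped
    known["AgeBand"] = pd.cut(known["Age"], bins=bins, labels=labels, right=False)
    # observed=False keeps empty bands in the result (count 0, rate NaN).
    result = known.groupby("AgeBand", observed=False)["Survived"].agg(["count", "mean"])
    return result.rename(columns={"count": "passengers", "mean": "rate"})


# Hypothetical toy data, not taken from titanic.csv.
toy = pd.DataFrame({
    "Age": [4, 30, np.nan, 65, 35, 10],
    "Survived": [1, 0, 1, 0, 1, 0],
})
print(survival_by_age_band(toy))
```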


**Survival by Port of Embarkation**

In [None]:
plt.figure(figsize=(6,4))
sns.barplot(x='Embarked', y='Survived', data=df)
plt.title('Survival Rate by Port of Embarkation')
plt.show()

**Correlation Heatmap**

In [None]:
plt.figure(figsize=(10,6))
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()
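For the survival question, only the `Survived` row of the correlation matrix matters. The sketch below pulls out that row and ranks the features by absolute correlation. The helper name `correlation_with_target` and the toy values are hypothetical.

```python
import pandas as pd


def correlation_with_target(frame: pd.DataFrame, target: str = "Survived") -> pd.Series:
    """Rank numeric columns by the strength of their correlation with target."""
    corr = frame.corr(numeric_only=True)[target].drop(target)
    # Sort by magnitude so strong negative correlations rank high too.
    return corr.sort_values(key=abs, ascending=False)


# Hypothetical toy data, not taken from titanic.csv.
toy = pd.DataFrame({
    "Survived": [0, 1, 0, 1],
    "Pclass": [3, 1, 3, 1],
    "Fare": [1.0, 4.0, 2.0, 5.0],
    "Sex": ["male", "female", "male", "female"],  # non-numeric, ignored
})
print(correlation_with_target(toy))
```

Note that these are Pearson correlations, which capture only linear association, and that text columns such as `Sex` are left out unless they are encoded as numbers first.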

**Feature Engineering: Family Size**

In [None]:
df['FamilySize'] = df['SibSp'] + df['Parch'] + 1

plt.figure(figsize=(6,4))
sns.barplot(x='FamilySize', y='Survived', data=df)
plt.title('Survival Rate by Family Size')
plt.show()
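Large family sizes usually have few passengers each, so their bars can be noisy. Grouping sizes into a few categories is one way to steady the comparison. The cut points (1, 2 to 4, 5 or more), the column name `FamilyGroup`, and the helper `add_family_group` are assumptions made for this sketch.

```python
import pandas as pd


def add_family_group(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with FamilySize and a coarse FamilyGroup column."""
    out = frame.copy()
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1
    # Right-closed bins: (0, 1], (1, 4], (4, inf).
    out["FamilyGroup"] = pd.cut(
        out["FamilySize"],
        bins=[0, 1, 4, float("inf")],
        labels=["alone", "small (2-4)", "large (5+)"],
    )
    return out


# Hypothetical toy data, not taken from titanic.csv.
toy = pd.DataFrame({
    "SibSp": [0, 1, 4, 0],
    "Parch": [0, 1, 2, 0],
    "Survived": [0, 1, 0, 1],
})
grouped = add_family_group(toy)
print(grouped[["FamilySize", "FamilyGroup"]])
```

The new column can then be plotted in the same way as `FamilySize`, for example with `sns.barplot(x='FamilyGroup', y='Survived', data=add_family_group(df))`.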


**Survival by Fare**

In [None]:
plt.figure(figsize=(8,4))
sns.histplot(data=df, x='Fare', hue='Survived', bins=30, kde=True, multiple='stack')
plt.title('Survival by Fare')
plt.show()
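Fares tend to be right-skewed, so equal-width histogram bins can crowd most passengers into the first few bars. A possible complement is to split fares into quartiles and compare survival rates across them. The helper name `survival_by_fare_quartile` and the toy values are hypothetical.

```python
import pandas as pd


def survival_by_fare_quartile(frame: pd.DataFrame) -> pd.Series:
    """Return the survival rate within each fare quartile (Q1 = cheapest)."""
    quartile = pd.qcut(frame["Fare"], q=4, labels=["Q1", "Q2", "Q3", "Q4"])
    return frame.groupby(quartile, observed=False)["Survived"].mean()


# Hypothetical toy data, not taken from titanic.csv.
toy = pd.DataFrame({
    "Fare": [7.0, 8.0, 10.0, 13.0, 26.0, 30.0, 80.0, 250.0],
    "Survived": [0, 0, 0, 1, 0, 1, 1, 1],
})
print(survival_by_fare_quartile(toy))
```

`pd.qcut` raises a `ValueError` when two quartile edges coincide, which can happen if many passengers paid the same fare. In that case the bins would need to be defined differently.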


**Conclusion**

In [21]:
print("Key Findings:")
print("- Females had a much higher survival rate than males.")
print("- Passengers in 1st class survived at a higher rate than those in 2nd or 3rd class.")
print("- Younger passengers and those with smaller families had higher survival rates.")
print("- Embarked port and fare also influenced survival.")

