# Titanic Dataset - Exploratory Data Analysis (EDA)


This notebook performs an Exploratory Data Analysis (EDA) on the Titanic Dataset.
We explore the data using summary statistics and visualizations to understand key patterns and trends.

## Objectives:
- Generate summary statistics
- Visualize distributions and relationships
- Identify patterns and insights


In [None]:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Display plots inline
%matplotlib inline


In [None]:

# Load the dataset
df = pd.read_csv('Titanic-Dataset.csv')

# Preview the data
df.head()


In [None]:

# Check data info
df.info()


In [None]:

# Summary statistics of numerical features
df.describe()


In [None]:

# Check for null values
df.isnull().sum()
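
Raw null counts are easier to interpret next to the share of rows they represent. The following is a minimal sketch, not part of the original notebook: `missing_summary` is a hypothetical helper, and the small `toy` frame holds made-up values that stand in for `df`.

```python
import pandas as pd

def missing_summary(frame):
    """Count and percentage of missing values, for columns that have any."""
    counts = frame.isnull().sum()
    percent = (frame.isnull().mean() * 100).round(1)
    summary = pd.DataFrame({"missing": counts, "percent": percent})
    # Keep only columns with gaps, worst first
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)

# Made-up rows for illustration only (NOT the real Titanic data)
toy = pd.DataFrame({
    "Age":   [22.0, None, 30.0, None],
    "Cabin": [None, None, None, "C85"],
    "Fare":  [7.25, 71.28, 8.05, 53.10],
})

summary = missing_summary(toy)
print(summary)
```

In the notebook itself, calling `missing_summary(df)` would give the same kind of table for the loaded dataset.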


In [None]:

# Histograms of numerical features
df.select_dtypes(include='number').hist(figsize=(12, 10))
plt.suptitle("Histograms of Numerical Features")
# Reserve space at the top so the suptitle does not overlap the subplots
plt.tight_layout(rect=[0, 0, 1, 0.96])
plt.show()


In [None]:

# Boxplots to detect outliers
for col in ['Age', 'Fare']:
    plt.figure(figsize=(6, 4))
    sns.boxplot(x=df[col])
    plt.title(f'Boxplot of {col}')
    plt.show()
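
The points a boxplot draws beyond its whiskers can also be listed numerically. The sketch below assumes the default whisker length of 1.5 times the interquartile range (IQR); `iqr_outliers` is a hypothetical helper, and the fares are made-up values rather than rows from the dataset.

```python
import pandas as pd

def iqr_outliers(series, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR], ignoring missing values."""
    s = series.dropna()
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s[(s < q1 - k * iqr) | (s > q3 + k * iqr)]

# Made-up fares for illustration only, including one missing value
fares = pd.Series([7.25, 8.05, 13.0, 26.0, 30.0, 512.33, None])

outliers = iqr_outliers(fares)
print(outliers)
```

In the notebook, `iqr_outliers(df['Age'])` and `iqr_outliers(df['Fare'])` would return the flagged values for the two columns plotted above.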


In [None]:

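# Correlation heatmap (numeric columns only; text columns such as Sex make df.corr() fail in pandas 2.x)
plt.figure(figsize=(10, 6))
sns.heatmap(df.select_dtypes(include='number').corr(), annot=True, cmap='coolwarm')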
plt.title("Correlation Heatmap")
plt.show()


In [None]:

# Survival count
sns.countplot(x='Survived', data=df)
plt.title('Survival Count')
plt.show()


In [None]:

# Survival by Passenger Class
sns.countplot(x='Pclass', hue='Survived', data=df)
plt.title('Survival by Passenger Class')
plt.show()

# Survival by Gender
sns.countplot(x='Sex', hue='Survived', data=df)
plt.title('Survival by Gender')
plt.show()



## Key Insights:
- Most 3rd class passengers did not survive.
- Females had a higher survival rate than males.
- Pclass and Fare both correlate with survival (Pclass negatively, Fare positively), but the correlations are moderate rather than strong.
- Age has outliers and missing values.
- Overall, the visualizations point to Sex, Pclass, and Fare as useful starting points for feature engineering.
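
The count plots show absolute numbers, while the insights above are stated as rates. One way to check them directly is to average `Survived` within each group. This is a minimal sketch: `survival_rate` is a hypothetical helper, and `toy` contains made-up rows, not the real dataset.

```python
import pandas as pd

def survival_rate(frame, by):
    """Mean of the 0/1 `Survived` column per group, i.e. the survival rate."""
    return frame.groupby(by)["Survived"].mean()

# Made-up rows for illustration only (NOT the real Titanic data)
toy = pd.DataFrame({
    "Survived": [1, 0, 1, 1, 0, 1, 0, 0],
    "Pclass":   [1, 1, 2, 2, 3, 3, 3, 3],
    "Sex":      ["female", "male", "female", "male",
                 "male", "female", "male", "male"],
})

rate_by_sex = survival_rate(toy, "Sex")
rate_by_class = survival_rate(toy, "Pclass")
print(rate_by_sex)
print(rate_by_class)
```

Applied to the notebook's data, `survival_rate(df, 'Sex')` and `survival_rate(df, 'Pclass')` would give the rates behind the first two insights.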
