# Titanic - Exploratory Data Analysis (EDA)
## Objective
In this exploratory data analysis, we will analyze Titanic passenger data,
understand its structure, identify patterns, and prepare it for further processing.

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import missingno as msno

train_df = pd.read_csv('train.csv')
test_df = pd.read_csv('test.csv')

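# DataFrame.info() prints its report directly and returns None,
# so wrapping it in print() would add a stray "None" line.
print('Train dataset information:')
train_df.info()
print('\nTest dataset information:')
test_df.info()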

## Descriptive Statistics
The summary statistics below cover the numeric columns of both datasets; by default, `describe()` skips non-numeric columns such as 'Name' or 'Cabin'.

In [None]:
print(train_df.describe())
print(test_df.describe())

## Missing Values Visualization
Analyzing missing values helps us identify columns that need imputation or removal.

In [None]:
plt.figure(figsize=(10, 6))
sns.heatmap(train_df.isnull(), cbar=False, cmap='viridis')
plt.title('Missing values in train.csv')
plt.show()

plt.figure(figsize=(10, 6))
sns.heatmap(test_df.isnull(), cbar=False, cmap='viridis')
plt.title('Missing values in test.csv')
plt.show()
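
The heatmaps show where values are missing, but not exactly how many. A minimal sketch of a numeric summary follows. The helper name `missing_summary` and the small `toy` frame are illustrative assumptions, not part of the original notebook; in the notebook itself the function would be called on `train_df` and `test_df`.

```python
import pandas as pd

def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, largest first."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(1),
    })
    # Keep only the columns that actually contain gaps
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)

# Toy data for illustration only (not real Titanic rows)
toy = pd.DataFrame({
    "Age": [22.0, None, 26.0, 35.0],
    "Cabin": [None, "C85", None, None],
    "Fare": [7.25, 71.28, 7.92, 53.10],
})
print(missing_summary(toy))
```

A table like this makes it easier to decide, column by column, between imputation and removal.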

## Survival Distribution
The count plot below shows how many passengers in the training set survived (`Survived = 1`) and how many did not (`Survived = 0`).

In [None]:
sns.countplot(x='Survived', data=train_df)
plt.title('Survival distribution')
plt.show()
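
The count plot gives absolute numbers; the class balance can also be expressed as proportions. The sketch below assumes a 0/1 `Survived` column, and uses a hypothetical helper `class_balance` with made-up toy values rather than the real data.

```python
import pandas as pd

def class_balance(target: pd.Series) -> pd.Series:
    """Share of each class in the target column, ordered by class label."""
    return target.value_counts(normalize=True).sort_index()

# Toy labels for illustration only
toy_survived = pd.Series([0, 0, 1, 0, 1], name="Survived")
balance = class_balance(toy_survived)
print(balance)
```

In the notebook this would be called as `class_balance(train_df['Survived'])`.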

## Age Distribution
The histogram below shows the age distribution of passengers on the Titanic; rows with a missing 'Age' are dropped before plotting.

In [None]:
sns.histplot(train_df['Age'].dropna(), kde=True, bins=30)
plt.title('Age distribution')
plt.show()
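
To back up a reading of the histogram with a number, the share of passengers inside a given age range can be computed directly. This is a rough sketch: `share_in_age_range` is a hypothetical helper, the 20 to 40 bounds are just an example, and the toy ages are invented.

```python
import pandas as pd

def share_in_age_range(ages: pd.Series, low: float, high: float) -> float:
    """Fraction of passengers with a known age between low and high (inclusive)."""
    known = ages.dropna()
    return float(known.between(low, high).mean())

# Toy ages for illustration only; None stands for a missing value
toy_ages = pd.Series([22, 38, None, 4, 54, 30], name="Age")
share = share_in_age_range(toy_ages, 20, 40)
print(f"Share aged 20-40: {share:.0%}")
```

In the notebook this would be called as `share_in_age_range(train_df['Age'], 20, 40)`.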

## Survival by Gender
The plot below compares the number of survivors and non-survivors for each gender.

In [None]:
sns.countplot(x='Sex', hue='Survived', data=train_df)
plt.title('Survival by Gender')
plt.show()
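
The count plot shows counts, not rates. Because `Survived` is coded 0/1, its mean within a group equals that group's survival rate. The sketch below illustrates this with a hypothetical helper `survival_rate_by` and invented toy rows.

```python
import pandas as pd

def survival_rate_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Survival rate per group; the mean of a 0/1 column is the share of ones."""
    return df.groupby(column)["Survived"].mean()

# Toy rows for illustration only
toy = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male"],
    "Survived": [0, 1, 1, 0, 1],
})
rates = survival_rate_by(toy, "Sex")
print(rates)
```

In the notebook this would be called as `survival_rate_by(train_df, 'Sex')`.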

## Survival by Passenger Class
The plot below compares the number of survivors and non-survivors in each passenger class.

In [None]:
sns.countplot(x='Pclass', hue='Survived', data=train_df)
plt.title('Survival by Passenger Class')
plt.show()
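
Since the classes differ in size, row-normalized proportions are easier to compare than raw counts. One possible way to get them is `pd.crosstab` with `normalize='index'`, sketched here on invented toy rows; `survival_table` is a hypothetical name.

```python
import pandas as pd

def survival_table(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Row-normalized table: each row sums to 1 across the Survived values."""
    return pd.crosstab(df[column], df["Survived"], normalize="index")

# Toy rows for illustration only
toy = pd.DataFrame({
    "Pclass": [1, 1, 3, 3, 3, 2],
    "Survived": [1, 1, 0, 0, 1, 0],
})
table = survival_table(toy, "Pclass")
print(table)
```

In the notebook this would be called as `survival_table(train_df, 'Pclass')`; the column labeled `1` then holds the survival rate per class.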

## Conclusions
- The 'Cabin' column has a significant number of missing values.
- Most passengers with a recorded age were between 20 and 40 years old.
- Women had a higher survival rate than men.
- Passengers in higher classes had better chances of survival.