# Exploratory Data Analysis on the Titanic Dataset

In this notebook, we perform exploratory data analysis (EDA) on the Titanic dataset to understand its structure and to identify factors that may have influenced passenger survival.

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set visualisation style (set_theme is the current name for sns.set)
sns.set_theme(style='whitegrid')

In [None]:
# Load the dataset
train_data = pd.read_csv('../data/train.csv')
test_data = pd.read_csv('../data/test.csv')

# Display the first few rows of the training data
train_data.head()

In [None]:
# Summary statistics of the training data
train_data.describe(include='all')

In [None]:
# Check for missing values
missing_values = train_data.isnull().sum()
missing_values[missing_values > 0]
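
Raw counts are easier to judge next to the share of rows they represent. The following is a minimal sketch, not part of the original analysis: `missing_summary` is a hypothetical helper name, and the small `toy` frame is made-up illustrative data rather than the Titanic data.

```python
import pandas as pd

def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return count and percentage of missing values per column, largest first."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        'missing': counts,
        'percent': (counts / len(df) * 100).round(1),
    })
    # Keep only columns that have at least one missing value
    return summary[summary['missing'] > 0].sort_values('missing', ascending=False)

# Tiny made-up frame for illustration only (not the Titanic data)
toy = pd.DataFrame({
    'Age': [22.0, None, 35.0, None],
    'Cabin': [None, 'C85', None, None],
    'Fare': [7.25, 71.28, 8.05, 53.10],
})
toy_missing = missing_summary(toy)
print(toy_missing)
```

In this notebook the same helper could be called as `missing_summary(train_data)`.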

In [None]:
# Visualize the distribution of the target variable 'Survived'
plt.figure(figsize=(6, 4))
sns.countplot(x='Survived', data=train_data)
plt.title('Survival Count')
plt.xlabel('Survived')
plt.ylabel('Count')
plt.show()
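
The count plot shows the class balance visually; the same information can be tabulated as counts and shares. This is a hedged sketch: `class_balance` is a hypothetical helper name, and `toy_target` is made-up data used only to keep the example self-contained.

```python
import pandas as pd

def class_balance(target: pd.Series) -> pd.DataFrame:
    """Return the count and share of each class in a target column."""
    counts = target.value_counts().sort_index()
    return pd.DataFrame({
        'count': counts,
        'share': (counts / counts.sum()).round(3),
    })

# Made-up target values for illustration only (not the Titanic data)
toy_target = pd.Series([0, 0, 0, 1, 1], name='Survived')
toy_balance = class_balance(toy_target)
print(toy_balance)
```

Applied to the training data, this would be `class_balance(train_data['Survived'])`.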

In [None]:
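# Analyze survival rate by gender
# (barplot shows the mean of 'Survived' per group, i.e. the survival rate;
# the error bars are seaborn's default bootstrapped 95% confidence intervals)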
plt.figure(figsize=(8, 4))
sns.barplot(x='Sex', y='Survived', data=train_data)
plt.title('Survival Rate by Gender')
plt.ylabel('Survival Rate')
plt.show()
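
The bar heights can also be read off as numbers with a `groupby`. The sketch below assumes the column names used in this notebook (`Sex`, `Survived`); `survival_rate_by` is a hypothetical helper name, and the `toy` frame is made-up data, not the Titanic data.

```python
import pandas as pd

def survival_rate_by(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Survival rate and group size for each value of `column`."""
    # The mean of a 0/1 column is the fraction of ones, i.e. the survival rate
    return df.groupby(column)['Survived'].agg(
        survival_rate='mean',
        passengers='size',
    )

# Made-up rows for illustration only (not the Titanic data)
toy = pd.DataFrame({
    'Sex': ['male', 'female', 'female', 'male', 'male'],
    'Survived': [0, 1, 1, 0, 1],
})
toy_rates = survival_rate_by(toy, 'Sex')
print(toy_rates)
```

Reporting the group size alongside the rate helps avoid over-reading rates that come from small groups. In this notebook the call would be `survival_rate_by(train_data, 'Sex')`.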

In [None]:
# Analyze survival rate by passenger class
plt.figure(figsize=(8, 4))
sns.barplot(x='Pclass', y='Survived', data=train_data)
plt.title('Survival Rate by Passenger Class')
plt.ylabel('Survival Rate')
plt.show()
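
Gender and passenger class are examined separately above; one possible next step is to look at them together. This is a sketch under the assumption that such a cross-tabulation is of interest, using `pd.pivot_table`; `survival_pivot` is a hypothetical helper name, and the `toy` frame is made-up data, not the Titanic data.

```python
import pandas as pd

def survival_pivot(df: pd.DataFrame) -> pd.DataFrame:
    """Mean of 'Survived' for every (Pclass, Sex) combination."""
    return pd.pivot_table(
        df,
        values='Survived',
        index='Pclass',
        columns='Sex',
        aggfunc='mean',
    )

# Made-up rows for illustration only (not the Titanic data)
toy = pd.DataFrame({
    'Pclass': [1, 1, 3, 3, 3, 1],
    'Sex': ['female', 'male', 'female', 'male', 'male', 'female'],
    'Survived': [1, 0, 0, 0, 1, 1],
})
toy_pivot = survival_pivot(toy)
print(toy_pivot)
```

In this notebook the call would be `survival_pivot(train_data)`.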

In [None]:
# Compare the age distributions of survivors and non-survivors
# (these are counts per age bin, not survival rates)
plt.figure(figsize=(10, 6))
sns.histplot(train_data[train_data['Survived'] == 1]['Age'], bins=30, color='green', label='Survived', kde=True)
sns.histplot(train_data[train_data['Survived'] == 0]['Age'], bins=30, color='red', label='Not Survived', kde=True)
plt.title('Age Distribution by Survival')
plt.xlabel('Age')
plt.ylabel('Count')
plt.legend()
plt.show()
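
The overlaid histograms show counts, so they do not directly give a survival rate per age. One way to get a rate is to bin ages with `pd.cut` and average `Survived` within each band. This is a minimal sketch: the band edges and labels are arbitrary assumptions chosen for illustration, `survival_rate_by_age_band` is a hypothetical helper name, and the `toy` frame is made-up data, not the Titanic data.

```python
import pandas as pd

# Assumed band edges and labels; adjust to taste
AGE_BINS = [0, 12, 18, 40, 60, 120]
AGE_LABELS = ['child', 'teen', 'adult', 'middle_aged', 'senior']

def survival_rate_by_age_band(df: pd.DataFrame) -> pd.Series:
    """Mean of 'Survived' per age band; rows with missing 'Age' are skipped."""
    # pd.cut uses right-closed intervals by default: (0, 12], (12, 18], ...
    bands = pd.cut(df['Age'], bins=AGE_BINS, labels=AGE_LABELS)
    # observed=True drops bands that contain no passengers
    return df.groupby(bands, observed=True)['Survived'].mean()

# Made-up rows for illustration only (not the Titanic data)
toy = pd.DataFrame({
    'Age': [4.0, 30.0, None, 70.0, 10.0, 25.0],
    'Survived': [1, 0, 1, 0, 0, 1],
})
toy_age_rates = survival_rate_by_age_band(toy)
print(toy_age_rates)
```

In this notebook the call would be `survival_rate_by_age_band(train_data)`. Passengers with a missing `Age` fall into no band and are excluded from the rates.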

## Conclusion

In this exploratory data analysis, we inspected the training data for missing values, looked at the distribution of the target variable `Survived`, and visualized how survival varies with gender, passenger class, and age. Further analysis and feature engineering will be required to build a predictive model.