# Exploratory Data Analysis (EDA)

This notebook explores the insurance premiums dataset. The goal is to understand the structure of the data, identify patterns, and visualize the findings that will inform model development.

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set visualization style (set_theme is the current name for sns.set)
sns.set_theme(style='whitegrid')

# Load the training data
train_data = pd.read_csv('../data/train.csv')

# Display the first few rows of the dataset
train_data.head()

In [None]:
# Summary statistics of the training data
train_data.describe()
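
By default `describe()` summarizes only numeric columns. If the dataset also contains categorical columns, a separate summary can cover them. The following is a minimal sketch on a toy frame standing in for `train_data`; the `Age` and `Gender` columns are assumptions made for illustration and are not taken from the real file.

```python
import pandas as pd

# Hypothetical stand-in for train_data; only 'Premium Amount' is a
# column name known from the notebook, the others are assumed.
toy = pd.DataFrame({
    'Age': [25, 40, 31, 58],
    'Gender': ['F', 'M', 'F', 'F'],
    'Premium Amount': [1200.0, 850.0, 990.0, 2100.0],
})

# Summarize the non-numeric columns: count, number of unique values,
# most frequent value (top) and how often it occurs (freq).
categorical_summary = toy.select_dtypes(exclude='number').describe()
print(categorical_summary)
```

On the real data, replacing `toy` with `train_data` should give the same kind of table for whichever non-numeric columns are present.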

In [None]:
# Check for missing values
missing_values = train_data.isnull().sum()
missing_values[missing_values > 0]
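
Raw counts of missing values are easier to judge relative to the number of rows. The sketch below expresses them as percentages and sorts the affected columns; it runs on a toy frame with assumed column names (`Age`, `Gender`), so the numbers say nothing about the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train_data with a few gaps inserted by hand.
toy = pd.DataFrame({
    'Age': [25, np.nan, 31, 58],
    'Gender': ['F', 'M', None, None],
    'Premium Amount': [1200.0, 850.0, 990.0, 2100.0],
})

# isnull().mean() is the fraction of missing entries per column.
missing_share = toy.isnull().mean() * 100

# Keep only columns that actually have gaps, worst first.
missing_share = missing_share[missing_share > 0].sort_values(ascending=False)
print(missing_share.round(1))
```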

In [None]:
# Visualize the distribution of the target variable (Premium Amount)
plt.figure(figsize=(10, 6))
sns.histplot(train_data['Premium Amount'], bins=30, kde=True)
plt.title('Distribution of Premium Amount')
plt.xlabel('Premium Amount')
plt.ylabel('Frequency')
plt.show()
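
The histogram gives a visual impression of the shape of the target; the sample skewness is one way to put a number on any asymmetry. This is a minimal sketch using made-up premium values, not values from the dataset.

```python
import pandas as pd

# Hypothetical premium values with one large outlier on the right.
premiums = pd.Series([800.0, 900.0, 1000.0, 1100.0, 5000.0],
                     name='Premium Amount')

# Series.skew() returns the sample skewness: values above 0 indicate a
# longer right tail, values below 0 a longer left tail.
raw_skew = premiums.skew()
print(f'Skewness of Premium Amount: {raw_skew:.2f}')
```

On the real data the same call would be `train_data['Premium Amount'].skew()`.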

In [None]:
# Correlation heatmap
plt.figure(figsize=(12, 8))
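# Use numeric columns only; in pandas 2.0+ corr() raises if the frame
# contains non-numeric (e.g. string) columns
correlation_matrix = train_data.select_dtypes(include='number').corr()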
sns.heatmap(correlation_matrix, annot=True, fmt='.2f', cmap='coolwarm', square=True)
plt.title('Correlation Heatmap')
plt.show()
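
A full heatmap can be hard to scan when there are many columns. One option is to pull out the correlations with the target alone and rank them by absolute value. The sketch below does this on a toy frame; every column name except `Premium Amount` is an assumption made for illustration.

```python
import pandas as pd

# Hypothetical stand-in for train_data; feature names are assumed.
toy = pd.DataFrame({
    'Age': [25, 40, 31, 58, 46],
    'Credit Score': [720, 650, 700, 580, 640],
    'Gender': ['F', 'M', 'F', 'F', 'M'],
    'Premium Amount': [900.0, 1400.0, 1100.0, 2000.0, 1500.0],
})

# Pearson correlation is only defined for numeric columns.
numeric = toy.select_dtypes(include='number')

# Correlation of each numeric feature with the target, strongest first
# regardless of sign.
target_corr = (
    numeric.corr()['Premium Amount']
    .drop('Premium Amount')
    .sort_values(key=lambda s: s.abs(), ascending=False)
)
print(target_corr)
```

Correlation measures linear association only, so a low value here does not rule out a non-linear relationship with the target.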

## Conclusion

This notebook lays the groundwork for understanding the dataset: summary statistics, missing values, the distribution of the target variable, and correlations between features. The next steps are data preprocessing, model training, and evaluation.