# Exploratory Data Analysis (EDA)

In this notebook, we will perform exploratory data analysis on the fraud detection dataset. The goal is to understand the data better and identify patterns that may help in building a predictive model.

In [1]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set visualization style
sns.set_theme(style='whitegrid')

In [2]:
# Load the dataset
train_data = pd.read_csv('../data/raw/train.csv')
test_data = pd.read_csv('../data/raw/test.csv')

# Display the first few rows of the training data
train_data.head()

In [3]:
# Check for missing values
missing_values = train_data.isnull().sum()
missing_values[missing_values > 0]
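
Raw missing counts are easier to judge next to the share of rows they represent. The following is a minimal sketch of such a summary; the helper name `missing_summary` and the toy frame `demo` (with its `amount` and `merchant` columns) are assumptions made for illustration and are not part of the dataset described here.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return count and percentage of missing values per column, worst first."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "missing_count": counts,
        "missing_pct": 100.0 * counts / len(df),
    })
    # Keep only columns that actually have gaps, sorted by severity
    summary = summary[summary["missing_count"] > 0]
    return summary.sort_values("missing_count", ascending=False)


# Toy frame standing in for train_data (column names are hypothetical)
demo = pd.DataFrame({
    "amount": [10.0, np.nan, 30.0, 40.0],
    "merchant": ["a", None, None, "d"],
    "is_fraud": [0, 0, 1, 0],
})
demo_summary = missing_summary(demo)
print(demo_summary)
```

In the notebook itself, the same helper could be called as `missing_summary(train_data)`.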

In [4]:
# Visualize the distribution of the target variable 'is_fraud'
plt.figure(figsize=(8, 6))
sns.countplot(x='is_fraud', data=train_data)
plt.title('Distribution of Fraud Cases')
plt.xlabel('Is Fraud')
plt.ylabel('Count')
plt.show()
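
A count plot shows the imbalance visually, but the exact class shares are often needed later, for example when choosing evaluation metrics. The sketch below computes them with `value_counts`; the helper name `class_balance` and the toy series `demo_target` are assumptions for illustration only, and the 9:1 split is made up rather than taken from the real data.

```python
import pandas as pd


def class_balance(target: pd.Series) -> pd.DataFrame:
    """Return the count and relative share of each class in the target."""
    counts = target.value_counts()
    return pd.DataFrame({
        "count": counts,
        "share": counts / counts.sum(),
    })


# Toy target standing in for train_data['is_fraud'] (values are made up)
demo_target = pd.Series([0, 0, 0, 0, 0, 0, 0, 0, 0, 1], name="is_fraud")
balance = class_balance(demo_target)
print(balance)
```

In the notebook, this could be applied as `class_balance(train_data['is_fraud'])`.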

In [5]:
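# Visualize correlations between numeric features
plt.figure(figsize=(12, 10))
# numeric_only=True (pandas >= 1.5) skips non-numeric columns, which raise an error in pandas >= 2.0
correlation_matrix = train_data.corr(numeric_only=True)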
sns.heatmap(correlation_matrix, annot=True, fmt='.2f', cmap='coolwarm')
plt.title('Correlation Matrix')
plt.show()
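
A full heatmap can be hard to read when there are many features, so it may help to rank features by their correlation with the target alone. The following is a minimal sketch under assumptions: the helper name `target_correlations` and the toy frame `demo` (including the `amount`, `hour`, and `merchant` columns) are hypothetical and only illustrate the idea.

```python
import pandas as pd


def target_correlations(df: pd.DataFrame, target: str) -> pd.Series:
    """Pearson correlation of each numeric feature with the target,
    ordered by absolute strength (strongest first)."""
    numeric = df.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Toy frame standing in for train_data (column names and values are made up)
demo = pd.DataFrame({
    "amount": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "hour": [3, 1, 4, 1, 5, 9],
    "merchant": list("abcdef"),  # non-numeric, excluded from the ranking
    "is_fraud": [0, 0, 0, 1, 1, 1],
})
ranked = target_correlations(demo, "is_fraud")
print(ranked)
```

Note that Pearson correlation with a binary target only captures linear association, so a low value here does not rule out a feature being useful to a model.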

## Conclusion

In this notebook, we explored the fraud detection dataset: we checked for missing values, examined the distribution of the target variable `is_fraud`, and visualized correlations between features. These findings will guide the feature engineering and modeling phases.