# Fraud Detection - Exploratory Data Analysis
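This notebook contains the exploratory data analysis (EDA) for the fraud detection project. It loads the raw transaction data and the IP-to-country lookup table, inspects their structure, and checks for missing values.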

In [None]:
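# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# display() is injected by Jupyter; the explicit import keeps the code runnable as a script
from IPython.display import display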

# Set plot style
sns.set_style('whitegrid')

In [None]:
# Load the data
data_path = '../data/raw/Fraud_Data.csv'
ip_path = '../data/raw/IpAddress_to_Country.csv'

# Read the data
fraud_data = pd.read_csv(data_path)
ip_data = pd.read_csv(ip_path)

# Display first few rows of the data
print('Fraud Data:')
display(fraud_data.head())

print('\nIP to Country Data:')
display(ip_data.head())

## Data Exploration

In this section, we'll explore the data to understand its structure, check for missing values, and perform initial analysis.

In [None]:
# Basic information about the datasets
print('Fraud Data Info:')
fraud_data.info()

print('\nIP to Country Data Info:')
ip_data.info()

In [None]:
# Check for missing values
print('Missing Values in Fraud Data:')
print(fraud_data.isnull().sum())

print('\nMissing Values in IP to Country Data:')
print(ip_data.isnull().sum())
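
Raw missing counts are easier to interpret next to the share of rows they represent. The helper below is a minimal sketch of that idea; `missing_summary` and the toy frame are hypothetical names introduced for illustration and are not part of the project code or data.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing counts and percentages, most-missing first."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        'missing_count': counts,
        'missing_pct': (counts / len(df) * 100).round(2),
    })
    return summary.sort_values('missing_count', ascending=False)


# Toy frame for illustration only; it is not the project data
toy = pd.DataFrame({
    'a': [1, None, 3, None],
    'b': ['x', 'y', None, 'z'],
    'c': [1, 2, 3, 4],
})
toy_summary = missing_summary(toy)
print(toy_summary)
```

In the notebook, the same helper could be called as `missing_summary(fraud_data)` and `missing_summary(ip_data)`.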

## Data Visualization

In this section, we visualize the data to understand distributions and relationships, starting with the distribution of the target variable.

In [None]:
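# Plot distribution of the target variable (assuming 'class' is the target)
if 'class' in fraud_data.columns:
    plt.figure(figsize=(8, 5))
    sns.countplot(x='class', data=fraud_data)
    plt.title('Distribution of Fraudulent vs Non-Fraudulent Transactions')
    plt.show()
else:
    # Report the problem instead of silently skipping the plot
    print("Column 'class' not found; available columns:", list(fraud_data.columns))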
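
A count plot shows the imbalance visually, but the exact class shares are useful when choosing evaluation metrics and resampling strategies later. The sketch below tabulates counts and proportions; `class_balance` and the toy frame are hypothetical, and `'class'` is assumed to be the target column, as in the cell above.

```python
import pandas as pd


def class_balance(df: pd.DataFrame, target: str = 'class') -> pd.DataFrame:
    """Return the count and share of rows for each value of the target column."""
    counts = df[target].value_counts()
    shares = df[target].value_counts(normalize=True)
    return pd.DataFrame({'count': counts, 'share': shares})


# Toy frame for illustration only; it is not the project data
toy = pd.DataFrame({'class': [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]})
balance = class_balance(toy)
print(balance)
```

In the notebook, this could be run as `class_balance(fraud_data)` once the `'class'` column is confirmed to exist.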

## Next Steps

Based on the EDA, we can proceed with:
1. Data preprocessing and feature engineering
2. Building predictive models
3. Model evaluation and optimization
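
As one possible starting point for the feature engineering step, the IP lookup table could be joined onto the transactions with a range lookup. The sketch below uses `pandas.merge_asof` for this. It rests on assumptions that should be checked against the output of `info()` above: the column names `ip_address`, `lower_bound_ip_address`, `upper_bound_ip_address`, and `country` are guesses, IP values are assumed to be numeric with no missing entries, and `add_country` is a hypothetical helper.

```python
import pandas as pd


def add_country(transactions, ip_ranges,
                ip_col='ip_address',                # assumed column name
                lower_col='lower_bound_ip_address', # assumed column name
                upper_col='upper_bound_ip_address', # assumed column name
                country_col='country'):             # assumed column name
    """Attach a country to each transaction by matching its IP to a range."""
    tx = transactions.copy()
    tx[ip_col] = tx[ip_col].astype('int64')  # fails on NaN; clean those first

    ranges = ip_ranges[[lower_col, upper_col, country_col]].copy()
    ranges[lower_col] = ranges[lower_col].astype('int64')
    ranges[upper_col] = ranges[upper_col].astype('int64')

    # merge_asof requires both frames to be sorted on their join keys
    tx = tx.sort_values(ip_col)
    ranges = ranges.sort_values(lower_col)

    # For each IP, pick the range with the largest lower bound <= IP
    merged = pd.merge_asof(tx, ranges, left_on=ip_col, right_on=lower_col,
                           direction='backward')

    # An IP below every range, or above the matched upper bound, has no country
    outside = merged[upper_col].isna() | (merged[ip_col] > merged[upper_col])
    merged.loc[outside, country_col] = 'Unknown'
    return merged.drop(columns=[lower_col, upper_col])


# Toy frames for illustration only; they are not the project data
toy_tx = pd.DataFrame({'user_id': [1, 2, 3, 4],
                       'ip_address': [5, 150, 250, 1000]})
toy_ranges = pd.DataFrame({'lower_bound_ip_address': [100, 200],
                           'upper_bound_ip_address': [199, 299],
                           'country': ['CountryA', 'CountryB']})
result = add_country(toy_tx, toy_ranges)
print(result)
```

The returned frame is sorted by IP address rather than in the original row order, so sort it again on an identifier column if the order matters.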