# Fraud Transaction Detection – Machine Learning Case Study

**Candidate Name:** Akshay Tripathi  
**Role:** Data Science Intern  
**Dataset:** Financial Fraud Dataset  
**Objective:** Predict fraudulent transactions and suggest prevention strategies

## 1. Business Problem Understanding

The objective of this project is to build a machine learning model that can
identify fraudulent transactions and help the company take proactive actions
to reduce financial loss and customer risk.

## 2. Dataset Overview

- Total Rows: 6,362,620
- Total Columns: 10
- Target Variable: isFraud


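```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load the transactions
df = pd.read_csv("fraud.csv")

# Only a cell's last expression is displayed, so print each summary explicitly
print(df.head())
df.info()  # column dtypes, non-null counts, memory usage
print(df.describe())
```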
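Fraud labels are usually rare, which affects both the train/test split and the choice of evaluation metric, so it helps to check the `isFraud` class ratio before modelling. A minimal sketch on a small hypothetical frame (the counts are invented for illustration; on the real data the same calls apply to `df`):

```python
import pandas as pd

# Hypothetical toy frame standing in for the real `df`: 98 legitimate, 2 fraudulent rows
toy = pd.DataFrame({"isFraud": [0] * 98 + [1] * 2})

# Share of each class; on fraud data the positive class is typically tiny
class_share = toy["isFraud"].value_counts(normalize=True)
print(class_share)

fraud_rate = toy["isFraud"].mean()  # fraction of rows labelled as fraud
print(f"Fraud rate: {fraud_rate:.2%}")  # prints "Fraud rate: 2.00%"
```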


## 3. Data Cleaning

Data cleaning covered missing values, outliers, and multicollinearity
(checked with a correlation heatmap of the numeric columns).

### 3.1 Missing Values

No missing values were found in the dataset, so no imputation was required.

### 3.2 Outlier Detection

Outliers in transaction `amount` were inspected with a box plot.
Amounts show high variance, which is expected in financial data, so extreme
values were kept rather than removed.
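The points a box plot draws as outliers follow the 1.5 * IQR whisker rule, which can also be computed directly to count how many amounts are flagged. A sketch on hypothetical amounts (on the real data, use `df["amount"]` instead):

```python
import pandas as pd

# Hypothetical transaction amounts; replace with df["amount"] on the real data
amounts = pd.Series([120.0, 85.5, 240.0, 99.9, 150.0, 310.0, 75.0, 25000.0])

# Box-plot whiskers (seaborn's default whis=1.5) reach at most 1.5 * IQR beyond the quartiles
q1, q3 = amounts.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Values beyond those bounds are the ones drawn as individual outlier points
outliers = amounts[(amounts < lower) | (amounts > upper)]
print(f"{len(outliers)} of {len(amounts)} amounts fall outside [{lower:.1f}, {upper:.1f}]")
```

Counting these points is only descriptive here; as noted above, large amounts are plausible in financial data and can carry fraud signal, so flagging them is not a reason to drop them.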

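```python
# 3.1 Missing values: number of nulls per column
print(df.isnull().sum())

# 3.2 Outliers: box plot of transaction amounts
sns.boxplot(x=df["amount"])
plt.show()

# Multicollinearity: correlation heatmap of the numeric columns
# (numeric_only=True skips text columns, which df.corr() rejects in pandas >= 2.0)
plt.figure(figsize=(8, 6))
sns.heatmap(df.corr(numeric_only=True), cmap="coolwarm")
plt.show()
```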
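The heatmap shows multicollinearity visually; listing the strongly correlated pairs makes it explicit. A sketch on a hypothetical numeric frame, with 0.9 as an assumed cutoff for "highly correlated":

```python
import numpy as np
import pandas as pd

# Hypothetical numeric frame: the two balance columns are nearly collinear
rng = np.random.default_rng(0)
before = rng.uniform(0, 1000, size=200)
toy = pd.DataFrame({
    "balance_before": before,
    "balance_after": before - rng.uniform(0, 10, size=200),
    "amount": rng.uniform(0, 500, size=200),
})

# Absolute pairwise correlations; each unordered pair is visited once
corr = toy.corr(numeric_only=True).abs()
cols = list(corr.columns)
pairs = [(a, b, corr.loc[a, b]) for i, a in enumerate(cols) for b in cols[i + 1:]]

# Keep pairs above the assumed 0.9 cutoff, strongest first
high = sorted((p for p in pairs if p[2] > 0.9), key=lambda p: p[2], reverse=True)
for a, b, r in high:
    print(f"{a} ~ {b}: |r| = {r:.3f}")
```

For tree ensembles such as Random Forest, collinear features mainly split importance between themselves rather than breaking the model, so such pairs matter most when interpreting feature importances.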


## 4. Feature Selection

Features were selected based on:
- Correlation with target variable
- Business relevance
- Model performance
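The first criterion, correlation with the target, can be computed as a ranking of features by absolute correlation with `isFraud`. Pearson correlation only captures linear, one-feature-at-a-time relationships, so this is best treated as a screening step. A sketch on hypothetical toy data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: "amount" is shifted for fraud rows, "noise" is unrelated
rng = np.random.default_rng(42)
is_fraud = rng.integers(0, 2, size=500)
toy = pd.DataFrame({
    "amount": rng.normal(100, 20, size=500) + 200 * is_fraud,
    "noise": rng.normal(0, 1, size=500),
    "isFraud": is_fraud,
})

# Absolute Pearson correlation of each numeric feature with the target, strongest first
target_corr = (
    toy.corr(numeric_only=True)["isFraud"]
    .drop("isFraud")
    .abs()
    .sort_values(ascending=False)
)
print(target_corr)
```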


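```python
from sklearn.model_selection import train_test_split

y = df["isFraud"]
features = df.drop(columns="isFraud")
# Random Forest needs numeric input: drop ID-like text columns (more than 20
# unique values, an assumed threshold) and one-hot encode the remaining text columns
text_cols = features.select_dtypes(include="object").columns
id_like = [c for c in text_cols if features[c].nunique() > 20]
X = pd.get_dummies(features.drop(columns=id_like), dtype=int)

# stratify=y keeps the fraud rate identical in the train and test splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
```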

## 5. Fraud Detection Model

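A Random Forest classifier was used because it captures non-linear patterns
and interactions between features. It does not correct for class imbalance on
its own, so with a rare fraud class its results should be judged on
fraud-class precision and recall rather than accuracy.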

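```python
from sklearn.ensemble import RandomForestClassifier

# 100 trees; random_state makes the bootstrap samples reproducible and
# n_jobs=-1 trains the trees in parallel on all CPU cores
model = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)
model.fit(X_train, y_train)
```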
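Because a Random Forest does not rebalance classes by itself, scikit-learn's `class_weight="balanced"` option is one way to give the rare fraud class more weight. A sketch on synthetic data from `make_classification` (a stand-in for the real features, roughly 2% positives); this is an alternative configuration, not the one trained above:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils.class_weight import compute_class_weight

# Synthetic stand-in for the fraud data: roughly 2% positive class
X_demo, y_demo = make_classification(
    n_samples=5000, n_features=8, weights=[0.98], random_state=42
)

# "balanced" weights each class by n_samples / (n_classes * class_count)
classes = np.unique(y_demo)
weights = compute_class_weight("balanced", classes=classes, y=y_demo)
print(dict(zip(classes.tolist(), weights.round(2).tolist())))

# The same weighting applied inside the Random Forest's split criterion
balanced_rf = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=42, n_jobs=-1
)
balanced_rf.fit(X_demo, y_demo)
```

Whether reweighting helps should be decided on the fraud-class precision and recall of a held-out set, since it usually raises recall at some cost in precision.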

## 6. Model Evaluation

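```python
from sklearn.metrics import classification_report

# Per-class precision, recall and F1 on the held-out test set; the isFraud = 1
# row matters most, because accuracy is dominated by legitimate transactions
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
```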
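With a rare fraud class, the default 0.5 cut-off used by `predict` may miss many fraud cases. A threshold-free summary such as average precision, plus a confusion matrix at a chosen threshold, complements `classification_report`. A sketch on synthetic data; the 0.3 threshold is an arbitrary illustrative value:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data standing in for the real X and y
X_demo, y_demo = make_classification(
    n_samples=5000, n_features=8, weights=[0.98], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Probability of the fraud class instead of hard 0/1 predictions
scores = clf.predict_proba(X_te)[:, 1]

# Average precision summarises the precision-recall curve and, unlike accuracy,
# is not inflated by the large number of legitimate transactions
ap = average_precision_score(y_te, scores)
print(f"Average precision: {ap:.3f}")

# Lowering the decision threshold trades precision for recall;
# rows are true classes (0, 1), columns are predicted classes
threshold = 0.3
cm = confusion_matrix(y_te, (scores >= threshold).astype(int))
print(cm)
```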

## 7. Key Factors Predicting Fraud

- High transaction amount
- TRANSFER and CASH_OUT transaction types
- Sudden balance reduction
- Zero balance destination accounts
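One way to check which inputs a trained Random Forest relies on is its `feature_importances_` attribute. A sketch on hypothetical toy data in which fraud is defined as a large `amount` on a transfer; impurity-based importances can favour continuous, high-cardinality features, so `sklearn.inspection.permutation_importance` is a useful cross-check:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data: fraud depends on "amount" and "is_transfer", not on "noise"
rng = np.random.default_rng(7)
n = 2000
toy = pd.DataFrame({
    "amount": rng.exponential(1000, size=n),
    "is_transfer": rng.integers(0, 2, size=n),
    "noise": rng.normal(size=n),
})
is_fraud = ((toy["amount"] > 2000) & (toy["is_transfer"] == 1)).astype(int)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(toy, is_fraud)

# Impurity-based importances: mean decrease in Gini impurity, normalised to sum to 1
importances = pd.Series(rf.feature_importances_, index=toy.columns)
print(importances.sort_values(ascending=False))
```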
## 8. Do These Factors Make Sense?

Yes. These factors align with real-world fraud behavior, where fraudsters try
to move large amounts out of an account quickly, often emptying it through a
TRANSFER or CASH_OUT.

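The behaviours listed in Section 7 can be encoded as explicit features for the model. A sketch with hypothetical column names (`balance_before`, `balance_after`, `dest_balance_before`); the real dataset's balance columns may be named differently:

```python
import pandas as pd

# Hypothetical columns: sender balance before/after, recipient balance before
toy = pd.DataFrame({
    "amount": [500.0, 9000.0, 120.0],
    "balance_before": [2000.0, 9000.0, 300.0],
    "balance_after": [1500.0, 0.0, 180.0],
    "dest_balance_before": [750.0, 0.0, 40.0],
})

# Share of the sender's balance removed by the transaction
# (where() gives NaN for zero starting balances instead of dividing by zero)
toy["balance_drop_ratio"] = (
    (toy["balance_before"] - toy["balance_after"])
    / toy["balance_before"].where(toy["balance_before"] > 0)
)

# 1 if the sender's account was emptied by this transaction
toy["account_emptied"] = (
    (toy["balance_before"] > 0) & (toy["balance_after"] == 0)
).astype(int)

# 1 if the money went to a destination account that started at zero
toy["dest_was_empty"] = (toy["dest_balance_before"] == 0).astype(int)
print(toy[["balance_drop_ratio", "account_emptied", "dest_was_empty"]])
```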
## 9. Prevention Strategies

- Real-time fraud detection systems
- Multi-factor authentication
- Transaction limits
- AI-based alert systems
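Several of these strategies can be combined into a simple decision layer that runs on each incoming transaction: a hard limit on the high-risk transaction types plus a model-score threshold that triggers review or step-up authentication. A sketch in which the limit, the threshold, the `Transaction` type and the `decide` function are all hypothetical:

```python
from dataclasses import dataclass

# Hypothetical values: in practice set from business policy and from the
# precision/recall trade-off on validation data
TRANSFER_LIMIT = 10_000.0
ALERT_THRESHOLD = 0.5


@dataclass
class Transaction:
    amount: float
    tx_type: str        # e.g. "TRANSFER", "CASH_OUT", "PAYMENT"
    fraud_score: float  # model's fraud probability, e.g. from predict_proba


def decide(tx: Transaction) -> str:
    """Return 'block', 'review' or 'allow' for a single transaction."""
    # Rule layer: hard limit on the high-risk types identified in Section 7
    if tx.tx_type in {"TRANSFER", "CASH_OUT"} and tx.amount > TRANSFER_LIMIT:
        return "block"
    # Model layer: high scores go to review or multi-factor authentication
    if tx.fraud_score >= ALERT_THRESHOLD:
        return "review"
    return "allow"


print(decide(Transaction(25_000.0, "TRANSFER", 0.10)))  # block
print(decide(Transaction(800.0, "CASH_OUT", 0.85)))     # review
print(decide(Transaction(50.0, "PAYMENT", 0.02)))       # allow
```

Putting the rule before the model keeps the hard limit enforceable even when the model's score is low, while the model catches suspicious transactions below the limit.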

## 10. Conclusion

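The Random Forest model identifies fraudulent transactions on a held-out 20%
test set (per-class precision and recall in Section 6), and the key factors in
Section 7 translate into the proactive prevention strategies in Section 9.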
