# Business Understanding
Credit card fraud is a growing problem in the financial sector, driven by the increasing volume of digital and online transactions. Fraudulent activities result in significant financial losses for banks and credit card issuers, damage customer trust, and increase operational costs related to fraud investigation and chargebacks. As transaction volumes continue to rise, manual fraud detection methods become inefficient, slow, and prone to human error.

Machine learning provides an opportunity to automatically identify suspicious transaction patterns and flag potentially fraudulent transactions in real time. By leveraging historical transaction data, financial institutions can build predictive models that help distinguish between legitimate and fraudulent transactions more accurately.

This project focuses on applying classification techniques to a real-world credit card transaction dataset to address this challenge.

## Problem Statement
Financial institutions struggle to accurately detect fraudulent credit card transactions due to the high volume of daily transactions and the fact that fraud cases are rare compared to legitimate ones. Failing to detect fraud leads to direct financial losses, while incorrectly flagging legitimate transactions inconveniences customers and reduces trust.

The problem this project seeks to address is:

How can a machine learning classification model be used to accurately identify fraudulent credit card transactions while minimizing false alarms on legitimate transactions?

## Business Objective

### Primary Objective
To develop a machine learning classification model that can predict whether a credit card transaction is fraudulent or legitimate, enabling financial institutions to reduce fraud-related losses and improve transaction security.

### Specific Objectives
- To explore and understand patterns in credit card transaction data that differentiate fraudulent and non-fraudulent transactions.
- To preprocess and prepare transaction data for machine learning, including handling class imbalance and feature scaling where necessary.
- To train and evaluate multiple classification models and compare their performance using appropriate evaluation metrics.
- To minimize false negatives (missed fraud cases) while maintaining reasonable false positive rates to avoid unnecessary transaction declines.
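The trade-off in the last objective can be made concrete with a confusion matrix. The sketch below is illustrative only: the function name `fraud_metrics` and the toy labels are assumptions, not part of the project code, and it uses plain Python rather than a specific library.

```python
def fraud_metrics(y_true, y_pred):
    """Count confusion-matrix cells and derive recall, precision, and false positive rate.

    Labels follow the dataset convention: 1 = fraudulent, 0 = legitimate.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate passed

    # Guard against division by zero when a class is absent.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0

    return {
        "tp": tp, "fn": fn, "fp": fp, "tn": tn,
        "recall": recall,
        "precision": precision,
        "false_positive_rate": false_positive_rate,
    }


# Toy labels, made up for illustration (not taken from the dataset).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
metrics = fraud_metrics(y_true, y_pred)
print(metrics)
```

Recall falls with every missed fraud case (false negative), while the false positive rate rises with every legitimate transaction that is flagged, so the two quantities map directly onto the two sides of this objective.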

## Stakeholders
The stakeholders affected by credit card fraud detection include:

1. Financial Institutions (Banks and Credit Card Issuers)
2. Customers/Cardholders
3. Fraud and Risk Management Teams
4. Regulatory and Compliance Bodies (Indirect Stakeholders)

## Business Success Criteria
From a business perspective, this project will be considered successful if the model:

- Accurately identifies fraudulent transactions
- Minimizes missed fraud cases
- Maintains customer trust by reducing unnecessary transaction blocks
- Provides insights that can support fraud prevention strategies

# Data Understanding


## Dataset Choice
This project uses the **Credit Card Fraud Detection dataset** provided by the Machine Learning Group (ULB) and originally analyzed by Andrea Dal Pozzolo et al. The dataset contains credit card transactions made by European cardholders in September 2013.

🔗 Dataset source (Kaggle): https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud

**Reasons for choosing this dataset:**
- Represents a real-world financial fraud detection problem
- Suitable for binary classification
- Contains a large number of observations and features
- Presents a realistic challenge due to extreme class imbalance

## Dataset Description
- **Total transactions:** 284,807  
- **Fraudulent transactions:** 492  
- **Fraud rate:** 0.172%  
- **Time period:** Two consecutive days  

The dataset is highly imbalanced, with fraudulent transactions representing a very small fraction of all observations. This makes it ideal for demonstrating data preparation and modeling strategies for imbalanced classification problems.
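As a quick arithmetic check on the figures above, the sketch below recomputes the fraud rate from the two counts. It also shows the accuracy a trivial "always legitimate" baseline would reach. The variable names are illustrative.

```python
# Counts taken from the dataset description above.
total_transactions = 284_807
fraud_transactions = 492

# Share of transactions labeled as fraud.
fraud_rate = fraud_transactions / total_transactions

# Accuracy of a trivial baseline that labels every transaction as legitimate:
# it is wrong only on the fraud cases.
baseline_accuracy = 1 - fraud_rate

print(f"Fraud rate: {fraud_rate:.4%}")
print(f"Accuracy of always predicting 'legitimate': {baseline_accuracy:.4%}")
```

The baseline scores above 99.8% accuracy while catching no fraud at all, which is why accuracy on its own says little about a model's usefulness on this dataset.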

## Features
- **V1–V28:** PCA-transformed numerical features (anonymized to preserve confidentiality)  
- **Time:** Seconds elapsed between each transaction and the first transaction  
- **Amount:** Transaction amount  
- **Class (Target Variable):**  
  - 1 → Fraudulent transaction  
  - 0 → Legitimate transaction  

All features are numerical, making the dataset suitable for machine learning models without categorical encoding.
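The feature list above implies 31 columns in total. A minimal sketch of a header check follows; it assumes the CSV uses the column names exactly as listed here, and the helper `check_header` and the sample text are hypothetical stand-ins for the real file.

```python
import csv
import io

# Expected columns based on the feature list above: Time, V1..V28, Amount, Class.
EXPECTED_COLUMNS = ["Time"] + [f"V{i}" for i in range(1, 29)] + ["Amount", "Class"]


def check_header(csv_text):
    """Return (missing, unexpected) column names found in the first row of a CSV."""
    header = next(csv.reader(io.StringIO(csv_text)))
    missing = [c for c in EXPECTED_COLUMNS if c not in header]
    unexpected = [c for c in header if c not in EXPECTED_COLUMNS]
    return missing, unexpected


# Hypothetical two-line sample standing in for the real file.
sample = (
    ",".join(EXPECTED_COLUMNS) + "\n"
    + ",".join(["0"] * len(EXPECTED_COLUMNS)) + "\n"
)
missing, unexpected = check_header(sample)
print(len(EXPECTED_COLUMNS), missing, unexpected)
```

Running a check like this before any modeling catches a truncated or renamed file early. The comparison ignores column order, since the order is not stated in this section.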

**NOTE:** Because fraud accounts for only 0.172% of observations, the class imbalance must be accounted for during both model training and evaluation.
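One common way to account for this imbalance during training is to weight each class inversely to its frequency. The sketch below is not necessarily the approach this project adopts. It uses the "balanced" heuristic `n_samples / (n_classes * class_count)`, the same formula scikit-learn documents for `class_weight="balanced"`; the function name `balanced_class_weights` is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Synthetic label vector with the same class counts as the dataset description.
labels = [0] * (284_807 - 492) + [1] * 492
weights = balanced_class_weights(labels)
print(weights)
```

With these counts, the fraud class receives a weight several hundred times larger than the legitimate class, so each misclassified fraud case contributes far more to a weighted loss than a misclassified legitimate one.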