------
# Credit Card Fraud Detection

**Description:** Build a Machine Learning model to detect fraudulent credit card transactions and mitigate `fraud risk`.

**Dataset:** [Kaggle Credit Card Fraud Detection](https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud)

**Author:** Mohanad Alemam

**`Note:`** This notebook was originally developed under the name `01_data_exploration.ipynb`.
Earlier work, including the data overview, the formatting of the Markdown table, and the initial setup, can be found in the Git history of that file (see commits [c45b921](https://github.com/MohanadAlemam/fraud-detection-credit-card/commit/c45b9214b116024702900acc15d18f238c0e255f), [447cc31](https://github.com/MohanadAlemam/fraud-detection-credit-card/commit/447cc31e673be80c572a5a45c9fa991f05c650de), and [8eff2fb](https://github.com/MohanadAlemam/fraud-detection-credit-card/commit/8eff2fb7c86cf1a3a23e765cb202334f1f04a7dd)). The notebook was renamed for clarity and to better reflect its content.

----

## 00. Project Setup — Framing the Problem

----

#### Objective Overview

The goal of this project is to build a Machine Learning (ML) model capable of detecting fraudulent credit card transactions. Early fraud detection helps financial institutions mitigate `fraud risk` and protect their customers.

---------------


#### Dataset Overview

The dataset contains credit card transactions made by European cardholders in September 2013. The table below summarizes its main characteristics in raw form:

| Section                  | Description                                                                                                                                                                                                    |
|--------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **Dataset Source**       | [Credit Card Fraud Detection – Kaggle Dataset](https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud)                                                                                                        |
| **Description**          | Contains transactions made by European cardholders in **September 2013**. The dataset covers **two days** of transactions.                                                                                     |
| **Size**                 | **284,807 total transactions**, with **492 fraud cases** (≈0.172% fraud rate).                                                                                                                                 |
| **Features**             | 30 numerical input features: **V1–V28** (principal components obtained with PCA), plus **`Time`** and **`Amount`**.                                                                                           |
| **Target Variable**      | **`Class`** (`1` = Fraud, `0` = Non-fraud).                                                                                                                                                                    |
| **Confidentiality Note** | Apart from `Time` and `Amount`, the original features were transformed with **PCA** to protect sensitive customer information, so the feature names do not reveal the underlying transaction details.          |
| **Recommended Metrics**  | Due to extreme class imbalance, use **Precision-Recall AUC (AUPRC)** instead of accuracy for performance evaluation.                                                                                           |

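The fraud rate in the table, and the reason accuracy is a poor metric here, can both be checked in a few lines. This is a minimal sketch, assuming the data will be held in a pandas DataFrame with the `Class` column. Because the dataset is not loaded yet, a synthetic stand-in with the same class counts (284,807 rows, 492 frauds) is used. `summarize_class_balance` is a hypothetical helper, not part of any library.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, average_precision_score


def summarize_class_balance(df: pd.DataFrame, target: str = "Class") -> dict:
    """Hypothetical helper: class counts and fraud rate for a binary target."""
    counts = df[target].value_counts()
    n_fraud = int(counts.get(1, 0))
    n_total = int(counts.sum())
    return {"n_total": n_total, "n_fraud": n_fraud, "fraud_rate": n_fraud / n_total}


# Synthetic stand-in with the raw dataset's class counts
# (assumption: swap in the real DataFrame once the dataset is loaded).
y = np.repeat([0, 1], [284_315, 492])
df = pd.DataFrame({"Class": y})

summary = summarize_class_balance(df)
print(f"{summary['n_fraud']} / {summary['n_total']} = {summary['fraud_rate']:.4%}")  # → 492 / 284807 = 0.1727%

# Trivial baseline: predict "non-fraud" (score 0) for every transaction.
baseline_pred = np.zeros_like(y)
baseline_score = np.zeros(len(y), dtype=float)

acc = accuracy_score(y, baseline_pred)
ap = average_precision_score(y, baseline_score)
print(f"Accuracy: {acc:.4f}")                    # → Accuracy: 0.9983
print(f"AUPRC (average precision): {ap:.4f}")    # → AUPRC (average precision): 0.0017
```

The all-zeros baseline reaches about 99.8% accuracy without catching a single fraud. Its average precision equals the fraud rate, which is why AUPRC is the more informative metric here. scikit-learn's `average_precision_score` is one common step-wise estimate of the area under the precision-recall curve.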


`Key Takeaways`

- Severe class imbalance — only `0.17%` of transactions are fraudulent.
- The dataset’s anonymized, PCA-transformed features limit `semantic feature engineering` (I can’t interpret what each of V1–V28 represents).
- Focus will therefore be on robust modeling, class weighting, and evaluation metrics (e.g., PR-AUC, Recall) tailored for imbalanced classification.

-----

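The takeaways mention class weighting and imbalance-aware metrics. The following is a minimal sketch of how these could be wired together with scikit-learn. It assumes a logistic-regression baseline, which is my illustrative choice and not a model selected for this project. Synthetic data from `make_classification` stands in for the real V1–V28 features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_class_weight

# "balanced" weights = n_samples / (n_classes * count_of_class),
# computed for the raw dataset's class counts (284,315 legit, 492 fraud).
y_counts = np.repeat([0, 1], [284_315, 492])
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y_counts)
print({c: round(float(w), 3) for c, w in zip([0, 1], weights)})  # → {0: 0.501, 1: 289.438}

# Toy imbalanced problem (assumption: synthetic data, ~0.5% positives, no label noise).
X, y = make_classification(
    n_samples=20_000, n_features=30, n_informative=10,
    weights=[0.995], flip_y=0.0, random_state=42,
)
# Stratify so the rare class appears in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# class_weight="balanced" scales each sample's loss by its class weight.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_train, y_train)

scores = model.predict_proba(X_test)[:, 1]
print("Recall:", recall_score(y_test, model.predict(X_test)))
print("AUPRC (average precision):", average_precision_score(y_test, scores))
```

Weighting the loss keeps every original transaction in the training set. Oversampling the fraud class or undersampling the legitimate class would be alternative ways to address the same imbalance.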
#### Environment and Libraries
Initially Required Libraries:
- pandas
- numpy
- matplotlib
- scikit-learn

--------------------
Next step: Data Exploration → I will load and explore the dataset.

--------------