------
# Fraud Risk Modelling (Credit Card)

**Objective:** To build a production-ready Fraud Risk solution that detects fraudulent credit card transactions and mitigates `fraud risk`.

**Dataset:** [European Credit Card Transactions](https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud)

**Author:** Mohanad Alemam

**Start Date:** 14 October 2025


**`Note:`** This notebook was originally developed under the name `01_data_exploration.ipynb`.
Earlier work, including the formatting of the data overview Markdown table and the initial setup, can be found in the Git history of that file; see commits [c45b921](https://github.com/MohanadAlemam/fraud-detection-credit-card/commit/c45b9214b116024702900acc15d18f238c0e255f), [447cc31](https://github.com/MohanadAlemam/fraud-detection-credit-card/commit/447cc31e673be80c572a5a45c9fa991f05c650de), and [8eff2fb](https://github.com/MohanadAlemam/fraud-detection-credit-card/commit/8eff2fb7c86cf1a3a23e765cb202334f1f04a7dd). The notebook was renamed for clarity and to align its name with its content.

----

### 00. Project Setup and Framing of the Problem

----

#### Objective Overview

The goal of this project is to build a Machine Learning (ML) model capable of detecting fraudulent credit card transactions, helping financial institutions mitigate `fraud risk`. The main challenge is the **scarcity of labeled fraud instances**: the dataset contains only 492 fraud cases, a **prevalence of only ~0.172%** of all transactions.

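The prevalence figure above follows directly from the two counts quoted in this notebook (492 fraud cases out of 284,807 transactions). The short sketch below uses only those counts, not the loaded dataset, and also shows why accuracy is a weak yardstick here: a hypothetical baseline that never flags fraud would still be correct on almost every transaction.

```python
# Counts taken from the dataset description; nothing is loaded from disk here.
TOTAL_TRANSACTIONS = 284_807
FRAUD_CASES = 492

# Share of transactions labelled as fraud (the positive class).
prevalence = FRAUD_CASES / TOTAL_TRANSACTIONS

# Accuracy of a hypothetical baseline that labels every transaction as non-fraud:
# it is right on all legitimate transactions and wrong on every fraud case.
all_legit_accuracy = (TOTAL_TRANSACTIONS - FRAUD_CASES) / TOTAL_TRANSACTIONS

print(f"Fraud prevalence: {prevalence:.4%}")
print(f"Accuracy of an 'always non-fraud' baseline: {all_legit_accuracy:.4%}")
```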
---------------


#### Dataset Overview

The dataset contains credit card transactions made by European cardholders in September 2013. The table below shows the main characteristics of the raw data:

| Section                          | Description                                                                                                                                            |
|----------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|
| **Dataset Source**               | [European Credit Card Transactions (provided by Kaggle)](https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud)                                      |
| **Description**                  | Transactions made by European cardholders in September 2013 over `two days`.                                                                           |
| **Size**                         | **284,807 total transactions** with **492 fraud cases** (~0.172% fraud rate).                                                                          |
| **Features**                     | 30 numerical features: `V1` to `V28` (PCA principal components) plus `Time` and `Amount`.                                                              |
| **Target Value**                 | The target column is **`Class`**: `1` = Fraud, `0` = Non-fraud.                                                                                        |
| **Confidentiality and Semantics** | The original features were transformed using `PCA` for confidentiality, so the features have no semantic meaning except `Time` and `Amount`.          |
| **Main Performance Metrics**     | Because of the extreme class imbalance, I will use **Precision-Recall AUC (PR AUC)** as the main evaluation metric instead of accuracy.                |


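The table names PR AUC as the main metric. As a rough illustration of what that metric rewards, the sketch below computes average precision (a common step-wise summary of the area under the precision-recall curve) in plain Python on made-up toy labels and scores. The function name `average_precision` and the toy data are illustrative assumptions, not part of this project's pipeline; in practice a library routine such as scikit-learn's `average_precision_score` would likely be used instead.

```python
def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve (average precision).

    Assumes all scores are distinct; tied scores would need to be grouped.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positive labels")

    # Rank transactions from most to least suspicious.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)

    true_positives = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            # Precision at the rank where recall increases.
            total += true_positives / rank
    return total / n_pos


# Toy data (hypothetical): 2 fraud cases among 10 transactions.
toy_labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
toy_scores = [0.95, 0.80, 0.70, 0.60, 0.40, 0.35, 0.30, 0.20, 0.10, 0.05]

ap_good = average_precision(toy_labels, toy_scores)
# Same labels, but the scores are reversed so fraud cases sink towards the bottom.
ap_bad = average_precision(toy_labels, list(reversed(toy_scores)))

print(f"Average precision, fraud ranked high: {ap_good:.2f}")  # 0.75
print(f"Average precision, fraud ranked low:  {ap_bad:.2f}")
```

On this toy data, ranking both fraud cases near the top gives a score of 0.75, while the reversed ranking scores much lower even though the labels are identical. That sensitivity to where the rare positives land is what makes the metric suitable for heavily imbalanced data.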

**Key Takeaways**

- The classes are severely imbalanced: only `~0.172%` of transactions are labelled fraudulent.
- The dataset is anonymized, i.e. its features are PCA-transformed. This inhibits `semantic feature engineering` (I can’t interpret what each of V1–V28 means).
- The focus will therefore be on robust modeling, class weighting, and evaluation metrics tailored for imbalanced classification, e.g. PR-AUC and F1 score.
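
To make the class weighting takeaway concrete, here is a minimal sketch of one common inverse-frequency heuristic, `n_samples / (n_classes * count)`, which is the formula scikit-learn documents for `class_weight="balanced"`. The helper name `balanced_class_weights` is hypothetical, and whether this particular weighting is the one used later in the project is an assumption; the counts are the ones quoted in the dataset overview.

```python
def balanced_class_weights(class_counts):
    """Return weights inversely proportional to class frequency.

    weight_c = n_samples / (n_classes * count_c)
    """
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in class_counts.items()
    }


# Counts from the dataset overview: 0 = non-fraud, 1 = fraud.
counts = {0: 284_807 - 492, 1: 492}
weights = balanced_class_weights(counts)

for label, weight in weights.items():
    print(f"class {label}: weight {weight:.3f}")

# With these weights, both classes contribute the same total weight to the loss.
```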
-----

#### Environment and Libraries
For the required libraries and dependencies, please see the **requirements.txt** file in the main directory.


--------------------
Next Step: Data Exploration → I will load and explore the dataset.

--------------