# ML for Credit Card Fraud Detection

### Project Description

The goal of this project is to analyze a credit card transaction dataset and build a feature engineering and training pipeline. The pipeline will be used to train a machine learning model that allows the user to detect fraudulent credit card transactions in the future.

### Project Relevance

Fraud detection systems are a key security feature of banking systems because they protect users' assets from unauthorized use.

Just to highlight the importance of fraud detection systems, according to the [Security.org 2023 Credit Card Fraud Report](https://www.security.org/digital-safety/credit-card-fraud-report/):
- 65 percent of credit and debit card holders have been fraud victims at some point in their lives, up from 58 percent in 2022. This equates to about 151 million people in the United States.
- An increasing number of Americans have been victimized multiple times: in 2022, 44 percent of credit card users reported having two or more fraudulent charges, compared to 35 percent in 2021.
- Since 2021, the median fraudulent charge has climbed by about 27 percent (rising to $79 in 2023). This equates to about $12 billion in total attempted fraudulent charges.



### Key Stakeholders

The main stakeholders in this project are:

1) The banking institution(s) that would provide the anonymized transaction data required to train the machine learning model.
2) The user(s) who allow their banking transaction data to be used to train the model.
3) The FTC (in the US) and other regulatory institutions that would need to verify and approve both the use of the users' data to train the model and the use of the model itself.

### Objectives

- The final machine learning model should achieve at least 80% fraud detection accuracy.
- A feature engineering and training pipeline should support future retraining of the model.
- An application should allow a user to enter a dummy transaction and verify its authenticity.
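
As a rough illustration of the first two objectives, the sketch below wires a feature engineering step and a classifier into a single scikit-learn `Pipeline` and reports metrics on a held-out split. Everything in it is an assumption made for illustration: the data is synthetic, the `Hour` feature and the labelling rule are made up, and `RandomForestClassifier` is only a stand-in, since the model has not been selected yet. Recall on the fraud class is reported next to accuracy because fraud is typically rare, so accuracy alone can look high even when few fraudulent transactions are caught.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def make_toy_transactions(n_rows: int = 400, seed: int = 0) -> pd.DataFrame:
    """Generate synthetic rows (NOT real data) with a made-up fraud rule."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "Amount": rng.exponential(scale=50.0, size=n_rows),
        "Hour": rng.integers(0, 24, size=n_rows),  # hypothetical derived feature
        "Use Chip": rng.choice(
            ["Swipe Transaction", "Online Transaction"], size=n_rows
        ),
    })
    # Hypothetical labelling rule, used only so the toy model has a signal to learn.
    is_fraud = (df["Amount"] > 80) & (df["Use Chip"] == "Online Transaction")
    df["Is Fraud?"] = is_fraud.astype(int)
    return df


def build_pipeline() -> Pipeline:
    """Feature engineering and model in one object, so retraining is a single fit()."""
    features = ColumnTransformer([
        ("numeric", StandardScaler(), ["Amount", "Hour"]),
        ("categorical", OneHotEncoder(handle_unknown="ignore"), ["Use Chip"]),
    ])
    return Pipeline([
        ("features", features),
        # Placeholder model; the project has not chosen one yet.
        ("model", RandomForestClassifier(n_estimators=50, random_state=0)),
    ])


def train_and_evaluate(df: pd.DataFrame):
    """Fit on a stratified train split and score on the held-out split."""
    X = df.drop(columns=["Is Fraud?"])
    y = df["Is Fraud?"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0
    )
    pipeline = build_pipeline().fit(X_train, y_train)
    predictions = pipeline.predict(X_test)
    metrics = {
        "accuracy": accuracy_score(y_test, predictions),
        "fraud_recall": recall_score(y_test, predictions),
    }
    return pipeline, metrics


if __name__ == "__main__":
    _, metrics = train_and_evaluate(make_toy_transactions())
    print(metrics)
```

Because the preprocessing lives inside the fitted pipeline, the same object can score a single dummy transaction passed as a one-row `DataFrame`, which is the shape of input the planned application would need to produce.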

### Preparation Steps

1) Identify a public credit card transaction dataset suitable for Exploratory Data Analysis, in which the information held in each column can be identified clearly and easily. Some datasets on Kaggle contain columns that were already scaled or transformed with PCA, and are therefore not useful for this project's goals.
2) Research the different machine learning models that are best suited for detecting fraudulent credit card transactions.
3) Select a free-to-use online platform suitable for deploying the machine learning model.

### Dataset

- We will use the [Credit Card Transactions Kaggle Dataset](https://www.kaggle.com/datasets/ealtman2019/credit-card-transactions), because it contains a good amount of data and its columns are easy to identify and work with, since they are not scaled or processed in any way.
- The dataset contains the following columns:
    1) 'User': An ID of the user.
    2) 'Card': An ID for the user's card; some users have multiple cards.
    3) 'Year', 'Month', 'Day', 'Time': The timestamp of the transaction.
    4) 'Amount': The amount of the transaction.
    5) 'Use Chip': 'Swipe Transaction' if a physical card was used to perform the transaction, or 'Online Transaction' if the transaction was performed online.
    6) 'Merchant Name': The name of the store where the transaction was made.
    7) 'Merchant City', 'Merchant State', 'Zip': The store's location.
    8) 'MCC': The [Merchant Category Code](https://www.investopedia.com/terms/m/merchant-category-codes-mcc.asp).
    9) 'Errors?': Any error(s) during the transaction, e.g. 'Insufficient Balance', 'Technical Glitch', etc.
    10) 'Is Fraud?': A label indicating if the transaction was fraudulent or not.
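
The column layout above suggests a few cleaning steps before any analysis. The sketch below shows one possible way to do them with pandas. It uses two made-up rows rather than the real file, and it assumes (not confirmed in this README) that 'Amount' is stored as text with a leading `$`, that 'Time' is formatted as `HH:MM`, and that 'Is Fraud?' holds `Yes`/`No`. The `load_transactions` helper and the `Timestamp` column are hypothetical names introduced here.

```python
import io

import pandas as pd

# Two made-up rows in the column layout described above (NOT real data).
SAMPLE_CSV = (
    "User,Card,Year,Month,Day,Time,Amount,Use Chip,Merchant Name,"
    "Merchant City,Merchant State,Zip,MCC,Errors?,Is Fraud?\n"
    "0,0,2019,3,7,06:21,$134.09,Swipe Transaction,1111,La Verne,CA,91750,5300,,No\n"
    "0,1,2019,3,8,23:45,$-25.00,Online Transaction,2222,ONLINE,,,4829,"
    "Insufficient Balance,Yes\n"
)


def load_transactions(source) -> pd.DataFrame:
    """Read the CSV and normalise the timestamp, amount and label columns."""
    df = pd.read_csv(source)

    # Assemble the date from the 'Year', 'Month', 'Day' columns, then add the
    # time of day (assumed to be 'HH:MM') as an offset.
    date = pd.to_datetime(df[["Year", "Month", "Day"]].rename(columns=str.lower))
    df["Timestamp"] = date + pd.to_timedelta(df["Time"] + ":00")

    # Assumed format: amounts are strings such as '$134.09' or '$-25.00'.
    df["Amount"] = df["Amount"].str.replace("$", "", regex=False).astype(float)

    # Assumed encoding: 'Yes' marks a fraudulent transaction, 'No' a legitimate one.
    df["Is Fraud?"] = df["Is Fraud?"].map({"Yes": 1, "No": 0})
    return df


if __name__ == "__main__":
    transactions = load_transactions(io.StringIO(SAMPLE_CSV))
    print(transactions[["Timestamp", "Amount", "Use Chip", "Is Fraud?"]])
```

With the real dataset, the same function could be called with the path to the downloaded CSV instead of the in-memory sample, provided the assumptions about the column formats hold.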

### Deployment Plan

- We will use Streamlit to host the user application and the trained machine learning model, so that dummy transactions can be tested.
- We will use MLflow to register and monitor the model.
- We will use BentoML to package the trained model and deploy it.