# Bank transaction fraud detection

A tabular machine learning (TML) model that detects fraudulent bank transactions.

## Setup

### Dependencies

Import the necessary modules.

* kagglehub: downloads the dataset.
* pathlib: builds the path to the dataset file.
* TabularDataLoaders: builds the data block (the training and validation data loaders) that feeds the TML model.
* tabular_learner: creates the learner object for a TML task.
* accuracy: the metric used to evaluate the model's performance.
* Categorify: a preprocessing step that converts categorical variables into numeric codes.
* FillMissing: a preprocessing step that fills missing values in a continuous column with that column's median.
* Normalize: a preprocessing step that scales continuous variables to have zero mean and unit variance.
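
The three preprocessing steps listed above can be illustrated with a minimal, standard-library sketch. This is not fastai's implementation: the helper names `categorify`, `fill_missing` and `normalize` and the sample values are hypothetical, and the sketch assumes that code 0 is reserved for missing categories and that the population standard deviation is used for scaling.

```python
from statistics import mean, median, pstdev

def categorify(values):
    """Map each distinct category to an integer code (0 is kept for missing values)."""
    vocab = sorted({v for v in values if v is not None})
    codes = {v: i + 1 for i, v in enumerate(vocab)}
    return [codes.get(v, 0) for v in values]

def fill_missing(values):
    """Replace missing entries of a continuous column with the column median."""
    med = median(v for v in values if v is not None)
    return [med if v is None else v for v in values]

def normalize(values):
    """Rescale a continuous column to zero mean and unit variance."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical sample columns, for illustration only.
gender_codes = categorify(['M', 'F', None, 'F'])
ages_filled = fill_missing([20, None, 40, 30])
ages_scaled = normalize(ages_filled)
```

In the notebook itself, fastai applies the equivalent steps to every column named in cat_names and cont_names, in the order given in procs.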

In [None]:
from kagglehub import dataset_download

from pathlib import Path

from fastai.tabular.all import (
    TabularDataLoaders,
    tabular_learner,
    accuracy,
    Categorify,
    FillMissing,
    Normalize
)

### Variables

1. Define the path to the dataset.

In [None]:
dataset_path_string = dataset_download('marusagar/bank-transaction-fraud-detection')
dataset_path = Path(dataset_path_string) / 'Bank_Transaction_Fraud_Detection.csv'

2. Define the data block for training.

In [None]:
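from fastai.tabular.all import CategoryBlock

data_block = TabularDataLoaders.from_csv(
    dataset_path,
    y_names='Is_Fraud',
    # Treat the target as a class label. Without an explicit block, fastai
    # treats a numeric target column as a regression target, which does not
    # fit the accuracy metric used below.
    y_block=CategoryBlock(),
    # Columns handled as categories (converted to integer codes by Categorify).
    cat_names=['Customer_ID', 'Gender', 'State', 'City', 'Bank_Branch',
               'Account_Type', 'Transaction_ID', 'Transaction_Date', 'Transaction_Time',
               'Merchant_ID', 'Transaction_Type', 'Merchant_Category',
               'Transaction_Device', 'Device_Type', 'Customer_Contact', 'Transaction_Location',
               'Transaction_Currency', 'Transaction_Description', 'Customer_Email'],
    # Columns handled as continuous values (filled and normalized).
    cont_names=['Age', 'Transaction_Amount', 'Account_Balance'],
    procs=[Categorify, FillMissing, Normalize]
)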
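
The call above does not specify a validation split. The sketch below shows what a random hold-out looks like, assuming the library falls back to a random 20% validation set (the `valid_pct=0.2` value is an assumption here, not something stated in this notebook). The function `random_split` is a hypothetical helper, not part of fastai.

```python
import random

def random_split(n_rows, valid_pct=0.2, seed=None):
    """Shuffle the row indices and hold out the first valid_pct share for validation."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    cut = int(valid_pct * n_rows)
    # Rows after the cut are used for training, rows before it for validation.
    return indices[cut:], indices[:cut]

# Hypothetical 10-row dataset: 8 rows for training, 2 for validation.
train_idx, valid_idx = random_split(10, seed=0)
```

Fixing the seed makes the split reproducible between runs, which helps when comparing metrics across experiments.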

### Training

Pass the data block defined above to tabular_learner for training. Because pretrained models are rarely available for tabular machine learning tasks, the model is trained from scratch with the fit_one_cycle method instead of fine_tune.

In [None]:
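# Build a tabular model for the data block and report accuracy on the validation set.
learner = tabular_learner(data_block, metrics=accuracy)
# Train from scratch for 3 epochs using the one-cycle learning-rate policy.
learner.fit_one_cycle(3)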
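
As a rough illustration of what fit_one_cycle does to the learning rate, the sketch below reproduces a one-cycle schedule: a cosine warm-up to a peak followed by a cosine decay. The function names are hypothetical, and the parameter values (`lr_max=1e-3`, `div=25`, `div_final=1e5`, `pct_start=0.25`) are assumed defaults rather than values taken from this notebook.

```python
import math

def cosine_anneal(start, end, pos):
    """Move from start to end along a cosine curve as pos goes from 0 to 1."""
    return start + (1 + math.cos(math.pi * (1 - pos))) * (end - start) / 2

def one_cycle_lr(progress, lr_max=1e-3, div=25.0, div_final=1e5, pct_start=0.25):
    """Learning rate at a given training progress (0 = first batch, 1 = last batch)."""
    if progress < pct_start:
        # Warm-up phase: rise from lr_max / div to lr_max.
        return cosine_anneal(lr_max / div, lr_max, progress / pct_start)
    # Decay phase: fall from lr_max to lr_max / div_final.
    pos = (progress - pct_start) / (1 - pct_start)
    return cosine_anneal(lr_max, lr_max / div_final, pos)

# Sample the schedule at the start, the peak, and the end of training.
lr_start = one_cycle_lr(0.0)
lr_peak = one_cycle_lr(0.25)
lr_end = one_cycle_lr(1.0)
```

Under these assumptions the rate starts low, peaks a quarter of the way through training, and ends several orders of magnitude below the peak.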