# Bank transaction fraud detection

An ML model that detects fraudulent bank transactions.

## Setup

### Dependencies

Import the necessary modules.

* pandas: loads the dataset and engineers its columns.
* kagglehub: downloads the dataset.
* pathlib: builds the path to the dataset file.
* RandomForestRegressor: the random forest model that is trained on the features.
* OneHotEncoder: one-hot encodes the categorical columns.

In [None]:
import pandas

from kagglehub import dataset_download

from pathlib import Path

from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import OneHotEncoder

### Variables

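1. Define the path to the dataset and an encoder that one-hot encodes only the categorical (non-numeric) columns and passes numeric columns through unchanged.

In [None]:
from sklearn.compose import make_column_selector, make_column_transformer

dataset_path_string = dataset_download('marusagar/bank-transaction-fraud-detection')
dataset_path = Path(dataset_path_string) / 'Bank_Transaction_Fraud_Detection.csv'
# One-hot encode non-numeric columns; numeric columns (Age, Transaction_Amount, ...) pass through unchanged
encoder = make_column_transformer((OneHotEncoder(handle_unknown='ignore', sparse_output=False), make_column_selector(dtype_exclude='number')), remainder='passthrough')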

2. Engineer a Transaction_Hour feature that keeps only the hour of day from the Transaction_Time column, then drop the original column.

In [None]:
data_frame = pandas.read_csv(dataset_path)
data_frame['Transaction_Hour'] = pandas.to_datetime(data_frame['Transaction_Time'], format='%H:%M:%S').dt.hour
data_frame.drop(columns='Transaction_Time', inplace=True)
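The hour extraction can be checked on a few made-up timestamps in the same `HH:MM:SS` layout the cell above parses. The sample values are assumptions for illustration only.

```python
import pandas

# Made-up times in the HH:MM:SS layout used by Transaction_Time
toy = pandas.DataFrame({'Transaction_Time': ['09:15:00', '23:59:59', '00:05:30']})

# Parse each string as a time of day and keep only the hour component
toy['Transaction_Hour'] = pandas.to_datetime(toy['Transaction_Time'], format='%H:%M:%S').dt.hour
toy = toy.drop(columns='Transaction_Time')

print(toy['Transaction_Hour'].tolist())  # [9, 23, 0]
```

Reducing the timestamp to an hour keeps a coarse time-of-day signal while discarding minute- and second-level detail that is unlikely to generalize.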

3. Check for missing values.

In [None]:
data_frame.isna().sum()
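`isna().sum()` returns one missing-value count per column. The toy frame below, with one deliberately missing `Age`, is an assumption used only to show what a non-zero count looks like and how to collapse the per-column counts into a single total.

```python
import pandas

# Toy frame with one missing Age value
toy = pandas.DataFrame({'Age': [34, None, 51], 'State': ['A', 'B', 'C']})

# Missing values per column
missing_per_column = toy.isna().sum()
print(missing_per_column.to_dict())

# Total across all columns; a non-zero total would call for imputation or row removal
total_missing = int(missing_per_column.sum())
print(total_missing)  # 1
```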

4. Since there are no empty entries, proceed with defining the target variable.

In [None]:
y = data_frame['Is_Fraud']
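Fraud labels are often heavily imbalanced, so it can help to check the share of each class in the target before training. The labels below are made up; on the real data the same call would be `y.value_counts(normalize=True)`.

```python
import pandas

# Made-up labels: 1 marks a fraudulent transaction, 0 a legitimate one
y = pandas.Series([0, 0, 0, 1, 0, 0, 0, 0, 1, 0], name='Is_Fraud')

# Fraction of rows in each class; a small share of 1s means an imbalanced target
class_share = y.value_counts(normalize=True)
print(class_share)
```

With an imbalanced target, plain accuracy can look high even for a model that never flags fraud, which is why score-based metrics such as ROC AUC are commonly used instead.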

5. Define the training features.

In [None]:
features = ['Gender', 'State', 'City', 'Bank_Branch', 'Account_Type', 'Transaction_Date', 'Transaction_Type', 'Merchant_Category',
            'Transaction_Device', 'Device_Type', 'Transaction_Location', 'Transaction_Currency', 'Transaction_Description',
            'Customer_Email', 'Age', 'Transaction_Amount', 'Account_Balance', 'Transaction_Hour']
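The feature list mixes text columns (such as `Gender`) with numeric ones (such as `Age` and `Transaction_Amount`). A minimal sketch of separating the two by dtype, on a toy frame with made-up values, is shown below; only the non-numeric group needs one-hot encoding.

```python
import pandas

# Toy frame mixing text and numeric columns, like the feature list above
toy = pandas.DataFrame({
    'Gender': ['Male', 'Female'],
    'Transaction_Type': ['Debit', 'Credit'],
    'Age': [34, 51],
    'Transaction_Amount': [120.5, 9800.0],
})

# Non-numeric columns are the candidates for one-hot encoding
categorical_columns = toy.select_dtypes(exclude='number').columns.tolist()
numeric_columns = toy.select_dtypes(include='number').columns.tolist()

print(categorical_columns)  # ['Gender', 'Transaction_Type']
print(numeric_columns)      # ['Age', 'Transaction_Amount']
```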

6. Select the feature columns to be encoded.

In [None]:
data_frame_for_encoding = data_frame[features]

7. One-hot encode categorical columns.

In [None]:
data_frame_final = encoder.fit_transform(data_frame_for_encoding)

### Training

1. Initialize the training model.

In [None]:
training_model = RandomForestRegressor(random_state=1)
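Because `Is_Fraud` is a 0/1 label, a random forest regressor effectively averages 0/1 leaf values, producing scores between 0 and 1. A classifier is a common alternative for this kind of target; the sketch below swaps in `RandomForestClassifier` on synthetic data, where the feature matrix and the labeling rule are assumptions standing in for the encoded features and `Is_Fraud`.

```python
import numpy
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the encoded features: 200 rows, 5 columns
rng = numpy.random.default_rng(1)
X = rng.normal(size=(200, 5))
# Hypothetical labeling rule standing in for Is_Fraud (roughly 15% positives)
y = (X[:, 0] + X[:, 1] > 1.5).astype(int)

# class_weight='balanced' upweights the rarer class during training
classifier = RandomForestClassifier(random_state=1, class_weight='balanced')
classifier.fit(X, y)

# predict_proba has one column per class; column 1 is the estimated fraud probability
fraud_probability = classifier.predict_proba(X)[:, 1]
predicted_label = classifier.predict(X)
```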

2. Fit the model on the encoded features and the target.

In [None]:
training_model.fit(data_frame_final, y)
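Fitting on all rows gives no estimate of how well the model detects fraud on unseen transactions. A minimal sketch of holding out a test split and scoring the regressor's outputs with ROC AUC follows; it uses synthetic data in place of `data_frame_final` and `y`, and the 0.5 decision threshold is an assumption.

```python
import numpy
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for data_frame_final and y
rng = numpy.random.default_rng(1)
X = rng.normal(size=(500, 5))
y = (X[:, 0] - X[:, 2] > 1.0).astype(int)

# Hold out 20% of rows; stratify keeps the fraud share similar in both splits
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=1)

model = RandomForestRegressor(random_state=1)
model.fit(X_train, y_train)

# For a 0/1 target the regressor's outputs lie in [0, 1] and can serve as fraud scores
scores = model.predict(X_test)
auc = roc_auc_score(y_test, scores)

# A 0.5 threshold is an assumption; tune it to the cost of missed fraud vs. false alarms
predicted_label = (scores >= 0.5).astype(int)
print(f'ROC AUC on held-out data: {auc:.3f}')
```

ROC AUC works directly on continuous scores, so it evaluates the ranking of transactions without committing to a threshold.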