- The aim of the project is to analyze the Online Payments Fraud Detection Dataset and predict whether each transaction is Fraud or Not Fraud, building and training a model on its features and variables.
- There are 11 features and 6,362,620 entries in this dataset.
- `step`: Maps a unit of time in the real world. In this case, 1 step is 1 hour. Total steps: 744 (a 30-day simulation).
- `type`: CASH-IN, CASH-OUT, DEBIT, PAYMENT, or TRANSFER.
- `amount`: Amount of the transaction in local currency.
- `nameOrig`: Customer who started the transaction.
- `oldbalanceOrg`: Initial balance of the originator before the transaction.
- `newbalanceOrig`: New balance of the originator after the transaction.
- `nameDest`: Customer who is the recipient of the transaction.
- `oldbalanceDest`: Initial balance of the recipient before the transaction. Note that there is no information for customers whose names start with M (merchants).
- `newbalanceDest`: New balance of the recipient after the transaction. Note that there is no information for customers whose names start with M (merchants).
- `isFraud`: Marks transactions made by fraudulent agents inside the simulation. In this dataset, the fraudulent agents aim to profit by taking control of customers' accounts, emptying the funds by transferring them to another account, and then cashing out of the system.
- `isFlaggedFraud`: The business model aims to control massive transfers from one account to another and flags illegal attempts. An illegal attempt in this dataset is an attempt to transfer more than 200,000 in a single transaction.
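The schema above can be inspected with pandas. The sketch below builds a tiny synthetic sample with the same 11 columns purely for illustration; the `load_dataset` helper and the `onlinefraud.csv` filename are assumptions, not part of the original project:

```python
import pandas as pd

# The 11 columns of the dataset described above.
COLUMNS = [
    "step", "type", "amount", "nameOrig", "oldbalanceOrg",
    "newbalanceOrig", "nameDest", "oldbalanceDest",
    "newbalanceDest", "isFraud", "isFlaggedFraud",
]

def load_dataset(path="onlinefraud.csv"):
    """Load the full dataset; the filename is an assumption."""
    return pd.read_csv(path)

# Tiny synthetic sample mirroring the schema, for illustration only.
sample = pd.DataFrame([
    [1, "PAYMENT", 9839.64, "C1231006815", 170136.0, 160296.36,
     "M1979787155", 0.0, 0.0, 0, 0],
    [1, "TRANSFER", 181.0, "C1305486145", 181.0, 0.0,
     "C553264065", 0.0, 0.0, 1, 0],
], columns=COLUMNS)

print(sample.shape)            # 11 features per row
print(sample["type"].unique())
```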
- Pandas
- NumPy
- Matplotlib
- scikit-learn
- SciPy
- Seaborn
- Joblib
- Flask
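The libraries above would typically be listed in the project's `requirements.txt`; an unpinned sample (the actual file may pin specific versions):

```text
pandas
numpy
matplotlib
scikit-learn
scipy
seaborn
joblib
flask
```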
- Create a virtual environment using `python -m venv myenv`.
- To activate the virtual environment (on Windows), use `.\myenv\Scripts\activate`.
- If a PowerShell execution-policy error occurs, use `Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass`.
- `app.py` contains the Flask app code. Run `pip install -r requirements.txt` to install the required dependencies for the Flask app.
- You may need to install additional libraries to run the Jupyter notebooks.
- If you need help, use the video reference link.
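As a sketch of what `app.py` might look like: the route names, form field, inline template, and the placeholder prediction rule below are all assumptions for illustration, not the project's actual code (the real app would load the saved `.joblib` model and use its prediction):

```python
from flask import Flask, request, render_template_string

app = Flask(__name__)

# Hypothetical inline template; the real project uses its own HTML templates.
PAGE = """
<form method="post" action="/predict">
  <input name="amount" placeholder="amount">
  <button type="submit">Predict</button>
</form>
{% if result is not none %}<p>Prediction: {{ result }}</p>{% endif %}
"""

@app.route("/")
def index():
    return render_template_string(PAGE, result=None)

@app.route("/predict", methods=["POST"])
def predict():
    # In the real app, the trained model would be loaded and queried:
    #   model = joblib.load("model.joblib")  # filename is an assumption
    #   pred = model.predict([[...features...]])[0]
    amount = float(request.form.get("amount", 0))
    pred = "Fraud" if amount > 200000 else "Not Fraud"  # placeholder rule
    return render_template_string(PAGE, result=pred)

# To run locally: app.run(debug=True)
```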
- Load the dataset, which contains 6,362,620 entries and 11 features.
- Perform EDA on the dataset to gain insights.
- Plot graphs of the different features against the `target` feature.
- Analyze the dataset using correlation and draw a bar plot showing how strongly each feature relates to the `target` feature.
- Reduce the parameters and split the dataset into input and target features.
- Split the data into training and testing sets.
- Train different models and record their accuracies, MSE, and R² scores, including after tuning the hyper-parameters.
- Also build a neural network and tune its parameters.
- The Decision Tree Classifier gives the most promising performance on this dataset, fitting the target variable with up to 99.97% accuracy.
- Save the model to a `.joblib` file and create a front-end for it.
- Also create a `requirements.txt` file for the model and website build.
- Create a front-end using the Flask framework with a user-friendly template.
- The website takes user input, passes it to the model in the backend, and the model returns its prediction; accuracy is around 99.97%.
- The Decision Tree Classifier model shows promising performance with 99.97% accuracy.
- Created a user-friendly front-end using Flask and integrated it with the model.
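The pipeline above can be sketched end-to-end with scikit-learn. This is a minimal illustration on a small synthetic sample with a toy fraud rule; the real project trains on the full 6.3M-row dataset, and the 99.97% accuracy refers to that run, not to this sketch:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 1000

# Synthetic stand-in for the dataset: numeric features only, for illustration.
df = pd.DataFrame({
    "type": rng.integers(0, 5, n),          # transaction type, label-encoded
    "amount": rng.uniform(0, 300000, n),
    "oldbalanceOrg": rng.uniform(0, 500000, n),
    "newbalanceOrig": rng.uniform(0, 500000, n),
})
# Toy labeling rule so the classes are learnable (not the real labels).
df["isFraud"] = ((df["amount"] > 200000)
                 & (df["oldbalanceOrg"] < df["amount"])).astype(int)

# Split into input and target features, then into train/test sets.
X = df.drop(columns=["isFraud"])
y = df["isFraud"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

acc = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {acc:.4f}")

# The trained model can then be saved for the Flask front-end:
#   joblib.dump(model, "model.joblib")  # filename is an assumption
```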