# Anti-Money Laundering and Fraud Prediction
This is a breakdown of the models that can be developed to predict and detect money laundering or fraud within financial datasets.
The datasets come from Kaggle:
- [Credit Card Fraud Detection | Kaggle](https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud?datasetId=310&sortBy=voteCount)
- [Fake Bills | Kaggle](https://www.kaggle.com/datasets/alexandrepetit881234/fake-bills)

These datasets contain varying amounts of data, dating up to 2013, with different levels of detail captured by financial entities.

## Packages to install
Make sure pip is up to date before installing these packages:
```
python.exe -m pip install --upgrade pip
```

To install the scikit-learn (`sklearn`) package, use the command below:
```
pip install scikit-learn
```

To install Seaborn, use the command below:
```
pip install seaborn
```

To install Plotly, use the command below:
```
pip install plotly
```
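As a quick check after installing, the snippet below reports which of the packages this notebook relies on are present. It is a minimal sketch using the standard library's `importlib.metadata`; the `REQUIRED` list and the `installed_versions` helper are illustrative names, not part of the original project.

```python
# Minimal sketch: report which packages used in this notebook are installed.
from importlib.metadata import PackageNotFoundError, version

# PyPI distribution names; numpy, pandas and matplotlib are normally
# installed automatically as dependencies of seaborn.
REQUIRED = ["scikit-learn", "seaborn", "plotly", "pandas", "numpy", "matplotlib"]


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(REQUIRED).items():
        print(f"{name:>12}: {ver or 'NOT INSTALLED'}")
```

Any package reported as `NOT INSTALLED` can be added with the matching `pip install` command above.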

## Library Imports
To start, run the cell below to import the required packages.

```python
import os

import numpy as np # linear algebra breakdown
import pandas as pd # data processing, CSV files input/output
import matplotlib.pyplot as plt # graph plotting
import seaborn as sns
import warnings
import plotly.express as px

from numpy import percentile
from mpl_toolkits.mplot3d import Axes3D

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.metrics import average_precision_score


warnings.filterwarnings('ignore')

%matplotlib inline
```
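The imports above bring in scaling, train/test splitting, several classifiers and precision/recall-style metrics. The sketch below shows one way these pieces could fit together on a fraud-style problem. It is an illustration under stated assumptions, not the project's model: it uses a synthetic, heavily imbalanced dataset from `make_classification` (roughly 2% positives) instead of the Kaggle files, adds `make_pipeline` (not in the original imports), and uses `class_weight="balanced"` as one common way of handling class imbalance. The helper name `evaluate_imbalanced` is hypothetical.

```python
# Sketch: evaluating a classifier on imbalanced, fraud-like data.
# Synthetic data (make_classification) stands in for the Kaggle datasets.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, average_precision_score,
                             f1_score, precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def evaluate_imbalanced(random_state=0):
    # About 2% of samples in the positive ("fraud") class
    X, y = make_classification(n_samples=5000, n_features=10, n_informative=5,
                               weights=[0.98, 0.02], random_state=random_state)
    # stratify=y keeps the fraud ratio the same in both splits
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state)

    # The pipeline fits the scaler on the training split only
    model = make_pipeline(StandardScaler(),
                          LogisticRegression(class_weight="balanced", max_iter=1000))
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)
    y_score = model.predict_proba(X_test)[:, 1]  # probability of the positive class

    # Baseline that always predicts "not fraud"
    baseline_acc = accuracy_score(y_test, [0] * len(y_test))

    return {
        "baseline_accuracy": baseline_acc,
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "average_precision": average_precision_score(y_test, y_score),
    }


if __name__ == "__main__":
    for metric, value in evaluate_imbalanced().items():
        print(f"{metric:>18}: {value:.3f}")
```

On data this imbalanced, the always-"not fraud" baseline already reaches high accuracy, so recall, F1 and average precision are the more informative numbers when comparing the classifiers imported above.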

## Dataset files

With the packages imported, the dataset files to be analysed can be loaded with pandas.
`pd.read_csv` accepts a URL as well as a local path, so the files can be read directly from the project's GitHub repository.

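```python
# fake_bills.csv is semicolon-delimited; read with the default comma separator,
# every row ends up in a single string column (see the output below)
df = pd.read_csv("https://github.com/jono120/fictional-octo-potato/raw/main/transaction_data_3/fake_bills.csv")
# Use a /raw/ URL: a /tree/ URL returns GitHub's HTML page, not the CSV
#df = pd.read_csv("https://github.com/Jono120/fictional-octo-potato/raw/main/transaction_data_1/bank.csv")
print(df.head())
df.info()
```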

```
  is_genuine;diagonal;height_left;height_right;margin_low;margin_up;length
0         True;171.81;104.86;104.95;4.52;2.89;112.83
1         True;171.46;103.36;103.66;3.77;2.99;113.09
2           True;172.69;104.48;103.5;4.4;2.94;113.16
3         True;171.36;103.91;103.94;3.62;3.01;113.51
4         True;171.73;104.28;103.46;4.04;3.48;112.54
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1500 entries, 0 to 1499
Data columns (total 1 columns):
 #   Column                                                                    Non-Null Count  Dtype 
---  ------                                                                    --------------  ----- 
 0   is_genuine;diagonal;height_left;height_right;margin_low;margin_up;length  1500 non-null   object
dtypes: object(1)
memory usage: 11.8+ KB
```
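The output above shows the whole header and every row landing in a single `object` column, because `fake_bills.csv` uses semicolons rather than commas as its delimiter. Passing `sep=";"` to `pd.read_csv` splits the fields correctly. To keep the example runnable offline, this sketch parses the five rows printed above from an in-memory string (`SAMPLE` is just a local name for that excerpt) instead of downloading the file.

```python
from io import StringIO

import pandas as pd

# The first five rows of fake_bills.csv, as printed in the output above
SAMPLE = """is_genuine;diagonal;height_left;height_right;margin_low;margin_up;length
True;171.81;104.86;104.95;4.52;2.89;112.83
True;171.46;103.36;103.66;3.77;2.99;113.09
True;172.69;104.48;103.5;4.4;2.94;113.16
True;171.36;103.91;103.94;3.62;3.01;113.51
True;171.73;104.28;103.46;4.04;3.48;112.54
"""

# sep=";" splits each row into its seven fields
bills = pd.read_csv(StringIO(SAMPLE), sep=";")
print(bills.dtypes)

# The same argument applies when reading the full file from GitHub:
# df = pd.read_csv("https://github.com/jono120/fictional-octo-potato/raw/main/transaction_data_3/fake_bills.csv", sep=";")
```

With the delimiter set, the measurement columns are parsed as floats and can be used directly as model features.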


```python
print("Shape of dataset:", df.shape)
print("Overview of the data:")
print(df.describe())
```
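For fraud and counterfeit detection, the class balance is usually worth checking early, since it decides whether plain accuracy is a meaningful score. A minimal sketch, assuming a DataFrame in which the label is parsed as its own column; the `class_balance` helper is hypothetical.

```python
import pandas as pd


def class_balance(frame, target):
    """Count and share of each class in `target`, most frequent first."""
    counts = frame[target].value_counts()
    return pd.DataFrame({"count": counts, "share": counts / counts.sum()})
```

For the bills data, `class_balance(df, "is_genuine")` would give the count and share of genuine versus fake bills once the file is read with its semicolon delimiter, so that `is_genuine` is a separate column.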

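```python
# This cell appears to be written for the bank transactions file
# (transaction_data_1/bank.csv): fake_bills.csv has no date or account columns.
def date_to_float(date_data):
    # Days elapsed since the earliest date in the column
    return (date_data - date_data.min()) / np.timedelta64(1, 'D')

# read_csv loads dates as strings; convert them before any date arithmetic
df["DATE"] = pd.to_datetime(df["DATE"])
df["VALUE DATE"] = pd.to_datetime(df["VALUE DATE"])

df["Date Days"] = date_to_float(df["VALUE DATE"])
dates = df["DATE"]
df["YEAR"] = dates.dt.year
df["MONTH"] = dates.dt.month
df["WEEK"] = dates.dt.isocalendar().week
df["DAY"] = dates.dt.day
df["DAYOFWEEK"] = dates.dt.dayofweek
# Anonymise account numbers as letters; assumes exactly ten distinct accounts
df["Account No"] = df["Account No"].replace(df["Account No"].unique(), ["A","B","C","D","E","F","G","H","I","J"])
```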
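To make the `date_to_float` arithmetic concrete: subtracting the earliest date produces `timedelta64` values, and dividing by `np.timedelta64(1, 'D')` converts them into a float number of days. The dates below are made up for illustration, and the function is repeated so the example runs on its own.

```python
import numpy as np
import pandas as pd


def date_to_float(date_data):
    """Days elapsed since the earliest date in the Series, as floats."""
    return (date_data - date_data.min()) / np.timedelta64(1, "D")


# Hypothetical value dates; the column must be datetime64, not strings
value_dates = pd.to_datetime(pd.Series(["2017-06-29", "2017-07-01", "2017-07-05"]))
days = date_to_float(value_dates)
print(days.tolist())  # [0.0, 2.0, 6.0]
```

Expressing dates as days since the first transaction gives models a single numeric time feature, while the calendar columns (year, month, week, day of week) capture periodic patterns.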