## Link to the Kaggle dataset
https://www.kaggle.com/datasets/rupakroy/online-payments-fraud-detection-dataset

## Columns
- step: a unit of time, where 1 step = 1 hour
- type: type of transaction
- amount: the amount of the transaction
- nameOrig: customer starting the transaction
- oldbalanceOrg: balance before the transaction
- newbalanceOrig: balance after the transaction
- nameDest: recipient of the transaction
- oldbalanceDest: initial balance of recipient before the transaction
- newbalanceDest: the new balance of recipient after the transaction
- isFraud: whether the transaction is fraudulent (1) or not (0)
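
To make the balance columns concrete, here is a minimal sketch that checks whether `oldbalanceOrg - amount` matches `newbalanceOrig`. It uses only the five rows printed by `data.head()` in this notebook; the relation is an observation on those rows, not a documented guarantee for the whole dataset, and the names `head`, `residual` and `consistent` are illustrative.

```python
import pandas as pd

# First five rows as printed by data.head() in this notebook.
head = pd.DataFrame({
    "amount": [9839.64, 1864.28, 181.00, 181.00, 11668.14],
    "oldbalanceOrg": [170136.0, 21249.0, 181.0, 181.0, 41554.0],
    "newbalanceOrig": [160296.36, 19384.72, 0.0, 0.0, 29885.86],
})

# Gap between the expected and the recorded post-transaction balance.
residual = head["oldbalanceOrg"] - head["amount"] - head["newbalanceOrig"]

# Small tolerance to absorb floating-point rounding.
consistent = residual.abs() < 0.01

print(consistent.tolist())
```

Running the same check on the full `data` frame would show how often the relation actually holds.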

## Problems
- The values in the "amount" column do not share the same order of magnitude
- type can be treated as a categorical variable
    - PAYMENT
    - DEBIT
    - CASH_OUT
    - CASH_IN
    - TRANSFER
- For now, the nameOrig code and newbalanceOrig are not of interest
- Step ???
- The column to predict is isFraud
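
The first two points can be sketched as follows. This is one possible treatment, not the approach used in the notebook: `np.log1p` compresses the range of `amount`, and `pd.get_dummies` one-hot encodes `type` instead of assigning integer codes. The `sample` frame is a hand-made stand-in built from the rows printed by `data.head()`, and the column name `amount_log` is an assumption.

```python
import numpy as np
import pandas as pd

# Hand-made sample; values taken from the first five rows of data.head().
sample = pd.DataFrame({
    "type": ["PAYMENT", "PAYMENT", "TRANSFER", "CASH_OUT", "PAYMENT"],
    "amount": [9839.64, 1864.28, 181.00, 181.00, 11668.14],
})

# log1p(x) = log(1 + x): compresses large values and is defined at x = 0.
sample["amount_log"] = np.log1p(sample["amount"])

# One indicator column per transaction type, so no artificial ordering
# is imposed on the categories.
encoded = pd.get_dummies(sample, columns=["type"], prefix="type")

print(encoded.columns.tolist())
```

Rescaling matters mostly for distance-based or gradient-based models; a decision tree splits on thresholds, so it is not affected by a monotonic transform of a feature.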

In [2]:
import pandas as pd
import numpy as np

# Read data
data = pd.read_csv("../data/data.csv")
print("\n",data.head())
print("\n",data.isnull().sum())

# Understanding the transaction type
dataType = data["type"].value_counts()
print("\n", dataType)
transactions = dataType.index   # transaction type labels
quantity = dataType.values      # number of transactions per type

# Checking correlation
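# Restrict to numeric columns: recent pandas versions raise an error
# when corr() meets string columns such as type, nameOrig and nameDest
correlation = data.select_dtypes(include="number").corr()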
print("\n",correlation["isFraud"].sort_values(ascending=False))

# Transform the categorical into numeric
data["type"] = data["type"].map({"CASH_OUT": 1, "PAYMENT": 2, "CASH_IN": 3, "TRANSFER": 4, "DEBIT": 5})
data["isFraud"] = data["isFraud"].map({0: "No Fraud", 1: "Fraud"})
print("\n\n", data.head())

# Splitting the data
from sklearn.model_selection import train_test_split
x = np.array(data[["type", "amount", "oldbalanceOrg", "newbalanceOrig"]])
# Use a 1-D target array; a column vector triggers a DataConversionWarning in fit()
y = np.array(data["isFraud"])

# Training a machine learning model
from sklearn.tree import DecisionTreeClassifier
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.10, random_state=42)
model = DecisionTreeClassifier()
model.fit(xtrain, ytrain)
print("\n",model.score(xtest, ytest))

# Prediction
#                    [type, amount,  oldbalanceOrg, newbalanceOrig]
features = np.array([[4,    5000.60, 3000.60,       0.0]])
print("\n",model.predict(features))


    step      type    amount     nameOrig  oldbalanceOrg  newbalanceOrig  \
0     1   PAYMENT   9839.64  C1231006815       170136.0       160296.36   
1     1   PAYMENT   1864.28  C1666544295        21249.0        19384.72   
2     1  TRANSFER    181.00  C1305486145          181.0            0.00   
3     1  CASH_OUT    181.00   C840083671          181.0            0.00   
4     1   PAYMENT  11668.14  C2048537720        41554.0        29885.86   

      nameDest  oldbalanceDest  newbalanceDest  isFraud  isFlaggedFraud  
0  M1979787155             0.0             0.0        0               0  
1  M2044282225             0.0             0.0        0               0  
2   C553264065             0.0             0.0        1               0  
3    C38997010         21182.0             0.0        1               0  
4  M1230701703             0.0             0.0        0               0  

 step              0
type              0
amount            0
nameOrig          0
oldbalanceOrg     0
n