# **Fraud Detection in Financial Transactions**

Luiz Henrique Rigo Faccio | CCR: `Artificial Intelligence`

*Computer Science - Universidade Federal da Fronteira Sul*

Dataset available at: [https://www.kaggle.com/datasets/aryan208/financial-transactions-dataset-for-fraud-detection](https://www.kaggle.com/datasets/aryan208/financial-transactions-dataset-for-fraud-detection)

## **Importing the libraries and the dataset**

In [5]:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
import datetime as dt

In [2]:
path = "archive/financial_fraud_detection_dataset.csv"
dataSet = pd.read_csv(path)

## **Inspecting the data**

Some columns, such as transaction IDs, account numbers, IP addresses and device hashes, are identifiers with no predictive value for this task, so they are excluded from the summaries and dropped.

In [3]:
# Per-column summary: dtype, number of distinct values and number of missing values
info = pd.DataFrame({"Tipos":dataSet.dtypes, "Valores únicos": dataSet.nunique(), "Valores Nulos": dataSet.isnull().sum()})

# Identifier columns with no predictive value
useLess = {"transaction_id", "sender_account", "receiver_account", "ip_address", "device_hash"}
numeric_columns = set(dataSet.select_dtypes(include=['int64', 'float64']).columns) - useLess
categorical_columns = set(dataSet.select_dtypes(include=['object']).columns) - useLess
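The `useLess` set above is picked by hand. A rough way to spot such columns automatically is to flag non-numeric columns whose share of distinct values is high. The sketch below is a hypothetical helper: `find_identifier_columns` and its `threshold=0.1` default are assumptions, not part of this notebook. On this dataset it would also flag `timestamp` (4,999,998 distinct values), which is why that column is turned into periods rather than dropped, and `sender_account`/`receiver_account` sit at only about 18% distinct values, so a much higher threshold would miss them.

```python
import pandas as pd


def find_identifier_columns(df: pd.DataFrame, threshold: float = 0.1) -> list[str]:
    """Hypothetical helper: list non-numeric columns whose distinct-value ratio
    (nunique / number of rows) is above `threshold`."""
    flagged = []
    for col in df.columns:
        # Numeric (and boolean) columns are kept as potential features
        if pd.api.types.is_numeric_dtype(df[col]):
            continue
        if df[col].nunique() / len(df) > threshold:
            flagged.append(col)
    return flagged


# Toy frame shaped like the dataset (40 rows)
toy = pd.DataFrame({
    "transaction_id": [f"T{i}" for i in range(40)],        # 40 distinct -> ratio 1.0
    "sender_account": [f"ACC{i % 5}" for i in range(40)],  # 5 distinct -> ratio 0.125
    "transaction_type": ["deposit", "transfer"] * 20,      # 2 distinct -> ratio 0.05
    "amount": [float(i) for i in range(40)],               # numeric, skipped
})
print(find_identifier_columns(toy))  # ['transaction_id', 'sender_account']
```

The threshold is a judgment call: it should sit well above the ratio of genuine categorical columns (8 distinct values out of 5,000,000 for `location`) and below that of the account columns.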


In [None]:
print("Dimensão do dataset: ", dataSet.shape)
display(info)

print("Informações contínuas:")
display(dataSet[list(numeric_columns)].describe())

print("Informações categóricas:")
display(dataSet[list(categorical_columns)].describe())

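# Drop the identifier columns and show the first rows
# (drop(..., inplace=True) returns None, so it cannot be chained with .head())
dataSet = dataSet.drop(columns=list(useLess))
dataSet.head()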

Dimensão do dataset:  (5000000, 18)


| | Tipos | Valores únicos | Valores Nulos |
|---|---|---|---|
| transaction_id | object | 5000000 | 0 |
| timestamp | object | 4999998 | 0 |
| sender_account | object | 896513 | 0 |
| receiver_account | object | 896639 | 0 |
| amount | float64 | 217069 | 0 |
| transaction_type | object | 4 | 0 |
| merchant_category | object | 8 | 0 |
| location | object | 8 | 0 |
| device_used | object | 4 | 0 |
| is_fraud | bool | 2 | 0 |


Informações contínuas:


| | amount | time_since_last_transaction | spending_deviation_score | geo_anomaly_score | velocity_score |
|---|---|---|---|---|---|
| count | 5000000.0 | 4103487.0 | 5000000.0 | 5000000.0 | 5000000.0 |
| mean | 358.9343 | 1.525799 | -0.000388116 | 0.5000293 | 10.50132 |
| std | 469.9333 | 3576.569 | 1.000807 | 0.2886349 | 5.766842 |
| min | 0.01 | -8777.814 | -5.26 | 0.0 | 1.0 |
| 25% | 26.57 | -2562.376 | -0.68 | 0.25 | 5.0 |
| 50% | 138.67 | 0.8442747 | 0.0 | 0.5 | 11.0 |
| 75% | 503.89 | 2568.339 | 0.67 | 0.75 | 16.0 |
| max | 3520.57 | 8757.758 | 5.02 | 1.0 | 20.0 |


Informações categóricas:


| | device_used | payment_channel | location | timestamp | fraud_type | merchant_category | transaction_type |
|---|---|---|---|---|---|---|---|
| count | 5000000 | 5000000 | 5000000 | 5000000 | 179553 | 5000000 | 5000000 |
| unique | 4 | 4 | 8 | 4999998 | 1 | 8 | 4 |
| top | mobile | wire_transfer | Tokyo | 2023-12-14T01:56:37.401698 | card_not_present | retail | deposit |
| freq | 1251131 | 1251219 | 625994 | 2 | 179553 | 626319 | 1250593 |


| | timestamp | amount | transaction_type | merchant_category | location | device_used | is_fraud | fraud_type | time_since_last_transaction | spending_deviation_score | velocity_score | geo_anomaly_score | payment_channel |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2023-08-22T09:22:43.516168 | 343.78 | withdrawal | utilities | Tokyo | mobile | False | NaN | NaN | -0.21 | 3 | 0.22 | card |
| 1 | 2023-08-04T01:58:02.606711 | 419.65 | withdrawal | online | Toronto | atm | False | NaN | NaN | -0.14 | 7 | 0.96 | ACH |
| 2 | 2023-05-12T11:39:33.742963 | 2773.86 | deposit | other | London | pos | False | NaN | NaN | -1.78 | 20 | 0.89 | card |
| 3 | 2023-10-10T06:04:43.195112 | 1666.22 | deposit | online | Sydney | pos | False | NaN | NaN | -0.6 | 6 | 0.37 | wire_transfer |
| 4 | 2023-09-24T08:09:02.700162 | 24.43 | transfer | utilities | Toronto | mobile | False | NaN | NaN | 0.79 | 13 | 0.27 | ACH |
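The summaries above hint at how `time_since_last_transaction` is filled: it has 4,103,487 non-null values, so 5,000,000 - 4,103,487 = 896,513 are missing, exactly the number of distinct `sender_account` values. One plausible reading, which is an assumption and not something the dataset documents, is that the value is missing for each sender's first transaction. The hypothetical check below (`nan_marks_first_transaction` is an illustrative name) tests that idea on the raw data, so it has to run before `sender_account` is dropped and before `timestamp` is replaced by periods. It relies on the timestamps being naive ISO 8601 strings, which sort chronologically as plain text.

```python
import numpy as np
import pandas as pd


def nan_marks_first_transaction(df: pd.DataFrame) -> bool:
    """Hypothetical check: are the missing values of time_since_last_transaction
    exactly the earliest transaction of each sender_account?"""
    # ISO 8601 strings sort chronologically; a stable sort keeps ties in file order
    ordered = df.sort_values("timestamp", kind="stable")
    # Index labels of each sender's earliest row
    first_rows = set(ordered.groupby("sender_account").head(1).index)
    # Index labels of rows with a missing time_since_last_transaction
    missing_rows = set(df.index[df["time_since_last_transaction"].isna()])
    return first_rows == missing_rows


toy = pd.DataFrame({
    "sender_account": ["ACC1", "ACC1", "ACC2", "ACC2", "ACC2"],
    "timestamp": [
        "2023-01-02T10:00:00.000001",
        "2023-01-01T09:30:00.123456",
        "2023-01-03T08:00:00.500000",
        "2023-01-01T12:00:00.250000",
        "2023-01-05T18:45:00.999999",
    ],
    "time_since_last_transaction": [1.5, np.nan, 2.0, np.nan, 3.25],
})
print(nan_marks_first_transaction(toy))  # True
```

On the real data it would be called as `nan_marks_first_transaction(pd.read_csv(path))`. If it returns `True`, the missing values carry meaning ("first transaction of this account") and are better encoded than discarded.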


In [6]:
def categorize_timestamp(timestamps : pd.Series):
    """Categorize timestamps into 3-hour periods of the day: manha (morning),
    tarde (afternoon), noite (evening) and madrugada (early morning)

    Args:
        timestamps (pd.Series): ISO 8601 timestamp column of the dataset
    
    Returns:
        periodos (pd.Series): Column with each timestamp replaced by its period label
    """
    
    def get_period(hour):
        # Eight consecutive 3-hour bins, starting at 06:00
        if 6 <= hour < 9:
            return "manha_1"
        elif 9 <= hour < 12:
            return "manha_2"
        elif 12 <= hour < 15:
            return "tarde_1"
        elif 15 <= hour < 18:
            return "tarde_2"
        elif 18 <= hour < 21:
            return "noite_1"
        elif 21 <= hour < 24:
            return "noite_2"
        elif 0 <= hour < 3:
            return "madrugada_1"
        else:  # 3 <= hour < 6
            return "madrugada_2"
        
    # Parse each ISO 8601 string and map its hour to a period label
    periodos = timestamps.apply(lambda x: get_period(dt.datetime.fromisoformat(x).hour))
    return periodos
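Assuming the intended scheme is eight consecutive 3-hour bins starting at 06:00 (`manha_1` covering 06:00 to 09:00 through `madrugada_2` covering 03:00 to 06:00), the `if`/`elif` chain can also be written as arithmetic, `((hour - 6) mod 24) // 3`, which pandas evaluates on the whole column at once instead of calling a Python function for each of the 5,000,000 rows. This is a sketch, not the notebook's method: `categorize_timestamp_vectorized` and `PERIOD_LABELS` are hypothetical names, and `format="ISO8601"` assumes pandas 2.0 or later (it also accepts timestamps without fractional seconds).

```python
import numpy as np
import pandas as pd

# Period labels in chronological order, starting at 06:00 (assumed 3-hour bins)
PERIOD_LABELS = np.array([
    "manha_1", "manha_2", "tarde_1", "tarde_2",
    "noite_1", "noite_2", "madrugada_1", "madrugada_2",
])


def categorize_timestamp_vectorized(timestamps: pd.Series) -> pd.Series:
    """Map ISO 8601 timestamp strings to 3-hour period labels, column-wise."""
    hours = pd.to_datetime(timestamps, format="ISO8601").dt.hour
    # Shift so 06:00 becomes bin 0, wrap past midnight, then bucket by 3 hours
    bin_index = ((hours - 6) % 24) // 3
    return pd.Series(PERIOD_LABELS[bin_index.to_numpy()], index=timestamps.index)


sample = pd.Series([
    "2023-08-22T09:22:43.516168",
    "2023-08-04T01:58:02.606711",
    "2023-05-12T11:39:33.742963",
    "2023-10-10T06:04:43.195112",
    "2023-09-24T08:09:02.700162",
])
print(categorize_timestamp_vectorized(sample).tolist())
# ['manha_2', 'madrugada_1', 'manha_2', 'manha_1', 'manha_1']
```

Applied to the notebook's data, `dataSet["timestamp"] = categorize_timestamp_vectorized(dataSet["timestamp"])` would replace the row-by-row `apply` call; the modulo keeps the hours after midnight in the last two bins without any special case.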
    

In [10]:
dataSet["timestamp"] = categorize_timestamp(dataSet["timestamp"])
dataSet.sample(10)


| | transaction_id | timestamp | sender_account | receiver_account | amount | transaction_type | merchant_category | location | device_used | is_fraud | fraud_type | time_since_last_transaction | spending_deviation_score | velocity_score | geo_anomaly_score | payment_channel | ip_address | device_hash |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2417709 | T2517709 | madrugada_2 | ACC588577 | ACC333973 | 25.86 | transfer | online | New York | pos | False | NaN | NaN | -1.57 | 17 | 0.19 | ACH | 38.35.90.220 | D2441819 |
| 3371124 | T3471124 | manha_2 | ACC379050 | ACC168519 | 178.33 | withdrawal | online | Sydney | mobile | False | NaN | -4885.458113 | 1.47 | 16 | 0.71 | card | 20.238.169.50 | D4115547 |
| 3167963 | T3267963 | tarde_1 | ACC426761 | ACC476792 | 66.74 | payment | entertainment | New York | atm | False | NaN | -5162.230652 | 0.19 | 7 | 0.67 | ACH | 185.204.210.66 | D9692722 |
| 3433670 | T3533670 | madrugada_2 | ACC833016 | ACC312631 | 32.82 | transfer | retail | London | pos | False | NaN | 5318.45564 | -1.07 | 9 | 0.42 | wire_transfer | 173.241.134.129 | D6247628 |
| 1909891 | T2009891 | tarde_2 | ACC914063 | ACC216648 | 134.79 | withdrawal | other | Singapore | atm | False | NaN | -4770.085188 | 0.49 | 8 | 0.35 | UPI | 135.188.29.89 | D7176698 |
| 3809247 | T3909247 | noite_1 | ACC837426 | ACC135019 | 317.11 | withdrawal | restaurant | New York | pos | False | NaN | 1992.209775 | -0.42 | 3 | 0.37 | UPI | 28.41.30.47 | D2791677 |
| 3470689 | T3570689 | manha_2 | ACC419762 | ACC916493 | 0.01 | withdrawal | entertainment | Singapore | atm | False | NaN | 642.252247 | 1.31 | 15 | 0.09 | card | 215.237.98.212 | D8568108 |
| 1261136 | T1361136 | madrugada_2 | ACC173009 | ACC672484 | 4.66 | payment | online | Tokyo | mobile | False | NaN | 399.966815 | 0.57 | 17 | 0.17 | ACH | 52.66.87.118 | D5992889 |
| 2935235 | T3035235 | manha_2 | ACC508522 | ACC778800 | 11.56 | transfer | other | Sydney | web | False | NaN | 4067.176274 | -0.29 | 16 | 0.93 | wire_transfer | 54.26.77.160 | D1739027 |
| 3562846 | T3662846 | tarde_1 | ACC591577 | ACC400766 | 455.15 | withdrawal | retail | Tokyo | web | False | NaN | 6372.550892 | -0.49 | 2 | 0.72 | wire_transfer | 218.4.198.23 | D3109026 |
