# Predictive Model for Fraud Detection

## Introduction
This project builds a predictive model to detect fraud. The data used to develop the model is available on [Kaggle](https://www.kaggle.com/datasets/gopalmahadevan/fraud-detection-example).

## Objective
The model estimates the probability that a transaction is fraudulent, based on historical transaction data. The goal is a prediction that is accurate and useful for assessing financial risk.

## Data dictionary

<ul>
        <li><strong>step:</strong> unit of time (1 step = 1 hour).</li>
        <li><strong>type:</strong> transaction type: CASH_IN, CASH_OUT, DEBIT, PAYMENT, or TRANSFER.</li>
        <li><strong>amount:</strong> transaction amount in local currency.</li>
        <li><strong>nameOrig:</strong> originator of the transaction.</li>
        <li><strong>oldbalanceOrg:</strong> originator's balance before the transaction.</li>
        <li><strong>newbalanceOrig:</strong> originator's balance after the transaction.</li>
        <li><strong>nameDest:</strong> recipient of the transaction.</li>
        <li><strong>oldbalanceDest:</strong> recipient's balance before the transaction.</li>
        <li><strong>newbalanceDest:</strong> recipient's balance after the transaction.</li>
        <li><strong>isFraud:</strong> marks transactions in which a fraudulent agent takes control of a customer's account and tries to empty it by transferring the funds to another account and then cashing out.</li>
        <li><strong>isFlaggedFraud:</strong> marks an illegal attempt to transfer a massive amount of money in a single transaction.</li>
</ul>
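
The balance fields described above suggest a simple consistency check: for an outgoing transaction, the originator's old balance minus the amount should equal the new balance. The sketch below tries this on five sample rows taken from this dataset. The identity itself is an assumption about how the balances were generated, not something the dataset documentation states, and `diferenca_origem` is a hypothetical helper column.

```python
import pandas as pd

# Five sample rows copied from the dataset preview (original column names).
amostra = pd.DataFrame({
    "type": ["PAYMENT", "PAYMENT", "TRANSFER", "CASH_OUT", "PAYMENT"],
    "amount": [9839.64, 1864.28, 181.0, 181.0, 11668.14],
    "oldbalanceOrg": [170136.0, 21249.0, 181.0, 181.0, 41554.0],
    "newbalanceOrig": [160296.36, 19384.72, 0.0, 0.0, 29885.86],
    "isFraud": [0, 0, 1, 1, 0],
})

# Assumption: old balance - amount == new balance for the originator.
# `diferenca_origem` is a hypothetical helper column, not part of the dataset.
amostra["diferenca_origem"] = (
    amostra["oldbalanceOrg"] - amostra["amount"] - amostra["newbalanceOrig"]
)

# Compare with a small tolerance because the values are floats.
consistente = amostra["diferenca_origem"].abs() < 0.01
print(consistente.all())

# In this sample, both fraudulent rows leave the originator with a zero balance.
fraudes_zeram_conta = amostra.loc[amostra["isFraud"] == 1, "newbalanceOrig"].eq(0).all()
print(fraudes_zeram_conta)
```

On these five rows the identity holds, and both fraudulent transactions empty the originating account, which matches the description of `isFraud`. Whether this pattern holds across the full dataset would need to be checked on the complete file.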



## Libraries

In [1]:
import pandas as pd

## Loading the data

In [2]:
dados = pd.read_csv('Dados/fraud_dataset_example.csv')
dados.head()

| | step | type | amount | nameOrig | oldbalanceOrg | newbalanceOrig | nameDest | oldbalanceDest | newbalanceDest | isFraud | isFlaggedFraud |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | PAYMENT | 9839.64 | C1231006815 | 170136.0 | 160296.36 | M1979787155 | 0.0 | 0.0 | 0 | 0 |
| 1 | 1 | PAYMENT | 1864.28 | C1666544295 | 21249.0 | 19384.72 | M2044282225 | 0.0 | 0.0 | 0 | 0 |
| 2 | 1 | TRANSFER | 181.0 | C1305486145 | 181.0 | 0.0 | C553264065 | 0.0 | 0.0 | 1 | 0 |
| 3 | 1 | CASH_OUT | 181.0 | C840083671 | 181.0 | 0.0 | C38997010 | 21182.0 | 0.0 | 1 | 0 |
| 4 | 1 | PAYMENT | 11668.14 | C2048537720 | 41554.0 | 29885.86 | M1230701703 | 0.0 | 0.0 | 0 | 0 |


### Reordering the DataFrame columns

In [3]:
# Move the target columns (isFraud, isFlaggedFraud) to the front.
dados = dados[[
    'isFraud', 'isFlaggedFraud', 'step', 'type', 'amount',
    'nameOrig', 'oldbalanceOrg', 'newbalanceOrig',
    'nameDest', 'oldbalanceDest', 'newbalanceDest',
]]

dados.head()

| | isFraud | isFlaggedFraud | step | type | amount | nameOrig | oldbalanceOrg | newbalanceOrig | nameDest | oldbalanceDest | newbalanceDest |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | PAYMENT | 9839.64 | C1231006815 | 170136.0 | 160296.36 | M1979787155 | 0.0 | 0.0 |
| 1 | 0 | 0 | 1 | PAYMENT | 1864.28 | C1666544295 | 21249.0 | 19384.72 | M2044282225 | 0.0 | 0.0 |
| 2 | 1 | 0 | 1 | TRANSFER | 181.0 | C1305486145 | 181.0 | 0.0 | C553264065 | 0.0 | 0.0 |
| 3 | 1 | 0 | 1 | CASH_OUT | 181.0 | C840083671 | 181.0 | 0.0 | C38997010 | 21182.0 | 0.0 |
| 4 | 0 | 0 | 1 | PAYMENT | 11668.14 | C2048537720 | 41554.0 | 29885.86 | M1230701703 | 0.0 | 0.0 |


### Renaming the columns

In [4]:
colunas = {
    'isFraud': 'fraude',
    'isFlaggedFraud':'super_fraude',
    'step':'tempo',
    'type':'tipo',
    'amount':'valor',
    'nameOrig':'cliente1',
    'oldbalanceOrg':'saldo_inicial_c1',
    'newbalanceOrig':'novo_saldo_c1',
    'nameDest':'cliente2',
    'oldbalanceDest':'saldo_inicial_c2',
    'newbalanceDest':'novo_saldo_c2',
}

dados.rename(columns=colunas, inplace=True)

dados.head()

| | fraude | super_fraude | tempo | tipo | valor | cliente1 | saldo_inicial_c1 | novo_saldo_c1 | cliente2 | saldo_inicial_c2 | novo_saldo_c2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | PAYMENT | 9839.64 | C1231006815 | 170136.0 | 160296.36 | M1979787155 | 0.0 | 0.0 |
| 1 | 0 | 0 | 1 | PAYMENT | 1864.28 | C1666544295 | 21249.0 | 19384.72 | M2044282225 | 0.0 | 0.0 |
| 2 | 1 | 0 | 1 | TRANSFER | 181.0 | C1305486145 | 181.0 | 0.0 | C553264065 | 0.0 | 0.0 |
| 3 | 1 | 0 | 1 | CASH_OUT | 181.0 | C840083671 | 181.0 | 0.0 | C38997010 | 21182.0 | 0.0 |
| 4 | 0 | 0 | 1 | PAYMENT | 11668.14 | C2048537720 | 41554.0 | 29885.86 | M1230701703 | 0.0 | 0.0 |
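
The renaming step above only works as intended if every key in `colunas` matches an existing column. By default `DataFrame.rename` silently skips unknown keys, so a typo would go unnoticed. As a possible safeguard, the sketch below uses the `errors="raise"` option on a small stand-in frame (`exemplo` is a hypothetical name, not part of the notebook).

```python
import pandas as pd

# Tiny stand-in frame with two of the original column names.
exemplo = pd.DataFrame({"isFraud": [0, 1], "amount": [10.0, 20.0]})

colunas = {"isFraud": "fraude", "amount": "valor"}

# errors="raise" makes pandas fail loudly if a key is not an existing column
# (the default, errors="ignore", silently skips unknown keys).
renomeado = exemplo.rename(columns=colunas, errors="raise")
print(list(renomeado.columns))

# A misspelled key now raises KeyError instead of being ignored.
erro_detectado = False
try:
    exemplo.rename(columns={"isfraud": "fraude"}, errors="raise")
except KeyError as erro:
    erro_detectado = True
    print("Missing column:", erro)
```

Returning a new frame instead of using `inplace=True` also keeps the original intact, which can make it easier to rerun a cell without side effects.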


## Exploratory analysis

In [5]:
print(f'Number of rows: {dados.shape[0]}\nNumber of columns: {dados.shape[1]}')

Number of rows: 101613
Number of columns: 11


In [6]:
dados.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| fraude | 101613.0 | 0.001141586 | 0.03376824 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| super_fraude | 101613.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| tempo | 101613.0 | 8.523457 | 1.820681 | 1.0 | 8.0 | 9.0 | 10.0 | 10.0 |
| valor | 101613.0 | 174090.1 | 345019.9 | 0.32 | 10016.59 | 53385.41 | 212498.4 | 10000000.0 |
| saldo_inicial_c1 | 101613.0 | 907175.3 | 2829575.0 | 0.0 | 0.0 | 20190.47 | 194715.0 | 38939424.03 |
| novo_saldo_c1 | 101613.0 | 923499.2 | 2867319.0 | 0.0 | 0.0 | 0.0 | 219217.76 | 38946233.02 |
| saldo_inicial_c2 | 101613.0 | 881042.8 | 2399949.0 | 0.0 | 0.0 | 21058.0 | 591921.7 | 34008736.98 |
| novo_saldo_c2 | 101613.0 | 1183998.0 | 2797761.0 | 0.0 | 0.0 | 51783.43 | 1063121.64 | 38946233.02 |
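
In the summary above, `super_fraude` has a minimum, maximum, and standard deviation of zero, so it holds a single value in this sample. One way to spot such columns programmatically is sketched below on a small stand-in frame; `exemplo` and `colunas_constantes` are hypothetical names, and the rows are illustrative rather than the full dataset.

```python
import pandas as pd

# Small stand-in frame; `super_fraude` mimics the all-zero column seen above.
exemplo = pd.DataFrame({
    "fraude": [0, 0, 1, 1, 0],
    "super_fraude": [0, 0, 0, 0, 0],
    "valor": [9839.64, 1864.28, 181.0, 181.0, 11668.14],
})

# A column with a single distinct value has no variance, so it cannot
# help a model separate fraudulent from legitimate transactions.
colunas_constantes = [c for c in exemplo.columns if exemplo[c].nunique() == 1]
print(colunas_constantes)
```

A constant column is a reasonable candidate to drop before modeling, although that decision is left to the later steps of the project.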


In [7]:
dados.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 101613 entries, 0 to 101612
Data columns (total 11 columns):
 #   Column            Non-Null Count   Dtype  
---  ------            --------------   -----  
 0   fraude            101613 non-null  int64  
 1   super_fraude      101613 non-null  int64  
 2   tempo             101613 non-null  int64  
 3   tipo              101613 non-null  object 
 4   valor             101613 non-null  float64
 5   cliente1          101613 non-null  object 
 6   saldo_inicial_c1  101613 non-null  float64
 7   novo_saldo_c1     101613 non-null  float64
 8   cliente2          101613 non-null  object 
 9   saldo_inicial_c2  101613 non-null  float64
 10  novo_saldo_c2     101613 non-null  float64
dtypes: float64(5), int64(3), object(3)
memory usage: 8.5+ MB
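
The `info()` output above shows no missing values and three text columns (`tipo`, `cliente1`, `cliente2`). A quick follow-up, sketched here on the five preview rows, is to confirm the missing-value count and look at how many distinct values each text column has, which helps separate true categories from identifiers. The names `exemplo`, `faltantes`, and `cardinalidade` are hypothetical.

```python
import pandas as pd

# The five preview rows, using the renamed columns.
exemplo = pd.DataFrame({
    "tipo": ["PAYMENT", "PAYMENT", "TRANSFER", "CASH_OUT", "PAYMENT"],
    "cliente1": ["C1231006815", "C1666544295", "C1305486145",
                 "C840083671", "C2048537720"],
    "cliente2": ["M1979787155", "M2044282225", "C553264065",
                 "C38997010", "M1230701703"],
    "valor": [9839.64, 1864.28, 181.0, 181.0, 11668.14],
})

# Missing values per column.
faltantes = exemplo.isna().sum()
print(faltantes)

# Distinct values per non-numeric column: a low count suggests a category,
# a count close to the number of rows suggests an identifier.
cardinalidade = exemplo.select_dtypes(exclude="number").nunique()
print(cardinalidade)
```

In this small sample `tipo` has only a few distinct values while the client columns are unique per row, which hints that `tipo` is the only text column likely to be useful as a categorical feature. This should be confirmed on the full dataset.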


### Number of fraud cases

In [8]:
dados['fraude'].value_counts()

fraude
0    101497
1       116
Name: count, dtype: int64
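
The counts above point to a heavy class imbalance. To make that concrete, the sketch below rebuilds the `fraude` column from the two reported counts (rather than loading the CSV) and computes the class proportions. `taxa_fraude` and `acuracia_ingenua` are hypothetical names introduced for illustration.

```python
import pandas as pd

# Rebuild the target column from the counts reported above
# (101497 legitimate, 116 fraudulent) instead of reading the CSV.
fraude = pd.Series([0] * 101497 + [1] * 116, name="fraude")

# normalize=True returns class proportions instead of raw counts.
proporcoes = fraude.value_counts(normalize=True)
taxa_fraude = proporcoes.loc[1]

# Accuracy of a trivial model that always predicts "not fraud".
acuracia_ingenua = proporcoes.loc[0]

print(f"Fraud rate: {taxa_fraude:.4%}")
print(f"Always-legitimate baseline accuracy: {acuracia_ingenua:.4%}")
```

The fraud rate is roughly 0.11%, so a model that never predicts fraud would still be right about 99.89% of the time. For that reason, metrics focused on the fraud class, such as precision and recall, are usually more informative here than plain accuracy.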