## Introduction
The main goal of this project is to develop and optimize a neural network model that detects fraud in credit card transactions. Fraud detection is a critical challenge for financial companies because it requires identifying subtle patterns in large, imbalanced datasets in which fraudulent transactions are only a small fraction of the total.

The project uses machine learning techniques such as class balancing with SMOTE (Synthetic Minority Over-sampling Technique) and hyperparameter optimization with Keras Tuner to build a model that separates legitimate from fraudulent transactions. The process covers every stage, from data preprocessing and preparation to the final evaluation of the model.

## Step 1: Importing the Required Libraries


In [21]:
pip install keras-tuner




In [22]:
# Import the required libraries
import numpy as np
import pandas as pd
import gdown
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score, precision_score, recall_score, f1_score, roc_auc_score, confusion_matrix
from sklearn.utils import class_weight
from imblearn.over_sampling import SMOTE
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
from keras_tuner import RandomSearch
import plotly.figure_factory as ff


## Step 2: Loading the Data

We download the file from Google Drive with gdown and store the data in a dictionary of DataFrames.

In [23]:
# Template for the destination file name
arquivo_destino_base = "dataset_{}.csv"

# Google Drive file IDs
ids = {
    "creditcard": "1HfsfVLy6v-RlDId5xpyL2SdQrsIViUBy",
}

# Dictionary that stores the DataFrames
dataframes = {}

# Download and read each file
for key, file_id in ids.items():
    url = f"https://drive.google.com/uc?id={file_id}"
    arquivo_destino = arquivo_destino_base.format(key)

    # Download the file with gdown
    gdown.download(url, arquivo_destino, quiet=False)

    # Try to read the file with pandas
    try:
        df = pd.read_csv(arquivo_destino)
        dataframes[key] = df
    except pd.errors.ParserError:
        print(f"Error reading file {arquivo_destino}. Check the separator.")


Downloading...
From (original): https://drive.google.com/uc?id=1HfsfVLy6v-RlDId5xpyL2SdQrsIViUBy
From (redirected): https://drive.google.com/uc?id=1HfsfVLy6v-RlDId5xpyL2SdQrsIViUBy&confirm=t&uuid=e2043804-c109-4b1c-b715-c4dbda5d229d
To: /content/dataset_creditcard.csv
100%|██████████| 151M/151M [00:02<00:00, 64.2MB/s]


In [26]:
dados = pd.read_csv("/content/dataset_creditcard.csv", delimiter=",")
display(dados)

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
0,0.0,-1.359807,-0.072781,2.536347,1.378155,-0.338321,0.462388,0.239599,0.098698,0.363787,...,-0.018307,0.277838,-0.110474,0.066928,0.128539,-0.189115,0.133558,-0.021053,149.62,0
1,0.0,1.191857,0.266151,0.166480,0.448154,0.060018,-0.082361,-0.078803,0.085102,-0.255425,...,-0.225775,-0.638672,0.101288,-0.339846,0.167170,0.125895,-0.008983,0.014724,2.69,0
2,1.0,-1.358354,-1.340163,1.773209,0.379780,-0.503198,1.800499,0.791461,0.247676,-1.514654,...,0.247998,0.771679,0.909412,-0.689281,-0.327642,-0.139097,-0.055353,-0.059752,378.66,0
3,1.0,-0.966272,-0.185226,1.792993,-0.863291,-0.010309,1.247203,0.237609,0.377436,-1.387024,...,-0.108300,0.005274,-0.190321,-1.175575,0.647376,-0.221929,0.062723,0.061458,123.50,0
4,2.0,-1.158233,0.877737,1.548718,0.403034,-0.407193,0.095921,0.592941,-0.270533,0.817739,...,-0.009431,0.798278,-0.137458,0.141267,-0.206010,0.502292,0.219422,0.215153,69.99,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
284802,172786.0,-11.881118,10.071785,-9.834783,-2.066656,-5.364473,-2.606837,-4.918215,7.305334,1.914428,...,0.213454,0.111864,1.014480,-0.509348,1.436807,0.250034,0.943651,0.823731,0.77,0
284803,172787.0,-0.732789,-0.055080,2.035030,-0.738589,0.868229,1.058415,0.024330,0.294869,0.584800,...,0.214205,0.924384,0.012463,-1.016226,-0.606624,-0.395255,0.068472,-0.053527,24.79,0
284804,172788.0,1.919565,-0.301254,-3.249640,-0.557828,2.630515,3.031260,-0.296827,0.708417,0.432454,...,0.232045,0.578229,-0.037501,0.640134,0.265745,-0.087371,0.004455,-0.026561,67.88,0
284805,172788.0,-0.240440,0.530483,0.702510,0.689799,-0.377961,0.623708,-0.686180,0.679145,0.392087,...,0.265245,0.800049,-0.163298,0.123205,-0.569159,0.546668,0.108821,0.104533,10.00,0


## Step 3: Data Preprocessing


In this step we inspect the loaded dataset (column types, descriptive statistics, and missing values) and then normalize the Amount and Time columns.

In [29]:
# Basic information about the dataset
dados.info()

# Descriptive statistics
print("\nData types of the features:")
print(dados.dtypes)

print("\nStatistical description of the numeric features:")
print(dados.describe())

print("\nMissing values per column:")
print(dados.isnull().sum())


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 284807 entries, 0 to 284806
Data columns (total 31 columns):
 #   Column  Non-Null Count   Dtype  
---  ------  --------------   -----  
 0   Time    284807 non-null  float64
 1   V1      284807 non-null  float64
 2   V2      284807 non-null  float64
 3   V3      284807 non-null  float64
 4   V4      284807 non-null  float64
 5   V5      284807 non-null  float64
 6   V6      284807 non-null  float64
 7   V7      284807 non-null  float64
 8   V8      284807 non-null  float64
 9   V9      284807 non-null  float64
 10  V10     284807 non-null  float64
 11  V11     284807 non-null  float64
 12  V12     284807 non-null  float64
 13  V13     284807 non-null  float64
 14  V14     284807 non-null  float64
 15  V15     284807 non-null  float64
 16  V16     284807 non-null  float64
 17  V17     284807 non-null  float64
 18  V18     284807 non-null  float64
 19  V19     284807 non-null  float64
 20  V20     284807 non-null  float64
 21  V21     28
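
Since the introduction describes the dataset as heavily imbalanced, it can help to quantify the imbalance before modeling. The sketch below uses a small made-up DataFrame (`toy`, not the real dataset) to show one way to count each class and its share. On the real data, the same calls would presumably be applied to `dados['Class']`.

```python
import pandas as pd

# Toy stand-in for the real dataset (hypothetical values): 2 frauds in 1000 rows.
toy = pd.DataFrame({"Class": [0] * 998 + [1] * 2})

# Absolute count and relative share of each class.
counts = toy["Class"].value_counts()
shares = toy["Class"].value_counts(normalize=True)

summary = pd.DataFrame({"count": counts, "share": shares})
print(summary)
```

A share far below 1% for class 1 is a signal that accuracy alone will say little about fraud detection quality.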

### Normalizing the Numeric Features


We normalize the Amount and Time columns with MinMaxScaler, which rescales each column to the [0, 1] range.



In [30]:
# Normalize the 'Amount' and 'Time' columns
scaler = MinMaxScaler()

# Apply the scaler only to these two columns (it is refit for each one)
dados['Amount'] = scaler.fit_transform(dados[['Amount']])
dados['Time'] = scaler.fit_transform(dados[['Time']])

# Show the dataset after normalization
display(dados)


Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
0,0.000000,-1.359807,-0.072781,2.536347,1.378155,-0.338321,0.462388,0.239599,0.098698,0.363787,...,-0.018307,0.277838,-0.110474,0.066928,0.128539,-0.189115,0.133558,-0.021053,0.005824,0
1,0.000000,1.191857,0.266151,0.166480,0.448154,0.060018,-0.082361,-0.078803,0.085102,-0.255425,...,-0.225775,-0.638672,0.101288,-0.339846,0.167170,0.125895,-0.008983,0.014724,0.000105,0
2,0.000006,-1.358354,-1.340163,1.773209,0.379780,-0.503198,1.800499,0.791461,0.247676,-1.514654,...,0.247998,0.771679,0.909412,-0.689281,-0.327642,-0.139097,-0.055353,-0.059752,0.014739,0
3,0.000006,-0.966272,-0.185226,1.792993,-0.863291,-0.010309,1.247203,0.237609,0.377436,-1.387024,...,-0.108300,0.005274,-0.190321,-1.175575,0.647376,-0.221929,0.062723,0.061458,0.004807,0
4,0.000012,-1.158233,0.877737,1.548718,0.403034,-0.407193,0.095921,0.592941,-0.270533,0.817739,...,-0.009431,0.798278,-0.137458,0.141267,-0.206010,0.502292,0.219422,0.215153,0.002724,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
284802,0.999965,-11.881118,10.071785,-9.834783,-2.066656,-5.364473,-2.606837,-4.918215,7.305334,1.914428,...,0.213454,0.111864,1.014480,-0.509348,1.436807,0.250034,0.943651,0.823731,0.000030,0
284803,0.999971,-0.732789,-0.055080,2.035030,-0.738589,0.868229,1.058415,0.024330,0.294869,0.584800,...,0.214205,0.924384,0.012463,-1.016226,-0.606624,-0.395255,0.068472,-0.053527,0.000965,0
284804,0.999977,1.919565,-0.301254,-3.249640,-0.557828,2.630515,3.031260,-0.296827,0.708417,0.432454,...,0.232045,0.578229,-0.037501,0.640134,0.265745,-0.087371,0.004455,-0.026561,0.002642,0
284805,0.999977,-0.240440,0.530483,0.702510,0.689799,-0.377961,0.623708,-0.686180,0.679145,0.392087,...,0.265245,0.800049,-0.163298,0.123205,-0.569159,0.546668,0.108821,0.104533,0.000389,0
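
The cell above fits the scaler on the full dataset, so rows that later land in the test set influence the minimum and maximum used for scaling. A common alternative is to fit on the training split only and reuse those statistics on the test split. The sketch below illustrates this with made-up arrays; `amount_train` and `amount_test` are hypothetical names, not variables from this notebook.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical 'Amount' values, already split into train and test.
amount_train = np.array([[0.0], [50.0], [200.0]])
amount_test = np.array([[100.0], [400.0]])

scaler = MinMaxScaler()
# Learn min and max from the training rows only.
train_scaled = scaler.fit_transform(amount_train)
# Reuse the training min and max; test values may fall outside [0, 1].
test_scaled = scaler.transform(amount_test)

print(train_scaled.ravel())
print(test_scaled.ravel())
```

Test values outside [0, 1] are expected with this approach and are usually harmless for a neural network.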


## Step 4: Separating Features and Target
We separate the features (X) from the target (y) and split the dataset into training (70%) and test (30%) sets.

In [32]:
# Separate the features (X) and the target (y)
X = dados.drop(columns=['Class'])
y = dados['Class']

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
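
With so few fraud cases, a purely random split can leave the training and test sets with slightly different fraud rates. `train_test_split` accepts a `stratify` argument that keeps the class proportions equal in both parts. The sketch below shows this on made-up data. It is an optional variation, not what the cell above does, and adopting it would change the numbers reported later in this notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 1000 rows, 3 features, 2% positive class.
rng = np.random.default_rng(42)
X_toy = rng.normal(size=(1000, 3))
y_toy = np.array([1] * 20 + [0] * 980)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=42, stratify=y_toy
)

# Both splits keep the 2% positive rate.
print(y_tr.mean(), y_te.mean())
```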

## Step 5: Training the Initial Model
We build and train a basic neural network with two hidden layers (16 and 8 units, ReLU) and a sigmoid output.

In [33]:
# Build the neural network
model = Sequential([
    Dense(16, input_shape=(X_train.shape[1],), activation='relu'),
    Dense(8, activation='relu'),
    Dense(1, activation='sigmoid')
])

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2)



Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.



Epoch 1/10
4985/4985 ━━━━━━━━━━━━━━━━━━━━ 17s 3ms/step - accuracy: 0.9711 - loss: 0.0871 - val_accuracy: 0.9982 - val_loss: 0.0039
Epoch 2/10
4985/4985 ━━━━━━━━━━━━━━━━━━━━ 31s 5ms/step - accuracy: 0.9989 - loss: 0.0038 - val_accuracy: 0.9994 - val_loss: 0.0029
Epoch 3/10
4985/4985 ━━━━━━━━━━━━━━━━━━━━ 38s 4ms/step - accuracy: 0.9995 - loss: 0.0025 - val_accuracy: 0.9994 - val_loss: 0.0029
Epoch 4/10
4985/4985 ━━━━━━━━━━━━━━━━━━━━ 45s 5ms/step - accuracy: 0.9994 - loss: 0.0027 - val_accuracy: 0.9994 - val_loss: 0.0029
Epoch 5/10
4985/4985 ━━━━━━━━━━━━━━━━━━━━ 26s 2ms/step - accuracy: 0.9994 - loss: 0.0030 - val_accuracy: 0.9993 - val_loss: 0.0030
Epoch 6/10
4985/4985 ━━━━━━━━━━━━━━━━━━━━ 40s 6ms/step - accuracy: 0.9994 - loss: 0.0026 - val_accuracy: 0.9993 - val_loss: 0.0030

<keras.src.callbacks.history.History at 0x78c7195d4040>
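
To get a feel for the size of this baseline network, the trainable parameters can be counted by hand: each `Dense` layer has one weight per input-output pair plus one bias per output. The sketch below assumes 30 input features (the 31 columns reported by `dados.info()` minus `Class`). The helper `dense_param_count` is a name introduced here for illustration.

```python
def dense_param_count(n_inputs, layer_units):
    """Count weights and biases of a stack of fully connected layers."""
    total = 0
    for units in layer_units:
        total += n_inputs * units + units  # weights + biases
        n_inputs = units  # this layer's output feeds the next layer
    return total

# 30 input features, then the 16-8-1 layers used above.
print(dense_param_count(30, [16, 8, 1]))  # 641
```

A network this small trains quickly, which is why the search over layer sizes later in the notebook stays affordable.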

## Step 6: Evaluating the Initial Model

We evaluate the model on the test data and compute several performance metrics.

In [38]:
# Model predictions
y_pred_prob = model.predict(X_test)
y_pred = (y_pred_prob > 0.5).astype(int)

# Compute the metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
roc_auc = roc_auc_score(y_test, y_pred_prob)

# Print the results
print(f'Accuracy: {accuracy:.4f}')
print(f'Precision: {precision:.4f}')
print(f'Recall: {recall:.4f}')
print(f'F1-Score: {f1:.4f}')
print(f'AUC-ROC: {roc_auc:.4f}')

# Compute the confusion matrix
cm = confusion_matrix(y_test, y_pred)

# Define the labels
labels = ['Class 0', 'Class 1']

# Build the annotated confusion matrix heatmap
fig = ff.create_annotated_heatmap(
    z=cm,
    x=labels,
    y=labels,
    colorscale='Viridis',  # 'Viridis' color palette
    showscale=True,
    annotation_text=[[f'{value}' for value in row] for row in cm],  # Show the counts in each cell
    textfont=dict(size=14, color="white")  # Larger, white annotation text
)

# Update the figure layout
fig.update_layout(
    title={
        'text': 'Confusion Matrix - Initial Model',
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'
    },
    title_font=dict(size=20, color='darkblue'),  # Title size and color
    xaxis=dict(
        title='Predicted Values',
        title_font=dict(size=16, color='darkgreen'),
        tickfont=dict(size=14, color='black')
    ),
    yaxis=dict(
        title='Actual Values',
        title_font=dict(size=16, color='darkgreen'),
        tickfont=dict(size=14, color='black')
    )
)

# Set the plot and paper background colors
fig.update_layout(
    plot_bgcolor='rgba(0,0,0,0)',
    paper_bgcolor='rgba(255,255,255,1)'
)

# Show the confusion matrix
fig.show()



2671/2671 ━━━━━━━━━━━━━━━━━━━━ 8s 3ms/step
Accuracy: 0.9994
Precision: 0.8088
Recall: 0.8088
F1-Score: 0.8088
AUC-ROC: 0.9935
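
The gap between accuracy (0.9994) and precision and recall (0.8088) follows from class imbalance: true negatives dominate the accuracy figure. The sketch below recomputes the metrics from confusion-matrix counts. The counts are hypothetical, chosen only to resemble an imbalanced test set. They are not this notebook's actual confusion matrix.

```python
# Hypothetical confusion-matrix counts for an imbalanced test set.
tn, fp, fn, tp = 9970, 5, 5, 20
total = tn + fp + fn + tp

accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# Accuracy of a model that labels every transaction as legitimate.
always_negative_accuracy = (tn + fp) / total

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
print(f"always-negative accuracy={always_negative_accuracy:.4f}")
```

In this toy case, a model that never flags fraud would still score 0.9975 accuracy, which is why precision, recall, and F1 are the more informative numbers here.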


## Step 7: Applying SMOTE for Class Balancing
We apply SMOTE to balance the training set and then standardize the features.

In [41]:
# Apply SMOTE to balance the classes
# SMOTE generates new synthetic samples of the minority class to even out the class distribution
smote = SMOTE(random_state=42)
X_train_balanced, y_train_balanced = smote.fit_resample(X_train, y_train)

# Standardize the features with StandardScaler
# Standardization matters for neural networks because it puts all features on a comparable scale
scaler = StandardScaler()

# Fit the scaler on the balanced training data
X_train_balanced = scaler.fit_transform(X_train_balanced)

# Apply the scaler to the test data
# Note: this overwrites X_test, so run this cell only once per session
X_test = scaler.transform(X_test)

# Show the number of examples in each class after balancing
print("Class distribution after SMOTE:")
print(pd.Series(y_train_balanced).value_counts())


Class distribution after SMOTE:
Class
0    199008
1    199008
Name: count, dtype: int64
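
The core idea behind SMOTE is linear interpolation: a synthetic point is placed on the segment between a minority sample and one of its nearest minority-class neighbors. The sketch below is a simplified NumPy illustration of that idea, not imblearn's implementation. `smote_like_sample` is a hypothetical helper and the points are made up.

```python
import numpy as np

def smote_like_sample(minority, k, rng):
    """Create one synthetic point by interpolating toward a near neighbor."""
    # Pick a random minority sample as the starting point.
    i = rng.integers(len(minority))
    x = minority[i]
    # Distances from x to every minority sample; exclude x itself.
    dists = np.linalg.norm(minority - x, axis=1)
    dists[i] = np.inf
    # Choose one of the k nearest neighbors at random.
    neighbors = np.argsort(dists)[:k]
    neighbor = minority[rng.choice(neighbors)]
    # Interpolate with a random factor in [0, 1).
    lam = rng.random()
    return x + lam * (neighbor - x)

rng = np.random.default_rng(0)
minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = np.array([smote_like_sample(minority, k=2, rng=rng) for _ in range(5)])
print(synthetic)
```

Because the neighbor search relies on distances, features on very different scales can distort which points count as neighbors.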


## Step 8: Training the Model on Balanced Data
We rebuild the model and train it again on the balanced dataset.

In [39]:
# Build the model after balancing
model = Sequential([
    Dense(16, input_shape=(X_train_balanced.shape[1],), activation='relu'),
    Dense(8, activation='relu'),
    Dense(1, activation='sigmoid')
])

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model on the balanced data
model.fit(X_train_balanced, y_train_balanced, epochs=10, batch_size=32, validation_split=0.2)



Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.



Epoch 1/10
9951/9951 ━━━━━━━━━━━━━━━━━━━━ 35s 3ms/step - accuracy: 0.9433 - loss: 0.1434 - val_accuracy: 0.9776 - val_loss: 0.0647
Epoch 2/10
9951/9951 ━━━━━━━━━━━━━━━━━━━━ 46s 4ms/step - accuracy: 0.9870 - loss: 0.0394 - val_accuracy: 0.9953 - val_loss: 0.0174
Epoch 3/10
9951/9951 ━━━━━━━━━━━━━━━━━━━━ 30s 3ms/step - accuracy: 0.9915 - loss: 0.0261 - val_accuracy: 0.9927 - val_loss: 0.0252
Epoch 4/10
9951/9951 ━━━━━━━━━━━━━━━━━━━━ 35s 2ms/step - accuracy: 0.9942 - loss: 0.0190 - val_accuracy: 0.9982 - val_loss: 0.0102
Epoch 5/10
9951/9951 ━━━━━━━━━━━━━━━━━━━━ 24s 2ms/step - accuracy: 0.9952 - loss: 0.0164 - val_accuracy: 0.9956 - val_loss: 0.0126
Epoch 6/10
9951/9951 ━━━━━━━━━━━━━━━━━━━━ 22s 2ms/step - accuracy: 0.9957 - loss: 0.0147 - val_accuracy: 0.9958 - val_loss: 0.0130
Epoch 7/10

<keras.src.callbacks.history.History at 0x78c78d3faaa0>
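
Keras takes the `validation_split` fraction from the end of the arrays, before shuffling. SMOTE implementations such as imblearn's typically append the synthetic rows after the original ones (an assumption worth checking on the installed version). If so, the held-out 20% here may consist largely of synthetic fraud rows, which would make `val_accuracy` hard to interpret. The sketch below mimics that layout with made-up label arrays and compares a tail hold-out with a shuffled one.

```python
import numpy as np

# Mimic the label layout after resampling (assumption: originals first,
# synthetic minority rows appended at the end). Sizes are made up.
y_original = np.array([0] * 990 + [1] * 10)
rng = np.random.default_rng(0)
rng.shuffle(y_original)
y_synthetic = np.ones(980, dtype=int)
y_resampled = np.concatenate([y_original, y_synthetic])

# What validation_split=0.2 would hold out: the last 20% of the rows.
n_val = int(len(y_resampled) * 0.2)
tail_val = y_resampled[-n_val:]
print("tail hold-out class-1 share:", tail_val.mean())

# Alternative: shuffle the indices first, then hold out 20%.
perm = rng.permutation(len(y_resampled))
shuffled_val = y_resampled[perm[:n_val]]
print("shuffled hold-out class-1 share:", shuffled_val.mean())
```

In Keras, a shuffled or stratified hold-out could be passed through the `validation_data` argument of `fit` instead of `validation_split`.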

## Step 9: Evaluating the Balanced Model
We evaluate the model trained on the balanced data against the test set.

In [46]:
# Predictions of the balanced model
y_pred_prob = model.predict(X_test)
y_pred = (y_pred_prob > 0.5).astype(int)

# Compute the metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
roc_auc = roc_auc_score(y_test, y_pred_prob)

# Print the results
print(f'Accuracy: {accuracy:.4f}')
print(f'Precision: {precision:.4f}')
print(f'Recall: {recall:.4f}')
print(f'F1-Score: {f1:.4f}')
print(f'AUC-ROC: {roc_auc:.4f}')


2671/2671 ━━━━━━━━━━━━━━━━━━━━ 5s 2ms/step
Accuracy: 0.9970
Precision: 0.3305
Recall: 0.8603
F1-Score: 0.4776
AUC-ROC: 0.9698
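
After SMOTE, recall rose to 0.8603 while precision fell to 0.3305 at the fixed 0.5 threshold. The decision threshold is another knob: raising it usually trades recall for precision. The sketch below sweeps a few thresholds over made-up scores. `y_true_toy` and `scores_toy` are hypothetical, not the model's predictions.

```python
import numpy as np

# Made-up labels and predicted probabilities.
y_true_toy = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
scores_toy = np.array([0.05, 0.10, 0.30, 0.45, 0.60, 0.70, 0.40, 0.65, 0.80, 0.95])

def precision_recall_at(threshold):
    """Precision and recall when scores above the threshold count as fraud."""
    y_hat = (scores_toy > threshold).astype(int)
    tp = np.sum((y_hat == 1) & (y_true_toy == 1))
    fp = np.sum((y_hat == 1) & (y_true_toy == 0))
    fn = np.sum((y_hat == 0) & (y_true_toy == 1))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

for t in (0.3, 0.5, 0.75):
    p, r = precision_recall_at(t)
    print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f}")
```

On real predictions, the threshold would normally be chosen on a validation set rather than on the test set.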


## Step 10: Hyperparameter Optimization with Keras Tuner
We use Keras Tuner to search for the best hyperparameters: the number of units in each hidden layer and the optimizer.

In [None]:
# Function that builds the model for a given set of hyperparameters
def build_model(hp):
    model = Sequential()
    model.add(Dense(units=hp.Int('units1', min_value=8, max_value=64, step=8), activation='relu', input_shape=(X_train_balanced.shape[1],)))
    model.add(Dense(units=hp.Int('units2', min_value=8, max_value=64, step=8), activation='relu'))
    model.add(Dense(1, activation='sigmoid'))

    model.compile(optimizer=hp.Choice('optimizer', values=['adam', 'rmsprop']), loss='binary_crossentropy', metrics=['accuracy'])
    return model

# Keras Tuner configuration
tuner = RandomSearch(build_model, objective='val_accuracy', max_trials=10, executions_per_trial=1, directory='my_dir', project_name='intro_to_kt')

# Search for the best hyperparameters
tuner.search(X_train_balanced, y_train_balanced, epochs=10, validation_split=0.2)

# Best hyperparameters
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]
print(f'Best Hyperparameters: {best_hps.values}')


Reloading Tuner from my_dir/intro_to_kt/tuner0.json

Search: Running Trial #7

Value             |Best Value So Far |Hyperparameter
24                |24                |units1
16                |48                |units2
adam              |adam              |optimizer




Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.



Epoch 1/10
1971/9951 ━━━━━━━━━━━━━━━━━━━━ 13s 2ms/step - accuracy: 0.9203 - loss: 0.2120

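Conceptually, random search draws hyperparameter combinations at random from the search space, scores each one, and keeps the best. The sketch below mirrors the search space defined in `build_model` in plain Python. It is a simplified illustration, not Keras Tuner's implementation, and `toy_score` is a hypothetical stand-in for training a model and reading its validation metric.

```python
import random

# Same search space as build_model: units in 8..64 (step 8), two optimizers.
UNITS = list(range(8, 65, 8))
OPTIMIZERS = ["adam", "rmsprop"]

def sample_config(rng):
    """Draw one random hyperparameter combination."""
    return {
        "units1": rng.choice(UNITS),
        "units2": rng.choice(UNITS),
        "optimizer": rng.choice(OPTIMIZERS),
    }

def toy_score(config):
    """Hypothetical score; a real search would train and validate a model."""
    return -abs(config["units1"] - 32) - abs(config["units2"] - 16)

rng = random.Random(42)
trials = [sample_config(rng) for _ in range(10)]  # max_trials=10
best = max(trials, key=toy_score)
print(best, toy_score(best))
```

With 8 values per layer and 2 optimizers, the space holds 128 combinations, so 10 trials sample only a small part of it. Unlike this sketch, a real tuner may also avoid repeating a combination.
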
## Step 11: Final Training with the Optimized Hyperparameters
We train the model again using the best hyperparameters found by Keras Tuner.

In [None]:
# Build the model with the best hyperparameters
model = tuner.hypermodel.build(best_hps)

# Train the model with the best hyperparameters
model.fit(X_train_balanced, y_train_balanced, epochs=10, validation_split=0.2)


## Step 12: Evaluating the Model with the Optimized Hyperparameters
We evaluate the optimized model on the test data and compute the performance metrics.

In [None]:
# Predictions of the optimized model
y_pred_prob = model.predict(X_test)
y_pred = (y_pred_prob > 0.5).astype(int)

# Compute the metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
roc_auc = roc_auc_score(y_test, y_pred_prob)

# Print the results
print(f'Accuracy: {accuracy:.4f}')
print(f'Precision: {precision:.4f}')
print(f'Recall: {recall:.4f}')
print(f'F1-Score: {f1:.4f}')
print(f'AUC-ROC: {roc_auc:.4f}')


## Step 13: Visualizing the Confusion Matrix of the Optimized Model
We build and display the confusion matrix of the model trained with the optimized hyperparameters to see how many transactions it classifies correctly and incorrectly.



In [None]:
# Compute the confusion matrix
cm = confusion_matrix(y_test, y_pred)

# Define the labels
labels = ['Class 0', 'Class 1']

# Build the confusion matrix figure
fig = ff.create_annotated_heatmap(z=cm, x=labels, y=labels, colorscale='Blues', showscale=True)

# Update the figure layout
fig.update_layout(
    title='Confusion Matrix - Optimized Model',
    xaxis=dict(title='Predicted Values'),
    yaxis=dict(title='Actual Values')
)

# Show the confusion matrix
fig.show()
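
With very unequal class sizes, the raw counts in a confusion matrix are dominated by the majority class. Dividing each row by its total shows, for each true class, the fraction predicted as each label. The sketch below uses made-up labels. `y_true_toy` and `y_pred_toy` are hypothetical and stand in for `y_test` and `y_pred`.

```python
import numpy as np

# Made-up true and predicted labels.
y_true_toy = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_pred_toy = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 0])

# Build the 2x2 matrix: rows are true classes, columns are predictions.
cm_toy = np.zeros((2, 2), dtype=int)
np.add.at(cm_toy, (y_true_toy, y_pred_toy), 1)

# Normalize each row so it sums to 1.
cm_norm = cm_toy / cm_toy.sum(axis=1, keepdims=True)

print(cm_toy)
print(cm_norm)
```

scikit-learn's `confusion_matrix` offers the same row normalization through its `normalize='true'` argument.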


## Conclusion
Developing and optimizing the neural network produced models that identify fraudulent transactions reasonably well, with different trade-offs between precision, recall, and F1-score. The initial model reached 0.8088 on all three metrics, while the SMOTE-balanced model raised recall to 0.8603 at the cost of precision (0.3305).

However, depending on the context and the specific goal, some metrics matter more than others. For example, when the cost of a false negative (a missed fraud) is very high, recall can be more important than precision.
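
One way to encode that preference in a single number is the F-beta score, which treats recall as beta times as important as precision. The sketch below applies the standard formula to the precision and recall values reported earlier in this notebook. The choice of beta = 2 is an assumption made here for illustration.

```python
def f_beta(precision, recall, beta):
    """F-beta score: beta > 1 favors recall, beta < 1 favors precision."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Precision and recall reported for the two evaluated models.
initial = {"precision": 0.8088, "recall": 0.8088}
balanced = {"precision": 0.3305, "recall": 0.8603}

for name, m in (("initial", initial), ("balanced", balanced)):
    f1 = f_beta(m["precision"], m["recall"], beta=1)
    f2 = f_beta(m["precision"], m["recall"], beta=2)
    print(f"{name}: F1={f1:.4f} F2={f2:.4f}")
```

Under F2 the balanced model's score improves relative to its F1, since its high recall carries more weight, though with these numbers the initial model still scores higher.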

It is also important to review how the data are handled, including the collection and cleaning process, because low-quality data can significantly hurt model performance.

Narrowing the focus to the features most relevant to fraud detection can make the model more accurate and effective. Continuing to improve feature engineering and exploring other modeling techniques may therefore yield even better results in fraud prevention in financial settings.







