# Identifying fraud with *PyCaret*

It is important for credit card companies to be able to recognize fraudulent credit card transactions so that customers are not charged for items they did not purchase. Such fraud can happen when cardholders are careless about where they hand over their cards or card details.

Brazil has the second-highest number of credit card fraud cases in Latin America, behind Mexico. During the coronavirus pandemic in particular, probably because of the high demand for digital services and *e-commerce*, the number of credit and debit card fraud cases soared, more than doubling.

## About the project

In this project I will apply anomaly detection models, that is, models that look for values that are outliers or very different from the rest of the population, and then check which of the detected anomalies are actually fraud. Since the data is already labeled, I will compare the labels with the anomalies detected.

Anomaly detection is an unsupervised task, similar to *clustering*, but in this case I will also use classification metrics to assess how the models perform.

## Installing the *PyCaret* library

In [2]:
pip install pycaret



## Importing the libraries


In [23]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pycaret.anomaly import *
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    precision_recall_curve,
    roc_auc_score,
    roc_curve,
)

## Importing the dataset


The data was obtained [here](https://www.kaggle.com/mlg-ulb/creditcardfraud), and the dataset has 31 columns (30 *features* plus the label).

According to the information at the *link*:

The dataset contains transactions made by credit cards in September 2013 by European cardholders. It covers transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly imbalanced: the positive class (frauds) accounts for 0.172% of all transactions.

It contains only numerical input variables, which are the result of a *PCA* transformation. Unfortunately, due to confidentiality issues, the original features and further background information about the data cannot be provided. Features V1, V2, … V28 are the principal components obtained with PCA; the only features that have not been transformed with PCA are 'Time' and 'Amount'. 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset. 'Amount' is the transaction amount, and it can be used, for example, for cost-sensitive learning. 'Class' is the response variable and takes the value 1 in case of fraud and 0 otherwise.

Loading the dataset.
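As a quick sanity check on the figures quoted in this description, the short sketch below recomputes the fraud share from the two counts and converts the largest `Time` value visible in the notebook output (172788 seconds) into hours. Only the numbers come from the description and the output; the variable names are my own.

```python
# Counts quoted in the dataset description.
n_fraud = 492
n_total = 284_807

fraud_share = n_fraud / n_total * 100  # percentage of fraudulent transactions
print(f"Fraud share: {fraud_share:.4f}%")

# 'Time' is measured in seconds since the first transaction.
last_time_seconds = 172_788.0  # largest value visible in the notebook output
hours_covered = last_time_seconds / 3600
print(f"Time span: {hours_covered:.2f} hours")
```

Both values agree with the description: roughly 0.17% of the rows are fraud, and the data spans just under 48 hours, that is, two days.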

In [3]:
fraude = pd.read_csv('creditcard.csv')

Viewing the dataset.

In [4]:
fraude

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,V11,V12,V13,V14,V15,V16,V17,V18,V19,V20,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
0,0.0,-1.359807,-0.072781,2.536347,1.378155,-0.338321,0.462388,0.239599,0.098698,0.363787,0.090794,-0.551600,-0.617801,-0.991390,-0.311169,1.468177,-0.470401,0.207971,0.025791,0.403993,0.251412,-0.018307,0.277838,-0.110474,0.066928,0.128539,-0.189115,0.133558,-0.021053,149.62,0
1,0.0,1.191857,0.266151,0.166480,0.448154,0.060018,-0.082361,-0.078803,0.085102,-0.255425,-0.166974,1.612727,1.065235,0.489095,-0.143772,0.635558,0.463917,-0.114805,-0.183361,-0.145783,-0.069083,-0.225775,-0.638672,0.101288,-0.339846,0.167170,0.125895,-0.008983,0.014724,2.69,0
2,1.0,-1.358354,-1.340163,1.773209,0.379780,-0.503198,1.800499,0.791461,0.247676,-1.514654,0.207643,0.624501,0.066084,0.717293,-0.165946,2.345865,-2.890083,1.109969,-0.121359,-2.261857,0.524980,0.247998,0.771679,0.909412,-0.689281,-0.327642,-0.139097,-0.055353,-0.059752,378.66,0
3,1.0,-0.966272,-0.185226,1.792993,-0.863291,-0.010309,1.247203,0.237609,0.377436,-1.387024,-0.054952,-0.226487,0.178228,0.507757,-0.287924,-0.631418,-1.059647,-0.684093,1.965775,-1.232622,-0.208038,-0.108300,0.005274,-0.190321,-1.175575,0.647376,-0.221929,0.062723,0.061458,123.50,0
4,2.0,-1.158233,0.877737,1.548718,0.403034,-0.407193,0.095921,0.592941,-0.270533,0.817739,0.753074,-0.822843,0.538196,1.345852,-1.119670,0.175121,-0.451449,-0.237033,-0.038195,0.803487,0.408542,-0.009431,0.798278,-0.137458,0.141267,-0.206010,0.502292,0.219422,0.215153,69.99,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
284802,172786.0,-11.881118,10.071785,-9.834783,-2.066656,-5.364473,-2.606837,-4.918215,7.305334,1.914428,4.356170,-1.593105,2.711941,-0.689256,4.626942,-0.924459,1.107641,1.991691,0.510632,-0.682920,1.475829,0.213454,0.111864,1.014480,-0.509348,1.436807,0.250034,0.943651,0.823731,0.77,0
284803,172787.0,-0.732789,-0.055080,2.035030,-0.738589,0.868229,1.058415,0.024330,0.294869,0.584800,-0.975926,-0.150189,0.915802,1.214756,-0.675143,1.164931,-0.711757,-0.025693,-1.221179,-1.545556,0.059616,0.214205,0.924384,0.012463,-1.016226,-0.606624,-0.395255,0.068472,-0.053527,24.79,0
284804,172788.0,1.919565,-0.301254,-3.249640,-0.557828,2.630515,3.031260,-0.296827,0.708417,0.432454,-0.484782,0.411614,0.063119,-0.183699,-0.510602,1.329284,0.140716,0.313502,0.395652,-0.577252,0.001396,0.232045,0.578229,-0.037501,0.640134,0.265745,-0.087371,0.004455,-0.026561,67.88,0
284805,172788.0,-0.240440,0.530483,0.702510,0.689799,-0.377961,0.623708,-0.686180,0.679145,0.392087,-0.399126,-1.933849,-0.962886,-1.042082,0.449624,1.962563,-0.608577,0.509928,1.113981,2.897849,0.127434,0.265245,0.800049,-0.163298,0.123205,-0.569159,0.546668,0.108821,0.104533,10.00,0


The idea of this project is to apply the anomaly detection module of the *PyCaret* library to some of the dataset's *features* and identify possible anomalies. Then I compare the *Class* column, which holds the true labels, with the results found by the library's models, that is, I check which of the detected anomalies are frauds.

First I will drop the *Class* column (which the models will not 'see') and the *Time* column, which in my view does not carry any relevant information. The *Class* values will be saved in a variable.

In [5]:
fraude1=fraude.drop(['Class','Time'],axis=1)
classe = fraude['Class']

## *Setup*

Here all the data preprocessing is carried out automatically.

In [6]:
exp_ano101 = setup(fraude1, normalize = False,session_id = 123)

Unnamed: 0,Description,Value
0,session_id,123
1,Original Data,"(284807, 29)"
2,Missing Values,False
3,Numeric Features,29
4,Categorical Features,0
5,Ordinal Features,False
6,High Cardinality Features,False
7,High Cardinality Method,
8,Transformed Data,"(284807, 29)"
9,CPU Jobs,-1


## Models

The anomaly detection module of the *PyCaret* library offers twelve (12) models in total; I will use the three listed below.

Afterwards I will evaluate the results of each model using metrics that are normally applied to classification models.

1) *iforest - Isolation Forest*;

2) *histogram - Histogram-based Outlier Detection*;

3) *pca - Principal Component Analysis*;
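To give some intuition for the second model in the list, here is a toy, pure-Python sketch of the idea behind histogram-based outlier detection: build an equal-width histogram per feature and give each point a score that grows when it falls in sparsely populated bins. This is a simplified illustration under my own assumptions (the function name `hbos_scores`, five bins, made-up data); it is not the implementation that PyCaret uses.

```python
import math

def hbos_scores(rows, n_bins=5):
    """Toy histogram-based outlier score: sum over features of log(tallest bin / own bin)."""
    n_features = len(rows[0])
    scores = [0.0] * len(rows)
    for j in range(n_features):
        column = [row[j] for row in rows]
        lo, hi = min(column), max(column)
        width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant features
        # Assign each value to an equal-width bin (the maximum goes in the last bin).
        bins = [min(int((v - lo) / width), n_bins - 1) for v in column]
        counts = [0] * n_bins
        for b in bins:
            counts[b] += 1
        tallest = max(counts)
        for i, b in enumerate(bins):
            # Rare bins have a small relative height, hence a large contribution.
            scores[i] += math.log(tallest / counts[b])
    return scores

# Made-up data: five similar rows and one row that is far away in both features.
rows = [[1.0, 10.0], [1.1, 10.2], [0.9, 9.9], [1.2, 10.1], [1.0, 10.0], [8.0, 30.0]]
scores = hbos_scores(rows)
most_anomalous = scores.index(max(scores))
print(scores)
print("Most anomalous row:", most_anomalous)
```

The last row sits alone in its bins for both features, so it receives the highest score, while the five similar rows share the tallest bins and score zero.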


## Creating the models

In [7]:
iforest = create_model('iforest')
print(iforest)

histogram = create_model('histogram')
print(histogram)

pca = create_model('pca')
print(pca)

PCA(contamination=0.05, copy=True, iterated_power='auto', n_components=None,
  n_selected_components=None, random_state=123, standardization=True,
  svd_solver='auto', tol=0.0, weighted=True, whiten=False)
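The `contamination=0.05` shown in the output above indicates that about 5% of the rows are labeled as anomalies, whatever the true fraud rate is. The sketch below illustrates that kind of thresholding on made-up scores; `flag_top_fraction` is a hypothetical helper, not a PyCaret function, and the library's exact cutoff rule may differ.

```python
def flag_top_fraction(scores, contamination=0.05):
    """Label the highest-scoring `contamination` share of points as anomalies (1)."""
    n_flag = int(round(len(scores) * contamination))
    # Indices sorted from most to least anomalous.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    labels = [0] * len(scores)
    for i in order[:n_flag]:
        labels[i] = 1
    return labels

# Made-up anomaly scores: two values (9.5 and 7.0) stand out from the rest.
scores = [0.10, 0.20, 9.50, 0.30, 0.15, 0.25, 0.05, 0.12, 0.18, 0.22,
          0.11, 0.21, 0.31, 0.16, 0.26, 0.06, 0.13, 0.19, 0.23, 7.00]
labels = flag_top_fraction(scores, contamination=0.10)
print(labels)

# Rough number of rows a 5% share would flag in the full dataset.
expected_flags = int(round(284_807 * 0.05))
print(expected_flags)
```

With 284,807 rows, a 5% share corresponds to roughly 14,240 flagged transactions, far more than the 492 real frauds, so a low precision for the fraud class is to be expected.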


## Results of each model

#### Results of the *IForest* model

In [9]:
iforest_results = assign_model(iforest)
iforest_results.head()

Unnamed: 0,V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,V11,V12,V13,V14,V15,V16,V17,V18,V19,V20,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Anomaly,Anomaly_Score
0,-1.359807,-0.072781,2.536347,1.378155,-0.338321,0.462388,0.239599,0.098698,0.363787,0.090794,-0.5516,-0.617801,-0.99139,-0.311169,1.468177,-0.470401,0.207971,0.025791,0.403993,0.251412,-0.018307,0.277838,-0.110474,0.066928,0.128539,-0.189115,0.133558,-0.021053,149.62,0,-0.106937
1,1.191857,0.266151,0.16648,0.448154,0.060018,-0.082361,-0.078803,0.085102,-0.255425,-0.166974,1.612727,1.065235,0.489095,-0.143772,0.635558,0.463917,-0.114805,-0.183361,-0.145783,-0.069083,-0.225775,-0.638672,0.101288,-0.339846,0.16717,0.125895,-0.008983,0.014724,2.69,0,-0.12672
2,-1.358354,-1.340163,1.773209,0.37978,-0.503198,1.800499,0.791461,0.247676,-1.514654,0.207643,0.624501,0.066084,0.717293,-0.165946,2.345865,-2.890083,1.109969,-0.121359,-2.261857,0.52498,0.247998,0.771679,0.909412,-0.689281,-0.327642,-0.139097,-0.055353,-0.059752,378.66,0,-0.031492
3,-0.966272,-0.185226,1.792993,-0.863291,-0.010309,1.247203,0.237609,0.377436,-1.387024,-0.054952,-0.226487,0.178228,0.507757,-0.287924,-0.631418,-1.059647,-0.684093,1.965775,-1.232622,-0.208038,-0.1083,0.005274,-0.190321,-1.175575,0.647376,-0.221929,0.062723,0.061458,123.5,0,-0.0771
4,-1.158233,0.877737,1.548718,0.403034,-0.407193,0.095921,0.592941,-0.270533,0.817739,0.753074,-0.822843,0.538196,1.345852,-1.11967,0.175121,-0.451449,-0.237033,-0.038195,0.803487,0.408542,-0.009431,0.798278,-0.137458,0.141267,-0.20601,0.502292,0.219422,0.215153,69.99,0,-0.100236


#### Results of the *Histogram* model

In [8]:
histogram_results = assign_model(histogram)
histogram_results.head()

Unnamed: 0,V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,V11,V12,V13,V14,V15,V16,V17,V18,V19,V20,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Anomaly,Anomaly_Score
0,-1.359807,-0.072781,2.536347,1.378155,-0.338321,0.462388,0.239599,0.098698,0.363787,0.090794,-0.5516,-0.617801,-0.99139,-0.311169,1.468177,-0.470401,0.207971,0.025791,0.403993,0.251412,-0.018307,0.277838,-0.110474,0.066928,0.128539,-0.189115,0.133558,-0.021053,149.62,0,51.472409
1,1.191857,0.266151,0.16648,0.448154,0.060018,-0.082361,-0.078803,0.085102,-0.255425,-0.166974,1.612727,1.065235,0.489095,-0.143772,0.635558,0.463917,-0.114805,-0.183361,-0.145783,-0.069083,-0.225775,-0.638672,0.101288,-0.339846,0.16717,0.125895,-0.008983,0.014724,2.69,0,48.44044
2,-1.358354,-1.340163,1.773209,0.37978,-0.503198,1.800499,0.791461,0.247676,-1.514654,0.207643,0.624501,0.066084,0.717293,-0.165946,2.345865,-2.890083,1.109969,-0.121359,-2.261857,0.52498,0.247998,0.771679,0.909412,-0.689281,-0.327642,-0.139097,-0.055353,-0.059752,378.66,0,56.794045
3,-0.966272,-0.185226,1.792993,-0.863291,-0.010309,1.247203,0.237609,0.377436,-1.387024,-0.054952,-0.226487,0.178228,0.507757,-0.287924,-0.631418,-1.059647,-0.684093,1.965775,-1.232622,-0.208038,-0.1083,0.005274,-0.190321,-1.175575,0.647376,-0.221929,0.062723,0.061458,123.5,0,54.242685
4,-1.158233,0.877737,1.548718,0.403034,-0.407193,0.095921,0.592941,-0.270533,0.817739,0.753074,-0.822843,0.538196,1.345852,-1.11967,0.175121,-0.451449,-0.237033,-0.038195,0.803487,0.408542,-0.009431,0.798278,-0.137458,0.141267,-0.20601,0.502292,0.219422,0.215153,69.99,0,51.049323


#### Results of the *PCA* model

In [10]:
pca_results = assign_model(pca)
pca_results.head()

Unnamed: 0,V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,V11,V12,V13,V14,V15,V16,V17,V18,V19,V20,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Anomaly,Anomaly_Score
0,-1.359807,-0.072781,2.536347,1.378155,-0.338321,0.462388,0.239599,0.098698,0.363787,0.090794,-0.5516,-0.617801,-0.99139,-0.311169,1.468177,-0.470401,0.207971,0.025791,0.403993,0.251412,-0.018307,0.277838,-0.110474,0.066928,0.128539,-0.189115,0.133558,-0.021053,149.62,0,4980.345503
1,1.191857,0.266151,0.16648,0.448154,0.060018,-0.082361,-0.078803,0.085102,-0.255425,-0.166974,1.612727,1.065235,0.489095,-0.143772,0.635558,0.463917,-0.114805,-0.183361,-0.145783,-0.069083,-0.225775,-0.638672,0.101288,-0.339846,0.16717,0.125895,-0.008983,0.014724,2.69,0,4151.934543
2,-1.358354,-1.340163,1.773209,0.37978,-0.503198,1.800499,0.791461,0.247676,-1.514654,0.207643,0.624501,0.066084,0.717293,-0.165946,2.345865,-2.890083,1.109969,-0.121359,-2.261857,0.52498,0.247998,0.771679,0.909412,-0.689281,-0.327642,-0.139097,-0.055353,-0.059752,378.66,0,9708.254463
3,-0.966272,-0.185226,1.792993,-0.863291,-0.010309,1.247203,0.237609,0.377436,-1.387024,-0.054952,-0.226487,0.178228,0.507757,-0.287924,-0.631418,-1.059647,-0.684093,1.965775,-1.232622,-0.208038,-0.1083,0.005274,-0.190321,-1.175575,0.647376,-0.221929,0.062723,0.061458,123.5,0,6962.308917
4,-1.158233,0.877737,1.548718,0.403034,-0.407193,0.095921,0.592941,-0.270533,0.817739,0.753074,-0.822843,0.538196,1.345852,-1.11967,0.175121,-0.451449,-0.237033,-0.038195,0.803487,0.408542,-0.009431,0.798278,-0.137458,0.141267,-0.20601,0.502292,0.219422,0.215153,69.99,0,5383.99691


Note: I could plot (as I have already tried) the *t-SNE* and *UMAP* charts to visualize the results, but the process is very slow (because of the large amount of data) and *Colab* often stopped working, so I will skip this step and leave the commands below.

In [28]:
#plot_model(iforest)
#plot_model(histogram)
#plot_model(pca)

#plot_model(iforest, plot = 'umap')
#plot_model(histogram, plot = 'umap')
#plot_model(pca, plot = 'umap')

I will add a new column with the true labels to each model's results.

In [11]:
iforest_results['Label2'] = classe
histogram_results['Label2']= classe
pca_results['Label2'] = classe

True and predicted labels for *Iforest*.

In [27]:
iforest_results[['Anomaly','Label2']]

Unnamed: 0,Anomaly,Label2
0,0,0
1,0,0
2,0,0
3,0,0
4,0,0
...,...,...
284802,1,0
284803,0,0
284804,0,0
284805,0,0


True and predicted labels for *Histogram*.

In [12]:
histogram_results[['Anomaly','Label2']]

Unnamed: 0,Anomaly,Label2
0,0,0
1,0,0
2,0,0
3,0,0
4,0,0
...,...,...
284802,1,0
284803,0,0
284804,0,0
284805,0,0


True and predicted labels for *PCA*.

In [13]:
pca_results[['Anomaly','Label2']]

Unnamed: 0,Anomaly,Label2
0,0,0
1,0,0
2,0,0
3,0,0
4,0,0
...,...,...
284802,1,0
284803,0,0
284804,0,0
284805,0,0


## *Classification Report* for each model

In all the reports below, the precision for class 1 (anomalies/frauds) is extremely low, but the *recall* is quite high.

*Recall* is the number of cases correctly classified as fraud divided by the total number of frauds. In this project I will focus on this metric because, as stated earlier, I want to know which anomalies are frauds.

#### *Classification report* for the *Iforest* model

The *recall* was 86%.

In [28]:
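valor_classe=[0,1]
# Pass the class list by keyword; recent scikit-learn versions reject it as a positional argument.
print(classification_report(iforest_results['Label2'],iforest_results['Anomaly'],labels=valor_classe))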

              precision    recall  f1-score   support

           0       1.00      0.95      0.97    284315
           1       0.03      0.86      0.06       492

    accuracy                           0.95    284807
   macro avg       0.51      0.90      0.52    284807
weighted avg       1.00      0.95      0.97    284807



#### *Classification report* for the *Histogram* model

The class 1 results are the same as for the previous model.

In [14]:
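valor_classe=[0,1]
# Pass the class list by keyword; recent scikit-learn versions reject it as a positional argument.
print(classification_report(histogram_results['Label2'],histogram_results['Anomaly'],labels=valor_classe))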

              precision    recall  f1-score   support

           0       1.00      0.95      0.97    284315
           1       0.03      0.86      0.06       492

    accuracy                           0.95    284807
   macro avg       0.51      0.91      0.52    284807
weighted avg       1.00      0.95      0.97    284807



#### *Classification report* for the *PCA* model

Here the *recall* was slightly higher than for the previous models, at 88%.

In [15]:
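valor_classe=[0,1]
# Pass the class list by keyword; recent scikit-learn versions reject it as a positional argument.
print(classification_report(pca_results['Label2'],pca_results['Anomaly'],labels=valor_classe))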

              precision    recall  f1-score   support

           0       1.00      0.95      0.98    284315
           1       0.03      0.88      0.06       492

    accuracy                           0.95    284807
   macro avg       0.52      0.91      0.52    284807
weighted avg       1.00      0.95      0.97    284807



## Confusion Matrices

A confusion matrix is a clearer way to visualize what we found.

1) The confusion matrix of the *Iforest* model shows 422 detected anomalies that are in fact frauds, which is a good result.

2) The confusion matrix of the *Histogram* model shows a slightly better result, with 423 anomalies that are in fact frauds.

3) Finally, the *PCA* model gave the best result, with 432 anomalies that are also frauds.

In [16]:
print('Confusion matrix for Iforest')
print(confusion_matrix(iforest_results['Label2'],iforest_results['Anomaly']))
print()  # blank line between models


print('Confusion matrix for Histogram')
print(confusion_matrix(histogram_results['Label2'],histogram_results['Anomaly']))
print()


print('Confusion matrix for PCA')
print(confusion_matrix(pca_results['Label2'],pca_results['Anomaly']))
print()

Confusion matrix for Iforest
[[270496  13819]
 [    70    422]]

Confusion matrix for Histogram
[[270500  13815]
 [    69    423]]

Confusion matrix for PCA
[[270506  13809]
 [    60    432]]



So we can see that, even with imbalanced classes and even though these are not classification models, the models returned satisfactory results. Finally, we can use other metrics to evaluate the results.
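The precision and recall values in the classification reports can be recovered directly from these confusion matrices. The sketch below does that arithmetic with the counts printed above, assuming scikit-learn's layout (rows are true classes, columns are predicted classes); the helper name `fraud_precision_recall` is my own.

```python
# Confusion matrices copied from the notebook output.
# Rows are true classes (0, 1); columns are predicted classes (0, 1).
matrices = {
    "Iforest":   [[270496, 13819], [70, 422]],
    "Histogram": [[270500, 13815], [69, 423]],
    "PCA":       [[270506, 13809], [60, 432]],
}

def fraud_precision_recall(matrix):
    """Return (precision, recall) for class 1 from a 2x2 confusion matrix."""
    (tn, fp), (fn, tp) = matrix
    precision = tp / (tp + fp)  # share of flagged rows that are real frauds
    recall = tp / (tp + fn)     # share of real frauds that were flagged
    return precision, recall

results = {name: fraud_precision_recall(m) for name, m in matrices.items()}
for name, (precision, recall) in results.items():
    print(f"{name}: precision={precision:.2f}, recall={recall:.2f}")
```

Each model flags around 14,240 rows while only 492 are frauds, which is why the precision stays near 3% even though most frauds are caught.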

## Evaluation Metrics

Below are the accuracy and the AUC for each model.

Note that the accuracy of all models is above 95% and the AUC is above 90%, so the three models give very similar results.

In [20]:
print('Evaluation metrics for the Iforest model')
print()
print('Iforest accuracy :',accuracy_score(iforest_results['Label2'],iforest_results['Anomaly']))
print('Iforest AUC :',roc_auc_score(iforest_results['Label2'],iforest_results['Anomaly']))

Evaluation metrics for the Iforest model

Iforest accuracy : 0.9512336424315414
Iforest AUC : 0.9045595182487532


In [21]:
print('Evaluation metrics for the Histogram model')
print()
print('Histogram accuracy :',accuracy_score(histogram_results['Label2'],histogram_results['Anomaly']))
print('Histogram AUC :',roc_auc_score(histogram_results['Label2'],histogram_results['Anomaly']))

Evaluation metrics for the Histogram model

Histogram accuracy : 0.9512511981798201
Histogram AUC : 0.9055828128625799


In [22]:
print('Evaluation metrics for the PCA model')
print()
print('PCA accuracy :',accuracy_score(pca_results['Label2'],pca_results['Anomaly']))
print('PCA AUC :',roc_auc_score(pca_results['Label2'],pca_results['Anomaly']))

Evaluation metrics for the PCA model

PCA accuracy : 0.951303865424656
PCA AUC : 0.9147397060028317
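One caveat about the AUC values above: `roc_auc_score` is being given the 0/1 `Anomaly` column rather than a continuous score. In that situation the ROC curve has only one interior point, and the AUC equals the average of the true positive rate and the true negative rate. The sketch below reproduces the *Iforest* figures from its confusion matrix. Passing the `Anomaly_Score` column instead would, I believe, give a threshold-independent AUC, but that is not something this notebook computed.

```python
# Iforest confusion matrix from the notebook output (rows: true, columns: predicted).
tn, fp = 270496, 13819
fn, tp = 70, 422

tpr = tp / (tp + fn)  # true positive rate (recall of the fraud class)
tnr = tn / (tn + fp)  # true negative rate

# With 0/1 predictions the ROC curve has a single interior point,
# so the area under it reduces to the mean of TPR and TNR.
auc_from_labels = (tpr + tnr) / 2
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"AUC from hard labels: {auc_from_labels:.6f}")
print(f"Accuracy: {accuracy:.6f}")
```

Both numbers match the values printed for *Iforest*, which also shows that the high accuracy is driven mostly by the large number of correctly ignored legitimate transactions.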


## Saving the models

Saving the *Iforest* model.

In [29]:
save_model(iforest, 'Modelo Iforest Final 07Dez2020')

Transformation Pipeline and Model Succesfully Saved


(Pipeline(memory=None,
          steps=[('dtypes',
                  DataTypes_Auto_infer(categorical_features=[],
                                       display_types=True, features_todrop=[],
                                       id_columns=[],
                                       ml_usecase='classification',
                                       numerical_features=[],
                                       target='UNSUPERVISED_DUMMY_TARGET',
                                       time_features=[])),
                 ('imputer',
                  Simple_Imputer(categorical_strategy='most frequent',
                                 fill_value_categorical=None,
                                 fill_value_numerical=...
                 ('fix_perfect', 'passthrough'),
                 ('clean_names', Clean_Colum_Names()),
                 ('feature_select', 'passthrough'), ('fix_multi', 'passthrough'),
                 ('dfs', 'passthrough'), ('pca', 'passthrough'),
                 

Saving the *Histogram* model.

In [30]:
save_model(histogram,'Modelo Histogram Final 07Dez2020')

Transformation Pipeline and Model Succesfully Saved


(Pipeline(memory=None,
          steps=[('dtypes',
                  DataTypes_Auto_infer(categorical_features=[],
                                       display_types=True, features_todrop=[],
                                       id_columns=[],
                                       ml_usecase='classification',
                                       numerical_features=[],
                                       target='UNSUPERVISED_DUMMY_TARGET',
                                       time_features=[])),
                 ('imputer',
                  Simple_Imputer(categorical_strategy='most frequent',
                                 fill_value_categorical=None,
                                 fill_value_numerical=...
                 ('binn', 'passthrough'), ('rem_outliers', 'passthrough'),
                 ('cluster_all', 'passthrough'),
                 ('dummy', Dummify(target='UNSUPERVISED_DUMMY_TARGET')),
                 ('fix_perfect', 'passthrough'),
                 ('cle

Saving the *PCA* model.

In [31]:
save_model(pca, 'Modelo PCA final 07Dez2020')

Transformation Pipeline and Model Succesfully Saved


(Pipeline(memory=None,
          steps=[('dtypes',
                  DataTypes_Auto_infer(categorical_features=[],
                                       display_types=True, features_todrop=[],
                                       id_columns=[],
                                       ml_usecase='classification',
                                       numerical_features=[],
                                       target='UNSUPERVISED_DUMMY_TARGET',
                                       time_features=[])),
                 ('imputer',
                  Simple_Imputer(categorical_strategy='most frequent',
                                 fill_value_categorical=None,
                                 fill_value_numerical=...
                 ('clean_names', Clean_Colum_Names()),
                 ('feature_select', 'passthrough'), ('fix_multi', 'passthrough'),
                 ('dfs', 'passthrough'), ('pca', 'passthrough'),
                 ['trained_model',
                  PCA(contamina

## Conclusion

The anomaly detection module proved effective at flagging anomalies in the fraud data. Comparing the models' output with the true labels gave a satisfactory result: the three models caught between 86% and 88% of the frauds, although with very low precision.