## <font color='REDBLUE'>**PyCaret**</font>

# 1. Business Problem

- The bank wants to build a machine learning model that helps it identify **potential customers who are most likely to take out a personal loan.**

#### Installing **PyCaret**


In [39]:
# Installing the pycaret package
!pip3 install pycaret          # OR !pip install pycaret

Collecting importlib-metadata
  Using cached importlib_metadata-4.11.3-py3-none-any.whl (18 kB)
Installing collected packages: importlib-metadata
  Attempting uninstall: importlib-metadata
    Found existing installation: importlib-metadata 2.1.3
    Uninstalling importlib-metadata-2.1.3:
      Successfully uninstalled importlib-metadata-2.1.3
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
watermark 2.3.0 requires importlib-metadata<3.0; python_version < "3.8", but you have importlib-metadata 4.11.3 which is incompatible.
Successfully installed importlib-metadata-4.11.3


##### **Installing the version-reporting library**

In [40]:
# Installing the package that reports the versions of the packages in use
!pip install -q -U watermark

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
mlflow 1.25.1 requires importlib-metadata!=4.7.0,>=3.7.0, but you have importlib-metadata 2.1.3 which is incompatible.
markdown 3.3.6 requires importlib-metadata>=4.4; python_version < "3.10", but you have importlib-metadata 2.1.3 which is incompatible.
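
The two pip runs above pull `importlib-metadata` in opposite directions (4.11.3 after installing PyCaret, 2.1.3 after upgrading watermark). A minimal sketch for checking which versions actually ended up installed is shown below. It assumes Python 3.8 or newer, where `importlib.metadata` is part of the standard library; on older interpreters the `importlib_metadata` backport would be needed instead. The helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Packages named in the pip messages above
for name in ("importlib-metadata", "watermark", "mlflow", "markdown", "pycaret"):
    print(f"{name:20s} {installed_version(name) or 'not installed'}")
```

Running this right after the install cells makes it easier to see which of the conflicting requirements is currently satisfied.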


#### **Importing the Libraries**

In [41]:
# Libraries used
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Importing PyCaret
import pycaret

# Suppressing warnings
import warnings
warnings.filterwarnings('ignore')

# Displaying plots inline in this notebook
%matplotlib inline

In [42]:
%reload_ext watermark
%watermark -a "WMS" --iversions

Author: WMS

numpy     : 1.19.5
pandas    : 1.3.5
pycaret   : 2.3.10
seaborn   : 0.11.2
matplotlib: 3.2.2
google    : 2.0.3
IPython   : 5.5.0



#### **Importing the data**

In [43]:
# Reading the data (timestamps printed before and after to time the load)
from datetime import datetime
print(datetime.now())
df = pd.read_csv('/content/loan_train_data.csv', index_col=0)
print(datetime.now())

2022-04-14 02:00:59.856991
2022-04-14 02:00:59.873552
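
The two timestamps above differ by roughly 17 ms. When the goal is to measure elapsed time rather than record wall-clock time, a monotonic timer is usually a better fit. The sketch below wraps `time.perf_counter` in a small context manager; the name `timed` and the workload inside the `with` block are hypothetical stand-ins for the `pd.read_csv` call.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label):
    """Print how long the enclosed block took, using a monotonic clock."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print(f"{label}: {elapsed:.4f} s")


# Stand-in workload; in the notebook this would be the pd.read_csv(...) call
with timed("sum of squares"):
    total = sum(i * i for i in range(100_000))
```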


In [44]:
# Checking the size of the dataset: (rows, columns)
df.shape

(4000, 13)

In [45]:
# Viewing the first 5 rows of the dataset
df.head()

| ID | Age | Experience | Income | ZIP Code | Family | CCAvg | Education | Mortgage | Personal Loan | Securities Account | CD Account | Online | CreditCard |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3510 | 38 | 12 | 61 | 91330 | 3 | 0.9 | 3 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1129 | 30 | 5 | 171 | 94025 | 2 | 1.9 | 2 | 0 | 1 | 0 | 0 | 0 | 0 |
| 1637 | 65 | 39 | 100 | 92122 | 4 | 1.7 | 3 | 0 | 0 | 0 | 0 | 0 | 1 |
| 3165 | 28 | 4 | 82 | 95136 | 4 | 0.0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |
| 3563 | 32 | 8 | 169 | 94596 | 1 | 6.5 | 3 | 272 | 1 | 1 | 1 | 1 | 0 |


### Variable Descriptions

- **Age** $→$ Customer's age

- **Experience** $→$ Years of experience

- **Income** $→$ Customer's annual income

- **ZIP Code** $→$ Postal code

- **Family** $→$ Number of people in the customer's family (family size)

- **CCAvg** $→$ Monthly credit card spending

- **Education** $→$ Education level

  - 1 → Undergraduate;

  - 2 → Graduate;

  - 3 → Advanced professional training

- **Mortgage** $→$ Value of the customer's home mortgage, if any.

- **Personal Loan** $→$ Indicates whether the customer accepted the personal loan offered by the bank (_this is our target_)

  - $1$ $→$ **Accepted**

  - $0$ $→$ **Did not accept**

- **Securities Account** $→$ Indicates whether the customer has a securities account with the bank.

- **CD Account** $→$ Indicates whether the customer has a *certificate of deposit (CD)* account with the bank

- **Online** $→$ Indicates whether the customer uses internet banking services.

- **CreditCard** $→$ Indicates whether the customer has or uses a credit card.
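
The coded columns described above can be decoded with plain lookup tables. The sketch below is illustrative only: the dictionary and function names are hypothetical, and the two sample rows are copied from the `df.head()` output earlier in the notebook.

```python
# Code-to-label mappings taken from the variable descriptions above
EDUCATION_LABELS = {
    1: "Undergraduate",
    2: "Graduate",
    3: "Advanced professional training",
}
LOAN_LABELS = {0: "Did not accept", 1: "Accepted"}


def decode_row(row):
    """Return (education label, loan label) for a row given as a dict."""
    return EDUCATION_LABELS[row["Education"]], LOAN_LABELS[row["Personal Loan"]]


# Two rows from df.head(): IDs 3510 and 1129
sample_rows = {
    3510: {"Education": 3, "Personal Loan": 0},
    1129: {"Education": 2, "Personal Loan": 1},
}
for customer_id, row in sample_rows.items():
    print(customer_id, decode_row(row))
```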

### <font color='REDBLUE'>**Initializing the configuration**</font> $→$ **Setup**

In [46]:
# Importing the classification module; every call below goes through this namespace
from pycaret import classification

In [47]:
config_setup = classification.setup(data=df, target='Personal Loan')

| | Description | Value |
|---|---|---|
| 0 | session_id | 4983 |
| 1 | Target | Personal Loan |
| 2 | Target Type | Binary |
| 3 | Label Encoded | |
| 4 | Original Data | (4000, 13) |
| 5 | Missing Values | False |
| 6 | Numeric Features | 6 |
| 7 | Categorical Features | 6 |
| 8 | Ordinal Features | False |
| 9 | High Cardinality Features | False |


### <font color='BLUE'>**Comparing several models**</font>

In [48]:
# Comparing several models
classification.compare_models()

| | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | TT (Sec) |
|---|---|---|---|---|---|---|---|---|---|
| rf | Random Forest Classifier | 0.9839 | 0.9937 | 0.8442 | 0.9823 | 0.9073 | 0.8985 | 0.9021 | 0.559 |
| gbc | Gradient Boosting Classifier | 0.9829 | 0.9963 | 0.8631 | 0.9523 | 0.9041 | 0.8947 | 0.8969 | 0.308 |
| et | Extra Trees Classifier | 0.9825 | 0.9872 | 0.8254 | 0.9863 | 0.8978 | 0.8883 | 0.893 | 0.507 |
| lightgbm | Light Gradient Boosting Machine | 0.9807 | 0.9961 | 0.844 | 0.947 | 0.891 | 0.8805 | 0.8832 | 0.174 |
| dt | Decision Tree Classifier | 0.9771 | 0.9293 | 0.8704 | 0.8871 | 0.8772 | 0.8646 | 0.8655 | 0.017 |
| ada | Ada Boost Classifier | 0.9661 | 0.976 | 0.7497 | 0.8784 | 0.8053 | 0.7869 | 0.792 | 0.153 |
| lda | Linear Discriminant Analysis | 0.9446 | 0.9596 | 0.5892 | 0.7641 | 0.6637 | 0.6343 | 0.6414 | 0.019 |
| ridge | Ridge Classifier | 0.9382 | 0.0 | 0.3651 | 0.9375 | 0.519 | 0.4937 | 0.5597 | 0.014 |
| lr | Logistic Regression | 0.9142 | 0.9301 | 0.3496 | 0.5811 | 0.4335 | 0.3903 | 0.4066 | 0.29 |
| dummy | Dummy Classifier | 0.906 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.035 |


RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=None, max_features='auto',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=100,
                       n_jobs=-1, oob_score=False, random_state=4983, verbose=0,
                       warm_start=False)
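
The Dummy Classifier row in the comparison above is a useful reminder that accuracy alone can mislead on imbalanced data: it scores 0.906 accuracy with zero recall, precision, and F1, which is consistent with a baseline that always predicts class 0. The sketch below computes these metrics from confusion-matrix counts. The counts are hypothetical (1,000 customers, 94 of whom accepted the loan), chosen only to mirror the 0.906 figure; they are not taken from the notebook's data.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    Precision, recall and F1 fall back to 0.0 when their denominator is zero.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical baseline: every customer is predicted as class 0 ("did not accept")
dummy = classification_metrics(tp=0, fp=0, fn=94, tn=906)
print(dummy)
```

For the business problem stated at the top, recall on class 1 matters more than raw accuracy, since every missed positive is a potential borrower the bank never contacts.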

In [49]:
# Creating the model (predictive machine)
clf_rf = classification.create_model('rf')

| Fold | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|
| 0 | 0.9893 | 0.9983 | 0.8846 | 1.0 | 0.9388 | 0.9329 | 0.935 |
| 1 | 0.9893 | 0.9967 | 0.9231 | 0.96 | 0.9412 | 0.9353 | 0.9355 |
| 2 | 0.9821 | 0.9958 | 0.8462 | 0.9565 | 0.898 | 0.8882 | 0.8902 |
| 3 | 0.975 | 0.9985 | 0.7692 | 0.9524 | 0.8511 | 0.8376 | 0.8433 |
| 4 | 0.9893 | 0.9936 | 0.8846 | 1.0 | 0.9388 | 0.9329 | 0.935 |
| 5 | 0.9821 | 0.9699 | 0.8077 | 1.0 | 0.8936 | 0.884 | 0.89 |
| 6 | 0.9786 | 0.9907 | 0.7778 | 1.0 | 0.875 | 0.8635 | 0.8716 |
| 7 | 0.9857 | 0.9982 | 0.8519 | 1.0 | 0.92 | 0.9122 | 0.9157 |
| 8 | 0.9893 | 0.9975 | 0.8889 | 1.0 | 0.9412 | 0.9353 | 0.9373 |
| 9 | 0.9785 | 0.998 | 0.8077 | 0.9545 | 0.875 | 0.8633 | 0.8669 |
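
The per-fold scores above are what the Random Forest row in the model comparison summarizes. As a quick check, averaging the ten fold values for accuracy and recall reproduces the 0.9839 and 0.8442 reported there. The sketch below uses only the numbers printed in the fold table; the variable names are hypothetical.

```python
from statistics import mean, stdev

# Per-fold values copied from the create_model('rf') output above
fold_accuracy = [0.9893, 0.9893, 0.9821, 0.975, 0.9893,
                 0.9821, 0.9786, 0.9857, 0.9893, 0.9785]
fold_recall = [0.8846, 0.9231, 0.8462, 0.7692, 0.8846,
               0.8077, 0.7778, 0.8519, 0.8889, 0.8077]

mean_accuracy = round(mean(fold_accuracy), 4)  # 0.9839
mean_recall = round(mean(fold_recall), 4)      # 0.8442

print(f"Accuracy: mean={mean_accuracy}, std={stdev(fold_accuracy):.4f}")
print(f"Recall:   mean={mean_recall}, std={stdev(fold_recall):.4f}")
```

The spread across folds is worth reading alongside the mean: recall ranges from 0.7692 to 0.9231, which is much wider than the range for accuracy.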


#### **Overall Model Evaluation**

In [50]:
# Evaluating the predictive model
classification.evaluate_model(clf_rf)


## Making new predictions

In [51]:
# Test data
test_df = pd.read_csv('/content/loan_test_data.csv')

In [52]:
test_df.head()

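| | ID | Age | Experience | Income | ZIP Code | Family | CCAvg | Education | Mortgage | Personal Loan | Securities Account | CD Account | Online | CreditCard |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 118 | 58 | 33 | 61 | 92833 | 2 | 2.3 | 3 | 193 | 0 | 0 | 0 | 1 | 0 |
| 1 | 1817 | 45 | 19 | 91 | 92373 | 2 | 1.7 | 2 | 0 | 0 | 1 | 0 | 1 | 0 |
| 2 | 671 | 23 | -1 | 61 | 92374 | 4 | 2.6 | 1 | 239 | 0 | 0 | 0 | 1 | 0 |
| 3 | 2994 | 65 | 40 | 20 | 92647 | 3 | 0.1 | 3 | 0 | 0 | 0 | 0 | 0 | 1 |
| 4 | 3265 | 67 | 41 | 114 | 95616 | 4 | 2.4 | 3 | 0 | 0 | 0 | 0 | 1 | 0 |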


In [53]:
# Making new predictions
predictions = classification.predict_model(clf_rf, data=test_df)

| | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|---|
| 0 | Random Forest Classifier | 0.987 | 0.9977 | 0.8854 | 0.977 | 0.929 | 0.9218 | 0.9232 |


In [54]:
# Displaying the new predictions
predictions

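| | ID | Age | Experience | Income | ZIP Code | Family | CCAvg | Education | Mortgage | Personal Loan | Securities Account | CD Account | Online | CreditCard | Label | Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 118 | 58 | 33 | 61 | 92833 | 2 | 2.3 | 3 | 193 | 0 | 0 | 0 | 1 | 0 | 0 | 1.00 |
| 1 | 1817 | 45 | 19 | 91 | 92373 | 2 | 1.7 | 2 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1.00 |
| 2 | 671 | 23 | -1 | 61 | 92374 | 4 | 2.6 | 1 | 239 | 0 | 0 | 0 | 1 | 0 | 0 | 0.99 |
| 3 | 2994 | 65 | 40 | 20 | 92647 | 3 | 0.1 | 3 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1.00 |
| 4 | 3265 | 67 | 41 | 114 | 95616 | 4 | 2.4 | 3 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0.78 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 995 | 4331 | 62 | 37 | 44 | 90401 | 1 | 1.1 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.00 |
| 996 | 3221 | 61 | 35 | 28 | 93302 | 2 | 0.2 | 3 | 135 | 0 | 0 | 0 | 1 | 0 | 0 | 1.00 |
| 997 | 1932 | 28 | 2 | 140 | 92122 | 2 | 2.0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1.00 |
| 998 | 4960 | 51 | 27 | 55 | 93014 | 1 | 1.6 | 2 | 197 | 0 | 1 | 0 | 1 | 0 | 0 | 0.98 |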
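
Since the business goal is to find the customers most likely to take a loan, it can help to turn the `Label` and `Score` columns into a single probability for class 1 and rank by it. The sketch below assumes that `Score` is the probability of the predicted label, which is how PyCaret 2.x appears to report it for binary problems; treat that as an assumption to verify against the library version in use. The function name is hypothetical, and the sample tuples are `(ID, Label, Score)` values copied from the output above.

```python
def positive_probability(label, score):
    """Probability of class 1, assuming `score` belongs to the predicted label."""
    return score if label == 1 else 1.0 - score


# (ID, Label, Score) taken from the predictions shown above
rows = [
    (118, 0, 1.00),
    (1817, 0, 1.00),
    (671, 0, 0.99),
    (3265, 0, 0.78),
    (4960, 0, 0.98),
]

# Rank customers from most to least likely to accept the loan
ranked = sorted(rows, key=lambda r: positive_probability(r[1], r[2]), reverse=True)
for customer_id, label, score in ranked:
    print(customer_id, round(positive_probability(label, score), 2))
```

Under that assumption, customer 3265 is predicted as class 0 but still carries the highest loan probability in this sample, so a ranked list may surface leads that the hard label hides.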


In [55]:
# Saving the final predictive model
classification.save_model(clf_rf, 'modelo_pronto_dt')

Transformation Pipeline and Model Successfully Saved


(Pipeline(memory=None,
          steps=[('dtypes',
                  DataTypes_Auto_infer(categorical_features=[],
                                       display_types=True, features_todrop=[],
                                       id_columns=[],
                                       ml_usecase='classification',
                                       numerical_features=[],
                                       target='Personal Loan',
                                       time_features=[])),
                 ('imputer',
                  Simple_Imputer(categorical_strategy='not_available',
                                 fill_value_categorical=None,
                                 fill_value_numerical=None,
                                 numeric...
                  RandomForestClassifier(bootstrap=True, ccp_alpha=0.0,
                                         class_weight=None, criterion='gini',
                                         max_depth=None, max_features='auto',
      

In [56]:
# Loading the final predictive model for use. This is where the application, once built, will do its work...
dt_model = classification.load_model(model_name='modelo_pronto_dt')

Transformation Pipeline and Model Successfully Loaded
