The dataset `bank_additional_full_processed.csv`, loaded below, is an already preprocessed table: its columns are numeric or one-hot encoded (for example `job_blue-collar` and `poutcome_success`). The last column, `y_yes`, is the target variable for the classification task:

In [2]:
import pandas as pd

url = 'https://raw.githubusercontent.com/alura-cursos/combina-classificadores/main/dados/bank_additional_full_processed.csv'
df = pd.read_csv(url, sep=',')


df.head()


Unnamed: 0,age,duration,campaign,previous,emp.var.rate,cons.price.idx,cons.conf.idx,euribor3m,nr.employed,job_blue-collar,...,pdays_20,pdays_21,pdays_22,pdays_25,pdays_26,pdays_27,pdays_not_contacted,poutcome_nonexistent,poutcome_success,y_yes
0,56,261,1,0,1.1,93.994,-36.4,4.857,5191.0,0,...,0,0,0,0,0,0,1,1,0,0
1,57,149,1,0,1.1,93.994,-36.4,4.857,5191.0,0,...,0,0,0,0,0,0,1,1,0,0
2,37,226,1,0,1.1,93.994,-36.4,4.857,5191.0,0,...,0,0,0,0,0,0,1,1,0,0
3,40,151,1,0,1.1,93.994,-36.4,4.857,5191.0,0,...,0,0,0,0,0,0,1,1,0,0
4,56,307,1,0,1.1,93.994,-36.4,4.857,5191.0,0,...,0,0,0,0,0,0,1,1,0,0
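The `Unnamed: 0` column in the output above is what pandas typically produces when a CSV was saved together with its index and then read back without `index_col`. A minimal sketch on a tiny made-up CSV shows the difference. The inline data and the names `csv_texto`, `df_padrao` and `df_indice` are illustrative assumptions, not part of the original notebook:

```python
import io

import pandas as pd

# Tiny made-up CSV whose first header cell is empty, as happens when
# a DataFrame is saved with its index (illustrative data only).
csv_texto = ",age,y_yes\n0,56,0\n1,57,0\n"

# Default read: the unnamed first column becomes a regular column.
df_padrao = pd.read_csv(io.StringIO(csv_texto))

# Reading with index_col=0 treats that column as the row index instead.
df_indice = pd.read_csv(io.StringIO(csv_texto), index_col=0)

print(list(df_padrao.columns))
print(list(df_indice.columns))
```

Whether to apply this to the real file is a judgment call: the results recorded in this notebook were produced with the column left in the data.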


Installing and importing the required libraries

In [3]:
!pip install mlxtend



In [4]:
!pip install catboost

Collecting catboost
  Downloading catboost-1.2.2-cp310-cp310-manylinux2014_x86_64.whl (98.7 MB)
Installing collected packages: catboost
Successfully installed catboost-1.2.2


In [11]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from mlxtend.classifier import StackingCVClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from catboost import CatBoostClassifier

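Hyperparameters for the `ExtraTreesClassifier`. These are single fixed values (the kind of result a grid search returns), not a grid of candidates to search over.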

In [12]:
parametros = {'bootstrap': False,
 'max_depth': None,
 'min_samples_leaf': 1,
 'min_samples_split': 6,
 'n_estimators': 200}
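The notebook does not show how these values were obtained. As a hedged sketch of how a dictionary like `parametros` could come out of a grid search, the example below runs scikit-learn's `GridSearchCV` on synthetic data. The candidate grid `grade`, the synthetic data and the names `busca` and `melhores_parametros` are assumptions made for illustration only:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption: not the bank dataset).
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=42)

# Hypothetical candidate grid: each key maps to a LIST of values to try.
grade = {
    'bootstrap': [False],
    'max_depth': [None, 5],
    'min_samples_leaf': [1, 2],
    'min_samples_split': [2, 6],
    'n_estimators': [50],
}

# Evaluate every combination with 3-fold cross-validation.
busca = GridSearchCV(ExtraTreesClassifier(random_state=42), grade, cv=3, scoring='accuracy')
busca.fit(X_demo, y_demo)

# best_params_ is a dict with one value per key, the same shape as `parametros`.
melhores_parametros = busca.best_params_
print(melhores_parametros)
```

The resulting dictionary can be unpacked into the estimator with `ExtraTreesClassifier(**melhores_parametros)`, which is how `parametros` is used in this notebook.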

In [13]:
# Split the features from the target
# Note: X still contains the 'Unnamed: 0' index column read from the CSV
X = df.drop('y_yes', axis=1)
y = df['y_yes']

# Train/test split
X_treino, X_teste, y_treino, y_teste = train_test_split(X, y, test_size=0.2, random_state=42)

# Base classifiers
catboost = CatBoostClassifier(verbose=0)  # silence the training output
extratrees = ExtraTreesClassifier(**parametros, random_state=42)
gaussiannb = GaussianNB()

# Standardize the features: fit on the training data only, then apply to both splits
scaler = StandardScaler()
scaler.fit(X_treino)
X_treino = scaler.transform(X_treino)
X_teste = scaler.transform(X_teste)

# Meta-classifier
logistic = LogisticRegression()

# Stack the base classifiers with StackingCVClassifier
stack = StackingCVClassifier(classifiers=[catboost, extratrees, gaussiannb],
                             meta_classifier=logistic,
                             cv=5,
                             random_state=42,
                             verbose=1,
                             n_jobs=-1)

# Fit the stacked model on the training data
stack.fit(X_treino, y_treino)



Fitting 3 classifiers...
Fitting classifier1: catboostclassifier (1/3)


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done   5 out of   5 | elapsed:  1.1min finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.


Fitting classifier2: extratreesclassifier (2/3)


[Parallel(n_jobs=-1)]: Done   5 out of   5 | elapsed:   36.0s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.


Fitting classifier3: gaussiannb (3/3)


[Parallel(n_jobs=-1)]: Done   5 out of   5 | elapsed:    0.3s finished
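The log above shows each base classifier being fitted over 5 folds. The idea behind cross-validated stacking is that the meta-classifier is trained on out-of-fold predictions, so it never sees a prediction made by a model that was trained on that same row. The sketch below reproduces that idea with plain scikit-learn on synthetic data. It is a simplified illustration, not mlxtend's implementation: it uses only two base models, predicted class labels as meta-features, and hypothetical names such as `meta_treino` and `meta_teste`:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data (assumption: not the bank dataset).
X_demo, y_demo = make_classification(n_samples=400, n_features=10, random_state=42)
X_treino, X_teste, y_treino, y_teste = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42)

bases = [ExtraTreesClassifier(n_estimators=50, random_state=42), GaussianNB()]

# Level 1: out-of-fold predictions. Each training row is predicted by a
# model fitted on the other 4 folds, giving one meta-feature column per base model.
meta_treino = np.column_stack(
    [cross_val_predict(clf, X_treino, y_treino, cv=5) for clf in bases])

# Level 2: the meta-classifier learns how to combine the base predictions.
meta = LogisticRegression().fit(meta_treino, y_treino)

# For inference, refit each base model on the full training set and
# feed its test-set predictions to the meta-classifier.
meta_teste = np.column_stack(
    [clf.fit(X_treino, y_treino).predict(X_teste) for clf in bases])
acuracia = meta.score(meta_teste, y_teste)
print(meta_treino.shape, acuracia)
```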


In [14]:
# Predict on the test set with the fitted stacking classifier
y_pred = stack.predict(X_teste)

In [15]:
# Mean accuracy on the test set
stack.score(X_teste, y_teste)

0.9090798737557659
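Accuracy alone can be hard to interpret when one class dominates, and the notebook does not show the class balance of `y_yes`. As a hedged sketch of how the score could be put in context, the example below compares a model against a majority-class baseline and prints a confusion matrix and per-class report. It runs on synthetic, deliberately imbalanced data; the data, the stand-in `LogisticRegression` model and the names `linha_de_base` and `matriz` are assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic data with roughly 90% of rows in class 0 (assumption).
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=10, weights=[0.9, 0.1], random_state=42)
X_treino, X_teste, y_treino, y_teste = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo)

# Stand-in model and a baseline that always predicts the majority class.
modelo = LogisticRegression(max_iter=1000).fit(X_treino, y_treino)
linha_de_base = DummyClassifier(strategy='most_frequent').fit(X_treino, y_treino)

acuracia_modelo = modelo.score(X_teste, y_teste)
acuracia_base = linha_de_base.score(X_teste, y_teste)
print(f"model: {acuracia_modelo:.3f}  baseline: {acuracia_base:.3f}")

# Rows are true classes, columns are predicted classes.
y_pred = modelo.predict(X_teste)
matriz = confusion_matrix(y_teste, y_pred)
print(matriz)
print(classification_report(y_teste, y_pred, zero_division=0))
```

With the real data, the same calls could be applied to `y_teste` and the `y_pred` computed by `stack.predict(X_teste)`.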