# Boosting

A metodologia foi inicialmente criada para resolver uma classificação, a idéia principal por é encontrar hipóteses fracas, aprender repetidamente e combinar essas hipóteses fracas dentro de uma única hipótese.
  
É um método de ensemble? Sim.  
**Métodos de ensemble** que tem como objetivo **combinar as predições de diversos estimadores mais simples** para gerar uma **predição final mais robusta**

- **Métodos de boosting**: têm como procedimento geral a construção de estimadores de forma sequencial, de modo que estimadores posteriores tentam reduzir o **viés** do estimador conjunto, que leva em consideração estimadores anteriores. Ex.: **adaboost**.

## Métodos de Ensemble


Há uma classe de algoritmos de Machine Learning, os chamados **métodos de ensemble** que tem como objetivo **combinar as predições de diversos estimadores mais simples** para gerar uma **predição final mais robusta**

Os métodos de ensemble costuman ser divididos em duas classes:

- **Métodos de média**: têm como procedimento geral construir diversos estimadores independentes, e tomar a média de suas predições como a predição final. O principal objetivo do método é reduzir **variância**, de modo que o modelo final seja melhor que todos os modelos individuais. Ex.: **random forest.**
<br>

- **Métodos de boosting**: têm como procedimento geral a construção de estimadores de forma sequencial, de modo que estimadores posteriores tentam reduzir o **viés** do estimador conjunto, que leva em consideração estimadores anteriores. Ex.: **adaboost**.

Há, ainda, uma terceira classe de método de ensemble, o chamado [stacking ensemble](https://machinelearningmastery.com/stacking-ensemble-machine-learning-with-python/), que consiste em "empilhar" modelos de modo a produzir a mistura. Não veremos esta modalidade em detalhes, mas deixo como sugestão para estudos posteriores! :)

Para mais detalhes sobre métodos de ensemble no contexto do sklearn, [clique aqui!](https://scikit-learn.org/stable/modules/ensemble.html)

Na aula de hoje, vamos conhecer em detalhes os procedimentos de bagging e boosting, ilustrados pelos métodos AdaBoost e GradientBoost, respectivamente. Vamos lá!

______

### Bagging vs Boosting

Pra lembrar as principais diferenças entre os dois métodos de ensemble que estudamos:

<img src=https://pluralsight2.imgix.net/guides/81232a78-2e99-4ccc-ba8e-8cd873625fdf_2.jpg width=600>

____
____
____

_________
_______
_________

## Boosting & AdaBoost

O AdaBoost significa **Adaptive Boosting**, e tem como procedimento geral **a criação sucessiva dos chamados weak learners**, que são modelos bem fracos de aprendizagem - geralmente, **árvores de um único nó (stumps)**.

<img src="https://miro.medium.com/max/1744/1*nJ5VrsiS1yaOR77d4h8gyw.png" width=300>

O AdaBoost utiliza os **erros da árvore anterior para melhorar a próxima árvore**. As predições finais são feitas com base **nos pesos de cada stump**, cuja determinação faz parte do algoritmo!

<img src="https://static.packt-cdn.com/products/9781788295758/graphics/image_04_046-1.png" width=700>

Vamos entender um pouco melhor...

Aqui, o bootstrapping não é utilizado: o método começa treinando um classificador fraco **no dataset original**, e depois treina diversas cópias adicionais do classificador **no mesmo dataset**, mas dando **um peso maior às observações que foram classificadas erroneamente** (ou, no caso de regressões, a observações **com o maior erro**).

Assim, após diversas iterações, classificadores/regressores vão sequencialmente "focando nos casos mais difíceis", e construindo um classificador encadeado que seja forte, apesar de utilizar diversos classificadores fracos em como elementos fundamentais.

<img src="https://www.researchgate.net/profile/Zhuo_Wang8/publication/288699540/figure/fig9/AS:668373486686246@1536364065786/Illustration-of-AdaBoost-algorithm-for-creating-a-strong-classifier-based-on-multiple.png" width=500>


De forma resumida, as principais ideias por trás deste algoritmo são:

- O algoritmo cria e combina um conjunto de **modelos fracos** (em geral, stumps);
- Cada stump é criado **levando em consideração os erros do stump anterior**;
- Alguns dos stumps têm **maior peso de decisão** do que outros na predição final;

As classes no sklearn são:

- [AdaBoostClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html)

- [AdaBoostRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostRegressor.html#sklearn.ensemble.AdaBoostRegressor)

Note que não há muitos hiperparâmetros. O mais importante, que deve ser tunado com o grid/random search, é:

- `n_estimators` : o número de weak learners encadeados;

Além disso, pode também ser interessante tunar os hiperparâmetros dos weak learners. Isso é possível de ser feito, como veremos a seguir!


Uma animação para entendermos melhor...  
- O projeto https://periodicos.uff.br/anaisdoser/article/download/29032/16865/100072
- O resultado https://mateusmaia.shinyapps.io/adaboosting/

Primeiro, vamos começar com nosso baseline:

In [15]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.filterwarnings('ignore')

from sklearn.model_selection import train_test_split

from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer

from sklearn.ensemble import AdaBoostClassifier

from sklearn.metrics import classification_report

In [6]:
df = pd.read_csv('./datasets/german_credit_data.csv', index_col=0)
df.head()

Unnamed: 0,Age,Sex,Job,Housing,Saving accounts,Checking account,Credit amount,Duration,Purpose,Risk
0,67,male,2,own,,little,1169,6,radio/TV,good
1,22,female,2,own,little,moderate,5951,48,radio/TV,bad
2,49,male,1,own,little,,2096,12,education,good
3,45,male,2,free,little,little,7882,42,furniture/equipment,good
4,53,male,2,free,little,little,4870,24,car,bad


In [7]:
df.dtypes

Age                  int64
Sex                 object
Job                  int64
Housing             object
Saving accounts     object
Checking account    object
Credit amount        int64
Duration             int64
Purpose             object
Risk                object
dtype: object

In [13]:
y_colum = 'Risk'

X = df.drop(columns=[y_colum])
y = df[y_colum]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

In [10]:
def pipe_preprocessor(path_dataset, y_colum):
    df = pd.read_csv(path_dataset, index_col=0)

    X = df.drop(columns=[y_colum])
    y = df[y_colum]
    
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
    
    pipe_features_num = Pipeline([
        ('input_num', SimpleImputer(strategy='mean')),
        ('std', StandardScaler())
    ])
    
    features_num = X_train.select_dtypes(include=np.number).columns.tolist()
    
    pipe_features_cat = Pipeline([
        ('input_cat', SimpleImputer(strategy='constant', fill_value='unknown')),
        ('ondehot', OneHotEncoder())
    ])
    
    features_cat = X_train.select_dtypes(exclude=np.number).columns.tolist()
    
    pre_processor = ColumnTransformer([
        ('transf_num', pipe_features_num, features_num),
        ('transf_cat', pipe_features_cat, features_cat)
    ])
    
    return pre_processor

In [11]:
pre_processor = pipe_preprocessor('./datasets/german_credit_data.csv', 'Risk')

In [12]:
pipe_ab = Pipeline([
    ('pre_processor', pre_processor),
    ('ab', AdaBoostClassifier(random_state=42))
])

In [14]:
pipe_ab.fit(X_train, y_train)

In [17]:
def metricas_classificacao(estimador, X, y):
    y_pred = estimador.predict(X)
    print(classification_report(y, y_pred))

In [19]:
metricas_classificacao(pipe_ab, X_train, y_train)

              precision    recall  f1-score   support

         bad       0.68      0.51      0.58       240
        good       0.81      0.89      0.85       560

    accuracy                           0.78       800
   macro avg       0.74      0.70      0.72       800
weighted avg       0.77      0.78      0.77       800



In [20]:
metricas_classificacao(pipe_ab, X_test, y_test)

              precision    recall  f1-score   support

         bad       0.62      0.48      0.54        60
        good       0.80      0.87      0.83       140

    accuracy                           0.76       200
   macro avg       0.71      0.68      0.69       200
weighted avg       0.74      0.76      0.75       200



In [29]:
print(len(pipe_ab['ab'].estimators_))
print(pipe_ab['ab'].estimator_weights_)
pipe_ab['ab'].estimators_

50
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1.]


[DecisionTreeClassifier(max_depth=1, random_state=1608637542),
 DecisionTreeClassifier(max_depth=1, random_state=1273642419),
 DecisionTreeClassifier(max_depth=1, random_state=1935803228),
 DecisionTreeClassifier(max_depth=1, random_state=787846414),
 DecisionTreeClassifier(max_depth=1, random_state=996406378),
 DecisionTreeClassifier(max_depth=1, random_state=1201263687),
 DecisionTreeClassifier(max_depth=1, random_state=423734972),
 DecisionTreeClassifier(max_depth=1, random_state=415968276),
 DecisionTreeClassifier(max_depth=1, random_state=670094950),
 DecisionTreeClassifier(max_depth=1, random_state=1914837113),
 DecisionTreeClassifier(max_depth=1, random_state=669991378),
 DecisionTreeClassifier(max_depth=1, random_state=429389014),
 DecisionTreeClassifier(max_depth=1, random_state=249467210),
 DecisionTreeClassifier(max_depth=1, random_state=1972458954),
 DecisionTreeClassifier(max_depth=1, random_state=1572714583),
 DecisionTreeClassifier(max_depth=1, random_state=1433267572),


Vamos deixar o base_estimator explícito

In [30]:
from sklearn.tree import DecisionTreeClassifier

In [31]:
pre_processor = pipe_preprocessor('./datasets/german_credit_data.csv', 'Risk')

In [32]:
basal = DecisionTreeClassifier(max_depth=1)

In [33]:
pipe_ab = Pipeline([
    ('pre_processor', pre_processor),
    ('ab', AdaBoostClassifier(base_estimator=basal, random_state=42))
])

In [34]:
pipe_ab.fit(X_train, y_train)

In [36]:
metricas_classificacao(pipe_ab, X_train, y_train)

              precision    recall  f1-score   support

         bad       0.68      0.51      0.58       240
        good       0.81      0.89      0.85       560

    accuracy                           0.78       800
   macro avg       0.74      0.70      0.72       800
weighted avg       0.77      0.78      0.77       800



In [37]:
metricas_classificacao(pipe_ab, X_test, y_test)

              precision    recall  f1-score   support

         bad       0.62      0.48      0.54        60
        good       0.80      0.87      0.83       140

    accuracy                           0.76       200
   macro avg       0.71      0.68      0.69       200
weighted avg       0.74      0.76      0.75       200



Podemos, também, mudar o estimador basal. Por exemplo, uma regressão logística fortemente regularizada.

In [38]:
from sklearn.linear_model import LogisticRegression

In [39]:
pre_processor = pipe_preprocessor('./datasets/german_credit_data.csv', 'Risk')

In [40]:
basal = LogisticRegression(C=0.1, random_state=42)

In [41]:
pipe_ab = Pipeline([
    ('pre_processor', pre_processor),
    ('ab', AdaBoostClassifier(base_estimator=basal, random_state=42))
])

In [42]:
pipe_ab.fit(X_train, y_train)

In [43]:
metricas_classificacao(pipe_ab, X_train, y_train)

              precision    recall  f1-score   support

         bad       0.70      0.13      0.22       240
        good       0.72      0.97      0.83       560

    accuracy                           0.72       800
   macro avg       0.71      0.55      0.53       800
weighted avg       0.72      0.72      0.65       800



In [44]:
metricas_classificacao(pipe_ab, X_test, y_test)

              precision    recall  f1-score   support

         bad       0.75      0.15      0.25        60
        good       0.73      0.98      0.84       140

    accuracy                           0.73       200
   macro avg       0.74      0.56      0.54       200
weighted avg       0.74      0.73      0.66       200



In [45]:
print(len(pipe_ab['ab'].estimators_))
print(pipe_ab['ab'].estimator_weights_)
pipe_ab['ab'].estimators_

50
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1.]


[LogisticRegression(C=0.1, random_state=1608637542),
 LogisticRegression(C=0.1, random_state=1273642419),
 LogisticRegression(C=0.1, random_state=1935803228),
 LogisticRegression(C=0.1, random_state=787846414),
 LogisticRegression(C=0.1, random_state=996406378),
 LogisticRegression(C=0.1, random_state=1201263687),
 LogisticRegression(C=0.1, random_state=423734972),
 LogisticRegression(C=0.1, random_state=415968276),
 LogisticRegression(C=0.1, random_state=670094950),
 LogisticRegression(C=0.1, random_state=1914837113),
 LogisticRegression(C=0.1, random_state=669991378),
 LogisticRegression(C=0.1, random_state=429389014),
 LogisticRegression(C=0.1, random_state=249467210),
 LogisticRegression(C=0.1, random_state=1972458954),
 LogisticRegression(C=0.1, random_state=1572714583),
 LogisticRegression(C=0.1, random_state=1433267572),
 LogisticRegression(C=0.1, random_state=434285667),
 LogisticRegression(C=0.1, random_state=613608295),
 LogisticRegression(C=0.1, random_state=893664919),
 Log

Não ficou muito legal. Por isso que, apesar de ser possível usar outros estimadores basais, é comum usarmos stumps mesmo (árvores com uma única quebra).

Vamos agora fazer o gridsearch!

In [46]:
from sklearn.model_selection import GridSearchCV, StratifiedGroupKFold

In [47]:
pre_processor = pipe_preprocessor('./datasets/german_credit_data.csv', 'Risk')

In [48]:
basal = LogisticRegression(l1_ratio=0.5, random_state=42)

In [49]:
pipe_ab = Pipeline([
    ('pre_processor', pre_processor),
    ('ab', AdaBoostClassifier(base_estimator=basal, random_state=42))
])

In [53]:
params_grid_ab = {
    'ab__base_estimator__C': [0.1, 0.01],
    'ab__base_estimator__penalty': ['l2', 'elasticnet'],
    'ab__n_estimators': [50, 100, 150]
}

splitter = StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=42)

grid_ab = GridSearchCV(
    estimator=pipe_ab,
    param_grid=params_grid_ab,
    scoring='f1_weighted',
    cv=splitter,
    verbose=10,
    n_jobs=-1
)

grid_ab.fit(X_train, y_train)

Fitting 5 folds for each of 12 candidates, totalling 60 fits


Traceback (most recent call last):
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/model_selection/_validation.py", line 767, in _score
    scores = scorer(estimator, X_test, y_test)
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/metrics/_scorer.py", line 219, in __call__
    return self._score(
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/metrics/_scorer.py", line 261, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/metrics/_scorer.py", line 71, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/pipeline.py", line 457, in predict
    Xt = transform.transform(Xt)
  File "/home/thiago/workspace/pytho



Traceback (most recent call last):
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/model_selection/_validation.py", line 767, in _score
    scores = scorer(estimator, X_test, y_test)
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/metrics/_scorer.py", line 219, in __call__
    return self._score(
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/metrics/_scorer.py", line 261, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/metrics/_scorer.py", line 71, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/pipeline.py", line 457, in predict
    Xt = transform.transform(Xt)
  File "/home/thiago/workspace/pytho

Traceback (most recent call last):
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/model_selection/_validation.py", line 767, in _score
    scores = scorer(estimator, X_test, y_test)
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/metrics/_scorer.py", line 219, in __call__
    return self._score(
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/metrics/_scorer.py", line 261, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/metrics/_scorer.py", line 71, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/pipeline.py", line 457, in predict
    Xt = transform.transform(Xt)
  File "/home/thiago/workspace/pytho





Traceback (most recent call last):
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/model_selection/_validation.py", line 767, in _score
    scores = scorer(estimator, X_test, y_test)
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/metrics/_scorer.py", line 219, in __call__
    return self._score(
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/metrics/_scorer.py", line 261, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/metrics/_scorer.py", line 71, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/pipeline.py", line 457, in predict
    Xt = transform.transform(Xt)
  File "/home/thiago/workspace/pytho

Traceback (most recent call last):
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/model_selection/_validation.py", line 767, in _score
    scores = scorer(estimator, X_test, y_test)
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/metrics/_scorer.py", line 219, in __call__
    return self._score(
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/metrics/_scorer.py", line 261, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/metrics/_scorer.py", line 71, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/pipeline.py", line 457, in predict
    Xt = transform.transform(Xt)
  File "/home/thiago/workspace/pytho

Traceback (most recent call last):
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/model_selection/_validation.py", line 767, in _score
    scores = scorer(estimator, X_test, y_test)
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/metrics/_scorer.py", line 219, in __call__
    return self._score(
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/metrics/_scorer.py", line 261, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/metrics/_scorer.py", line 71, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/pipeline.py", line 457, in predict
    Xt = transform.transform(Xt)
  File "/home/thiago/workspace/pytho

Traceback (most recent call last):
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/model_selection/_validation.py", line 767, in _score
    scores = scorer(estimator, X_test, y_test)
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/metrics/_scorer.py", line 219, in __call__
    return self._score(
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/metrics/_scorer.py", line 261, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/metrics/_scorer.py", line 71, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/pipeline.py", line 457, in predict
    Xt = transform.transform(Xt)
  File "/home/thiago/workspace/pytho

Traceback (most recent call last):
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/model_selection/_validation.py", line 767, in _score
    scores = scorer(estimator, X_test, y_test)
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/metrics/_scorer.py", line 219, in __call__
    return self._score(
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/metrics/_scorer.py", line 261, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/metrics/_scorer.py", line 71, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/pipeline.py", line 457, in predict
    Xt = transform.transform(Xt)
  File "/home/thiago/workspace/pytho

Traceback (most recent call last):
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/model_selection/_validation.py", line 767, in _score
    scores = scorer(estimator, X_test, y_test)
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/metrics/_scorer.py", line 219, in __call__
    return self._score(
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/metrics/_scorer.py", line 261, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/metrics/_scorer.py", line 71, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/pipeline.py", line 457, in predict
    Xt = transform.transform(Xt)
  File "/home/thiago/workspace/pytho

Traceback (most recent call last):
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/model_selection/_validation.py", line 767, in _score
    scores = scorer(estimator, X_test, y_test)
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/metrics/_scorer.py", line 219, in __call__
    return self._score(
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/metrics/_scorer.py", line 261, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/metrics/_scorer.py", line 71, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/pipeline.py", line 457, in predict
    Xt = transform.transform(Xt)
  File "/home/thiago/workspace/pytho



Traceback (most recent call last):
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/model_selection/_validation.py", line 767, in _score
    scores = scorer(estimator, X_test, y_test)
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/metrics/_scorer.py", line 219, in __call__
    return self._score(
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/metrics/_scorer.py", line 261, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/metrics/_scorer.py", line 71, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "/home/thiago/workspace/python/lab/letscode/letscode-env/lib/python3.8/site-packages/sklearn/pipeline.py", line 457, in predict
    Xt = transform.transform(Xt)
  File "/home/thiago/workspace/pytho







In [54]:
grid_ab.best_params_

{'ab__base_estimator__C': 0.1,
 'ab__base_estimator__penalty': 'l2',
 'ab__n_estimators': 50}

In [55]:
metricas_classificacao(grid_ab, X_train, y_train)

              precision    recall  f1-score   support

         bad       0.70      0.13      0.22       240
        good       0.72      0.97      0.83       560

    accuracy                           0.72       800
   macro avg       0.71      0.55      0.53       800
weighted avg       0.72      0.72      0.65       800



_________
_______
_________

### Exercício
Utilizando o dataset de cancer: crie um modelo para predizer o tipo.  
Desta vez utilizando o AdaBoost

In [2]:
from sklearn.datasets import load_breast_cancer

dados = load_breast_cancer(as_frame=True)
print(dados['DESCR'])

.. _breast_cancer_dataset:

Breast cancer wisconsin (diagnostic) dataset
--------------------------------------------

**Data Set Characteristics:**

    :Number of Instances: 569

    :Number of Attributes: 30 numeric, predictive attributes and the class

    :Attribute Information:
        - radius (mean of distances from center to points on the perimeter)
        - texture (standard deviation of gray-scale values)
        - perimeter
        - area
        - smoothness (local variation in radius lengths)
        - compactness (perimeter^2 / area - 1.0)
        - concavity (severity of concave portions of the contour)
        - concave points (number of concave portions of the contour)
        - symmetry
        - fractal dimension ("coastline approximation" - 1)

        The mean, standard error, and "worst" or largest (mean of the three
        worst/largest values) of these features were computed for each image,
        resulting in 30 features.  For instance, field 0 is Mean Radi

In [3]:
df = dados['frame']
df.head()

Unnamed: 0,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,...,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension,target
0,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,...,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,0
1,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,0.05667,...,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,0
2,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,0.05999,...,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758,0
3,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,0.09744,...,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173,0
4,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,0.05883,...,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,0


##  Gradient boosting

Além dos métodos que estudamos, há ainda outras classes de métodos de ensemble!

Em particular, a classe de modelos que se utilizam do procedimento de **gradient boosting**.

O gradient boosting também é baseado no princípio de boosting (utilização de weak learners sequencialmente adicionados de modo a **sequencialmente minimizar os erros cometidos**).

<img src=https://miro.medium.com/max/788/1*pEu2LNmxf9ttXHIALPcEBw.png width=600>

Mas este método implementa o boosting através de um **gradiente** explícito.

A ideia é que caminhemos na direção do **erro mínimo** de maneira iterativa **passo a passo**.

Este caminho se dá justamente pelo **gradiente** da **função de custo/perda**, que mede justamente os erros cometidos.

<img src=https://upload.wikimedia.org/wikipedia/commons/a/a3/Gradient_descent.gif width=400>

Este método é conhecido como:

### Gradiente descendente

Deixei em ênfase porque este será um método de **enorme importância** no estudo de redes neurais (e é, em geral, um método de otimização muito utilizado).

O objetivo geral do método é bem simples: determinar quais são os **parâmetros** da hipótese que minimizam a função de custo/perda. Para isso, o método "percorre" a função de erro, indo em direção ao seu mínimo (e este "caminho" feito na função se dá justamente pela **determinação iterativa dos parâmetros**, isto é, **a cada passo, chegamos mais perto dos parâmetros finais da hipótese**, conforme eles são ajustados aos dados.

> **Pequeno interlúdio matemático:** o gradiente descendente implementado pelo gradient boosting é, na verdade, um **gradiente descendente funcional**, isto é, desejamos encontrar não um conjunto de parâmetros que minimiza o erro, mas sim **introduzir sequencialmente weak learners (hipótese simples) que minimizam o erro**. Desta forma, o gradient boosting minimiza a função de custo ao ecolher iterativamente hipóteses simples que apontam na direção do mínimo, neste espaço funcional.

Apesar do interlúdio acima, não precisamos nos preocupar muito com os detalhes matemáticos: o que importa é entender que no caso do gradient boosting, há alguns pontos importantes:

- Uma **função de custo/perda (loss)** é explicitamente minimizada por um procedimento de gradiente;

- O gradiente está relacionado com o procedimento de **encadeamento progressivo entre weak learners**, seguindo a ideia do boosting.

Pra quem quiser saber um pouco mais de detalhes (e se aventurar na matemática), sugiro [este post](https://www.gormanalysis.com/blog/gradient-boosting-explained/) ou então [este site](https://explained.ai/gradient-boosting/), que contém vários materiais ótimos para entender o método com todos os detalhes matemáticos.

Os [vídeos do StatQuest](https://www.youtube.com/playlist?list=PLblh5JKOoLUJjeXUvUE0maghNuY2_5fY6) também são uma boa referência!

As classes do sklearn são:

- [GradientBoostingClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html)

- [GradientBoostingRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html#sklearn.ensemble.GradientBoostingRegressor)

E os principais hiperparâmetros a serem ajustados são:

- `n_estimators` : novamente, o número de weak learners encadeados.

- `learning_rate` : a constante que multiplica o gradiente no gradiente descendente. Essencialmente, controla o "tamanho do passo" a ser dado em direção ao mínimo.

Segundo o próprio [User Guide](https://scikit-learn.org/stable/modules/ensemble.html#gradient-boosting): "*Empirical evidence suggests that small values of `learning_rate` favor better test error. The lireature recommends to set the learning rate to a small constant (e.g. `learning_rate <= 0.1`) and choose `n_estimators` by early stopping.*"

Ainda sobre a learning rate, as ilustrações a seguir ajudam a entender sua importância:

<img src=https://www.jeremyjordan.me/content/images/2018/02/Screen-Shot-2018-02-24-at-11.47.09-AM.png width=700>

<img src=https://cdn-images-1.medium.com/max/1440/0*A351v9EkS6Ps2zIg.gif width=500>

Vamos treinar nosso classificador baseline de gradient boosting:

Pra casa: grid search para otimizar os hiperparâmetros!

### Exercício
Utilizando o dataset de cancer: crie um modelo para predizer o tipo.  
Desta vez utilizando o GradienteBoosting