# Ensemble Methods

**Ensemble Methods**<br/>
Which algorithm will perform best on a given problem? <br/>
A common and often effective way to improve a model's accuracy is to combine several different models into a single, more accurate final model. <br/>
Ensemble methods are a category of Machine Learning algorithms and can be used in both supervised and unsupervised learning.
Building an ensemble consists of two steps:
1. Build several models;
2. Combine their estimates.
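
The two steps above can be sketched by hand before turning to scikit-learn's ready-made ensembles. This is only an illustrative sketch: the choice of the three models, the 70/30 split, and the helper name `majority_vote` are assumptions made for this example and are not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Step 1: build several models (arbitrary choice for illustration)
models = [KNeighborsClassifier(), DecisionTreeClassifier(random_state=0), GaussianNB()]
for model in models:
    model.fit(X_train, y_train)

# Step 2: combine their estimates with a majority vote
# all_preds has one row per model and one column per test sample
all_preds = np.stack([model.predict(X_test) for model in models])

def majority_vote(votes):
    # Hypothetical helper: most frequent label; ties go to the lowest label
    return np.bincount(votes, minlength=10).argmax()

ensemble_pred = np.apply_along_axis(majority_vote, 0, all_preds)

for model, preds in zip(models, all_preds):
    print(type(model).__name__, (preds == y_test).mean())
print("Majority vote", (ensemble_pred == y_test).mean())
```

Whether the vote beats the best individual model depends on how different the models' errors are, so the printed accuracies are worth comparing rather than assuming.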

<img src="assets/ensemble01.png"/><br/>
<img src="assets/ensemble02.png"/><br/>
<img src="assets/ensemble03.png"/><br/>



## Bagging

http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingClassifier.html

In [1]:
# Import
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import BaggingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_digits
from sklearn.preprocessing import scale

In [2]:
# Load the data
digits = load_digits()

In [3]:
# Preprocessing: standardize each feature
data = scale(digits.data)

In [4]:
# Predictor variables and target variable
X = data
y = digits.target

In [5]:
# Build the classifier: each KNN sees a random half of the samples and half of the features
bagging = BaggingClassifier(KNeighborsClassifier(), max_samples = 0.5, max_features = 0.5)

In [6]:
bagging

BaggingClassifier(base_estimator=KNeighborsClassifier(), max_features=0.5,
                  max_samples=0.5)

In [7]:
?cross_val_score

Signature:
cross_val_score(
    estimator,
    X,
    y=None,
    *,
    groups=None,
    scoring=None,
    cv=None,
    n_jobs=None,
    verbose=0,
    fit_params=None,
    pre_dispatch='2*n_jobs',
    error_score=nan,
)
Docstring:
Evaluate a score by cross-validation

Read more in the :ref:`User Guide <cross_validation>`.

Parameters
----------
esti

In [8]:
# Model scores (default cross-validation, one score per fold)
scores = cross_val_score(bagging, X, y)

In [9]:
# Mean score
mean = scores.mean()

In [10]:
print(scores)

[0.91944444 0.92777778 0.94428969 0.96935933 0.94150418]


In [11]:
print(mean)

0.9404750851129681
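
To judge whether bagging actually helps here, it can be useful to cross-validate the base estimator on its own under the same conditions. The sketch below assumes the same data preparation as above; `random_state=0` and the explicit `cv=5` are additions for reproducibility and are not in the original cells.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import scale

digits = load_digits()
X, y = scale(digits.data), digits.target

# Single KNN versus a bagged ensemble of KNNs
base = KNeighborsClassifier()
bagged = BaggingClassifier(
    KNeighborsClassifier(),
    max_samples=0.5,   # each model trains on 50% of the samples
    max_features=0.5,  # and on 50% of the features
    random_state=0,    # assumption: fixed seed so runs are repeatable
)

base_scores = cross_val_score(base, X, y, cv=5)
bagged_scores = cross_val_score(bagged, X, y, cv=5)

print("KNN alone :", base_scores.mean())
print("Bagged KNN:", bagged_scores.mean())
```

Bagging tends to pay off most with high-variance base models, so a stable learner such as KNN may show little or no gain.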


## Extremely Randomized Trees (ExtraTrees)

http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesClassifier.html

http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html

In [12]:
# Import
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_digits
from sklearn.preprocessing import scale

In [13]:
# Load the data
digits = load_digits()

In [14]:
# Preprocessing: standardize each feature
data = scale(digits.data)

In [15]:
# Predictor variables and target variable
X = data
y = digits.target

In [16]:
# Create the classifier: a single decision tree as the baseline
clf = DecisionTreeClassifier(max_depth = None, min_samples_split = 2, random_state = 0)
scores = cross_val_score(clf, X, y)
mean = scores.mean()
print(scores)
print(mean)

[0.78055556 0.71388889 0.80779944 0.8356546  0.79665738]
0.7869111730114515


In [17]:
# Random forest with 10 trees
clf = RandomForestClassifier(n_estimators = 10, max_depth = None, min_samples_split = 2, random_state = 0)
scores = cross_val_score(clf, X, y)
mean = scores.mean()
print(scores)
print(mean)

[0.89166667 0.88055556 0.91643454 0.93036212 0.90807799]
0.9054193748065614


In [18]:
# Extremely randomized trees with 10 trees
clf = ExtraTreesClassifier(n_estimators = 10, max_depth = None, min_samples_split = 2, random_state = 0)
scores = cross_val_score(clf, X, y)
mean = scores.mean()
print(scores)
print(mean)

[0.90277778 0.86944444 0.93593315 0.95264624 0.91364903]
0.9148901268956979
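
One difference between the two forest variants compared above is visible directly in their default parameters: `RandomForestClassifier` trains each tree on a bootstrap sample, while `ExtraTreesClassifier` by default uses the whole training set and randomizes the split thresholds instead. The sketch below inspects that default and reruns the comparison in a loop; the `models` and `results` dictionaries are names introduced only for this example.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import scale

digits = load_digits()
X, y = scale(digits.data), digits.target

models = {
    "RandomForest": RandomForestClassifier(n_estimators=10, random_state=0),
    "ExtraTrees": ExtraTreesClassifier(n_estimators=10, random_state=0),
}

results = {}
for name, model in models.items():
    # Read the default bootstrap setting from the estimator itself
    uses_bootstrap = model.get_params()["bootstrap"]
    mean_score = cross_val_score(model, X, y, cv=5).mean()
    results[name] = (uses_bootstrap, mean_score)
    print(name, "bootstrap =", uses_bootstrap, "mean accuracy =", mean_score)
```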


## AdaBoost

http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html

In [19]:
# Imports
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import load_digits
from sklearn.preprocessing import scale
from sklearn.model_selection import cross_val_score

In [20]:
# Load the data
digits = load_digits()
# Preprocessing: standardize each feature
data = scale(digits.data)
# Predictor variables and target variable
X = data
y = digits.target

In [21]:
# Training and test datasets (first 189 samples for training, the rest for testing)
X_test, y_test = X[189:], y[189:]
X_train, y_train = X[:189], y[:189]

In [22]:
# Build the base estimator (a depth-1 decision tree) and measure its test error
estim_base = DecisionTreeClassifier(max_depth = 1, min_samples_leaf = 1)
estim_base.fit(X_train, y_train)
estim_base_err = 1.0 - estim_base.score(X_test, y_test)

In [23]:
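# The base estimator is passed positionally: the keyword is `base_estimator` in older
# scikit-learn releases and `estimator` from version 1.2 on
ada_clf = AdaBoostClassifier(estim_base, learning_rate = 1.0, n_estimators = 400, algorithm = "SAMME")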

In [24]:
ada_clf.fit(X_train, y_train)

AdaBoostClassifier(algorithm='SAMME',
                   base_estimator=DecisionTreeClassifier(max_depth=1),
                   n_estimators=400)

In [25]:
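# cross_val_score clones ada_clf and refits it on each fold of the held-out data,
# so the fit on X_train above is not reused here
scores = cross_val_score(ada_clf, X_test, y_test)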
print(scores)
means = scores.mean()
print(means)

[0.82919255 0.75776398 0.8757764  0.8317757  0.82242991]
0.8233877053462587
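
To see how boosting improves on a single decision stump round by round, the fitted model's `staged_score` method can be used to evaluate the ensemble after each boosting iteration. This is a sketch under assumptions: it reuses the 189-sample split from above, uses 50 rounds instead of 400 to keep it quick, and leaves `algorithm` at the library default, so the numbers will not match the cross-validated scores shown above.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import AdaBoostClassifier
from sklearn.preprocessing import scale
from sklearn.tree import DecisionTreeClassifier

digits = load_digits()
X, y = scale(digits.data), digits.target
X_train, y_train = X[:189], y[:189]
X_test, y_test = X[189:], y[189:]

# Baseline: a single depth-1 tree (decision stump)
stump = DecisionTreeClassifier(max_depth=1)
stump_acc = stump.fit(X_train, y_train).score(X_test, y_test)

# Boosted stumps; the base estimator is passed positionally
ada = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=50)
ada.fit(X_train, y_train)

# Test accuracy after each boosting round (boosting may stop early)
staged_acc = list(ada.staged_score(X_test, y_test))

print("Single stump accuracy:", stump_acc)
print("Rounds actually used :", len(staged_acc))
print("Accuracy after last round:", staged_acc[-1])
```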
