<a href="https://colab.research.google.com/github/aplneto/IF1014/blob/main/08_Ensemble_Learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# <center> Mission 8 </center>

# Team:
Antônio Paulino - apln2@cin.ufpe.br
Ailton Rodrigues - ajr@cin.ufpe.br
Douglas Tavares - dtrps@cin.ufpe.br

Classifier ensembles (committees) are a machine learning paradigm in which several models (often called "weak learners") are trained to solve the same problem and then combined to obtain better results. The main hypothesis is that, when weak models are combined correctly, we can obtain more accurate and/or more robust models.

There are three main ensemble techniques: bagging, boosting and stacking.

All three are so-called "meta-algorithms": approaches that combine several machine learning techniques into one predictive model in order to decrease the variance (bagging), decrease the bias (boosting) or improve the predictive force (stacking).

Each algorithm consists of two steps:

•	Producing a distribution of simple ML models on subsets of the original data.

•	Combining the distribution into one "aggregated" model.

Here is a short description of all three methods:

Bagging (short for Bootstrap Aggregating) is a way to decrease the variance of a prediction by generating additional training data from the original dataset, using sampling with replacement to produce multisets of the same cardinality/size as the original data. Increasing the size of the training set this way cannot improve the model's predictive force; it only decreases the variance, tuning the prediction more narrowly to the expected outcome.
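The resampling step can be sketched as follows. This is a minimal illustration using only the standard library; the toy dataset, the seed and the helper name `bootstrap_sample` are assumptions made for this example and are not part of the notebook.

```python
import random

def bootstrap_sample(rows, rng):
    """Draw len(rows) items from rows uniformly at random, with replacement."""
    return [rng.choice(rows) for _ in rows]

rng = random.Random(0)      # fixed seed, chosen arbitrarily for this sketch
rows = list(range(1000))    # toy dataset: 1000 distinct instances
sample = bootstrap_sample(rows, rng)

# The sample has the same size as the original data, but some instances
# repeat and others are left out (about 1/e, roughly 37%, on average).
unique_fraction = len(set(sample)) / len(rows)
print(len(sample), round(unique_fraction, 2))
```

In bagging, one model would be trained on each such sample and their predictions averaged or voted on.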

Boosting is a two-step approach: first, subsets of the original data are used to produce a series of averagely performing models; then their performance is "boosted" by combining them using a particular cost function (e.g. a majority vote). Unlike bagging, in classical boosting the subset creation is not random and depends on the performance of the previous models: every new subset contains the elements that were (likely to be) misclassified by the previous models.
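One well-known instance of this idea is AdaBoost, which reweights instances instead of building explicit subsets. The sketch below shows a single reweighting round for binary labels. The function name `reweight` and the toy labels are assumptions for illustration, and the weights are assumed to sum to 1 on entry.

```python
import math

def reweight(weights, y_true, y_pred):
    """One AdaBoost-style round: return (alpha, new normalized weights)."""
    # Weighted error of the current weak model
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    err = min(max(err, 1e-10), 1 - 1e-10)    # guard against log(0)
    alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this model
    # Misclassified instances are scaled up, correct ones scaled down
    updated = [
        w * math.exp(alpha if t != p else -alpha)
        for w, t, p in zip(weights, y_true, y_pred)
    ]
    total = sum(updated)
    return alpha, [w / total for w in updated]

y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1]    # the weak model misses the second instance
weights = [1 / 5] * 5       # start from uniform weights
alpha, new_weights = reweight(weights, y_true, y_pred)
print(alpha, new_weights)
```

After this round the misclassified instance carries more weight than any correctly classified one, so the next weak model is pushed to focus on it.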

Stacking is similar to boosting: you also apply several models to the original data. The difference is that the weight function is not just an empirical formula. Instead, you introduce a meta-level and use another model/approach that takes the input together with the outputs of every model to estimate the weights or, in other words, to determine which models perform well and which perform badly given these input data.
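A minimal sketch of the meta-level idea, using scikit-learn's `StackingClassifier` on synthetic data. The dataset, the choice of base models and the logistic regression meta-model are assumptions for illustration; they are not the configuration tuned later in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (an assumption for this sketch)
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Level 0: heterogeneous base models; level 1: a meta-model that learns how
# to combine their cross-validated predictions.
stack = StackingClassifier(
    estimators=[
        ('tree', DecisionTreeClassifier(max_depth=3, random_state=0)),
        ('knn', KNeighborsClassifier(n_neighbors=5)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_tr, y_tr)
accuracy = stack.score(X_te, y_te)
print(accuracy)
```

Using different base models is a design choice here: the meta-model has more to learn from when the base models make different kinds of errors.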


## Installing the required packages

In [None]:
!python3 -m pip install optuna


# Exploratory Analysis and Data Preparation

In [None]:
DATA_FOLDER = (
    'https://archive.ics.uci.edu/ml/machine-learning-databases/'
    'credit-screening/'
)

DATA_DESCRIPTION = DATA_FOLDER + 'crx.names'
DATA_SET = DATA_FOLDER + 'crx.data'

In [None]:
import pandas
import numpy

aliases = [
  'Gender', 'Age', 'Debt', 'Married', 'BankCustomer', 'EducationLevel',
  'Ethnicity', 'YearsEmployed', 'PriorDefault', 'Employed', 'CreditScore',
  'DriversLicense', 'Citizen', 'ZipCode', 'Income', 'Approved'
]
data = pandas.read_csv(DATA_SET, names=aliases, na_values='?', header=None)
data.head()

Unnamed: 0,Gender,Age,Debt,Married,BankCustomer,EducationLevel,Ethnicity,YearsEmployed,PriorDefault,Employed,CreditScore,DriversLicense,Citizen,ZipCode,Income,Approved
0,b,30.83,0.0,u,g,w,v,1.25,t,t,1,f,g,202.0,0,+
1,a,58.67,4.46,u,g,q,h,3.04,t,t,6,f,g,43.0,560,+
2,a,24.5,0.5,u,g,q,h,1.5,t,f,0,f,g,280.0,824,+
3,b,27.83,1.54,u,g,w,v,3.75,t,t,5,t,g,100.0,3,+
4,b,20.17,5.625,u,g,w,v,1.71,t,f,0,f,s,120.0,0,+


#### Removal of the variables Ethnicity (A7) and ZipCode (A14), considered not to influence the target variable

In [None]:
# removing useless variables A7 (Ethnicity) and A14 (ZipCode)

data.drop(['Ethnicity', 'ZipCode'], axis=1, inplace=True)
data.head()

Unnamed: 0,Gender,Age,Debt,Married,BankCustomer,EducationLevel,YearsEmployed,PriorDefault,Employed,CreditScore,DriversLicense,Citizen,Income,Approved
0,b,30.83,0.0,u,g,w,1.25,t,t,1,f,g,0,+
1,a,58.67,4.46,u,g,q,3.04,t,t,6,f,g,560,+
2,a,24.5,0.5,u,g,q,1.5,t,f,0,f,g,824,+
3,b,27.83,1.54,u,g,w,3.75,t,t,5,t,g,3,+
4,b,20.17,5.625,u,g,w,1.71,t,f,0,f,s,0,+


#### Splitting the variables into continuous and categorical

In [None]:
continuous = data.describe().columns
categorical = data.drop(list(continuous) + ['Approved'], axis=1).columns

print(continuous)
print(categorical)

Index(['Age', 'Debt', 'YearsEmployed', 'CreditScore', 'Income'], dtype='object')
Index(['Gender', 'Married', 'BankCustomer', 'EducationLevel', 'PriorDefault',
       'Employed', 'DriversLicense', 'Citizen'],
      dtype='object')


# Data cleaning

## Linear regression model to fill in missing continuous data

Missing values in continuous variables are filled with predictions from a linear regression model fitted between the column with missing values and the complete column most strongly correlated with it.

In [None]:
continuous_columns_missing_values = []

for column in continuous:
  if data[column].isnull().sum() > 0:
    continuous_columns_missing_values.append(column)

print(continuous_columns_missing_values)

['Age']


In [None]:
most_correlated_columns = {}
candidates = [
  x for x in continuous if x not in continuous_columns_missing_values
]
for column in continuous_columns_missing_values:
  most_correlated_columns[column] = max(
      candidates, key=lambda x: abs(data[x].corr(data[column]))
  )

In [None]:
from sklearn.linear_model import LinearRegression

In [None]:
prediction_models = {}

# For each column with missing values, fit a linear regression on its most
# correlated complete column and fill the gaps with the predictions.
for target, predictor in most_correlated_columns.items():
  rows = data[[target, predictor]].dropna()
  lr = LinearRegression()
  lr.fit(rows[[predictor]].values, rows[target].values)
  predicted = lr.predict(data[[predictor]].values)
  # Keep observed values; use the prediction only where the target is missing
  data[target] = numpy.where(data[target].isna(), predicted, data[target])
  prediction_models[target] = lr

data[continuous].isna().sum()

Age              0
Debt             0
YearsEmployed    0
CreditScore      0
Income           0
dtype: int64

# One-hot encoding

Categorical values were encoded using one-hot encoding.

Missing values were filled in using a decision tree algorithm.

In [None]:
categorical_columns_missing_values = [
  p[0] for p in dict(data[categorical].isna().sum() > 0).items() if p[1]
]
complete_data = data.dropna()
print(categorical_columns_missing_values)

['Gender', 'Married', 'BankCustomer', 'EducationLevel']


In [None]:
from sklearn.preprocessing import LabelEncoder
from collections import defaultdict

label_dict = defaultdict(LabelEncoder)
complete_data = complete_data.apply(
    lambda x: label_dict[x.name].fit_transform(x)
    if x.name in list(categorical) + ['Approved']
    else x
)

In [None]:
from sklearn.tree import DecisionTreeClassifier

In [None]:
trees = {}
X = complete_data.drop(categorical_columns_missing_values, axis=1)
for column in categorical_columns_missing_values:
  Y = complete_data[column]
  tree = DecisionTreeClassifier(
      max_leaf_nodes=Y.nunique(), random_state=2**Y.nunique()
  )
  trees[column] = tree
  tree.fit(X.values, Y.values)

In [None]:
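# Encode the predictor columns once, reusing the encoders fitted on the
# complete rows so the integer codes match those seen during training.
# LabelEncoder.transform raises a ValueError for categories it has not seen.
features = data.drop(categorical_columns_missing_values, axis=1).apply(
    lambda x: label_dict[x.name].transform(x)
    if x.name in list(categorical) + ['Approved']
    else x
)

for column, tree in trees.items():
  encoder = label_dict[column]
  predicted = encoder.inverse_transform(tree.predict(features.values))
  # Keep observed values; use the prediction only where the value is missing
  data[column] = numpy.where(data[column].isna(), predicted, data[column])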

In [None]:
from sklearn.preprocessing import OneHotEncoder

In [None]:
onehotencoders = defaultdict(OneHotEncoder)

new_categories = []

for column in categorical:
  encoder = onehotencoders[column]
  values = data[column].values.reshape(-1, 1)
  arr = encoder.fit_transform(values).toarray()
  data.drop(column, axis=1, inplace=True)
  # Add one 0/1 column per category, named '<column>_<category>'
  for j, category in enumerate(encoder.categories_[0]):
    name = column + '_' + category
    data[name] = arr[:, j]
    new_categories.append(name)

data.head()
categorical = new_categories

In [None]:
# encoder = OneHotEncoder()
# encoder.fit(data['Gender'].values.reshape(-1, 1))
# arr = encoder.transform(data['Gender'].values.reshape(-1, 1)).toarray()
# d = dict(zip(*encoder.categories_, zip(*arr)))
# t = dict(zip(*['data_'+ c for c in encoder.categories_], zip(*arr)))

In [None]:
labels = data['Approved']
data.drop('Approved', axis=1, inplace=True)
X = data.apply(
    lambda x: label_dict[x.name].fit_transform(x)
    if x.name in categorical
    else x
)
print(X)

       Age    Debt  YearsEmployed  ...  Citizen_g  Citizen_p  Citizen_s
0    30.83   0.000           1.25  ...          1          0          0
1    58.67   4.460           3.04  ...          1          0          0
2    24.50   0.500           1.50  ...          1          0          0
3    27.83   1.540           3.75  ...          1          0          0
4    20.17   5.625           1.71  ...          0          0          1
..     ...     ...            ...  ...        ...        ...        ...
685  21.08  10.085           1.25  ...          1          0          0
686  22.67   0.750           2.00  ...          1          0          0
687  25.25  13.500           2.00  ...          1          0          0
688  17.92   0.205           0.04  ...          1          0          0
689  35.00   3.375           8.29  ...          1          0          0

[690 rows x 36 columns]


In [None]:
Y = pandas.DataFrame(
    LabelEncoder().fit_transform(labels), columns=[labels.name]
)
print(Y)

     Approved
0           0
1           0
2           0
3           0
4           0
..        ...
685         1
686         1
687         1
688         1
689         1

[690 rows x 1 columns]


# Splitting the instances into training and test sets

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X_train, X_test, Y_train, Y_test = train_test_split(
    X.values, Y.values, test_size = 0.2, random_state = 4
)

In [None]:
X_train = pandas.DataFrame(X_train, columns=X.columns)
X_test = pandas.DataFrame(X_test, columns=X.columns)
Y_train = pandas.DataFrame(Y_train, columns=Y.columns)
Y_test = pandas.DataFrame(Y_test, columns=Y.columns)

In [None]:
print(X_train.shape)
print(Y_train.shape)
print(X_test.shape)
print(Y_test.shape)

(552, 36)
(552, 1)
(138, 36)
(138, 1)


# Data Transformation

* Since the continuous variables take values between 0 and some upper limit, they are normalized to the range 0.0 to 1.0 before the dimensionality reduction analysis.
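The normalization uses the formula x' = (x - min) / (max - min). A common convention, assumed in the sketch below, is to learn the minimum and maximum on the training split and reuse them on the test split, so both splits share one scale. The helper names and the toy values are illustrative only.

```python
def fit_min_max(values):
    """Return the (min, max) pair learned from the training values."""
    return min(values), max(values)

def apply_min_max(values, lo, hi):
    """Scale values with x' = (x - lo) / (hi - lo)."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]   # constant column: map everything to 0
    return [(v - lo) / span for v in values]

train_age = [13.75, 22.5, 37.52, 80.25]   # toy training values
test_age = [15.17, 29.585, 76.75]         # toy test values

lo, hi = fit_min_max(train_age)           # statistics come from train only
scaled_train = apply_min_max(train_age, lo, hi)
scaled_test = apply_min_max(test_age, lo, hi)
print(scaled_train, scaled_test)
```

With this convention, a test value outside the training range would fall outside the interval from 0 to 1, which is expected behavior rather than an error.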

In [None]:
X_train[continuous].describe()

Unnamed: 0,Age,Debt,YearsEmployed,CreditScore,Income
count,552.0,552.0,552.0,552.0,552.0
mean,31.375927,4.723342,2.222554,2.574275,1016.20471
std,11.79137,4.95801,3.34217,5.163208,5328.577631
min,13.75,0.0,0.0,0.0,0.0
25%,22.5,0.875,0.165,0.0,0.0
50%,28.448036,2.75,1.0,0.0,5.5
75%,37.52,7.3125,2.55125,3.0,462.25
max,80.25,28.0,28.5,67.0,100000.0


In [None]:
X_test[continuous].describe()

Unnamed: 0,Age,Debt,YearsEmployed,CreditScore,Income
count,138.0,138.0,138.0,138.0,138.0
mean,32.437283,4.900254,2.226812,1.702899,1022.108696
std,12.17297,5.073769,3.376051,3.331796,4724.580194
min,15.17,0.0,0.0,0.0,0.0
25%,23.08,1.25,0.125,0.0,0.0
50%,29.585,2.73,0.75,0.0,1.0
75%,39.1075,7.0,2.75,2.0,200.0
max,76.75,25.21,17.5,20.0,50000.0


In [None]:
from sklearn.preprocessing import MinMaxScaler

In [None]:
train_scalers = defaultdict(MinMaxScaler)

for column in continuous:
  train_scaler = train_scalers[column]
  # Fit on the training split only and reuse the same scaler on the test
  # split, so both splits share one scale
  X_train[column] = train_scaler.fit_transform(X_train[column].values.reshape(-1, 1))
  X_test[column] = train_scaler.transform(X_test[column].values.reshape(-1, 1))

# Dimensionality reduction

<!--

* https://www.analyticsvidhya.com/blog/2018/08/dimensionality-reduction-techniques-python/

 -->

## Principal component analysis

<!--

* https://www.datasklr.com/principal-component-analysis-and-factor-analysis/principal-component-analysis
* https://www.youtube.com/watch?v=FgakZw6K1QQ
* https://jmausolf.github.io/code/pca_in_python/

-->

In [None]:
from sklearn.decomposition import PCA

In [None]:
pca_train = PCA()
pca_train.fit(X_train[continuous])

PCA()

In [None]:
numpy.cumsum(pca_train.explained_variance_ratio_)

array([0.49681538, 0.7953898 , 0.91164728, 0.96771056, 1.        ])

In [None]:
pca_X_train = pandas.DataFrame(
    data = pca_train.transform(X_train[continuous]),
    columns = ['PC%d' % (i) for i in numpy.arange(pca_train.n_components_)+1]
)

max_column = numpy.argmax(numpy.cumsum(pca_train.explained_variance_ratio_) > 0.9) + 1
principal_components = pca_X_train.columns[:max_column:]

pca_X_train = pandas.concat(
    [pca_X_train[principal_components], X_train[categorical]],
    axis = 1
)

pca_X_train

Unnamed: 0,PC1,PC2,PC3,Gender_a,Gender_b,Married_l,Married_u,Married_y,BankCustomer_g,BankCustomer_gg,BankCustomer_p,EducationLevel_aa,EducationLevel_c,EducationLevel_cc,EducationLevel_d,EducationLevel_e,EducationLevel_ff,EducationLevel_i,EducationLevel_j,EducationLevel_k,EducationLevel_m,EducationLevel_q,EducationLevel_r,EducationLevel_w,EducationLevel_x,PriorDefault_f,PriorDefault_t,Employed_f,Employed_t,DriversLicense_f,DriversLicense_t,Citizen_g,Citizen_p,Citizen_s
0,-0.080758,-0.184270,-0.065782,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0
1,-0.216770,-0.018099,0.009423,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0
2,-0.189481,-0.012675,0.048359,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0
3,0.087857,-0.047823,-0.124984,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0
4,-0.130168,0.191575,-0.026154,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
547,-0.143830,0.060022,0.087901,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0
548,0.041393,0.023916,0.093961,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0
549,-0.167171,-0.040376,-0.002445,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0
550,0.028834,0.238163,-0.045819,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0


In [None]:
pca_X_test = pandas.DataFrame(
    data = pca_train.transform(X_test[continuous]),
    columns = ['PC%d' % (i) for i in numpy.arange(pca_train.n_components_)+1]
)

principal_components = pca_X_test.columns[:max_column:]

pca_X_test = pandas.concat(
    [pca_X_test[principal_components], X_test[categorical]],
    axis = 1
)

pca_X_test

Unnamed: 0,PC1,PC2,PC3,Gender_a,Gender_b,Married_l,Married_u,Married_y,BankCustomer_g,BankCustomer_gg,BankCustomer_p,EducationLevel_aa,EducationLevel_c,EducationLevel_cc,EducationLevel_d,EducationLevel_e,EducationLevel_ff,EducationLevel_i,EducationLevel_j,EducationLevel_k,EducationLevel_m,EducationLevel_q,EducationLevel_r,EducationLevel_w,EducationLevel_x,PriorDefault_f,PriorDefault_t,Employed_f,Employed_t,DriversLicense_f,DriversLicense_t,Citizen_g,Citizen_p,Citizen_s
0,0.013034,0.154900,0.158099,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0
1,0.616798,-0.115678,-0.053012,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0
2,-0.141278,0.023545,-0.017648,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0
3,0.207157,-0.161917,0.414889,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0
4,-0.293477,0.043979,0.033666,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
133,-0.131200,0.060932,0.003077,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0
134,-0.245238,0.012432,0.017578,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0
135,0.305105,-0.170936,-0.040540,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0
136,0.237959,0.276361,-0.153109,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0


In [None]:
import optuna
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import BaggingClassifier, StackingClassifier, \
AdaBoostClassifier

In [None]:
def bagging_factory(model, n_estimators, max_samples, max_features):
  return BaggingClassifier(
      model, n_estimators=n_estimators, max_samples=max_samples,
      max_features=max_features
)

def stacking_factory(model, n_estimators):
  estimators = []
  for i in range(n_estimators):
    estimators.append(('dt_%i' % i, model))
  return StackingClassifier(estimators=estimators)

def boosting_factory(
    model, n_estimators, learning_rate, algorithm
):
  return AdaBoostClassifier(
      model, n_estimators=n_estimators, learning_rate=learning_rate,
      algorithm=algorithm
  )

def ensemble_optimization_study(
    trial: optuna.trial.FixedTrial, ensemble_type, estimator
):
  trial.set_user_attr('Type', ensemble_type)
  n_estimators = trial.suggest_int('n_estimators', 3, 255, step=2)
  if ensemble_type == 'Stacking':
    model = stacking_factory(estimator, n_estimators)
  elif ensemble_type == 'Bagging':
    max_samples = trial.suggest_float('max_samples', 0.5, 1.0, step=0.1)
    max_features = trial.suggest_float('max_features', 0.5, 1.0, step=0.1)
    model = bagging_factory(estimator, n_estimators, max_samples, max_features)
  elif ensemble_type == 'AdaBoosting':
    learning_rate = trial.suggest_float('learning_rate', 0.1, 1.0, step=0.1)
    algorithm = trial.suggest_categorical('algorithm', ['SAMME'])
    model = boosting_factory(
        estimator, n_estimators, learning_rate, algorithm
    )
  model.fit(pca_X_train.values, Y_train.values.ravel())
  return model.score(pca_X_test.values, Y_test.values.ravel())

## Decision Tree

In [None]:
def decision_tree_model_factory(trial: optuna.trial.FixedTrial):
  criterion = trial.suggest_categorical('criterion', ['gini', 'entropy'])
  return DecisionTreeClassifier(criterion=criterion)

def decision_tree_bagging_model_optimization(trial: optuna.trial.FixedTrial):
  estimator = decision_tree_model_factory(trial)
  return ensemble_optimization_study(trial, 'Bagging', estimator)

def decision_tree_stacking_model_optimization(trial: optuna.trial.FixedTrial):
  estimator = decision_tree_model_factory(trial)
  return ensemble_optimization_study(trial, 'Stacking', estimator)

def decision_tree_boosting_model_optimization(trial: optuna.trial.FixedTrial):
  estimator = decision_tree_model_factory(trial)
  return ensemble_optimization_study(trial, 'AdaBoosting', estimator)

In [None]:
decision_tree_ensemble_optimization_study = optuna.study.create_study(
    study_name='Decision Tree Ensemble Optimization Study', direction='maximize'
)

decision_tree_ensemble_optimization_study.optimize(
    decision_tree_bagging_model_optimization, n_trials=100
)

decision_tree_ensemble_optimization_study.optimize(
    decision_tree_stacking_model_optimization, n_trials=25
)

decision_tree_ensemble_optimization_study.optimize(
    decision_tree_boosting_model_optimization, n_trials=50
)

[I 2021-11-29 23:57:30,775] A new study created in memory with name: Decision Tree Ensemble Optimization Study

The distribution is specified by [3, 256] and step=2, but the range is not divisible by `step`. It will be replaced by [3, 255].

[I 2021-11-29 23:57:31,103] Trial 0 finished with value: 0.7971014492753623 and parameters: {'criterion': 'entropy', 'n_estimators': 75, 'max_features': 1.0}. Best is trial 0 with value: 0.7971014492753623.

[I 2021-11-29 23:57:31,339] Trial 1 finished with value: 0.8115942028985508 and parameters: {'criterion': 'gini', 'n_estimators': 85, 'max_features': 0.7}. Best is trial 1 with value: 0.8115942028985508.

[I 2021-11-29 23:57:31,531] Trial 2 f

KeyboardInterrupt: ignored

## SVC

In [None]:
def svc_model_factory(trial: optuna.trial.FixedTrial):
  C = trial.suggest_categorical('c_value', [0.01, 0.1, 1, 10, 100])
  kernel = trial.suggest_categorical(
      'kernel', ['linear', 'poly', 'rbf', 'sigmoid']
  )
  degree = trial.suggest_int('degree', 1, 3)
  gamma = trial.suggest_categorical(
      'gamma', ['scale', 'auto']
  )
  coef0 = trial.suggest_float('coef0', 0.1, 1.0, step=0.1)
  return SVC(C=C, kernel=kernel, degree=degree, gamma=gamma, coef0=coef0)

def svc_bagging_model_optimization(trial: optuna.trial.FixedTrial):
  estimator = svc_model_factory(trial)
  return ensemble_optimization_study(trial, 'Bagging', estimator)

def svc_stacking_model_optimization(trial: optuna.trial.FixedTrial):
  estimator = svc_model_factory(trial)
  return ensemble_optimization_study(trial, 'Stacking', estimator)

def svc_boosting_model_optimization(trial: optuna.trial.FixedTrial):
  estimator = svc_model_factory(trial)
  return ensemble_optimization_study(trial, 'AdaBoosting', estimator)

In [None]:
svc_ensemble_optimization_study = optuna.study.create_study(
    study_name='SVC Ensemble Optimization Study', direction='maximize'
)

svc_ensemble_optimization_study.optimize(
    svc_bagging_model_optimization, n_trials=150
)

svc_ensemble_optimization_study.optimize(
    svc_stacking_model_optimization, n_trials=100
)

svc_ensemble_optimization_study.optimize(
    svc_boosting_model_optimization, n_trials=100
)

## MLP

In [None]:
def mlp_generator(hidden_layers, activation_fn, _solver, l2_alpha, **args):
  return MLPClassifier(
      hidden_layer_sizes=hidden_layers,
      activation = activation_fn,
      solver = _solver,
      alpha = l2_alpha,
      **args
  )

def mlp_model_factory(trial: optuna.trial.FixedTrial):
  insize = pca_X_train.values.shape[1] + 1
  outsize = 2
  number_of_layers = trial.suggest_int('number_of_layers', 1, 2)
  hl1 = trial.suggest_int('hidden_layer_1', outsize, insize)
  hidden_layers = (
      (hl1, trial.suggest_int('hidden_layer_2', outsize, insize))
      if number_of_layers > 1
      else (hl1,)
  )
  activation_fn = trial.suggest_categorical(
      'activation_fn', ['identity', 'logistic', 'tanh', 'relu']
  )
  _solver = trial.suggest_categorical(
      '_solver', ['lbfgs', 'sgd', 'adam']
  )
  l2_alpha = trial.suggest_float('l2_alpha', 0.1, 10, log=True)
  early_stopping = (
      trial.suggest_categorical('early_stopping', [False, True])
      if _solver != 'lbfgs'
      else False
  )
  epochs = (
      trial.suggest_int('epochs', 100, 500, step=50)
      if _solver != 'lbfgs'
      else trial.suggest_int('epochs', 5000, 10000, step=1000)
  )
  # With early stopping, scikit-learn's default max_iter applies; otherwise
  # training is capped at `epochs` iterations.
  return (
      mlp_generator(hidden_layers, activation_fn, _solver, l2_alpha,
                    early_stopping=True)
      if early_stopping
      else mlp_generator(
          hidden_layers, activation_fn, _solver, l2_alpha,
          early_stopping=False, max_iter=epochs
      )
  )

def mlp_bagging_model_optimization(trial: optuna.trial.FixedTrial):
  estimator = mlp_model_factory(trial)
  return ensemble_optimization_study(trial, 'Bagging', estimator)

def mlp_stacking_model_optimization(trial: optuna.trial.FixedTrial):
  estimator = mlp_model_factory(trial)
  return ensemble_optimization_study(trial, 'Stacking', estimator)

def mlp_boosting_model_optimization(trial: optuna.trial.FixedTrial):
  estimator = mlp_model_factory(trial)
  return ensemble_optimization_study(trial, 'AdaBoosting', estimator)

In [None]:
mlp_ensemble_optimization_study = optuna.study.create_study(
    study_name='MLP Ensemble Optimization Study', direction='maximize'
)

mlp_ensemble_optimization_study.optimize(
    mlp_bagging_model_optimization, n_trials=50
)

mlp_ensemble_optimization_study.optimize(
    mlp_stacking_model_optimization, n_trials=50
)

mlp_ensemble_optimization_study.optimize(
    mlp_boosting_model_optimization, n_trials=50
)

In [None]:
from sklearn.ensemble import VotingClassifier

In [None]:
tree = DecisionTreeClassifier(criterion='entropy')
_svm = SVC(C=100, kernel='poly', degree=3, gamma='scale', coef0=0.3)
_mlp = MLPClassifier(
      hidden_layer_sizes=(23, 7),
      activation = 'relu',
      solver = 'lbfgs',
      alpha = 0.12,
      max_iter = 6000
  )

models = [('tree', tree), ('svc', _svm), ('mlp', _mlp)]
classifier = VotingClassifier(models)
classifier.fit(pca_X_train.values, Y_train.values.ravel())
classifier.score(pca_X_test.values, Y_test.values.ravel())

# <center> Final Considerations </center>

In this mission, Decision Tree, SVM and MLP classifiers were used for ensemble learning, and bagging, boosting and stacking were implemented for each algorithm. The number of estimators ranged from 3 to 256 in steps of 2 (effectively 3 to 255). For bagging, max_samples and max_features were each searched from 0.5 to 1.0 in steps of 0.1. Boosting used a learning rate from 0.1 to 1.0 in steps of 0.1.

For the Decision Tree, two split criteria were used to build the model: gini and entropy. Bagging was run for 100 trials, stacking for 25 and boosting for 50. Of the three, bagging gave the best test result, approximately 0.84, with parameters criterion: entropy, n_estimators: 103, max_features: 0.7.


The SVM was trained with 4 *kernels* (linear, polynomial, rbf and sigmoid), tuning the parameters *gamma*, *degree*, *coef0* and *C*. The values of *gamma* were *scale* and *auto*; *degree* ranged from 1 to 3; coef0 ranged from 0.1 to 1.0 in steps of 0.1; and C was chosen from the values 0.01, 0.1, 1, 10 and 100. Bagging was run for 150 trials, stacking for 100 and boosting for 100. Of the three, bagging gave the best test result, approximately 0.86, with parameters c_value: 100, kernel: poly, degree: 3, gamma: scale, coef0: 0.30, n_estimators: 155, max_features: 0.5.


The MLP was trained with 4 activation functions (identity, logistic, tanh and relu) and 3 solvers (lbfgs, sgd and adam). L2 regularization was also used, with alpha ranging from 0.1 to 10, as was early stopping, another way to avoid overfitting. For the sgd and adam solvers, the number of epochs ranged from 100 to 500 in increments of 50; for the lbfgs solver, it ranged from 5000 to 10000 in increments of 1000. One and two hidden layers were tested, with the size of each hidden layer ranging from 2 up to the number of input features plus one. Bagging was run for 50 trials, stacking for 50 and boosting for 50. Of the three, bagging gave the best test result, approximately 0.85, with parameters number_of_layers: 2, hidden_layer_1: 7, hidden_layer_2: 23, activation_fn: relu, _solver: lbfgs, l2_alpha: 0.1194505140855198, epochs: 6000, n_estimators: 151, max_features: 0.5.


For the Decision Tree, no comparison was possible, because the decision-tree mission did not train a model; it only dealt with multicollinearity.

The SVM trained on its own reached an accuracy of 0.83 on the test data; used in ensemble learning, its accuracy rose by 3 percentage points, to 0.86.

The MLP trained on its own reached an accuracy of 0.87 on the test data; used in ensemble learning, its accuracy dropped to approximately 0.85. This result was not expected, but we tried several variations and could not beat the standalone model.

In the case of the SVM, ensemble learning increased the accuracy as expected, since it combines several estimators.

Finally, we tried a voting classifier built from the best classifiers among those tested, obtaining an accuracy of 80%.
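scikit-learn's `VotingClassifier` uses hard (majority) voting by default. The combination rule can be sketched in a few lines; the function `majority_vote` and the three prediction lists below are hypothetical and are not results from this notebook.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model label lists into one label per instance."""
    combined = []
    for votes in zip(*predictions):
        # most_common(1) returns the label with the highest count
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Hypothetical predictions from three models on five instances
tree_pred = [1, 0, 1, 1, 0]
svc_pred = [1, 1, 0, 1, 0]
mlp_pred = [0, 0, 1, 1, 1]

final = majority_vote([tree_pred, svc_pred, mlp_pred])
print(final)
```

With three voters and two classes a tie cannot occur; other setups would need an explicit tie-breaking rule.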


## <center> References </center>

* https://towardsdatascience.com/ensemble-methods-bagging-boosting-and-stacking-c9214a10a205

* https://stats.stackexchange.com/questions/18891/bagging-boosting-and-stacking-in-machine-learning
