## Required Items

- Incorporate a pipeline into the learning process;
- Apply preprocessing to the dataset;
- Reduce the dimensionality of the dataset;
- Incorporate cross-validation to better evaluate the techniques;
- Re-evaluate the methodologies and models with this new approach.

## Importing the data

In [None]:
import pandas as pd
df = pd.read_csv('dados_sem_anomalias.csv')
df.head()

Unnamed: 0,dispositivo_1,dispositivo_2,dispositivo_3,dispositivo_4,dispositivo_5,dispositivo_6,dispositivo_7,dispositivo_8,dispositivo_9,dispositivo_10,...,dispositivo_42,dispositivo_43,dispositivo_44,dispositivo_45,dispositivo_46,dispositivo_47,dispositivo_48,dispositivo_49,dispositivo_50,falha
0,73.18,61.7,44.79,34.7,64.35,31.37,71.95,46.84,45.4,57.63,...,57.5,49.11,35.51,49.83,35.35,56.37,56.21,50.41,42.17,0
1,48.7,36.58,42.64,51.02,66.17,43.68,51.84,57.06,40.92,33.1,...,42.58,45.03,55.41,56.54,34.13,50.11,49.88,49.82,69.11,0
2,45.65,69.17,48.58,34.39,42.41,41.61,59.15,55.03,59.03,59.72,...,74.03,48.05,39.78,58.47,63.05,54.8,68.53,45.07,71.07,0
3,63.11,49.81,38.17,59.98,61.59,59.39,48.5,55.62,52.2,30.47,...,43.08,47.89,32.3,66.46,54.78,60.01,21.4,53.12,50.01,0
4,28.41,38.22,43.15,39.12,58.32,71.58,36.61,45.84,35.68,45.38,...,58.2,55.04,36.48,52.88,54.85,66.86,50.58,58.64,53.66,0


In [None]:
X = df.drop('falha', axis=1).values
y = df['falha'].values

## Incorporating a pipeline into the learning process

In [None]:
from sklearn.pipeline import Pipeline

## Preprocessing the dataset and reducing its dimensionality

In [None]:
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler
from sklearn.decomposition import PCA, IncrementalPCA, FastICA

prepros = [
    None,
    StandardScaler(),
    MinMaxScaler(),
    RobustScaler(),
]

redutores = [
    None,
    PCA(random_state=42),
    IncrementalPCA(),
    FastICA(random_state=42),
]

t-SNE is another dimensionality reduction technique, but it is computationally expensive, so it is better suited to smaller datasets.
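
One detail worth checking: with the default `n_components=None`, `PCA` keeps every component, so the data is rotated but its dimensionality is not actually reduced. The sketch below shows one possible way to shrink the feature space, by asking for enough components to explain 95% of the variance. The 95% threshold and the synthetic data are assumptions chosen for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data with many redundant features (assumption, for illustration only)
X_demo, _ = make_classification(
    n_samples=300, n_features=50, n_informative=5, n_redundant=20, random_state=42
)
X_scaled = StandardScaler().fit_transform(X_demo)

# Default PCA: keeps min(n_samples, n_features) components, i.e. no reduction here
pca_full = PCA(random_state=42).fit(X_scaled)

# A float in (0, 1) keeps just enough components to reach that share of explained variance
pca_95 = PCA(n_components=0.95, svd_solver="full", random_state=42).fit(X_scaled)

print("components kept by default:", pca_full.n_components_)
print("components kept for 95% variance:", pca_95.n_components_)
```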

## Incorporating cross-validation to better evaluate the techniques

Cross-validation is practical mainly for classical machine learning. With deep learning it is technically possible, but retraining the network once per fold is usually too expensive.

In [None]:
from sklearn.model_selection import cross_validate
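
As a rough illustration of what `cross_validate` returns, the sketch below evaluates a single classifier with two metrics over 4 folds. The synthetic data and the choice of `DecisionTreeClassifier` are assumptions made only to keep the example small.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (assumption, for illustration only)
X_demo, y_demo = make_classification(n_samples=200, n_features=20, random_state=42)

# With a list of metrics, the result holds one 'test_<metric>' array per metric
cv_demo = cross_validate(
    DecisionTreeClassifier(random_state=42),
    X_demo,
    y_demo,
    cv=4,
    scoring=["accuracy", "f1"],
)

# Each array has one score per fold; averaging gives a single summary value
mean_acc = np.mean(cv_demo["test_accuracy"])
mean_f1 = np.mean(cv_demo["test_f1"])
print(sorted(cv_demo.keys()))
```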

### Inverting the cross-validation

In [None]:
from sklearn.model_selection import KFold

class InvertedKFold(KFold):
    def split(self, X, y=None, groups=None):
        for train, test in super().split(X, y, groups):
            yield test, train

cv = InvertedKFold(n_splits=4)

The code splits the data into 4 folds, but instead of training on 3 folds and testing on 1, it does the opposite: each model is trained on a single fold and tested on the other three.
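
The effect of the inversion can be checked by counting the samples on each side of every split. The sketch below repeats the class definition so that it runs on its own; the 100-row dummy array is an assumption used only to make the fold sizes easy to read.

```python
import numpy as np
from sklearn.model_selection import KFold


class InvertedKFold(KFold):
    """KFold variant that swaps the roles of the train and test indices."""

    def split(self, X, y=None, groups=None):
        for train, test in super().split(X, y, groups):
            yield test, train


# Dummy data: only the number of rows matters for the split (assumption)
X_demo = np.zeros((100, 3))

# Collect (train size, test size) for each of the 4 splits
sizes = [
    (len(train), len(test))
    for train, test in InvertedKFold(n_splits=4).split(X_demo)
]
print(sizes)  # → [(25, 75), (25, 75), (25, 75), (25, 75)]
```

Each model therefore sees 25% of the data during training and is scored on the remaining 75%, a harder setting than standard 4-fold cross-validation.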

### Importing the learning techniques

In [None]:
from sklearn.linear_model import LogisticRegression, PassiveAggressiveClassifier
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, ExtraTreesClassifier
from sklearn.ensemble import GradientBoostingClassifier, HistGradientBoostingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
import time

aprendizados = [
    LogisticRegression(random_state=42),
    PassiveAggressiveClassifier(random_state=42),
    AdaBoostClassifier(random_state=42),
    BaggingClassifier(random_state=42),
    ExtraTreesClassifier(random_state=42),
    GradientBoostingClassifier(random_state=42),
    HistGradientBoostingClassifier(random_state=42),
    RandomForestClassifier(random_state=42),
    KNeighborsClassifier(),
    DecisionTreeClassifier(random_state=42),
]

### Running the training

In [None]:
from tqdm import tqdm
import numpy as np

resultados = []
for pp, red, ap in tqdm([(pp, red, ap) for pp in prepros for red in redutores for ap in aprendizados]):

    steps = []
    if pp is not None:
        steps.append((pp.__class__.__name__, pp))
    if red is not None:
        steps.append((red.__class__.__name__, red))
    steps.append((ap.__class__.__name__, ap))

    pipe = Pipeline(steps)

    t0 = time.time()
    cv_results = cross_validate(pipe, X, y, cv=cv, scoring=['accuracy', 'f1'], n_jobs=-1)
    tempo = time.time() - t0

    res = {
        'preprocessamento': pp.__class__.__name__,
        'reducao': red.__class__.__name__,
        'aprendizado': ap.__class__.__name__,
        'acuracia': np.mean(cv_results['test_accuracy']),
        'f1': np.mean(cv_results['test_f1']),
        'tempo': tempo
    }
    resultados.append(res)

df_res = pd.DataFrame(resultados)

# Sort the evaluated models by F1 score, best first
df_res.sort_values('f1', ascending=False)

100%|██████████| 160/160 [4:46:17<00:00, 107.36s/it]


Unnamed: 0,preprocessamento,reducao,aprendizado,acuracia,f1,tempo
46,StandardScaler,NoneType,HistGradientBoostingClassifier,0.911012,0.909925,16.791515
6,NoneType,NoneType,HistGradientBoostingClassifier,0.911012,0.909925,16.651869
86,MinMaxScaler,NoneType,HistGradientBoostingClassifier,0.911012,0.909925,15.347199
126,RobustScaler,NoneType,HistGradientBoostingClassifier,0.911012,0.909925,15.639834
36,NoneType,FastICA,HistGradientBoostingClassifier,0.906245,0.904791,105.543447
...,...,...,...,...,...,...
111,MinMaxScaler,FastICA,PassiveAggressiveClassifier,0.692363,0.631689,89.274028
91,MinMaxScaler,PCA,PassiveAggressiveClassifier,0.668957,0.566836,1.971151
101,MinMaxScaler,IncrementalPCA,PassiveAggressiveClassifier,0.667366,0.564143,2.321188
81,MinMaxScaler,NoneType,PassiveAggressiveClassifier,0.641241,0.559081,1.173637
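
With 160 combinations, it can help to look at the best configuration for each learner rather than at the full ranking. The sketch below shows one possible way to do that with `groupby` and `idxmax`. It uses a handful of rows copied from the table above, and the names `df_demo` and `best_per_learner` are hypothetical; on the real results, the same expression would be applied to `df_res`.

```python
import pandas as pd

# A few rows copied from the results table above, for illustration only
df_demo = pd.DataFrame([
    {"preprocessamento": "StandardScaler", "reducao": "NoneType",
     "aprendizado": "HistGradientBoostingClassifier", "f1": 0.909925, "tempo": 16.791515},
    {"preprocessamento": "NoneType", "reducao": "FastICA",
     "aprendizado": "HistGradientBoostingClassifier", "f1": 0.904791, "tempo": 105.543447},
    {"preprocessamento": "MinMaxScaler", "reducao": "FastICA",
     "aprendizado": "PassiveAggressiveClassifier", "f1": 0.631689, "tempo": 89.274028},
    {"preprocessamento": "MinMaxScaler", "reducao": "PCA",
     "aprendizado": "PassiveAggressiveClassifier", "f1": 0.566836, "tempo": 1.971151},
])

# idxmax gives, for each learner, the row label with the highest F1;
# .loc then selects those rows from the original frame
best_per_learner = df_demo.loc[df_demo.groupby("aprendizado")["f1"].idxmax()]
print(best_per_learner[["aprendizado", "preprocessamento", "reducao", "f1", "tempo"]])
```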



In [None]:
df_res.to_csv('resultados_etapa3.csv', index=False)