#**Notebook Used in the Development of the Paper "Análise da Robustez de Algoritmos de Aprendizado de Máquina em Dados do Transtorno do Espectro Autista" (Robustness Analysis of Machine Learning Algorithms on Autism Spectrum Disorder Data)**

This notebook evaluates several machine learning algorithms on an autism screening dataset for children aged 4 to 11. It then measures the same models after errors are injected into the data, so that the original performance can be compared with the performance under errors. The tables in this notebook give a detailed view of model robustness across the scenarios tested.

#Child Autism Dataset

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


Loading the DataFrame.

In [None]:
import pandas as pd

df = pd.read_csv("/content/drive/MyDrive/archive/Child-Data2018.csv")

In [None]:
pd.set_option('display.max_columns', None)

In [None]:
df.head()

,Case No,A1,A2,A3,A4,A5,A6,A7,A8,A9,A10,Age,Sex,Ethnicity,Jaundice,Family_ASD,Residence,Used_App_Before,Why taken the screening,Score,Screening Type,Language,User,Class
0,1,0,1,1,0,0,1,1,0,0,1,4,m,middle eastern,yes,no,Libya,no,,5,4-11 years,arabic,parent,NO
1,3,0,1,1,1,1,1,1,0,0,1,4,m,middle eastern,yes,no,Libya,yes,??????,7,4-11 years,arabic,parent,YES
2,4,0,1,1,1,1,1,0,1,1,1,5,m,white,no,no,Russia,no,,8,4-11 years,russian,parent,YES
3,8,0,1,1,0,1,1,1,0,0,0,4,m,middle eastern,yes,no,Libya,no,,5,4-11 years,arabic,parent,NO
4,9,0,0,1,1,1,1,0,1,0,1,5,m,white,no,no,Russia,no,,6,4-11 years,russian,parent,NO


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 509 entries, 0 to 508
Data columns (total 24 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   Case No                   509 non-null    int64 
 1   A1                        509 non-null    int64 
 2   A2                        509 non-null    int64 
 3   A3                        509 non-null    int64 
 4   A4                        509 non-null    int64 
 5   A5                        509 non-null    int64 
 6   A6                        509 non-null    int64 
 7   A7                        509 non-null    int64 
 8   A8                        509 non-null    int64 
 9   A9                        509 non-null    int64 
 10  A10                       509 non-null    int64 
 11  Age                       509 non-null    int64 
 12  Sex                       509 non-null    object
 13  Ethnicity                 509 non-null    object
 14  Jaundice                  

#Dropping unnecessary attributes

We remove the attributes that are not needed for our analysis.

In [None]:
df.drop(["Case No", "Score", "Why taken the screening ", "Used_App_Before", "Screening Type", "Residence", "Language", "User"], axis=1, inplace=True)
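Two of the column names used in this notebook, "Why taken the screening " and "Jaundice ", end with a space, and `df.drop` raises a `KeyError` by default when a name does not match exactly. The sketch below shows one way such names could be spotted before they are used; the toy frame and its values are made up for illustration and are not the real dataset.

```python
import pandas as pd

# Toy frame mimicking column names used in this notebook;
# the values are made up for illustration.
toy = pd.DataFrame({
    "Jaundice ": ["yes", "no"],
    "Why taken the screening ": [None, None],
    "Family_ASD": ["no", "no"],
})

# Names whose stripped form differs carry leading or trailing whitespace.
padded = [name for name in toy.columns if name != name.strip()]
print(padded)
```

The check only reports the names. Renaming the columns would require updating every later cell that refers to them.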

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 509 entries, 0 to 508
Data columns (total 16 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   A1          509 non-null    int64 
 1   A2          509 non-null    int64 
 2   A3          509 non-null    int64 
 3   A4          509 non-null    int64 
 4   A5          509 non-null    int64 
 5   A6          509 non-null    int64 
 6   A7          509 non-null    int64 
 7   A8          509 non-null    int64 
 8   A9          509 non-null    int64 
 9   A10         509 non-null    int64 
 10  Age         509 non-null    int64 
 11  Sex         509 non-null    object
 12  Ethnicity   509 non-null    object
 13  Jaundice    509 non-null    object
 14  Family_ASD  509 non-null    object
 15  Class       509 non-null    object
dtypes: int64(11), object(5)
memory usage: 63.8+ KB


#Transforming the attributes

To find the best-performing setup, we test the algorithms on two versions of the dataset: one in which the Ethnicity attribute is integer-encoded and one in which it is one-hot encoded.

In [None]:
df2 = df.copy()

In [None]:
df.Ethnicity.unique()

array(['middle eastern', 'white', 'asian', 'latino', 'south asians',
       'aboriginal', 'others ', 'hispanic', 'black'], dtype=object)

Integer encoding of the Ethnicity attribute:

In [None]:
df['Ethnicity'], _ = pd.factorize(df['Ethnicity'])

One-hot encoding of the Ethnicity attribute:

In [None]:
def get_dummies(dataframe, coluna):
    dataframe = pd.get_dummies(dataframe, columns=[coluna])
    return dataframe

In [None]:
df2 = get_dummies(df2, 'Ethnicity')

Result of the transformations:

In [None]:
df['Ethnicity'].unique()

array([0, 1, 2, 3, 4, 5, 6, 7, 8])

In [None]:
df2.head().loc[:, 'Ethnicity_aboriginal':'Ethnicity_white']

,Ethnicity_aboriginal,Ethnicity_asian,Ethnicity_black,Ethnicity_hispanic,Ethnicity_latino,Ethnicity_middle eastern,Ethnicity_others,Ethnicity_south asians,Ethnicity_white
0,False,False,False,False,False,True,False,False,False
1,False,False,False,False,False,True,False,False,False
2,False,False,False,False,False,False,False,False,True
3,False,False,False,False,False,True,False,False,False
4,False,False,False,False,False,False,False,False,True
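The difference between the two encodings can be seen on a small example. The sketch below uses a made-up four-row column, not the real data: `pd.factorize` assigns integer codes in order of first appearance, while `pd.get_dummies` creates one indicator column per category.

```python
import pandas as pd

# Made-up values for illustration only.
toy = pd.DataFrame({"Ethnicity": ["white", "asian", "white", "black"]})

# Integer codes follow the order of first appearance.
codes, categories = pd.factorize(toy["Ethnicity"])
print(list(codes))        # [0, 1, 0, 2]
print(list(categories))   # ['white', 'asian', 'black']

# One indicator column per category, ordered alphabetically by category name.
one_hot = pd.get_dummies(toy, columns=["Ethnicity"])
print(list(one_hot.columns))
```

Integer codes impose an arbitrary order that a linear model treats as a numeric magnitude. One-hot encoding avoids that at the cost of more columns, which is a plausible reason to compare both versions.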


Converting the Jaundice and Family_ASD attributes:

In [None]:
def converter_binario(dataframe, coluna):
  dataframe[coluna] = dataframe[coluna].map({'yes': 1, 'no': 0})

In [None]:
converter_binario(df, "Jaundice ")
converter_binario(df, "Family_ASD")

In [None]:
converter_binario(df2, "Jaundice ")
converter_binario(df2, "Family_ASD")
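`Series.map` with a dictionary returns NaN for any value that is not a key, so a stray spelling such as "Yes" or "no " would silently become a missing value. A minimal check, shown here on a made-up series (the real columns may well contain only "yes" and "no"):

```python
import pandas as pd

# Made-up values, including two that the mapping does not cover.
toy = pd.Series(["yes", "no", "Yes", "no "], name="Jaundice ")

mapped = toy.map({"yes": 1, "no": 0})

# Entries that were not keys of the dictionary are now NaN.
n_unmapped = int(mapped.isna().sum())
print(n_unmapped)  # 2
```

Running the same `isna().sum()` on the converted columns is a cheap way to confirm that the mapping covered every value.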

Result of the conversions:

In [None]:
df.iloc[:6, 13:15]

,Jaundice,Family_ASD
0,1,0
1,1,0
2,0,0
3,1,0
4,0,0
5,0,0


In [None]:
df2.iloc[:6, 12:14]

,Jaundice,Family_ASD
0,1,0
1,1,0
2,0,0
3,1,0
4,0,0
5,0,0


Converting the Sex attribute:

In [None]:
def converter_sexo(df):
  df['Sex'] = df['Sex'].map({'m': 1, 'f': 0})

In [None]:
converter_sexo(df)
converter_sexo(df2)

Result of the conversions:

In [None]:
df.iloc[:6,11]

,Sex
0,1
1,1
2,1
3,1
4,1
5,0


In [None]:
df2.iloc[:6,11]

,Sex
0,1
1,1
2,1
3,1
4,1
5,0


Converting the label to 0 or 1:

In [None]:
def converter_class(df):
  df['Class'] = df['Class'].map({'YES': 1, 'NO': 0})

In [None]:
converter_class(df)
converter_class(df2)

Result of the conversions:

In [None]:
df.iloc[:6,15]

,Class
0,0
1,1
2,1
3,0
4,0
5,1


In [None]:
df2.iloc[:6,14]

,Class
0,0
1,1
2,1
3,0
4,0
5,1


Final version of the datasets:

In [None]:
df.head()

,A1,A2,A3,A4,A5,A6,A7,A8,A9,A10,Age,Sex,Ethnicity,Jaundice,Family_ASD,Class
0,0,1,1,0,0,1,1,0,0,1,4,1,0,1,0,0
1,0,1,1,1,1,1,1,0,0,1,4,1,0,1,0,1
2,0,1,1,1,1,1,0,1,1,1,5,1,1,0,0,1
3,0,1,1,0,1,1,1,0,0,0,4,1,0,1,0,0
4,0,0,1,1,1,1,0,1,0,1,5,1,1,0,0,0


In [None]:
df2.head()

,A1,A2,A3,A4,A5,A6,A7,A8,A9,A10,Age,Sex,Jaundice,Family_ASD,Class,Ethnicity_aboriginal,Ethnicity_asian,Ethnicity_black,Ethnicity_hispanic,Ethnicity_latino,Ethnicity_middle eastern,Ethnicity_others,Ethnicity_south asians,Ethnicity_white
0,0,1,1,0,0,1,1,0,0,1,4,1,1,0,0,False,False,False,False,False,True,False,False,False
1,0,1,1,1,1,1,1,0,0,1,4,1,1,0,1,False,False,False,False,False,True,False,False,False
2,0,1,1,1,1,1,0,1,1,1,5,1,0,0,1,False,False,False,False,False,False,False,False,True
3,0,1,1,0,1,1,1,0,0,0,4,1,1,0,0,False,False,False,False,False,True,False,False,False
4,0,0,1,1,1,1,0,1,0,1,5,1,0,0,0,False,False,False,False,False,False,False,False,True


#Experiments

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from itertools import product
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from IPython.display import display
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import MinMaxScaler

Function used to introduce errors into the datasets:

In [None]:
def introduce_errors(df, error_percentage, error_count, random_state):
    """Flip `error_count` of the ten answer columns in a fraction of the rows.

    Modifies `df` in place (callers pass a copy) and returns it.
    """
    # Choosing the samples that will contain errors (by position)
    original_indices = df.index.tolist()
    np.random.seed(random_state)
    num_samples = df.shape[0]
    num_errors = int(num_samples * error_percentage)
    error_indices = np.random.choice(num_samples, num_errors, replace=False)
    df.reset_index(drop=True, inplace=True)

    # Choosing which answers (first ten columns, A1..A10) are flipped in each sample
    for index in error_indices:
        error_cols = np.random.choice(df.columns[:10], error_count, replace=False)
        df.loc[index, error_cols] = 1 - df.loc[index, error_cols]
    df.index = original_indices
    return df
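To see what the function does on a concrete input, the sketch below repeats it (so that the block runs on its own) and applies it to a made-up frame of 20 samples with ten binary answers and an Age column. The toy data, its index range and the chosen parameters are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

def introduce_errors(df, error_percentage, error_count, random_state):
    # Copy of the notebook's function, repeated so this block is self-contained.
    original_indices = df.index.tolist()
    np.random.seed(random_state)
    num_samples = df.shape[0]
    num_errors = int(num_samples * error_percentage)
    error_indices = np.random.choice(num_samples, num_errors, replace=False)
    df.reset_index(drop=True, inplace=True)
    for index in error_indices:
        error_cols = np.random.choice(df.columns[:10], error_count, replace=False)
        df.loc[index, error_cols] = 1 - df.loc[index, error_cols]
    df.index = original_indices
    return df

# Made-up data: 20 samples, ten binary answers plus an Age column.
rng = np.random.default_rng(0)
answers = [f"A{i}" for i in range(1, 11)]
toy = pd.DataFrame(rng.integers(0, 2, size=(20, 10)), columns=answers)
toy["Age"] = rng.integers(4, 12, size=20)
toy.index = range(100, 120)  # non-default index, as in a test split

noisy = introduce_errors(toy.copy(), error_percentage=0.25, error_count=3, random_state=10)

# Count how many answers differ in each row (compared by position).
flips_per_row = (noisy[answers].to_numpy() != toy[answers].to_numpy()).sum(axis=1)
rows_with_errors = int((flips_per_row > 0).sum())
print(rows_with_errors)                             # 5 rows (25% of 20)
print(np.unique(flips_per_row[flips_per_row > 0]))  # each of them has 3 flipped answers
```

Only the first ten columns are ever modified, so Age and the original index come back unchanged.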

## Logistic Regression

The following code computes the mean and standard deviation of the metrics analysed in the study over the 10 datasets generated from the dataframe df (one train/test split per random seed), testing different scaling methods.

In [None]:
from warnings import simplefilter

# ignore all warnings
simplefilter(action='ignore')

X = df.drop('Class', axis=1)
y = df['Class']

scalers = {
    "MinMaxScaler": MinMaxScaler(),
    "StandardScaler": StandardScaler()
}

random_states = [10, 23, 87, 41, 65, 12, 98, 34, 72, 19]

for scaler_name, scaler in scalers.items():

    # Lists that store the metrics
    accuracy_list = []
    precision_list = []
    recall_list = []
    f1_list = []

    for random_state in random_states:
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=random_state)

        X_train_scaled = scaler.fit_transform(X_train)
        X_test_scaled = scaler.transform(X_test)

        logistic_model = LogisticRegression(random_state=10)

        param_grid = {
            'C': [0.1, 0.5, 1, 5, 10, 100],
            'tol': [1e-5, 1e-4, 1e-3],
            'solver': ['lbfgs', 'liblinear', 'saga', 'newton-cg', 'sag'],
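            'penalty': [None]  # Starts with None only
        }

        # NOTE: these loops do not tie 'penalty' to a solver; they only append
        # to the shared 'penalty' list (the loop variable is unused), so the grid
        # crosses every solver with every listed penalty, duplicates included.
        # Unsupported combinations fail to fit and GridSearchCV scores them as NaN.
        for solver in ['lbfgs', 'newton-cg', 'sag']:
            param_grid['penalty'].extend(['l2', None])
        for solver in ['liblinear', 'saga']:
            param_grid['penalty'].extend(['l1', 'l2', 'elasticnet', None])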

        grid_search = GridSearchCV(logistic_model, param_grid, cv=5, scoring='accuracy')
        grid_search.fit(X_train_scaled, y_train)

        best_model = grid_search.best_estimator_

        y_pred = best_model.predict(X_test_scaled)

        accuracy = accuracy_score(y_test, y_pred)
        precision = precision_score(y_test, y_pred)
        recall = recall_score(y_test, y_pred)
        f1 = f1_score(y_test, y_pred)

        # Appending the metrics to the lists
        accuracy_list.append(accuracy)
        precision_list.append(precision)
        recall_list.append(recall)
        f1_list.append(f1)

    # Computing mean and standard deviation
    metrics_dict = {
        'Accuracy': accuracy_list,
        'Precision': precision_list,
        'Recall': recall_list,
        'F1 Score': f1_list
    }

    metrics_df = pd.DataFrame(metrics_dict)
    metrics_mean = metrics_df.mean()
    metrics_std = metrics_df.std()

    # Printing mean and standard deviation
    print(f"Scaling: {scaler_name}")
    for metric in metrics_mean.index:
        print(f"{metric}: {metrics_mean[metric]:.2f} ± {metrics_std[metric]:.2f}")
    print("\n")

Scaling: MinMaxScaler
Accuracy: 1.00 ± 0.00
Precision: 1.00 ± 0.00
Recall: 1.00 ± 0.00
F1 Score: 1.00 ± 0.00


Scaling: StandardScaler
Accuracy: 1.00 ± 0.00
Precision: 1.00 ± 0.00
Recall: 1.00 ± 0.00
F1 Score: 1.00 ± 0.00
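In the cell above, the `penalty` list is shared by all solvers, so every solver is crossed with every penalty. `GridSearchCV` also accepts a list of dictionaries, which makes it possible to pair each solver with only the penalties it supports. The sketch below is an alternative under stated assumptions, not the grid used for the reported results: it assumes a scikit-learn version that accepts `penalty=None`, and it leaves out `'elasticnet'` because that penalty additionally needs an `l1_ratio` value. It only counts candidates and does not train anything.

```python
from math import prod

C_values = [0.1, 0.5, 1, 5, 10, 100]
tol_values = [1e-5, 1e-4, 1e-3]

# One sub-grid per group of solvers, listing only penalties that group accepts.
param_grid = [
    {"solver": ["lbfgs", "newton-cg", "sag"], "penalty": ["l2", None],
     "C": C_values, "tol": tol_values},
    {"solver": ["liblinear"], "penalty": ["l1", "l2"],
     "C": C_values, "tol": tol_values},
    {"solver": ["saga"], "penalty": ["l1", "l2", None],
     "C": C_values, "tol": tol_values},
]

# Candidates per sub-grid = product of the list lengths; sub-grids add up.
n_candidates = sum(prod(len(values) for values in grid.values()) for grid in param_grid)

# The notebook's grid, rebuilt the same way, for comparison.
original_penalties = [None]
for _ in ["lbfgs", "newton-cg", "sag"]:
    original_penalties.extend(["l2", None])
for _ in ["liblinear", "saga"]:
    original_penalties.extend(["l1", "l2", "elasticnet", None])
n_original = len(C_values) * len(tol_values) * 5 * len(original_penalties)

print(n_candidates, n_original)  # 198 1350
```

Passing `param_grid` in this form to `GridSearchCV` would evaluate fewer candidates and skip the invalid pairs. Because the candidate order changes, ties between equally scored candidates may be broken differently, so the selected model need not be the one behind the tables in this notebook.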




The following code computes the mean and standard deviation of the metrics analysed in the study over the 10 datasets generated from the dataframe df2 (one train/test split per random seed), testing different scaling methods.

In [None]:
from warnings import simplefilter

# ignore all warnings
simplefilter(action='ignore')

X = df2.drop('Class', axis=1)
y = df2['Class']

scalers = {
    "MinMaxScaler": MinMaxScaler(),
    "StandardScaler": StandardScaler()
}

random_states = [10, 23, 87, 41, 65, 12, 98, 34, 72, 19]

for scaler_name, scaler in scalers.items():

    # Lists that store the metrics
    accuracy_list = []
    precision_list = []
    recall_list = []
    f1_list = []

    for random_state in random_states:
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=random_state)

        X_train_scaled = scaler.fit_transform(X_train)
        X_test_scaled = scaler.transform(X_test)

        logistic_model = LogisticRegression(random_state=10)

        param_grid = {
            'C': [0.1, 0.5, 1, 5, 10, 100],
            'tol': [1e-5, 1e-4, 1e-3],
            'solver': ['lbfgs', 'liblinear', 'saga', 'newton-cg', 'sag'],
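            'penalty': [None]  # Starts with None only
        }

        # NOTE: these loops do not tie 'penalty' to a solver; they only append
        # to the shared 'penalty' list (the loop variable is unused), so the grid
        # crosses every solver with every listed penalty, duplicates included.
        # Unsupported combinations fail to fit and GridSearchCV scores them as NaN.
        for solver in ['lbfgs', 'newton-cg', 'sag']:
            param_grid['penalty'].extend(['l2', None])
        for solver in ['liblinear', 'saga']:
            param_grid['penalty'].extend(['l1', 'l2', 'elasticnet', None])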

        grid_search = GridSearchCV(logistic_model, param_grid, cv=5, scoring='accuracy')
        grid_search.fit(X_train_scaled, y_train)

        best_model = grid_search.best_estimator_

        y_pred = best_model.predict(X_test_scaled)

        accuracy = accuracy_score(y_test, y_pred)
        precision = precision_score(y_test, y_pred)
        recall = recall_score(y_test, y_pred)
        f1 = f1_score(y_test, y_pred)

        # Appending the metrics to the lists
        accuracy_list.append(accuracy)
        precision_list.append(precision)
        recall_list.append(recall)
        f1_list.append(f1)

    # Computing mean and standard deviation
    metrics_dict = {
        'Accuracy': accuracy_list,
        'Precision': precision_list,
        'Recall': recall_list,
        'F1 Score': f1_list
    }

    metrics_df = pd.DataFrame(metrics_dict)
    metrics_mean = metrics_df.mean()
    metrics_std = metrics_df.std()

    # Printing mean and standard deviation
    print(f"Scaling: {scaler_name}")
    for metric in metrics_mean.index:
        print(f"{metric}: {metrics_mean[metric]:.2f} ± {metrics_std[metric]:.2f}")
    print("\n")

Scaling: MinMaxScaler
Accuracy: 1.00 ± 0.00
Precision: 1.00 ± 0.00
Recall: 1.00 ± 0.00
F1 Score: 1.00 ± 0.00


Scaling: StandardScaler
Accuracy: 1.00 ± 0.00
Precision: 1.00 ± 0.00
Recall: 1.00 ± 0.00
F1 Score: 1.00 ± 0.00
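One possible reading of the perfect scores, which the notebook itself does not state, is that the label is a simple function of the ten answers. The sketch below checks this only on the five rows printed by `df.head()` at the top of the notebook, retyped by hand. The rule "Class is YES when the answers sum to more than 6" is a hypothesis tested on those five rows and says nothing certain about the remaining 504.

```python
import pandas as pd

# The five rows printed by df.head() earlier in this notebook
# (answers A1..A10, Score, Class), retyped by hand.
rows = [
    ([0, 1, 1, 0, 0, 1, 1, 0, 0, 1], 5, "NO"),
    ([0, 1, 1, 1, 1, 1, 1, 0, 0, 1], 7, "YES"),
    ([0, 1, 1, 1, 1, 1, 0, 1, 1, 1], 8, "YES"),
    ([0, 1, 1, 0, 1, 1, 1, 0, 0, 0], 5, "NO"),
    ([0, 0, 1, 1, 1, 1, 0, 1, 0, 1], 6, "NO"),
]
head = pd.DataFrame(
    [answers + [score, label] for answers, score, label in rows],
    columns=[f"A{i}" for i in range(1, 11)] + ["Score", "Class"],
)

# Does Score equal the sum of the ten answers in these rows?
answer_sum = head.loc[:, "A1":"A10"].sum(axis=1)
score_matches_sum = bool((answer_sum == head["Score"]).all())

# Hypothetical rule, tested on these five rows only.
rule_matches = bool(((answer_sum > 6) == (head["Class"] == "YES")).all())
print(score_matches_sum, rule_matches)  # True True
```

If the same relation held for the full dataset, a linear model could separate the classes exactly from A1..A10, which would be consistent with the 1.00 ± 0.00 results above. Applying the same two checks to the full frame before the Score column is dropped would confirm or refute this.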




The following code computes the mean and standard deviation of the metrics analysed in the study for the 10 error-injected datasets generated from the dataframe df, in each of the 35 scenarios analysed (7 error percentages × 5 error counts). Both scaling methods are tested, since the results were perfect in every error-free case tested.
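The cell that follows also saves its partial results with `pickle` after every scenario and skips scenarios that are already stored, so an interrupted Colab session can resume where it stopped. A minimal, self-contained sketch of that pattern is given here; `run_scenarios` and `fake_evaluate` are hypothetical names introduced for illustration, and a temporary file replaces the Drive path.

```python
import os
import pickle
import tempfile
from itertools import product

def load_results(file_path):
    # Return previously saved rows, or an empty list on the first run.
    try:
        with open(file_path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return []

def save_results(file_path, results):
    with open(file_path, "wb") as f:
        pickle.dump(results, f)

def run_scenarios(file_path, scenarios, evaluate):
    # Evaluate only scenarios that are not stored yet; save after each one.
    results = load_results(file_path)
    done = {tuple(row[:2]) for row in results}
    for scenario in scenarios:
        if scenario in done:
            continue
        results.append([*scenario, evaluate(*scenario)])
        save_results(file_path, results)
    return results

# Hypothetical stand-in for the expensive train/evaluate step.
calls = []
def fake_evaluate(error_percentage, error_count):
    calls.append((error_percentage, error_count))
    return round(1 - error_percentage * error_count / 5, 3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "results.pkl")
    scenarios = list(product([0.05, 0.10], [1, 2]))
    first = run_scenarios(path, scenarios[:2], fake_evaluate)  # simulates an interrupted run
    second = run_scenarios(path, scenarios, fake_evaluate)     # resumes, skipping stored rows

print(len(calls), len(second))  # 4 4
```

One consequence of this design is that a results file left over from an earlier run is reused as is. If the data or the parameter grid changes, the stored rows are still skipped unless the file is deleted first.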

In [None]:
import pickle
from warnings import simplefilter

# ignore all warnings
simplefilter(action='ignore')

# Functions to save and load the state of the results_metrics list
def save_results(file_path, results):
    with open(file_path, 'wb') as f:
        pickle.dump(results, f)

def load_results(file_path):
    with open(file_path, 'rb') as f:
        return pickle.load(f)

# Load saved results, if any
results_file = "/content/drive/MyDrive/archive/results_metrics_rl.pkl"
try:
    results_metrics = load_results(results_file)
except FileNotFoundError:
    results_metrics = []

# Parameter lists
error_percentages = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.5]
error_counts = [1, 2, 3, 4, 5]
random_states = [10, 23, 87, 41, 65, 12, 98, 34, 72, 19]

scalers = {
    "MinMaxScaler": MinMaxScaler(),
    "StandardScaler": StandardScaler()
}

X = df.drop('Class', axis=1)
y = df['Class']

for error_percentage, error_count in product(error_percentages, error_counts):
    for scaler_name, scaler in scalers.items():

        existing_result = [result for result in results_metrics if result[0] == error_percentage and result[1] == error_count and result[2] == scaler_name]
        if existing_result:
            continue

        # Lists that store the metrics of the 10 datasets
        row_results_metrics = {'accuracy': [], 'precision': [], 'recall': [], 'f1_score': []}

        for random_state in random_states:
            X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=random_state)

            X_test_error = introduce_errors(X_test.copy(), error_percentage, error_count, random_state)

            X_train_scaled = scaler.fit_transform(X_train)
            X_test_error_scaled = scaler.transform(X_test_error)

            logistic_model = LogisticRegression(random_state=10)

            param_grid = {
                'C': [0.1, 0.5, 1, 5, 10, 100],
                'tol': [1e-5, 1e-4, 1e-3],
                'solver': ['lbfgs', 'liblinear', 'saga', 'newton-cg', 'sag'],
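                'penalty': [None]  # Starts with None only
            }

            # NOTE: these loops do not tie 'penalty' to a solver; they only append
            # to the shared 'penalty' list (the loop variable is unused), so the grid
            # crosses every solver with every listed penalty, duplicates included.
            # Unsupported combinations fail to fit and GridSearchCV scores them as NaN.
            for solver in ['lbfgs', 'newton-cg', 'sag']:
                param_grid['penalty'].extend(['l2', None])
            for solver in ['liblinear', 'saga']:
                param_grid['penalty'].extend(['l1', 'l2', 'elasticnet', None])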

            grid_search = GridSearchCV(logistic_model, param_grid, cv=5, scoring='accuracy')
            grid_search.fit(X_train_scaled, y_train)

            best_model = grid_search.best_estimator_

            # Predicting on the test set with errors
            y_pred_error = best_model.predict(X_test_error_scaled)

            # Computing the metrics
            accuracy = accuracy_score(y_test, y_pred_error)
            precision = precision_score(y_test, y_pred_error)
            recall = recall_score(y_test, y_pred_error)
            f1 = f1_score(y_test, y_pred_error)

            # Appending the metrics to the lists
            row_results_metrics['accuracy'].append(accuracy)
            row_results_metrics['precision'].append(precision)
            row_results_metrics['recall'].append(recall)
            row_results_metrics['f1_score'].append(f1)

        # Computing the mean and standard deviation of the metrics
        mean_results_metrics = {metric: np.mean(values) for metric, values in row_results_metrics.items()}
        std_results_metrics = {metric: np.std(values) for metric, values in row_results_metrics.items()}
        results_metrics.append([error_percentage, error_count, scaler_name,
                                mean_results_metrics['accuracy'], std_results_metrics['accuracy'],
                                mean_results_metrics['precision'], std_results_metrics['precision'],
                                mean_results_metrics['recall'], std_results_metrics['recall'],
                                mean_results_metrics['f1_score'], std_results_metrics['f1_score']])

        # Save the results after each iteration
        save_results(results_file, results_metrics)

# Build the DataFrame with the results
results_metrics_df = pd.DataFrame(results_metrics, columns=['Error Percentage', 'Error Count', 'Scaler',
                                                             'Mean Accuracy', 'Std Accuracy',
                                                             'Mean Precision', 'Std Precision',
                                                             'Mean Recall', 'Std Recall',
                                                             'Mean F1 Score', 'Std F1 Score'])

# Format each metric's mean and standard deviation into a single column
metrics_columns = ['Accuracy', 'Precision', 'Recall', 'F1 Score']
for metric in metrics_columns:
    results_metrics_df[f'Mean {metric} & Std'] = results_metrics_df.apply(
        lambda row: f"{row[f'Mean {metric}']:.2f} ± {row[f'Std {metric}']:.2f}", axis=1
    )

# Drop the separate mean and standard deviation columns
results_metrics_df = results_metrics_df.drop(
    columns=[f'Mean {metric}' for metric in metrics_columns] + [f'Std {metric}' for metric in metrics_columns]
)

# Build a separate pivot table for each metric and scaling method
tables_metrics = {}
for metric in ['Accuracy', 'Precision', 'Recall', 'F1 Score']:
    for scaler_name in scalers.keys():
        table_metric = results_metrics_df[(results_metrics_df['Scaler'] == scaler_name)].pivot(index='Error Percentage',
                                                                                                columns='Error Count',
                                                                                                values=[f'Mean {metric} & Std'])
        tables_metrics[f'{metric} ({scaler_name})'] = table_metric

# Display the tables for each metric and scaling method
print("\nTables for each metric (Accuracy, Precision, Recall, F1 Score) and each scaling method:")
for metric, table_metric in tables_metrics.items():
    print(f"\nTable for {metric}:")
    display(table_metric)

# Save the DataFrame to a CSV file
path = "/content/drive/MyDrive/archive/"
results_metrics_df.to_csv(path + 'resultados_metrics_rl.csv', index=False)


Tables for each metric (Accuracy, Precision, Recall, F1 Score) and each scaling method:

Table for Accuracy (MinMaxScaler):

Mean Accuracy & Std (rows: Error Percentage; columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.99 ± 0.01,0.98 ± 0.01,0.97 ± 0.02,0.96 ± 0.01,0.96 ± 0.02
0.1,0.98 ± 0.01,0.97 ± 0.01,0.95 ± 0.02,0.93 ± 0.02,0.92 ± 0.02
0.15,0.97 ± 0.01,0.94 ± 0.02,0.92 ± 0.02,0.90 ± 0.03,0.88 ± 0.01
0.2,0.94 ± 0.02,0.92 ± 0.02,0.89 ± 0.02,0.86 ± 0.03,0.84 ± 0.03
0.3,0.95 ± 0.01,0.91 ± 0.02,0.86 ± 0.02,0.82 ± 0.02,0.78 ± 0.01
0.4,0.93 ± 0.01,0.88 ± 0.02,0.83 ± 0.02,0.78 ± 0.02,0.73 ± 0.02
0.5,0.93 ± 0.02,0.88 ± 0.01,0.82 ± 0.02,0.75 ± 0.03,0.68 ± 0.02



Table for Accuracy (StandardScaler):

Mean Accuracy & Std (rows: Error Percentage; columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.99 ± 0.01,0.98 ± 0.01,0.97 ± 0.01,0.96 ± 0.01,0.95 ± 0.02
0.1,0.98 ± 0.01,0.97 ± 0.02,0.95 ± 0.02,0.94 ± 0.02,0.92 ± 0.02
0.15,0.97 ± 0.01,0.94 ± 0.02,0.92 ± 0.02,0.90 ± 0.03,0.88 ± 0.02
0.2,0.95 ± 0.01,0.92 ± 0.02,0.89 ± 0.02,0.86 ± 0.02,0.83 ± 0.03
0.3,0.95 ± 0.02,0.90 ± 0.02,0.86 ± 0.02,0.82 ± 0.02,0.78 ± 0.01
0.4,0.93 ± 0.01,0.88 ± 0.02,0.83 ± 0.03,0.78 ± 0.02,0.73 ± 0.02
0.5,0.93 ± 0.02,0.88 ± 0.02,0.82 ± 0.02,0.75 ± 0.02,0.68 ± 0.02



Table for Precision (MinMaxScaler):

Mean Precision & Std (rows: Error Percentage; columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.99 ± 0.01,0.98 ± 0.02,0.97 ± 0.03,0.97 ± 0.03,0.96 ± 0.03
0.1,0.99 ± 0.01,0.97 ± 0.02,0.96 ± 0.03,0.93 ± 0.03,0.93 ± 0.04
0.15,0.98 ± 0.02,0.94 ± 0.03,0.92 ± 0.03,0.90 ± 0.03,0.88 ± 0.02
0.2,0.95 ± 0.02,0.93 ± 0.03,0.89 ± 0.03,0.87 ± 0.04,0.84 ± 0.04
0.3,0.96 ± 0.03,0.90 ± 0.03,0.86 ± 0.03,0.82 ± 0.03,0.78 ± 0.02
0.4,0.92 ± 0.02,0.87 ± 0.02,0.83 ± 0.03,0.79 ± 0.01,0.74 ± 0.03
0.5,0.93 ± 0.03,0.87 ± 0.03,0.82 ± 0.04,0.77 ± 0.06,0.68 ± 0.03



Table for Precision (StandardScaler):

Mean Precision & Std (rows: Error Percentage; columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.99 ± 0.02,0.98 ± 0.02,0.97 ± 0.02,0.97 ± 0.03,0.96 ± 0.03
0.1,0.99 ± 0.01,0.97 ± 0.03,0.96 ± 0.03,0.94 ± 0.03,0.92 ± 0.03
0.15,0.98 ± 0.02,0.94 ± 0.03,0.93 ± 0.03,0.90 ± 0.03,0.89 ± 0.02
0.2,0.95 ± 0.02,0.93 ± 0.04,0.89 ± 0.03,0.87 ± 0.02,0.84 ± 0.04
0.3,0.95 ± 0.04,0.90 ± 0.03,0.86 ± 0.04,0.82 ± 0.04,0.78 ± 0.02
0.4,0.93 ± 0.01,0.87 ± 0.03,0.83 ± 0.04,0.79 ± 0.02,0.74 ± 0.03
0.5,0.93 ± 0.03,0.87 ± 0.03,0.83 ± 0.04,0.77 ± 0.04,0.69 ± 0.03



Table for Recall (MinMaxScaler):

Mean Recall & Std (rows: Error Percentage; columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.99 ± 0.01,0.98 ± 0.01,0.97 ± 0.02,0.96 ± 0.01,0.95 ± 0.01
0.1,0.97 ± 0.01,0.97 ± 0.01,0.95 ± 0.01,0.94 ± 0.01,0.92 ± 0.02
0.15,0.97 ± 0.02,0.95 ± 0.02,0.92 ± 0.02,0.90 ± 0.03,0.88 ± 0.02
0.2,0.94 ± 0.03,0.91 ± 0.02,0.89 ± 0.02,0.86 ± 0.02,0.84 ± 0.02
0.3,0.95 ± 0.02,0.91 ± 0.04,0.87 ± 0.02,0.83 ± 0.03,0.78 ± 0.03
0.4,0.94 ± 0.02,0.90 ± 0.02,0.85 ± 0.03,0.77 ± 0.03,0.72 ± 0.03
0.5,0.92 ± 0.01,0.89 ± 0.03,0.83 ± 0.05,0.75 ± 0.05,0.69 ± 0.04



Table for Recall (StandardScaler):

Mean Recall & Std (rows: Error Percentage; columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.99 ± 0.01,0.98 ± 0.01,0.97 ± 0.01,0.96 ± 0.01,0.95 ± 0.01
0.1,0.97 ± 0.01,0.97 ± 0.01,0.95 ± 0.02,0.94 ± 0.01,0.92 ± 0.01
0.15,0.96 ± 0.02,0.95 ± 0.02,0.92 ± 0.02,0.90 ± 0.03,0.87 ± 0.02
0.2,0.94 ± 0.03,0.91 ± 0.02,0.89 ± 0.02,0.86 ± 0.02,0.83 ± 0.02
0.3,0.94 ± 0.02,0.91 ± 0.03,0.87 ± 0.02,0.82 ± 0.03,0.78 ± 0.03
0.4,0.94 ± 0.02,0.90 ± 0.02,0.83 ± 0.03,0.77 ± 0.03,0.71 ± 0.02
0.5,0.92 ± 0.01,0.89 ± 0.03,0.82 ± 0.05,0.74 ± 0.05,0.67 ± 0.04



Table for F1 Score (MinMaxScaler):

Mean F1 Score & Std (rows: Error Percentage; columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.99 ± 0.01,0.98 ± 0.01,0.97 ± 0.02,0.96 ± 0.01,0.96 ± 0.01
0.1,0.98 ± 0.01,0.97 ± 0.01,0.95 ± 0.02,0.93 ± 0.02,0.92 ± 0.02
0.15,0.97 ± 0.01,0.94 ± 0.02,0.92 ± 0.02,0.90 ± 0.02,0.88 ± 0.01
0.2,0.94 ± 0.02,0.92 ± 0.02,0.89 ± 0.01,0.87 ± 0.02,0.84 ± 0.03
0.3,0.95 ± 0.01,0.91 ± 0.02,0.86 ± 0.02,0.82 ± 0.02,0.78 ± 0.01
0.4,0.93 ± 0.01,0.89 ± 0.02,0.84 ± 0.02,0.78 ± 0.02,0.73 ± 0.02
0.5,0.93 ± 0.02,0.88 ± 0.01,0.82 ± 0.02,0.76 ± 0.03,0.69 ± 0.02



Table for F1 Score (StandardScaler):

Mean F1 Score & Std (rows: Error Percentage; columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.99 ± 0.01,0.98 ± 0.01,0.97 ± 0.01,0.96 ± 0.01,0.95 ± 0.02
0.1,0.98 ± 0.01,0.97 ± 0.02,0.95 ± 0.02,0.94 ± 0.02,0.92 ± 0.02
0.15,0.97 ± 0.01,0.94 ± 0.02,0.92 ± 0.02,0.90 ± 0.03,0.88 ± 0.02
0.2,0.95 ± 0.01,0.92 ± 0.02,0.89 ± 0.02,0.87 ± 0.02,0.83 ± 0.03
0.3,0.95 ± 0.02,0.91 ± 0.02,0.87 ± 0.02,0.82 ± 0.02,0.78 ± 0.02
0.4,0.94 ± 0.01,0.89 ± 0.02,0.83 ± 0.02,0.78 ± 0.02,0.73 ± 0.02
0.5,0.93 ± 0.02,0.88 ± 0.02,0.82 ± 0.02,0.75 ± 0.03,0.68 ± 0.02
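A detail worth keeping in mind when comparing tables: the error-free cells summarise the ten splits with `DataFrame.std()`, while the error-scenario cells use `np.std`. The two functions have different defaults (pandas uses `ddof=1`, NumPy uses `ddof=0`), so they return slightly different values for the same data. The sketch below illustrates this with ten made-up accuracy values; they are not results from this notebook.

```python
import numpy as np
import pandas as pd

# Ten made-up accuracy values standing in for the ten random splits.
scores = [0.95, 0.93, 0.96, 0.94, 0.92, 0.95, 0.97, 0.93, 0.94, 0.96]

sample_std = pd.Series(scores).std()   # pandas default: ddof=1 (sample)
population_std = np.std(scores)        # NumPy default: ddof=0 (population)

# With n values the two differ by a factor of sqrt(n / (n - 1)).
ratio = sample_std / population_std
print(round(ratio, 4))  # 1.0541, i.e. sqrt(10 / 9)
```

With ten splits the sample estimate is about 5% larger, which can change the second decimal shown in the tables. Passing `ddof=1` to `np.std` (or `ddof=0` to `DataFrame.std`) would make the two kinds of cell consistent.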


The following code computes the mean and standard deviation of the metrics analysed in the study for the 10 error-injected datasets generated from the dataframe df2, in each of the 35 scenarios analysed (7 error percentages × 5 error counts). Both scaling methods are tested, since the results were perfect in every error-free case tested.

In [None]:
import pickle
from warnings import simplefilter

# ignore all warnings
simplefilter(action='ignore')

# Functions to save and load the state of the results_metrics list
def save_results(file_path, results):
    with open(file_path, 'wb') as f:
        pickle.dump(results, f)

def load_results(file_path):
    with open(file_path, 'rb') as f:
        return pickle.load(f)

# Load saved results, if any
results_file = "/content/drive/MyDrive/archive/results_metrics_rl2.pkl"
try:
    results_metrics = load_results(results_file)
except FileNotFoundError:
    results_metrics = []

# Parameter lists
error_percentages = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.5]
error_counts = [1, 2, 3, 4, 5]
random_states = [10, 23, 87, 41, 65, 12, 98, 34, 72, 19]

scalers = {
    "MinMaxScaler": MinMaxScaler(),
    "StandardScaler": StandardScaler()
}

X = df2.drop('Class', axis=1)
y = df2['Class']

for error_percentage, error_count in product(error_percentages, error_counts):
    for scaler_name, scaler in scalers.items():

        existing_result = [result for result in results_metrics if result[0] == error_percentage and result[1] == error_count and result[2] == scaler_name]
        if existing_result:
            continue

        # Lists that store the metrics of the 10 datasets
        row_results_metrics = {'accuracy': [], 'precision': [], 'recall': [], 'f1_score': []}

        for random_state in random_states:
            X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=random_state)

            X_test_error = introduce_errors(X_test.copy(), error_percentage, error_count, random_state)

            X_train_scaled = scaler.fit_transform(X_train)
            X_test_error_scaled = scaler.transform(X_test_error)

            logistic_model = LogisticRegression(random_state=10)

            param_grid = {
                'C': [0.1, 0.5, 1, 5, 10, 100],
                'tol': [1e-5, 1e-4, 1e-3],
                'solver': ['lbfgs', 'liblinear', 'saga', 'newton-cg', 'sag'],
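                'penalty': [None]  # Starts with None only
            }

            # NOTE: these loops do not tie 'penalty' to a solver; they only append
            # to the shared 'penalty' list (the loop variable is unused), so the grid
            # crosses every solver with every listed penalty, duplicates included.
            # Unsupported combinations fail to fit and GridSearchCV scores them as NaN.
            for solver in ['lbfgs', 'newton-cg', 'sag']:
                param_grid['penalty'].extend(['l2', None])
            for solver in ['liblinear', 'saga']:
                param_grid['penalty'].extend(['l1', 'l2', 'elasticnet', None])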

            grid_search = GridSearchCV(logistic_model, param_grid, cv=5, scoring='accuracy')
            grid_search.fit(X_train_scaled, y_train)

            best_model = grid_search.best_estimator_

            # Predicting on the test set with errors
            y_pred_error = best_model.predict(X_test_error_scaled)

            # Computing the metrics
            accuracy = accuracy_score(y_test, y_pred_error)
            precision = precision_score(y_test, y_pred_error)
            recall = recall_score(y_test, y_pred_error)
            f1 = f1_score(y_test, y_pred_error)

            # Appending the metrics to the lists
            row_results_metrics['accuracy'].append(accuracy)
            row_results_metrics['precision'].append(precision)
            row_results_metrics['recall'].append(recall)
            row_results_metrics['f1_score'].append(f1)

        # Computing the mean and standard deviation of the metrics
        mean_results_metrics = {metric: np.mean(values) for metric, values in row_results_metrics.items()}
        std_results_metrics = {metric: np.std(values) for metric, values in row_results_metrics.items()}
        results_metrics.append([error_percentage, error_count, scaler_name,
                                mean_results_metrics['accuracy'], std_results_metrics['accuracy'],
                                mean_results_metrics['precision'], std_results_metrics['precision'],
                                mean_results_metrics['recall'], std_results_metrics['recall'],
                                mean_results_metrics['f1_score'], std_results_metrics['f1_score']])

        # Save the results after each iteration
        save_results(results_file, results_metrics)

# Build the DataFrame with the results
results_metrics_df = pd.DataFrame(results_metrics, columns=['Error Percentage', 'Error Count', 'Scaler',
                                                             'Mean Accuracy', 'Std Accuracy',
                                                             'Mean Precision', 'Std Precision',
                                                             'Mean Recall', 'Std Recall',
                                                             'Mean F1 Score', 'Std F1 Score'])

# Combine each metric's mean and standard deviation into a single formatted column
metrics_columns = ['Accuracy', 'Precision', 'Recall', 'F1 Score']
for metric in metrics_columns:
    results_metrics_df[f'Mean {metric} & Std'] = results_metrics_df.apply(
        lambda row: f"{row[f'Mean {metric}']:.2f} ± {row[f'Std {metric}']:.2f}", axis=1
    )

# Drop the separate mean and standard deviation columns
results_metrics_df = results_metrics_df.drop(
    columns=[f'Mean {metric}' for metric in metrics_columns] + [f'Std {metric}' for metric in metrics_columns]
)

# Build one pivot table per metric and scaler
tables_metrics = {}
for metric in ['Accuracy', 'Precision', 'Recall', 'F1 Score']:
    for scaler_name in scalers.keys():
        table_metric = results_metrics_df[(results_metrics_df['Scaler'] == scaler_name)].pivot(index='Error Percentage',
                                                                                                columns='Error Count',
                                                                                                values=[f'Mean {metric} & Std'])
        tables_metrics[f'{metric} ({scaler_name})'] = table_metric

# Display the table for each metric and scaler
print("\nTabelas para cada métrica (Acurácia, Precisão, Recall, F1-Score) e cada tipo de escalonamento:")
for metric, table_metric in tables_metrics.items():
    print(f"\nTabela para {metric}:")
    display(table_metric)

# Save the DataFrame to a CSV file
path = "/content/drive/MyDrive/archive/"
results_metrics_df.to_csv(path + 'resultados_metrics_rl_2.csv', index=False)


Tabelas para cada métrica (Acurácia, Precisão, Recall, F1-Score) e cada tipo de escalonamento:

Tabela para Accuracy (MinMaxScaler):


Mean Accuracy & Std (rows: Error Percentage, columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.99 ± 0.01,0.97 ± 0.01,0.96 ± 0.02,0.96 ± 0.02,0.94 ± 0.02
0.1,0.98 ± 0.02,0.97 ± 0.02,0.95 ± 0.02,0.93 ± 0.02,0.92 ± 0.02
0.15,0.96 ± 0.01,0.94 ± 0.02,0.91 ± 0.02,0.89 ± 0.03,0.88 ± 0.03
0.2,0.94 ± 0.01,0.92 ± 0.02,0.89 ± 0.02,0.86 ± 0.02,0.83 ± 0.03
0.3,0.94 ± 0.02,0.90 ± 0.03,0.86 ± 0.02,0.82 ± 0.02,0.77 ± 0.02
0.4,0.93 ± 0.01,0.88 ± 0.02,0.83 ± 0.03,0.78 ± 0.02,0.73 ± 0.02
0.5,0.92 ± 0.02,0.88 ± 0.02,0.82 ± 0.01,0.75 ± 0.03,0.70 ± 0.02



Tabela para Accuracy (StandardScaler):


Mean Accuracy & Std (rows: Error Percentage, columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.98 ± 0.02,0.97 ± 0.02,0.96 ± 0.02,0.96 ± 0.02,0.94 ± 0.02
0.1,0.98 ± 0.02,0.97 ± 0.02,0.95 ± 0.02,0.93 ± 0.02,0.92 ± 0.02
0.15,0.96 ± 0.01,0.94 ± 0.01,0.91 ± 0.02,0.89 ± 0.02,0.87 ± 0.02
0.2,0.94 ± 0.02,0.92 ± 0.02,0.88 ± 0.02,0.86 ± 0.02,0.82 ± 0.03
0.3,0.94 ± 0.02,0.89 ± 0.03,0.86 ± 0.02,0.82 ± 0.03,0.76 ± 0.01
0.4,0.93 ± 0.01,0.88 ± 0.02,0.83 ± 0.03,0.78 ± 0.02,0.73 ± 0.03
0.5,0.92 ± 0.02,0.87 ± 0.02,0.82 ± 0.02,0.75 ± 0.02,0.69 ± 0.02



Tabela para Precision (MinMaxScaler):


Mean Precision & Std (rows: Error Percentage, columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.98 ± 0.02,0.97 ± 0.02,0.95 ± 0.03,0.95 ± 0.03,0.94 ± 0.03
0.1,0.99 ± 0.02,0.97 ± 0.02,0.95 ± 0.03,0.93 ± 0.03,0.92 ± 0.03
0.15,0.97 ± 0.02,0.94 ± 0.03,0.91 ± 0.03,0.89 ± 0.04,0.88 ± 0.03
0.2,0.95 ± 0.02,0.93 ± 0.03,0.89 ± 0.03,0.86 ± 0.03,0.84 ± 0.04
0.3,0.94 ± 0.03,0.89 ± 0.05,0.85 ± 0.03,0.82 ± 0.04,0.77 ± 0.02
0.4,0.93 ± 0.02,0.87 ± 0.03,0.83 ± 0.04,0.78 ± 0.02,0.74 ± 0.03
0.5,0.92 ± 0.04,0.87 ± 0.04,0.83 ± 0.04,0.76 ± 0.04,0.71 ± 0.04



Tabela para Precision (StandardScaler):


Mean Precision & Std (rows: Error Percentage, columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.98 ± 0.02,0.97 ± 0.02,0.96 ± 0.03,0.95 ± 0.03,0.94 ± 0.04
0.1,0.99 ± 0.02,0.97 ± 0.03,0.95 ± 0.03,0.93 ± 0.03,0.92 ± 0.03
0.15,0.96 ± 0.02,0.93 ± 0.02,0.91 ± 0.03,0.89 ± 0.03,0.87 ± 0.02
0.2,0.95 ± 0.03,0.93 ± 0.03,0.88 ± 0.03,0.87 ± 0.03,0.83 ± 0.04
0.3,0.94 ± 0.03,0.89 ± 0.04,0.86 ± 0.04,0.83 ± 0.04,0.76 ± 0.02
0.4,0.92 ± 0.02,0.87 ± 0.03,0.83 ± 0.04,0.79 ± 0.03,0.74 ± 0.03
0.5,0.92 ± 0.03,0.86 ± 0.03,0.82 ± 0.03,0.76 ± 0.04,0.71 ± 0.03



Tabela para Recall (MinMaxScaler):


Mean Recall & Std (rows: Error Percentage, columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.99 ± 0.01,0.98 ± 0.01,0.97 ± 0.02,0.96 ± 0.01,0.95 ± 0.01
0.1,0.97 ± 0.02,0.97 ± 0.01,0.95 ± 0.01,0.94 ± 0.01,0.92 ± 0.01
0.15,0.96 ± 0.01,0.95 ± 0.02,0.92 ± 0.02,0.90 ± 0.03,0.88 ± 0.03
0.2,0.93 ± 0.02,0.91 ± 0.02,0.89 ± 0.03,0.86 ± 0.03,0.84 ± 0.03
0.3,0.94 ± 0.02,0.91 ± 0.04,0.87 ± 0.03,0.83 ± 0.03,0.78 ± 0.03
0.4,0.94 ± 0.02,0.90 ± 0.02,0.84 ± 0.03,0.77 ± 0.03,0.72 ± 0.02
0.5,0.92 ± 0.02,0.89 ± 0.03,0.82 ± 0.05,0.74 ± 0.04,0.69 ± 0.04



Tabela para Recall (StandardScaler):


Mean Recall & Std (rows: Error Percentage, columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.99 ± 0.01,0.98 ± 0.01,0.96 ± 0.02,0.96 ± 0.02,0.95 ± 0.02
0.1,0.97 ± 0.02,0.97 ± 0.01,0.95 ± 0.01,0.93 ± 0.01,0.92 ± 0.01
0.15,0.97 ± 0.01,0.94 ± 0.02,0.92 ± 0.02,0.90 ± 0.02,0.87 ± 0.02
0.2,0.94 ± 0.02,0.91 ± 0.02,0.89 ± 0.02,0.86 ± 0.03,0.82 ± 0.02
0.3,0.94 ± 0.02,0.90 ± 0.04,0.86 ± 0.02,0.82 ± 0.03,0.77 ± 0.03
0.4,0.94 ± 0.02,0.90 ± 0.02,0.83 ± 0.03,0.77 ± 0.03,0.71 ± 0.02
0.5,0.92 ± 0.02,0.89 ± 0.03,0.82 ± 0.05,0.75 ± 0.04,0.67 ± 0.04



Tabela para F1 Score (MinMaxScaler):


Mean F1 Score & Std (rows: Error Percentage, columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.99 ± 0.01,0.97 ± 0.01,0.96 ± 0.02,0.96 ± 0.02,0.95 ± 0.01
0.1,0.98 ± 0.02,0.97 ± 0.02,0.95 ± 0.02,0.93 ± 0.02,0.92 ± 0.02
0.15,0.97 ± 0.01,0.94 ± 0.02,0.91 ± 0.02,0.90 ± 0.03,0.88 ± 0.03
0.2,0.94 ± 0.01,0.92 ± 0.02,0.89 ± 0.02,0.86 ± 0.02,0.84 ± 0.03
0.3,0.94 ± 0.02,0.90 ± 0.02,0.86 ± 0.01,0.82 ± 0.02,0.78 ± 0.02
0.4,0.93 ± 0.01,0.88 ± 0.02,0.83 ± 0.02,0.78 ± 0.02,0.73 ± 0.02
0.5,0.92 ± 0.02,0.88 ± 0.02,0.82 ± 0.02,0.75 ± 0.03,0.70 ± 0.02



Tabela para F1 Score (StandardScaler):


Mean F1 Score & Std (rows: Error Percentage, columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.98 ± 0.01,0.98 ± 0.02,0.96 ± 0.02,0.96 ± 0.02,0.94 ± 0.02
0.1,0.98 ± 0.02,0.97 ± 0.02,0.95 ± 0.02,0.93 ± 0.02,0.92 ± 0.02
0.15,0.96 ± 0.01,0.94 ± 0.01,0.91 ± 0.02,0.89 ± 0.02,0.87 ± 0.02
0.2,0.94 ± 0.02,0.92 ± 0.02,0.88 ± 0.01,0.86 ± 0.02,0.82 ± 0.03
0.3,0.94 ± 0.02,0.89 ± 0.02,0.86 ± 0.02,0.82 ± 0.02,0.77 ± 0.01
0.4,0.93 ± 0.01,0.89 ± 0.02,0.83 ± 0.02,0.78 ± 0.03,0.72 ± 0.02
0.5,0.92 ± 0.02,0.88 ± 0.02,0.82 ± 0.02,0.75 ± 0.02,0.69 ± 0.03
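
The logistic regression grid above appears to combine one shared penalty list with every solver, although each scikit-learn solver accepts only some penalties. As a minimal sketch of one alternative (not the notebook's code), `GridSearchCV` also accepts a list of dictionaries, so each solver can be given only the penalties it supports. The `C` values and the `l1_ratio` value below are assumptions chosen for illustration, and `None` as a penalty assumes a recent scikit-learn release.

```python
from itertools import product

# Penalties supported by each LogisticRegression solver in scikit-learn.
SOLVER_PENALTIES = {
    'lbfgs': ['l2', None],
    'newton-cg': ['l2', None],
    'sag': ['l2', None],
    'liblinear': ['l1', 'l2'],
    'saga': ['l1', 'l2', None],
}

C_VALUES = [0.1, 1, 10]  # assumed values, for illustration only


def build_param_grid():
    """Return a list of sub-grids, one per solver, holding only valid penalties."""
    grid = [
        {'solver': [solver], 'penalty': penalties, 'C': C_VALUES}
        for solver, penalties in SOLVER_PENALTIES.items()
    ]
    # elasticnet is only available with saga and needs an l1_ratio (0.5 is assumed).
    grid.append({'solver': ['saga'], 'penalty': ['elasticnet'],
                 'C': C_VALUES, 'l1_ratio': [0.5]})
    return grid


def count_candidates(grid):
    """Number of parameter combinations a grid search would evaluate."""
    return sum(len(list(product(*sub.values()))) for sub in grid)


param_grid = build_param_grid()
print(count_candidates(param_grid))
```

A list built this way can be passed as the `param_grid` argument of `GridSearchCV`, so no fit is spent on combinations that cannot succeed.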


##SVM with a Linear Kernel

The cell below computes the mean and standard deviation of the metrics analysed in the study over the 10 datasets generated from the dataframe df (one train/test split per random seed), testing different types of scaling.
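
As background for the cell that follows, here is a minimal pure-Python sketch of what the two scalers compute on a single feature. It is an illustration, not the scikit-learn implementation; the helper names and sample values are hypothetical. In both cases the statistics come from the training split only and are then reused on the test split, mirroring `fit_transform` followed by `transform`.

```python
from statistics import mean, pstdev


def fit_min_max(train):
    """Learn min and max on the training values; return the scaling function."""
    lo, hi = min(train), max(train)
    span = (hi - lo) or 1.0  # guard against a constant column
    return lambda x: (x - lo) / span


def fit_standard(train):
    """Learn mean and population standard deviation on the training values."""
    mu = mean(train)
    sigma = pstdev(train) or 1.0  # StandardScaler uses the population form (ddof=0)
    return lambda x: (x - mu) / sigma


train_age = [4, 5, 7, 11]  # hypothetical training values
test_age = [6, 12]         # 12 lies outside the training range

min_max = fit_min_max(train_age)
standard = fit_standard(train_age)

print([round(min_max(x), 3) for x in test_age])
print([round(standard(x), 3) for x in test_age])
```

Note that a test value outside the training range maps outside [0, 1] under min-max scaling, which can happen once errors are injected into the test set.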

In [None]:
from warnings import simplefilter

# ignore all warnings
simplefilter(action='ignore')

X = df.drop('Class', axis=1)
y = df['Class']

scalers = {
    "MinMaxScaler": MinMaxScaler(),
    "StandardScaler": StandardScaler()
}

random_states = [10, 23, 87, 41, 65, 12, 98, 34, 72, 19]

for scaler_name, scaler in scalers.items():

    # Lists that collect each metric across splits
    accuracy_list = []
    precision_list = []
    recall_list = []
    f1_list = []

    for random_state in random_states:
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=random_state)

        X_train_scaled = scaler.fit_transform(X_train)
        X_test_scaled = scaler.transform(X_test)

        svm_model = SVC(kernel='linear', random_state=10)

        param_grid = {
            'C': [0.1, 0.5, 1, 10, 100],
            'tol': [1e-5, 1e-4, 1e-3],
        }

        grid_search = GridSearchCV(svm_model, param_grid, cv=5, scoring='accuracy')
        grid_search.fit(X_train_scaled, y_train)

        best_model = grid_search.best_estimator_

        y_pred = best_model.predict(X_test_scaled)

        accuracy = accuracy_score(y_test, y_pred)
        precision = precision_score(y_test, y_pred)
        recall = recall_score(y_test, y_pred)
        f1 = f1_score(y_test, y_pred)

        # Append the metrics to the lists
        accuracy_list.append(accuracy)
        precision_list.append(precision)
        recall_list.append(recall)
        f1_list.append(f1)

    # Compute mean and standard deviation
    metrics_dict = {
        'Acurácia': accuracy_list,
        'Precisão': precision_list,
        'Recall': recall_list,
        'F1 Score': f1_list
    }

    metrics_df = pd.DataFrame(metrics_dict)
    metrics_mean = metrics_df.mean()
    metrics_std = metrics_df.std()

    # Print mean and standard deviation
    print(f"Escalonamento: {scaler_name}")
    for metric in metrics_mean.index:
        print(f"{metric}: {metrics_mean[metric]:.2f} ± {metrics_std[metric]:.2f}")
    print("\n")

Escalonamento: MinMaxScaler
Acurácia: 1.00 ± 0.00
Precisão: 1.00 ± 0.00
Recall: 1.00 ± 0.00
F1 Score: 1.00 ± 0.00


Escalonamento: StandardScaler
Acurácia: 1.00 ± 0.00
Precisão: 1.00 ± 0.00
Recall: 1.00 ± 0.00
F1 Score: 1.00 ± 0.00
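
One detail matters when comparing these summaries with the error-scenario tables: `DataFrame.std()` defaults to the sample standard deviation (`ddof=1`), whereas `np.std`, used in the cells that inject errors, defaults to the population form (`ddof=0`). The sketch below uses the standard-library equivalents on hypothetical accuracy values to show the size of the gap for 10 runs.

```python
from statistics import pstdev, stdev

# Hypothetical accuracies from 10 train/test splits (not results from the notebook)
accuracies = [0.94, 0.96, 0.95, 0.93, 0.97, 0.95, 0.94, 0.96, 0.92, 0.98]

sample_std = stdev(accuracies)       # divides by n - 1, like DataFrame.std()
population_std = pstdev(accuracies)  # divides by n, like np.std()

print(f"sample:     {sample_std:.4f}")
print(f"population: {population_std:.4f}")
print(f"ratio:      {sample_std / population_std:.4f}")  # equals sqrt(n / (n - 1))
```

With 10 runs the two estimates differ by roughly 5%, which rarely changes a value rounded to two decimals. Passing `ddof=1` to `np.std` would make the two kinds of cell consistent.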




The cell below computes the mean and standard deviation of the metrics analysed in the study over the 10 datasets generated from the dataframe df2 (one train/test split per random seed), testing different types of scaling.

In [None]:
from warnings import simplefilter

# ignore all warnings
simplefilter(action='ignore')

X = df2.drop('Class', axis=1)
y = df2['Class']

scalers = {
    "MinMaxScaler": MinMaxScaler(),
    "StandardScaler": StandardScaler()
}

random_states = [10, 23, 87, 41, 65, 12, 98, 34, 72, 19]

for scaler_name, scaler in scalers.items():

    # Lists that collect each metric across splits
    accuracy_list = []
    precision_list = []
    recall_list = []
    f1_list = []

    for random_state in random_states:
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=random_state)

        X_train_scaled = scaler.fit_transform(X_train)
        X_test_scaled = scaler.transform(X_test)

        svm_model = SVC(kernel='linear', random_state=10)

        param_grid = {
            'C': [0.1, 0.5, 1, 10, 100],
            'tol': [1e-5, 1e-4, 1e-3],
        }

        grid_search = GridSearchCV(svm_model, param_grid, cv=5, scoring='accuracy')
        grid_search.fit(X_train_scaled, y_train)

        best_model = grid_search.best_estimator_

        y_pred = best_model.predict(X_test_scaled)

        accuracy = accuracy_score(y_test, y_pred)
        precision = precision_score(y_test, y_pred)
        recall = recall_score(y_test, y_pred)
        f1 = f1_score(y_test, y_pred)

        # Append the metrics to the lists
        accuracy_list.append(accuracy)
        precision_list.append(precision)
        recall_list.append(recall)
        f1_list.append(f1)

    # Compute mean and standard deviation
    metrics_dict = {
        'Acurácia': accuracy_list,
        'Precisão': precision_list,
        'Recall': recall_list,
        'F1 Score': f1_list
    }

    metrics_df = pd.DataFrame(metrics_dict)
    metrics_mean = metrics_df.mean()
    metrics_std = metrics_df.std()

    # Print mean and standard deviation
    print(f"Escalonamento: {scaler_name}")
    for metric in metrics_mean.index:
        print(f"{metric}: {metrics_mean[metric]:.2f} ± {metrics_std[metric]:.2f}")
    print("\n")

Escalonamento: MinMaxScaler
Acurácia: 1.00 ± 0.00
Precisão: 1.00 ± 0.00
Recall: 1.00 ± 0.00
F1 Score: 1.00 ± 0.00


Escalonamento: StandardScaler
Acurácia: 1.00 ± 0.00
Precisão: 1.00 ± 0.00
Recall: 1.00 ± 0.00
F1 Score: 1.00 ± 0.00
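
For reference, the four reported metrics can all be derived from the confusion-matrix counts. The sketch below is a plain-Python illustration with hypothetical counts; the notebook itself relies on the scikit-learn functions `accuracy_score`, `precision_score`, `recall_score` and `f1_score`.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 for the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {'accuracy': accuracy, 'precision': precision,
            'recall': recall, 'f1': f1}


# Hypothetical counts, not taken from the notebook's results
metrics = classification_metrics(tp=70, fp=5, fn=8, tn=70)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

A score of 1.00 on all four metrics, as printed above, corresponds to `fp == 0` and `fn == 0` on every split.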




The cell below computes the mean and standard deviation of the metrics analysed in the study over the 10 error-injected datasets generated from the dataframe df, for each of the 35 scenarios analysed (7 error percentages × 5 error counts). Different types of scaling are tested, since the results were perfect in every error-free case tested.

In [None]:
import pickle
from warnings import simplefilter

# ignore all warnings
simplefilter(action='ignore')

# Helpers to save and load the state of the results_metrics list
def save_results(file_path, results):
    with open(file_path, 'wb') as f:
        pickle.dump(results, f)

def load_results(file_path):
    with open(file_path, 'rb') as f:
        return pickle.load(f)

# Load previously saved results, if any
results_file = "/content/drive/MyDrive/archive/results_metrics_svm.pkl"
try:
    results_metrics = load_results(results_file)
except FileNotFoundError:
    results_metrics = []

# Scenario parameters: error percentages, error counts, and split seeds
error_percentages = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.5]
error_counts = [1, 2, 3, 4, 5]
random_states = [10, 23, 87, 41, 65, 12, 98, 34, 72, 19]

scalers = {
    "MinMaxScaler": MinMaxScaler(),
    "StandardScaler": StandardScaler()
}

X = df.drop('Class', axis=1)
y = df['Class']

for error_percentage, error_count in product(error_percentages, error_counts):
    for scaler_name, scaler in scalers.items():

        existing_result = [result for result in results_metrics if result[0] == error_percentage and result[1] == error_count and result[2] == scaler_name]
        if existing_result:
            continue

        # Lists that collect the metrics of the 10 datasets
        row_results_metrics = {'accuracy': [], 'precision': [], 'recall': [], 'f1_score': []}

        for random_state in random_states:
            X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=random_state)

            X_test_error = introduce_errors(X_test.copy(), error_percentage, error_count, random_state)

            X_train_scaled = scaler.fit_transform(X_train)
            X_test_error_scaled = scaler.transform(X_test_error)

            # SVM model with a linear kernel
            svm_model = SVC(kernel='linear', random_state=10)

            # Hyperparameter grid
            param_grid = {
                'C': [0.1, 0.5, 1, 10, 100],
                'tol': [1e-5, 1e-4, 1e-3],
            }

            # Grid search
            grid_search = GridSearchCV(svm_model, param_grid, cv=5, scoring='accuracy')
            grid_search.fit(X_train_scaled, y_train)

            best_model = grid_search.best_estimator_

            # Predict on the test set with injected errors
            y_pred_error = best_model.predict(X_test_error_scaled)

            # Compute the metrics
            accuracy = accuracy_score(y_test, y_pred_error)
            precision = precision_score(y_test, y_pred_error)
            recall = recall_score(y_test, y_pred_error)
            f1 = f1_score(y_test, y_pred_error)

            # Append the metrics to the per-scenario lists
            row_results_metrics['accuracy'].append(accuracy)
            row_results_metrics['precision'].append(precision)
            row_results_metrics['recall'].append(recall)
            row_results_metrics['f1_score'].append(f1)

        # Mean and standard deviation of each metric over the 10 splits
        mean_results_metrics = {metric: np.mean(values) for metric, values in row_results_metrics.items()}
        std_results_metrics = {metric: np.std(values) for metric, values in row_results_metrics.items()}
        results_metrics.append([error_percentage, error_count, scaler_name,
                                mean_results_metrics['accuracy'], std_results_metrics['accuracy'],
                                mean_results_metrics['precision'], std_results_metrics['precision'],
                                mean_results_metrics['recall'], std_results_metrics['recall'],
                                mean_results_metrics['f1_score'], std_results_metrics['f1_score']])

        # Save the results after each scenario
        save_results(results_file, results_metrics)

# Build the DataFrame with the results
results_metrics_df = pd.DataFrame(results_metrics, columns=['Error Percentage', 'Error Count', 'Scaler',
                                                             'Mean Accuracy', 'Std Accuracy',
                                                             'Mean Precision', 'Std Precision',
                                                             'Mean Recall', 'Std Recall',
                                                             'Mean F1 Score', 'Std F1 Score'])

# Combine each metric's mean and standard deviation into a single formatted column
metrics_columns = ['Accuracy', 'Precision', 'Recall', 'F1 Score']
for metric in metrics_columns:
    results_metrics_df[f'Mean {metric} & Std'] = results_metrics_df.apply(
        lambda row: f"{row[f'Mean {metric}']:.2f} ± {row[f'Std {metric}']:.2f}", axis=1
    )

# Drop the separate mean and standard deviation columns
results_metrics_df = results_metrics_df.drop(
    columns=[f'Mean {metric}' for metric in metrics_columns] + [f'Std {metric}' for metric in metrics_columns]
)

# Build one pivot table per metric and scaler
tables_metrics = {}
for metric in ['Accuracy', 'Precision', 'Recall', 'F1 Score']:
    for scaler_name in scalers.keys():
        table_metric = results_metrics_df[(results_metrics_df['Scaler'] == scaler_name)].pivot(index='Error Percentage',
                                                                                                columns='Error Count',
                                                                                                values=[f'Mean {metric} & Std'])
        tables_metrics[f'{metric} ({scaler_name})'] = table_metric

# Display the table for each metric and scaler
print("\nTabelas para cada métrica (Acurácia, Precisão, Recall, F1-Score) e cada tipo de escalonamento:")
for metric, table_metric in tables_metrics.items():
    print(f"\nTabela para {metric}:")
    display(table_metric)

path = "/content/drive/MyDrive/archive/"

results_metrics_df.to_csv(path + 'resultados_metrics_svm.csv', index=False)


Tabelas para cada métrica (Acurácia, Precisão, Recall, F1-Score) e tipo de escalonamento:

Tabela para Accuracy (MinMaxScaler):


Mean Accuracy & Std (rows: Error Percentage, columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.99 ± 0.01,0.99 ± 0.01,0.98 ± 0.01,0.98 ± 0.01,0.97 ± 0.01
0.1,0.98 ± 0.01,0.98 ± 0.01,0.96 ± 0.01,0.94 ± 0.02,0.93 ± 0.02
0.15,0.97 ± 0.02,0.96 ± 0.01,0.93 ± 0.02,0.90 ± 0.02,0.89 ± 0.02
0.2,0.95 ± 0.01,0.94 ± 0.02,0.90 ± 0.02,0.87 ± 0.03,0.85 ± 0.02
0.3,0.94 ± 0.02,0.91 ± 0.02,0.85 ± 0.02,0.82 ± 0.02,0.79 ± 0.02
0.4,0.93 ± 0.02,0.89 ± 0.03,0.83 ± 0.03,0.78 ± 0.02,0.74 ± 0.02
0.5,0.93 ± 0.02,0.88 ± 0.01,0.81 ± 0.02,0.75 ± 0.03,0.69 ± 0.01



Tabela para Accuracy (StandardScaler):


Mean Accuracy & Std (rows: Error Percentage, columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.99 ± 0.01,0.99 ± 0.01,0.98 ± 0.01,0.98 ± 0.01,0.97 ± 0.01
0.1,0.98 ± 0.01,0.98 ± 0.01,0.96 ± 0.02,0.95 ± 0.02,0.93 ± 0.01
0.15,0.97 ± 0.02,0.97 ± 0.01,0.93 ± 0.02,0.90 ± 0.02,0.89 ± 0.01
0.2,0.95 ± 0.01,0.94 ± 0.01,0.90 ± 0.02,0.87 ± 0.02,0.84 ± 0.02
0.3,0.94 ± 0.02,0.91 ± 0.02,0.85 ± 0.02,0.82 ± 0.02,0.78 ± 0.02
0.4,0.93 ± 0.02,0.89 ± 0.02,0.83 ± 0.02,0.78 ± 0.02,0.73 ± 0.02
0.5,0.92 ± 0.02,0.88 ± 0.02,0.81 ± 0.01,0.75 ± 0.03,0.69 ± 0.02



Tabela para Precision (MinMaxScaler):


Mean Precision & Std (rows: Error Percentage, columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.99 ± 0.01,0.99 ± 0.01,0.99 ± 0.01,0.99 ± 0.01,0.99 ± 0.01
0.1,0.99 ± 0.01,0.99 ± 0.01,0.97 ± 0.02,0.96 ± 0.04,0.95 ± 0.03
0.15,0.98 ± 0.03,0.99 ± 0.02,0.95 ± 0.04,0.91 ± 0.02,0.90 ± 0.03
0.2,0.98 ± 0.02,0.97 ± 0.03,0.92 ± 0.03,0.88 ± 0.04,0.86 ± 0.02
0.3,0.95 ± 0.04,0.92 ± 0.04,0.86 ± 0.03,0.83 ± 0.03,0.81 ± 0.03
0.4,0.92 ± 0.02,0.89 ± 0.04,0.82 ± 0.04,0.79 ± 0.02,0.76 ± 0.03
0.5,0.94 ± 0.02,0.89 ± 0.03,0.82 ± 0.04,0.76 ± 0.04,0.71 ± 0.02



Tabela para Precision (StandardScaler):


Mean Precision & Std (rows: Error Percentage, columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.99 ± 0.01,0.99 ± 0.01,0.99 ± 0.01,0.99 ± 0.01,0.99 ± 0.01
0.1,0.99 ± 0.01,0.99 ± 0.01,0.98 ± 0.02,0.97 ± 0.03,0.95 ± 0.02
0.15,0.98 ± 0.02,0.99 ± 0.01,0.95 ± 0.04,0.90 ± 0.02,0.90 ± 0.02
0.2,0.98 ± 0.01,0.98 ± 0.02,0.92 ± 0.04,0.89 ± 0.04,0.85 ± 0.04
0.3,0.96 ± 0.04,0.93 ± 0.05,0.86 ± 0.03,0.83 ± 0.03,0.79 ± 0.03
0.4,0.92 ± 0.02,0.89 ± 0.04,0.83 ± 0.03,0.79 ± 0.02,0.76 ± 0.02
0.5,0.93 ± 0.03,0.89 ± 0.02,0.82 ± 0.04,0.76 ± 0.04,0.71 ± 0.03



Tabela para Recall (MinMaxScaler):


Mean Recall & Std (rows: Error Percentage, columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.99 ± 0.01,0.98 ± 0.01,0.97 ± 0.01,0.96 ± 0.01,0.95 ± 0.01
0.1,0.97 ± 0.01,0.97 ± 0.01,0.94 ± 0.02,0.93 ± 0.01,0.92 ± 0.02
0.15,0.96 ± 0.02,0.94 ± 0.03,0.91 ± 0.02,0.89 ± 0.03,0.87 ± 0.02
0.2,0.93 ± 0.03,0.91 ± 0.01,0.88 ± 0.03,0.86 ± 0.03,0.83 ± 0.02
0.3,0.94 ± 0.03,0.90 ± 0.04,0.84 ± 0.02,0.81 ± 0.03,0.76 ± 0.03
0.4,0.94 ± 0.02,0.89 ± 0.02,0.84 ± 0.03,0.76 ± 0.03,0.70 ± 0.02
0.5,0.92 ± 0.02,0.88 ± 0.03,0.82 ± 0.05,0.74 ± 0.05,0.65 ± 0.04



Tabela para Recall (StandardScaler):


Mean Recall & Std (rows: Error Percentage, columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.99 ± 0.01,0.98 ± 0.01,0.97 ± 0.01,0.96 ± 0.01,0.95 ± 0.01
0.1,0.97 ± 0.01,0.97 ± 0.01,0.94 ± 0.02,0.93 ± 0.01,0.91 ± 0.02
0.15,0.96 ± 0.01,0.94 ± 0.03,0.91 ± 0.03,0.89 ± 0.03,0.87 ± 0.02
0.2,0.93 ± 0.03,0.90 ± 0.01,0.88 ± 0.02,0.86 ± 0.02,0.82 ± 0.02
0.3,0.93 ± 0.03,0.90 ± 0.04,0.84 ± 0.02,0.82 ± 0.02,0.76 ± 0.03
0.4,0.94 ± 0.02,0.88 ± 0.02,0.84 ± 0.03,0.77 ± 0.04,0.70 ± 0.03
0.5,0.92 ± 0.01,0.87 ± 0.03,0.81 ± 0.05,0.74 ± 0.06,0.65 ± 0.03



Tabela para F1 Score (MinMaxScaler):


Mean F1 Score & Std (rows: Error Percentage, columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.99 ± 0.01,0.99 ± 0.01,0.98 ± 0.01,0.98 ± 0.01,0.97 ± 0.01
0.1,0.98 ± 0.01,0.98 ± 0.01,0.96 ± 0.01,0.94 ± 0.02,0.93 ± 0.02
0.15,0.97 ± 0.02,0.96 ± 0.01,0.93 ± 0.02,0.90 ± 0.02,0.89 ± 0.02
0.2,0.95 ± 0.01,0.94 ± 0.02,0.90 ± 0.02,0.87 ± 0.02,0.84 ± 0.02
0.3,0.94 ± 0.02,0.91 ± 0.02,0.85 ± 0.02,0.82 ± 0.02,0.78 ± 0.02
0.4,0.93 ± 0.02,0.89 ± 0.02,0.83 ± 0.02,0.77 ± 0.02,0.73 ± 0.02
0.5,0.93 ± 0.02,0.88 ± 0.01,0.82 ± 0.02,0.75 ± 0.03,0.68 ± 0.02



Tabela para F1 Score (StandardScaler):


Mean F1 Score & Std (rows: Error Percentage, columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.99 ± 0.01,0.99 ± 0.01,0.98 ± 0.01,0.98 ± 0.01,0.97 ± 0.01
0.1,0.98 ± 0.01,0.98 ± 0.01,0.96 ± 0.01,0.95 ± 0.02,0.93 ± 0.01
0.15,0.97 ± 0.02,0.97 ± 0.01,0.93 ± 0.02,0.90 ± 0.02,0.89 ± 0.01
0.2,0.95 ± 0.02,0.94 ± 0.01,0.90 ± 0.02,0.87 ± 0.02,0.84 ± 0.02
0.3,0.94 ± 0.02,0.91 ± 0.02,0.85 ± 0.02,0.82 ± 0.01,0.78 ± 0.01
0.4,0.93 ± 0.02,0.89 ± 0.02,0.83 ± 0.02,0.78 ± 0.02,0.73 ± 0.01
0.5,0.92 ± 0.02,0.88 ± 0.02,0.81 ± 0.01,0.75 ± 0.03,0.68 ± 0.02
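
Each table above has 7 rows and 5 columns because the cell iterates over 7 error percentages × 5 error counts = 35 scenarios, once per scaler, and pickles the result list after every scenario so that an interrupted Colab session can resume where it stopped. A minimal sketch of that resume pattern follows; `evaluate_scenario` is a hypothetical stand-in for the training and scoring step, and the temporary file path is an assumption.

```python
import os
import pickle
import tempfile
from itertools import product

error_percentages = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.50]
error_counts = [1, 2, 3, 4, 5]
scaler_names = ["MinMaxScaler", "StandardScaler"]


def evaluate_scenario(error_percentage, error_count, scaler_name):
    """Hypothetical placeholder for fitting the model and scoring it."""
    return 0.0


def run(results_file):
    """Run every scenario not yet stored; return how many were computed."""
    try:
        with open(results_file, 'rb') as f:
            results = pickle.load(f)
    except FileNotFoundError:
        results = []
    done = {(r[0], r[1], r[2]) for r in results}  # scenarios already on disk
    computed = 0
    for pct, count in product(error_percentages, error_counts):
        for name in scaler_names:
            if (pct, count, name) in done:
                continue
            results.append([pct, count, name, evaluate_scenario(pct, count, name)])
            with open(results_file, 'wb') as f:
                pickle.dump(results, f)  # checkpoint after every scenario
            computed += 1
    return computed


results_file = os.path.join(tempfile.mkdtemp(), "results_demo.pkl")
first_run = run(results_file)
second_run = run(results_file)
print(first_run, second_run)
```

One consequence of this pattern is that the pickle file has to be deleted whenever the model or the grid changes, otherwise rows from the earlier configuration are silently reused.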


The cell below computes the mean and standard deviation of the metrics analysed in the study over the 10 error-injected datasets generated from the dataframe df2, for each of the 35 scenarios analysed (7 error percentages × 5 error counts). Different types of scaling are tested, since the results were perfect in every error-free case tested.

In [None]:
import pickle
from warnings import simplefilter

# ignore all warnings
simplefilter(action='ignore')

# Helpers to save and load the state of the results_metrics list
def save_results(file_path, results):
    with open(file_path, 'wb') as f:
        pickle.dump(results, f)

def load_results(file_path):
    with open(file_path, 'rb') as f:
        return pickle.load(f)

# Load previously saved results, if any
results_file = "/content/drive/MyDrive/archive/results_metrics_svm_2.pkl"
try:
    results_metrics = load_results(results_file)
except FileNotFoundError:
    results_metrics = []

# Scenario parameters: error percentages, error counts, and split seeds
error_percentages = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.5]
error_counts = [1, 2, 3, 4, 5]
random_states = [10, 23, 87, 41, 65, 12, 98, 34, 72, 19]

scalers = {
    "MinMaxScaler": MinMaxScaler(),
    "StandardScaler": StandardScaler()
}

X = df2.drop('Class', axis=1)
y = df2['Class']

for error_percentage, error_count in product(error_percentages, error_counts):
    for scaler_name, scaler in scalers.items():

        existing_result = [result for result in results_metrics if result[0] == error_percentage and result[1] == error_count and result[2] == scaler_name]
        if existing_result:
            continue

        # Lists that collect the metrics of the 10 datasets
        row_results_metrics = {'accuracy': [], 'precision': [], 'recall': [], 'f1_score': []}

        for random_state in random_states:
            X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=random_state)

            X_test_error = introduce_errors(X_test.copy(), error_percentage, error_count, random_state)

            X_train_scaled = scaler.fit_transform(X_train)
            X_test_error_scaled = scaler.transform(X_test_error)

            # SVM model with a linear kernel
            svm_model = SVC(kernel='linear', random_state=10)

            # Hyperparameter grid
            param_grid = {
                'C': [0.1, 0.5, 1, 10, 100],
                'tol': [1e-5, 1e-4, 1e-3],
            }

            # Grid search
            grid_search = GridSearchCV(svm_model, param_grid, cv=5, scoring='accuracy')
            grid_search.fit(X_train_scaled, y_train)

            best_model = grid_search.best_estimator_

            # Predict on the test set with injected errors
            y_pred_error = best_model.predict(X_test_error_scaled)

            # Compute the metrics
            accuracy = accuracy_score(y_test, y_pred_error)
            precision = precision_score(y_test, y_pred_error)
            recall = recall_score(y_test, y_pred_error)
            f1 = f1_score(y_test, y_pred_error)

            # Append the metrics to the per-scenario lists
            row_results_metrics['accuracy'].append(accuracy)
            row_results_metrics['precision'].append(precision)
            row_results_metrics['recall'].append(recall)
            row_results_metrics['f1_score'].append(f1)

        # Mean and standard deviation of each metric over the 10 splits
        mean_results_metrics = {metric: np.mean(values) for metric, values in row_results_metrics.items()}
        std_results_metrics = {metric: np.std(values) for metric, values in row_results_metrics.items()}
        results_metrics.append([error_percentage, error_count, scaler_name,
                                mean_results_metrics['accuracy'], std_results_metrics['accuracy'],
                                mean_results_metrics['precision'], std_results_metrics['precision'],
                                mean_results_metrics['recall'], std_results_metrics['recall'],
                                mean_results_metrics['f1_score'], std_results_metrics['f1_score']])

        # Save the results after each scenario
        save_results(results_file, results_metrics)

# Build the DataFrame with the results
results_metrics_df = pd.DataFrame(results_metrics, columns=['Error Percentage', 'Error Count', 'Scaler',
                                                             'Mean Accuracy', 'Std Accuracy',
                                                             'Mean Precision', 'Std Precision',
                                                             'Mean Recall', 'Std Recall',
                                                             'Mean F1 Score', 'Std F1 Score'])

# Combine each metric's mean and standard deviation into a single formatted column
metrics_columns = ['Accuracy', 'Precision', 'Recall', 'F1 Score']
for metric in metrics_columns:
    results_metrics_df[f'Mean {metric} & Std'] = results_metrics_df.apply(
        lambda row: f"{row[f'Mean {metric}']:.2f} ± {row[f'Std {metric}']:.2f}", axis=1
    )

# Drop the separate mean and standard deviation columns
results_metrics_df = results_metrics_df.drop(
    columns=[f'Mean {metric}' for metric in metrics_columns] + [f'Std {metric}' for metric in metrics_columns]
)

# Build one pivot table per metric and scaler
tables_metrics = {}
for metric in ['Accuracy', 'Precision', 'Recall', 'F1 Score']:
    for scaler_name in scalers.keys():
        table_metric = results_metrics_df[(results_metrics_df['Scaler'] == scaler_name)].pivot(index='Error Percentage',
                                                                                                columns='Error Count',
                                                                                                values=[f'Mean {metric} & Std'])
        tables_metrics[f'{metric} ({scaler_name})'] = table_metric

# Display the table for each metric and scaler
print("\nTabelas para cada métrica (Acurácia, Precisão, Recall, F1-Score) e cada tipo de escalonamento:")
for metric, table_metric in tables_metrics.items():
    print(f"\nTabela para {metric}:")
    display(table_metric)

path = "/content/drive/MyDrive/archive/"

results_metrics_df.to_csv(path + 'resultados_metrics_svm_2.csv', index=False)


Tabelas para cada métrica (Acurácia, Precisão, Recall, F1-Score) e tipo de escalonamento:

Tabela para Accuracy (MinMaxScaler):


Mean Accuracy & Std (rows: Error Percentage, columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.99 ± 0.01,0.98 ± 0.01,0.98 ± 0.01,0.97 ± 0.01,0.97 ± 0.01
0.1,0.98 ± 0.01,0.98 ± 0.01,0.96 ± 0.02,0.95 ± 0.01,0.93 ± 0.02
0.15,0.97 ± 0.02,0.96 ± 0.01,0.91 ± 0.02,0.90 ± 0.02,0.88 ± 0.02
0.2,0.94 ± 0.02,0.94 ± 0.02,0.89 ± 0.02,0.87 ± 0.02,0.84 ± 0.02
0.3,0.94 ± 0.02,0.90 ± 0.03,0.85 ± 0.03,0.82 ± 0.02,0.78 ± 0.02
0.4,0.92 ± 0.02,0.87 ± 0.02,0.83 ± 0.02,0.78 ± 0.03,0.73 ± 0.03
0.5,0.92 ± 0.01,0.87 ± 0.01,0.81 ± 0.02,0.75 ± 0.03,0.70 ± 0.02



Tabela para Accuracy (StandardScaler):


Mean Accuracy & Std (rows: Error Percentage; columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.99 ± 0.01,0.98 ± 0.01,0.98 ± 0.01,0.97 ± 0.01,0.97 ± 0.01
0.1,0.98 ± 0.01,0.98 ± 0.01,0.96 ± 0.01,0.95 ± 0.02,0.93 ± 0.02
0.15,0.97 ± 0.02,0.96 ± 0.01,0.92 ± 0.01,0.90 ± 0.03,0.88 ± 0.02
0.2,0.94 ± 0.02,0.93 ± 0.02,0.89 ± 0.02,0.86 ± 0.03,0.84 ± 0.03
0.3,0.94 ± 0.02,0.91 ± 0.02,0.85 ± 0.02,0.81 ± 0.03,0.77 ± 0.02
0.4,0.93 ± 0.02,0.88 ± 0.02,0.82 ± 0.03,0.78 ± 0.03,0.73 ± 0.03
0.5,0.92 ± 0.02,0.87 ± 0.01,0.82 ± 0.02,0.75 ± 0.03,0.69 ± 0.02



Tabela para Precision (MinMaxScaler):


Mean Precision & Std (rows: Error Percentage; columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.98 ± 0.02,0.99 ± 0.02,0.98 ± 0.02,0.99 ± 0.02,0.99 ± 0.01
0.1,0.99 ± 0.01,0.99 ± 0.01,0.98 ± 0.02,0.97 ± 0.02,0.94 ± 0.03
0.15,0.98 ± 0.02,0.98 ± 0.02,0.92 ± 0.02,0.91 ± 0.03,0.90 ± 0.03
0.2,0.96 ± 0.02,0.97 ± 0.03,0.90 ± 0.04,0.89 ± 0.04,0.86 ± 0.04
0.3,0.95 ± 0.03,0.92 ± 0.05,0.86 ± 0.04,0.83 ± 0.03,0.80 ± 0.03
0.4,0.92 ± 0.02,0.88 ± 0.04,0.83 ± 0.04,0.79 ± 0.03,0.75 ± 0.03
0.5,0.92 ± 0.02,0.88 ± 0.04,0.82 ± 0.04,0.76 ± 0.04,0.72 ± 0.03



Tabela para Precision (StandardScaler):


Mean Precision & Std (rows: Error Percentage; columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.99 ± 0.01,0.99 ± 0.02,0.98 ± 0.02,0.99 ± 0.02,0.99 ± 0.02
0.1,0.99 ± 0.01,0.99 ± 0.01,0.97 ± 0.02,0.97 ± 0.03,0.95 ± 0.03
0.15,0.97 ± 0.02,0.98 ± 0.02,0.94 ± 0.03,0.91 ± 0.04,0.90 ± 0.03
0.2,0.97 ± 0.02,0.97 ± 0.03,0.91 ± 0.04,0.88 ± 0.06,0.86 ± 0.05
0.3,0.95 ± 0.04,0.93 ± 0.05,0.86 ± 0.04,0.82 ± 0.05,0.79 ± 0.03
0.4,0.93 ± 0.02,0.89 ± 0.04,0.82 ± 0.04,0.79 ± 0.03,0.75 ± 0.03
0.5,0.93 ± 0.02,0.89 ± 0.03,0.82 ± 0.05,0.76 ± 0.04,0.71 ± 0.03



Tabela para Recall (MinMaxScaler):


Mean Recall & Std (rows: Error Percentage; columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.99 ± 0.01,0.98 ± 0.01,0.97 ± 0.01,0.96 ± 0.01,0.95 ± 0.01
0.1,0.97 ± 0.01,0.97 ± 0.01,0.94 ± 0.02,0.93 ± 0.01,0.91 ± 0.02
0.15,0.96 ± 0.02,0.94 ± 0.03,0.90 ± 0.03,0.89 ± 0.02,0.87 ± 0.03
0.2,0.93 ± 0.02,0.90 ± 0.02,0.88 ± 0.02,0.86 ± 0.02,0.82 ± 0.01
0.3,0.93 ± 0.02,0.89 ± 0.04,0.84 ± 0.02,0.81 ± 0.03,0.76 ± 0.03
0.4,0.93 ± 0.03,0.88 ± 0.02,0.83 ± 0.03,0.78 ± 0.04,0.70 ± 0.03
0.5,0.92 ± 0.02,0.87 ± 0.05,0.81 ± 0.04,0.73 ± 0.04,0.66 ± 0.03



Tabela para Recall (StandardScaler):


Mean Recall & Std (rows: Error Percentage; columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.99 ± 0.01,0.98 ± 0.01,0.97 ± 0.01,0.96 ± 0.01,0.95 ± 0.01
0.1,0.97 ± 0.01,0.97 ± 0.01,0.94 ± 0.02,0.93 ± 0.01,0.91 ± 0.02
0.15,0.96 ± 0.01,0.94 ± 0.03,0.90 ± 0.02,0.89 ± 0.03,0.87 ± 0.02
0.2,0.92 ± 0.03,0.90 ± 0.02,0.87 ± 0.02,0.84 ± 0.02,0.81 ± 0.02
0.3,0.93 ± 0.02,0.89 ± 0.04,0.83 ± 0.02,0.80 ± 0.02,0.76 ± 0.04
0.4,0.93 ± 0.03,0.87 ± 0.02,0.83 ± 0.03,0.77 ± 0.04,0.70 ± 0.03
0.5,0.92 ± 0.02,0.86 ± 0.03,0.81 ± 0.04,0.74 ± 0.04,0.66 ± 0.02



Tabela para F1 Score (MinMaxScaler):


Mean F1 Score & Std (rows: Error Percentage; columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.99 ± 0.01,0.98 ± 0.01,0.98 ± 0.01,0.97 ± 0.01,0.97 ± 0.01
0.1,0.98 ± 0.01,0.98 ± 0.01,0.96 ± 0.01,0.95 ± 0.01,0.93 ± 0.02
0.15,0.97 ± 0.02,0.96 ± 0.01,0.91 ± 0.02,0.90 ± 0.02,0.88 ± 0.02
0.2,0.94 ± 0.02,0.93 ± 0.02,0.89 ± 0.02,0.87 ± 0.02,0.84 ± 0.02
0.3,0.94 ± 0.02,0.90 ± 0.02,0.85 ± 0.03,0.82 ± 0.01,0.78 ± 0.02
0.4,0.92 ± 0.02,0.88 ± 0.02,0.83 ± 0.02,0.78 ± 0.03,0.72 ± 0.03
0.5,0.92 ± 0.01,0.87 ± 0.01,0.81 ± 0.02,0.75 ± 0.02,0.69 ± 0.02



Tabela para F1 Score (StandardScaler):


Mean F1 Score & Std (rows: Error Percentage; columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.99 ± 0.01,0.98 ± 0.01,0.98 ± 0.01,0.97 ± 0.01,0.97 ± 0.01
0.1,0.98 ± 0.01,0.98 ± 0.01,0.96 ± 0.01,0.95 ± 0.02,0.93 ± 0.02
0.15,0.97 ± 0.01,0.96 ± 0.01,0.92 ± 0.01,0.90 ± 0.03,0.88 ± 0.02
0.2,0.94 ± 0.02,0.93 ± 0.02,0.89 ± 0.02,0.86 ± 0.03,0.83 ± 0.03
0.3,0.94 ± 0.02,0.91 ± 0.02,0.85 ± 0.02,0.81 ± 0.02,0.77 ± 0.02
0.4,0.93 ± 0.02,0.88 ± 0.02,0.82 ± 0.03,0.78 ± 0.03,0.73 ± 0.03
0.5,0.92 ± 0.02,0.87 ± 0.02,0.82 ± 0.02,0.75 ± 0.03,0.68 ± 0.02
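
The pivot step used for these tables can be tried on a small example. The sketch below uses made-up values (placeholders, not results from this study) and the same column names as the cell above. Passing `values` as a plain string instead of a one-element list is a variation on the notebook's call, not what the notebook does; it yields single-level column labels.

```python
import pandas as pd

# Toy long-format results; the metric values are placeholders, not study results.
rows = [
    (0.05, 1, 0.99, 0.01),
    (0.05, 2, 0.98, 0.01),
    (0.10, 1, 0.98, 0.01),
    (0.10, 2, 0.97, 0.02),
]
toy_df = pd.DataFrame(rows, columns=["Error Percentage", "Error Count",
                                     "Mean Accuracy", "Std Accuracy"])

# Combine mean and standard deviation into one display string per scenario.
toy_df["Mean Accuracy & Std"] = [
    f"{m:.2f} ± {s:.2f}"
    for m, s in zip(toy_df["Mean Accuracy"], toy_df["Std Accuracy"])
]

# One row per error percentage, one column per error count.
# A string for `values` gives flat column labels; a list gives a MultiIndex.
toy_table = toy_df.pivot(index="Error Percentage", columns="Error Count",
                         values="Mean Accuracy & Std")
print(toy_table)
```

Each cell of `toy_table` then holds the formatted "mean ± std" string for one (error percentage, error count) scenario.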


## KNN

The following code computes the mean and standard deviation of the metrics analyzed in the study over the 10 datasets (train/test splits, one per random seed) generated from the dataframe df, testing two scaling methods (MinMaxScaler and StandardScaler).

In [None]:
X = df.drop('Class', axis=1)
y = df['Class']

scalers = {
    "MinMaxScaler": MinMaxScaler(),
    "StandardScaler": StandardScaler()
}

random_states = [10, 23, 87, 41, 65, 12, 98, 34, 72, 19]

for scaler_name, scaler in scalers.items():

    # Lists that collect the metrics from each split
    accuracy_list = []
    precision_list = []
    recall_list = []
    f1_list = []

    for random_state in random_states:
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=random_state)

        X_train_scaled = scaler.fit_transform(X_train)
        X_test_scaled = scaler.transform(X_test)

        # KNN model
        knn_model = KNeighborsClassifier()

        param_grid = {
            'n_neighbors': [5, 6, 7, 8, 9, 10, 11, 12],
            'weights': ['uniform', 'distance'],
            'p': [1, 2],
            'metric': ['minkowski', 'cosine',  'nan_euclidean']
        }

        # Run the grid search to find the best hyperparameters
        grid_search = GridSearchCV(knn_model, param_grid, cv=5, scoring='accuracy')
        grid_search.fit(X_train_scaled, y_train)

        # Retrieve the best model found by the grid search
        best_model = grid_search.best_estimator_

        y_pred = best_model.predict(X_test_scaled)

        accuracy = accuracy_score(y_test, y_pred)
        precision = precision_score(y_test, y_pred)
        recall = recall_score(y_test, y_pred)
        f1 = f1_score(y_test, y_pred)

        # Append the metrics to the lists
        accuracy_list.append(accuracy)
        precision_list.append(precision)
        recall_list.append(recall)
        f1_list.append(f1)

    # Compute the mean and standard deviation (pandas .std() uses ddof=1, the sample standard deviation)
    metrics_dict = {
        'Acurácia': accuracy_list,
        'Precisão': precision_list,
        'Recall': recall_list,
        'F1 Score': f1_list
    }

    metrics_df = pd.DataFrame(metrics_dict)
    metrics_mean = metrics_df.mean()
    metrics_std = metrics_df.std()

    # Print the mean and standard deviation
    print(f"Escalonamento: {scaler_name}")
    for metric in metrics_mean.index:
        print(f"{metric}: {metrics_mean[metric]:.2f} ± {metrics_std[metric]:.2f}")
    print("\n")

Escalonamento: MinMaxScaler
Acurácia: 0.91 ± 0.01
Precisão: 0.86 ± 0.02
Recall: 0.98 ± 0.01
F1 Score: 0.92 ± 0.01


Escalonamento: StandardScaler
Acurácia: 0.93 ± 0.02
Precisão: 0.92 ± 0.03
Recall: 0.95 ± 0.02
F1 Score: 0.93 ± 0.02
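
A note on the ± values: the KNN baseline cells summarize the splits with `DataFrame.std()`, which defaults to the sample standard deviation (`ddof=1`), while the cells that call `np.std` get the population form (`ddof=0`) by default. The sketch below compares the two on placeholder scores (illustrative values, not results from this study).

```python
import numpy as np
import pandas as pd

# Placeholder accuracy scores for 10 splits (illustrative values only).
scores = [0.91, 0.93, 0.92, 0.95, 0.94, 0.90, 0.93, 0.92, 0.96, 0.94]

sample_std = pd.Series(scores).std()   # pandas default: ddof=1
population_std = np.std(scores)        # NumPy default: ddof=0
matched_std = np.std(scores, ddof=1)   # NumPy configured to match pandas

print(f"pandas std (ddof=1): {sample_std:.4f}")
print(f"numpy  std (ddof=0): {population_std:.4f}")
print(f"numpy  std (ddof=1): {matched_std:.4f}")
```

With 10 values the sample form is larger than the population form by a factor of sqrt(10/9), so the two conventions can differ in the second decimal place after rounding.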




The following code computes the mean and standard deviation of the metrics analyzed in the study over the 10 datasets (train/test splits, one per random seed) generated from the dataframe df2, testing two scaling methods (MinMaxScaler and StandardScaler).

In [None]:
X = df2.drop('Class', axis=1)
y = df2['Class']

scalers = {
    "MinMaxScaler": MinMaxScaler(),
    "StandardScaler": StandardScaler()
}

random_states = [10, 23, 87, 41, 65, 12, 98, 34, 72, 19]

for scaler_name, scaler in scalers.items():

    # Lists that collect the metrics from each split
    accuracy_list = []
    precision_list = []
    recall_list = []
    f1_list = []

    for random_state in random_states:
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=random_state)

        X_train_scaled = scaler.fit_transform(X_train)
        X_test_scaled = scaler.transform(X_test)

        # KNN model
        knn_model = KNeighborsClassifier()

        param_grid = {
            'n_neighbors': [5, 6, 7, 8, 9, 10, 11, 12],
            'weights': ['uniform', 'distance'],
            'p': [1, 2],
            'metric': ['minkowski', 'cosine',  'nan_euclidean']
        }

        # Run the grid search to find the best hyperparameters
        grid_search = GridSearchCV(knn_model, param_grid, cv=5, scoring='accuracy')
        grid_search.fit(X_train_scaled, y_train)

        # Retrieve the best model found by the grid search
        best_model = grid_search.best_estimator_

        y_pred = best_model.predict(X_test_scaled)

        accuracy = accuracy_score(y_test, y_pred)
        precision = precision_score(y_test, y_pred)
        recall = recall_score(y_test, y_pred)
        f1 = f1_score(y_test, y_pred)

        # Append the metrics to the lists
        accuracy_list.append(accuracy)
        precision_list.append(precision)
        recall_list.append(recall)
        f1_list.append(f1)

    # Compute the mean and standard deviation (pandas .std() uses ddof=1, the sample standard deviation)
    metrics_dict = {
        'Acurácia': accuracy_list,
        'Precisão': precision_list,
        'Recall': recall_list,
        'F1 Score': f1_list
    }

    metrics_df = pd.DataFrame(metrics_dict)
    metrics_mean = metrics_df.mean()
    metrics_std = metrics_df.std()

    # Print the mean and standard deviation
    print(f"Escalonamento: {scaler_name}")
    for metric in metrics_mean.index:
        print(f"{metric}: {metrics_mean[metric]:.2f} ± {metrics_std[metric]:.2f}")
    print("\n")

Escalonamento: MinMaxScaler
Acurácia: 0.88 ± 0.04
Precisão: 0.84 ± 0.05
Recall: 0.94 ± 0.03
F1 Score: 0.89 ± 0.03


Escalonamento: StandardScaler
Acurácia: 0.86 ± 0.03
Precisão: 0.82 ± 0.04
Recall: 0.94 ± 0.05
F1 Score: 0.87 ± 0.03




The following code computes the mean and standard deviation of the metrics analyzed in the study over the 10 error-injected datasets generated from the dataframe df in each of the 35 scenarios analyzed (7 error percentages × 5 error counts); errors are injected into the test split only. The dataframe df is used because it produced the best results without error injection, and StandardScaler is used because it gave the best results in the preceding comparison.

In [None]:
# List that accumulates the metric results
results_metrics = []

# Define the parameter lists
error_percentages = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.5]
error_counts = [1, 2, 3, 4, 5]
random_states = [10, 23, 87, 41, 65, 12, 98, 34, 72, 19]

scalers = {
    "StandardScaler": StandardScaler()
}

X = df.drop('Class', axis=1)
y = df['Class']

for error_percentage, error_count in product(error_percentages, error_counts):
    for scaler_name, scaler in scalers.items():

        # Lists that collect the metrics from the 10 datasets
        row_results_metrics = {'accuracy': [], 'precision': [], 'recall': [], 'f1_score': []}

        for random_state in random_states:
            X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=random_state)

            X_test_error = introduce_errors(X_test.copy(), error_percentage, error_count, random_state)

            X_train_scaled = scaler.fit_transform(X_train)
            X_test_error_scaled = scaler.transform(X_test_error)

            # KNN model
            knn_model = KNeighborsClassifier()

            param_grid = {
                'n_neighbors': [5, 6, 7, 8, 9, 10, 11, 12],
                'weights': ['uniform', 'distance'],
                'p': [1, 2],
                'metric': ['minkowski', 'cosine',  'nan_euclidean']
            }

            # Run the grid search to find the best hyperparameters
            grid_search = GridSearchCV(knn_model, param_grid, cv=5, scoring='accuracy')
            grid_search.fit(X_train_scaled, y_train)

            best_model = grid_search.best_estimator_
            y_pred_error = best_model.predict(X_test_error_scaled)

            # Compute the metrics
            accuracy = accuracy_score(y_test, y_pred_error)
            precision = precision_score(y_test, y_pred_error)
            recall = recall_score(y_test, y_pred_error)
            f1 = f1_score(y_test, y_pred_error)

            # Append the metrics to the lists
            row_results_metrics['accuracy'].append(accuracy)
            row_results_metrics['precision'].append(precision)
            row_results_metrics['recall'].append(recall)
            row_results_metrics['f1_score'].append(f1)

        # Compute the mean and standard deviation of each metric
        mean_results_metrics = {metric: np.mean(values) for metric, values in row_results_metrics.items()}
        std_results_metrics = {metric: np.std(values) for metric, values in row_results_metrics.items()}
        results_metrics.append([error_percentage, error_count, scaler_name,
                                mean_results_metrics['accuracy'], std_results_metrics['accuracy'],
                                mean_results_metrics['precision'], std_results_metrics['precision'],
                                mean_results_metrics['recall'], std_results_metrics['recall'],
                                mean_results_metrics['f1_score'], std_results_metrics['f1_score']])

# Build the DataFrame with the results
results_metrics_df = pd.DataFrame(results_metrics, columns=['Error Percentage', 'Error Count', 'Scaler',
                                                             'Mean Accuracy', 'Std Accuracy',
                                                             'Mean Precision', 'Std Precision',
                                                             'Mean Recall', 'Std Recall',
                                                             'Mean F1 Score', 'Std F1 Score'])

# Combine each metric's mean and standard deviation into a single formatted column
metrics_columns = ['Accuracy', 'Precision', 'Recall', 'F1 Score']
for metric in metrics_columns:
    results_metrics_df[f'Mean {metric} & Std'] = results_metrics_df.apply(
        lambda row: f"{row[f'Mean {metric}']:.2f} ± {row[f'Std {metric}']:.2f}", axis=1
    )

# Drop the separate mean and standard-deviation columns
results_metrics_df = results_metrics_df.drop(
    columns=[f'Mean {metric}' for metric in metrics_columns] + [f'Std {metric}' for metric in metrics_columns]
)

# Build a separate pivot table for each metric and scaler
tables_metrics = {}
for metric in ['Accuracy', 'Precision', 'Recall', 'F1 Score']:
    for scaler_name in scalers.keys():
        table_metric = results_metrics_df[(results_metrics_df['Scaler'] == scaler_name)].pivot(index='Error Percentage',
                                                                                                columns='Error Count',
                                                                                                values=[f'Mean {metric} & Std'])
        # Key each table by a (metric, scaler) tuple so the display loop can unpack it
        tables_metrics[(metric, scaler_name)] = table_metric

# Display the table for each metric and scaler
print("\nTabelas para cada métrica (Acurácia, Precisão, Recall, F1-Score) e tipo de escalonamento:")
for (metric, scaler_name), table_metric in tables_metrics.items():
    print(f"\nTabela para {metric} com {scaler_name}:")
    display(table_metric)

path = "/content/drive/MyDrive/archive/"

results_metrics_df.to_csv(path + 'resultados_metrics_knn.csv', index=False)


Tabelas para cada métrica (Acurácia, Precisão, Recall, F1-Score) e tipo de escalonamento:

Tabela para Accuracy com StandardScaler:


Mean Accuracy & Std (rows: Error Percentage; columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.93 ± 0.02,0.92 ± 0.02,0.91 ± 0.02,0.91 ± 0.02,0.90 ± 0.02
0.1,0.92 ± 0.01,0.92 ± 0.03,0.89 ± 0.03,0.89 ± 0.02,0.88 ± 0.01
0.15,0.91 ± 0.02,0.90 ± 0.03,0.88 ± 0.02,0.86 ± 0.02,0.85 ± 0.02
0.2,0.90 ± 0.03,0.88 ± 0.02,0.85 ± 0.01,0.83 ± 0.02,0.82 ± 0.02
0.3,0.91 ± 0.03,0.87 ± 0.02,0.85 ± 0.03,0.80 ± 0.03,0.76 ± 0.03
0.4,0.90 ± 0.02,0.85 ± 0.02,0.81 ± 0.04,0.77 ± 0.03,0.73 ± 0.04
0.5,0.90 ± 0.02,0.85 ± 0.03,0.79 ± 0.02,0.72 ± 0.04,0.68 ± 0.03



Tabela para Precision com StandardScaler:


Mean Precision & Std (rows: Error Percentage; columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.91 ± 0.04,0.90 ± 0.04,0.89 ± 0.04,0.90 ± 0.03,0.90 ± 0.03
0.1,0.91 ± 0.03,0.91 ± 0.02,0.88 ± 0.03,0.89 ± 0.04,0.88 ± 0.03
0.15,0.89 ± 0.03,0.90 ± 0.03,0.87 ± 0.03,0.87 ± 0.03,0.87 ± 0.03
0.2,0.89 ± 0.03,0.87 ± 0.03,0.86 ± 0.04,0.85 ± 0.03,0.85 ± 0.03
0.3,0.90 ± 0.03,0.86 ± 0.02,0.85 ± 0.03,0.82 ± 0.04,0.79 ± 0.05
0.4,0.89 ± 0.04,0.85 ± 0.04,0.81 ± 0.05,0.79 ± 0.03,0.77 ± 0.05
0.5,0.89 ± 0.03,0.86 ± 0.05,0.80 ± 0.03,0.73 ± 0.05,0.70 ± 0.04



Tabela para Recall com StandardScaler:


Mean Recall & Std (rows: Error Percentage; columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.94 ± 0.03,0.94 ± 0.03,0.93 ± 0.02,0.93 ± 0.02,0.92 ± 0.03
0.1,0.94 ± 0.02,0.93 ± 0.04,0.91 ± 0.05,0.89 ± 0.04,0.88 ± 0.04
0.15,0.93 ± 0.03,0.91 ± 0.04,0.89 ± 0.04,0.86 ± 0.02,0.83 ± 0.03
0.2,0.92 ± 0.03,0.89 ± 0.03,0.84 ± 0.04,0.81 ± 0.05,0.78 ± 0.04
0.3,0.92 ± 0.05,0.88 ± 0.03,0.84 ± 0.04,0.77 ± 0.04,0.73 ± 0.03
0.4,0.91 ± 0.03,0.85 ± 0.03,0.81 ± 0.05,0.76 ± 0.06,0.67 ± 0.05
0.5,0.92 ± 0.03,0.85 ± 0.04,0.79 ± 0.04,0.72 ± 0.05,0.62 ± 0.04



Tabela para F1 Score com StandardScaler:


Mean F1 Score & Std (rows: Error Percentage; columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.93 ± 0.02,0.92 ± 0.02,0.91 ± 0.02,0.92 ± 0.02,0.91 ± 0.02
0.1,0.93 ± 0.01,0.92 ± 0.03,0.90 ± 0.03,0.89 ± 0.02,0.88 ± 0.02
0.15,0.91 ± 0.02,0.91 ± 0.03,0.88 ± 0.02,0.86 ± 0.02,0.85 ± 0.02
0.2,0.90 ± 0.02,0.88 ± 0.02,0.85 ± 0.01,0.83 ± 0.02,0.81 ± 0.02
0.3,0.91 ± 0.03,0.87 ± 0.02,0.85 ± 0.03,0.79 ± 0.03,0.76 ± 0.02
0.4,0.90 ± 0.02,0.85 ± 0.02,0.81 ± 0.04,0.77 ± 0.04,0.72 ± 0.04
0.5,0.91 ± 0.02,0.85 ± 0.03,0.79 ± 0.02,0.72 ± 0.04,0.66 ± 0.03
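
The error scenarios above depend on `introduce_errors`, which is defined elsewhere in the notebook and is not reproduced in this section. As a rough illustration only, the sketch below shows one way a function with the same call signature could behave. The name `inject_errors_sketch`, the reading of `error_percentage` as the fraction of rows affected, the reading of `error_count` as the number of columns altered per affected row, and the replacement rule are all assumptions, not the notebook's actual implementation.

```python
import numpy as np
import pandas as pd


def inject_errors_sketch(X, error_percentage, error_count, random_state):
    """Hypothetical stand-in for introduce_errors; the real function may differ."""
    rng = np.random.default_rng(random_state)
    X_noisy = X.copy()
    # Assumption: error_percentage is the fraction of rows that receive errors.
    n_rows = int(round(error_percentage * len(X_noisy)))
    row_positions = rng.choice(len(X_noisy), size=n_rows, replace=False)
    for r in row_positions:
        # Assumption: error_count distinct columns are altered in each chosen row.
        col_positions = rng.choice(X_noisy.shape[1], size=error_count, replace=False)
        for c in col_positions:
            r, c = int(r), int(c)
            current = X_noisy.iat[r, c]
            # Replace the cell with a different value observed in the same column.
            alternatives = [v for v in X_noisy.iloc[:, c].unique() if v != current]
            if alternatives:
                X_noisy.iat[r, c] = rng.choice(alternatives)
    return X_noisy


# Toy binary answers (20 rows, 6 columns); the data is illustrative only.
toy = pd.DataFrame({f"A{i}": [(r + i) % 2 for r in range(20)] for i in range(1, 7)})
noisy = inject_errors_sketch(toy, error_percentage=0.2, error_count=2, random_state=10)

changed_per_row = (toy != noisy).sum(axis=1)
print(changed_per_row[changed_per_row > 0])
```

Under these assumptions, 20% of 20 rows means 4 rows are altered, each in exactly 2 columns, which matches the intended reading of one (error percentage, error count) scenario.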


## Random Forest

The following code computes the mean and standard deviation of the metrics analyzed in the study over the 10 datasets (train/test splits, one per random seed) generated from the dataframe df.

In [None]:
X = df.drop('Class', axis=1)
y = df['Class']

random_states = [10, 23, 87, 41, 65, 12, 98, 34, 72, 19]

accuracy_list = []
precision_list = []
recall_list = []
f1_list = []

for random_state in random_states:

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=random_state)

    # Random Forest model
    rf_model = RandomForestClassifier(random_state=10)

    param_grid = {
        'n_estimators': [30, 50, 75, 100, 150, 200],
        'max_depth': [10, 15, 20, 30],
        'criterion': ['gini', 'entropy', 'log_loss']
    }

    # Run the grid search to find the best hyperparameters
    grid_search = GridSearchCV(rf_model, param_grid, cv=5, scoring='accuracy')
    grid_search.fit(X_train, y_train)

    # Retrieve the best model found by the grid search
    best_model = grid_search.best_estimator_

    # Predict on the test set
    y_pred = best_model.predict(X_test)

    # Compute the metrics and store them
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred)
    recall = recall_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)

    accuracy_list.append(accuracy)
    precision_list.append(precision)
    recall_list.append(recall)
    f1_list.append(f1)

# Compute the mean and standard deviation
mean_accuracy = np.mean(accuracy_list)
std_accuracy = np.std(accuracy_list)
mean_precision = np.mean(precision_list)
std_precision = np.std(precision_list)
mean_recall = np.mean(recall_list)
std_recall = np.std(recall_list)
mean_f1 = np.mean(f1_list)
std_f1 = np.std(f1_list)

# Print the mean and standard deviation of the metrics
print("\nMédia e Desvio Padrão das Métricas:")
print(f"Acurácia: {mean_accuracy:.2f} ± {std_accuracy:.2f}")
print(f"Precisão: {mean_precision:.2f} ± {std_precision:.2f}")
print(f"Recall: {mean_recall:.2f} ± {std_recall:.2f}")
print(f"F1 Score: {mean_f1:.2f} ± {std_f1:.2f}")


Média e Desvio Padrão das Métricas:
Acurácia: 0.95 ± 0.01
Precisão: 0.95 ± 0.02
Recall: 0.95 ± 0.02
F1 Score: 0.95 ± 0.01


The following code computes the mean and standard deviation of the metrics analyzed in the study over the 10 datasets (train/test splits, one per random seed) generated from the dataframe df2.

In [None]:
X = df2.drop('Class', axis=1)
y = df2['Class']

random_states = [10, 23, 87, 41, 65, 12, 98, 34, 72, 19]

accuracy_list = []
precision_list = []
recall_list = []
f1_list = []

for random_state in random_states:
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=random_state)

    # Random Forest model
    rf_model = RandomForestClassifier(random_state=10)

    param_grid = {
        'n_estimators': [30, 50, 75, 100, 150, 200],
        'max_depth': [10, 15, 20, 30],
        'criterion': ['gini', 'entropy', 'log_loss']
    }

    # Run the grid search to find the best hyperparameters
    grid_search = GridSearchCV(rf_model, param_grid, cv=5, scoring='accuracy')
    grid_search.fit(X_train, y_train)

    # Retrieve the best model found by the grid search
    best_model = grid_search.best_estimator_

    # Predict on the test set
    y_pred = best_model.predict(X_test)

    # Compute the metrics and store them
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred)
    recall = recall_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)

    accuracy_list.append(accuracy)
    precision_list.append(precision)
    recall_list.append(recall)
    f1_list.append(f1)

# Compute the mean and standard deviation
mean_accuracy = np.mean(accuracy_list)
std_accuracy = np.std(accuracy_list)
mean_precision = np.mean(precision_list)
std_precision = np.std(precision_list)
mean_recall = np.mean(recall_list)
std_recall = np.std(recall_list)
mean_f1 = np.mean(f1_list)
std_f1 = np.std(f1_list)

# Print the mean and standard deviation of the metrics
print("\nMédia e Desvio Padrão das Métricas:")
print(f"Acurácia: {mean_accuracy:.2f} ± {std_accuracy:.2f}")
print(f"Precisão: {mean_precision:.2f} ± {std_precision:.2f}")
print(f"Recall: {mean_recall:.2f} ± {std_recall:.2f}")
print(f"F1 Score: {mean_f1:.2f} ± {std_f1:.2f}")


Média e Desvio Padrão das Métricas:
Acurácia: 0.94 ± 0.01
Precisão: 0.94 ± 0.02
Recall: 0.95 ± 0.02
F1 Score: 0.94 ± 0.01


The following code computes the mean and standard deviation of the metrics analyzed in the study over the 10 error-injected datasets generated from the dataframe df in each of the 35 scenarios analyzed (7 error percentages × 5 error counts); errors are injected into the test split only. The dataframe df is used because it produced the best results without error injection.

In [None]:
import pickle

# Helper functions to save and load the state of the results_metrics list
def save_results(file_path, results):
    with open(file_path, 'wb') as f:
        pickle.dump(results, f)

def load_results(file_path):
    with open(file_path, 'rb') as f:
        return pickle.load(f)

# Load previously saved results, if any
results_file = "/content/drive/MyDrive/archive/results_metrics_rf.pkl"
try:
    results_metrics = load_results(results_file)
except FileNotFoundError:
    results_metrics = []

# Define the parameter lists
error_percentages = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.5]
error_counts = [1, 2, 3, 4, 5]
random_states = [10, 23, 87, 41, 65, 12, 98, 34, 72, 19]

X = df.drop('Class', axis=1)
y = df['Class']

for error_percentage, error_count in product(error_percentages, error_counts):
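    # No scaling is applied to the Random Forest; this loop only reuses the `scalers` dict left over from the KNN cell
    for scaler_name, scaler in scalers.items():

        # Skip scenarios whose results were already computed and saved
        existing_result = [result for result in results_metrics if result[0] == error_percentage and result[1] == error_count]
        if existing_result:
            continue

        # Lists that collect the metrics from the 10 datasets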
        row_results_metrics = {'accuracy': [], 'precision': [], 'recall': [], 'f1_score': []}

        for random_state in random_states:
            X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=random_state)

            X_test_error = introduce_errors(X_test.copy(), error_percentage, error_count, random_state)

            # Random Forest model
            rf_model = RandomForestClassifier(random_state=10)

            param_grid = {
                'n_estimators': [30, 50, 75, 100, 150, 200],
                'max_depth': [10, 15, 20, 30],
                'criterion': ['gini', 'entropy', 'log_loss']
            }

            # Run the grid search to find the best hyperparameters
            grid_search = GridSearchCV(rf_model, param_grid, cv=5, scoring='accuracy')
            grid_search.fit(X_train, y_train)

            best_model = grid_search.best_estimator_

            # Predict on the error-injected test set
            y_pred_error = best_model.predict(X_test_error)

            # Compute the metrics and store them
            accuracy = accuracy_score(y_test, y_pred_error)
            precision = precision_score(y_test, y_pred_error)
            recall = recall_score(y_test, y_pred_error)
            f1 = f1_score(y_test, y_pred_error)

            row_results_metrics['accuracy'].append(accuracy)
            row_results_metrics['precision'].append(precision)
            row_results_metrics['recall'].append(recall)
            row_results_metrics['f1_score'].append(f1)

        mean_results_metrics = {metric: np.mean(values) for metric, values in row_results_metrics.items()}
        std_results_metrics = {metric: np.std(values) for metric, values in row_results_metrics.items()}
        results_metrics.append([error_percentage, error_count,
                                mean_results_metrics['accuracy'], std_results_metrics['accuracy'],
                                mean_results_metrics['precision'], std_results_metrics['precision'],
                                mean_results_metrics['recall'], std_results_metrics['recall'],
                                mean_results_metrics['f1_score'], std_results_metrics['f1_score']])

        save_results(results_file, results_metrics)

# Build the DataFrame for the error scenarios
results_metrics_df = pd.DataFrame(results_metrics, columns=['Error Percentage', 'Error Count',
                                                              'Mean Accuracy', 'Std Accuracy',
                                                              'Mean Precision', 'Std Precision',
                                                              'Mean Recall', 'Std Recall',
                                                              'Mean F1 Score', 'Std F1 Score'])

# Combine each metric's mean and standard deviation into a single formatted column
for metric in ['Accuracy', 'Precision', 'Recall', 'F1 Score']:
    results_metrics_df[f'Mean {metric} & Std'] = results_metrics_df.apply(
        lambda row: f"{row[f'Mean {metric}']:.2f} ± {row[f'Std {metric}']:.2f}", axis=1
    )

# Drop the separate mean and standard-deviation columns
results_metrics_df = results_metrics_df.drop(
    columns=[f'Mean {metric}' for metric in ['Accuracy', 'Precision', 'Recall', 'F1 Score']] + [f'Std {metric}' for metric in ['Accuracy', 'Precision', 'Recall', 'F1 Score']]
)

# Build a separate pivot table for each metric
tables_metrics = {}
for metric in ['Accuracy', 'Precision', 'Recall', 'F1 Score']:
    table_metric = results_metrics_df.pivot(index='Error Percentage', columns='Error Count', values=[f'Mean {metric} & Std'])
    tables_metrics[metric] = table_metric

# Display the table for each metric
print("\nTabelas para cada métrica (Acurácia, Precisão, Recall, F1-Score):")
for metric, table_metric in tables_metrics.items():
    print(f"\nTabela para {metric}:")
    display(table_metric)

path = "/content/drive/MyDrive/archive/"

results_metrics_df.to_csv(path + 'resultados_metrics_rf.csv', index=False)


Tabelas para cada métrica (Acurácia, Precisão, Recall, F1-Score):

Tabela para Accuracy:


Mean Accuracy & Std (rows: Error Percentage; columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.95 ± 0.01,0.95 ± 0.01,0.94 ± 0.01,0.94 ± 0.01,0.94 ± 0.01
0.1,0.96 ± 0.01,0.95 ± 0.01,0.94 ± 0.01,0.93 ± 0.01,0.93 ± 0.01
0.15,0.95 ± 0.01,0.94 ± 0.01,0.93 ± 0.01,0.92 ± 0.01,0.91 ± 0.02
0.2,0.95 ± 0.01,0.94 ± 0.01,0.93 ± 0.01,0.91 ± 0.01,0.90 ± 0.01
0.3,0.94 ± 0.01,0.92 ± 0.01,0.90 ± 0.02,0.87 ± 0.02,0.84 ± 0.02
0.4,0.94 ± 0.01,0.92 ± 0.01,0.89 ± 0.02,0.86 ± 0.01,0.82 ± 0.02
0.5,0.93 ± 0.01,0.90 ± 0.01,0.87 ± 0.01,0.83 ± 0.01,0.80 ± 0.01



Tabela para Precision:


Mean Precision & Std (rows: Error Percentage; columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.96 ± 0.00,0.96 ± 0.01,0.95 ± 0.01,0.95 ± 0.01,0.94 ± 0.01
0.1,0.96 ± 0.01,0.95 ± 0.01,0.94 ± 0.01,0.93 ± 0.01,0.92 ± 0.01
0.15,0.96 ± 0.01,0.95 ± 0.01,0.94 ± 0.01,0.92 ± 0.01,0.91 ± 0.02
0.2,0.96 ± 0.01,0.95 ± 0.01,0.93 ± 0.01,0.91 ± 0.01,0.90 ± 0.01
0.3,0.95 ± 0.01,0.94 ± 0.01,0.91 ± 0.01,0.88 ± 0.02,0.85 ± 0.02
0.4,0.95 ± 0.01,0.93 ± 0.01,0.90 ± 0.02,0.87 ± 0.01,0.83 ± 0.01
0.5,0.95 ± 0.01,0.93 ± 0.01,0.89 ± 0.02,0.84 ± 0.01,0.81 ± 0.01



Tabela para Recall:


Mean Recall & Std (rows: Error Percentage; columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.97 ± 0.01,0.97 ± 0.01,0.97 ± 0.01,0.97 ± 0.01,0.97 ± 0.01
0.1,0.98 ± 0.01,0.98 ± 0.01,0.98 ± 0.01,0.98 ± 0.02,0.97 ± 0.01
0.15,0.97 ± 0.01,0.97 ± 0.01,0.96 ± 0.01,0.96 ± 0.02,0.96 ± 0.01
0.2,0.98 ± 0.01,0.97 ± 0.01,0.97 ± 0.01,0.96 ± 0.01,0.96 ± 0.01
0.3,0.96 ± 0.02,0.95 ± 0.02,0.95 ± 0.02,0.94 ± 0.02,0.93 ± 0.02
0.4,0.96 ± 0.01,0.95 ± 0.01,0.95 ± 0.01,0.94 ± 0.01,0.93 ± 0.02
0.5,0.95 ± 0.01,0.94 ± 0.02,0.93 ± 0.02,0.93 ± 0.01,0.93 ± 0.02



Tabela para F1 Score:


Mean F1 Score & Std (rows: Error Percentage; columns: Error Count)
Error Percentage,1,2,3,4,5
0.05,0.97 ± 0.00,0.96 ± 0.01,0.96 ± 0.01,0.96 ± 0.01,0.95 ± 0.00
0.1,0.97 ± 0.01,0.96 ± 0.01,0.96 ± 0.01,0.95 ± 0.01,0.95 ± 0.01
0.15,0.96 ± 0.01,0.96 ± 0.01,0.95 ± 0.01,0.94 ± 0.01,0.93 ± 0.01
0.2,0.97 ± 0.01,0.96 ± 0.01,0.95 ± 0.01,0.93 ± 0.01,0.93 ± 0.01
0.3,0.96 ± 0.01,0.94 ± 0.01,0.93 ± 0.01,0.91 ± 0.01,0.89 ± 0.01
0.4,0.96 ± 0.01,0.94 ± 0.01,0.92 ± 0.01,0.90 ± 0.01,0.88 ± 0.01
0.5,0.95 ± 0.01,0.93 ± 0.01,0.91 ± 0.01,0.88 ± 0.01,0.86 ± 0.01


## Naive Bayes

The following code computes the mean and standard deviation of the metrics analyzed in the study over the 10 datasets (train/test splits with different random states) generated from the dataframe df.

In [None]:
X = df.drop('Class', axis=1)
y = df['Class']

random_states = [10, 23, 87, 41, 65, 12, 98, 34, 72, 19]

accuracy_list = []
precision_list = []
recall_list = []
f1_list = []

for random_state in random_states:

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=random_state)

    # Naive Bayes model
    nb_model = GaussianNB()

    # No hyperparameters are tuned here (GaussianNB is used with its defaults), so there is no parameter grid.

    # Train the Naive Bayes model
    nb_model.fit(X_train, y_train)

    # Predict on the test set
    y_pred = nb_model.predict(X_test)

    # Compute the metrics and store them
    accuracy_list.append(accuracy_score(y_test, y_pred))
    precision_list.append(precision_score(y_test, y_pred))
    recall_list.append(recall_score(y_test, y_pred))
    f1_list.append(f1_score(y_test, y_pred))

# Compute the mean and standard deviation
mean_accuracy = np.mean(accuracy_list)
std_accuracy = np.std(accuracy_list)
mean_precision = np.mean(precision_list)
std_precision = np.std(precision_list)
mean_recall = np.mean(recall_list)
std_recall = np.std(recall_list)
mean_f1 = np.mean(f1_list)
std_f1 = np.std(f1_list)

# Display the mean and standard deviation of the metrics
print("\nMean and Standard Deviation of the Metrics:")
print(f"Accuracy: {mean_accuracy:.2f} ± {std_accuracy:.2f}")
print(f"Precision: {mean_precision:.2f} ± {std_precision:.2f}")
print(f"Recall: {mean_recall:.2f} ± {std_recall:.2f}")
print(f"F1 Score: {mean_f1:.2f} ± {std_f1:.2f}")


Mean and Standard Deviation of the Metrics:
Accuracy: 0.91 ± 0.02
Precision: 0.93 ± 0.03
Recall: 0.88 ± 0.04
F1 Score: 0.91 ± 0.03
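
The ± values reported above come from `np.std`, which by default computes the population standard deviation (`ddof=0`). With only 10 splits, the sample standard deviation (`ddof=1`) is slightly larger. The sketch below illustrates the difference using the standard library; the per-split scores in it are hypothetical values chosen for illustration, not the notebook's actual results.

```python
import math
import statistics

# Hypothetical per-split accuracies for 10 splits (illustrative values only,
# not the per-split results produced by the notebook).
scores = [0.90, 0.93, 0.89, 0.92, 0.91, 0.94, 0.88, 0.91, 0.90, 0.92]

mean = statistics.fmean(scores)
pop_std = statistics.pstdev(scores)    # same definition as np.std(scores), i.e. ddof=0
sample_std = statistics.stdev(scores)  # same definition as np.std(scores, ddof=1)

# For any data set of size n, the two estimates differ by a factor sqrt(n / (n - 1)).
ratio = sample_std / pop_std
expected_ratio = math.sqrt(len(scores) / (len(scores) - 1))

print(f"mean = {mean:.3f}")
print(f"population std = {pop_std:.4f}, sample std = {sample_std:.4f}")
print(f"ratio = {ratio:.4f} (expected {expected_ratio:.4f})")
```

For 10 splits the factor is about 1.05, so the choice rarely changes a value rounded to two decimals, but it is worth stating which definition is used when reporting results.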


The following code computes the mean and standard deviation of the metrics analyzed in the study over the 10 datasets (train/test splits with different random states) generated from the dataframe df2.

In [None]:
# Split the data into features (X) and labels (y)
X = df2.drop('Class', axis=1)
y = df2['Class']

random_states = [10, 23, 87, 41, 65, 12, 98, 34, 72, 19]

accuracy_list = []
precision_list = []
recall_list = []
f1_list = []

for random_state in random_states:

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=random_state)

    # Naive Bayes model
    nb_model = GaussianNB()

    # No hyperparameters are tuned here (GaussianNB is used with its defaults), so there is no parameter grid.

    # Train the Naive Bayes model
    nb_model.fit(X_train, y_train)

    # Predict on the test set
    y_pred = nb_model.predict(X_test)

    # Compute the metrics and store them
    accuracy_list.append(accuracy_score(y_test, y_pred))
    precision_list.append(precision_score(y_test, y_pred))
    recall_list.append(recall_score(y_test, y_pred))
    f1_list.append(f1_score(y_test, y_pred))

# Compute the mean and standard deviation
mean_accuracy = np.mean(accuracy_list)
std_accuracy = np.std(accuracy_list)
mean_precision = np.mean(precision_list)
std_precision = np.std(precision_list)
mean_recall = np.mean(recall_list)
std_recall = np.std(recall_list)
mean_f1 = np.mean(f1_list)
std_f1 = np.std(f1_list)

# Display the mean and standard deviation of the metrics
print("\nMean and Standard Deviation of the Metrics:")
print(f"Accuracy: {mean_accuracy:.2f} ± {std_accuracy:.2f}")
print(f"Precision: {mean_precision:.2f} ± {std_precision:.2f}")
print(f"Recall: {mean_recall:.2f} ± {std_recall:.2f}")
print(f"F1 Score: {mean_f1:.2f} ± {std_f1:.2f}")


Mean and Standard Deviation of the Metrics:
Accuracy: 0.80 ± 0.10
Precision: 0.89 ± 0.03
Recall: 0.68 ± 0.22
F1 Score: 0.75 ± 0.19
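
A reader may notice that the reported mean F1 score (0.75) is not the harmonic mean of the reported mean precision (0.89) and mean recall (0.68). This is expected: the code averages the F1 score of each split, which is not the same as computing F1 from the averaged precision and recall, and the gap grows when recall varies strongly between splits. The sketch below checks the arithmetic; the two splits in it are hypothetical values chosen only to illustrate the effect.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# F1 computed from the means reported above for df2.
f1_of_means = f1(0.89, 0.68)

# Two hypothetical splits (precision, recall); illustrative values only,
# not the per-split results produced by the notebook.
splits = [(0.90, 0.90), (0.88, 0.46)]
mean_p = sum(p for p, _ in splits) / len(splits)
mean_r = sum(r for _, r in splits) / len(splits)
mean_of_f1 = sum(f1(p, r) for p, r in splits) / len(splits)

print(f"F1 of reported means: {f1_of_means:.2f}")
print(f"hypothetical splits: mean P = {mean_p:.2f}, mean R = {mean_r:.2f}")
print(f"mean of per-split F1: {mean_of_f1:.2f}, F1 of mean P and R: {f1(mean_p, mean_r):.2f}")
```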


The following code computes the mean and standard deviation of the metrics analyzed in the study for the 10 error-injected datasets generated from the dataframe df in each of the 35 scenarios analyzed (7 error percentages × 5 error counts). The dataframe df is used because it produced the best results without error injection.

In [None]:
# List of metric results
results_metrics = []

# Parameter lists
error_percentages = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.5]
error_counts = [1, 2, 3, 4, 5]
random_states = [10, 23, 87, 41, 65, 12, 98, 34, 72, 19]

X = df.drop('Class', axis=1)
y = df['Class']

for error_percentage, error_count in product(error_percentages, error_counts):

    # Lists to store the metrics for the 10 datasets
    row_results_metrics = {'accuracy': [], 'precision': [], 'recall': [], 'f1_score': []}

    for random_state in random_states:
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=random_state)

        X_test_error = introduce_errors(X_test.copy(), error_percentage, error_count, random_state)

        # Naive Bayes model
        nb_model = GaussianNB()

        # Train the Naive Bayes model
        nb_model.fit(X_train, y_train)

        # Predict on the error-injected test set
        y_pred_error = nb_model.predict(X_test_error)

        # Compute the metrics and store them
        accuracy = accuracy_score(y_test, y_pred_error)
        precision = precision_score(y_test, y_pred_error)
        recall = recall_score(y_test, y_pred_error)
        f1 = f1_score(y_test, y_pred_error)

        row_results_metrics['accuracy'].append(accuracy)
        row_results_metrics['precision'].append(precision)
        row_results_metrics['recall'].append(recall)
        row_results_metrics['f1_score'].append(f1)

    mean_results_metrics = {metric: np.mean(values) for metric, values in row_results_metrics.items()}
    std_results_metrics = {metric: np.std(values) for metric, values in row_results_metrics.items()}
    results_metrics.append([error_percentage, error_count,
                            mean_results_metrics['accuracy'], std_results_metrics['accuracy'],
                            mean_results_metrics['precision'], std_results_metrics['precision'],
                            mean_results_metrics['recall'], std_results_metrics['recall'],
                            mean_results_metrics['f1_score'], std_results_metrics['f1_score']])

# Build the DataFrame for the error-injected cases
results_metrics_df = pd.DataFrame(results_metrics, columns=['Error Percentage', 'Error Count',
                                                              'Mean Accuracy', 'Std Accuracy',
                                                              'Mean Precision', 'Std Precision',
                                                              'Mean Recall', 'Std Recall',
                                                              'Mean F1 Score', 'Std F1 Score'])

# Combine each mean and standard deviation into a single column
for metric in ['Accuracy', 'Precision', 'Recall', 'F1 Score']:
    results_metrics_df[f'Mean {metric} & Std'] = results_metrics_df.apply(
        lambda row: f"{row[f'Mean {metric}']:.2f} ± {row[f'Std {metric}']:.2f}", axis=1
    )

# Drop the separate mean and standard deviation columns
results_metrics_df = results_metrics_df.drop(
    columns=[f'Mean {metric}' for metric in ['Accuracy', 'Precision', 'Recall', 'F1 Score']] + [f'Std {metric}' for metric in ['Accuracy', 'Precision', 'Recall', 'F1 Score']]
)

# Build a separate pivot table for each metric
tables_metrics = {}
for metric in ['Accuracy', 'Precision', 'Recall', 'F1 Score']:
    table_metric = results_metrics_df.pivot(index='Error Percentage', columns='Error Count', values=[f'Mean {metric} & Std'])
    tables_metrics[metric] = table_metric

# Print the table for each metric
print("\nTables for each metric (Accuracy, Precision, Recall, F1-Score):")
for metric, table_metric in tables_metrics.items():
    print(f"\nTable for {metric}:")
    display(table_metric)

path = "/content/drive/MyDrive/archive/"

results_metrics_df.to_csv(path + 'resultados_metrics_nb.csv', index=False)


Tables for each metric (Accuracy, Precision, Recall, F1-Score):

Table for Accuracy:


Mean Accuracy & Std
Error Percentage \ Error Count,1,2,3,4,5
0.05,0.93 ± 0.03,0.92 ± 0.03,0.91 ± 0.03,0.91 ± 0.03,0.90 ± 0.03
0.1,0.92 ± 0.02,0.92 ± 0.02,0.91 ± 0.02,0.91 ± 0.02,0.90 ± 0.02
0.15,0.89 ± 0.02,0.89 ± 0.02,0.88 ± 0.02,0.87 ± 0.03,0.86 ± 0.02
0.2,0.89 ± 0.02,0.87 ± 0.02,0.86 ± 0.03,0.86 ± 0.01,0.83 ± 0.03
0.3,0.91 ± 0.03,0.88 ± 0.02,0.85 ± 0.02,0.82 ± 0.01,0.78 ± 0.02
0.4,0.89 ± 0.03,0.87 ± 0.03,0.82 ± 0.02,0.78 ± 0.03,0.73 ± 0.02
0.5,0.89 ± 0.02,0.87 ± 0.02,0.81 ± 0.03,0.75 ± 0.02,0.69 ± 0.02



Table for Precision:


Mean Precision & Std
Error Percentage \ Error Count,1,2,3,4,5
0.05,0.94 ± 0.04,0.93 ± 0.04,0.93 ± 0.04,0.93 ± 0.04,0.92 ± 0.04
0.1,0.94 ± 0.03,0.93 ± 0.03,0.92 ± 0.03,0.92 ± 0.03,0.92 ± 0.04
0.15,0.92 ± 0.02,0.92 ± 0.02,0.92 ± 0.02,0.90 ± 0.03,0.89 ± 0.02
0.2,0.91 ± 0.03,0.89 ± 0.03,0.88 ± 0.03,0.87 ± 0.02,0.84 ± 0.05
0.3,0.93 ± 0.03,0.90 ± 0.03,0.87 ± 0.03,0.84 ± 0.02,0.80 ± 0.03
0.4,0.90 ± 0.02,0.88 ± 0.03,0.83 ± 0.02,0.80 ± 0.02,0.74 ± 0.03
0.5,0.91 ± 0.02,0.88 ± 0.03,0.82 ± 0.04,0.77 ± 0.05,0.71 ± 0.03



Table for Recall:


Mean Recall & Std
Error Percentage \ Error Count,1,2,3,4,5
0.05,0.92 ± 0.02,0.91 ± 0.03,0.90 ± 0.02,0.89 ± 0.02,0.88 ± 0.03
0.1,0.90 ± 0.03,0.90 ± 0.02,0.90 ± 0.02,0.89 ± 0.03,0.89 ± 0.03
0.15,0.86 ± 0.04,0.86 ± 0.03,0.85 ± 0.02,0.84 ± 0.03,0.83 ± 0.03
0.2,0.87 ± 0.04,0.86 ± 0.03,0.84 ± 0.03,0.84 ± 0.03,0.82 ± 0.02
0.3,0.89 ± 0.03,0.86 ± 0.03,0.83 ± 0.04,0.79 ± 0.03,0.76 ± 0.03
0.4,0.87 ± 0.03,0.86 ± 0.03,0.81 ± 0.03,0.76 ± 0.04,0.70 ± 0.03
0.5,0.86 ± 0.05,0.86 ± 0.04,0.79 ± 0.06,0.71 ± 0.06,0.65 ± 0.04



Table for F1 Score:


Mean F1 Score & Std
Error Percentage \ Error Count,1,2,3,4,5
0.05,0.93 ± 0.02,0.92 ± 0.03,0.91 ± 0.03,0.91 ± 0.02,0.90 ± 0.03
0.1,0.92 ± 0.02,0.92 ± 0.02,0.91 ± 0.02,0.91 ± 0.02,0.90 ± 0.02
0.15,0.89 ± 0.03,0.89 ± 0.02,0.88 ± 0.02,0.87 ± 0.03,0.86 ± 0.02
0.2,0.89 ± 0.02,0.87 ± 0.02,0.86 ± 0.03,0.86 ± 0.02,0.83 ± 0.03
0.3,0.91 ± 0.03,0.88 ± 0.02,0.85 ± 0.03,0.82 ± 0.01,0.78 ± 0.02
0.4,0.89 ± 0.03,0.87 ± 0.03,0.82 ± 0.02,0.78 ± 0.03,0.72 ± 0.02
0.5,0.89 ± 0.03,0.87 ± 0.03,0.80 ± 0.03,0.74 ± 0.03,0.68 ± 0.03
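
To compare these tables with the error-free baseline, the "mean ± std" strings can be parsed back into numbers and subtracted from the baseline score. The sketch below does this for two rows copied from the Naive Bayes accuracy table above and the baseline accuracy of 0.91 reported for df. The helper `parse_cell` and the dictionary layout are assumptions made for illustration and are not part of the notebook.

```python
import re

# Matches strings such as "0.93 ± 0.03".
CELL = re.compile(r"^\s*([0-9.]+)\s*±\s*([0-9.]+)\s*$")

def parse_cell(cell):
    """Split a 'mean ± std' string into a (mean, std) pair of floats."""
    match = CELL.match(cell)
    if match is None:
        raise ValueError(f"unexpected cell format: {cell!r}")
    return float(match.group(1)), float(match.group(2))

# Naive Bayes accuracy on df without injected errors (reported earlier).
baseline_accuracy = 0.91

# Two rows copied from the accuracy table: error percentage -> cells for error counts 1..5.
accuracy_rows = {
    0.05: ["0.93 ± 0.03", "0.92 ± 0.03", "0.91 ± 0.03", "0.91 ± 0.03", "0.90 ± 0.03"],
    0.5: ["0.89 ± 0.02", "0.87 ± 0.02", "0.81 ± 0.03", "0.75 ± 0.02", "0.69 ± 0.02"],
}

# Drop relative to the baseline; a negative value means the score was above the baseline.
drops = {
    (pct, count): round(baseline_accuracy - parse_cell(cell)[0], 2)
    for pct, cells in accuracy_rows.items()
    for count, cell in enumerate(cells, start=1)
}

for (pct, count), drop in sorted(drops.items()):
    print(f"error percentage {pct}, error count {count}: drop = {drop:+.2f}")
```

Because the table values are already rounded to two decimals, the computed drops carry the same rounding; working from the unrounded means would be more precise.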


## MLP

The following code computes the mean and standard deviation of the metrics analyzed in the study over the 10 datasets (train/test splits with different random states) generated from the dataframe df, testing different scaling methods.

In [None]:
from warnings import simplefilter

# Suppress all warnings
simplefilter(action='ignore')

X = df.drop('Class', axis=1)
y = df['Class']

scalers = {
    "MinMaxScaler": MinMaxScaler(),
    "StandardScaler": StandardScaler()
}

random_states = [10, 23, 87, 41, 65, 12, 98, 34, 72, 19]

# Dictionary to store the metrics for each scaling technique
metrics_results = {scaler_name: {'Accuracy': [], 'Precision': [], 'Recall': [], 'F1 Score': []} for scaler_name in scalers.keys()}

for random_state in random_states:
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=random_state)

    for scaler_name, scaler in scalers.items():

        X_train_scaled = scaler.fit_transform(X_train)
        X_test_scaled = scaler.transform(X_test)

        # MLP model
        mlp_model = MLPClassifier(random_state=10)

        param_grid = {
            'hidden_layer_sizes': [(16,), (16, 16), (16, 32), (16, 32, 48)],
            'solver': ['adam', 'sgd', 'lbfgs'],
            'activation': ['identity', 'logistic', 'tanh', 'relu'],
        }

        # Run the grid search to find the best parameters
        grid_search = GridSearchCV(mlp_model, param_grid, cv=5, scoring='accuracy')
        grid_search.fit(X_train_scaled, y_train)

        # Get the best model from the grid search
        best_model = grid_search.best_estimator_

        # Predict on the test set
        y_pred = best_model.predict(X_test_scaled)

        # Compute the metrics and store them
        accuracy = accuracy_score(y_test, y_pred)
        precision = precision_score(y_test, y_pred)
        recall = recall_score(y_test, y_pred)
        f1 = f1_score(y_test, y_pred)

        # Add the metrics to the results dictionary
        metrics_results[scaler_name]['Accuracy'].append(accuracy)
        metrics_results[scaler_name]['Precision'].append(precision)
        metrics_results[scaler_name]['Recall'].append(recall)
        metrics_results[scaler_name]['F1 Score'].append(f1)

# Display the mean and standard deviation of the metrics for each scaling technique
print("\nMean and standard deviation of the metrics for each scaling technique:")
for scaler_name, metrics_dict in metrics_results.items():
    print(f"\nScaling: {scaler_name}")
    for metric, values in metrics_dict.items():
        mean = np.mean(values)
        std_dev = np.std(values)
        print(f"{metric}: {mean:.2f} ± {std_dev:.2f}")


Mean and standard deviation of the metrics for each scaling technique:

Scaling: MinMaxScaler
Accuracy: 1.00 ± 0.00
Precision: 1.00 ± 0.00
Recall: 1.00 ± 0.00
F1 Score: 1.00 ± 0.00

Scaling: StandardScaler
Accuracy: 1.00 ± 0.00
Precision: 1.00 ± 0.00
Recall: 1.00 ± 0.00
F1 Score: 1.00 ± 0.00
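
The grid search in the cell above is the main cost of this experiment. As a rough estimate, the sketch below counts the model fits it implies, assuming the same parameter grid, `cv=5`, 10 random states and 2 scalers as in the code, and taking into account that `GridSearchCV` refits the best configuration on the whole training set when `refit=True`, its default. The variable names are chosen for illustration.

```python
from itertools import product

# Same grid as in the MLP cell above.
param_grid = {
    'hidden_layer_sizes': [(16,), (16, 16), (16, 32), (16, 32, 48)],
    'solver': ['adam', 'sgd', 'lbfgs'],
    'activation': ['identity', 'logistic', 'tanh', 'relu'],
}
cv_folds = 5    # cv=5 in GridSearchCV
n_splits = 10   # len(random_states)
n_scalers = 2   # MinMaxScaler and StandardScaler

# Every combination of the three parameter lists is evaluated.
combinations = list(product(*param_grid.values()))

# One fit per combination and fold, plus one final refit of the best configuration.
fits_per_search = len(combinations) * cv_folds + 1
total_fits = fits_per_search * n_splits * n_scalers

print(f"parameter combinations: {len(combinations)}")
print(f"fits per grid search:   {fits_per_search}")
print(f"fits for the cell:      {total_fits}")
```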


The following code computes the mean and standard deviation of the metrics analyzed in the study over the 10 datasets (train/test splits with different random states) generated from the dataframe df2, testing different scaling methods.

In [None]:
from warnings import simplefilter

# Suppress all warnings
simplefilter(action='ignore')

# Split the data into features (X) and labels (y)
X = df2.drop('Class', axis=1)
y = df2['Class']

# Dictionary of scalers
scalers = {
    "MinMaxScaler": MinMaxScaler(),
    "StandardScaler": StandardScaler()
}

random_states = [10, 23, 87, 41, 65, 12, 98, 34, 72, 19]

# Dictionary to store the metrics for each scaling technique
metrics_results = {scaler_name: {'Accuracy': [], 'Precision': [], 'Recall': [], 'F1 Score': []} for scaler_name in scalers.keys()}

for random_state in random_states:
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=random_state)

    for scaler_name, scaler in scalers.items():
        X_train_scaled = scaler.fit_transform(X_train)
        X_test_scaled = scaler.transform(X_test)

        # MLP model
        mlp_model = MLPClassifier(random_state=10)

        param_grid = {
            'hidden_layer_sizes': [(24,), (24, 24), (24, 48), (24, 48, 72)],
            'solver': ['adam', 'sgd', 'lbfgs'],
            'activation': ['identity', 'logistic', 'tanh', 'relu'],
        }

        # Run the grid search to find the best parameters
        grid_search = GridSearchCV(mlp_model, param_grid, cv=5, scoring='accuracy')
        grid_search.fit(X_train_scaled, y_train)

        # Get the best model from the grid search
        best_model = grid_search.best_estimator_

        # Predict on the test set
        y_pred = best_model.predict(X_test_scaled)

        # Compute the metrics and store them
        accuracy = accuracy_score(y_test, y_pred)
        precision = precision_score(y_test, y_pred)
        recall = recall_score(y_test, y_pred)
        f1 = f1_score(y_test, y_pred)

        # Add the metrics to the results dictionary
        metrics_results[scaler_name]['Accuracy'].append(accuracy)
        metrics_results[scaler_name]['Precision'].append(precision)
        metrics_results[scaler_name]['Recall'].append(recall)
        metrics_results[scaler_name]['F1 Score'].append(f1)

# Display the mean and standard deviation of the metrics for each scaling technique
print("\nMean and standard deviation of the metrics for each scaling technique:")
for scaler_name, metrics_dict in metrics_results.items():
    print(f"\nScaling: {scaler_name}")
    for metric, values in metrics_dict.items():
        mean = np.mean(values)
        std_dev = np.std(values)
        print(f"{metric}: {mean:.2f} ± {std_dev:.2f}")


Mean and standard deviation of the metrics for each scaling technique:

Scaling: MinMaxScaler
Accuracy: 1.00 ± 0.00
Precision: 1.00 ± 0.01
Recall: 1.00 ± 0.00
F1 Score: 1.00 ± 0.00

Scaling: StandardScaler
Accuracy: 1.00 ± 0.00
Precision: 1.00 ± 0.01
Recall: 1.00 ± 0.00
F1 Score: 1.00 ± 0.00


The following code computes the mean and standard deviation of the metrics analyzed in the study for the 10 error-injected datasets generated from the dataframe df in each of the 35 scenarios analyzed (7 error percentages × 5 error counts). The dataframe df is used because it produced the best results without error injection. Both scalers are tested, since both achieved perfect results on the error-free data.

In [None]:
import pickle
from warnings import simplefilter

# ignore all warnings
simplefilter(action='ignore')

# Helper functions to save and load the state of the results_metrics list
def save_results(file_path, results):
    with open(file_path, 'wb') as f:
        pickle.dump(results, f)

def load_results(file_path):
    with open(file_path, 'rb') as f:
        return pickle.load(f)

# Load previously saved results, if any
results_file = "/content/drive/MyDrive/archive/results_metrics_mlp.pkl"
try:
    results_metrics = load_results(results_file)
except FileNotFoundError:
    results_metrics = []

# Parameter lists
error_percentages = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.5]
error_counts = [1, 2, 3, 4, 5]
random_states = [10, 23, 87, 41, 65, 12, 98, 34, 72, 19]

scalers = {
    "MinMaxScaler": MinMaxScaler(),
    "StandardScaler": StandardScaler()
}

X = df.drop('Class', axis=1)
y = df['Class']

display(results_metrics)

for error_percentage, error_count in product(error_percentages, error_counts):
    for scaler_name, scaler in scalers.items():

        existing_result = [result for result in results_metrics if result[0] == error_percentage and result[1] == error_count and result[2] == scaler_name]
        if existing_result:
            continue

        # Lists to store the metrics for the 10 datasets
        row_results_metrics = {'accuracy': [], 'precision': [], 'recall': [], 'f1_score': []}

        for random_state in random_states:
            X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=random_state)

            X_test_error = introduce_errors(X_test.copy(), error_percentage, error_count, random_state)

            X_train_scaled = scaler.fit_transform(X_train)
            X_test_error_scaled = scaler.transform(X_test_error)

            # MLP model
            mlp_model = MLPClassifier(random_state=10)

            param_grid = {
                'hidden_layer_sizes': [(16,), (16, 16), (16, 32), (16, 32, 48)],
                'solver': ['adam', 'sgd', 'lbfgs'],
                'activation': ['identity', 'logistic', 'tanh', 'relu'],
            }

            # Run the grid search to find the best parameters
            grid_search = GridSearchCV(mlp_model, param_grid, cv=5, scoring='accuracy')
            grid_search.fit(X_train_scaled, y_train)

            best_model = grid_search.best_estimator_

            # Predict on the error-injected test set
            y_pred_error = best_model.predict(X_test_error_scaled)

            # Compute the metrics and store them
            accuracy = accuracy_score(y_test, y_pred_error)
            precision = precision_score(y_test, y_pred_error)
            recall = recall_score(y_test, y_pred_error)
            f1 = f1_score(y_test, y_pred_error)

            row_results_metrics['accuracy'].append(accuracy)
            row_results_metrics['precision'].append(precision)
            row_results_metrics['recall'].append(recall)
            row_results_metrics['f1_score'].append(f1)

        mean_results_metrics = {metric: np.mean(values) for metric, values in row_results_metrics.items()}
        std_results_metrics = {metric: np.std(values) for metric, values in row_results_metrics.items()}
        results_metrics.append([error_percentage, error_count, scaler_name,
                                mean_results_metrics['accuracy'], std_results_metrics['accuracy'],
                                mean_results_metrics['precision'], std_results_metrics['precision'],
                                mean_results_metrics['recall'], std_results_metrics['recall'],
                                mean_results_metrics['f1_score'], std_results_metrics['f1_score']])

    # Save the results after each scenario (error percentage, error count)
    save_results(results_file, results_metrics)

# Build the DataFrame for the error-injected cases
results_metrics_df = pd.DataFrame(results_metrics, columns=['Error Percentage', 'Error Count', 'Scaler',
                                                             'Mean Accuracy', 'Std Accuracy',
                                                             'Mean Precision', 'Std Precision',
                                                             'Mean Recall', 'Std Recall',
                                                             'Mean F1 Score', 'Std F1 Score'])

# Combine each mean and standard deviation into a single column
for metric in ['Accuracy', 'Precision', 'Recall', 'F1 Score']:
    results_metrics_df[f'Mean {metric} & Std'] = results_metrics_df.apply(
        lambda row: f"{row[f'Mean {metric}']:.2f} ± {row[f'Std {metric}']:.2f}", axis=1
    )

# Drop the separate mean and standard deviation columns
results_metrics_df = results_metrics_df.drop(
    columns=[f'Mean {metric}' for metric in ['Accuracy', 'Precision', 'Recall', 'F1 Score']] + [f'Std {metric}' for metric in ['Accuracy', 'Precision', 'Recall', 'F1 Score']]
)

# Build a separate pivot table for each metric and scaler
tables_metrics = {}
for metric in ['Accuracy', 'Precision', 'Recall', 'F1 Score']:
    for scaler_name in scalers.keys():
        table_metric = results_metrics_df[(results_metrics_df['Scaler'] == scaler_name)].pivot(index='Error Percentage', columns='Error Count', values=[f'Mean {metric} & Std'])
        tables_metrics[(metric, scaler_name)] = table_metric

# Print the table for each metric and scaler
print("\nTables for each metric (Accuracy, Precision, Recall, F1-Score) and scaler:")
for (metric, scaler_name), table_metric in tables_metrics.items():
    print(f"\nTable for {metric} with {scaler_name}:")
    display(table_metric)

path = "/content/drive/MyDrive/archive/"

results_metrics_df.to_csv(path + 'resultados_metrics_mlp.csv', index=False)


Tables for each metric (Accuracy, Precision, Recall, F1-Score) and scaler:

Table for Accuracy with StandardScaler:


Mean Accuracy & Std
Error Percentage \ Error Count,1,2,3,4,5
0.05,0.99 ± 0.01,0.98 ± 0.01,0.98 ± 0.01,0.97 ± 0.01,0.97 ± 0.01
0.1,0.98 ± 0.01,0.97 ± 0.01,0.96 ± 0.02,0.95 ± 0.01,0.94 ± 0.01
0.15,0.97 ± 0.01,0.96 ± 0.01,0.93 ± 0.02,0.92 ± 0.02,0.90 ± 0.01
0.2,0.94 ± 0.02,0.93 ± 0.02,0.89 ± 0.02,0.88 ± 0.02,0.86 ± 0.02
0.3,0.95 ± 0.02,0.91 ± 0.02,0.86 ± 0.02,0.82 ± 0.02,0.79 ± 0.01
0.4,0.93 ± 0.01,0.88 ± 0.02,0.82 ± 0.03,0.77 ± 0.02,0.73 ± 0.03
0.5,0.93 ± 0.02,0.88 ± 0.03,0.81 ± 0.03,0.75 ± 0.02,0.70 ± 0.02



Table for Accuracy with MinMaxScaler:


Mean Accuracy & Std
Error Percentage \ Error Count,1,2,3,4,5
0.05,0.98 ± 0.01,0.97 ± 0.01,0.97 ± 0.02,0.96 ± 0.01,0.95 ± 0.02
0.1,0.98 ± 0.01,0.97 ± 0.01,0.95 ± 0.02,0.93 ± 0.02,0.92 ± 0.02
0.15,0.97 ± 0.02,0.93 ± 0.02,0.92 ± 0.02,0.89 ± 0.02,0.87 ± 0.02
0.2,0.94 ± 0.02,0.91 ± 0.02,0.88 ± 0.02,0.86 ± 0.02,0.83 ± 0.03
0.3,0.93 ± 0.02,0.90 ± 0.02,0.85 ± 0.03,0.82 ± 0.02,0.78 ± 0.02
0.4,0.92 ± 0.02,0.88 ± 0.02,0.83 ± 0.02,0.77 ± 0.02,0.72 ± 0.03
0.5,0.92 ± 0.02,0.87 ± 0.03,0.81 ± 0.03,0.74 ± 0.02,0.67 ± 0.03



Table for Precision with StandardScaler:


Mean Precision & Std
Error Percentage \ Error Count,1,2,3,4,5
0.05,0.99 ± 0.02,0.99 ± 0.02,0.98 ± 0.02,0.98 ± 0.02,0.98 ± 0.02
0.1,0.99 ± 0.01,0.98 ± 0.01,0.98 ± 0.02,0.96 ± 0.03,0.96 ± 0.02
0.15,0.98 ± 0.02,0.98 ± 0.02,0.96 ± 0.03,0.95 ± 0.02,0.93 ± 0.01
0.2,0.96 ± 0.02,0.95 ± 0.04,0.91 ± 0.04,0.90 ± 0.04,0.88 ± 0.04
0.3,0.96 ± 0.04,0.92 ± 0.03,0.88 ± 0.04,0.84 ± 0.04,0.81 ± 0.03
0.4,0.93 ± 0.01,0.88 ± 0.03,0.83 ± 0.04,0.79 ± 0.03,0.77 ± 0.05
0.5,0.94 ± 0.03,0.88 ± 0.03,0.81 ± 0.04,0.77 ± 0.03,0.72 ± 0.04



Table for Precision with MinMaxScaler:


Mean Precision & Std
Error Percentage \ Error Count,1,2,3,4,5
0.05,0.98 ± 0.01,0.97 ± 0.02,0.97 ± 0.03,0.96 ± 0.03,0.96 ± 0.03
0.1,0.99 ± 0.01,0.97 ± 0.02,0.96 ± 0.03,0.93 ± 0.03,0.91 ± 0.03
0.15,0.97 ± 0.02,0.93 ± 0.03,0.92 ± 0.03,0.89 ± 0.02,0.87 ± 0.02
0.2,0.96 ± 0.02,0.92 ± 0.04,0.88 ± 0.04,0.86 ± 0.03,0.84 ± 0.05
0.3,0.94 ± 0.04,0.90 ± 0.03,0.85 ± 0.03,0.82 ± 0.03,0.78 ± 0.04
0.4,0.91 ± 0.03,0.87 ± 0.03,0.83 ± 0.03,0.77 ± 0.02,0.73 ± 0.04
0.5,0.94 ± 0.03,0.87 ± 0.03,0.82 ± 0.04,0.76 ± 0.05,0.67 ± 0.04



Table for Recall with StandardScaler:


Mean Recall & Std
Error Percentage \ Error Count,1,2,3,4,5
0.05,0.99 ± 0.01,0.98 ± 0.01,0.97 ± 0.01,0.96 ± 0.01,0.95 ± 0.01
0.1,0.97 ± 0.01,0.97 ± 0.01,0.94 ± 0.02,0.94 ± 0.01,0.92 ± 0.01
0.15,0.96 ± 0.01,0.94 ± 0.02,0.91 ± 0.02,0.89 ± 0.02,0.87 ± 0.02
0.2,0.93 ± 0.03,0.91 ± 0.01,0.88 ± 0.02,0.86 ± 0.02,0.83 ± 0.01
0.3,0.94 ± 0.02,0.90 ± 0.04,0.86 ± 0.03,0.81 ± 0.03,0.77 ± 0.03
0.4,0.94 ± 0.02,0.89 ± 0.03,0.82 ± 0.04,0.76 ± 0.03,0.69 ± 0.03
0.5,0.92 ± 0.01,0.88 ± 0.03,0.81 ± 0.04,0.72 ± 0.04,0.67 ± 0.04



Table for Recall with MinMaxScaler:


Mean Recall & Std
Error Percentage \ Error Count,1,2,3,4,5
0.05,0.98 ± 0.01,0.98 ± 0.01,0.97 ± 0.02,0.96 ± 0.01,0.94 ± 0.02
0.1,0.97 ± 0.01,0.97 ± 0.01,0.95 ± 0.02,0.93 ± 0.01,0.92 ± 0.01
0.15,0.97 ± 0.02,0.94 ± 0.02,0.92 ± 0.03,0.89 ± 0.02,0.88 ± 0.02
0.2,0.93 ± 0.03,0.90 ± 0.02,0.88 ± 0.03,0.87 ± 0.02,0.82 ± 0.02
0.3,0.93 ± 0.02,0.90 ± 0.04,0.86 ± 0.04,0.82 ± 0.03,0.79 ± 0.03
0.4,0.94 ± 0.02,0.90 ± 0.02,0.84 ± 0.03,0.77 ± 0.03,0.71 ± 0.04
0.5,0.91 ± 0.01,0.88 ± 0.04,0.80 ± 0.04,0.74 ± 0.05,0.68 ± 0.05



Table for F1 Score with StandardScaler:


Mean F1 Score & Std
Error Percentage \ Error Count,1,2,3,4,5
0.05,0.99 ± 0.01,0.98 ± 0.01,0.98 ± 0.01,0.97 ± 0.01,0.97 ± 0.01
0.1,0.98 ± 0.01,0.97 ± 0.01,0.96 ± 0.02,0.95 ± 0.01,0.94 ± 0.01
0.15,0.97 ± 0.01,0.96 ± 0.01,0.93 ± 0.02,0.92 ± 0.02,0.90 ± 0.01
0.2,0.94 ± 0.02,0.93 ± 0.02,0.89 ± 0.02,0.88 ± 0.01,0.85 ± 0.02
0.3,0.95 ± 0.02,0.91 ± 0.02,0.86 ± 0.02,0.82 ± 0.02,0.79 ± 0.01
0.4,0.93 ± 0.01,0.89 ± 0.02,0.82 ± 0.03,0.77 ± 0.02,0.72 ± 0.03
0.5,0.93 ± 0.02,0.88 ± 0.03,0.81 ± 0.02,0.74 ± 0.02,0.69 ± 0.01



Table for F1 Score with MinMaxScaler:


Mean F1 Score & Std
Error Percentage \ Error Count,1,2,3,4,5
0.05,0.98 ± 0.01,0.97 ± 0.01,0.97 ± 0.02,0.96 ± 0.01,0.95 ± 0.02
0.1,0.98 ± 0.01,0.97 ± 0.01,0.95 ± 0.02,0.93 ± 0.02,0.92 ± 0.02
0.15,0.97 ± 0.02,0.93 ± 0.02,0.92 ± 0.02,0.89 ± 0.02,0.88 ± 0.02
0.2,0.94 ± 0.02,0.91 ± 0.02,0.88 ± 0.02,0.86 ± 0.01,0.83 ± 0.03
0.3,0.93 ± 0.02,0.90 ± 0.03,0.85 ± 0.03,0.82 ± 0.02,0.78 ± 0.02
0.4,0.92 ± 0.02,0.89 ± 0.02,0.83 ± 0.01,0.77 ± 0.02,0.72 ± 0.03
0.5,0.92 ± 0.02,0.87 ± 0.03,0.81 ± 0.02,0.74 ± 0.02,0.67 ± 0.02
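
In the MLP error-injection cell, errors are injected only into the test split, so the fitted scaler and the grid search appear not to depend on the error percentage or error count. Under that assumption, the model could be fitted once per (scaler, random state) pair and reused across all 35 scenarios. The sketch below shows this caching pattern with stand-in functions: `evaluate_scenarios`, `fake_fit` and `fake_score` are hypothetical names introduced for illustration, and in the notebook the fit step would return the fitted scaler together with the best estimator.

```python
from itertools import product

def evaluate_scenarios(fit, score, scenarios, scaler_names, random_states):
    """Fit once per (scaler, seed) and reuse the fitted model for every scenario.

    `fit(scaler_name, seed)` and `score(model, scenario, seed)` are hypothetical
    callables standing in for the grid search and the evaluation on the
    error-injected test set.
    """
    cache = {}
    results = {}
    for scenario, scaler_name in product(scenarios, scaler_names):
        scores = []
        for seed in random_states:
            key = (scaler_name, seed)
            if key not in cache:
                # Expensive step, executed only the first time this pair is seen.
                cache[key] = fit(scaler_name, seed)
            scores.append(score(cache[key], scenario, seed))
        results[(scenario, scaler_name)] = scores
    return results, len(cache)

# Stand-ins that only record how often the expensive step runs.
fit_calls = []

def fake_fit(scaler_name, seed):
    fit_calls.append((scaler_name, seed))
    return (scaler_name, seed)

def fake_score(model, scenario, seed):
    return 0.0  # placeholder score, not a real metric

error_percentages = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.5]
error_counts = [1, 2, 3, 4, 5]
random_states = [10, 23, 87, 41, 65, 12, 98, 34, 72, 19]
scenarios = list(product(error_percentages, error_counts))

results, n_fits = evaluate_scenarios(
    fake_fit, fake_score, scenarios, ["MinMaxScaler", "StandardScaler"], random_states
)
print(f"scenario/scaler combinations evaluated: {len(results)}")
print(f"fits performed: {n_fits} instead of {len(results) * len(random_states)}")
```

With the values used in this notebook, the pattern would reduce 700 grid searches to 20, provided that the training split and the model's `random_state` are the same in every scenario.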
