<a href="https://colab.research.google.com/github/aureavaleria/DataBalancing-Research/blob/main/papers/Artigo%201/V5/CLinsmote/Vers%C3%A3o_5_(CLinsmote_vers%C3%A3o_5_catboost).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### ***Machine learning for predicting liver and/or lung metastasis in colorectal cancer: A retrospective study based on the SEER database***

This study proposes a machine learning model to predict the risk of liver and/or lung metastasis in patients with colorectal cancer (CRC). Data on approximately 53,000 patients with a pathological diagnosis of CRC between 2010 and 2015 were extracted from the SEER database and used to develop models based on seven algorithms (Decision Tree, Random Forest, SVM, Naive Bayes, KNN, XGBoost, and Gradient Boosting).

### Part 1: Importing Libraries and Loading the Dataset

In this step we import the libraries needed for the analysis and load the dataset. We drop the rows that contain missing values and define the predictor variables (X) and the target variables (y), preparing the data for preprocessing and modeling.

In [None]:
from imblearn.over_sampling import SMOTE, BorderlineSMOTE, ADASYN
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import f1_score, roc_auc_score, confusion_matrix, average_precision_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from xgboost import XGBClassifier
import numpy as np
import pandas as pd


# Load the dataset
df = pd.read_csv('https://raw.githubusercontent.com/aureavaleria/dataset/refs/heads/main/export.csv')
df.dropna(inplace=True)

# Define the predictor variables and the target variables
X = df[['Age recode with <1 year olds', 'Sex', 'Race recode (White, Black, Other)',
        'Histologic Type ICD-O-3', 'Grade Recode (thru 2017)', 'Primary Site',
        'Derived AJCC T, 7th ed (2010-2015)', 'Derived AJCC N, 7th ed (2010-2015)',
        'CS tumor size (2004-2015)', 'CEA Pretreatment Interpretation Recode (2010+)',
        'Tumor Deposits Recode (2010+)', 'Marital status at diagnosis']]

y_liver = df['SEER Combined Mets at DX-liver (2010+)']
y_lung = df['SEER Combined Mets at DX-lung (2010+)']

y = pd.concat([y_liver, y_lung], axis=1)
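Before calling `dropna`, it can help to see where the missing values sit. The sketch below is only an illustration on a toy frame: `missing_report` and the toy column names are hypothetical and not part of the notebook. It also only detects true `NaN`/`None` entries; if the SEER export encodes unknowns as text labels (an assumption to check against the file), those would not be counted.

```python
import numpy as np
import pandas as pd


def missing_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Per-column count and share of missing values, worst column first."""
    counts = frame.isna().sum()
    report = pd.DataFrame({"n_missing": counts, "share": counts / len(frame)})
    return report.sort_values("n_missing", ascending=False)


# Toy data standing in for the real dataset (hypothetical values)
toy = pd.DataFrame({
    "Sex": ["Male", "Female", None, "Male"],
    "Tumor size": [12.0, np.nan, np.nan, 30.0],
    "Grade": ["II", "III", "II", "I"],
})

report = missing_report(toy)
cleaned = toy.dropna()  # same call the notebook applies to df

print(report)
print(f"rows before: {len(toy)}, rows after dropna: {len(cleaned)}")
```

Comparing the row counts before and after shows how much data the drop costs, which matters when the minority class is already small.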

### Part 2: Preparing the Target Variables and Encoding Categorical Variables

In this step we prepare the target variables (y) by combining the liver and lung metastasis information into a single binary column that indicates the presence of metastasis. The categorical variables in X are not label-encoded in this version; they are later cast to the `category` dtype and passed to CatBoost as categorical features.

In [None]:
y = pd.concat([y_liver, y_lung], axis=1)

# Combine the liver and lung metastasis information into a binary 'Binary Mets' column
def combine_mets_binary(row):
    if row['SEER Combined Mets at DX-liver (2010+)'] == 'Yes' or row['SEER Combined Mets at DX-lung (2010+)'] == 'Yes':
        return 1  # With metastasis
    else:
        return 0  # Without metastasis

# Apply the function to create the new binary column 'Binary Mets' in 'y'
y['Binary Mets'] = y.apply(combine_mets_binary, axis=1)

# Check that 'X' and 'y' have the same number of samples
print(f"Tamanho de X: {len(X)}")
print(f"Tamanho de y: {len(y)}")

# Save the 'y' DataFrame to a CSV file for future reference or further analysis
y.to_csv('/content/Y.csv')

Tamanho de X: 53448
Tamanho de y: 53448
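The row-wise `apply` above can also be written as a vectorized comparison, which tends to be faster on a frame of this size. This is a minimal sketch on toy data; `combine_mets_vectorized` is a hypothetical helper, and it assumes, as the original function does, that only the exact string `'Yes'` marks a metastasis.

```python
import pandas as pd

LIVER = "SEER Combined Mets at DX-liver (2010+)"
LUNG = "SEER Combined Mets at DX-lung (2010+)"


def combine_mets_vectorized(targets: pd.DataFrame) -> pd.Series:
    """Return 1 where liver or lung metastasis is 'Yes', otherwise 0."""
    has_mets = targets[LIVER].eq("Yes") | targets[LUNG].eq("Yes")
    return has_mets.astype(int).rename("Binary Mets")


# Toy targets (hypothetical values); any label other than 'Yes' counts as 0
toy = pd.DataFrame({
    LIVER: ["Yes", "No", "No", "Unknown"],
    LUNG: ["No", "No", "Yes", "No"],
})

binary = combine_mets_vectorized(toy)
print(binary.tolist())  # [1, 0, 1, 0]
```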


### Part 3: Defining and Configuring the Machine Learning Models and Cross-Validation

Here we configure the model evaluated in this version of the notebook, CatBoost, together with the custom balancing function `clin_smote_balance`. The wider study also covers Decision Tree, Random Forest, SVM, Naive Bayes, KNN, XGBoost, and Gradient Boosting, each defined with its own parameters. We then set up stratified cross-validation with 5 splits so that the models are evaluated and compared consistently.

In [None]:
import pandas as pd
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import warnings
warnings.filterwarnings('ignore', category=UserWarning, module='sklearn')

best_score = -float('inf')
best_params = None


def clin_smote_balance(X, y):
    # Impute missing values for the numeric columns only
    num_cols = X.select_dtypes(include=[np.number]).columns
    imputer = SimpleImputer(strategy='mean')
    X[num_cols] = imputer.fit_transform(X[num_cols])

    # Split the classes
    X['Binary Mets'] = y
    X_major = X[X['Binary Mets'] == 0].drop(columns=['Binary Mets'])
    X_minor = X[X['Binary Mets'] == 1].drop(columns=['Binary Mets'])

    # Typical profile of the majority class via KMeans on the numeric columns
    num_cols = X_major.select_dtypes(include=[np.number]).columns

    # 1. Standardize the numeric variables
    scaler = StandardScaler()
    X_major_num_scaled = scaler.fit_transform(X_major[num_cols])

    # 2. Apply KMeans to identify subgroups within the majority class
    n_clusters = 15
    kmeans = KMeans(n_clusters=n_clusters, random_state=42)
    kmeans.fit(X_major_num_scaled)

    # 3. Determine the typical profile of each cluster
    perfil_majoritario = []

    for cluster_label in range(n_clusters):
        perfil_cluster = {}
        cluster_indices = (kmeans.labels_ == cluster_label)
        X_cluster = X_major[cluster_indices]

        # Profile for numeric variables
        for col in num_cols:
            perfil_cluster[col] = X_cluster[col].median()

        # Profile for categorical variables
        for col in X_major.columns:
            if col not in num_cols:
                if X_cluster[col].nunique() < 10:
                    freq = X_cluster[col].value_counts(normalize=True)
                    dominante = freq[freq >= 0.7]
                    if not dominante.empty:
                        perfil_cluster[col] = dominante.index[0]
                    else:
                        perfil_cluster[col] = X_cluster[col].mode()[0]
                else:
                    perfil_cluster[col] = X_cluster[col].mode()[0]

        perfil_majoritario.append(perfil_cluster)

    minor_status = []
    # Classify the minority samples by similarity to the majority profile
    for _, row in X_minor.iterrows():
        match, total = 0, 0

        # --- NUMERIC: compare with the nearest centroid
        row_scaled = scaler.transform([row[num_cols].values])[0]
        dists = np.linalg.norm(kmeans.cluster_centers_ - row_scaled, axis=1)
        cluster_idx = np.argmin(dists)  # index of the nearest cluster
        centroide_prox = kmeans.cluster_centers_[cluster_idx]

        # Numeric comparison (a match is within 0.1 standard deviations of the centroid)
        for i, col in enumerate(num_cols):
            total += 1
            if abs(row_scaled[i] - centroide_prox[i]) < 0.1:
                match += 1

        # Categorical comparison (uses the profile of the matching cluster)
        perfil_categorico = perfil_majoritario[cluster_idx]
        for col in perfil_categorico:
            if col not in num_cols:
                total += 1
                if row[col] == perfil_categorico[col]:
                    match += 1

        # Low overlap with the majority profile is green, high overlap is red
        percent = match / total if total > 0 else 0
        minor_status.append('🟢' if percent < 0.3 else '🟡' if percent < 0.7 else '🔴')


    X_minor['Status'] = minor_status


    # X_minor now has the 'Status' column with the labels 🟢🟡🔴

    # 1. Select all the green samples
    X_minor_green = X_minor[X_minor['Status'] == '🟢'].drop(columns='Status')

    # 1a. Remove numeric outliers from the green group using the IQR rule
    def remove_outliers_iqr(df):
        mask = pd.Series([True]*len(df), index=df.index)
        num_cols = df.select_dtypes(include=[np.number]).columns
        for col in num_cols:
            Q1 = df[col].quantile(0.25)
            Q3 = df[col].quantile(0.75)
            IQR = Q3 - Q1
            lower = Q1 - 1.5 * IQR
            upper = Q3 + 1.5 * IQR
            mask = mask & df[col].between(lower, upper)
        return df[mask]

    X_minor_green = remove_outliers_iqr(X_minor_green)


    # 2. Select a fraction of the intermediate (yellow) samples (here 40%)
    frac_intermediarios = 0.4
    X_minor_yellow = X_minor[X_minor['Status'] == '🟡'].sample(frac=frac_intermediarios, random_state=42).drop(columns='Status')

    # 3. Combine the green samples with the selected intermediate ones
    X_pool_sintetico = pd.concat([X_minor_green, X_minor_yellow])

    # 4. Generate the synthetic samples from this pool
    def gerar_sinteticas(df, n_amostras):
        sinteticas = []
        for _ in range(n_amostras):
            base = df.sample(2, replace=True)
            nova = {col: np.random.choice(base[col].values) for col in df.columns}
            sinteticas.append(nova)
        return pd.DataFrame(sinteticas)

    n_sinteticas = max(0, int(0.3 * (len(X_major) - len(X_minor))))
    X_sinteticas = gerar_sinteticas(X_pool_sintetico, n_sinteticas)

    # 5. Combine everything
    X_major['Binary Mets'] = 0
    X_minor['Binary Mets'] = 1
    X_sinteticas['Binary Mets'] = 1

    X_balanceado = pd.concat([X_major, X_minor, X_sinteticas], ignore_index=True)
    y_balanceado = X_balanceado['Binary Mets']
    X_balanceado.drop(columns=['Binary Mets', 'Status'], errors='ignore', inplace=True)

    return X_balanceado, y_balanceado
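The synthetic-sample step inside `clin_smote_balance` draws two parent rows from the pool and, for each column, copies the value of one of the two parents. The sketch below shows the same recombination idea in a vectorized, seeded form so that runs are repeatable. It is an illustrative variant under stated assumptions, not the function used above: `recombine_rows`, its `seed` parameter, and the toy pool are hypothetical, and the pool is assumed to be non-empty.

```python
import numpy as np
import pandas as pd


def recombine_rows(pool: pd.DataFrame, n_samples: int, seed: int = 42) -> pd.DataFrame:
    """Build synthetic rows by mixing column values from two random parent rows."""
    rng = np.random.default_rng(seed)
    # Two parent row positions per synthetic sample (drawn with replacement)
    parents = rng.integers(0, len(pool), size=(n_samples, 2))
    # For every sample and column, choose parent 0 or parent 1
    pick = rng.integers(0, 2, size=(n_samples, pool.shape[1]))
    chosen = np.take_along_axis(parents, pick, axis=1)
    # Copy each column value from the chosen parent row
    data = {
        col: pool[col].to_numpy()[chosen[:, j]]
        for j, col in enumerate(pool.columns)
    }
    return pd.DataFrame(data)


# Toy minority pool (hypothetical values)
pool = pd.DataFrame({
    "Sex": ["Male", "Female", "Female"],
    "Tumor size": [12.0, 30.0, 45.0],
    "Grade": ["II", "III", "I"],
})

synthetic = recombine_rows(pool, n_samples=5, seed=0)
print(synthetic)
```

Because every value is copied from an existing row, each synthetic column only contains values already present in the pool; nothing is interpolated, which is what lets the approach handle categorical columns.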



In [None]:
!pip install catboost



In [None]:
from catboost import CatBoostClassifier


# Get the indices of the categorical columns (dtype 'object' or 'category')
cat_cols = X.select_dtypes(include=['object', 'category']).columns
cat_features_idx = [X.columns.get_loc(col) for col in cat_cols]

# Define the models:
models = {
    "CatBoost": CatBoostClassifier(
        iterations=500,
        learning_rate=0.1,
        depth=6,
        cat_features=cat_features_idx,  # indices of the categorical columns
        verbose=100
    )
}
smote_techniques = {
    "clin_smote_balance": clin_smote_balance
}


# Stratified cross-validation with 5 splits (folds)
# Stratification keeps the class proportions in every split, and shuffle randomizes the row order before splitting
kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
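To illustrate what stratification guarantees, the sketch below splits a toy target with 10% positives and counts the positives in each test fold. The toy arrays are hypothetical; only `StratifiedKFold` and its arguments come from the cell above.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy imbalanced target: 90 negatives and 10 positives (hypothetical data)
y_toy = np.array([0] * 90 + [1] * 10)
X_toy = np.arange(100).reshape(-1, 1)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

fold_sizes = []
fold_positive_counts = []
for _, test_idx in skf.split(X_toy, y_toy):
    fold_sizes.append(len(test_idx))
    fold_positive_counts.append(int(y_toy[test_idx].sum()))

# Each test fold keeps the 10% positive rate of the full target
print(fold_sizes)
print(fold_positive_counts)
```

With a plain, unstratified split a fold could end up with very few positives, which would make recall and F1 on that fold unstable.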

### Part 4: Evaluating and Comparing the Machine Learning Models on Training, Validation, and Test Sets


This code block uses cross-validation to train and evaluate the models defined in the pipeline. Stratified K-Fold cross-validation splits the data into multiple folds so that performance is not judged on a single split. Within each fold, only the training data are balanced with the selected resampling technique (here `clin_smote_balance`, which standardizes the numeric features with StandardScaler internally). Performance metrics such as precision, recall, F1-score, specificity, AUC-ROC, and AUPR are calculated for both the training set and the test set. In addition, visualizations such as confusion matrices and ROC and Precision-Recall curves are generated. At the end, the mean metrics across all folds are compiled for comparison.
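Specificity and AUPR have no dedicated one-line scorer in the list of metrics imported for the loop, so a minimal sketch of how they can be obtained is given here. `specificity_score` is a hypothetical helper and the toy labels and scores are made up; `confusion_matrix` and `average_precision_score` are the scikit-learn functions already imported in Part 1. Average precision is used as the usual summary of the precision-recall curve, which is an assumption about what AUPR means in this notebook.

```python
import numpy as np
from sklearn.metrics import average_precision_score, confusion_matrix


def specificity_score(y_true, y_pred) -> float:
    """True negative rate: TN / (TN + FP)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return tn / (tn + fp) if (tn + fp) > 0 else 0.0


# Toy fold results (hypothetical values)
y_true = np.array([0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 0])
y_score = np.array([0.1, 0.2, 0.3, 0.6, 0.9, 0.4])

spec = specificity_score(y_true, y_pred)
aupr = average_precision_score(y_true, y_score)

print(f"Specificity: {spec:.4f}")
print(f"AUPR (average precision): {aupr:.4f}")
```

On imbalanced data such as this one, AUPR is often more informative than AUC-ROC because it does not reward the large number of true negatives.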

In [None]:
import re
from sklearn.metrics import roc_curve, auc, accuracy_score, precision_score, recall_score, f1_score
from catboost import CatBoostClassifier
import itertools

# 1. Define the parameter grid
param_grid = {
    'iterations': [200, 500],
    'learning_rate': [0.01, 0.1],
    'depth': [4, 6, 8]
}
param_list = list(itertools.product(
    param_grid['iterations'],
    param_grid['learning_rate'],
    param_grid['depth']
))

roc_curves_test = {smote_name: {} for smote_name in smote_techniques.keys()}
roc_curves_train = {smote_name: {} for smote_name in smote_techniques.keys()}
results_table_test = []
results_table_train = []

for smote_name, smote in smote_techniques.items():
    for iters, lr, dp in param_list:
        model_name = f"CatBoost_i{iters}_lr{lr}_d{dp}"
        print(f"\nAplicando {smote_name} com {model_name}")
        mean_fpr = np.linspace(0, 1, 100)
        tprs_test, aucs_test = [], []
        accuracies_test, precisions_test, recalls_test, f1_scores_test = [], [], [], []
        tprs_train, aucs_train = [], []
        accuracies_train, precisions_train, recalls_train, f1_scores_train = [], [], [], []

        for train_index, test_index in kf.split(X, y['Binary Mets']):
            # Copy the fold slices so the dtype casts below do not raise SettingWithCopyWarning
            X_train, X_test = X.iloc[train_index].copy(), X.iloc[test_index].copy()
            y_train, y_test = y['Binary Mets'].iloc[train_index], y['Binary Mets'].iloc[test_index]

            # Balance only the training fold with the selected technique
            if smote_name == 'clin_smote_balance':
                X_train_res, y_train_res = smote(X_train.copy(), y_train.copy())
            else:
                X_train_res, y_train_res = smote.fit_resample(X_train, y_train)
            for col in X_train_res.select_dtypes(include='object').columns:
                X_train_res[col] = X_train_res[col].astype('category')
            for col in X_test.select_dtypes(include='object').columns:
                X_test[col] = X_test[col].astype('category')

            # Get the categorical column indices **in the current fold's X_train_res**
            cat_cols = X_train_res.select_dtypes(include=['object', 'category']).columns
            cat_features_idx = [X_train_res.columns.get_loc(col) for col in cat_cols]

            def clean_column_names(df):
                df.columns = [
                    re.sub(r'[^A-Za-z0-9_]+', '_', col) for col in df.columns
                ]
                return df

            # Clean the column names of the datasets used in fit and predict
            X_train_res = clean_column_names(X_train_res)
            X_test = clean_column_names(X_test)

            # Create the model inside the fold, for each set of hyperparameters
            model = CatBoostClassifier(
                iterations=iters, learning_rate=lr, depth=dp,
                verbose=0, random_state=42
            )

            # Train and evaluate, passing cat_features
            model.fit(X_train_res, y_train_res, cat_features=cat_features_idx)
            y_pred_test = model.predict(X_test)
            y_pred_proba_test = model.predict_proba(X_test)[:, 1]

            fpr_test, tpr_test, _ = roc_curve(y_test, y_pred_proba_test)
            interp_tpr_test = np.interp(mean_fpr, fpr_test, tpr_test)
            interp_tpr_test[0] = 0.0
            tprs_test.append(interp_tpr_test)
            aucs_test.append(auc(fpr_test, tpr_test))
            accuracies_test.append(accuracy_score(y_test, y_pred_test))
            precisions_test.append(precision_score(y_test, y_pred_test))
            recalls_test.append(recall_score(y_test, y_pred_test))
            f1_scores_test.append(f1_score(y_test, y_pred_test))

            y_pred_train = model.predict(X_train_res)
            y_pred_proba_train = model.predict_proba(X_train_res)[:, 1]

            fpr_train, tpr_train, _ = roc_curve(y_train_res, y_pred_proba_train)
            interp_tpr_train = np.interp(mean_fpr, fpr_train, tpr_train)
            interp_tpr_train[0] = 0.0
            tprs_train.append(interp_tpr_train)
            aucs_train.append(auc(fpr_train, tpr_train))
            accuracies_train.append(accuracy_score(y_train_res, y_pred_train))
            precisions_train.append(precision_score(y_train_res, y_pred_train))
            recalls_train.append(recall_score(y_train_res, y_pred_train))
            f1_scores_train.append(f1_score(y_train_res, y_pred_train))

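            # Running mean of the test F1 over the folds seen so far (or any metric you prefer).
            # This check sits inside the fold loop, so best_params can be updated from a partial mean.
            score = np.mean(f1_scores_test)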
            if score > best_score:
                best_score = score
                best_params = (iters, lr, dp)
                print(f"Novo melhor para {smote_name}: iters={iters}, lr={lr}, depth={dp}, F1={score:.4f}")

        mean_accuracy_test = np.mean(accuracies_test)
        mean_auc_test = np.mean(aucs_test)
        mean_precision_test = np.mean(precisions_test)
        mean_recall_test = np.mean(recalls_test)
        mean_f1_test = np.mean(f1_scores_test)
        mean_accuracy_train = np.mean(accuracies_train)
        mean_auc_train = np.mean(aucs_train)
        mean_precision_train = np.mean(precisions_train)
        mean_recall_train = np.mean(recalls_train)
        mean_f1_train = np.mean(f1_scores_train)

        print(f"Teste - Accuracy: {mean_accuracy_test:.4f}, AUC: {mean_auc_test:.4f}, Precision: {mean_precision_test:.4f}, Recall: {mean_recall_test:.4f}, F1-score: {mean_f1_test:.4f}")
        print(f"Treinamento - Accuracy: {mean_accuracy_train:.4f}, AUC: {mean_auc_train:.4f}, Precision: {mean_precision_train:.4f}, Recall: {mean_recall_train:.4f}, F1-score: {mean_f1_train:.4f}")

        results_table_test.append({
            "SMOTE Technique": smote_name,
            "Model": model_name,
            "Accuracy": mean_accuracy_test,
            "AUC": mean_auc_test,
            "Precision": mean_precision_test,
            "Recall rate": mean_recall_test,
            "F1-score": mean_f1_test
        })
        results_table_train.append({
            "SMOTE Technique": smote_name,
            "Model": model_name,
            "Accuracy": mean_accuracy_train,
            "AUC": mean_auc_train,
            "Precision": mean_precision_train,
            "Recall rate": mean_recall_train,
            "F1-score": mean_f1_train
        })



Aplicando clin_smote_balance com CatBoost_i200_lr0.01_d4



Novo melhor para clin_smote_balance: iters=200, lr=0.01, depth=4, F1=0.5706



Novo melhor para clin_smote_balance: iters=200, lr=0.01, depth=4, F1=0.5709



Novo melhor para clin_smote_balance: iters=200, lr=0.01, depth=4, F1=0.5738



Teste - Accuracy: 0.8664, AUC: 0.8740, Precision: 0.5449, Recall: 0.5880, F1-score: 0.5656
Treinamento - Accuracy: 0.8326, AUC: 0.8930, Precision: 0.7603, Recall: 0.6358, F1-score: 0.6925

Aplicando clin_smote_balance com CatBoost_i200_lr0.01_d6



Novo melhor para clin_smote_balance: iters=200, lr=0.01, depth=6, F1=0.5765



Novo melhor para clin_smote_balance: iters=200, lr=0.01, depth=6, F1=0.5780



Teste - Accuracy: 0.8677, AUC: 0.8778, Precision: 0.5488, Recall: 0.5935, F1-score: 0.5702
Treinamento - Accuracy: 0.8388, AUC: 0.9003, Precision: 0.7675, Recall: 0.6544, F1-score: 0.7065

Aplicando clin_smote_balance com CatBoost_i200_lr0.01_d8



Novo melhor para clin_smote_balance: iters=200, lr=0.01, depth=8, F1=0.5855



Novo melhor para clin_smote_balance: iters=200, lr=0.01, depth=8, F1=0.5863



Teste - Accuracy: 0.8689, AUC: 0.8801, Precision: 0.5521, Recall: 0.6019, F1-score: 0.5758
Treinamento - Accuracy: 0.8453, AUC: 0.9060, Precision: 0.7767, Recall: 0.6714, F1-score: 0.7202

Aplicando clin_smote_balance com CatBoost_i200_lr0.1_d4



Novo melhor para clin_smote_balance: iters=200, lr=0.1, depth=4, F1=0.5941



Novo melhor para clin_smote_balance: iters=200, lr=0.1, depth=4, F1=0.5949



Test - Accuracy: 0.8711, AUC: 0.8844, Precision: 0.5592, Recall: 0.6075, F1-score: 0.5822
Training - Accuracy: 0.8503, AUC: 0.9127, Precision: 0.7803, Recall: 0.6892, F1-score: 0.7320
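
As a quick sanity check on the two metric lines directly above, the F1-score can be recomputed from the reported precision and recall. This is a minimal sketch; the helper name `f1_from_precision_recall` is hypothetical and not part of the notebook. If the reported metrics are averages over cross-validation folds (an assumption, since the log does not say), the recomputed value should be close to, but not necessarily identical to, the logged F1-score.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the test and training lines in the log.
test_f1 = f1_from_precision_recall(0.5592, 0.6075)
train_f1 = f1_from_precision_recall(0.7803, 0.6892)

print(f"test F1 recomputed from P/R:  {test_f1:.4f} (logged: 0.5822)")
print(f"train F1 recomputed from P/R: {train_f1:.4f} (logged: 0.7320)")
```
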

Applying clin_smote_balance with CatBoost_i200_lr0.1_d6


New best for clin_smote_balance: iters=200, lr=0.1, depth=6, F1=0.5950


New best for clin_smote_balance: iters=200, lr=0.1, depth=6, F1=0.5960


Test - Accuracy: 0.8721, AUC: 0.8858, Precision: 0.5628, Recall: 0.6081, F1-score: 0.5845
Training - Accuracy: 0.8575, AUC: 0.9196, Precision: 0.7921, Recall: 0.7045, F1-score: 0.7457

Applying clin_smote_balance with CatBoost_i200_lr0.1_d8


Test - Accuracy: 0.8711, AUC: 0.8849, Precision: 0.5598, Recall: 0.6026, F1-score: 0.5804
Training - Accuracy: 0.8685, AUC: 0.9290, Precision: 0.8124, Recall: 0.7236, F1-score: 0.7654
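
The configuration names in this log (`CatBoost_i200_lr0.1_d6`, `CatBoost_i500_lr0.01_d4`, and so on) suggest a grid over iterations, learning rate, and depth. The sketch below enumerates such a grid in the same order as the log. The grid values are inferred only from the names visible here and may be incomplete, and `config_name` is a hypothetical helper, not the notebook's own code.

```python
from itertools import product

# Assumed grid, inferred from the configuration names visible in the log.
iterations_grid = [200, 500]
learning_rate_grid = [0.01, 0.1]
depth_grid = [4, 6, 8]


def config_name(iters: int, lr: float, depth: int) -> str:
    """Build a label in the same style as the names printed in the log."""
    return f"CatBoost_i{iters}_lr{lr}_d{depth}"


# product() varies the last factor fastest: depth, then learning rate, then iterations.
names = [
    config_name(iters, lr, depth)
    for iters, lr, depth in product(iterations_grid, learning_rate_grid, depth_grid)
]

for name in names:
    print(name)
```

Each tuple from `product` could be passed to a classifier constructor inside the loop; that step is omitted here so the sketch runs without extra dependencies.
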

Applying clin_smote_balance with CatBoost_i500_lr0.01_d4


Test - Accuracy: 0.8678, AUC: 0.8804, Precision: 0.5484, Recall: 0.6026, F1-score: 0.5741
Training - Accuracy: 0.8397, AUC: 0.9014, Precision: 0.7667, Recall: 0.6604, F1-score: 0.7096

Applying clin_smote_balance with CatBoost_i500_lr0.01_d6


Test - Accuracy: 0.8702, AUC: 0.8830, Precision: 0.5563, Recall: 0.6037, F1-score: 0.5790
Training - Accuracy: 0.8463, AUC: 0.9081, Precision: 0.7775, Recall: 0.6747, F1-score: 0.7224

Applying clin_smote_balance with CatBoost_i500_lr0.01_d8


Test - Accuracy: 0.8716, AUC: 0.8842, Precision: 0.5608, Recall: 0.6079, F1-score: 0.5832
Training - Accuracy: 0.8524, AUC: 0.9146, Precision: 0.7872, Recall: 0.6880, F1-score: 0.7343

Applying clin_smote_balance with CatBoost_i500_lr0.1_d4


New best for clin_smote_balance: iters=500, lr=0.1, depth=4, F1=0.5961


Test - Accuracy: 0.8717, AUC: 0.8850, Precision: 0.5611, Recall: 0.6079, F1-score: 0.5834
Training - Accuracy: 0.8562, AUC: 0.9185, Precision: 0.7891, Recall: 0.7030, F1-score: 0.7436
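
To compare training and test behaviour across configurations, the metric lines can be parsed into dictionaries and the train-test gap computed per metric. The sketch below uses the two lines directly above as input; `parse_metrics` is a hypothetical helper, and the text before `" - "` is ignored, so the label language does not matter. If the training metrics were computed on resampled data (an assumption; the log does not say), the gap reflects the change in class balance as well as overfitting.

```python
import re

# Matches "Name: 0.1234" pairs such as "AUC: 0.8850" or "F1-score: 0.5834".
METRIC = re.compile(r"(\w[\w-]*): ([0-9.]+)")


def parse_metrics(line: str) -> dict:
    """Return {metric name: value} for one metrics line of the log."""
    _, _, rest = line.partition(" - ")  # drop the leading split label
    return {name: float(value) for name, value in METRIC.findall(rest)}


test_line = "Test - Accuracy: 0.8717, AUC: 0.8850, Precision: 0.5611, Recall: 0.6079, F1-score: 0.5834"
train_line = "Training - Accuracy: 0.8562, AUC: 0.9185, Precision: 0.7891, Recall: 0.7030, F1-score: 0.7436"

test_metrics = parse_metrics(test_line)
train_metrics = parse_metrics(train_line)

# Positive gap means the training score is higher than the test score.
gaps = {name: train_metrics[name] - test_metrics[name] for name in test_metrics}
for name, gap in gaps.items():
    print(f"{name}: train - test = {gap:+.4f}")
```
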

Applying clin_smote_balance with CatBoost_i500_lr0.1_d6


Test - Accuracy: 0.8712, AUC: 0.8851, Precision: 0.5602, Recall: 0.6016, F1-score: 0.5801
Training - Accuracy: 0.8674, AUC: 0.9277, Precision: 0.8085, Recall: 0.7242, F1-score: 0.7641

Applying clin_smote_balance with CatBoost_i500_lr0.1_d8


Test - Accuracy: 0.8711, AUC: 0.8842, Precision: 0.5598, Recall: 0.6036, F1-score: 0.5808
Training - Accuracy: 0.8831, AUC: 0.9413, Precision: 0.8348, Recall: 0.7553, F1-score: 0.7930
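
The run that produced this log repeatedly triggered pandas' `SettingWithCopyWarning` at the statement `X_test[col] = X_test[col].astype('category')`. A minimal sketch of one common way to avoid it follows, assuming `X_test` is obtained by slicing a larger DataFrame with fold indices. The toy data, column names, and `test_idx` are hypothetical stand-ins for the notebook's variables.

```python
import pandas as pd

# Toy stand-in for the notebook's feature matrix; column names are hypothetical.
X = pd.DataFrame(
    {
        "Sex": ["Male", "Female", "Female", "Male"],
        "Grade": ["I", "II", "II", "III"],
        "Age": [61, 70, 55, 48],
    }
)
test_idx = [1, 3]  # hypothetical row positions of one test fold
categorical_cols = ["Sex", "Grade"]

# An explicit copy makes X_test an independent frame, so assigning columns
# to it is unambiguous and does not touch X.
X_test = X.iloc[test_idx].copy()
for col in categorical_cols:
    X_test[col] = X_test[col].astype("category")

print(X_test.dtypes)
```

The same `.copy()` call would apply to the training slice if it is modified in the same way.
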
