#### Clarification: LogisticRegressionCV is LogisticRegression with built-in hyperparameter tuning. Because the inverse regularization strength C is critical to how well the model generalizes, scikit-learn also ships a version of logistic regression with cross-validation built in. In short, it behaves roughly like LogisticRegression wrapped in a GridSearchCV over C.
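
To make the comparison with GridSearchCV concrete, here is a minimal sketch on synthetic data. The dataset (`make_classification`), the candidate `Cs` list, and the variable names are illustrative assumptions and are not part of this notebook's pipeline. The two searches are not guaranteed to pick the same C, because they can differ in details such as warm starts and tie-breaking.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.model_selection import GridSearchCV

# Synthetic 3-class problem (an assumption, only for illustration)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=0
)
Cs = [0.1, 1.0, 10.0]  # hypothetical candidate values for C

# Built-in search: cross-validates every C internally, then refits
cv_model = LogisticRegressionCV(Cs=Cs, cv=5, max_iter=1000).fit(X_demo, y_demo)

# Explicit search: the same idea expressed with GridSearchCV
grid = GridSearchCV(
    LogisticRegression(max_iter=1000), param_grid={"C": Cs}, cv=5
).fit(X_demo, y_demo)

cv_best_C = float(np.ravel(cv_model.C_)[0])
grid_best_C = float(grid.best_params_["C"])
print("LogisticRegressionCV picked C =", cv_best_C)
print("GridSearchCV picked C =", grid_best_C)
```
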

In [None]:
import pandas as pd
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from dotenv import load_dotenv
import os
from pymongo import MongoClient
import psycopg2
from sklearn.preprocessing import LabelEncoder
import functions as fn

In [None]:
data = fn.dt_get_data()

# === Show the result ===
print("\n=== Final DataFrame ===")
display(data.head())
print("\nColumns in the final DataFrame:")
print(data.columns)

INFO: Tentando conectar DB1 (DESTINO): Host=pg-intersegundo-intercalbon.h.aivencloud.com, DB=bancosegundoano, User=avnadmin, Port=17807
INFO: Conexão DB1 (DESTINO) estabelecida com sucesso.

  merged_sql = pd.read_sql(sql_query, conn_1)

=== Final DataFrame ===

   nivel_emissao  classificacao_emissao  nivel_cargo  estado_residencia     cidade_residencia  nome_categoria
0           1.42                  Baixo        Médio       Minas Gerais       Lima das Flores      Architecto
1           1.52                  Baixo        Médio               Pará   da Cunha das Pedras        Adipisci
2            3.4                  Médio         Alto          São Paulo       Castro do Galho              At
3           4.72                   Alto         Alto     Espírito Santo  Rodrigues das Pedras            Quis
4           2.26                  Baixo         Alto              Goiás               Freitas            Quis

Columns in the final DataFrame:
Index(['nivel_emissao', 'classificacao_emissao', 'nivel_cargo',
       'estado_residencia', 'cidade_residencia', 'nome_categoria'],
      dtype='object')


In [36]:
X, y = fn.separate_features_and_target(data, 'classificacao_emissao')
y_encoded = LabelEncoder().fit_transform(y)
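
As a small illustration of what `LabelEncoder` does to the target, the sketch below encodes a few of the class labels seen in `classificacao_emissao`. The sample list and variable names are made up for the example. Classes are sorted before being numbered, which is why `Alto` maps to 0.

```python
from sklearn.preprocessing import LabelEncoder

# A handful of labels taken from the target column's categories
labels = ["Baixo", "Médio", "Alto", "Baixo"]

encoder = LabelEncoder()
codes = encoder.fit_transform(labels)

class_names = list(encoder.classes_)             # sorted unique labels
encoded = [int(c) for c in codes]                # integer code per sample
decoded = list(encoder.inverse_transform(codes)) # back to the original labels

print(class_names)
print(encoded)
print(decoded)
```

Keeping a reference to the fitted encoder (instead of calling `LabelEncoder()` inline) makes it possible to map predictions back to label names later.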

In [37]:
df_num_columns = fn.get_data_numeric(X)
df_cat_columns = fn.get_data_string(X, 'classificacao_emissao')

print("Numeric columns:", df_num_columns)
print("Categorical columns:", df_cat_columns)

Numeric columns: ['nivel_emissao']
Categorical columns: ['nivel_cargo', 'estado_residencia', 'cidade_residencia', 'nome_categoria']


In [38]:

preprocessor = fn.preprocess_data(df_num_columns, df_cat_columns)
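
The body of `fn.preprocess_data` is not shown in this notebook. A common way to build such a preprocessor, sketched here purely as an assumption, is a `ColumnTransformer` that scales numeric columns and one-hot encodes categorical ones. The function name `build_preprocessor` and the three-row sample frame are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_preprocessor(num_columns, cat_columns):
    """Hypothetical stand-in for fn.preprocess_data (real implementation unknown)."""
    return ColumnTransformer(transformers=[
        ("num", StandardScaler(), num_columns),
        # handle_unknown="ignore" avoids errors on categories unseen during fit
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat_columns),
    ])


# Three rows copied from the DataFrame preview, restricted to three columns
sample = pd.DataFrame({
    "nivel_emissao": [1.42, 1.52, 3.4],
    "nivel_cargo": ["Médio", "Médio", "Alto"],
    "estado_residencia": ["Minas Gerais", "Pará", "São Paulo"],
})

demo_preprocessor = build_preprocessor(
    ["nivel_emissao"], ["nivel_cargo", "estado_residencia"]
)
transformed = demo_preprocessor.fit_transform(sample)

# 1 scaled numeric column + 2 nivel_cargo categories + 3 estado_residencia categories
print(transformed.shape)
```
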


In [39]:
if preprocessor is not None:
    # Elastic net combines L1 and L2: L1 forces some coefficients exactly to zero,
    # L2 shrinks all coefficients smoothly, and l1_ratio balances the two.
    model = Pipeline(steps=[
        ("preprocessor", preprocessor),
        ("classifier", LogisticRegressionCV(
            Cs=[3, 5, 10, 20, 30],  # candidate inverse regularization strengths
            max_iter=1000,  # saga with elasticnet converges slowly, so raise the iteration cap
            solver='saga',  # 'saga' supports penalty='elasticnet' and multi_class='multinomial'
            penalty='elasticnet',  # provisional choice; the dataset is going to change
            class_weight='balanced',  # weight classes inversely to their frequency
            cv=10,  # 10 folds for a more robust estimate
            random_state=42,  # fixed seed for reproducibility
            n_jobs=-1,  # use all available cores to speed up training
            verbose=1,  # print training progress
            multi_class='multinomial',  # fit all classes jointly; suits this ~10k-row dataset
            l1_ratios=[0.1, 0.5, 0.9]  # candidate L1/L2 mixes (0 = pure L2, 1 = pure L1)
        ))
    ])
else:
    # Fail early instead of hitting a NameError on `model` in a later cell
    raise ValueError("fn.preprocess_data returned None; cannot build the pipeline")
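
For reference, the elastic-net penalty selected above can be written as follows. This is a sketch of the general form given in the scikit-learn user guide; the symbols $W$ (coefficients), $\ell$ (per-sample log loss) and $\rho$ (the `l1_ratio`) are notation chosen here, and exact normalization constants may differ between library versions.

```latex
\min_{W} \; C \sum_{i=1}^{n} \ell\left(y_i, x_i; W\right)
  \;+\; \frac{1 - \rho}{2} \, \lVert W \rVert_F^2
  \;+\; \rho \, \lVert W \rVert_{1,1},
\qquad 0 \le \rho \le 1
```

With $\rho = 1$ the penalty is pure L1 (sparse coefficients), with $\rho = 0$ it is pure L2 (smooth shrinkage), and a larger $C$ weights the data term more heavily, which means weaker regularization.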

In [40]:
X_train, X_test, y_train, y_test = train_test_split(X, y_encoded, test_size=0.2, random_state=42)


In [41]:
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

[Parallel(n_jobs=-1)]: Using backend ThreadingBackend with 12 concurrent workers.


max_iter reached after 14 seconds




convergence after 169 epochs took 3 seconds
max_iter reached after 20 seconds
max_iter reached after 20 seconds




max_iter reached after 20 seconds




convergence after 172 epochs took 6 seconds
convergence after 169 epochs took 6 seconds
convergence after 261 epochs took 9 seconds
convergence after 169 epochs took 6 seconds
convergence after 260 epochs took 11 seconds
convergence after 259 epochs took 12 seconds
convergence after 268 epochs took 12 seconds
max_iter reached after 38 seconds




convergence after 367 epochs took 15 seconds
convergence after 67 epochs took 3 seconds
convergence after 104 epochs took 5 seconds
max_iter reached after 49 seconds
max_iter reached after 49 seconds




max_iter reached after 49 seconds




convergence after 329 epochs took 11 seconds
convergence after 66 epochs took 5 seconds
convergence after 66 epochs took 5 seconds
convergence after 154 epochs took 8 seconds
convergence after 68 epochs took 6 seconds
convergence after 352 epochs took 17 seconds
convergence after 357 epochs took 18 seconds
convergence after 372 epochs took 17 seconds
convergence after 103 epochs took 8 seconds
convergence after 104 epochs took 8 seconds
convergence after 103 epochs took 8 seconds
convergence after 168 epochs took 9 seconds
max_iter reached after 70 seconds




max_iter reached after 71 seconds




convergence after 322 epochs took 17 seconds
convergence after 320 epochs took 17 seconds
convergence after 334 epochs took 19 seconds
convergence after 46 epochs took 4 seconds
convergence after 46 epochs took 3 seconds
convergence after 155 epochs took 13 seconds
convergence after 154 epochs took 13 seconds
convergence after 154 epochs took 13 seconds
convergence after 73 epochs took 6 seconds
convergence after 72 epochs took 7 seconds
convergence after 109 epochs took 11 seconds
convergence after 109 epochs took 10 seconds
convergence after 169 epochs took 18 seconds
convergence after 168 epochs took 18 seconds
convergence after 166 epochs took 18 seconds
convergence after 123 epochs took 13 seconds
convergence after 123 epochs took 14 seconds
max_iter reached after 114 seconds




max_iter reached after 118 seconds




max_iter reached after 58 seconds




convergence after 66 epochs took 7 seconds
convergence after 46 epochs took 14 seconds
convergence after 46 epochs took 14 seconds
max_iter reached after 61 seconds




convergence after 104 epochs took 10 seconds
convergence after 73 epochs took 17 seconds
max_iter reached after 42 seconds




convergence after 166 epochs took 13 seconds
convergence after 73 epochs took 15 seconds
convergence after 154 epochs took 12 seconds
convergence after 169 epochs took 5 seconds
max_iter reached after 105 seconds




convergence after 169 epochs took 10 seconds
convergence after 262 epochs took 9 seconds
convergence after 46 epochs took 4 seconds
max_iter reached after 68 seconds




convergence after 110 epochs took 16 seconds
convergence after 258 epochs took 17 seconds
convergence after 110 epochs took 16 seconds
convergence after 72 epochs took 5 seconds
convergence after 170 epochs took 7 seconds
convergence after 365 epochs took 13 seconds
convergence after 109 epochs took 9 seconds
max_iter reached after 101 seconds




convergence after 124 epochs took 15 seconds
convergence after 124 epochs took 14 seconds
convergence after 259 epochs took 12 seconds
convergence after 66 epochs took 5 seconds
convergence after 362 epochs took 19 seconds
convergence after 124 epochs took 9 seconds
convergence after 328 epochs took 12 seconds
convergence after 104 epochs took 9 seconds
max_iter reached after 97 seconds




convergence after 67 epochs took 5 seconds
convergence after 359 epochs took 20 seconds
convergence after 329 epochs took 20 seconds
convergence after 155 epochs took 16 seconds
max_iter reached after 101 seconds




max_iter reached after 46 seconds




convergence after 104 epochs took 10 seconds
max_iter reached after 24 seconds
convergence after 66 epochs took 3 seconds
convergence after 46 epochs took 4 seconds




max_iter reached after 36 seconds




convergence after 168 epochs took 6 seconds
convergence after 104 epochs took 7 seconds
convergence after 73 epochs took 6 seconds
convergence after 166 epochs took 8 seconds
convergence after 156 epochs took 15 seconds
convergence after 329 epochs took 22 seconds
convergence after 168 epochs took 17 seconds
convergence after 261 epochs took 8 seconds
convergence after 155 epochs took 9 seconds
convergence after 110 epochs took 10 seconds
max_iter reached after 158 seconds




max_iter reached after 49 seconds




convergence after 262 epochs took 13 seconds
convergence after 168 epochs took 11 seconds
convergence after 124 epochs took 10 seconds
convergence after 66 epochs took 3 seconds
convergence after 47 epochs took 7 seconds
convergence after 364 epochs took 14 seconds
convergence after 168 epochs took 16 seconds
convergence after 104 epochs took 6 seconds
convergence after 73 epochs took 8 seconds
max_iter reached after 154 seconds




convergence after 329 epochs took 13 seconds
convergence after 153 epochs took 10 seconds
convergence after 47 epochs took 6 seconds
convergence after 365 epochs took 22 seconds
max_iter reached after 40 seconds




convergence after 110 epochs took 19 seconds
convergence after 73 epochs took 15 seconds
convergence after 167 epochs took 17 seconds
convergence after 172 epochs took 10 seconds
convergence after 124 epochs took 20 seconds
convergence after 331 epochs took 30 seconds
convergence after 110 epochs took 18 seconds
convergence after 261 epochs took 19 seconds
convergence after 124 epochs took 17 seconds
max_iter reached after 82 seconds




convergence after 67 epochs took 5 seconds
convergence after 365 epochs took 21 seconds
max_iter reached after 137 seconds




convergence after 106 epochs took 8 seconds
convergence after 46 epochs took 5 seconds
convergence after 72 epochs took 7 seconds
convergence after 333 epochs took 15 seconds
convergence after 156 epochs took 10 seconds
max_iter reached after 131 seconds




convergence after 108 epochs took 8 seconds
convergence after 47 epochs took 2 seconds
convergence after 171 epochs took 9 seconds
convergence after 73 epochs took 6 seconds
convergence after 122 epochs took 9 seconds
convergence after 111 epochs took 5 seconds
convergence after 125 epochs took 5 seconds


[Parallel(n_jobs=-1)]: Done  30 out of  30 | elapsed:  5.9min finished
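
The log above shows several fits stopping at `max_iter` rather than converging. After training, the hyperparameters that cross-validation selected, and how many iterations the solver used, can be read from the fitted classifier. The sketch below does this on synthetic data with a smaller grid; the dataset, the `StandardScaler` step, and all variable names are assumptions, and only `named_steps`, `C_`, `l1_ratio_` and `n_iter_` are the attributes of interest.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (illustration only)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=0
)
demo_Cs = [3, 5, 10]
demo_l1_ratios = [0.1, 0.5, 0.9]

demo_model = Pipeline(steps=[
    ("scaler", StandardScaler()),  # saga converges faster on scaled features
    ("classifier", LogisticRegressionCV(
        Cs=demo_Cs, l1_ratios=demo_l1_ratios, penalty="elasticnet",
        solver="saga", cv=3, max_iter=2000, random_state=42,
    )),
])
demo_model.fit(X_demo, y_demo)

clf = demo_model.named_steps["classifier"]
# np.ravel handles both array-valued and scalar attributes
best_C = float(np.ravel(clf.C_)[0])
best_l1_ratio = float(np.ravel(clf.l1_ratio_)[0])
max_iterations_used = int(np.max(clf.n_iter_))

print("Selected C:", best_C)
print("Selected l1_ratio:", best_l1_ratio)
print("Largest iteration count across fits:", max_iterations_used)
```

If the largest iteration count equals `max_iter`, at least one fit did not converge, and raising `max_iter` or scaling the features are the usual things to try.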


In [42]:
print(classification_report(y_test, y_pred, target_names=LabelEncoder().fit(y).classes_))

              precision    recall  f1-score   support

        Alto       0.99      0.98      0.99       539
       Baixo       0.98      0.98      0.98       519
       Médio       0.94      0.97      0.95       362

    accuracy                           0.98      1420
   macro avg       0.97      0.98      0.97      1420
weighted avg       0.98      0.98      0.98      1420
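
The report cell refits a second `LabelEncoder` just to recover the class names. That works because the classes are sorted deterministically, but reusing a single fitted encoder is less fragile. A minimal sketch with made-up labels (the two label lists are hypothetical, not results from this notebook):

```python
from sklearn.metrics import classification_report
from sklearn.preprocessing import LabelEncoder

# Toy ground truth and predictions, for illustration only
y_true_labels = ["Alto", "Baixo", "Médio", "Alto", "Baixo", "Médio"]
y_pred_labels = ["Alto", "Baixo", "Médio", "Alto", "Médio", "Médio"]

# Fit once, then reuse the same encoder for encoding and for naming classes
encoder = LabelEncoder().fit(y_true_labels)
y_true_demo = encoder.transform(y_true_labels)
y_pred_demo = encoder.transform(y_pred_labels)

report = classification_report(
    y_true_demo, y_pred_demo, target_names=encoder.classes_, output_dict=True
)
# One "Baixo" sample was predicted as "Médio", which lowers Médio precision
# and Baixo recall
print(report["Médio"])
print(report["Baixo"])
```
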

