## Starting the Modeling Pipeline and MLflow Tracking

This notebook starts the experiment pipeline: it loads the final datasets from the `Curated` layer and builds fully aligned train and test sets. The flow covers separating the predictor variables (`X`) from the target variable (`y`) and the initial MLflow Tracking configuration, so that the model's parameters, metrics, and artifacts are tracked consistently and can be versioned.
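
What "fully aligned" means is not spelled out here; a common reading is that both files expose the same columns in the same order. A minimal sketch of such a check, assuming that reading (the helper name `compare_columns` is hypothetical and not part of the notebook):

```python
from typing import Dict, Sequence


def compare_columns(train_cols: Sequence[str], test_cols: Sequence[str]) -> Dict[str, object]:
    """Report how two column lists differ (hypothetical helper)."""
    train_set, test_set = set(train_cols), set(test_cols)
    return {
        # Columns present in one split but missing from the other.
        "only_in_train": sorted(train_set - test_set),
        "only_in_test": sorted(test_set - train_set),
        # True only when names AND order match exactly.
        "same_order": list(train_cols) == list(test_cols),
    }


# Toy column lists; with pandas these would be list(df.columns) for each split.
report = compare_columns(["age", "income", "target"], ["income", "age", "target"])
print(report)
```

Equal column counts alone do not establish alignment, since two files can have the same width with different names or a different order.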


In [1]:
# STEP: Load the curated data and configure MLflow Tracking

"""
Runs:
1) Validation of the working directory.
2) Loading of 'train_curated.csv' and 'test_curated.csv'.
3) Separation into X_train, y_train, X_test, y_test.
4) Configuration of the MLflow Tracking URI.
"""

import os
import pandas as pd
import mlflow

# 1️⃣ Validate the CWD
print("Current Working Directory:", os.getcwd())

# 2️⃣ Consistent paths
TRAIN_PATH = 'data/curated/train_curated.csv'
TEST_PATH = 'data/curated/test_curated.csv'

# 3️⃣ Load the datasets
train_df = pd.read_csv(TRAIN_PATH)
test_df = pd.read_csv(TEST_PATH)

print("\nTreino shape:", train_df.shape)
print("Teste shape:", test_df.shape)

# 4️⃣ Split X and y
TARGET = 'Credit_Score_Standard'  # Adjust to your actual target

X_train = train_df.drop(columns=[TARGET])
y_train = train_df[TARGET]

X_test = test_df.drop(columns=[TARGET])
y_test = test_df[TARGET]

print("\nX_train:", X_train.shape)
print("y_train:", y_train.shape)
print("X_test:", X_test.shape)
print("y_test:", y_test.shape)

# 5️⃣ Configure the MLflow Tracking URI
mlflow.set_tracking_uri("http://mlflow:5000")  # Adjust if needed
print("\nTracking URI configurado:", mlflow.get_tracking_uri())


Current Working Directory: /workspace

Treino shape: (100000, 6305)
Teste shape: (50000, 6305)

X_train: (100000, 6304)
y_train: (100000,)
X_test: (50000, 6304)
y_test: (50000,)

Tracking URI configurado: http://mlflow:5000
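
The hostname `mlflow` in the tracking URI presumably resolves only inside the Docker network the notebook runs in. One way to avoid editing the cell for each environment is to read the URI from an environment variable; MLflow itself honors `MLFLOW_TRACKING_URI`, and the sketch below only makes the fallback explicit. The helper name `resolve_tracking_uri` and its default value are assumptions for illustration.

```python
import os
from typing import Mapping, Optional


def resolve_tracking_uri(
    env: Optional[Mapping[str, str]] = None,
    default: str = "http://mlflow:5000",
) -> str:
    """Return the URI from MLFLOW_TRACKING_URI, or a default (hypothetical helper)."""
    env = os.environ if env is None else env
    uri = env.get("MLFLOW_TRACKING_URI", "").strip()
    return uri or default


# With nothing set, the Docker-internal default is used.
print(resolve_tracking_uri(env={}))
# From the host, the variable could point at the published port instead.
print(resolve_tracking_uri(env={"MLFLOW_TRACKING_URI": "http://127.0.0.1:5000"}))
```

The returned string would then be passed to `mlflow.set_tracking_uri(...)`.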


## Baseline Experiment with MLflow and Progress Monitoring

This step runs the first baseline experiment, using MLflow to track parameters, metrics, and artifacts. To follow potentially slow operations, such as fitting the model (`fit`) and computing metrics, `tqdm` is used to monitor loops explicitly. In the cell below, `tqdm` wraps a single `fit` call, so the bar reports the elapsed time of that call rather than incremental progress. This keeps the run observable while preserving full traceability of the pipeline.
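
If the elapsed time is what matters, it can also be captured as a number and logged next to the other metrics. A minimal sketch using only the standard library; the `timed` helper and the `fit_seconds` metric name are assumptions, not part of the notebook:

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed(label: str, sink: Dict[str, float]) -> Iterator[None]:
    """Store the wall-clock duration of the wrapped block in sink[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        sink[label] = time.perf_counter() - start


durations: Dict[str, float] = {}
with timed("fit_seconds", durations):
    # Stand-in workload; in the notebook this would be model.fit(X_train, y_train).
    sum(i * i for i in range(100_000))

print(f"fit_seconds: {durations['fit_seconds']:.4f}")
# Inside an active run, the value could be logged with
# mlflow.log_metric("fit_seconds", durations["fit_seconds"]).
```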


In [18]:
"""
Runs:
1) Tracking URI consistent with the Docker network.
2) Exports fixed MinIO credentials and endpoint.
3) Silences redundant MLflow logs.
4) Training with a progress bar (tqdm).
5) Logging of hyperparameters, metrics, and the model artifact.
6) Final prints ONLY with 127.0.0.1 links.
"""

import os
import logging
import mlflow
import mlflow.sklearn
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, f1_score
from tqdm import tqdm

# ✅ Silence redundant MLflow logs
logging.getLogger("mlflow").setLevel(logging.ERROR)

# 1️⃣ Internal tracking URI
mlflow.set_tracking_uri("http://mlflow:5000")

# 2️⃣ Explicit MinIO credentials and endpoint
os.environ['AWS_ACCESS_KEY_ID'] = 'wrm'
os.environ['AWS_SECRET_ACCESS_KEY'] = 'senha_segura'
os.environ['MLFLOW_S3_ENDPOINT_URL'] = 'http://minio:9000'

print("Tracking URI:", mlflow.get_tracking_uri())
print("MLFLOW_S3_ENDPOINT_URL:", os.environ['MLFLOW_S3_ENDPOINT_URL'])

# 3️⃣ Create or retrieve the experiment
experiment_name = "QuantumFinance_CreditScore"
mlflow.set_experiment(experiment_name)

with mlflow.start_run(run_name="Baseline_DecisionTree") as run:
    params = {"max_depth": 5, "random_state": 42}
    mlflow.log_params(params)

    model = DecisionTreeClassifier(**params)

    print("\nTreinando modelo com barra de progresso:")
    for _ in tqdm(range(1), desc="Fitting model"):
        model.fit(X_train, y_train)

    y_pred = model.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred, average='weighted')

    print(f"\nAccuracy: {acc:.4f}")
    print(f"F1 Score: {f1:.4f}")

    mlflow.log_metric("accuracy", acc)
    mlflow.log_metric("f1_score", f1)

    mlflow.sklearn.log_model(model, "model")

    # ✅ Consistent final prints: ONLY with 127.0.0.1
    print(f"\nRun ID: {run.info.run_id}")
    print(f"Acesse: http://127.0.0.1:5000/#/experiments/{run.info.experiment_id}/runs/{run.info.run_id}")
    print(f"🏃 View run Baseline_DecisionTree at: http://127.0.0.1:5000/#/experiments/{run.info.experiment_id}/runs/{run.info.run_id}")
    print(f"🧪 View experiment at: http://127.0.0.1:5000/#/experiments/{run.info.experiment_id}")


Tracking URI: http://mlflow:5000
MLFLOW_S3_ENDPOINT_URL: http://minio:9000

Treinando modelo com barra de progresso:


Fitting model: 100%|██████████| 1/1 [00:07<00:00,  7.10s/it]



Accuracy: 0.2209
F1 Score: 0.3618

Run ID: c5c5e9a7ee704c50b9575962b3bf6722
Acesse: http://127.0.0.1:5000/#/experiments/1/runs/c5c5e9a7ee704c50b9575962b3bf6722
🏃 View run Baseline_DecisionTree at: http://127.0.0.1:5000/#/experiments/1/runs/c5c5e9a7ee704c50b9575962b3bf6722
🧪 View experiment at: http://127.0.0.1:5000/#/experiments/1
🏃 View run Baseline_DecisionTree at: http://mlflow:5000/#/experiments/1/runs/c5c5e9a7ee704c50b9575962b3bf6722
🧪 View experiment at: http://mlflow:5000/#/experiments/1
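
The output shows the same run under two hosts: the explicit prints use `127.0.0.1`, while the last two lines, which appear to be emitted by MLflow itself when the run ends, use the internal `mlflow` hostname. A small sketch of deriving the browser-facing link from the internal tracking URI instead of hard-coding it; the helper name `ui_run_url` and the assumption that the port is published unchanged on the host are illustrative:

```python
from urllib.parse import urlsplit, urlunsplit


def ui_run_url(tracking_uri: str, experiment_id: str, run_id: str,
               public_host: str = "127.0.0.1") -> str:
    """Swap the internal host for a browser-reachable one (hypothetical helper)."""
    parts = urlsplit(tracking_uri)
    # Keep the scheme and port; replace only the hostname.
    netloc = public_host if parts.port is None else f"{public_host}:{parts.port}"
    base = urlunsplit((parts.scheme, netloc, "", "", ""))
    return f"{base}/#/experiments/{experiment_id}/runs/{run_id}"


url = ui_run_url("http://mlflow:5000", "1", "c5c5e9a7ee704c50b9575962b3bf6722")
print(url)
```

With this approach, changing the tracking URI in one place keeps the printed links consistent with it.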
