<div style="color:#3c4d5a; border-top: 7px solid #42A5F5; border-bottom: 7px solid #42A5F5; padding: 5px; text-align: center; text-transform: uppercase"><h1>Incremental Retraining of an XGBoost Model
</h1> </div>

This notebook implements the incremental retraining process of the Alzheimer's risk prediction model based on XGBoost. The objective is to update the previously trained model by incorporating new patient data, without the need to retrain it from scratch.

To do this, the notebook uses XGBoost's ability to continue training from an existing model through the `xgb_model` parameter. The base model acts as the starting point, and new decision trees are added on top of it using the recent, preprocessed data. The aim is to improve the model's predictive power while retaining previously learned knowledge.

- [Transform new samples](#tp)
- [Re-training](#re)
- [Save new version](#se)
- [Results](#results)
- [Conclusion](#conclusion)
- [References](#references)

<div style="color:#37475a"><h2>Imported modules</h2> </div>

---

In [2]:
import pandas as pd
import numpy as np
import pickle
import matplotlib.pyplot as plt
import seaborn as sns
import mlflow
import mlflow.xgboost
from mlflow.models import infer_signature

<div style="color:#37475a"><h2>Load transformer and model</h2> </div>

---

In [1]:
experiment_name = "Alzheimer_Preprocesamiento"
run_id = "a9302cdf7df7439d8a59ea7c3fb148ff"

# --- Load the fitted preprocessor and the base model (v1) ---
prep_path = mlflow.artifacts.download_artifacts(
    run_id=run_id,
    artifact_path="preprocessor/preprocessor.pkl"
)

with open(prep_path, "rb") as f:
    transformador = pickle.load(f)

modelo = mlflow.xgboost.load_model(
    model_uri="models:/Alzheimer_XGBoost/1"
)

<div id="tp" style="color:#37475a; border-bottom: 7px solid orange; width: 100%; margin-bottom: 15px; padding-bottom: 2px"><h2>Transform new samples</h2> </div>

In [None]:
# X_new_raw, y_new_raw: features and labels of the new patients (must be loaded beforehand)
X_new_prep = transformador.transform(X_new_raw)
y_new = y_new_raw
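Because training continues from the base model, the transformed matrix needs the same number of columns the base model was trained on. A minimal sketch of such a check follows; `check_feature_count` and the toy matrix are hypothetical and not part of this notebook. In practice the expected count might be read from the loaded model (for example `n_features_in_` on the scikit-learn wrapper or `num_features()` on a `Booster`, depending on what `mlflow.xgboost.load_model` returns here, which is an assumption).

```python
import numpy as np


def check_feature_count(X_prep, expected_n_features):
    """Raise ValueError if the transformed matrix does not have the expected width."""
    # .shape[1] works for NumPy arrays, pandas DataFrames and SciPy sparse matrices.
    n_features = X_prep.shape[1]
    if n_features != expected_n_features:
        raise ValueError(
            f"Expected {expected_n_features} features, got {n_features}"
        )
    return n_features


# Toy matrix standing in for X_new_prep (3 patients, 4 transformed features).
X_demo = np.zeros((3, 4))
n_checked = check_feature_count(X_demo, expected_n_features=4)
```

Running the check before retraining surfaces a schema mismatch early, instead of during `fit`.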

<div id="re" style="color:#37475a; border-bottom: 7px solid orange; width: 100%; margin-bottom: 15px; padding-bottom: 2px"><h2>Re-training</h2> </div>

In [None]:
from xgboost import XGBClassifier

modelo_retrain = XGBClassifier(
    n_estimators=100,          # number of NEW trees only
    learning_rate=0.05,
    max_depth=5,
    eval_metric="logloss",
    random_state=42
)

modelo_retrain.fit(
    X_new_prep,
    y_new,
    xgb_model=modelo           # continue from v1 (the base model loaded above)
)

This cell creates a new XGBoost classifier configured to add 100 more trees (`n_estimators=100`) and continues training from the previously loaded base model by passing:

**xgb_model=modelo**


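Compared with retraining from scratch, this approach aims to:

* keep what the base model has already learned
* incorporate patterns from the new data
* update the model efficiently
* reduce training time, since only the new trees are fitted
* maintain model stability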

In [None]:
from sklearn.metrics import accuracy_score, f1_score

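# X_test, y_test: held-out set, preprocessed with the same transformer (must be defined beforehand)
y_pred = modelo_retrain.predict(X_test)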

acc = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print("Accuracy (retrain):", acc)
print("F1 (retrain):", f1)
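The cell above reports metrics for the retrained model only. To judge whether retraining helped, the same metrics are needed for the base model on the same test set. A minimal sketch of that comparison follows; `compare_metrics`, the `tolerance` parameter and the scores are hypothetical placeholders, not results from this notebook. In practice the base scores would come from evaluating `modelo` on `X_test`.

```python
def compare_metrics(base_metrics, new_metrics, tolerance=0.0):
    """Return per-metric deltas and whether no metric dropped by more than `tolerance`."""
    deltas = {name: new_metrics[name] - base_metrics[name] for name in base_metrics}
    no_regression = all(delta >= -tolerance for delta in deltas.values())
    return deltas, no_regression


# Placeholder numbers for illustration only; they are NOT results from this notebook.
base_scores = {"accuracy": 0.80, "f1": 0.75}
retrain_scores = {"accuracy": 0.82, "f1": 0.74}

deltas, ok = compare_metrics(base_scores, retrain_scores, tolerance=0.02)
print("deltas:", deltas)
print("no regression beyond tolerance:", ok)
```

A check like this could gate whether the retrained model is registered as a new version; the tolerance value is a design choice that depends on how much metric noise the test set carries.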

<div id="se" style="color:#37475a; border-bottom: 7px solid orange; width: 100%; margin-bottom: 15px; padding-bottom: 2px"><h2>Save new version</h2> </div>

In [None]:
mlflow.set_experiment("Alzheimer_Modelamiento")

with mlflow.start_run(run_name="XGBoost_Retrain_v2"):

    # ===== parameters and metrics =====
    mlflow.log_param("retrain", True)
    mlflow.log_param("base_model_version", 1)

    mlflow.log_metric("accuracy", acc)
    mlflow.log_metric("f1", f1)

    # ===== signature =====
    signature = infer_signature(X_new_prep, modelo_retrain.predict(X_new_prep))

    # ===== save model =====
    mlflow.xgboost.log_model(
        xgb_model=modelo_retrain,
        name="xgb_model",
        registered_model_name="Alzheimer_XGBoost",
        signature=signature,
        input_example=X_new_prep[:5]
    )

print("Retrained model saved as NEW VERSION")


<div id="conclusion" style="color:#37475a; border-bottom: 7px solid orange; width: 100%; margin-bottom: 15px; padding-bottom: 2px"><h2>Conclusion</h2> </div>

Incremental retraining allowed the Alzheimer's risk prediction model to be updated by incorporating new patient records without completely rebuilding the original model. This strategy retains previously learned knowledge and extends it with new information, which is especially useful in clinical settings where data grows progressively.

The post-retraining accuracy and F1 score are the evidence for this: when they match or exceed the base model's scores on the same test set, integrating the new data has added value without degrading model quality. In addition, the retrained model is versioned and logged in MLflow, enabling traceability and change control within the MLOps flow.

Taken together, this procedure demonstrates a best practice for updating models in production, aligned with principles of continuous learning and maintenance of applied artificial intelligence systems.