<img src='https://upload.wikimedia.org/wikipedia/commons/thumb/a/aa/Logo_DuocUC.svg/2560px-Logo_DuocUC.svg.png' width="50%" height="20%">

## Introduction

In this exercise we compare several ensemble models on a regression problem.

**For each code cell, add a text cell that explains what the code does and, where applicable, interprets the results (metrics).
Finally, write a CONCLUSION with your analysis, comparing the models and justifying which one performed best in this case.**


As the DATASET we use the diabetes patients data, loaded with scikit-learn's `load_diabetes`.

Ten baseline variables (age, sex, body mass index, average blood pressure, and six blood serum measurements) were obtained for each of n = 442 diabetes patients, along with the response of interest, a quantitative measure of disease progression one year after baseline.

* age: age in years
* sex
* bmi: body mass index
* bp: average blood pressure
* s1: tc, total serum cholesterol
* s2: ldl, low-density lipoproteins
* s3: hdl, high-density lipoproteins
* s4: tch, total cholesterol / HDL
* s5: ltg, possibly log of serum triglycerides level
* s6: glu, blood sugar level

Target: a quantitative measure of disease progression.
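Before modeling, it can help to confirm that the loaded data matches the description above. The following is a minimal sketch that reuses the notebook's own names (`datos`, `X`, `Y`). It relies on the default behavior of `load_diabetes`, which returns the ten features already mean-centered and scaled so that each column's sum of squares is 1; the checks printed here are illustrative only.

```python
import pandas as pd
from sklearn.datasets import load_diabetes

datos = load_diabetes()
X = pd.DataFrame(datos.data, columns=datos.feature_names)
Y = datos.target

# 442 patients, 10 baseline variables
print(X.shape)

# Column names match the variable list above
print(list(X.columns))

# With the default scaling, each column has mean ~0 and sum of squares ~1
print(X.mean().round(6))
print((X ** 2).sum().round(6))

# The target is left on its original scale
print(Y.min(), Y.max())
```

Because the features are already scaled, values such as `age` or `sex` appear as small floating-point numbers rather than years or categories.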


## Before modeling

In [None]:
import pandas as pd
from sklearn.datasets import load_diabetes
datos = load_diabetes()

X = pd.DataFrame(datos.data, columns=datos.feature_names)
Y = datos.target

In [None]:
# Collect the metrics for each model
metrics = []

## Bagging

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import GridSearchCV

# Fixed seed for a reproducible split
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, Y, test_size=0.2, random_state=42)
mo = DecisionTreeRegressor()
model = BaggingRegressor(mo, n_estimators=10)

# Base-tree parameters use the `estimator__` prefix
# (it was `base_estimator__` before scikit-learn 1.2)
params = {"estimator__max_depth": [2,3],
          "max_samples": [0.1,0.2],
          "estimator__min_samples_split": [2,3,4]}
grid = GridSearchCV(estimator=model, param_grid=params,cv=5)
_=grid.fit(Xtrain, Ytrain)

In [None]:
print(grid.best_score_)
print(grid.best_params_)
#pd.DataFrame(grid.cv_results_)

In [None]:
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import r2_score

Yhat = grid.predict(Xtest)

mse = mean_squared_error(Ytest, Yhat)
mae = mean_absolute_error(Ytest, Yhat)
R2 = r2_score(Ytest, Yhat)

print("MSE: ",mse)
print("MAE: ",mae)
print("R^2: ",R2)

In [None]:
metrics.append({
    'Model': 'Bagging Regressor',
    'MSE': mse,  # metrics computed in the previous cell
    'MAE': mae,
    'R2': R2
})

## RandomForest

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Fixed seed for a reproducible split
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, Y, test_size=0.2, random_state=42)

model = RandomForestRegressor()

params = {"n_estimators": [100],
          #"max_depth": [2,3],
          "min_samples_split": [2,3,4],
          "max_leaf_nodes": [5,8,10,15]}

grid = GridSearchCV(estimator=model, param_grid=params,cv=5)
_=grid.fit(Xtrain, Ytrain)

In [None]:
print(grid.best_score_)
print(grid.best_params_)
#pd.DataFrame(grid.cv_results_)

In [None]:
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import r2_score

Yhat = grid.predict(Xtest)

mse = mean_squared_error(Ytest, Yhat)
mae = mean_absolute_error(Ytest, Yhat)
R2 = r2_score(Ytest, Yhat)

print("MSE: ",mse)
print("MAE: ",mae)
print("R^2: ",R2)

In [None]:
metrics.append({
    'Model': 'Random Forest Regressor',
    'MSE': mse,  # metrics computed in the previous cell
    'MAE': mae,
    'R2': R2
})

## AdaBoost

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import GridSearchCV

# Fixed seed for a reproducible split
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, Y, test_size=0.2, random_state=42)

mo = DecisionTreeRegressor()
model = AdaBoostRegressor(mo)

# Base-tree parameters use the `estimator__` prefix
# (it was `base_estimator__` before scikit-learn 1.2)
params = {"n_estimators": [10],
          "estimator__max_depth": [2,3],
          "estimator__max_leaf_nodes": [5,8],
          "estimator__min_samples_split": [2,3,4]}


grid = GridSearchCV(estimator=model, param_grid=params,cv=5)
_=grid.fit(Xtrain, Ytrain)

In [None]:
print(grid.best_score_)
print(grid.best_params_)
#pd.DataFrame(grid.cv_results_)

In [None]:
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import r2_score

Yhat = grid.predict(Xtest)

mse = mean_squared_error(Ytest, Yhat)
mae = mean_absolute_error(Ytest, Yhat)
R2 = r2_score(Ytest, Yhat)

print("MSE: ",mse)
print("MAE: ",mae)
print("R^2: ",R2)

In [None]:
metrics.append({
    'Model': 'AdaBoost Regressor',
    'MSE': mse,  # metrics computed in the previous cell
    'MAE': mae,
    'R2': R2
})

## Gradient Boosting

**Research and explain how this model works.**
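As a starting point for that explanation, the core idea of gradient boosting with squared-error loss can be sketched by hand: begin with a constant prediction, then repeatedly fit a small tree to the current residuals and add a shrunken copy of its output to the running prediction. The sketch below is a simplified illustration, not scikit-learn's internal implementation. The names `n_rounds`, `learning_rate`, `pred_tr`, and `pred_te` and the values chosen for them are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X_gb, y_gb = load_diabetes(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_gb, y_gb, test_size=0.2, random_state=0)

n_rounds = 50        # illustrative value, not tuned
learning_rate = 0.1  # shrinkage applied to every tree's contribution

# Round 0: a constant model; the mean minimizes squared error
init = y_tr.mean()
pred_tr = np.full(len(y_tr), init)
pred_te = np.full(len(y_te), init)
baseline_train_mse = mean_squared_error(y_tr, pred_tr)

trees = []
for _ in range(n_rounds):
    # For squared-error loss, the negative gradient is the residual
    residuals = y_tr - pred_tr
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X_tr, residuals)
    # Move the prediction a small step in the direction the tree suggests
    pred_tr += learning_rate * tree.predict(X_tr)
    pred_te += learning_rate * tree.predict(X_te)
    trees.append(tree)

boosted_train_mse = mean_squared_error(y_tr, pred_tr)
print("Train MSE, constant model:", baseline_train_mse)
print("Train MSE, after boosting:", boosted_train_mse)
print("Test MSE, after boosting: ", mean_squared_error(y_te, pred_te))
```

Each round corrects part of the error left by the previous rounds, which is why a small `learning_rate` usually needs more trees to reach the same fit.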


In [None]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Fixed seed for a reproducible split
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, Y, test_size=0.2, random_state=42)

model = GradientBoostingRegressor()

params = {"n_estimators": [10],
          "learning_rate": [0.01],
          #"max_depth": [2,3,5,10],
          "min_samples_split": [2,3,4],
          "max_leaf_nodes": [5,10,15,20]
          }

grid = GridSearchCV(estimator=model, param_grid=params,cv=5)
_=grid.fit(Xtrain, Ytrain)

In [None]:
print(grid.best_score_)
print(grid.best_params_)
#pd.DataFrame(grid.cv_results_)

In [None]:
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import r2_score

Yhat = grid.predict(Xtest)

mse = mean_squared_error(Ytest, Yhat)
mae = mean_absolute_error(Ytest, Yhat)
R2 = r2_score(Ytest, Yhat)

print("MSE: ",mse)
print("MAE: ",mae)
print("R^2: ",R2)

In [None]:
metrics.append({
    'Model': 'Gradient Boosting Regressor',
    'MSE': mse,  # metrics computed in the previous cell
    'MAE': mae,
    'R2': R2
})

# Create a DataFrame from the collected metrics
df_comparison = pd.DataFrame(metrics)

print(df_comparison)
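
One way to make the comparison less dependent on a single train/test split is to score every model on the same cross-validation folds. The sketch below does this with default hyperparameters, which is an assumption made for brevity and differs from the tuned grids used in the sections above. The names `X_cv`, `y_cv`, `folds`, `candidates`, and `cv_r2` are introduced only for this example.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import (
    AdaBoostRegressor,
    BaggingRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.model_selection import KFold, cross_val_score

X_cv, y_cv = load_diabetes(return_X_y=True)

# The same shuffled folds are reused for every model
folds = KFold(n_splits=5, shuffle=True, random_state=0)

# Default hyperparameters (an assumption for this sketch)
candidates = {
    "Bagging Regressor": BaggingRegressor(random_state=0),
    "Random Forest Regressor": RandomForestRegressor(random_state=0),
    "AdaBoost Regressor": AdaBoostRegressor(random_state=0),
    "Gradient Boosting Regressor": GradientBoostingRegressor(random_state=0),
}

# Mean R^2 over the five folds for each model
cv_r2 = {
    name: cross_val_score(est, X_cv, y_cv, cv=folds, scoring="r2").mean()
    for name, est in candidates.items()
}

# Print the models from best to worst mean R^2
for name, score in sorted(cv_r2.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:30s} mean R^2 = {score:.3f}")
```

Averaging over folds reduces the chance that one model looks best only because of a favorable split.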

In [None]:
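# Refit AdaBoost to inspect its predictions on the training set
# Fixed seed for a reproducible split
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, Y, test_size=0.2, random_state=42)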

mo = DecisionTreeRegressor()
model = AdaBoostRegressor(mo)

# Base-tree parameters use the `estimator__` prefix
# (it was `base_estimator__` before scikit-learn 1.2)
params = {"n_estimators": [10],
          "estimator__max_depth": [2,3],
          "estimator__max_leaf_nodes": [5,8],
          "estimator__min_samples_split": [2,3,4]}


grid = GridSearchCV(estimator=model, param_grid=params,cv=5)
_=grid.fit(Xtrain, Ytrain)

In [None]:
# prompt: generate the predictions for Xtrain

# Predict on Xtrain with the best model found by the hyperparameter search
Yhat_train = grid.predict(Xtrain)

In [None]:
# prompt: build a DataFrame that joins Ytrain and Yhat_train

# Create a DataFrame combining the actual Ytrain and the predicted Yhat_train
df_comparison_train = pd.DataFrame({'Actual Ytrain': Ytrain, 'Predicted Yhat_train': Yhat_train})
df_comparison_train