In [3]:
import pandas as pd


In [4]:
# Select the experiment id that you want to analyze
experiment_id = "461092783944107271"


In [5]:
# load data from experiment that contains model-metric-fold and value
df = pd.read_parquet(f"../data/raw/selection-models-{experiment_id}.parquet")
df = df.unstack(level=0)
df = df.reset_index()
df.columns = ["model", "metric", "value"]
df = df.pivot(columns=["metric"], index=["model"], values=["value"])
df.columns = df.columns.map(lambda c: c[1])

df


model,test_mae,test_r2,test_rmse,train_mae,train_r2,train_rmse
DecisionTreeRegressor,69609.303571,0.565239,107801.591871,0.0,1.0,0.0
MLPRegressor,385102.216503,-5.512126,417216.437654,402261.463321,-3.707505,448756.586097
RandomForestRegressor,47992.934821,0.767568,78822.101259,20083.180719,0.971494,34920.512864
Ridge,63403.494578,0.655359,95980.564054,54562.062831,0.885458,70000.016523
XGBRegressor,62937.618862,0.641802,97850.219553,45.317646,1.0,63.074549


In [6]:
# use RMSE as the main metric (lower is better) and give each model a weight
# based on its position in the ranking: the best model gets the highest weight
by_test_rmse = df.sort_values(by=["test_rmse", "train_rmse"]).assign(
    weight=lambda x: range(len(x), 0, -1)
)
by_train_rmse = df.sort_values(by=["train_rmse", "test_rmse"]).assign(
    weight=lambda x: range(len(x), 0, -1)
)

In [7]:

by_test_rmse[["test_rmse", "train_rmse"]]

model,test_rmse,train_rmse
RandomForestRegressor,78822.101259,34920.512864
Ridge,95980.564054,70000.016523
XGBRegressor,97850.219553,63.074549
DecisionTreeRegressor,107801.591871,0.0
MLPRegressor,417216.437654,448756.586097


In [8]:
by_train_rmse[["test_rmse", "train_rmse"]]

model,test_rmse,train_rmse
DecisionTreeRegressor,107801.591871,0.0
XGBRegressor,97850.219553,63.074549
RandomForestRegressor,78822.101259,34920.512864
Ridge,95980.564054,70000.016523
MLPRegressor,417216.437654,448756.586097


In [9]:
# add the r2 score to help break ties (higher is better, hence ascending=False)
by_train_r2 = df.sort_values(by=["train_r2", "test_r2"], ascending=False).assign(
    weight=lambda x: range(len(x), 0, -1)
)
by_test_r2 = df.sort_values(by=["test_r2", "train_r2"], ascending=False).assign(
    weight=lambda x: range(len(x), 0, -1)
)


In [10]:
# concat the weights and compare them (visually)
weights = pd.concat(
    [
        by_train_rmse[["weight"]],
        by_test_rmse[["weight"]],
        by_train_r2[["weight"]],
        by_test_r2[["weight"]],
    ],
    axis=1,
)
weights.columns = ["train_rmse", "test_rmse", "train_r2", "test_r2"]
weights


model,train_rmse,test_rmse,train_r2,test_r2
DecisionTreeRegressor,5,2,5,2
XGBRegressor,4,3,4,3
RandomForestRegressor,3,5,3,5
Ridge,2,4,2,4
MLPRegressor,1,1,1,1
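
The weights above are only compared visually. As a rough sketch, they could also be summed into a single score per model. The unweighted sum and the `total` and `test_only` names are assumptions made for illustration, not part of the original analysis; the values are copied from the table above.

```python
import pandas as pd

# Rank-based weights copied from the table above (5 = best, 1 = worst).
weights = pd.DataFrame(
    {
        "train_rmse": [5, 4, 3, 2, 1],
        "test_rmse": [2, 3, 5, 4, 1],
        "train_r2": [5, 4, 3, 2, 1],
        "test_r2": [2, 3, 5, 4, 1],
    },
    index=pd.Index(
        [
            "DecisionTreeRegressor",
            "XGBRegressor",
            "RandomForestRegressor",
            "Ridge",
            "MLPRegressor",
        ],
        name="model",
    ),
)

# Assumed aggregation: every ranking counts equally.
total = weights.sum(axis=1).sort_values(ascending=False)

# Variant that only looks at the held-out (test) rankings.
test_only = weights[["test_rmse", "test_r2"]].sum(axis=1).sort_values(ascending=False)

print(total)
print(test_only)
```

With equal weights, models that memorize the training set are rewarded for it, so the test-only variant may be the more useful of the two when the goal is generalization.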


## Summary

Comparing the models by RMSE shows:

- DecisionTreeRegressor:
    - it is good for training (train RMSE of 0), but on testing it does not perform as well as we need (poor generalization).
    - this model tends to overfit, but there are regularization techniques, such as limiting the tree depth or pruning, that could help.

- XGBRegressor:
    - it does well on both training and testing (generalization); adding regularization would be a good first approach.
    - the gap between test and train is large (train RMSE of about 63 versus test RMSE of about 97,850), so it looks well fit on the training data but will need regularization.
    - its test RMSE is almost the same as Ridge's, and a bit worse than RandomForest's.

- Ridge:
    - it does well on both testing and training, and it looks accurate (train and test values are similar).
    - the difference between test and train is small; it may be underfitting and could need more training.
    - if I want to apply KISS or Occam's razor, I would select this one.

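- RandomForestRegressor:
    - it ranks better on testing (unseen data) than on training: it has the best test RMSE but only the third-best train RMSE.
        - we need to train with stratified k-fold cross-validation to evaluate the results on other splits.
        - check for data leakage.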


- MLPRegressor: it does not look like a good fit for the current problem (negative R2 on both train and test).
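
The train/test differences discussed in the list above can be made explicit with a small calculation. This is a minimal sketch: the RMSE values are copied from the results table, and the `gap` column is a name introduced here for illustration.

```python
import pandas as pd

# RMSE values copied from the results table earlier in the notebook.
rmse = pd.DataFrame(
    {
        "test_rmse": [107801.591871, 417216.437654, 78822.101259, 95980.564054, 97850.219553],
        "train_rmse": [0.0, 448756.586097, 34920.512864, 70000.016523, 63.074549],
    },
    index=pd.Index(
        [
            "DecisionTreeRegressor",
            "MLPRegressor",
            "RandomForestRegressor",
            "Ridge",
            "XGBRegressor",
        ],
        name="model",
    ),
)

# Absolute gap: how much larger the error is on unseen data than on training data.
rmse["gap"] = rmse["test_rmse"] - rmse["train_rmse"]

# Smallest gap first.
gaps = rmse.sort_values("gap")
print(gaps)
```

The gap should be read together with the absolute error: MLPRegressor has the smallest (even negative) gap only because it is poor on both sets, while DecisionTreeRegressor and XGBRegressor have the largest gaps because they fit the training data almost perfectly.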

As a next step in a real scenario, a team might evaluate Ridge, XGB and RandomForest (and DecisionTree, why not...).  
I will tune just one: in my case, XGBRegressor.
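
Before tuning, the candidates could be re-checked with k-fold cross-validation, as suggested for RandomForest above. The following is only a sketch under stated assumptions: the data is synthetic (`make_regression`), the hyperparameters are arbitrary, and plain `KFold` is used because the target is continuous (stratifying would require binning the target first). Any other estimator that follows the scikit-learn API could be added to the `candidates` dictionary in the same way.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_validate

# Synthetic stand-in data (assumption): replace with the real features and target.
X, y = make_regression(n_samples=200, n_features=8, noise=20.0, random_state=0)

# Illustrative candidates with arbitrary hyperparameters.
candidates = {
    "Ridge": Ridge(alpha=1.0),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=50, random_state=0),
}

cv = KFold(n_splits=5, shuffle=True, random_state=0)

rows = {}
for name, model in candidates.items():
    scores = cross_validate(
        model,
        X,
        y,
        cv=cv,
        scoring="neg_root_mean_squared_error",
        return_train_score=True,
    )
    # The scorer returns negated RMSE, so flip the sign back.
    rows[name] = {
        "train_rmse": -scores["train_score"].mean(),
        "test_rmse": -scores["test_score"].mean(),
    }

cv_results = pd.DataFrame.from_dict(rows, orient="index")
print(cv_results)
```

Averaging over several folds gives a less split-dependent view of the train/test gap than a single split, which helps when deciding which model is worth tuning.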
