Model Selection
Giacomo Saccaggi edited this page Jun 19, 2026 · 1 revision
scomp-link uses a decision tree to select a model automatically from the task type and data characteristics such as row count, feature type, and whether the number of clusters is known.
Task Type?
├── numerical_prediction
│   ├── < 1000 rows → Econometric Model (OLS)
│   ├── 1000–100k rows → Ridge / SVR / Lasso / ElasticNet
│   └── > 100k rows → SGD Regressor / GBM
├── categorical_known (classification)
│   ├── text data → Contrastive Text / Sentence-Transformers
│   ├── image data → CNN (ResNet/Inception)
│   ├── categorical features → Naive Bayes / Decision Tree
│   └── mixed/numeric → SVC / GBM / Random Forest
└── categorical_unknown (clustering)
    ├── known n_clusters → KMeans / Hierarchical
    └── unknown n_clusters → Mean-Shift
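The branches above can be read as a plain lookup. The following is a minimal sketch of that reading, not scomp-link's actual implementation: the function name `sketch_select_model` and its parameters `n_rows`, `data_kind`, and `n_clusters_known` are hypothetical, and treating exactly 100,000 rows as part of the middle bucket is an assumption the tree does not settle.

```python
from typing import List, Optional


def sketch_select_model(
    task_type: str,
    n_rows: Optional[int] = None,
    data_kind: Optional[str] = None,
    n_clusters_known: bool = False,
) -> List[str]:
    """Return the candidate models the tree lists for one branch.

    Hypothetical helper for illustration only; it is not part of scomp-link.
    """
    if task_type == "numerical_prediction":
        if n_rows is None:
            raise ValueError("n_rows is required for numerical_prediction")
        if n_rows < 1000:
            return ["Econometric Model (OLS)"]
        # Assumption: the 1000-100k bucket includes both endpoints.
        if n_rows <= 100_000:
            return ["Ridge", "SVR", "Lasso", "ElasticNet"]
        return ["SGD Regressor", "GBM"]

    if task_type == "categorical_known":
        by_kind = {
            "text": ["Contrastive Text", "Sentence-Transformers"],
            "image": ["CNN (ResNet/Inception)"],
            "categorical": ["Naive Bayes", "Decision Tree"],
        }
        # Assumption: anything else is treated as the mixed/numeric branch.
        return by_kind.get(data_kind, ["SVC", "GBM", "Random Forest"])

    if task_type == "categorical_unknown":
        if n_clusters_known:
            return ["KMeans", "Hierarchical"]
        return ["Mean-Shift"]

    raise ValueError(f"unknown task type: {task_type!r}")


print(sketch_select_model("numerical_prediction", n_rows=50_000))
# ['Ridge', 'SVR', 'Lasso', 'ElasticNet']
```

Each branch returns a list because the tree names several candidates per leaf; how scomp-link picks among them (for example through the `metadata` flags) is not shown on this page.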
pipe.choose_model("numerical_prediction", metadata={
    "only_numerical_exogenous": True,
    "all_variables_important": False,
})

After model selection, you can fine-tune with:
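from sklearn.ensemble import GradientBoostingRegressor
from scomp_link.models.advanced_tuning import OptunaOptimizer

# Search space: each trial samples one candidate set of hyperparameters.
def param_space(trial):
    return {
        'n_estimators': trial.suggest_int('n_estimators', 50, 500),
        'max_depth': trial.suggest_int('max_depth', 3, 20),
        # log=True samples the learning rate on a logarithmic scale
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
    }

# X_train and y_train are the training features and target prepared earlier.
optimizer = OptunaOptimizer(GradientBoostingRegressor, param_space, scoring='r2', n_trials=100)
best_model = optimizer.optimize(X_train, y_train)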