Model Selection

Giacomo Saccaggi edited this page Jun 19, 2026 · 1 revision

scomp-link uses a decision-tree approach to select a model automatically from the task type and data characteristics such as row count, data type, and whether the number of clusters is known.

Decision Logic

Task Type?
├── numerical_prediction
│   ├── < 1000 rows → Econometric Model (OLS)
│   ├── 1000–100k rows → Ridge / SVR / Lasso / ElasticNet
│   └── > 100k rows → SGD Regressor / GBM
├── categorical_known (classification)
│   ├── text data → Contrastive Text / Sentence-Transformers
│   ├── image data → CNN (ResNet/Inception)
│   ├── categorical features → Naive Bayes / Decision Tree
│   └── mixed/numeric → SVC / GBM / Random Forest
└── categorical_unknown (clustering)
    ├── known n_clusters → KMeans / Hierarchical
    └── unknown n_clusters → Mean-Shift
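
The tree above can be read as a plain lookup. The following is a minimal sketch of that reading, not scomp-link's actual implementation: the function name `suggest_model_family` and its parameters (`n_rows`, `data_kind`, `n_clusters_known`) are hypothetical, and treating exactly 100k rows as part of the middle band is an assumption taken from the "1000–100k" label.

```python
from typing import Optional


def suggest_model_family(
    task_type: str,
    n_rows: Optional[int] = None,
    data_kind: Optional[str] = None,
    n_clusters_known: bool = False,
) -> str:
    """Hypothetical helper mirroring the decision tree; not part of scomp-link."""
    if task_type == "numerical_prediction":
        if n_rows is None:
            raise ValueError("n_rows is required for numerical_prediction")
        if n_rows < 1000:
            return "Econometric Model (OLS)"
        if n_rows <= 100_000:  # assumption: 100k itself falls in the middle band
            return "Ridge / SVR / Lasso / ElasticNet"
        return "SGD Regressor / GBM"

    if task_type == "categorical_known":  # classification
        by_kind = {
            "text": "Contrastive Text / Sentence-Transformers",
            "image": "CNN (ResNet/Inception)",
            "categorical": "Naive Bayes / Decision Tree",
        }
        # Assumption: anything else is treated as the mixed/numeric branch.
        return by_kind.get(data_kind, "SVC / GBM / Random Forest")

    if task_type == "categorical_unknown":  # clustering
        return "KMeans / Hierarchical" if n_clusters_known else "Mean-Shift"

    raise ValueError(f"unknown task type: {task_type!r}")


regression_choice = suggest_model_family("numerical_prediction", n_rows=500)
clustering_choice = suggest_model_family("categorical_unknown")
```

Here `regression_choice` lands on the OLS branch because 500 is below the 1000-row threshold, and `clustering_choice` lands on Mean-Shift because the number of clusters is not known.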

Manual Override

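# Set the task type explicitly instead of relying on automatic selection.
# Assumes `pipe` is an already-constructed scomp-link pipeline object.
pipe.choose_model("numerical_prediction", metadata={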
    "only_numerical_exogenous": True,
    "all_variables_important": False,
})

Advanced Tuning

After model selection, you can fine-tune the chosen model's hyperparameters with OptunaOptimizer:

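from sklearn.ensemble import GradientBoostingRegressor

from scomp_link.models.advanced_tuning import OptunaOptimizer

def param_space(trial):
    # Search space sampled once per Optuna trial.
    return {
        'n_estimators': trial.suggest_int('n_estimators', 50, 500),
        'max_depth': trial.suggest_int('max_depth', 3, 20),
        # log=True samples the learning rate on a logarithmic scale.
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
    }

# Run 100 trials, scoring each candidate by R^2.
# X_train and y_train are the training features and target prepared beforehand.
optimizer = OptunaOptimizer(GradientBoostingRegressor, param_space, scoring='r2', n_trials=100)
best_model = optimizer.optimize(X_train, y_train)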
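
The `log=True` flag on `learning_rate` means the value is drawn in the log domain, so small learning rates are explored as thoroughly as large ones. The sketch below illustrates that idea with the standard library only. It is not Optuna's sampler (which may be adaptive rather than purely random), and `sample_log_uniform` is a hypothetical helper name.

```python
import math
import random


def sample_log_uniform(low: float, high: float, rng: random.Random) -> float:
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
low, high = 0.01, 0.3   # same bounds as learning_rate in param_space
samples = [sample_log_uniform(low, high, rng) for _ in range(10_000)]

# Under log-uniform sampling, the geometric mean of the bounds splits the
# range into two equally likely halves.
geometric_mid = math.sqrt(low * high)
share_below_mid = sum(s < geometric_mid for s in samples) / len(samples)
```

With these bounds the geometric midpoint is about 0.055, so roughly half of the draws fall between 0.01 and 0.055. Uniform sampling on the raw scale would put only about 15% of draws there.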
