This folder contains the model code and artifacts used by the CellDivider prediction pipeline. It covers three modelling approaches for predicting phenotypes from processed expression features:
- ElasticNet (regularized linear model)
- Multilayer Perceptron (MLP) neural network
- XGBoost (gradient-boosted trees)
ElasticNet
- Description: Linear regression with combined L1/L2 regularization (a mix of Lasso and Ridge). Useful as a baseline and for interpretable feature selection.
- Key hyperparameters:
  - `alpha` (overall regularization strength)
  - `l1_ratio` (mix between L1 and L2 regularization)
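The two hyperparameters above map directly onto scikit-learn's `ElasticNet` estimator. The following is a minimal sketch, not the pipeline's actual training code: it assumes scikit-learn is installed, treats the task as regression, and uses synthetic data in place of the processed expression features. The values chosen for `alpha` and `l1_ratio` are illustrative only.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import train_test_split

# Synthetic stand-in for processed expression features (samples x features).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
true_coef = np.zeros(50)
true_coef[:5] = [2.0, -1.5, 1.0, 0.5, -0.5]  # only five informative features
y = X @ true_coef + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# alpha: overall regularization strength.
# l1_ratio: 1.0 is pure Lasso (L1), 0.0 is pure Ridge (L2).
model = ElasticNet(alpha=0.1, l1_ratio=0.5)
model.fit(X_train, y_train)

r2 = model.score(X_test, y_test)
n_selected = int(np.sum(model.coef_ != 0))
print(f"test R^2: {r2:.3f}, non-zero coefficients: {n_selected} of {X.shape[1]}")
```

The count of non-zero coefficients shows the feature-selection effect of the L1 term: raising `alpha` or `l1_ratio` generally drives more coefficients to exactly zero.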
Multilayer Perceptron (MLP)
- Description: Feed-forward neural network with one or more hidden layers and non-linear activations.
- Key hyperparameters:
  - `hidden_dim` (hidden layer sizes)
  - `num_layers` (number of hidden layers)
  - `dropout_rate` (regularization)
  - `start_lr` (learning rate for Adam optimizer)
  - batch size (dataloader batch size)
The main training code for the MLP can be found in `mlp/train_mlp.py`.
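To illustrate how these hyperparameters fit together, here is a minimal PyTorch sketch. It does not reproduce `mlp/train_mlp.py`: the `MLP` class, the ReLU activation, the single shared hidden width, the MSE loss, and the synthetic data are all assumptions made for the example.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


class MLP(nn.Module):
    """Hypothetical feed-forward regressor; one width shared by all hidden layers."""

    def __init__(self, in_dim, hidden_dim=64, num_layers=2, dropout_rate=0.1):
        super().__init__()
        layers = []
        dim = in_dim
        for _ in range(num_layers):
            layers += [nn.Linear(dim, hidden_dim), nn.ReLU(), nn.Dropout(dropout_rate)]
            dim = hidden_dim
        layers.append(nn.Linear(dim, 1))  # single regression output
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x).squeeze(-1)


torch.manual_seed(0)
X = torch.randn(256, 50)  # synthetic stand-in for expression features
y = X[:, :5].sum(dim=1) + 0.1 * torch.randn(256)

batch_size = 32
start_lr = 1e-3
loader = DataLoader(TensorDataset(X, y), batch_size=batch_size, shuffle=True)

model = MLP(in_dim=50, hidden_dim=64, num_layers=2, dropout_rate=0.1)
optimizer = torch.optim.Adam(model.parameters(), lr=start_lr)
loss_fn = nn.MSELoss()

model.train()
epoch_losses = []
for epoch in range(5):
    total = 0.0
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
        total += loss.item() * len(xb)
    epoch_losses.append(total / len(X))  # mean training loss for this epoch
    print(f"epoch {epoch}: train MSE {epoch_losses[-1]:.4f}")
```

Call `model.eval()` before predicting so that dropout is disabled at inference time.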
XGBoost
- Description: Gradient-boosted decision trees.
- Key hyperparameters:
  - `n_estimators` (number of trees)
  - `max_depth` (maximum tree depth)
  - `learning_rate` (shrinkage)
  - `subsample`, `colsample_bytree` (row/column sampling for regularization)
  - `gamma` (minimum loss reduction required to split a node; acts as regularization)
Activate your Python environment (conda or venv is recommended), then install the dependencies:
pip install -r requirements.txt
If the GPU install does not work out of the box, install PyTorch for your GPU setup: https://pytorch.org/get-started/locally/
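A quick way to check whether the installed PyTorch build can see a GPU is sketched below. It uses only standard PyTorch calls and falls back to the CPU when CUDA is unavailable.

```python
import torch

print("PyTorch version:", torch.__version__)

# True only if this PyTorch build has CUDA support and a usable GPU is visible.
cuda_available = torch.cuda.is_available()
device = torch.device("cuda" if cuda_available else "cpu")
print("Using device:", device)
```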