# DrEvalPy Demo
You can run the DrEval framework either as an nf-core pipeline via Nextflow or as a standalone Python package.

Approximate runtime standalone demo: 38 minutes, Nextflow demo: 5 minutes

In [None]:
!pip install drevalpy

In [1]:
import drevalpy
drevalpy.__version__

'1.4.0'

First, let us see which datasets and models are already implemented in drevalpy.
You can test your own model on all the datasets and compare it to all the implemented ones:

In [2]:
from drevalpy.models import MODEL_FACTORY
from drevalpy.datasets import AVAILABLE_DATASETS
print(f"Models: {list(MODEL_FACTORY.keys())}")
print(f"Dataset: {list(AVAILABLE_DATASETS.keys())}")


Models: ['NaivePredictor', 'NaiveDrugMeanPredictor', 'NaiveCellLineMeanPredictor', 'NaiveMeanEffectsPredictor', 'NaiveTissueMeanPredictor', 'ElasticNet', 'RandomForest', 'SVR', 'SimpleNeuralNetwork', 'MultiOmicsNeuralNetwork', 'MultiOmicsRandomForest', 'GradientBoosting', 'SRMF', 'DIPK', 'ProteomicsRandomForest', 'ProteomicsElasticNet', 'DrugGNN', 'ChemBERTaNeuralNetwork', 'SingleDrugRandomForest', 'MOLIR', 'SuperFELTR', 'SingleDrugElasticNet', 'SingleDrugProteomicsElasticNet', 'SingleDrugProteomicsRandomForest']
Dataset: ['GDSC1', 'GDSC2', 'CCLE', 'TOYv1', 'TOYv2', 'CTRPv1', 'CTRPv2', 'BeatAML2', 'PDX_Bruna']


In [3]:
# Let us first train a model on the toy dataset. It will download the dataset for you.
from drevalpy.experiment import drug_response_experiment

naive_mean = MODEL_FACTORY["NaivePredictor"] # a naive model that just predicts the training mean
enet = MODEL_FACTORY["ElasticNet"] # an Elastic Net based on drug fingerprints and gene expression of 1000 landmark genes
simple_nn = MODEL_FACTORY["SimpleNeuralNetwork"] # a neural network based on drug fingerprints and gene expression of 1000 landmark genes

toyv1 = AVAILABLE_DATASETS["TOYv1"](path_data="data")

drug_response_experiment(
            models=[enet, simple_nn],
            baselines=[naive_mean], # Ablation studies and robustness tests are not done for baselines.
            response_data=toyv1,
            n_cv_splits=2, # the number of cross-validation splits. Should be higher in practice :)
            test_mode="LCO", # LCO means Leave-Cell-line-Out: the test and validation splits only contain unseen cell lines.
            run_id="my_first_run",
            path_data="data", # where the downloaded drug response and feature data is stored
            path_out="results", # results are stored here :)
            hyperparameter_tuning=False) # if True (default), hyperparameters of the models and baselines are tuned.

2025-11-20 11:30:25,057	INFO util.py:154 -- Missing packages: ['ipywidgets']. Run `pip install -U ipywidgets`, then restart the notebook server for rich notebook output.


Downloading TOYv1 from https://zenodo.org/api/records/17611663/files/TOYv1.zip/content...
TOYv1 data downloaded and extracted to data
Downloading meta from https://zenodo.org/api/records/17611663/files/meta.zip/content...
meta data downloaded and extracted to data
Creating cv splits at results/my_first_run/TOYv1/LCO/splits
Running ElasticNet
- Full Test -

################# FOLD 1/2 #################

Best hyperparameters: {'alpha': 1, 'l1_ratio': 0}
Training model on full train and validation set to predict test set
Loading cell line features ...
Loading drug features ...
Number of cell lines in features: 88
Number of drugs in features: 36
Number of cell lines in train dataset: 45
Number of drugs in train dataset: 36
Reduced training dataset from 889 to 858, due to missing features
Reduced prediction dataset from 887 to 871, due to missing features
Training model ...
Using temporary directory: /var/folders/3x/f8j9tddj7flfxt9zx1gkws1m0000gn/T/tmptnn7wkxp for model checkpoints

########

GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores


Loading drug features ...
Number of cell lines in features: 88
Number of drugs in features: 36
Number of cell lines in train dataset: 44
Number of drugs in train dataset: 36
Reduced early stopping dataset from 31 to 0
Training model ...
Using temporary directory: /var/folders/3x/f8j9tddj7flfxt9zx1gkws1m0000gn/T/tmp2k2op5hb for model checkpoints
SimpleNeuralNetwork: Early stopping dataset empty. Using training data for early stopping
Probably, your training dataset is small.



  | Name                   | Type       | Params | Mode 
--------------------------------------------------------------
0 | loss                   | MSELoss    | 0      | train
1 | fully_connected_layers | ModuleList | 13.5 K | train
2 | batch_norm_layers      | ModuleList | 120    | train
3 | dropout_layer          | Dropout    | 0      | train
--------------------------------------------------------------
13.6 K    Trainable params
0         Non-trainable params
13.6 K    Total params
0.054     Total estimated model params size (MB)
13        Modules in train mode
0         Modules in eval mode


Epoch 0: 100%|██████████| 58/58 [00:01<00:00, 32.78it/s, v_num=0, train_loss_step=12.20]
Validation DataLoader 0: 100%|██████████| 59/59 [00:00<00:00, 129.40it/s]
Epoch 1: 100%|██████████| 58/58 [00:00<00:00, 103.84it/s, v_num=0, train_loss_step=10.70, val_loss_step=11.40, val_loss_epoch=8.320, train_loss_epoch=8.590]
Validation DataLoader 0: 100%|██████████| 59/59 [00:00<00:00, 155.42it/s]
Epoch 2: 100%|██████████| 58/58 [00:00<00:00, 116.77it/s, v_num=0, train_loss_step=6.920, val_loss_step=11.10, val_loss_epoch=8.150, train_loss_epoch=8.300]
Validation Data

GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores

  | Name                   | Type       | Params | Mode 
--------------------------------------------------------------
0 | loss                   | MSELoss    | 0      | train
1 | fully_connected_layers | ModuleList | 13.5 K | train
2 | batch_norm_layers      | ModuleList | 120    | train
3 | dropout_layer          | Dropout    | 0      | train
--------------------------------------------------------------
13.6 K    Trainable params
0         Non-trainable params
13.6 K    Total params
0.054     Total estimated model params size (MB)
13        Modules in train mode
0         Modules in eval mode


Epoch 0: 100%|██████████| 59/59 [00:00<00:00, 77.80it/s, v_num=0, train_loss_step=9.410]
Validation DataLoader 0: 100%|██████████| 2/2 [00:00<00:00, 89.94it/s]
Epoch 1: 100%|██████████| 59/59 [00:00<00:00, 175.17it/s, v_num=0, train_loss_step=4.980, val_loss_step=7.690, val_loss_epoch=8.250, train_loss_epoch=8.140]
Validation DataLoader 0: 100%|██████████| 2/2 [00:00<00:00, 304.73it/s]
Epoch 2: 100%|██████████| 59/59 [00:00<00:00, 189.82it/s, v_num=0, train_loss_step=7.400, val_loss_step=7.370, val_loss_epoch=8.030, train_loss_epoch=7.660]
Validation DataLoader 
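
To make the `LCO` test mode used above more concrete, here is a minimal sketch of a leave-cell-line-out split written with the Python standard library only. It is not drevalpy's implementation: the function name `leave_cell_line_out_folds` and the toy sample list are hypothetical and only illustrate the idea that every cell line ends up in exactly one test fold.

```python
import random
from collections import defaultdict


def leave_cell_line_out_folds(cell_lines, n_splits=2, seed=42):
    """Group sample indices into folds so that all samples of a cell line share one fold.

    Hypothetical helper for illustration; drevalpy creates its own splits.
    """
    unique_lines = sorted(set(cell_lines))
    random.Random(seed).shuffle(unique_lines)
    # Assign whole cell lines (not single measurements) to folds, round-robin.
    fold_of_line = {line: i % n_splits for i, line in enumerate(unique_lines)}
    folds = defaultdict(list)
    for idx, line in enumerate(cell_lines):
        folds[fold_of_line[line]].append(idx)
    return [folds[k] for k in range(n_splits)]


# One entry per (cell line, drug) measurement; cell lines repeat across drugs.
samples = ["DU4475", "SK-MEL-31", "SK-MEL-24", "DU4475", "NCI-H2196", "SK-MEL-31"]
folds = leave_cell_line_out_folds(samples, n_splits=2)

for k, test_idx in enumerate(folds):
    test_lines = sorted({samples[i] for i in test_idx})
    print(f"fold {k}: test cell lines = {test_lines}")
```

Because whole cell lines are assigned to folds, no cell line in a test fold also appears in the corresponding training data, which is the property the `LCO` setting relies on.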

In [4]:
import os
import pandas as pd
os.listdir("results/my_first_run/TOYv1/LCO")
# The results folder holds the splits and the results for all models. Let's look at the predictions of the simple neural network for fold 0:
pd.read_csv("results/my_first_run/TOYv1/LCO/SimpleNeuralNetwork/predictions/predictions_split_0.csv")


Unnamed: 0,cell_line_name,pubchem_id,response,predictions,tissue
0,DU4475,123631,1.639112,2.327340,Breast
1,SK-MEL-31,123631,3.342029,2.933131,Skin
2,SK-MEL-24,637858,2.090905,2.584991,Skin
3,LNCaP clone FGC,11152667,0.359540,0.096984,Prostate
4,NCI-H2196,123631,2.912142,3.123437,Lung
...,...,...,...,...,...
866,HCC1143,3062316,0.306898,-0.228287,Breast
867,Karpas-299,24821094,3.124550,2.441677,Lymph
868,Namalwa,24771867,-1.506492,-2.305591,Lymph
869,TE-10,36314,1.665760,-3.401662,Esophagus
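
Since each row of the predictions file pairs a measured `response` with a model `predictions` value, simple metrics can be computed by hand. The following is a minimal sketch that uses only the first five rows shown above, typed in manually instead of being read from the CSV; the helper `rmse` and the variable names are introduced here for illustration and are not part of drevalpy.

```python
import numpy as np
import pandas as pd

# First five rows of the table above; in practice this would come from pd.read_csv(...).
preds = pd.DataFrame(
    {
        "cell_line_name": ["DU4475", "SK-MEL-31", "SK-MEL-24", "LNCaP clone FGC", "NCI-H2196"],
        "response": [1.639112, 3.342029, 2.090905, 0.359540, 2.912142],
        "predictions": [2.327340, 2.933131, 2.584991, 0.096984, 3.123437],
        "tissue": ["Breast", "Skin", "Skin", "Prostate", "Lung"],
    }
)


def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff**2)))


overall_rmse = rmse(preds["response"], preds["predictions"])
# Series.corr computes the Pearson correlation by default.
overall_pearson = float(preds["response"].corr(preds["predictions"]))

# Mean absolute error per tissue, to see where the model struggles.
mae_per_tissue = (
    preds.assign(abs_err=(preds["response"] - preds["predictions"]).abs())
    .groupby("tissue")["abs_err"]
    .mean()
)

print(f"RMSE: {overall_rmse:.3f}, Pearson: {overall_pearson:.3f}")
print(mae_per_tissue)
```

Five rows are far too few for a meaningful evaluation; the sketch only shows the mechanics.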


In [5]:
# you can generate your own evaluations from these predictions.
# However, we recommend using our evaluation pipeline, which calculates meaningful metrics, creates figures and prepares an HTML report:
from drevalpy.visualization.create_report import create_report
create_report(run_id="my_first_run", dataset="TOYv1")

# This creates a report at results/my_first_run/index.html, which you can open in your browser.

Generating result tables ...
Evaluating file: "TOYv1/LCO/ElasticNet/predictions/predictions_split_0.csv" ...
Parsing file: /Users/judithbernett/PycharmProjects/drevalpy/results/my_first_run/TOYv1/LCO/ElasticNet/predictions/predictions_split_0.csv
Calculating cell_line-wise evaluation measures …
Evaluating file: "TOYv1/LCO/ElasticNet/predictions/predictions_split_1.csv" ...
Parsing file: /Users/judithbernett/PycharmProjects/drevalpy/results/my_first_run/TOYv1/LCO/ElasticNet/predictions/predictions_split_1.csv
Calculating cell_line-wise evaluation measures …
Evaluating file: "TOYv1/LCO/NaivePredictor/predictions/predictions_split_0.csv" ...
Parsing file: /Users/judithbernett/PycharmProjects/drevalpy/results/my_first_run/TOYv1/LCO/NaivePredictor/predictions/predictions_split_0.csv
Calculating cell_line-wise evaluation measures …
Evaluating file: "TOYv1/LCO/NaivePredictor/predictions/predictions_split_1.csv" ...
Parsing file: /Users/judithbernett/PycharmProjects/drevalpy/results/my_first_r

In [6]:
# We prefer running experiments and report generation from the console:
!drevalpy --models RandomForest --dataset_name TOYv1 --n_cv_splits 2 --test_mode LPO --run_id my_second_run --no_hyperparameter_tuning
!drevalpy-report --run_id my_second_run --dataset TOYv1

Creating cv splits at results/my_second_run/TOYv1/LPO/splits
Running RandomForest
- Full Test -

################# FOLD 1/2 #################

Best hyperparameters: {'criterion': 'squared_error', 'max_depth': 5, 'max_samples': 0.2, 'n_estimators': 100, 'n_jobs': -1}
Training model on full train and validation set to predict test set
Loading cell line features ...
Loading drug features ...
Number of cell lines in features: 88
Number of drugs in features: 36
Number of cell lines in train dataset: 90
Number of drugs in train dataset: 36
Reduced training dataset from 888 to 865, due to missing features
Reduced prediction dataset from 888 to 864, due to missing features
Training model ...
Using temporary directory: /var/folders/3x/f8j9tddj7flfxt9zx1gkws1m0000gn/T/tmpmv442vyb for model checkpoints

################# FOLD 2/2 #################

Best hyperparameters: {'criterion': 'squared_error', 'max_depth': 5, 'max_samples': 0.2, 'n_estimators': 100, 'n_jobs': -1}
Trai

## Using the drevalpy Nextflow pipeline for highly optimized runs

You should use DrEval with Nextflow on high-performance clusters or in the cloud. Nextflow supports various systems such as Slurm, AWS, Azure, Kubernetes, or SGE. You can also run the pipeline on a local machine, but the overhead of spawning processes is probably not worth it, so you might prefer the standalone version. Nextflow needs Java version 17 or later, so we need to install that, too.

In [7]:
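!pip install nextflow
# apt-get only exists on Debian/Ubuntu systems such as Colab; on other systems, install Java 17 or later with your own package manager.
!apt-get install openjdk-17-jre-headless -qq > /dev/null
!java --version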

Collecting nextflow
  Downloading nextflow-25.10.0.tar.gz (7.7 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: nextflow
  Building wheel for nextflow (pyproject.toml) ... done
  Created wheel for nextflow: filename=nextflow-25.10.0-py3-none-any.whl size=7871 sha256=fa4251705ec9b8ea100573727ee2b922e26121c1c645159d17e46f469d74d953
  Stored in directory: /Users/judithbernett/Library/Caches/pip/wheels/d9/3d/9f/f98531f3e6826cd9e58951157b2588a55a3426ecdb9b9b20dd
Successfully built nextflow
Installing collected packages: nextflow
Successfully installed nextflow-25.10.0
zsh:1: command not found: apt-get
openjdk 23.0.2 2025-01-21
OpenJDK Runtime Environment Homebrew (build 23.0.2)
OpenJDK 64-Bit Server VM Homebrew (build 23.0.2, mixed mode, sharing)


In [8]:
# We need a demo config for Nextflow because only two CPUs are available on Colab:
with open('demo.config', 'w') as f:
  f.write('process {\n')
  f.write('\tresourceLimits = [\n')
  f.write('\t\tcpus: 2,\n')
  f.write('\t\tmemory: "3.GB",\n')
  f.write('\t\ttime: "1.h",\n')
  f.write('\t]\n')
  f.write('}')

We run the pipeline with the TOYv1 dataset, which is a subset of CTRPv2. For the demo, we skip hyperparameter tuning and use only 2 CV splits. Because we want to inspect the final model, we also train a final model on the full dataset. This should take about 10 minutes.
If you were on a compute cluster, you could now choose whether to run the pipeline inside conda, Docker, Singularity, ... via the `-profile` option (e.g., `-profile singularity`). If you want the executor to be Slurm or similar, you can specify this in your config. You can find plenty of config examples online, e.g., the one for our group: [daisybio](https://github.com/nf-core/configs/blob/master/conf/daisybio.config)


In [9]:
!nextflow run nf-core/drugresponseeval -r dev -c demo.config --dataset_name TOYv1 --models ElasticNet --baselines NaiveMeanEffectsPredictor --n_cv_splits 2 --no_hyperparameter_tuning --final_model_on_full_data

Downloading nextflow dependencies. It may require a few seconds, please wait ..
 N E X T F L O W   ~  version 25.10.0

Pulling nf-core/drugresponseeval ...
 downloaded from https://github.com/nf-core/drugresponseeval.git
Launching `https://github.com/nf-core/drugresponseeval` [grave_mclean] DSL2 - revision: ae31d78d85 [dev]

------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/drugresponseeval 1.1.1dev
------------------------------------------------------
Model options
  models                  : ElasticNet

Input/output options
  dataset_name            : TOYv1

Addition

The results will be stored in `results/my_run`. You can inspect pipeline information such as runtime or memory usage in `results/pipeline_info`. In `my_run/report`, you can find the HTML report, where you can explore your results interactively. The underlying data is in `my_run/evaluation_results.csv` or `true_vs_pred.csv`.

We now inspect the final model saved in `results/my_run/LCO/ElasticNet/final_model` with `drevalpy` functions.

In [10]:
from drevalpy.models import MODEL_FACTORY
enet_class = MODEL_FACTORY["ElasticNet"]
enet = enet_class.load("results/my_run/LCO/ElasticNet/final_model")
enet

<drevalpy.models.baselines.sklearn_models.ElasticNetModel at 0x31ae574d0>

We now want to extract the top scoring features.

In [11]:
# get the top features
cell_line_input = enet.load_cell_line_features(data_path="data", dataset_name="TOYv1")
drug_input = enet.load_drug_features(data_path="data", dataset_name="TOYv1")
all_features = list(cell_line_input.meta_info['gene_expression'])+[f'fingerprint_{i}' for i in range(128)]

In [12]:
import pandas as pd
df = pd.DataFrame({'feature': all_features, 'coef': enet.model.coef_})
df.sort_values(by="coef", ascending=False)

Unnamed: 0,feature,coef
303,fingerprint_33,0.948429
345,fingerprint_75,0.657288
386,fingerprint_116,0.581950
314,fingerprint_44,0.462468
293,fingerprint_23,0.446101
...,...,...
335,fingerprint_65,-0.507304
342,fingerprint_72,-0.613356
393,fingerprint_123,-0.638303
298,fingerprint_28,-0.668780


In [13]:
print("Top 50 features:")
list(df.sort_values(by="coef", ascending=False)["feature"][:50])

Top 50 features:


['fingerprint_33',
 'fingerprint_75',
 'fingerprint_116',
 'fingerprint_44',
 'fingerprint_23',
 'fingerprint_4',
 'fingerprint_120',
 'fingerprint_61',
 'fingerprint_57',
 'fingerprint_112',
 'fingerprint_69',
 'fingerprint_109',
 'fingerprint_126',
 'fingerprint_31',
 'fingerprint_14',
 'fingerprint_50',
 'fingerprint_43',
 'fingerprint_121',
 'fingerprint_20',
 'fingerprint_47',
 'fingerprint_107',
 'fingerprint_110',
 'fingerprint_85',
 'fingerprint_24',
 'fingerprint_122',
 'fingerprint_63',
 'fingerprint_55',
 'fingerprint_91',
 'fingerprint_53',
 'fingerprint_30',
 'fingerprint_37',
 'fingerprint_62',
 'fingerprint_38',
 'fingerprint_78',
 np.str_('CLPX'),
 'fingerprint_76',
 'fingerprint_92',
 'fingerprint_82',
 'fingerprint_64',
 'fingerprint_83',
 'fingerprint_87',
 'fingerprint_66',
 'fingerprint_3',
 'fingerprint_56',
 'fingerprint_118',
 'fingerprint_52',
 np.str_('CPNE3'),
 'fingerprint_115',
 np.str_('LIG1'),
 np.str_('CAPN1')]

The fingerprints are the most important features because drug identity accounts for most of the variation between responses.
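
Sorting by the signed coefficient lists the features with the largest positive weights first, while features with large negative coefficients land at the bottom even though they shift the prediction just as strongly. As a minimal sketch, the nine coefficient values displayed in the table above (typed in by hand, not recomputed from the model) can be ranked by absolute value instead; the column name `abs_coef` is introduced here for illustration only.

```python
import pandas as pd

# The nine rows visible in the coefficient table above.
df = pd.DataFrame(
    {
        "feature": [
            "fingerprint_33", "fingerprint_75", "fingerprint_116",
            "fingerprint_44", "fingerprint_23", "fingerprint_65",
            "fingerprint_72", "fingerprint_123", "fingerprint_28",
        ],
        "coef": [
            0.948429, 0.657288, 0.581950,
            0.462468, 0.446101, -0.507304,
            -0.613356, -0.638303, -0.668780,
        ],
    }
)

# Rank by magnitude so that strongly negative coefficients are not pushed to the end.
ranked = df.assign(abs_coef=df["coef"].abs()).sort_values("abs_coef", ascending=False)
top_features = ranked["feature"].tolist()

print(ranked[["feature", "coef"]].head(5))
```

With the real model, the same two lines can be applied to the full `df` built from `enet.model.coef_`.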