# Evaluation

The following sections describe how to evaluate a given model on a given dataset. As an example, we use the config `iclrw/cough/v9.7/adamW_1e-4.yml`, but the steps hold for any other config. Please see the details of the configs used as part of our ICLRW submission.

## Evaluating given checkpoints on existing datasets

Given the model checkpoint and the corresponding config file, you can evaluate on a given dataset.

- Checkpoint: `assets/models/iclrw/cough/v9.7/adamW_1e-4/checkpoints/113_ckpt.pth.tar`
- Corresponding config: `configs/experiments/iclrw/cough/v9.7/adamW_1e-4.yml`
- Dataset: `wiai-facility` | `v9.7` | `test`

1. Copy the model checkpoint to the appropriate output folder (run inside docker):

```bash
# copies from assets/models/ckpt_path/ to /output/experiments/ckpt_path/
ckpt_path=experiments/iclrw/cough/v9.7/adamW_1e-4/checkpoints/113_ckpt.pth.tar
python training/copy_model_ckpts.py -p $ckpt_path --dst_prefix experiments
```

2. Run the forward pass and store metrics:

```bash
cfg=iclrw/cough/v9.7/adamW_1e-4.yml
python evaluation/inference.py -v $cfg -e 113 -dn wiai-facility -dv v9.7 -m test --at softmax -t 0.1317
```

The results are printed to the terminal, with AUC-ROC as the key metric. Explanation of the arguments:

- `-v`: experiment version (config file)
- `-u`: the user who trained the model; no need to pass this when the config file is in the `configs/` folder
- `-e`: epoch/checkpoint number of the trained model
- `-dn`: dataset name
- `-dv`: dataset version (name of the stored `.yml` file)
- `-m`: mode, one of `train`/`val`/`test`
- `--at`: point in the outputs where aggregation is applied, e.g. after softmax
- `-t`: threshold at which the model is evaluated in the given mode
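
For intuition on `--at softmax` and `-t`, the sketch below shows (with plain scikit-learn, not the repository's code) how a set of positive-class softmax probabilities can be scored: AUC-ROC is threshold-free, while the `-t` value binarizes the probabilities before threshold-dependent metrics are computed. All arrays and values here are hypothetical.

```python
# Illustrative sketch (not the repo's inference code): scoring softmax
# probabilities with a fixed decision threshold. All values are hypothetical.
import numpy as np
from sklearn.metrics import roc_auc_score, precision_score, recall_score

y_true = np.array([0, 1, 1, 0, 1])                    # hypothetical ground-truth labels
y_prob = np.array([0.05, 0.62, 0.18, 0.09, 0.44])     # hypothetical positive-class softmax scores

threshold = 0.1317                                    # e.g. the value passed via -t
y_pred = (y_prob >= threshold).astype(int)            # binarize scores at the threshold

print("AUC-ROC  :", roc_auc_score(y_true, y_prob))    # threshold-free ranking metric
print("Precision:", precision_score(y_true, y_pred))  # threshold-dependent
print("Recall   :", recall_score(y_true, y_pred))     # threshold-dependent
```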

## Summary of ICLRW Experiments

| Model | Dataset | Config file | W&B link | Best val AUC/epoch/threshold | ILA threshold |
| --- | --- | --- | --- | --- | --- |
| Cough | V9.4 | `experiments/iclrw/cough/v9.4/adamW_1e-4_cyclic.yml` | Link | 0.6558/38/0.1565 | 0.2827 |
| Cough | V9.7 | `experiments/iclrw/cough/v9.7/adamW_1e-4.yml` | Link | 0.6293/113/0.06858 | 0.1317 |
| Cough | V9.8 | `experiments/iclrw/cough/v9.8/adamW_1e-4.yml` | Link | 0.789/47/0.1604 | 0.2170 |
| Context | V9.4 | `experiments/iclrw/context/v9.4/context-neural.yml` | Link | 0.6849/9/0.2339 | 0.2339 |
| Context | V9.7 | `experiments/iclrw/context/v9.7/context-neural.yml` | Link | 0.6054/31/0.2069 | 0.2069 |
| Context | V9.8 | `experiments/iclrw/context/v9.8/context-neural.yml` | Link | 0.6484/44/0.2282 | 0.2282 |

Note: the W&B links may not work for you since they are within the Wadhwani AI W&B account.

## Evaluating your trained models on existing datasets

Given that you ran training with the following config, you can run evaluation as follows:

- Corresponding config: `configs/experiments/iclrw/cough/v9.7/adamW_1e-4.yml`
- Dataset: `wiai-facility` | `v9.7` | `test`

1. Run the forward pass and store metrics:

```bash
cfg=iclrw/cough/v9.7/adamW_1e-4.yml
python evaluation/inference.py -v $cfg -e 113 -dn wiai-facility -dv v9.7 -m test --at softmax
```

Note: here you do not need to copy the checkpoint, since checkpoints are saved during training itself. You also do not need to explicitly pass `-t` (threshold), since it is picked up from the validation-set logs saved during training.
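
The exact threshold-selection rule lives in the repository's training/evaluation code; purely as an assumed illustration, a common recipe is to pick the threshold that maximizes Youden's J (TPR − FPR) on the validation set, as sketched below with hypothetical data.

```python
# Assumed illustration (not necessarily the repo's rule): choose the operating
# threshold that maximizes Youden's J on validation scores.
import numpy as np
from sklearn.metrics import roc_curve

val_labels = np.array([0, 1, 0, 1, 1, 0])                     # hypothetical validation labels
val_scores = np.array([0.12, 0.55, 0.20, 0.71, 0.33, 0.05])   # hypothetical softmax scores

fpr, tpr, thresholds = roc_curve(val_labels, val_scores)
best_threshold = thresholds[np.argmax(tpr - fpr)]             # Youden's J = TPR - FPR
print("Chosen threshold:", best_threshold)
```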

## Evaluating an ensemble of cough-based and context-based models on a given dataset

1. Before running evaluation of an ensemble of predictions, you need to run inference for the individual models by following the steps above.

2. Create a meta config for ensembling the models (e.g.). In this example, we ensemble a cough-based model and a context-based model with ensembling weights of 0.5 each:

```yaml
models:
  cough:
    version: experiments/iclrw/cough/v9.7/adamW_1e-4.yml
    epoch: 113
    user: null
    weight: 0.5
    agg_method: max

  context:
    version: experiments/iclrw/context/v9.7/context-neural.yml
    epoch: 31
    user: null
    weight: 0.5
    agg_method: max

data:
  mode: test
```

3. Run the ensembling to see the resulting metrics:

```bash
python evaluation/ensemble.py -c experiments/iclrw/ensemble/cough_context_v9.7.yml
```
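
For intuition on what the ensembling does, the sketch below (plain NumPy, not the repository's `evaluation/ensemble.py`) combines two models' positive-class probabilities with the 0.5/0.5 weights from the meta config above; all arrays are hypothetical.

```python
# Illustrative sketch (not the repo's ensemble.py): weighted average of two
# models' per-sample positive-class probabilities, using the 0.5/0.5 weights
# from the meta config above. All values below are hypothetical.
import numpy as np
from sklearn.metrics import roc_auc_score

cough_probs   = np.array([0.62, 0.18, 0.09, 0.44])   # cough-based model outputs
context_probs = np.array([0.55, 0.30, 0.12, 0.51])   # context-based model outputs
labels        = np.array([1, 0, 0, 1])               # ground-truth labels

weights = {"cough": 0.5, "context": 0.5}              # ensembling weights from the config
ensemble_probs = weights["cough"] * cough_probs + weights["context"] * context_probs

print("Ensemble AUC-ROC:", roc_auc_score(labels, ensemble_probs))
```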