# ML-based COPD inference demo

This notebook demonstrates how to run inference with the ML-based COPD model
checkpoints on the UK Biobank (UKB) demo spirometry blow showcased in field
[3066](https://biobank.ctsu.ox.ac.uk/crystal/field.cgi?id=3066).

We first download the ML-based COPD source code and model checkpoints and
install the dependencies. We then generate a mock dataset from the demo
spirometry field in the UKB data showcase; see
[`ukb_3066_demo_preprocessing.py`](https://github.com/Google-Health/genomics-research/blob/main/ml-based-copd/learning/ukb_3066_demo_preprocessing.py)
for details. Finally, we load the model checkpoints and run inference on the
mock data.

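This notebook can be run on [Google Colab](https://colab.google). The absolute
paths used below (`/content/...`) assume Colab's default working directory,
`/content`.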

## Set up the notebook environment

The following cells install dependencies and set up the notebook environment.
Libraries can be imported from `genomics_research.ml_based_copd.learning`.
Checkpoints are saved to `/content/checkpoints/ml_based_copd_member_ckpts`, and
data is saved to `/content/data/`.

In [None]:
%%bash
# Download code from the repository.
git clone https://github.com/Google-Health/genomics-research genomics_research

# Rename the hyphenated directory so it is importable as a Python package.
mv genomics_research/ml-based-copd/ genomics_research/ml_based_copd

# Install requirements.
pip install -r genomics_research/ml_based_copd/learning/requirements.txt

# Download the model checkpoints.
mkdir -p checkpoints
wget -O checkpoints/ml_based_copd_member_ckpts.zip https://github.com/Google-Health/genomics-research/releases/download/v0.2.0-ML-COPD/ml_based_copd_member_ckpts.zip
unzip -o checkpoints/ml_based_copd_member_ckpts.zip -d checkpoints

# Make a data directory.
mkdir -p data

In [None]:
# Add the ML-based COPD source code directory to the system path.
import sys

sys.path.append("genomics_research/ml_based_copd/learning")
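
Before importing anything, it can help to confirm that the setup cells above produced the expected directories. The following is a minimal sketch, not part of the repository: `missing_paths` and `EXPECTED` are hypothetical names, and the relative paths assume the notebook is running from the same working directory as the `%%bash` cell.

```python
import pathlib


def missing_paths(root, relative_paths):
    """Return the entries of relative_paths that do not exist under root."""
    root = pathlib.Path(root)
    return [p for p in relative_paths if not (root / p).exists()]


# Directories the setup cells are expected to create (assumed layout,
# relative to the current working directory).
EXPECTED = [
    "genomics_research/ml_based_copd/learning",
    "checkpoints/ml_based_copd_member_ckpts",
    "data",
]

# An empty list means every expected directory is present.
print(missing_paths(".", EXPECTED))
```

If the list is not empty, rerunning the setup cell is usually the simplest fix.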

## Import dependencies, load checkpoints and data, and run inference

In [None]:
import pathlib

import ml_collections

from genomics_research.ml_based_copd.learning import train
from genomics_research.ml_based_copd.learning import ukb_3066_demo_preprocessing
from genomics_research.ml_based_copd.learning.configs import resnet18_fv_copd

In [None]:
def load_demo_config(
    data_dir: pathlib.Path,
    version: str,
) -> ml_collections.ConfigDict:
  """Returns the `resnet18_fv_copd` config with overrides for the demo dataset.

  Args:
    data_dir: The directory containing the demo dataset.
    version: The dataset version suffix included in the dataset filepaths.
  """
  config = resnet18_fv_copd.get_config()
  # Apply config overrides for the local demo dataset.
  config.backbone_config.kwargs.input_names = 'flow_volume_pad_last'
  config.dataset_config.data_dir = str(data_dir)
  config.dataset_config.inputs = {'eid', 'flow_volume_pad_last'}
  config.dataset_config.use_feature_scaling = False
  config.dataset_config.version_suffix = version
  return config

In [None]:
# The directory in which mock data will be written.
DATA_DIR = pathlib.Path('/content/data')

# A dataset version suffix included in the dataset filepaths.
DATASET_VERSION = 'v00'

# The working directory containing model checkpoints; predictions are also
# written here.
WORK_DIR = '/content/checkpoints/ml_based_copd_member_ckpts/1'

# Write the mock demo dataset to the local filesystem.
ukb_3066_demo_preprocessing.write_demo_pkl(DATA_DIR, DATASET_VERSION)

# Load an experimental config parameterized for the demo dataset.
g_config = load_demo_config(DATA_DIR, DATASET_VERSION)

# Generate predictions using the model checkpoints.
train.predict(WORK_DIR, g_config)
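
The cell above runs inference for the single working directory `.../ml_based_copd_member_ckpts/1`. If the downloaded archive contains further numbered subdirectories (an assumption based on the directory name, not something this notebook confirms), they could be enumerated with a sketch like the one below. `list_member_dirs` and `CKPT_ROOT` are hypothetical names introduced for illustration.

```python
import pathlib


def list_member_dirs(ckpt_root):
    """Return numerically named subdirectories of ckpt_root in numeric order."""
    root = pathlib.Path(ckpt_root)
    members = [p for p in root.iterdir() if p.is_dir() and p.name.isdigit()]
    return sorted(members, key=lambda p: int(p.name))


# Assumed parent directory of the working directory used above.
CKPT_ROOT = pathlib.Path('/content/checkpoints/ml_based_copd_member_ckpts')

if CKPT_ROOT.is_dir():
    for member_dir in list_member_dirs(CKPT_ROOT):
        print(member_dir)
        # To predict with each directory, reuse the call from the cell above:
        # train.predict(str(member_dir), g_config)
```

Sorting by integer value keeps `10` after `2`, which a plain string sort would not.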

In [None]:
%%bash
# Predictions are written to the working directory.
ls /content/checkpoints/ml_based_copd_member_ckpts/1
cat /content/checkpoints/ml_based_copd_member_ckpts/1/train_predictions.tsv
cat /content/checkpoints/ml_based_copd_member_ckpts/1/validation_predictions.tsv
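
The prediction files can also be inspected from Python rather than with `cat`. The sketch below uses only the standard library and assumes the first row of each TSV is a header; the column names are not documented in this notebook, so none are hard-coded. `read_tsv` and `PRED_DIR` are hypothetical names.

```python
import csv
import itertools
import pathlib


def read_tsv(path, max_rows=5):
    """Return (header, rows) with at most max_rows data rows from a TSV file."""
    with open(path, newline='') as f:
        reader = csv.reader(f, delimiter='\t')
        header = next(reader, [])  # Assumes the first row is a header.
        rows = list(itertools.islice(reader, max_rows))
    return header, rows


# The working directory that the prediction files are written to.
PRED_DIR = pathlib.Path('/content/checkpoints/ml_based_copd_member_ckpts/1')

for name in ('train_predictions.tsv', 'validation_predictions.tsv'):
    path = PRED_DIR / name
    if path.exists():
        header, rows = read_tsv(path)
        print(name, header)
        for row in rows:
            print(row)
```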