This notebook shows how to use [dvc](https://dvc.org/) [experiments](https://github.com/iterative/dvc/wiki/Experiments) in model development. This example uses the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset of handwritten digits and builds a classification model that predicts the digit (0-9) in each image. The model is built in [pytorch](https://pytorch.org/) as a convolutional neural network with a simplified architecture, so it should train quickly on most computers.

### Getting started

To get started, clone this repository and navigate to it.

The only other prerequisite is [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/). Once conda is installed, create a conda environment from the `environment.yml` file in the repository and activate it:

```bash
conda env create -f environment.yml
conda activate dvc
```

If you want to run this notebook directly, do so after activating the conda environment.
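Once the environment is active, a quick check can confirm that the tools used in this notebook are on `PATH`. This is a hypothetical helper (`check_tools` is not part of this repository), and it assumes that `environment.yml` provides `python` and `dvc`, while `git` comes from your system.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report any of the given commands that are not on PATH.
# Returns 0 if every command is found, 1 otherwise.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_tools git python dvc` prints one `missing:` line per command it cannot find and returns non-zero if any are absent.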

Finally, enable the experiments feature:

In [7]:
%%bash
dvc config --global core.experiments true

### Establishing the pipeline DAG

Before experimenting, a dvc pipeline must be established (see the docs if you are new to dvc). Review the contents of `dvc.yaml` below to see the pipeline.

In [1]:
%%bash
cat dvc.yaml

stages:
  download:
    cmd: python download.py
    deps:
    - download.py
    outs:
    - data/MNIST
  train:
    cmd: python train.py --model_path=model.pt --metrics_path=metrics.yaml
    deps:
    - data/MNIST
    - train.py
    params:
    - lr
    - weight_decay
    outs:
    - model.pt
    metrics:
    - metrics.yaml


The `download` stage fetches the data with the `download.py` script. The `train` stage trains and evaluates the model on the downloaded data with the `train.py` script, and it depends on the `lr` and `weight_decay` parameters defined in `params.yaml`. The trained model is saved to `model.pt`, and the metrics are saved to `metrics.yaml`.
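The `params` entries of the `train` stage name top-level keys in `params.yaml`. Judging from the `dvc exp show` table further down, the committed values are `lr: 0.001` and `weight_decay: 0` (the file may hold other keys too). To inspect the current value from a script, a minimal sketch is below; `get_param` is a hypothetical helper that only handles flat `key: value` lines, not nested keys, lists, or inline comments.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value of a top-level key from a flat
# "key: value" YAML file such as params.yaml (default file: params.yaml).
get_param() {
  local key=$1 file=${2:-params.yaml}
  awk -v k="$key" '
    # Match "key:" at the start of a line, strip it and trailing blanks, print.
    $0 ~ ("^" k ":") {
      sub("^" k ":[ \t]*", "")
      sub("[ \t]+$", "")
      print
      exit
    }' "$file"
}
```

For example, after an experiment that overrides `weight_decay`, `get_param weight_decay` shows which value the workspace copy of `params.yaml` currently holds.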

Execute the pipeline to reproduce the train stage:

In [2]:
%%bash
dvc repro train

Stage 'download' didn't change, skipping
Stage 'train' is cached - skipping run, checking out outputs
Updating lock file 'dvc.lock'

To track the changes with git, run:

	git add dvc.lock


**IMPORTANT:** Run the `git add` command above and then `git commit` before running experiments. Any time you modify the pipeline, run `dvc repro` and commit the changes with git before running experiments.

In [3]:
%%bash
git add data/.gitignore dvc.lock
git commit -m "download data"

[no-checkpoint 9e408f6] download data
 1 file changed, 3 insertions(+), 3 deletions(-)
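With the lock file committed, experiments can start from a clean working tree. To catch a forgotten commit, a guard like the one below can run first. `require_clean_tree` is a hypothetical helper, not part of DVC or this repository; it relies only on `git diff --quiet` (working tree vs. index) and `git diff --cached --quiet` (index vs. `HEAD`), so untracked files are not checked.

```bash
#!/usr/bin/env bash
# Hypothetical guard: fail if tracked files (e.g. dvc.yaml, dvc.lock,
# params.yaml) have unstaged or staged-but-uncommitted changes.
require_clean_tree() {
  # --quiet suppresses output and makes git diff exit 1 when differences exist.
  if ! git diff --quiet || ! git diff --cached --quiet; then
    echo "Uncommitted changes found; run 'git add' and 'git commit' first." >&2
    return 1
  fi
  return 0
}
```

For example: `require_clean_tree && dvc exp run train`.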


### Run an experiment

Run an experiment with the default parameters defined in `params.yaml`.

In [4]:
%%bash
dvc exp run train

Reproducing existing experiment '9e408f6'.
Stage 'download' didn't change, skipping
Stage 'train' didn't change, skipping


Review the output of the run, including identifying hashes, metrics, and parameters:

In [5]:
%%bash
dvc exp show

┏━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Experiment    ┃ Created  ┃    acc ┃   loss ┃ lr    ┃ weight_decay ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━┩
│ workspace     │ -        │ 0.4461 │ 2.2074 │ 0.001 │ 0            │
│ no-checkpoint │ 12:00 PM │ 0.4461 │ 2.2074 │ 0.001 │ 0            │
└───────────────┴──────────┴────────┴────────┴───────┴──────────────┘


### Experimenting with different parameters

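Experiments can be run with different parameters and compared against each other. Here `--params` overrides `weight_decay` for one experiment; as the `workspace` row of the `dvc exp show` table below shows, the override is also written to `params.yaml` in the workspace.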

In [6]:
%%bash
dvc exp run train --params weight_decay=0.1

Stage 'download' didn't change, skipping
Running stage 'train' with command:
	python train.py --model_path=model.pt --metrics_path=metrics.yaml
Updating lock file 'dvc.lock'
Reproduced experiment 'c19c733'.


In [8]:
%%bash
dvc exp show

┏━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Experiment    ┃ Created  ┃    acc ┃   loss ┃ lr    ┃ weight_decay ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━┩
│ workspace     │ -        │ 0.1533 │ 2.2986 │ 0.001 │ 0.1          │
│ no-checkpoint │ 12:00 PM │ 0.4461 │ 2.2074 │ 0.001 │ 0            │
│ └── c19c733   │ 12:00 PM │ 0.1533 │ 2.2986 │ 0.001 │ 0.1          │
└───────────────┴──────────┴────────┴────────┴───────┴──────────────┘


Increasing `weight_decay` to 0.1 lowered accuracy from 0.4461 to 0.1533, so revert to the original parameters:

In [9]:
%%bash
git checkout params.yaml
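Trying several values one after another can also be scripted. The sketch below only strings together the `dvc exp run train --params` and `git checkout params.yaml` commands used in the cells above; the `run` and `sweep_weight_decay` functions and the `DRY_RUN` variable are hypothetical names introduced for illustration. With `DRY_RUN=1` the commands are printed rather than executed.

```bash
#!/usr/bin/env bash
# Execute a command, or only print it when DRY_RUN is set (hypothetical switch).
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical sweep: one experiment per weight_decay value given as arguments,
# then restore the committed params.yaml.
sweep_weight_decay() {
  local wd
  for wd in "$@"; do
    run dvc exp run train --params "weight_decay=${wd}"
  done
  # --params leaves the last override in params.yaml, so revert it.
  run git checkout params.yaml
}
```

For example, `sweep_weight_decay 0.01 0.001` runs two experiments and then restores `params.yaml`; `dvc exp show` should then list both next to the baseline.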

Experiments can also be added to the queue in bulk and executed on demand (see the `-j` flag for parallel execution!).
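A minimal sketch of that workflow, assuming the DVC 2.x-era `dvc exp run` flags `--queue`, `--run-all` and `--jobs` (`-j`); later releases handle the queue differently, so check `dvc exp run --help` for your version. The `run` and `queue_weight_decay` functions and the `DRY_RUN` switch are hypothetical names; with `DRY_RUN=1` the commands are printed instead of executed.

```bash
#!/usr/bin/env bash
# Execute a command, or only print it when DRY_RUN is set (hypothetical switch).
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: first argument is the number of parallel jobs,
# remaining arguments are weight_decay values to queue.
queue_weight_decay() {
  local jobs=$1 wd
  shift
  for wd in "$@"; do
    # Queued experiments are recorded but not executed yet (assumed --queue flag).
    run dvc exp run train --queue --params "weight_decay=${wd}"
  done
  # Execute every queued experiment, $jobs at a time (assumed --run-all/--jobs).
  run dvc exp run --run-all --jobs "$jobs"
}
```

For example, `queue_weight_decay 2 0.01 0.001 0.0001` queues three experiments and then runs them two at a time.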