This notebook shows how to use [DVC](https://dvc.org/) [experiments](https://github.com/iterative/dvc/wiki/Experiments) in model development. The example uses the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset of handwritten digits and builds a classification model to predict the digit (0-9) in each image. The model is built in [PyTorch](https://pytorch.org/) as a convolutional neural network with a simplified architecture, so it should run quickly on most computers.

### Getting started

To get started, clone this repository and navigate to it.

The only other prerequisite is [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/). Once conda is installed, create a virtual environment from the existing `environment.yml` file and activate it:

```bash
conda env create -f environment.yml
conda activate dvc
```

If you want to run this notebook directly, do so after activating the conda environment.

### Establishing the pipeline DAG

Before experimenting, a DVC pipeline must be established (see the docs if you are new to DVC). Review the contents of `dvc.yaml` below to see the pipeline.

In [6]:
%%bash
cat dvc.yaml

stages:
  download:
    cmd: python download.py
    deps:
    - download.py
    outs:
    - data/MNIST
  train:
    cmd: python train.py
    deps:
    - data/MNIST
    - train.py
    params:
    - lr
    - weight_decay
    outs:
    - model.pt:
        checkpoint: true
    metrics:
    - metrics.yaml


The `download` stage gets the data using the `download.py` script. The `train` stage performs model training and evaluation on the downloaded data using the `train.py` script. It uses the `lr` and `weight_decay` parameters defined in `params.yaml`. The model output is saved to `model.pt`, and the metrics are saved to `metrics.yaml`.
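
To see why the pipeline forms a DAG, note that `data/MNIST` is an output of `download` and a dependency of `train`, which is what links the two stages. The following is a minimal sketch of that idea, not DVC's actual implementation: the `stages` dictionary is transcribed by hand from the `dvc.yaml` above (params and metrics omitted), and `stage_order` is a hypothetical helper name.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Stages transcribed by hand from dvc.yaml (deps and outs only).
stages = {
    "download": {"deps": ["download.py"], "outs": ["data/MNIST"]},
    "train": {"deps": ["data/MNIST", "train.py"], "outs": ["model.pt"]},
}


def stage_order(stages):
    """Return stage names so that producers come before consumers."""
    # Map each output path to the stage that produces it.
    producer = {out: name for name, spec in stages.items() for out in spec["outs"]}
    # A stage depends on every stage that produces one of its deps.
    # Deps such as download.py are plain files with no producing stage.
    graph = {
        name: {producer[dep] for dep in spec["deps"] if dep in producer}
        for name, spec in stages.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = stage_order(stages)
print(order)
```

Here `download` must come before `train`, which matches the order the stages are run in this notebook.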

Execute the download stage to get the data.

In [3]:
%%bash
dvc repro download

Running stage 'download' with command:
	python download.py
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to data/MNIST/raw/train-images-idx3-ubyte.gz
Extracting data/MNIST/raw/train-images-idx3-ubyte.gz to data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to data/MNIST/raw/train-labels-idx1-ubyte.gz
Extracting data/MNIST/raw/train-labels-idx1-ubyte.gz to data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to data/MNIST/raw/t10k-images-idx3-ubyte.gz
Extracting data/MNIST/raw/t10k-images-idx3-ubyte.gz to data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to data/MNIST/raw/t10k-labels-idx1-ubyte.gz
Extracting data/MNIST/raw/t10k-labels-idx1-ubyte.gz to data/MNIST/raw
Processing...
Done!
Generating lock file 'dvc.lock'
Updating lock file 'dvc.lock'

To track the changes with git, run:

	git add dvc.lock data/.gitignore
Use `dvc push` to send your updates to re


**IMPORTANT:** Be sure to run the `git add` command above, followed by `git commit`, before running experiments. Any time you modify the pipeline, run `dvc repro` and commit the changes with git before running experiments.

In [16]:
%%bash
git add dvc.lock data/.gitignore
git commit -m "download data"

[exp fdc7e66] download data
 2 files changed, 12 insertions(+)
 create mode 100644 data/.gitignore
 create mode 100644 dvc.lock
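
Since experiments should start from a committed state, it can help to check for uncommitted changes first. The sketch below is one possible way to do that, assuming `git` is available on the `PATH`. The helper names `pending_paths` and `uncommitted_changes` are hypothetical and are not part of DVC or this repository.

```python
import subprocess


def pending_paths(porcelain: str) -> list[str]:
    """Extract paths from `git status --porcelain` (v1) output."""
    # Each line is a two-character status code, a space, then the path.
    # Renames show up as "old -> new"; this sketch leaves them as-is.
    return [line[3:] for line in porcelain.splitlines() if line.strip()]


def uncommitted_changes(repo_dir: str = ".") -> list[str]:
    """Run git in repo_dir and list paths with uncommitted changes."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return pending_paths(result.stdout)


# Canned output, so the parsing can be checked without a repository.
sample = " M dvc.lock\n?? data/.gitignore\n"
print(pending_paths(sample))
```

An empty list from `uncommitted_changes()` means there is nothing left to commit.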


### Experimenting with parameters

Now that we have a pipeline to train the model, let's try out various parameters. We can start with the defaults defined in `params.yaml`.

In [17]:
%%bash
dvc exp run

Stage 'download' didn't change, skipping
Running stage 'train' with command:
	python train.py
Updating lock file 'dvc.lock'
Checkpoint experiment iteration '52d08d8'.
Updating lock file 'dvc.lock'
Checkpoint experiment iteration '0952fe3'.
Reproduced experiment '0952fe3'.
