# Lightweight Development Pipelines with DVC

In this notebook we highlight the most important elements of DVC. Extensive documentation is available on the [DVC website](https://dvc.org).

As a showcase, we implement a simple classification pipeline for the Iris dataset.

### Some Preparations

In [None]:
# --no-scm because we don't want to interfere with the workshop's git repository
!dvc init -f --no-scm

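Optional: add a new default remote storage (`-d`), overwriting any existing remote with the same name (`-f`). Here we use a local directory, but S3, GCS, SSH, ... work as well.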

In [None]:
!dvc remote add -d -f local_storage /tmp/dvc_introduction

Let's check the current status. Note that DVC has no git-like staging area. Instead, it keeps a cache directory (`.dvc/cache`) that is synchronized with the remote; `dvc status -c` (`--cloud`) compares that cache with the remote storage.

In [None]:
!dvc status -c

That wasn't too surprising: nothing is tracked yet.

Files can be put under DVC version control either explicitly with `dvc add` or implicitly by declaring them as outputs of a pipeline stage.
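Either way, DVC stores the file's content in the cache and names it after its MD5 checksum, using the first two hex characters as a subdirectory. The minimal sketch below computes that location for a given file. It assumes the `.dvc/cache/<2 chars>/<30 chars>` layout of DVC releases before 3.0 and ignores the line-ending normalization those releases may apply to text files; `cache_path` is a hypothetical helper, not part of DVC.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_path(path, cache_dir=".dvc/cache"):
    """Hypothetical helper: where a pre-3.0 DVC cache would store `path`.

    Assumes the layout <cache_dir>/<first 2 hex chars>/<remaining 30 chars>
    and skips DVC's line-ending normalization of text files.
    """
    md5 = md5_of_file(path)
    return Path(cache_dir) / md5[:2] / md5[2:]
```

For example, `cache_path("../00-datasets/iris.data.csv")` shows where the dataset would end up once it is tracked. Because the name depends only on the content, identical files are stored once, and `dvc push` only has to upload checksums the remote does not have yet.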

### Building a Pipeline

In [None]:
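%%sh
# Stage 1: write the configuration.
# -f names the stage file, -d declares a dependency, -o declares a (cached) output.
dvc run -f configure.dvc \
        -d dvc_introduction.py \
        -o output-introduction/config.pickle \
        python dvc_introduction.py configure output-introduction/config.pickle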

In [None]:
%%sh
# Stage 2: train the model on the Iris data using the configuration from stage 1.
dvc run -f train.dvc \
        -d dvc_introduction.py \
        -d output-introduction/config.pickle \
        -d ../00-datasets/iris.data.csv \
        -o output-introduction/model \
        python dvc_introduction.py train_model ../00-datasets/iris.data.csv \
                                               output-introduction/config.pickle \
                                               output-introduction/model

In [None]:
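%%sh
# Stage 3: export the model. -O declares an output that is not stored in the DVC cache.
# Naming the stage file "Dvcfile" makes it the default target of `dvc repro` (DVC < 1.0).
dvc run -f Dvcfile \
        -d dvc_introduction.py \
        -d output-introduction/model \
        -O ../04-models/iris/2 \
        python dvc_introduction.py export output-introduction/model ../04-models/iris/2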
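The three stages call `dvc_introduction.py`, which is not shown in this notebook. The sketch below is a hypothetical stand-in that illustrates the contract DVC relies on: each stage reads only the paths declared with `-d` and writes exactly the paths declared with `-o`/`-O`, all passed in on the command line. The subcommand names mirror the `dvc run` calls above; the configuration key, the nearest-centroid "model" (swapped in for whatever the real script trains), the header-less CSV and the `centroids.json` file are assumptions.

```python
"""Hypothetical stand-in for dvc_introduction.py (not the workshop's script)."""
import argparse
import csv
import json
import pickle
import shutil
from pathlib import Path


def configure(config_path):
    # Output declared with -o: the configuration pickle.
    config = {"label_column": 4}  # assumption: the Iris CSV stores the class last
    Path(config_path).parent.mkdir(parents=True, exist_ok=True)
    with open(config_path, "wb") as handle:
        pickle.dump(config, handle)


def train_model(data_path, config_path, model_dir):
    # Dependencies declared with -d: the dataset (assumed header-less) and the config.
    with open(config_path, "rb") as handle:
        label_col = pickle.load(handle)["label_column"]
    sums, counts = {}, {}
    with open(data_path, newline="") as handle:
        for row in csv.reader(handle):
            if not row:
                continue  # skip blank lines, e.g. a trailing newline
            label = row[label_col]
            features = [float(v) for i, v in enumerate(row) if i != label_col]
            acc = sums.setdefault(label, [0.0] * len(features))
            sums[label] = [a + f for a, f in zip(acc, features)]
            counts[label] = counts.get(label, 0) + 1
    # Nearest-centroid "model": one mean feature vector per class.
    centroids = {lbl: [s / counts[lbl] for s in vec] for lbl, vec in sums.items()}
    Path(model_dir).mkdir(parents=True, exist_ok=True)
    (Path(model_dir) / "centroids.json").write_text(json.dumps(centroids))


def export(model_dir, target_dir):
    # Output declared with -O: copy the trained model to the serving directory.
    shutil.copytree(model_dir, target_dir, dirs_exist_ok=True)


def main(argv=None):
    parser = argparse.ArgumentParser()
    sub = parser.add_subparsers(dest="command")
    sub.add_parser("configure").add_argument("config")
    train = sub.add_parser("train_model")
    for name in ("data", "config", "model"):
        train.add_argument(name)
    exp = sub.add_parser("export")
    exp.add_argument("model")
    exp.add_argument("target")
    args = parser.parse_args(argv)
    if args.command == "configure":
        configure(args.config)
    elif args.command == "train_model":
        train_model(args.data, args.config, args.model)
    elif args.command == "export":
        export(args.model, args.target)
    else:
        parser.print_help()


if __name__ == "__main__":
    main()
```

Keeping every path on the command line means the `dvc run` call is the single place where inputs and outputs are declared, so DVC's dependency graph cannot silently drift from what the script actually reads and writes.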

### Inspecting and Modifying a Pipeline 

In [None]:
%%sh 
dvc pipeline show --ascii

In [None]:
!dvc status -c

In [None]:
!dvc push

In [None]:
!dvc repro

Let's modify a file and reproduce our pipeline!
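`dvc repro` only re-runs stages whose inputs changed: each stage file records the MD5 checksums of its dependencies, and a stage becomes stale when a current checksum no longer matches the recorded one. The simplified sketch below mimics only that dependency check; `record_checksums` and `stale_dependencies` are hypothetical names, and DVC itself also checks outputs, the stage command and directory dependencies.

```python
import hashlib
from pathlib import Path


def file_md5(path):
    """MD5 of a single file's contents (no directory handling)."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def record_checksums(deps):
    """What a stage file conceptually stores: dependency path -> checksum."""
    return {dep: file_md5(dep) for dep in deps}


def stale_dependencies(recorded):
    """Return dependencies that are missing or whose checksum changed."""
    return [
        dep
        for dep, md5 in recorded.items()
        if not Path(dep).exists() or file_md5(dep) != md5
    ]
```

Appending even a comment to `dvc_introduction.py` changes its checksum, and since all three stages above list it with `-d`, `dvc repro` would re-run all of them.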

#### New Features

Download a file from another (external) git+DVC repository with `dvc get`. The file is not tracked in our project:

In [None]:
!dvc get https://github.com/iterative/example-get-started model.pkl

In [None]:
!rm model.pkl

Get a file *including* its .dvc file from another (external) git+DVC repository with `dvc import`. The generated `model.pkl.dvc` records which repository the file came from:

In [None]:
!dvc import https://github.com/iterative/example-get-started model.pkl

### New Features in Real Life

#### Metrics and evaluation
Metrics track scores and evaluation results; `dvc metrics show --all-branches` compares them across all branches:

```bash
$ dvc metrics show --all-branches
experiment1:
    metrics.json: {"loss": 0.0012, "accuracy": 0.9765}
experiment2:
    metrics.json: {"loss": 0.0010, "accuracy": 0.9865}
working tree:
    metrics.json: {"loss": 0.0010, "accuracy": 0.9865}
```
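`dvc metrics show` reads files that a stage declares as metrics, for example with `dvc run -M metrics.json ...` (`-M` declares a metrics file that is not cached, `-m` a cached one). The notebook does not show an evaluation stage, so the sketch below is an assumption of what one could write: `accuracy` and `write_metrics` are hypothetical helpers, and the loss is simply passed through from training.

```python
import json
from pathlib import Path


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def write_metrics(path, **metrics):
    """Write a flat JSON object such as {"loss": ..., "accuracy": ...}."""
    Path(path).write_text(json.dumps(metrics))
```

A stage could then call `write_metrics("metrics.json", loss=loss, accuracy=accuracy(labels, predictions))`, producing a file in the format shown above.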


#### Releasing and Deployment with git tags
Git tags can be used to keep track of releases:

```bash
$ git checkout master
$ git merge experiment2
$ git tag -a release/0.1 -m "0.1 release"
```

A deploy job can then use `dvc get` to download the model of a specific release:

```bash
$ GIT_REPO=...
$ dvc get --rev release/0.1 "$GIT_REPO" model.h5
```
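A deploy job often has to decide *which* release to fetch. One hypothetical approach, assuming tags follow the `release/<major>.<minor>` pattern used above, is to sort the tags numerically rather than lexically (lexical sorting would rank `release/0.9` above `release/0.10`) and pass the winner to `dvc get --rev`. `latest_release` is an illustrative helper, not part of git or DVC.

```python
import re

TAG_PATTERN = re.compile(r"^release/(\d+(?:\.\d+)*)$")


def latest_release(tags):
    """Return the highest release/X.Y[.Z] tag, or None if there is none."""
    versions = []
    for tag in tags:
        tag = tag.strip()
        match = TAG_PATTERN.match(tag)
        if match:
            # Compare versions as integer tuples, e.g. (0, 10) > (0, 9).
            key = tuple(int(part) for part in match.group(1).split("."))
            versions.append((key, tag))
    return max(versions)[1] if versions else None
```

In a deploy job, the tag list could come from `subprocess.run(["git", "tag", "--list", "release/*"], capture_output=True, text=True, check=True).stdout.splitlines()`.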

Metrics can also give an overview of all releases and their performance (`-T` is short for `--all-tags`):

```bash
$ dvc metrics show -T
release/0.1:
    metrics.json: {"loss": 0.0010, "accuracy": 0.9865}
working tree:
    metrics.json: {"loss": 0.0010, "accuracy": 0.9865}
```
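To compare releases programmatically, for example to block a deployment whose accuracy dropped, the text printed by `dvc metrics show -T` can be parsed. The parser below is a hypothetical helper written against the format shown above (a `<revision>:` line followed by indented `<file>: <JSON>` lines). That format varies between DVC versions, and newer versions can print JSON directly (see `dvc metrics show --help`), which is the sturdier choice where available.

```python
import json


def parse_metrics_show(text):
    """Parse '<rev>:' / '    <file>: <json>' blocks into {rev: {file: dict}}."""
    result, current = {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Unindented line: a revision such as "release/0.1:" or "working tree:".
            current = line.rstrip().rstrip(":")
            result[current] = {}
        elif current is not None:
            # Indented line: "<metrics file>: <JSON payload>".
            name, _, payload = line.strip().partition(": ")
            result[current][name] = json.loads(payload)
    return result
```

With the parsed dictionary, picking the best release is a one-liner, e.g. `max(parsed, key=lambda rev: parsed[rev]["metrics.json"]["accuracy"])`.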

#### Cleanup (debug only, please ignore)

In [None]:
%%sh
# Remove the DVC project, all stage files and the local remote storage
rm -rf .dvc
rm -f *.dvc
rm -f Dvcfile
rm -rf /tmp/dvc_introduction