# Lightweight Development Pipelines with DVC

This notebook highlights the core elements of DVC (Data Version Control). Extensive documentation is available on the [DVC website](https://dvc.org).

As a showcase we will implement a simple classification pipeline.

### Some Preparations
We create a new working directory, copy the required files into it, and make it the current working directory.

In [None]:
%%bash
mkdir /workshop/workspace/dvc_intro -p
cp /workshop/notebooks/dvc/{DVC_exercise.py,deployment_location,dvc_introduction.py} /workshop/workspace/dvc_intro
cp -r /workshop/notebooks/dvc/exercise-dataset-dvc /workshop/workspace/dvc_intro

In [None]:
import os
os.chdir("/workshop/workspace/dvc_intro")

### Initialize Git

DVC works on top of Git, so we first initialize a Git repository.

In [None]:
!git init

You might want to set your git configuration.

In [None]:
!git config --global user.email "you@example.com"
!git config --global user.name "Your Name"

### Initialize DVC

In [None]:
!dvc init -f

Files can be put under DVC's version control in two ways: explicitly with `dvc add`, or implicitly as the outputs of a pipeline stage.

In [None]:
!dvc add exercise-dataset-dvc

Optional: we add a remote storage location. Here it is a local directory, but it could also be S3, GCS, SSH, ...

In [None]:
!dvc remote add -d -f local_storage /tmp/dvc_introduction

In [None]:
!git status

In [None]:
!git add .

In [None]:
!git commit -m "initial commit"

Let's check our current status. Attention: DVC does not have a sophisticated Git-like staging area. Instead, it has a cache directory that is synced with the remote.

In [None]:
!dvc status -c
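To make the cache idea more concrete, here is a minimal sketch of content-addressed storage: a file is stored under a path derived from the hash of its contents, so identical content occupies a single cache entry. The helper names `file_md5` and `store_in_cache` and the two-character directory prefix are illustrative assumptions, not DVC's actual implementation, and the real cache layout differs between DVC versions.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_md5(path: Path) -> str:
    """Return the hex MD5 digest of a file's contents, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_in_cache(path: Path, cache_dir: Path) -> Path:
    """Copy a file into a (hypothetical) cache, keyed by its content hash."""
    checksum = file_md5(path)
    # Assumed layout: first two hex characters as directory, rest as file name.
    target = cache_dir / checksum[:2] / checksum[2:]
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
    return target


with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    first = tmp_path / "a.txt"
    second = tmp_path / "b.txt"
    first.write_text("same content")
    second.write_text("same content")

    cache = tmp_path / "cache"
    cached_first = store_in_cache(first, cache)
    cached_second = store_in_cache(second, cache)

    # Two files with identical content map to the same cache entry.
    same_slot = cached_first == cached_second
    n_cached = sum(1 for p in cache.rglob("*") if p.is_file())
    print(same_slot, n_cached)
```

Under this scheme, syncing with a remote only requires comparing which hashes exist on each side, which is roughly what `dvc status -c` reports.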

In [None]:
!dvc push

### Building a Pipeline

In [None]:
!mkdir output-introduction -p

In [None]:
%%sh 
dvc run -n configure \
        -d dvc_introduction.py \
        -o output-introduction/config.pickle \
        python dvc_introduction.py configure output-introduction/config.pickle

In [None]:
%%sh 
dvc run -n train \
        -d dvc_introduction.py \
        -d output-introduction/config.pickle \
        -d /workshop/fruits \
        -o output-introduction/model.h5 \
        python dvc_introduction.py train_model /workshop/fruits output-introduction/config.pickle output-introduction/model.h5

In [None]:
%%sh 
dvc run -n export \
        -d dvc_introduction.py \
        -d output-introduction/model.h5 \
        -o models/fruits/ \
        python dvc_introduction.py export output-introduction/model.h5 models/fruits/

In [None]:
!git add .
!git commit -m "Add pipeline"

### Inspecting and Modifying a Pipeline 

In [None]:
!dvc dag
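The graph printed by `dvc dag` follows from the `-d` and `-o` options used above: `train` depends on an output of `configure`, and `export` depends on an output of `train`. As a rough sketch (using Python's standard `graphlib`, not DVC's own code), the execution order can be derived with a topological sort. The `stage_deps` mapping is written by hand here for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each stage maps to the set of stages whose outputs it consumes.
stage_deps = {
    "configure": set(),
    "train": {"configure"},  # needs output-introduction/config.pickle
    "export": {"train"},     # needs output-introduction/model.h5
}

# static_order() yields every stage after all of its predecessors.
order = list(TopologicalSorter(stage_deps).static_order())
print(" -> ".join(order))
```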

In [None]:
!dvc status -c

In [None]:
!dvc push

In [None]:
!dvc status -c

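Let's modify a file that the pipeline depends on (for example `dvc_introduction.py`) and reproduce our pipeline! `dvc status` lists the stages whose dependencies have changed, and `dvc repro` re-runs only those stages.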

In [None]:
!dvc status

In [None]:
!dvc repro
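The idea behind selective reproduction can be sketched as follows: compare the recorded hash of every dependency with its current hash, and mark a stage as stale if one of its dependencies changed or was produced by a stale stage. This is a simplified illustration under stated assumptions, not DVC's algorithm: the function `stale_stages` and the placeholder hash strings are hypothetical, and the sketch conservatively treats every output of a re-run stage as changed.

```python
def stale_stages(stages, recorded, current):
    """Return the names of stages that need to be re-run.

    `stages` must be listed in execution order. `recorded` and `current`
    map dependency paths to (placeholder) content hashes.
    """
    stale = []
    dirty_outputs = set()
    for name, spec in stages.items():
        changed = any(recorded.get(d) != current.get(d) for d in spec["deps"])
        upstream = any(d in dirty_outputs for d in spec["deps"])
        if changed or upstream:
            stale.append(name)
            # Simplification: assume a re-run always changes the outputs.
            dirty_outputs.update(spec["outs"])
    return stale


# The three stages defined above, with their -d and -o options.
stages = {
    "configure": {
        "deps": ["dvc_introduction.py"],
        "outs": ["output-introduction/config.pickle"],
    },
    "train": {
        "deps": [
            "dvc_introduction.py",
            "output-introduction/config.pickle",
            "/workshop/fruits",
        ],
        "outs": ["output-introduction/model.h5"],
    },
    "export": {
        "deps": ["dvc_introduction.py", "output-introduction/model.h5"],
        "outs": ["models/fruits/"],
    },
}

# Placeholder hashes standing in for the values stored after the last run.
recorded = {
    "dvc_introduction.py": "hash-script-v1",
    "/workshop/fruits": "hash-data-v1",
    "output-introduction/config.pickle": "hash-config-v1",
    "output-introduction/model.h5": "hash-model-v1",
}

# Case 1: only the dataset changed.
data_changed = dict(recorded, **{"/workshop/fruits": "hash-data-v2"})
after_data_change = stale_stages(stages, recorded, data_changed)

# Case 2: the script changed, and every stage depends on it.
script_changed = dict(recorded, **{"dvc_introduction.py": "hash-script-v2"})
after_script_change = stale_stages(stages, recorded, script_changed)

print(after_data_change)
print(after_script_change)
```

In this sketch a change to the dataset leaves `configure` untouched, while a change to `dvc_introduction.py` invalidates all three stages because each of them lists the script as a dependency.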

#### More Features

Get a file from another (external) Git+DVC repository. `dvc get` only downloads the file and does not record where it came from.

In [None]:
!dvc get https://github.com/iterative/example-get-started model.pkl

In [None]:
!rm model.pkl

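Get a file *including* its `.dvc` file from another (external) Git+DVC repository. The `.dvc` file created by `dvc import` records the source repository, so the file can be updated later.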

In [None]:
!dvc import https://github.com/iterative/example-get-started model.pkl

In [None]:
!cat model.pkl.dvc

#### Clean-up

In [None]:
import os
os.chdir("/workshop/notebooks/dvc")

In [None]:
%%sh
rm -rf /workshop/workspace/dvc_intro
rm -rf /tmp/dvc_introduction