# Task: Build a simple pipeline with DVC

In this task we want to build a simple DVC pipeline. The elements of the pipeline will classify images into either *lemons* or *bananas*.

The pipeline consists of two Python functions and one shell command for deployment:

1. `preprocess(inputpath, outputpath)` preprocesses an image by converting it to grayscale and resizing it to (100, 100). `inputpath` is the location of the input image, and `outputpath` is the location of the preprocessed image.
2. `classify(inputpath, outputpath)` classifies an image and writes the result to a JSON file. `inputpath` is the location of the preprocessed image, and `outputpath` is the location of the JSON file.
3. `"cat deployment_location | xargs -I% cp jsonfile %"`. `deployment_location` is a file containing the location we want to deploy to, and `jsonfile` should be the location of the JSON file.

We have already provided the Python functions for you. You can find them in the file `DVC_exercise.py`. Your mission is to wrap a pipeline around them using DVC.

You can call the functions from the shell via `python <file> <function name> <input params>`.
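
To make the calling convention concrete, here is a minimal sketch of how a script could map `<function name> <input params>` onto Python functions. It is an illustration only and not the contents of `DVC_exercise.py`: the function bodies are stand-ins, and the names `FUNCTIONS` and `dispatch` are assumptions introduced for this example.

```python
def preprocess(inputpath, outputpath):
    # Stand-in body: the provided function does the grayscale conversion and resize.
    return f"preprocess: {inputpath} -> {outputpath}"


def classify(inputpath, outputpath):
    # Stand-in body: the provided function writes the classification to a JSON file.
    return f"classify: {inputpath} -> {outputpath}"


# Hypothetical lookup table from command-line names to functions.
FUNCTIONS = {"preprocess": preprocess, "classify": classify}


def dispatch(argv):
    """Call the function named in argv[0] with the remaining arguments."""
    if not argv or argv[0] not in FUNCTIONS:
        names = ", ".join(sorted(FUNCTIONS))
        raise ValueError(f"expected one of: {names}")
    name, *params = argv
    return FUNCTIONS[name](*params)


# In a real script the argument list would come from sys.argv[1:].
print(dispatch(["preprocess", "image.jpg", "preprocessed.jpg"]))
```

With this layout, `python <file> preprocess image.jpg preprocessed.jpg` would correspond to the call `preprocess("image.jpg", "preprocessed.jpg")`.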

Remember that you can run shell commands in a Jupyter notebook by starting a line in a cell with `!` or putting `%%sh` at the beginning of a cell.

#### Initialise DVC

Please use `--no-scm` to avoid problems with the Git repository that contains this notebook.

In [None]:
!dvc init -f --no-scm

#### Define first pipeline stage

In [None]:
!dvc run -f preprocess.dvc -d DVC_exercise.py -d exercise-dataset-dvc/image.jpg -o preprocessed.jpg \
         python DVC_exercise.py preprocess exercise-dataset-dvc/image.jpg preprocessed.jpg

#### Define second pipeline stage

In [None]:
!dvc run -f classify.dvc -d DVC_exercise.py -d preprocessed.jpg -o result.json \
         python DVC_exercise.py classify preprocessed.jpg result.json

#### Define third pipeline stage
Tip: Don't specify the name of the .dvc file in the last stage. Because this stage has no outputs, DVC will then use the default name `Dvcfile`, which is handy for later reproduction.

In [None]:
!dvc run -d result.json -d deployment_location \
         "cat deployment_location | xargs -I% cp result.json %"

#### Check your pipeline here:

In [None]:
%%sh
dvc pipeline show --ascii

#### Modify the deployment location and reproduce the pipeline
We want to change our pipeline slightly. Our customer would now like the classification result to be deployed (copied) to `./output-exercise`.

In [None]:
# change desired deployment location
!echo output-exercise > deployment_location

In [None]:
!dvc repro
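
Since only `deployment_location` changed, `dvc repro` should rerun just the deployment stage and leave the first two stages alone. DVC decides this by comparing the checksums recorded in the stage files with the current state of each dependency. The sketch below illustrates the idea in a simplified form. It is not DVC's implementation, and the names `md5_of` and `changed_dependencies` are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def md5_of(path):
    """Return the MD5 hex digest of a file's contents."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def changed_dependencies(recorded):
    """Return the paths whose current checksum differs from the recorded one.

    `recorded` maps dependency paths to the checksum stored at the last run.
    """
    return [path for path, checksum in recorded.items() if md5_of(path) != checksum]
```

A stage would need to be rerun whenever `changed_dependencies` returns a non-empty list for its dependencies.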

In [None]:
# cleanup for debugging: delete the generated .dvc stage files
!rm -rf *.dvc