# Lightweight pipelines with DVC

To get started, we will build a small pipeline with DVC. The task is to classify images as either *lemons* or *bananas*.

The pipeline consists of two Python functions:

1. `preprocess(inputpath, outputpath)`, which preprocesses an image (converts it to grayscale and resizes it to (100, 100))
1. `classify(inputpath, outputpath)`, which classifies an image and writes the result to a JSON file

Write the two functions and wrap a pipeline around them using DVC.

The best approach is to create a Python file and implement the functions there. Google's `fire` library is an easy way to invoke `preprocess` and `classify` from the command line.
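
As a starting point, the two functions could look roughly like the sketch below. It assumes the Pillow library for image handling (imported inside the functions so the file still loads without it). The task does not specify a model, so `predict_label` is a hypothetical placeholder that merely thresholds the mean brightness; the JSON keys `input` and `label` are likewise assumptions.

```python
import json
from pathlib import Path

TARGET_SIZE = (100, 100)  # size given in the task description


def preprocess(inputpath, outputpath):
    """Convert an image to grayscale, resize it to 100x100 and save it."""
    from PIL import Image  # Pillow (third-party), assumed to be installed

    Path(outputpath).parent.mkdir(parents=True, exist_ok=True)
    with Image.open(inputpath) as img:
        img.convert("L").resize(TARGET_SIZE).save(outputpath)


def predict_label(pixels):
    """Hypothetical placeholder rule, not a trained model: bright means lemon."""
    mean_brightness = sum(pixels) / len(pixels)
    return "lemon" if mean_brightness >= 128 else "banana"


def classify(inputpath, outputpath):
    """Classify one image and write the result to a JSON file."""
    from PIL import Image  # Pillow (third-party), assumed to be installed

    with Image.open(inputpath) as img:
        pixels = img.convert("L").tobytes()  # one byte (0-255) per pixel
    result = {"input": str(inputpath), "label": predict_label(pixels)}
    Path(outputpath).parent.mkdir(parents=True, exist_ok=True)
    with open(outputpath, "w") as f:
        json.dump(result, f, indent=2)
```

Replacing `predict_label` with a real model would leave the command-line interface and the pipeline unchanged.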

Install fire:
!pip install fire

Use fire:

```python
import fire

...
...
...

if __name__ == '__main__':
  fire.Fire()
```

Invoke a function with fire:

```bash
python <your file>.py preprocess exercise-dataset-dvc/image.jpg output/preprocessed.jpg
```
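
In the command above, the first argument selects the function and the remaining arguments are passed to it positionally. The following standard-library sketch imitates only that mapping; it is not how fire is implemented, and `COMMANDS` and `dispatch` are hypothetical names introduced for illustration.

```python
def preprocess(inputpath, outputpath):
    # Stand-in body: the real function would process the image.
    return f"preprocess: {inputpath} -> {outputpath}"


def classify(inputpath, outputpath):
    # Stand-in body: the real function would write the JSON result.
    return f"classify: {inputpath} -> {outputpath}"


COMMANDS = {"preprocess": preprocess, "classify": classify}


def dispatch(argv):
    """Treat the first token as the function name, the rest as its arguments."""
    name, *args = argv
    return COMMANDS[name](*args)


print(dispatch(["preprocess", "exercise-dataset-dvc/image.jpg", "output/preprocessed.jpg"]))
```

With fire, none of this plumbing is needed: calling `fire.Fire()` exposes the functions defined in the file as commands.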

You can use the `%%sh` cell magic to run shell commands in a notebook cell.

## Testing the functions

In [43]:
%%sh 
python DVC_exercise.py preprocess exercise-dataset-dvc/image.jpg output-exercise/test-processed.jpg

In [44]:
%%sh 
python DVC_exercise.py classify output-exercise/test-processed.jpg output-exercise/test-result.json

## Initialize DVC

In [45]:
!dvc init -f --no-scm

+---------------------------------------------------------------------+
|                                                                     |
|        DVC has enabled anonymous aggregate usage analytics.         |
|     Read the analytics documentation (and how to opt-out) here:     |
|              https://dvc.org/doc/user-guide/analytics               |
|                                                                     |
+---------------------------------------------------------------------+

What's next?
------------
- Check out the documentation: https://dvc.org/doc
- Get help and share ideas: https://dvc.org/chat
- Star us on GitHub: https://github.com/iterative/dvc

## Invoke the functions

In [47]:
%%sh 
dvc run -d DVC_exercise.py \
        -d exercise-dataset-dvc/image.jpg \
        -o output-exercise/processed.jpg \
        python DVC_exercise.py preprocess exercise-dataset-dvc/image.jpg output-exercise/processed.jpg

Running command:
	python DVC_exercise.py preprocess exercise-dataset-dvc/image.jpg output-exercise/processed.jpg
Saving 'output-exercise/processed.jpg' to cache '.dvc/cache'.
Saving information to 'processed.jpg.dvc'.

To track the changes with git run:

	git add processed.jpg.dvc


In [48]:
%%sh 
dvc run -d DVC_exercise.py \
        -d output-exercise/processed.jpg \
        -o output-exercise/result.json \
        python DVC_exercise.py classify output-exercise/processed.jpg output-exercise/result.json

Running command:
	python DVC_exercise.py classify output-exercise/processed.jpg output-exercise/result.json
Saving 'output-exercise/result.json' to cache '.dvc/cache'.
Saving information to 'result.json.dvc'.

To track the changes with git run:

	git add result.json.dvc
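
Each `.dvc` stage file written above records a checksum for every dependency and output, and DVC compares these checksums to decide whether a stage is out of date. The following sketch illustrates that idea with plain MD5 hashes. `file_md5` and `changed_deps` are hypothetical helper names, and DVC's actual implementation is considerably more involved.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_deps(recorded):
    """Return the paths whose current checksum differs from the recorded one."""
    return [path for path, md5 in recorded.items() if file_md5(path) != md5]


# Demonstration on a throwaway file standing in for a stage dependency.
with tempfile.TemporaryDirectory() as tmp:
    dep = Path(tmp) / "image.jpg"
    dep.write_bytes(b"original bytes")
    recorded = {str(dep): file_md5(dep)}

    unchanged = changed_deps(recorded)  # empty: nothing was modified
    dep.write_bytes(b"edited bytes")
    modified = changed_deps(recorded)   # now lists the dependency

print(unchanged, modified)
```

A stage whose dependencies all match their recorded checksums can be skipped; any mismatch means the stage and everything downstream of it should be rerun.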


## Look at the pipeline

In [49]:
%%sh
dvc pipeline show result.json.dvc --ascii --commands

+------------------------------------------------------------------------------------------------+ 
| python DVC_exercise.py preprocess exercise-dataset-dvc/image.jpg output-exercise/processed.jpg | 
+------------------------------------------------------------------------------------------------+ 
                                                 *                                                 
                                                 *                                                 
                                                 *                                                 
+------------------------------------------------------------------------------------------------+ 
| python DVC_exercise.py classify output-exercise/processed.jpg output-exercise/result.json      |
+------------------------------------------------------------------------------------------------+ 
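
The edge in this diagram follows from the `-d` and `-o` options: a stage depends on another stage when one of its dependencies is that stage's output. A minimal sketch of this derivation, using `graphlib` from the standard library (Python 3.9+), is shown below. The dictionary layout is an assumption made for illustration and is not DVC's stage file format.

```python
from graphlib import TopologicalSorter

# Dependencies and outputs as declared with -d and -o in the two stages.
stages = {
    "processed.jpg.dvc": {
        "deps": {"DVC_exercise.py", "exercise-dataset-dvc/image.jpg"},
        "outs": {"output-exercise/processed.jpg"},
    },
    "result.json.dvc": {
        "deps": {"DVC_exercise.py", "output-exercise/processed.jpg"},
        "outs": {"output-exercise/result.json"},
    },
}

# Map every output file to the stage that produces it.
producer = {out: name for name, stage in stages.items() for out in stage["outs"]}

# A stage's predecessors are the producers of its dependencies.
graph = {
    name: {producer[dep] for dep in stage["deps"] if dep in producer}
    for name, stage in stages.items()
}

# Execution order: every stage comes after the stages it depends on.
order = list(TopologicalSorter(graph).static_order())
print(order)
```

Files such as `DVC_exercise.py` that no stage produces create no edge; they are plain inputs. Newer DVC releases (1.0 onwards) draw this graph with `dvc dag` rather than `dvc pipeline show`.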
