# Versioning Example

### This notebook contains two parts
1. Part 1: Versioning with DVC
2. Part 2: Working with versioned DVC repositories

### PART1: MODEL VERSIONING WITH DVC

Let's build a model that classifies images of dogs and cats.

We first train a classifier on 1000 labeled images, then double the dataset to 2000 images and retrain. We capture both datasets and both sets of classifier results, then show how to use dvc checkout to switch between workspace versions.

Downloading the code and setting up a Git repository (the code comes from a tutorial by François Chollet):

In [None]:
!git clone https://github.com/iterative/example-versioning.git
%cd example-versioning

Create a virtual environment and install the requirements (you can create a conda environment instead).

In [None]:
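# Each "!" line runs in its own subshell, so an activation done on a separate line is lost.
# Activate and install in one command; later cells need the same prefix
# (or start the notebook from the activated environment).
!python3 -m venv .env
!. .env/bin/activate && pip install -r requirements.txt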

#### First model version

Adding data

In [None]:
!dvc get https://github.com/iterative/dataset-registry \
          tutorials/versioning/data.zip
!unzip -q data.zip
!rm -f data.zip

Capturing the current state of this dataset with dvc add

In [None]:
!dvc add data
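To make concrete what dvc add captures, here is a simplified sketch of content-addressed tracking: the data is identified by a hash of its bytes, and only a small pointer holding that hash needs to go into Git. This is an illustration under stated assumptions, not DVC's actual implementation. `file_md5` and `make_pointer` are hypothetical helper names, and a real .dvc file handles directories such as data differently.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_pointer(path):
    """Build a small, Git-friendly record that identifies one version of a file.

    Hypothetical structure for illustration only.
    """
    path = Path(path)
    return {"path": path.name, "md5": file_md5(path), "size": path.stat().st_size}


# Demonstrate on a throwaway file: changing the content changes the hash,
# so each version of the data gets its own pointer.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"1000 images")
    pointer_v1 = make_pointer(sample)
    sample.write_bytes(b"2000 images")
    pointer_v2 = make_pointer(sample)

# The two versions hash differently even though the file name is unchanged.
print(pointer_v1["md5"] != pointer_v2["md5"])
```

Because the pointer is tiny, committing it to Git is cheap, while the large data itself stays outside Git.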

Training and adding the trained model with dvc

In [None]:
!python train.py
!dvc add model.h5

Committing the current state

In [None]:
!git add data.dvc model.h5.dvc metrics.csv .gitignore
!git commit -m "First model, trained with 1000 images"
!git tag -a "v1.0" -m "model v1.0, 1000 images"

#### Second model version

The following cell extracts 500 new cat images and 500 new dog images into data/train

In [None]:
!dvc get https://github.com/iterative/dataset-registry \
          tutorials/versioning/new-labels.zip
!unzip -q new-labels.zip
!rm -f new-labels.zip

Use these new labels to retrain the model

In [None]:
!dvc add data
!python train.py
!dvc add model.h5

In [None]:
!git add data.dvc model.h5.dvc metrics.csv
!git commit -m "Second model, trained with 2000 images"
!git tag -a "v2.0" -m "model v2.0, 2000 images"

To switch between workspace versions, you can use:

In [None]:
!git checkout v1.0
!dvc checkout
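The two-step switch can also be wrapped in a small Python helper. This is a minimal sketch, assuming git and dvc are on the PATH and the notebook's working directory is the repository root. `switch_version` is a hypothetical name, and the `run` parameter exists only so the command sequence can be inspected without touching a real repository.

```python
import subprocess


def switch_version(tag, run=subprocess.run):
    """Check out a Git tag, then sync the DVC-tracked files to match it."""
    commands = [
        # Restores the .dvc pointer files and metrics.csv recorded for the tag.
        ["git", "checkout", tag],
        # Restores data and model.h5 from the DVC cache to match those pointers.
        ["dvc", "checkout"],
    ]
    for command in commands:
        # check=True raises CalledProcessError if a command exits non-zero,
        # so dvc checkout never runs after a failed git checkout.
        run(command, check=True)
    return commands
```

Calling `switch_version("v1.0")` from the repository root would run the same two commands as the cell above. The order matters: Git has to restore the pointer files first so that DVC knows which data version to bring back.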

Pushing the DVC data to a remote repository
1. We have a Google Drive folder at https://drive.google.com/drive/folders/15kNDc1-SchSZsb1l-rt0ymG0nG7SLqe5?usp=sharing
2. We can use a Drive folder like this as remote storage to push and pull the DVC-tracked files and cache

In [None]:
!dvc remote add --default myremote gdrive://15kNDc1-SchSZsb1l-rt0ymG0nG7SLqe5
!dvc push

Now all the DVC-tracked versions and files are stored in the Google Drive folder.

### PART2: WORKING WITH VERSIONED DVC REPOSITORIES

A versioned repository of PART1 is available at https://github.com/Dinithipurna/example-versioning.git

Let's clone it and see how to restore the versioned data.

In [None]:
!git clone https://github.com/Dinithipurna/example-versioning.git
%cd example-versioning

In [None]:
# Configure the same DVC remote storage that was used in Part 1
!dvc remote add --default myremote gdrive://15kNDc1-SchSZsb1l-rt0ymG0nG7SLqe5

In [None]:
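# dvc pull fetches from the remote and then checks the files out, so the separate fetch is optional
!dvc fetch
!dvc pull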

Now the data files are downloaded into your new repository, together with all the DVC versioning data.

In [11]:
pwd

'/Users/macbook/Documents/GitHub/example-versioning'

In [12]:
!dvc checkout

zsh:1: command not found: dvc
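The "command not found" message above means the shell that ran the cell could not find a dvc executable on its PATH, for example because DVC was installed into an environment that is not active for the notebook. A quick way to check before running the commands is sketched below. `missing_tools` is a hypothetical helper, and the check looks at the PATH of the notebook's Python process, which the "!" subshell typically inherits.

```python
import shutil


def missing_tools(names):
    """Return the command names from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# Both tools are needed for the checkout workflow in this notebook.
missing = missing_tools(["git", "dvc"])
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("git and dvc are both available")
```

If dvc is reported as missing, installing it into the environment the notebook runs in, or starting the notebook from the environment where it is already installed, should let the checkout cell run.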
