# Part 2: Data versioning using DVC

## Chapter 1: Getting started with DVC

This is the second part of the tutorial, on data versioning using [DVC](https://dvc.org).

In this first chapter, we will install DVC and look at how it works with Git in its simplest use case.

### Install DVC

First, we need to install [DVC](https://dvc.org). There are several ways to install it depending on your OS, which are listed in the [installation guide](https://dvc.org/doc/install). For example, on macOS, you can install it with `brew`, `conda`, or `pip`.

If you are following this tutorial on your own machine, choose the option that makes the most sense for your setup. If you are following along in the notebook, we will install DVC with `pip`:

In [1]:
! pip install dvc



We can check that DVC is installed:

In [2]:
! which dvc

/Users/nicolas.gensollen/opt/anaconda3/envs/now/bin/dvc


In [3]:
! dvc --version

3.27.0

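The two checks above can also be done from Python, which is convenient when a script has to verify its environment before calling DVC. The sketch below is only an illustration: the helper names `find_executable` and `tool_version` are hypothetical, and the sketch assumes the tool prints its version on standard output when called with `--version`, as DVC does in the cell above.

```python
import shutil
import subprocess
from typing import Optional


def find_executable(name: str) -> Optional[str]:
    """Return the full path of `name` on PATH, or None (similar to `which`)."""
    return shutil.which(name)


def tool_version(name: str) -> Optional[str]:
    """Return the stdout of `<name> --version`, or None if it cannot be run."""
    path = find_executable(name)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    # Both values are None when DVC is not installed in the current environment.
    print(find_executable("dvc"))
    print(tool_version("dvc"))
```

Returning `None` instead of raising keeps the check easy to use in a notebook cell: a missing installation shows up as `None` rather than as a traceback.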
### Initialize a DVC repository

Now that we have DVC installed, we can start using it!

First of all, it is very important to understand that DVC is not a replacement for Git. It is a tool designed to work with Git, because it solves a different problem.

In other words, you need both Git and DVC to manage both code and data.

To initialize a DVC repository, we need to be inside a Git repository, so let's create one:

In [8]:
! git init

Initialized empty Git repository in /Users/nicolas.gensollen/GitRepos/NOW-2023/notebooks/.git/
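
Because DVC is initialized inside a Git repository, a script can check that precondition before calling `dvc init`. The following is a minimal sketch, not part of DVC: the function name `can_init_dvc` and its messages are hypothetical, and it only inspects the folder for `.git` and `.dvc` entries without running either tool.

```python
import tempfile
from pathlib import Path
from typing import Tuple


def can_init_dvc(repo_dir: str) -> Tuple[bool, str]:
    """Check that `repo_dir` has a `.git` entry and no `.dvc` entry yet."""
    root = Path(repo_dir)
    if not (root / ".git").exists():
        return False, "not a Git repository: run `git init` first"
    if (root / ".dvc").exists():
        return False, "a `.dvc` folder already exists"
    return True, "ready for `dvc init`"


if __name__ == "__main__":
    # Demonstrate on a throwaway folder so the real workspace is untouched.
    with tempfile.TemporaryDirectory() as tmp:
        print(can_init_dvc(tmp))       # no `.git` yet
        (Path(tmp) / ".git").mkdir()   # stand-in for `git init`
        print(can_init_dvc(tmp))       # precondition satisfied
```

The demo creates an empty `.git` folder as a stand-in for a real `git init`, which is enough for this check but is not a valid Git repository.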


You can check that a `.git` hidden folder was created:

In [9]:
! ls -lah

total 40
drwxr-xr-x   7 nicolas.gensollen  10513   224B 26 oct 14:14 .
drwxr-xr-x  19 nicolas.gensollen  10513   608B 25 oct 16:35 ..
-rw-r--r--   1 nicolas.gensollen  10513   139B 26 oct 14:06 .dvcignore
drwxr-xr-x   9 nicolas.gensollen  10513   288B 26 oct 14:14 .git
drwxr-xr-x   5 nicolas.gensollen  10513   160B 25 oct 12:51 .ipynb_checkpoints
-rw-r--r--   1 nicolas.gensollen  10513   2,0K 25 oct 12:50 code_versionning.ipynb
-rw-r--r--   1 nicolas.gensollen  10513   8,9K 26 oct 14:12 data_versionning.ipynb


Now we can initialize the DVC repository:

In [10]:
! dvc init

Initialized DVC repository.

You can now commit the changes to git.

+---------------------------------------------------------------------+
|                                                                     |
|        DVC has enabled anonymous aggregate usage analytics.         |
|     Read the analytics documentation (and how to opt-out) here:     |
|             <https://dvc.org/doc/user-guide/analytics>              |
|                                                                     |
+---------------------------------------------------------------------+

What's next?
------------
- Check out the documentation: <https://dvc.org/doc>
- Get help and share ideas: <https://dvc.org/chat>
- Star us on GitHub: <https://github.com/iterative/dvc>

Just as `git init` created a `.git` folder, `dvc init` created a hidden folder named `.dvc`:

In [11]:
! ls -lah

total 72
drwxr-xr-x   8 nicolas.gensollen  10513   256B 26 oct 14:14 .
drwxr-xr-x  19 nicolas.gensollen  10513   608B 25 oct 16:35 ..
drwxr-xr-x   5 nicolas.gensollen  10513   160B 26 oct 14:14 .dvc
-rw-r--r--   1 nicolas.gensollen  10513   139B 26 oct 14:06 .dvcignore
drwxr-xr-x  10 nicolas.gensollen  10513   320B 26 oct 14:14 .git
drwxr-xr-x   5 nicolas.gensollen  10513   160B 25 oct 12:51 .ipynb_checkpoints
-rw-r--r--   1 nicolas.gensollen  10513   2,0K 25 oct 12:50 code_versionning.ipynb
-rw-r--r--   1 nicolas.gensollen  10513    26K 26 oct 14:14 data_versionning.ipynb


In addition to this, DVC created a few files for us. To see them, we can use the `git status` command, since we are in a Git repository:

In [12]:
! git status

On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   .dvc/.gitignore
	new file:   .dvc/config
	new file:   .dvcignore

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	.ipynb_checkpoints/
	code_versionning.ipynb
	data_versionning.ipynb



As we can see, DVC created two files in the `.dvc` folder and one file in the current workspace:

- `.dvc/.gitignore`
- `.dvc/config`
- `.dvcignore`
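
In the `git status` output, these three files appear under "Changes to be committed", while the notebooks are listed as untracked. The same distinction can be read programmatically from `git status --porcelain`, which prints one line per path with a two-character status code: the first character describes the staging area and the second the working tree. The sketch below is a simplified illustration with a hypothetical function name, `split_status`. The `SAMPLE` text was written by hand to mirror the output above and was not captured from a real run. Renamed or quoted paths are not handled.

```python
from typing import Dict, List


def split_status(porcelain: str) -> Dict[str, List[str]]:
    """Group paths from `git status --porcelain` (v1) text.

    Each line has the form "XY path": X is the staging-area state and
    Y is the working-tree state. "??" marks an untracked path.
    """
    groups: Dict[str, List[str]] = {"staged": [], "untracked": []}
    for line in porcelain.splitlines():
        if len(line) < 4:
            continue  # too short to hold a status code and a path
        code, path = line[:2], line[3:]
        if code == "??":
            groups["untracked"].append(path)
        elif code[0] not in (" ", "!"):
            # Any other first character means the path has staged changes.
            groups["staged"].append(path)
    return groups


# Hand-written sample mirroring the `git status` output shown earlier.
SAMPLE = "\n".join([
    "A  .dvc/.gitignore",
    "A  .dvc/config",
    "A  .dvcignore",
    "?? .ipynb_checkpoints/",
    "?? code_versionning.ipynb",
    "?? data_versionning.ipynb",
])

if __name__ == "__main__":
    print(split_status(SAMPLE))
```

In a real repository, the input text would come from running `git status --porcelain` rather than from the sample constant.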

These files need to be versioned with Git. DVC has already added them to the staging area, so all we need to do is commit them:

In [13]:
! git commit -m "initialize DVC"

[main (root-commit) 8867a0a] initialize DVC
 3 files changed, 6 insertions(+)
 create mode 100644 .dvc/.gitignore
 create mode 100644 .dvc/config
 create mode 100644 .dvcignore


In [14]:
! git log

commit 8867a0ac229b70fc5b14ec7502779cdb6d91a3a1 (HEAD -> main)
Author: NicolasGensollen <nicolas.gensollen@gmail.com>
Date:   Thu Oct 26 14:20:03 2023 +0200

    initialize DVC


And that's it: we have successfully initialized a DVC repository, and we are now ready to track some code and data!

### Track code and data