# Self-supervised learning using TAO NVDinoV2

Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique in which a model trained on one task is re-trained for use on a different task.

Train Adapt Optimize (TAO) Toolkit is a simple, easy-to-use, Python-based AI toolkit for taking purpose-built AI models and customizing them with your own data.

<img align="center" src="https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/TAO/tlt-tao-toolkit-bring-your-own-model-diagram.png" width="1080">


## 0. Set up env variables and map drives <a class="anchor" id="head-0"></a>

The TAO launcher uses Docker containers under the hood, and **for our data and results directories to be visible to the container, they need to be mapped**. The launcher is configured through the file `~/.tao_mounts.json`. Apart from the mounts, you can also configure additional options such as environment variables and the amount of shared memory available to the TAO launcher. <br>

`IMPORTANT NOTE:` The code below creates a sample `~/.tao_mounts.json` file that maps the directories in which we save the data, specs, results, and cache. Configure it for your specific case so that these directories are correctly visible to the Docker container.


In [None]:
import os

# Please define this local project directory that needs to be mapped to the TAO docker session.
%env LOCAL_PROJECT_DIR=FIXME

os.environ["HOST_DATA_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "data")
os.environ["HOST_RESULTS_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "results")
os.environ["HOST_MODEL_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "model")

os.environ["HOST_SPECS_DIR"] = os.path.join(os.getenv("NOTEBOOK_ROOT", os.getcwd()), "specs")
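
Because the cell above ships with the `FIXME` placeholder, a quick guard can catch an unset project directory before any folders are created. The following is a minimal sketch; `check_project_dir` is a hypothetical helper written for this notebook, not part of TAO.

```python
import os


def check_project_dir(value):
    """Return an absolute path, or raise if the placeholder was left in place.

    Hypothetical helper for illustration; not a TAO API.
    """
    if not value or value.strip() == "FIXME":
        raise ValueError("Set LOCAL_PROJECT_DIR to a real directory before continuing.")
    return os.path.abspath(os.path.expanduser(value))


# Validate the value from the notebook's environment, if it has been set.
project_dir = os.environ.get("LOCAL_PROJECT_DIR")
if project_dir and project_dir != "FIXME":
    print(check_project_dir(project_dir))
```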

In [None]:
! mkdir -p $HOST_DATA_DIR
! mkdir -p $HOST_SPECS_DIR
! mkdir -p $HOST_RESULTS_DIR
! mkdir -p $HOST_MODEL_DIR

In [None]:
# Mapping up the local directories to the TAO docker.
import json
import os
mounts_file = os.path.expanduser("~/.tao_mounts.json")
tlt_configs = {
   "Mounts":[
       # Map the project directory, then the data, specs, and results directories.
       {
           "source": os.environ["LOCAL_PROJECT_DIR"],
           "destination": "/tao-pt/tao-experiments"
       },
       {
           "source": os.environ["HOST_DATA_DIR"],
           "destination": "/tao-pt/tao-experiments/data"
       },
       {
           "source": os.environ["HOST_SPECS_DIR"],
           "destination": "/tao-pt/tao-experiments/specs"
       },
       {
           "source": os.environ["HOST_RESULTS_DIR"],
           "destination": "/tao-pt/tao-experiments/results"
       }
   ],
   "DockerOptions": {
        "shm_size": "16G",
        "ulimits": {
            "memlock": -1,
            "stack": 67108864
         }
   }
}
# Writing the mounts file.
with open(mounts_file, "w") as mfile:
    json.dump(tlt_configs, mfile, indent=4)

In [None]:
!cat ~/.tao_mounts.json
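
Beyond printing the file, it can help to confirm that every mapped `source` directory actually exists on the host, since a missing directory would not be visible inside the container. This is a minimal sketch that assumes the `Mounts` layout written by the cell above; `missing_mount_sources` is a hypothetical helper, not part of the TAO launcher.

```python
import json
import os


def missing_mount_sources(mounts_path):
    """Return the 'source' paths in a mounts file that are not existing directories."""
    with open(os.path.expanduser(mounts_path)) as f:
        config = json.load(f)
    return [
        mount["source"]
        for mount in config.get("Mounts", [])
        if not os.path.isdir(mount["source"])
    ]


mounts_path = "~/.tao_mounts.json"
if os.path.exists(os.path.expanduser(mounts_path)):
    print("Missing sources:", missing_mount_sources(mounts_path))
```

An empty list means every mapped host directory is present.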

## 1. What is NVDinoV2?

NVDinoV2 is a self-supervised learning foundation model for downstream tasks such as classification and segmentation. Self-supervised learning techniques are especially useful when the available real-world dataset is not labeled for training downstream models.


## Learning Objectives

In this notebook, you will learn how to use TAO to run `train`, `inference`, and `export` with NVDinoV2.


## 2. Prepare Dataset <a class="anchor" id="head-2"></a>

In this example notebook, we will train an NVDinoV2 model using the cat images in the `Cats and Dogs` training set. Since self-supervised learning does not require annotations, all training images should be placed in a single folder.

Here is the script to download the `Cats and Dogs` dataset:

In [None]:
!wget "https://www.dropbox.com/s/wml49yrtdo53mie/cats_dogs_dataset_reorg.zip?dl=0" -O cats_dogs_dataset.zip
!unzip -qo cats_dogs_dataset.zip -d $HOST_DATA_DIR/
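
After extraction, a quick count confirms that the cat images landed in a single folder, as self-supervised training expects. The sketch below assumes the folder layout used by the training command later in this notebook (`cats_dogs_dataset/training_set/training_set/cats`) and an assumed set of image extensions; `count_images` is a hypothetical helper.

```python
import os
from pathlib import Path

# Assumed image extensions; adjust if your dataset uses other formats.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(folder):
    """Count image files directly inside `folder` (non-recursive)."""
    folder = Path(folder)
    if not folder.is_dir():
        return 0
    return sum(
        1 for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


cats_dir = (
    Path(os.environ.get("HOST_DATA_DIR", "data"))
    / "cats_dogs_dataset" / "training_set" / "training_set" / "cats"
)
print(f"{count_images(cats_dir)} images found in {cats_dir}")
```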

## 3. Run TAO train

### 3.0 Provide training specification <a class="anchor" id="head-2"></a>

We provide specification files to configure the training process. Please ensure you update the following settings to suit your environment:  

1. **`results_dir`**: Update this field if the default path is not suitable for your setup.  
2. **Dataset Paths**: Modify the `images_dir` under `train_dataset` to point to **your** dataset files as outlined in **Section 2**.


### 3.1 Training the Pretrained NVDinoV2

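In the specification file, the `pretrained_model_path` key is set to the pretrained NVDinoV2 model ([Link](https://catalog.ngc.nvidia.com/orgs/nvaie/models/nv_dinov2_classification_model)), and the training dataset used is the `COCO2017` training set. The training command in this section overrides both settings on the command line: it points `images_dir` at the cat images from Section 2 and sets `train.pretrained_model_path=null` to train from scratch.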

In [None]:
! cat specs/train_spec.yaml

In [None]:
!tao model nvdinov2 train \
-e /tao-pt/tao-experiments/specs/train_spec.yaml \
dataset.train_dataset.images_dir=/tao-pt/tao-experiments/data/cats_dogs_dataset/training_set/training_set/cats  \
train.num_epochs=3 \
train.pretrained_model_path=null # Set to null for training from scratch, or provide the path to a pretrained NVDINOv2 model for SSL fine-tuning.
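
The command above passes spec overrides as `key=value` pairs after the spec file. If you want to vary these overrides programmatically, the command string can be assembled in Python. This is a minimal sketch using only the arguments shown above; `build_tao_command` is a hypothetical helper, not a TAO API, and mapping `None` to `null` is an assumption that mirrors the `train.pretrained_model_path=null` override.

```python
import shlex


def build_tao_command(task, spec_path, overrides):
    """Assemble a `tao model nvdinov2 <task>` command with key=value overrides."""
    args = ["tao", "model", "nvdinov2", task, "-e", spec_path]
    for key, value in overrides.items():
        # None is written as the literal `null`, as in the command above.
        args.append(f"{key}={'null' if value is None else value}")
    return shlex.join(args)


command = build_tao_command(
    "train",
    "/tao-pt/tao-experiments/specs/train_spec.yaml",
    {"train.num_epochs": 3, "train.pretrained_model_path": None},
)
print(command)
```

The resulting string can be run from a notebook cell with `!{command}`.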
    

## 4. Run TAO Inference <a class="anchor" id="head-3"></a>

In this section, we run the `nvdinov2` inference script to assess the embeddings of the trained NVDinoV2 model. We set the `checkpoint` path in the `inference` section to the location of the teacher checkpoint saved during training, which is `teacher_epoch_002_step_00600.pth` in the command used here. Adjust the file name if your training run produced a different checkpoint.
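
Checkpoint file names depend on how long training ran, so it can be convenient to look up the most recent teacher checkpoint rather than hard-code it. The sketch below assumes checkpoints are named like `teacher_epoch_002_step_00600.pth` and are saved under `results/nvdinov2/train` on the host; both the naming pattern and the helper `latest_teacher_checkpoint` are assumptions for illustration.

```python
import os
import re
from pathlib import Path

# Assumed naming: "teacher_..." followed by "step_<digits>.pth".
STEP_PATTERN = re.compile(r"teacher_.*step_(\d+)\.pth$")


def latest_teacher_checkpoint(train_dir):
    """Return the teacher checkpoint with the highest step number, or None."""
    candidates = []
    for path in Path(train_dir).glob("teacher_*.pth"):
        match = STEP_PATTERN.search(path.name)
        if match:
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]


train_dir = Path(os.environ.get("HOST_RESULTS_DIR", "results")) / "nvdinov2" / "train"
print(latest_teacher_checkpoint(train_dir))
```

The function returns a host path. The TAO command needs the matching path inside the container, that is, the same file name under `/tao-pt/tao-experiments/results/nvdinov2/train/`.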

In [None]:
! cat specs/inference_spec.yaml

In [None]:
!tao model nvdinov2 inference \
-e /tao-pt/tao-experiments/specs/inference_spec.yaml \
dataset.test_dataset.images_dir=/tao-pt/tao-experiments/data/cats_dogs_dataset/training_set/training_set/cats  \
inference.checkpoint=/tao-pt/tao-experiments/results/nvdinov2/train/teacher_epoch_002_step_00600.pth

You can visualize the embeddings written to the `results` directory. The default path is `results/nvdinov2/inference/`.

## 5. Run TAO Export <a class="anchor" id="head-3"></a>

In this section, we run the `nvdinov2` export script to export the trained NVDinoV2 model to an ONNX file. We set the `checkpoint` path in the `export` section to the location of the teacher checkpoint saved during training, which is `teacher_epoch_002_step_00600.pth` in the command used here.

In [None]:
! cat specs/export_spec.yaml

In [None]:
!tao model nvdinov2 export \
-e /tao-pt/tao-experiments/specs/export_spec.yaml \
export.checkpoint=/tao-pt/tao-experiments/results/nvdinov2/train/teacher_epoch_002_step_00600.pth

You can find the ONNX file in the `results` directory. The default path is `results/nvdinov2/export/`.

This notebook has come to an end.