# Self-supervised training with VISSL

## Generate custom dataset

Note: (1) do not rename the `train` and `val` folders; (2) self-supervised training ignores the labels, so put all of your images into `train/label1` and `train/label2`, split between the two folders in any way. The `val` folder can stay empty.

```
path/to/your/dataset
├── train/
│   ├── label1/
│   │   ├── images1.jpg
│   │   ├── images2.jpg
│   │   └── ...
│   └── label2/
│       ├── images1.jpg
│       ├── images2.jpg
│       └── ...
└── val/  (leave it empty)
    ├── label1/
    └── label2/
```
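
A minimal Python sketch of building this layout, assuming your raw images sit under one source folder (searched recursively) and that `.jpg`, `.jpeg` and `.png` are the only extensions you need. The helper `build_vissl_folder` is hypothetical: it assigns each image to `label1` or `label2` at random, since the labels are not used, and creates `val/` with empty sub-folders.

```python
import random
import shutil
from pathlib import Path

# Assumption: these are the only image extensions in the raw folder.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def build_vissl_folder(src_dir, dst_dir, labels=("label1", "label2"), seed=0):
    """Copy all images under src_dir into dst_dir/train/<label>/ and create an empty val/.

    Labels are drawn at random because self-supervised training ignores them.
    Returns the number of images copied.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    rng = random.Random(seed)  # fixed seed -> reproducible label assignment
    for split in ("train", "val"):
        for label in labels:
            (dst / split / label).mkdir(parents=True, exist_ok=True)

    images = sorted(
        p for p in src.rglob("*") if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )
    for i, img in enumerate(images):
        label = rng.choice(labels)
        # Prefix with an index so equal file names from different sub-folders do not collide.
        shutil.copy2(img, dst / "train" / label / f"{i:07d}_{img.name}")
    return len(images)
```

For example, `build_vissl_folder("/path/to/raw_images", "path/to/your/dataset")`. Copying doubles the disk usage; for very large sets, symlinks (`os.symlink`) instead of `shutil.copy2` may be preferable.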

## Load custom dataset

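(1) Register the custom dataset path in **"tools/run_distributed_engines.py"**;

(2) Add the project root to the Python path in **"tools/run_distributed_engines.py"**.

The training commands below call one copy of this script per dataset size (e.g., **"tools/run_distributed_engines_25K.py"**); each copy needs the same two edits.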

In [None]:
# (1) Register the custom dataset in "tools/run_distributed_engines.py" (edit the paths below).

from vissl.data.dataset_catalog import VisslDatasetCatalog

train_path = "/workspace/Data/pre_train_500k/train"
val_path = "/workspace/Data/pre_train_500k/val"

# Each split maps to [image_path, label_path]; for disk_folder data both are the same folder.
VisslDatasetCatalog.register_data(
    name="Flux",
    data_dict={"train": [train_path, train_path], "val": [val_path, val_path]},
)

In [None]:
# (2) Add the project root to sys.path near the top of "tools/run_distributed_engines.py",
#     so that the project's own modules can be imported.

import sys

sys.path.append('/workspace/Project/deep_plastic_Flux_SSL')
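
Before launching a run, a quick check that the registered `train` folder has the expected shape can save a failed job. The sketch below uses a hypothetical helper, `count_images_per_class`, and assumes `.jpg`, `.jpeg` or `.png` images; it counts images per class folder and raises if the folder is missing or any class folder is empty.

```python
from pathlib import Path

# Assumption: images are .jpg, .jpeg or .png files.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def count_images_per_class(train_path):
    """Return {class_folder: number_of_images} for an ImageFolder-style train/ directory.

    Raises if the folder is missing, has no class sub-folders, or has an empty class folder.
    """
    root = Path(train_path)
    if not root.is_dir():
        raise FileNotFoundError(f"train path not found: {root}")
    counts = {
        d.name: sum(1 for f in d.iterdir() if f.is_file() and f.suffix.lower() in IMAGE_EXTS)
        for d in sorted(root.iterdir())
        if d.is_dir()
    }
    if not counts or min(counts.values()) == 0:
        raise ValueError(f"expected at least one image in every class folder, got {counts}")
    return counts
```

For example, `count_images_per_class("/workspace/Data/pre_train_500k/train")` should report both `label1` and `label2` with non-zero counts.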

## SwAV

Steps:

(1) Start from a ResNet-50 or ResNet-101 pretrained on the ImageNet-1k dataset (1,000 categories, about 1.2 million images). The weights can be downloaded from:

https://dl.fbaipublicfiles.com/detectron2/ImageNetPretrained/MSRA/R-50.pkl

https://dl.fbaipublicfiles.com/detectron2/ImageNetPretrained/MSRA/R-101.pkl

(2) If needed, modify the hyperparameters (e.g., data augmentation) in the **"pretrain/swav/XXX.yaml"** config file;

(3) Adjust the overrides in the commands below as needed, e.g., dataset name, training data path, batch size, number of epochs, checkpoint output path, pretrained weights path, and fine-tuning strategy (here: fine-tune all layers of the backbone);

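Note: choose `lr.lengths` according to the number of epochs. The two entries are the fractions of training covered by the two parts of the learning-rate schedule (in VISSL's reference SwAV configs, a linear warm-up followed by cosine decay); for 100 to 800 epochs the first entry always corresponds to 10 warm-up epochs.

- `lengths: [0.1, 0.9]`: 100 epochs
- `lengths: [0.05, 0.95]`: 200 epochs
- `lengths: [0.025, 0.975]`: 400 epochs
- `lengths: [0.0125, 0.9875]`: 800 epochs
- `lengths: [0.0128, 0.9872]`: 1 epoch on IG-1B
- `lengths: [0.00641, 0.99359]`: 2 epochs on IG-1B
- `lengths: [0.002563, 0.997437]`: 5 epochs on IG-1B (= 50 epochs on IG-100M)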

(4) Train the full model (all layers) on the custom dataset;

(5) Set the number of GPUs with **"config.DISTRIBUTED.NUM_PROC_PER_NODE"**.
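
The 100 to 800 epoch values of `lr.lengths` listed in step (3) all correspond to a 10-epoch warm-up (10 / 100 = 0.1, 10 / 800 = 0.0125). Below is a small sketch that derives the pair for any epoch count, assuming you keep a two-part schedule with a fixed number of warm-up epochs. The helper names are hypothetical, and the IG-1B entries are not reproduced by this formula.

```python
def lr_lengths(num_epochs, warmup_epochs=10, ndigits=6):
    """Return [warm-up fraction, remaining fraction] for a two-part lr schedule.

    The two fractions sum to 1, like the pairs listed in step (3).
    """
    if not 0 < warmup_epochs < num_epochs:
        raise ValueError("need 0 < warmup_epochs < num_epochs")
    warm = round(warmup_epochs / num_epochs, ndigits)
    return [warm, round(1 - warm, ndigits)]


def lengths_override(num_epochs, warmup_epochs=10):
    """Format the result as the command-line override used in the training cells."""
    warm, rest = lr_lengths(num_epochs, warmup_epochs)
    return f'config.OPTIMIZER.param_schedulers.lr.lengths="[{warm}, {rest}]"'
```

For example, `lengths_override(200)` gives the same override as the 200-epoch cells below. Rounding to six digits keeps floating-point noise out of the command line.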

In [None]:
# The ImageNet pretrained weights are here:

# /workspace/Project/deep_plastic_Flux_SSL/checkpoint/pretrained_model/R-50.pkl
# /workspace/Project/deep_plastic_Flux_SSL/checkpoint/pretrained_model/R-101.pkl
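
If the `.pkl` files are not on disk yet, they can be fetched with the Python standard library. This is a minimal sketch using the two URLs from step (1); `fetch_weights` is a hypothetical helper that skips files that already exist and writes to a `.part` file first, so an interrupted download is never mistaken for a complete one.

```python
import urllib.request
from pathlib import Path

# The two URLs listed in step (1).
WEIGHT_URLS = {
    "R-50": "https://dl.fbaipublicfiles.com/detectron2/ImageNetPretrained/MSRA/R-50.pkl",
    "R-101": "https://dl.fbaipublicfiles.com/detectron2/ImageNetPretrained/MSRA/R-101.pkl",
}


def fetch_weights(url, dest_dir):
    """Download url into dest_dir, keeping its file name; skip files that already exist.

    Returns the local path of the file.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / url.rsplit("/", 1)[-1]
    if not dest.exists():
        tmp = dest.with_name(dest.name + ".part")
        urllib.request.urlretrieve(url, str(tmp))
        tmp.replace(dest)  # only a complete download gets the final name
    return dest
```

For example: `for url in WEIGHT_URLS.values(): fetch_weights(url, "/workspace/Project/deep_plastic_Flux_SSL/checkpoint/pretrained_model")`.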

#### (1) ResNet50

In [None]:
# SwAV on 25K
# backbone: RN50
# Pretrained on ImageNet; fine-tune all layers (FTAL)

!python tools/run_distributed_engines_25K.py \
  hydra.verbose=true \
  config=pretrain/swav/swav_1_gpu_resnet50_flux.yaml \
  config.DATA.TRAIN.DATASET_NAMES=[Flux] \
  config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
  config.DATA.TRAIN.DATA_PATHS=["/home/tjian/Data/Flux/images_pretrain_25K/train"] \
  config.DATA.TRAIN.BATCHSIZE_PER_REPLICA=64 \
  config.OPTIMIZER.num_epochs=100 \
  config.OPTIMIZER.param_schedulers.lr.lengths="[0.1, 0.9]" \
  config.CHECKPOINT.DIR="/scratch/tjian/PythonProject/deep_plastic_Flux_SSL/checkpoint/train_weights/RN50_25K_100e/vissl" \
  config.HOOKS.TENSORBOARD_SETUP.USE_TENSORBOARD=False \
  config.WEIGHTS_INIT.PARAMS_FILE="/workspace/Project/deep_plastic_Flux_SSL/checkpoint/pretrained_model/R-50.pkl" \
  config.WEIGHTS_INIT.APPEND_PREFIX="trunk._feature_blocks." \
  config.DISTRIBUTED.NUM_PROC_PER_NODE=1
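
The cells in this section differ only in a handful of overrides (script, data path, epochs, `lr.lengths`, checkpoint directory, initial weights). Below is a sketch of generating the argument list in Python, assuming you prefer `subprocess` to copying cells. `swav_argv` is a hypothetical helper, and the values are written the way the shell passes them once it strips the double quotes used in the cell.

```python
import shlex


def swav_argv(script, config, data_path, num_epochs, ckpt_dir, weights,
              lengths=None, batch_per_gpu=64, num_gpus=1, dataset="Flux"):
    """Build the argv for one SwAV run with the same overrides as the cell above.

    Pass the result to subprocess.run(argv, check=True) to launch the run.
    """
    argv = [
        "python", script,
        "hydra.verbose=true",
        f"config={config}",
        f"config.DATA.TRAIN.DATASET_NAMES=[{dataset}]",
        "config.DATA.TRAIN.DATA_SOURCES=[disk_folder]",
        f"config.DATA.TRAIN.DATA_PATHS=[{data_path}]",
        f"config.DATA.TRAIN.BATCHSIZE_PER_REPLICA={batch_per_gpu}",
        f"config.OPTIMIZER.num_epochs={num_epochs}",
    ]
    if lengths is not None:  # the ResNet101 cells omit it and keep the yaml default
        warm, rest = lengths
        argv.append(f"config.OPTIMIZER.param_schedulers.lr.lengths=[{warm}, {rest}]")
    argv += [
        f"config.CHECKPOINT.DIR={ckpt_dir}",
        "config.HOOKS.TENSORBOARD_SETUP.USE_TENSORBOARD=False",
        f"config.WEIGHTS_INIT.PARAMS_FILE={weights}",
        "config.WEIGHTS_INIT.APPEND_PREFIX=trunk._feature_blocks.",
        f"config.DISTRIBUTED.NUM_PROC_PER_NODE={num_gpus}",
    ]
    return argv


if __name__ == "__main__":
    # Same run as the "SwAV on 25K" cell; only prints the equivalent shell command.
    argv = swav_argv(
        script="tools/run_distributed_engines_25K.py",
        config="pretrain/swav/swav_1_gpu_resnet50_flux.yaml",
        data_path="/home/tjian/Data/Flux/images_pretrain_25K/train",
        num_epochs=100,
        ckpt_dir="/scratch/tjian/PythonProject/deep_plastic_Flux_SSL/checkpoint/train_weights/RN50_25K_100e/vissl",
        weights="/workspace/Project/deep_plastic_Flux_SSL/checkpoint/pretrained_model/R-50.pkl",
        lengths=(0.1, 0.9),
    )
    print(shlex.join(argv))
```

`BATCHSIZE_PER_REPLICA` is a per-process value, so with `num_gpus` processes the effective batch size is `batch_per_gpu * num_gpus`.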


In [None]:
# SwAV on 50K
# backbone: RN50
# Pretrained on ImageNet; fine-tune all layers (FTAL)

!python tools/run_distributed_engines_50K.py \
  hydra.verbose=true \
  config=pretrain/swav/swav_1_gpu_resnet50_flux.yaml \
  config.DATA.TRAIN.DATASET_NAMES=[Flux] \
  config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
  config.DATA.TRAIN.DATA_PATHS=["/scratch/tjian/Data/Flux/images_pretrain_50K/train"] \
  config.DATA.TRAIN.BATCHSIZE_PER_REPLICA=64 \
  config.OPTIMIZER.num_epochs=100 \
  config.OPTIMIZER.param_schedulers.lr.lengths="[0.1, 0.9]" \
  config.CHECKPOINT.DIR="/scratch/tjian/PythonProject/deep_plastic_Flux_SSL/checkpoint/train_weights/RN50_50K_100e/vissl" \
  config.HOOKS.TENSORBOARD_SETUP.USE_TENSORBOARD=False \
  config.WEIGHTS_INIT.PARAMS_FILE="/workspace/Project/deep_plastic_Flux_SSL/checkpoint/pretrained_model/R-50.pkl" \
  config.WEIGHTS_INIT.APPEND_PREFIX="trunk._feature_blocks." \
  config.DISTRIBUTED.NUM_PROC_PER_NODE=1


In [None]:
# SwAV on 200K
# backbone: RN50
# Pretrained on ImageNet; fine-tune all layers (FTAL)

!python tools/run_distributed_engines_200K.py \
  hydra.verbose=true \
  config=pretrain/swav/swav_1_gpu_resnet50_flux.yaml \
  config.DATA.TRAIN.DATASET_NAMES=[Flux] \
  config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
  config.DATA.TRAIN.DATA_PATHS=["/home/tjian/Data/Flux/images_pretrain_200K/train"] \
  config.DATA.TRAIN.BATCHSIZE_PER_REPLICA=64 \
  config.OPTIMIZER.num_epochs=100 \
  config.OPTIMIZER.param_schedulers.lr.lengths="[0.1, 0.9]" \
  config.CHECKPOINT.DIR="/scratch/tjian/PythonProject/deep_plastic_Flux_SSL/checkpoint/train_weights/RN50_200K_100e/vissl" \
  config.HOOKS.TENSORBOARD_SETUP.USE_TENSORBOARD=False \
  config.WEIGHTS_INIT.PARAMS_FILE="/workspace/Project/deep_plastic_Flux_SSL/checkpoint/pretrained_model/R-50.pkl" \
  config.WEIGHTS_INIT.APPEND_PREFIX="trunk._feature_blocks." \
  config.DISTRIBUTED.NUM_PROC_PER_NODE=1


In [None]:
# SwAV on 100K
# backbone: RN50
# Pretrained on ImageNet; fine-tune all layers (FTAL)

!python tools/run_distributed_engines_100K.py \
  hydra.verbose=true \
  config=pretrain/swav/swav_1_gpu_resnet50_flux.yaml \
  config.DATA.TRAIN.DATASET_NAMES=[Flux] \
  config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
  config.DATA.TRAIN.DATA_PATHS=["/scratch/tjian/Data/Flux/images_pretrain_100K/train"] \
  config.DATA.TRAIN.BATCHSIZE_PER_REPLICA=64 \
  config.OPTIMIZER.num_epochs=100 \
  config.OPTIMIZER.param_schedulers.lr.lengths="[0.1, 0.9]" \
  config.CHECKPOINT.DIR="/scratch/tjian/PythonProject/deep_plastic_Flux_SSL/checkpoint/train_weights/RN50_100K_100e/vissl" \
  config.HOOKS.TENSORBOARD_SETUP.USE_TENSORBOARD=False \
  config.WEIGHTS_INIT.PARAMS_FILE="/workspace/Project/deep_plastic_Flux_SSL/checkpoint/pretrained_model/R-50.pkl" \
  config.WEIGHTS_INIT.APPEND_PREFIX="trunk._feature_blocks." \
  config.DISTRIBUTED.NUM_PROC_PER_NODE=1


In [None]:
# SwAV on 300K
# backbone: RN50
# Pretrained on ImageNet, fine-tune all layers (FTAL); resumed from an earlier SwAV checkpoint (see PARAMS_FILE)

!python tools/run_distributed_engines_300K.py \
  hydra.verbose=true \
  config=pretrain/swav/swav_1_gpu_resnet50_flux.yaml \
  config.DATA.TRAIN.DATASET_NAMES=[Flux] \
  config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
  config.DATA.TRAIN.DATA_PATHS=["/scratch/tjian/Data/Flux/images_pretrain_300K/train"] \
  config.DATA.TRAIN.BATCHSIZE_PER_REPLICA=64 \
  config.OPTIMIZER.num_epochs=200 \
  config.OPTIMIZER.param_schedulers.lr.lengths="[0.05, 0.95]" \
  config.CHECKPOINT.DIR="/scratch/tjian/PythonProject/deep_plastic_Flux_SSL/checkpoint/train_weights/RN50_300K/vissl_100_to" \
  config.HOOKS.TENSORBOARD_SETUP.USE_TENSORBOARD=False \
  config.WEIGHTS_INIT.PARAMS_FILE="/scratch/tjian/PythonProject/deep_plastic_Flux_SSL/checkpoint/train_weights/RN50_300K/vissl_100e/model_final_checkpoint_phase49.torch" \
  config.WEIGHTS_INIT.APPEND_PREFIX="trunk._feature_blocks." \
  config.DISTRIBUTED.NUM_PROC_PER_NODE=1



In [None]:
# SwAV on 500K
# backbone: RN50
# Pretrained on ImageNet, fine-tune all layers (FTAL); resumed from an earlier SwAV checkpoint (see PARAMS_FILE)

!python tools/run_distributed_engines_500K.py \
  hydra.verbose=true \
  config=pretrain/swav/swav_1_gpu_resnet50_flux.yaml \
  config.DATA.TRAIN.DATASET_NAMES=[Flux] \
  config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
  config.DATA.TRAIN.DATA_PATHS=["/scratch/tjian/Data/Flux/images_pretrain_500K/train"] \
  config.DATA.TRAIN.BATCHSIZE_PER_REPLICA=64 \
  config.OPTIMIZER.num_epochs=200 \
  config.OPTIMIZER.param_schedulers.lr.lengths="[0.05, 0.95]" \
  config.CHECKPOINT.DIR="/scratch/tjian/PythonProject/deep_plastic_Flux_SSL/checkpoint/train_weights/RN50_500K/vissl_100e_to" \
  config.HOOKS.TENSORBOARD_SETUP.USE_TENSORBOARD=False \
  config.WEIGHTS_INIT.PARAMS_FILE="/scratch/tjian/PythonProject/deep_plastic_Flux_SSL/checkpoint/train_weights/RN50_500K/vissl_80_to_100e/model_final_checkpoint_phase19.torch" \
  config.WEIGHTS_INIT.APPEND_PREFIX="trunk._feature_blocks." \
  config.DISTRIBUTED.NUM_PROC_PER_NODE=1


In [None]:
# SwAV
# Backbone: resnet50
# Train from scratch

# !python tools/run_distributed_engines.py \
#   hydra.verbose=true \
#   config=pretrain/swav/swav_1_gpu_RN50_scratch.yaml \
#   config.DATA.TRAIN.DATASET_NAMES=[Flux] \
#   config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
#   config.DATA.TRAIN.DATA_PATHS=["/workspace/Data/pre_train_500k/train"] \
#   config.DATA.TRAIN.BATCHSIZE_PER_REPLICA=16 \
#   config.OPTIMIZER.num_epochs=10 \
#   config.CHECKPOINT.DIR="/scratch/tjian/PythonProject/deep_plastic_Flux_SSL/checkpoint/train_weights/test_2_GPU" \
#   config.HOOKS.TENSORBOARD_SETUP.USE_TENSORBOARD=False \
#   config.DISTRIBUTED.NUM_PROC_PER_NODE=2

#### (2) ResNet101

In [None]:
# SwAV on 100K
# backbone: RN101
# Pretrained on ImageNet; fine-tune all layers (FTAL)

!python tools/run_distributed_engines_100K.py \
  hydra.verbose=true \
  config=pretrain/swav/swav_1_gpu_resnet101_flux.yaml \
  config.DATA.TRAIN.DATASET_NAMES=[Flux] \
  config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
  config.DATA.TRAIN.DATA_PATHS=["/scratch/tjian/Data/Flux/images_pretrain_100K/train"] \
  config.OPTIMIZER.num_epochs=100 \
  config.DATA.TRAIN.BATCHSIZE_PER_REPLICA=64 \
  config.CHECKPOINT.DIR="/scratch/tjian/PythonProject/deep_plastic_Flux_SSL/checkpoint/train_weights/RN101_100K_100e/vissl" \
  config.HOOKS.TENSORBOARD_SETUP.USE_TENSORBOARD=False \
  config.WEIGHTS_INIT.PARAMS_FILE="/workspace/Project/deep_plastic_Flux_SSL/checkpoint/pretrained_model/R-101.pkl" \
  config.WEIGHTS_INIT.APPEND_PREFIX="trunk._feature_blocks." \
  config.DISTRIBUTED.NUM_PROC_PER_NODE=1


In [None]:
# SwAV on 300K
# backbone: RN101
# Pretrained on ImageNet, fine-tune all layers (FTAL); resumed from an earlier SwAV checkpoint (see PARAMS_FILE)

!python tools/run_distributed_engines_300K.py \
  hydra.verbose=true \
  config=pretrain/swav/swav_1_gpu_resnet101_flux.yaml \
  config.DATA.TRAIN.DATASET_NAMES=[Flux] \
  config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
  config.DATA.TRAIN.DATA_PATHS=["/scratch/tjian/Data/Flux/images_pretrain_300K/train"] \
  config.OPTIMIZER.num_epochs=50 \
  config.DATA.TRAIN.BATCHSIZE_PER_REPLICA=64 \
  config.CHECKPOINT.DIR="/scratch/tjian/PythonProject/deep_plastic_Flux_SSL/checkpoint/train_weights/RN101_300K_100e/vissl_100e" \
  config.HOOKS.TENSORBOARD_SETUP.USE_TENSORBOARD=False \
  config.WEIGHTS_INIT.PARAMS_FILE="/scratch/tjian/PythonProject/deep_plastic_Flux_SSL/checkpoint/train_weights/RN101_300K_100e/vissl_50e/model_final_checkpoint_phase49.torch" \
  config.WEIGHTS_INIT.APPEND_PREFIX="trunk._feature_blocks." \
  config.DISTRIBUTED.NUM_PROC_PER_NODE=1


In [None]:
# SwAV on 500K
# backbone: RN101
# Pretrained on ImageNet, fine-tune all layers (FTAL); resumed from an earlier SwAV checkpoint (see PARAMS_FILE)

!python tools/run_distributed_engines_500K.py \
  hydra.verbose=true \
  config=pretrain/swav/swav_1_gpu_resnet101_flux.yaml \
  config.DATA.TRAIN.DATASET_NAMES=[Flux] \
  config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
  config.DATA.TRAIN.DATA_PATHS=["/scratch/tjian/Data/Flux/images_pretrain/train"] \
  config.OPTIMIZER.num_epochs=30 \
  config.DATA.TRAIN.BATCHSIZE_PER_REPLICA=64 \
  config.CHECKPOINT.DIR="/scratch/tjian/PythonProject/deep_plastic_Flux_SSL/checkpoint/train_weights/RN101_500K_100e/vissl_30_to_60e" \
  config.HOOKS.TENSORBOARD_SETUP.USE_TENSORBOARD=False \
  config.WEIGHTS_INIT.PARAMS_FILE="/scratch/tjian/PythonProject/deep_plastic_Flux_SSL/checkpoint/train_weights/RN101_500K_100e/vissl_30e/model_final_checkpoint_phase29.torch" \
  config.WEIGHTS_INIT.APPEND_PREFIX="trunk._feature_blocks." \
  config.DISTRIBUTED.NUM_PROC_PER_NODE=1


In [None]:
# SwAV
# Backbone: resnet101
# Train from scratch

# !python tools/run_distributed_engines.py \
#   hydra.verbose=true \
#   config=pretrain/swav/swav_1_gpu_RN101_scratch.yaml \
#   config.DATA.TRAIN.DATASET_NAMES=[GJO] \
#   config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
#   config.DATA.TRAIN.DATA_PATHS=["/scratch/tjian/Data/GJO_SSL/images_tiles_224_pretrain/train"] \
#   config.DATA.TRAIN.BATCHSIZE_PER_REPLICA=16 \
#   config.OPTIMIZER.num_epochs=100 \
#   config.CHECKPOINT.DIR="/scratch/tjian/PythonProject/deep_plastic_SSL/checkpoints/train_weights/Self_train_tiles_224/RN101_Sw_100e_Scratch/vissl" \
#   config.HOOKS.TENSORBOARD_SETUP.USE_TENSORBOARD=False
  

## Training logs, checkpoints, metrics (optional)

VISSL writes its outputs to the checkpoint directory given by `config.CHECKPOINT.DIR`; the commands above use run-specific folders such as `checkpoint/train_weights/RN50_25K_100e/vissl`.

This directory contains:
- model checkpoints (`.torch` files), saved after every epoch;
- the training log `log.txt`, which holds the full stdout of the run;
- `metrics.json`, which stores the values of any metrics computed during training;
- `tb_logs/`, which holds the TensorBoard event files (only when TensorBoard is enabled).
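
The resumed runs above pass an earlier `model_final_checkpoint_phase<N>.torch` file as `WEIGHTS_INIT.PARAMS_FILE`. Below is a sketch for picking the most recent checkpoint in a run directory, assuming every per-phase checkpoint name ends in `phase<N>.torch` as those paths do. `latest_checkpoint` is a hypothetical helper; it compares phase numbers as integers, so `phase12` correctly ranks above `phase3`, and it prefers the "final" file when two names share a phase.

```python
import re
from pathlib import Path

# Assumed naming, generalized from paths such as model_final_checkpoint_phase49.torch.
PHASE_RE = re.compile(r"phase(\d+)\.torch$")


def latest_checkpoint(ckpt_dir):
    """Return the .torch file with the highest phase number in ckpt_dir, or None."""
    candidates = []
    for path in Path(ckpt_dir).glob("*.torch"):
        match = PHASE_RE.search(path.name)
        if match:
            # Rank by phase number, then prefer "final" checkpoints on a tie.
            candidates.append((int(match.group(1)), "final" in path.name, path))
    return max(candidates)[2] if candidates else None
```

For example, `latest_checkpoint("/scratch/tjian/PythonProject/deep_plastic_Flux_SSL/checkpoint/train_weights/RN50_25K_100e/vissl")` returns the path to pass as `PARAMS_FILE` when continuing that run.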

## Visualizing Tensorboard Logs (optional)

If you set `config.HOOKS.TENSORBOARD_SETUP.USE_TENSORBOARD=True` (the commands above set it to `False`), the TensorBoard events are written to the `tb_logs/` directory. You can visualize them in TensorBoard as follows:

In [None]:
# Look at training curves in TensorBoard
# (point --logdir at the tb_logs/ folder inside your run's config.CHECKPOINT.DIR):
%reload_ext tensorboard
%tensorboard --logdir /scratch/tjian/PythonProject/deep_plastic_SSL/checkpoints/train_weights/Self_train_bbox/SimCLR_50_epochs/vissl/tb_logs/