# Contrastive Language-Image Pretraining with SogCLR

### Introduction

In this tutorial, you will learn how to conduct contrastive language-image pretraining by optimizing the [Global Contrastive Loss](https://arxiv.org/abs/2202.12387) (GCL) on a subset of the [Conceptual Captions](https://ai.google.com/research/ConceptualCaptions/) dataset. You will also learn how to evaluate the model on a retrieval task using the [MSCOCO](https://cocodataset.org/#home) dataset and on a zero-shot classification task using the [ImageNet](https://www.image-net.org/challenges/LSVRC/index.php) dataset. The code is based on the [iSogCLR](https://github.com/zhqiu/contrastive-learning-iSogCLR) codebase, which includes implementations of CLIP, SogCLR, and iSogCLR.

### Preparation

First, we:

1. Download the source code and data
2. Install required packages

In [None]:
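# Uncomment the next line on the first run to download the source code.
#!git clone -b project https://github.com/xywei00/csce689_iSogCLR.git iSogCLR

import os
# `!export` only affects a short-lived subshell, so set the variables on the notebook process instead.
os.environ["PYTHONPATH"] = os.environ.get("PYTHONPATH", "") + ":./iSogCLR/bimodal_exps"
os.environ["HUGGINGFACE_HUB_CACHE"] = "./checkpoints/huggingface"
!mkdir -p checkpoints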

!gdown 142xxRoMaHxX3BIfCw_1b_G_dgu-02Yq3    # clip_train.tar.gz
!gdown 142zQjlOw0Xw4tKzXMrQjYE6NtGRTeasT    # cc3m_subset_100k.tar.gz
!gdown 142tMsnclHTTPpnTXHSeNgTUlBk4She6o    # mscoco_val.tar.gz
!gdown 1NXhfhwFy-nhdABACkodgYqm9pomDKE39    # val.tar

!mkdir -p datasets/imagenet    # also creates ./datasets
!tar xf clip_train.tar.gz
!tar xf cc3m_subset_100k.tar.gz -C datasets
!tar xf mscoco_val.tar.gz -C datasets
!tar xf val.tar -C datasets/imagenet

!pip install -r ./iSogCLR/requirements_colab.txt    # pip may print warnings or errors here; they should be safe to ignore
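
Before launching training, it can help to confirm that the archives unpacked where the later commands expect them. The sketch below checks only paths that appear in the commands of this tutorial. The helper name `missing_paths` is hypothetical and not part of the iSogCLR codebase, and the MSCOCO folder is left out because its extracted name is not shown here.

```python
from pathlib import Path

# Paths taken from the commands in this tutorial, relative to the notebook's working directory.
EXPECTED_PATHS = [
    "iSogCLR/bimodal_exps/clip.py",   # training and evaluation script
    "clip_train",                     # annotation files (--ann_path)
    "datasets/cc3m_subset_100k",      # training images (--train_image_root), assumed to be a folder
    "datasets/imagenet",              # target of `tar xf val.tar -C datasets/imagenet`
]

def missing_paths(root=".", expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    return [p for p in expected if not (root / p).exists()]

missing = missing_paths()
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All expected paths are present.")
```

If anything is reported as missing, re-running the corresponding `gdown` and `tar` lines is usually the quickest fix.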

### Training

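The following command runs the training script to train a ResNet50 image encoder (pretrained on ImageNet) and a DistilBERT text encoder (pretrained on BookCorpus and English Wikipedia) on the 100k-image CC3M subset for 30 epochs with temperature 0.01. As written, `--ita_type clip` selects the CLIP loss; pass `--ita_type sogclr` to train with the SogCLR loss instead.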

In [1]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/clip_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --ita_type clip \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
Start training
Train Epoch: [0]  [  0/781]  eta: 1:14:29  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 11.8477  avg_image_tau: 0.0100  avg_text_tau: 0.0100  cur_eta: 0.0000  grad_tau_image: 0.0000  grad_tau_text: 0.0000  b_I: 0.0000  b_T: 0.0000  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 5.7234  data: 0.8225  max mem: 9358
Train Epoch: [0]  [ 50/781]  eta: 0:04:55  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 7.1008  avg_image_tau: 0.0100  avg_text_tau: 0.0100  cur_eta: 0.0000  grad_tau_image: 0.0000  grad_tau_text: 0.0000  b_I: 0.0000  b_T: 0.0000  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 0.2988  data: 0.0002  max mem: 9358
Train Epoch: [0]  [100/781]  eta: 0:03:59  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 6.2750  avg_image_tau: 0.0100  avg_text_tau: 0.0100  cur_eta: 0.0000  grad_tau_image: 0
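
For orientation, the loss selected by `--ita_type clip` is a symmetric contrastive (InfoNCE) loss whose logits are similarities divided by the temperature set with `--tau_init`. The following is a minimal pure-Python sketch of that idea, not the codebase's implementation: the function name `clip_loss` is hypothetical, the features are assumed to be L2-normalized already, and the script may scale or combine the two directions differently.

```python
import math

def clip_loss(image_feats, text_feats, tau=0.01):
    """Symmetric InfoNCE loss for a batch of matched (image, text) feature pairs."""
    n = len(image_feats)
    # sim[i][j] = <image_i, text_j> / tau
    sim = [[sum(a * b for a, b in zip(img, txt)) / tau for txt in text_feats]
           for img in image_feats]

    def cross_entropy_diag(rows):
        # Mean cross-entropy where the correct class of row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            lse = m + math.log(sum(math.exp(x - m) for x in row))
            total += lse - row[i]
        return total / n

    image_to_text = cross_entropy_diag(sim)
    text_to_image = cross_entropy_diag([list(col) for col in zip(*sim)])
    return 0.5 * (image_to_text + text_to_image)

# Toy batch of two orthogonal unit vectors.
aligned = clip_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned, swapped)
```

With a temperature of 0.01, a similarity gap of 1 becomes a logit gap of 100, so the aligned toy batch has a loss close to 0 and the swapped one a loss close to 100.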

### Evaluation

The following command runs the evaluation script to measure the trained model's retrieval performance on the MSCOCO validation set and its zero-shot classification performance on the ImageNet validation set. The evaluation command is obtained by appending `--evaluate --checkpoint /path/to/your/checkpoint --zs_dataset imagenet --zs_datafolder /path/to/imagenet/val` to the training command.

In [3]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/clip_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --ita_type clip \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30 \
    --evaluate --checkpoint './output/clip_cc3m_g0.8_e30/checkpoint_30.pth' \
    --zs_dataset imagenet --zs_datafolder ./datasets/imagenet/val

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
load checkpoint from ./output/clip_cc3m_g0.8_e30/checkpoint_30.pth
Start training
Computing features for evaluation...
Evaluation time 0:00:15
coco val: {'txt_r1': 11.62, 'txt_r5': 30.78, 'txt_r10': 43.36, 'txt_r_mean': 28.586666666666662, 'img_r1': 9.160702147227, 'img_r5': 25.31488664080931, 'img_r10': 35.995041784957415, 'img_r_mean': 23.490210190997903, 'r_mean': 26.038438428832283}
zeroshot: {'zeroshot_top1': 21.658, 'zeroshot_top3': 34.418, 'zeroshot_top5': 40.576, 'zeroshot_top10': 48.798}
Training time 0:04:31
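
Because the evaluation command above is the training command plus four flags, it can be generated rather than copied by hand. The sketch below does this with plain Python lists; `TRAIN_ARGS` and `make_eval_args` are hypothetical names introduced for illustration, and the flag values are the ones shown in this section.

```python
import shlex

# Training arguments exactly as used in the Training section.
TRAIN_ARGS = [
    "python3", "./iSogCLR/bimodal_exps/clip.py",
    "--data_path", "./datasets",
    "--ann_path", "./clip_train",
    "--train_file", "cc3m_train_subset.json",
    "--train_image_root", "cc3m_subset_100k",
    "--output_dir", "output/clip_cc3m_g0.8_e30",
    "--init_model", "--use_amp",
    "--ita_type", "clip",
    "--tau_init", "0.01",
    "--sogclr_gamma", "0.8",
    "--eta_init", "0.03", "--sched", "cosine",
    "--no-distributed",
    "--epochs", "30",
]

def make_eval_args(train_args, checkpoint, zs_datafolder, zs_dataset="imagenet"):
    """Return a new list: the training arguments followed by the evaluation flags."""
    return list(train_args) + [
        "--evaluate", "--checkpoint", checkpoint,
        "--zs_dataset", zs_dataset,
        "--zs_datafolder", zs_datafolder,
    ]

eval_args = make_eval_args(
    TRAIN_ARGS,
    checkpoint="./output/clip_cc3m_g0.8_e30/checkpoint_30.pth",
    zs_datafolder="./datasets/imagenet/val",
)
print(shlex.join(eval_args))  # a shell-ready command line
```

The printed line can be pasted after `!CUDA_VISIBLE_DEVICES=0` in a notebook cell, or the list can be passed to `subprocess.run` directly.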


### Benchmarks

The table below reports results on the provided MSCOCO and ImageNet datasets. The first row is from the model trained with the CLIP loss, and the second row is from the model trained with the SogCLR loss. All results use a batch size of 128 and 30 epochs of pretraining. TR@1 and IR@1 denote recall at 1 for text retrieval and image retrieval on MSCOCO, respectively, and ACC@1 denotes top-1 accuracy on ImageNet. Average is the mean of the three metrics.

| Method | MSCOCO TR@1 | MSCOCO IR@1 | ImageNet ACC@1 | Average |
|:------:|:-----------:|:-----------:|:--------------:|:-------:|
| CLIP   | 12.0        | 9.32        | 21.35          | 14.22   |
| SogCLR | 14.38       | 10.73       | 24.54          | 16.55   |
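
As a quick check of how the Average column is formed, the snippet below recomputes it from the three per-task numbers in the table. The dictionary layout and the name `average` are illustrative choices only.

```python
# Values copied from the benchmark table above.
results = {
    "CLIP":   {"TR@1": 12.0,  "IR@1": 9.32,  "ACC@1": 21.35},
    "SogCLR": {"TR@1": 14.38, "IR@1": 10.73, "ACC@1": 24.54},
}

def average(metrics):
    """Unweighted mean of the per-task metrics."""
    return sum(metrics.values()) / len(metrics)

for method, metrics in results.items():
    print(f"{method}: {average(metrics):.2f}")
```

The same helper can be applied to your own run by substituting `txt_r1`, `img_r1`, and `zeroshot_top1` from the evaluation output.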

## Optimizer: AdamW (default), Loss Function: cyclip

In [2]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/cyclip_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --ita_type cyclip \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
Start training
Train Epoch: [0]  [  0/781]  eta: 1:28:23  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 20.7273  avg_image_tau: 0.0100  avg_text_tau: 0.0100  cur_eta: 0.0000  grad_tau_image: 0.0000  grad_tau_text: 0.0000  b_I: 0.0000  b_T: 0.0000  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 6.7906  data: 0.8027  max mem: 9358
Train Epoch: [0]  [ 50/781]  eta: 0:05:59  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 8.8137  avg_image_tau: 0.0100  avg_text_tau: 0.0100  cur_eta: 0.0000  grad_tau_image: 0.0000  grad_tau_text: 0.0000  b_I: 0.0000  b_T: 0.0000  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 0.4202  data: 0.1183  max mem: 9358
Train Epoch: [0]  [100/781]  eta: 0:05:05  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 6.4159  avg_image_tau: 0.0100  avg_text_tau: 0.0100  cur_eta: 0.0000  grad_tau_image: 0
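
Each `Train Epoch` line is a run of `name: value` pairs, so the loss curve can be recovered from a saved log with a small parser. The sketch below is one way to do it; `parse_train_line` is a hypothetical helper, and the sample line is an abridged copy of a line from the output above. The 781 iterations per epoch match `100000 // 128`, that is, the batch size of 128 stated in the Benchmarks section with the last partial batch dropped (the dropping is an inference from the numbers, not something the log states).

```python
import re

# "Train Epoch: [0]  [ 50/781]" -> epoch, step, steps per epoch
HEADER_RE = re.compile(r"Train Epoch: \[(\d+)\]\s+\[\s*(\d+)/(\d+)\]")
# "loss_ita: 8.8137" -> ("loss_ita", "8.8137"); skips h:mm:ss and integer fields
PAIR_RE = re.compile(r"(\w+): (-?\d+\.\d+)(?=\s|$)")

def parse_train_line(line):
    """Return a dict of numeric fields from one training log line, or None."""
    header = HEADER_RE.search(line)
    if header is None:
        return None
    record = {
        "epoch": int(header.group(1)),
        "step": int(header.group(2)),
        "steps_per_epoch": int(header.group(3)),
    }
    for name, value in PAIR_RE.findall(line):
        record[name] = float(value)
    return record

SAMPLE = ("Train Epoch: [0]  [ 50/781]  eta: 0:05:59  lr: 0.000010  "
          "lr_temp_net: 0.00000100  loss_ita: 8.8137  avg_image_tau: 0.0100  "
          "avg_text_tau: 0.0100  cur_eta: 0.0000  time: 0.4202  data: 0.1183  max mem: 9358")

record = parse_train_line(SAMPLE)
print(record["step"], record["loss_ita"])
```

Collecting `record["loss_ita"]` over all lines of a log gives a series that can be plotted against the step index.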

In [3]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/cyclip_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --ita_type cyclip \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30 \
    --evaluate --checkpoint './output/cyclip_cc3m_g0.8_e30/checkpoint_30.pth' \
    --zs_dataset imagenet --zs_datafolder ./datasets/imagenet/val

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
load checkpoint from ./output/cyclip_cc3m_g0.8_e30/checkpoint_30.pth
Start training
Computing features for evaluation...
Evaluation time 0:01:30
coco val: {'txt_r1': 14.1, 'txt_r5': 33.84, 'txt_r10': 46.3, 'txt_r_mean': 31.413333333333338, 'img_r1': 10.68415370466632, 'img_r5': 27.694030149146307, 'img_r10': 38.17825582790196, 'img_r_mean': 25.518813227238194, 'r_mean': 28.466073280285766}
zeroshot: {'zeroshot_top1': 25.906, 'zeroshot_top3': 39.492, 'zeroshot_top5': 45.658, 'zeroshot_top10': 53.904}
Training time 0:10:15
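
The `coco val:` and `zeroshot:` lines are printed as Python dictionaries, so they can be read back with `ast.literal_eval`. The sketch below parses the retrieval line from the output above and checks how the aggregate fields relate to the individual recalls; `parse_metrics` and `retrieval_means` are hypothetical helper names, and the aggregation formula is inferred from the printed numbers rather than taken from the script.

```python
import ast

def parse_metrics(line):
    """Split a line such as "coco val: {...}" into (label, metrics dict)."""
    label, _, payload = line.partition(": ")
    return label, ast.literal_eval(payload)

def retrieval_means(m):
    """Mean recall for text retrieval, image retrieval, and both combined."""
    txt = (m["txt_r1"] + m["txt_r5"] + m["txt_r10"]) / 3
    img = (m["img_r1"] + m["img_r5"] + m["img_r10"]) / 3
    return txt, img, (txt + img) / 2

LINE = ("coco val: {'txt_r1': 14.1, 'txt_r5': 33.84, 'txt_r10': 46.3, "
        "'txt_r_mean': 31.413333333333338, 'img_r1': 10.68415370466632, "
        "'img_r5': 27.694030149146307, 'img_r10': 38.17825582790196, "
        "'img_r_mean': 25.518813227238194, 'r_mean': 28.466073280285766}")

label, metrics = parse_metrics(LINE)
txt_mean, img_mean, overall = retrieval_means(metrics)
print(label, round(txt_mean, 2), round(img_mean, 2), round(overall, 2))
```

On this line the recomputed means agree with the logged `txt_r_mean`, `img_r_mean`, and `r_mean` up to floating-point rounding.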


## Optimizer: AdamW (default), Loss Function: vicreg

In [2]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/vicreg_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --ita_type vicreg \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
Start training
Train Epoch: [0]  [  0/781]  eta: 0:58:01  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 24.4050  avg_image_tau: 0.0000  avg_text_tau: 0.0000  cur_eta: 0.0000  grad_tau_image: 0.0000  grad_tau_text: 0.0000  b_I: 0.0000  b_T: 0.0000  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 4.4571  data: 0.7351  max mem: 9358
Train Epoch: [0]  [ 50/781]  eta: 0:04:37  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 22.6329  avg_image_tau: 0.0000  avg_text_tau: 0.0000  cur_eta: 0.0000  grad_tau_image: 0.0000  grad_tau_text: 0.0000  b_I: 0.0000  b_T: 0.0000  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 0.2992  data: 0.0001  max mem: 9358
Train Epoch: [0]  [100/781]  eta: 0:03:51  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 22.4322  avg_image_tau: 0.0000  avg_text_tau: 0.0000  cur_eta: 0.0000  grad_tau_image:

In [3]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/vicreg_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --ita_type vicreg \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30 \
    --evaluate --checkpoint './output/vicreg_cc3m_g0.8_e30/checkpoint_30.pth' \
    --zs_dataset imagenet --zs_datafolder ./datasets/imagenet/val

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
load checkpoint from ./output/vicreg_cc3m_g0.8_e30/checkpoint_30.pth
Start training
Computing features for evaluation...
Evaluation time 0:01:05
coco val: {'txt_r1': 2.86, 'txt_r5': 9.14, 'txt_r10': 14.5, 'txt_r_mean': 8.833333333333334, 'img_r1': 2.155224119317046, 'img_r5': 7.185413251229558, 'img_r10': 11.755767923547523, 'img_r_mean': 7.032135098031375, 'r_mean': 7.932734215682355}
zeroshot: {'zeroshot_top1': 5.788, 'zeroshot_top3': 12.746, 'zeroshot_top5': 17.972, 'zeroshot_top10': 26.93}
Training time 0:09:49


## Optimizer: AdamW (default), Loss Function: onlineclr

In [4]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/onlineclr_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --ita_type onlineclr \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30


Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
Start training
Train Epoch: [0]  [  0/781]  eta: 0:57:58  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 0.2749  avg_image_tau: 0.0000  avg_text_tau: 0.0000  cur_eta: 0.0000  grad_tau_image: 0.0000  grad_tau_text: 0.0000  b_I: 0.0000  b_T: 0.0000  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 4.4536  data: 0.7449  max mem: 9358
Train Epoch: [0]  [ 50/781]  eta: 0:04:40  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 0.0978  avg_image_tau: 0.0000  avg_text_tau: 0.0000  cur_eta: 0.0000  grad_tau_image: 0.0000  grad_tau_text: 0.0000  b_I: 0.0000  b_T: 0.0000  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 0.3024  data: 0.0001  max mem: 9358
Train Epoch: [0]  [100/781]  eta: 0:03:54  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 0.1119  avg_image_tau: 0.0000  avg_text_tau: 0.0000  cur_eta: 0.0000  grad_tau_image: 0.

In [5]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/onlineclr_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --ita_type onlineclr \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30 \
    --evaluate --checkpoint './output/onlineclr_cc3m_g0.8_e30/checkpoint_30.pth' \
    --zs_dataset imagenet --zs_datafolder ./datasets/imagenet/val

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
load checkpoint from ./output/onlineclr_cc3m_g0.8_e30/checkpoint_30.pth
Start training
Computing features for evaluation...
Evaluation time 0:01:03
coco val: {'txt_r1': 10.96, 'txt_r5': 29.5, 'txt_r10': 40.52, 'txt_r_mean': 26.993333333333336, 'img_r1': 8.644887840377464, 'img_r5': 23.53952577072254, 'img_r10': 34.37162621456276, 'img_r_mean': 22.185346608554255, 'r_mean': 24.589339970943797}
zeroshot: {'zeroshot_top1': 20.522, 'zeroshot_top3': 32.686, 'zeroshot_top5': 38.286, 'zeroshot_top10': 46.144}
Training time 0:09:41


## Optimizer: AdamW (default), Loss Function: isogclr_new

In [7]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/isogclr_new_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --ita_type isogclr_new \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
Start training
Train Epoch: [0]  [  0/781]  eta: 0:54:17  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 0.2059  avg_image_tau: 0.0100  avg_text_tau: 0.0100  cur_eta: 0.0000  grad_tau_image: 4.7022  grad_tau_text: 4.2088  b_I: 5.0000  b_T: 3.2676  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 4.1715  data: 0.7316  max mem: 9454
Train Epoch: [0]  [ 50/781]  eta: 0:04:36  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 0.0835  avg_image_tau: 0.0100  avg_text_tau: 0.0100  cur_eta: 0.0000  grad_tau_image: 4.9971  grad_tau_text: 4.8445  b_I: 5.0000  b_T: 4.8082  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 0.3021  data: 0.0001  max mem: 9454
Train Epoch: [0]  [100/781]  eta: 0:03:52  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 0.0716  avg_image_tau: 0.0100  avg_text_tau: 0.0100  cur_eta: 0.0000  grad_tau_image: 4.

In [8]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/isogclr_new_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --ita_type isogclr_new \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30 \
    --evaluate --checkpoint './output/isogclr_new_cc3m_g0.8_e30/checkpoint_30.pth' \
    --zs_dataset imagenet --zs_datafolder ./datasets/imagenet/val

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
load checkpoint from ./output/isogclr_new_cc3m_g0.8_e30/checkpoint_30.pth
Start training
Computing features for evaluation...
Evaluation time 0:00:17
coco val: {'txt_r1': 13.88, 'txt_r5': 34.3, 'txt_r10': 46.72, 'txt_r_mean': 31.633333333333336, 'img_r1': 11.111999680115158, 'img_r5': 27.71002439121916, 'img_r10': 38.27422128033908, 'img_r_mean': 25.698748450557797, 'r_mean': 28.666040891945567}
zeroshot: {'zeroshot_top1': 26.46, 'zeroshot_top3': 39.534, 'zeroshot_top5': 45.264, 'zeroshot_top10': 52.26}
Training time 0:02:37


## Optimizer: AdamW (default), Loss Function: isogclr_new, Hyperparameters: learnable_temp, personalized_tau

In [19]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/isogclr_new_hyper_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --ita_type isogclr_new \
    --personalized_tau \
    --learnable_temp \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
Start training
Train Epoch: [0]  [  0/781]  eta: 1:54:42  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 0.2059  avg_image_tau: 0.0100  avg_text_tau: 0.0100  cur_eta: 0.0000  grad_tau_image: 4.7023  grad_tau_text: 4.2083  b_I: 5.0000  b_T: 3.2693  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 8.8122  data: 2.0326  max mem: 9478
Train Epoch: [0]  [ 50/781]  eta: 0:07:55  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 0.0836  avg_image_tau: 0.0100  avg_text_tau: 0.0100  cur_eta: 0.0000  grad_tau_image: 4.9973  grad_tau_text: 4.8390  b_I: 5.0000  b_T: 4.8171  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 0.5682  data: 0.2663  max mem: 9478
Train Epoch: [0]  [100/781]  eta: 0:06:42  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 0.0718  avg_image_tau: 0.0100  avg_text_tau: 0.0100  cur_eta: 0.0000  grad_tau_image: 4.

In [20]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/isogclr_new_hyper_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --ita_type isogclr_new \
    --personalized_tau \
    --learnable_temp \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30 \
    --evaluate --checkpoint './output/isogclr_new_hyper_cc3m_g0.8_e30/checkpoint_30.pth' \
    --zs_dataset imagenet --zs_datafolder ./datasets/imagenet/val

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
load checkpoint from ./output/isogclr_new_hyper_cc3m_g0.8_e30/checkpoint_30.pth
Start training
Computing features for evaluation...
Evaluation time 0:00:56
coco val: {'txt_r1': 13.84, 'txt_r5': 33.5, 'txt_r10': 45.2, 'txt_r_mean': 30.846666666666668, 'img_r1': 10.780119157103442, 'img_r5': 27.518093486344917, 'img_r10': 38.64608740853293, 'img_r_mean': 25.648100017327096, 'r_mean': 28.24738334199688}
zeroshot: {'zeroshot_top1': 26.492, 'zeroshot_top3': 39.334, 'zeroshot_top5': 44.948, 'zeroshot_top10': 52.256}
Training time 0:03:34


## Optimizer: AdamW (default), Loss Function: isogclr_new_v2

In [9]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/isogclr_new_v2_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --ita_type isogclr_new_v2 \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
Start training
Train Epoch: [0]  [  0/781]  eta: 0:58:06  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 0.2201  avg_image_tau: 0.0050  avg_text_tau: 0.0050  cur_eta: 0.0300  grad_tau_image: 5.1772  grad_tau_text: 4.2104  b_I: 0.0000  b_T: 0.0000  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 4.4635  data: 0.7638  max mem: 9454
Train Epoch: [0]  [ 50/781]  eta: 0:04:41  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 0.0885  avg_image_tau: 0.0050  avg_text_tau: 0.0050  cur_eta: 0.0300  grad_tau_image: 6.7971  grad_tau_text: 5.0938  b_I: 0.0000  b_T: 0.0000  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 0.3032  data: 0.0002  max mem: 9454
Train Epoch: [0]  [100/781]  eta: 0:03:54  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 0.0737  avg_image_tau: 0.0050  avg_text_tau: 0.0050  cur_eta: 0.0300  grad_tau_image: 6.

In [10]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/isogclr_new_v2_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --ita_type isogclr_new_v2 \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30 \
    --evaluate --checkpoint './output/isogclr_new_v2_cc3m_g0.8_e30/checkpoint_30.pth' \
    --zs_dataset imagenet --zs_datafolder ./datasets/imagenet/val

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
load checkpoint from ./output/isogclr_new_v2_cc3m_g0.8_e30/checkpoint_30.pth
Start training
Computing features for evaluation...
Evaluation time 0:00:16
coco val: {'txt_r1': 9.66, 'txt_r5': 25.5, 'txt_r10': 36.04, 'txt_r_mean': 23.73333333333333, 'img_r1': 7.225398856411692, 'img_r5': 20.212723419568956, 'img_r10': 29.493382382342357, 'img_r_mean': 18.977168219441, 'r_mean': 21.355250776387166}
zeroshot: {'zeroshot_top1': 19.84, 'zeroshot_top3': 32.526, 'zeroshot_top5': 38.786, 'zeroshot_top10': 47.12}
Training time 0:05:40
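
To compare the AdamW runs at a glance, the snippet below gathers TR@1, IR@1, and zero-shot top-1 from the evaluation outputs above and ranks the runs by the mean of the three, mirroring the Average column of the Benchmarks table. The numbers are copied from the logs (IR@1 rounded to two decimals); the dictionary keys are labels chosen for this sketch.

```python
# (TR@1, IR@1, zero-shot top-1) per run, copied from the evaluation outputs above.
adamw_runs = {
    "clip": (11.62, 9.16, 21.658),
    "cyclip": (14.1, 10.68, 25.906),
    "vicreg": (2.86, 2.16, 5.788),
    "onlineclr": (10.96, 8.64, 20.522),
    "isogclr_new": (13.88, 11.11, 26.46),
    "isogclr_new + personalized_tau + learnable_temp": (13.84, 10.78, 26.492),
    "isogclr_new_v2": (9.66, 7.23, 19.84),
}

def mean3(values):
    """Unweighted mean of the three metrics."""
    return sum(values) / len(values)

# Best run first.
ranking = sorted(adamw_runs, key=lambda name: mean3(adamw_runs[name]), reverse=True)
for name in ranking:
    tr1, ir1, top1 = adamw_runs[name]
    print(f"{name:<50} TR@1={tr1:6.2f}  IR@1={ir1:6.2f}  top1={top1:6.2f}  mean={mean3(adamw_runs[name]):6.2f}")
```

Each entry comes from a single run, so small gaps between neighbouring rows should be read with caution.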


## Optimizer: SGD, Loss Function: SogCLR

In [17]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/sgd_sogclr_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt sgd \
    --momentum 0.9 \
    --ita_type sogclr \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
Start training
Train Epoch: [0]  [  0/781]  eta: 0:54:06  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 0.2059  avg_image_tau: 0.0000  avg_text_tau: 0.0000  cur_eta: 0.0000  grad_tau_image: 0.0000  grad_tau_text: 0.0000  b_I: 0.0000  b_T: 0.0000  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 4.1569  data: 0.8105  max mem: 9406
Train Epoch: [0]  [ 50/781]  eta: 0:04:31  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 0.2283  avg_image_tau: 0.0000  avg_text_tau: 0.0000  cur_eta: 0.0000  grad_tau_image: 0.0000  grad_tau_text: 0.0000  b_I: 0.0000  b_T: 0.0000  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 0.2957  data: 0.0001  max mem: 9406
Train Epoch: [0]  [100/781]  eta: 0:03:47  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 0.1884  avg_image_tau: 0.0000  avg_text_tau: 0.0000  cur_eta: 0.0000  grad_tau_image: 0.

In [18]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/sgd_sogclr_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt sgd \
    --momentum 0.9 \
    --ita_type sogclr \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30 \
    --evaluate --checkpoint './output/sgd_sogclr_cc3m_g0.8_e30/checkpoint_30.pth' \
    --zs_dataset imagenet --zs_datafolder ./datasets/imagenet/val

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
load checkpoint from ./output/sgd_sogclr_cc3m_g0.8_e30/checkpoint_30.pth
Start training
Computing features for evaluation...
Evaluation time 0:00:33
coco val: {'txt_r1': 1.56, 'txt_r5': 5.78, 'txt_r10': 10.34, 'txt_r_mean': 5.8933333333333335, 'img_r1': 1.0036386900715741, 'img_r5': 4.286456875524811, 'img_r10': 7.825182934143708, 'img_r_mean': 4.371759499913364, 'r_mean': 5.132546416623349}
zeroshot: {'zeroshot_top1': 2.872, 'zeroshot_top3': 6.72, 'zeroshot_top5': 9.468, 'zeroshot_top10': 14.442}
Training time 0:08:48
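
The only change from the AdamW commands in this run is `--opt sgd --momentum 0.9`. As a reminder of what that update rule does, here is a minimal sketch of SGD with momentum in the convention PyTorch uses (the velocity accumulates raw gradients and the learning rate is applied when the parameters move). How the script builds its optimizer, including any weight decay, is not shown in this tutorial, so `sgd_momentum_step` is a hypothetical stand-in rather than the script's code.

```python
def sgd_momentum_step(params, grads, velocity, lr, momentum=0.9):
    """One SGD-with-momentum update; returns new parameters and velocity."""
    new_velocity = [momentum * v + g for v, g in zip(velocity, grads)]
    new_params = [p - lr * v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity

# Two steps on f(x) = x^2, whose gradient is 2x.
params, velocity = [1.0], [0.0]
for _ in range(2):
    grads = [2.0 * p for p in params]
    params, velocity = sgd_momentum_step(params, grads, velocity, lr=0.1)
print(params, velocity)
```

Unlike AdamW, this rule applies the same step size to every parameter, which is one reason learning rates tuned for AdamW often need retuning for SGD.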


## Optimizer: SGD, Loss Function: CLIP

In [19]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/sgd_clip_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt sgd \
    --momentum 0.9 \
    --ita_type clip \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
Start training
Train Epoch: [0]  [  0/781]  eta: 0:52:53  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 11.8487  avg_image_tau: 0.0100  avg_text_tau: 0.0100  cur_eta: 0.0000  grad_tau_image: 0.0000  grad_tau_text: 0.0000  b_I: 0.0000  b_T: 0.0000  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 4.0631  data: 0.7498  max mem: 9358
Train Epoch: [0]  [ 50/781]  eta: 0:04:27  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 7.3330  avg_image_tau: 0.0100  avg_text_tau: 0.0100  cur_eta: 0.0000  grad_tau_image: 0.0000  grad_tau_text: 0.0000  b_I: 0.0000  b_T: 0.0000  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 0.2924  data: 0.0001  max mem: 9358
Train Epoch: [0]  [100/781]  eta: 0:03:44  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 6.9571  avg_image_tau: 0.0100  avg_text_tau: 0.0100  cur_eta: 0.0000  grad_tau_image: 0

In [20]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/sgd_clip_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt sgd \
    --momentum 0.9 \
    --ita_type clip \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30 \
    --evaluate --checkpoint './output/sgd_clip_cc3m_g0.8_e30/checkpoint_30.pth' \
    --zs_dataset imagenet --zs_datafolder ./datasets/imagenet/val

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
load checkpoint from ./output/sgd_clip_cc3m_g0.8_e30/checkpoint_30.pth
Start training
Computing features for evaluation...
Evaluation time 0:00:57
coco val: {'txt_r1': 10.3, 'txt_r5': 27.64, 'txt_r10': 38.16, 'txt_r_mean': 25.366666666666664, 'img_r1': 7.001479467391739, 'img_r5': 20.468631292734617, 'img_r10': 30.253108880802912, 'img_r_mean': 19.24107321364309, 'r_mean': 22.303869940154875}
zeroshot: {'zeroshot_top1': 17.006, 'zeroshot_top3': 29.352, 'zeroshot_top5': 35.214, 'zeroshot_top10': 43.664}
Training time 0:11:20


## Optimizer: SGD, Loss Function: cyclip

In [21]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/sgd_cyclip_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt sgd \
    --momentum 0.9 \
    --ita_type cyclip \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
Start training
Train Epoch: [0]  [  0/781]  eta: 0:53:12  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 20.7273  avg_image_tau: 0.0100  avg_text_tau: 0.0100  cur_eta: 0.0000  grad_tau_image: 0.0000  grad_tau_text: 0.0000  b_I: 0.0000  b_T: 0.0000  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 4.0873  data: 0.7327  max mem: 9358
Train Epoch: [0]  [ 50/781]  eta: 0:04:29  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 8.9171  avg_image_tau: 0.0100  avg_text_tau: 0.0100  cur_eta: 0.0000  grad_tau_image: 0.0000  grad_tau_text: 0.0000  b_I: 0.0000  b_T: 0.0000  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 0.2955  data: 0.0001  max mem: 9358
Train Epoch: [0]  [100/781]  eta: 0:03:46  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 7.3483  avg_image_tau: 0.0100  avg_text_tau: 0.0100  cur_eta: 0.0000  grad_tau_image: 0

In [22]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/sgd_cyclip_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt sgd \
    --momentum 0.9 \
    --ita_type cyclip \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30 \
    --evaluate --checkpoint './output/sgd_cyclip_cc3m_g0.8_e30/checkpoint_30.pth' \
    --zs_dataset imagenet --zs_datafolder ./datasets/imagenet/val

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
load checkpoint from ./output/sgd_cyclip_cc3m_g0.8_e30/checkpoint_30.pth
Start training
Computing features for evaluation...
Evaluation time 0:00:31
coco val: {'txt_r1': 10.38, 'txt_r5': 27.54, 'txt_r10': 38.94, 'txt_r_mean': 25.62, 'img_r1': 7.313367187812387, 'img_r5': 20.920468631292735, 'img_r10': 30.64496781158783, 'img_r_mean': 19.626267876897654, 'r_mean': 22.623133938448827}
zeroshot: {'zeroshot_top1': 16.858, 'zeroshot_top3': 29.238, 'zeroshot_top5': 35.094, 'zeroshot_top10': 43.78}
Training time 0:08:48


## Optimizer: SGD, Loss Function: vicreg

In [23]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/sgd_vicreg_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt sgd \
    --momentum 0.9 \
    --ita_type vicreg \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
Start training
Train Epoch: [0]  [  0/781]  eta: 0:53:31  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 24.4050  avg_image_tau: 0.0000  avg_text_tau: 0.0000  cur_eta: 0.0000  grad_tau_image: 0.0000  grad_tau_text: 0.0000  b_I: 0.0000  b_T: 0.0000  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 4.1125  data: 0.7461  max mem: 9358
Train Epoch: [0]  [ 50/781]  eta: 0:04:28  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 23.7173  avg_image_tau: 0.0000  avg_text_tau: 0.0000  cur_eta: 0.0000  grad_tau_image: 0.0000  grad_tau_text: 0.0000  b_I: 0.0000  b_T: 0.0000  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 0.2927  data: 0.0002  max mem: 9358
Train Epoch: [0]  [100/781]  eta: 0:03:44  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 23.4470  avg_image_tau: 0.0000  avg_text_tau: 0.0000  cur_eta: 0.0000  grad_tau_image:
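
The `eta` value on the first log line is consistent with the per-iteration `time` multiplied by the number of iterations left in the epoch. The sketch below reproduces that arithmetic with the logged values; that the logger computes the estimate exactly this way is an assumption, and `estimate_eta` is a hypothetical helper.

```python
import datetime

def estimate_eta(seconds_per_iter, current_iter, total_iters):
    """Remaining time as H:MM:SS, assuming a constant cost per iteration."""
    remaining_iters = total_iters - current_iter
    return str(datetime.timedelta(seconds=int(seconds_per_iter * remaining_iters)))

# First iteration of the run above: time: 4.1125, step 0 of 781.
print(estimate_eta(4.1125, 0, 781))  # 0:53:31
```

The first iteration is much slower than the later ones (about 4.1 s against about 0.29 s), which would explain why the estimate drops to a few minutes by iteration 50.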

In [24]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/sgd_vicreg_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt sgd \
    --momentum 0.9 \
    --ita_type vicreg \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30 \
    --evaluate --checkpoint './output/sgd_vicreg_cc3m_g0.8_e30/checkpoint_30.pth' \
    --zs_dataset imagenet --zs_datafolder ./datasets/imagenet/val

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
load checkpoint from ./output/sgd_vicreg_cc3m_g0.8_e30/checkpoint_30.pth
Start training
Computing features for evaluation...
Evaluation time 0:00:24
coco val: {'txt_r1': 2.0, 'txt_r5': 7.42, 'txt_r10': 12.36, 'txt_r_mean': 7.260000000000001, 'img_r1': 1.595425646767164, 'img_r5': 5.71394298052701, 'img_r10': 9.844455995841496, 'img_r_mean': 5.7179415410452235, 'r_mean': 6.488970770522612}
zeroshot: {'zeroshot_top1': 2.432, 'zeroshot_top3': 6.106, 'zeroshot_top5': 9.104, 'zeroshot_top10': 15.374}
Training time 0:08:28


## Optimizer: SGD, Loss Function: onlineclr

In [25]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/sgd_onlineclr_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt sgd \
    --momentum 0.9 \
    --ita_type onlineclr \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
Start training
Train Epoch: [0]  [  0/781]  eta: 0:53:57  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 0.2749  avg_image_tau: 0.0000  avg_text_tau: 0.0000  cur_eta: 0.0000  grad_tau_image: 0.0000  grad_tau_text: 0.0000  b_I: 0.0000  b_T: 0.0000  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 4.1456  data: 0.7682  max mem: 9358
Train Epoch: [0]  [ 50/781]  eta: 0:04:32  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 0.2888  avg_image_tau: 0.0000  avg_text_tau: 0.0000  cur_eta: 0.0000  grad_tau_image: 0.0000  grad_tau_text: 0.0000  b_I: 0.0000  b_T: 0.0000  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 0.2973  data: 0.0001  max mem: 9358
Train Epoch: [0]  [100/781]  eta: 0:03:47  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 0.2420  avg_image_tau: 0.0000  avg_text_tau: 0.0000  cur_eta: 0.0000  grad_tau_image: 0.
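
Training log lines follow a `key: value` layout, so metrics such as `loss_ita` can be extracted for plotting. This is a minimal sketch that assumes the layout shown above; `parse_train_line` is a hypothetical helper, not a function from the repository. Fields whose value is not a complete number, such as `eta` or a value cut off at the end of a truncated line, are skipped.

```python
import re

HEADER = re.compile(r"Train Epoch: \[(\d+)\]\s+\[\s*(\d+)/(\d+)\]")
# A numeric field must be followed by whitespace or the end of the line.
FIELD = re.compile(r"(\w+): (-?\d+(?:\.\d+)?)(?=\s|$)")

def parse_train_line(line):
    """Turn one 'Train Epoch' line into a dict of numbers, or None if it is not one."""
    header = HEADER.search(line)
    if header is None:
        return None
    record = {
        "epoch": int(header.group(1)),
        "step": int(header.group(2)),
        "steps_per_epoch": int(header.group(3)),
    }
    for key, value in FIELD.findall(line):
        record[key] = float(value)
    return record

# Shortened copy of the second log line above (middle fields omitted for brevity).
sample = ("Train Epoch: [0]  [ 50/781]  eta: 0:04:32  lr: 0.000010  "
          "lr_temp_net: 0.00000100  loss_ita: 0.2888  time: 0.2973  data: 0.0001  max mem: 9358")
record = parse_train_line(sample)
print(record["step"], record["loss_ita"])
```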

In [26]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/sgd_onlineclr_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt sgd \
    --momentum 0.9 \
    --ita_type onlineclr \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30 \
    --evaluate --checkpoint './output/sgd_onlineclr_cc3m_g0.8_e30/checkpoint_30.pth' \
    --zs_dataset imagenet --zs_datafolder ./datasets/imagenet/val

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
load checkpoint from ./output/sgd_onlineclr_cc3m_g0.8_e30/checkpoint_30.pth
Start training
Computing features for evaluation...
Evaluation time 0:00:20
coco val: {'txt_r1': 0.74, 'txt_r5': 3.08, 'txt_r10': 5.4, 'txt_r_mean': 3.0733333333333337, 'img_r1': 0.5557999120316686, 'img_r5': 2.463113279219481, 'img_r10': 4.454396417289776, 'img_r_mean': 2.491103202846975, 'r_mean': 2.7822182680901544}
zeroshot: {'zeroshot_top1': 1.5, 'zeroshot_top3': 3.772, 'zeroshot_top5': 5.642, 'zeroshot_top10': 9.286}
Training time 0:08:27


## Optimizer: SGD, Loss Function: isogclr_new

In [11]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/sgd_isogclr_new_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt sgd \
    --momentum 0.9 \
    --ita_type isogclr_new \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
Start training
Train Epoch: [0]  [  0/781]  eta: 1:06:13  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 0.2059  avg_image_tau: 0.0100  avg_text_tau: 0.0100  cur_eta: 0.0000  grad_tau_image: 4.7022  grad_tau_text: 4.2080  b_I: 5.0000  b_T: 3.2695  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 5.0871  data: 0.7972  max mem: 9454
Train Epoch: [0]  [ 50/781]  eta: 0:04:44  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 0.2282  avg_image_tau: 0.0100  avg_text_tau: 0.0100  cur_eta: 0.0000  grad_tau_image: 4.6586  grad_tau_text: 3.7451  b_I: 5.0000  b_T: 3.1973  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 0.2957  data: 0.0001  max mem: 9454
Train Epoch: [0]  [100/781]  eta: 0:03:53  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 0.1884  avg_image_tau: 0.0100  avg_text_tau: 0.0100  cur_eta: 0.0000  grad_tau_image: 4.
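
The `781` in the progress counter matches 100000 training pairs split into batches of 128 with the final partial batch dropped. The batch size of 128 is inferred from these two logged numbers and is an assumption; it is not set in the command above.

```python
num_pairs = 100_000      # "len of train_dataset" in the log
batch_size = 128         # assumed; inferred from the 781 steps per epoch
epochs = 30              # --epochs

steps_per_epoch = num_pairs // batch_size   # the partial last batch is dropped
total_steps = steps_per_epoch * epochs
unused_pairs = num_pairs - steps_per_epoch * batch_size

print(steps_per_epoch, total_steps, unused_pairs)  # 781 23430 32
```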

In [12]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/sgd_isogclr_new_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt sgd \
    --momentum 0.9 \
    --ita_type isogclr_new \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30 \
    --evaluate --checkpoint './output/sgd_isogclr_new_cc3m_g0.8_e30/checkpoint_30.pth' \
    --zs_dataset imagenet --zs_datafolder ./datasets/imagenet/val

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
load checkpoint from ./output/sgd_isogclr_new_cc3m_g0.8_e30/checkpoint_30.pth
Start training
Computing features for evaluation...
Evaluation time 0:00:18
coco val: {'txt_r1': 1.54, 'txt_r5': 5.54, 'txt_r10': 9.78, 'txt_r_mean': 5.62, 'img_r1': 0.9596545243712263, 'img_r5': 4.1585029389419805, 'img_r10': 7.585269303050902, 'img_r_mean': 4.234475588788036, 'r_mean': 4.927237794394018}
zeroshot: {'zeroshot_top1': 2.798, 'zeroshot_top3': 6.524, 'zeroshot_top5': 9.264, 'zeroshot_top10': 14.184}
Training time 0:04:24


## Optimizer: SGD, Loss Function: isogclr_new_v2

In [13]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/sgd_isogclr_new_v2_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt sgd \
    --momentum 0.9 \
    --ita_type isogclr_new_v2 \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
Start training
Train Epoch: [0]  [  0/781]  eta: 1:37:52  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 0.2201  avg_image_tau: 0.0050  avg_text_tau: 0.0050  cur_eta: 0.0300  grad_tau_image: 5.1772  grad_tau_text: 4.2097  b_I: 0.0000  b_T: 0.0000  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 7.5194  data: 0.7408  max mem: 9454
Train Epoch: [0]  [ 50/781]  eta: 0:05:21  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 0.2389  avg_image_tau: 0.0050  avg_text_tau: 0.0050  cur_eta: 0.0300  grad_tau_image: 5.0848  grad_tau_text: 3.7581  b_I: 0.0000  b_T: 0.0000  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 0.2960  data: 0.0001  max mem: 9454
Train Epoch: [0]  [100/781]  eta: 0:04:11  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 0.1984  avg_image_tau: 0.0050  avg_text_tau: 0.0050  cur_eta: 0.0300  grad_tau_image: 5.

In [14]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/sgd_isogclr_new_v2_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt sgd \
    --momentum 0.9 \
    --ita_type isogclr_new_v2 \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30 \
    --evaluate --checkpoint './output/sgd_isogclr_new_v2_cc3m_g0.8_e30/checkpoint_30.pth' \
    --zs_dataset imagenet --zs_datafolder ./datasets/imagenet/val

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
load checkpoint from ./output/sgd_isogclr_new_v2_cc3m_g0.8_e30/checkpoint_30.pth
Start training
Computing features for evaluation...
Evaluation time 0:00:24
coco val: {'txt_r1': 0.92, 'txt_r5': 3.36, 'txt_r10': 6.04, 'txt_r_mean': 3.44, 'img_r1': 0.7477308169059139, 'img_r5': 2.9189491782958137, 'img_r10': 5.478027909952417, 'img_r_mean': 3.048235968384715, 'r_mean': 3.2441179841923575}
zeroshot: {'zeroshot_top1': 1.824, 'zeroshot_top3': 4.636, 'zeroshot_top5': 6.78, 'zeroshot_top10': 10.874}
Training time 0:05:45


## Optimizer: Adam, Loss Function: SogCLR

In [1]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/adam_sogclr_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt adam \
    --ita_type sogclr \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
Start training
Train Epoch: [0]  [  0/781]  eta: 0:54:54  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 0.2059  avg_image_tau: 0.0000  avg_text_tau: 0.0000  cur_eta: 0.0000  grad_tau_image: 0.0000  grad_tau_text: 0.0000  b_I: 0.0000  b_T: 0.0000  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 4.2188  data: 0.7632  max mem: 9406
Train Epoch: [0]  [ 50/781]  eta: 0:04:37  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 0.1024  avg_image_tau: 0.0000  avg_text_tau: 0.0000  cur_eta: 0.0000  grad_tau_image: 0.0000  grad_tau_text: 0.0000  b_I: 0.0000  b_T: 0.0000  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 0.3034  data: 0.0001  max mem: 9406
Train Epoch: [0]  [100/781]  eta: 0:03:53  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 0.0972  avg_image_tau: 0.0000  avg_text_tau: 0.0000  cur_eta: 0.0000  grad_tau_image: 0.

In [2]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/adam_sogclr_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt adam \
    --ita_type sogclr \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30 \
    --evaluate --checkpoint './output/adam_sogclr_cc3m_g0.8_e30/checkpoint_30.pth' \
    --zs_dataset imagenet --zs_datafolder ./datasets/imagenet/val

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
load checkpoint from ./output/adam_sogclr_cc3m_g0.8_e30/checkpoint_30.pth
Start training
Computing features for evaluation...
Evaluation time 0:00:37
coco val: {'txt_r1': 0.1, 'txt_r5': 0.5, 'txt_r10': 0.94, 'txt_r_mean': 0.5133333333333333, 'img_r1': 0.10396257347354952, 'img_r5': 0.5038186252948939, 'img_r10': 0.9356631612619457, 'img_r_mean': 0.5144814533434631, 'r_mean': 0.5139073933383982}
zeroshot: {'zeroshot_top1': 0.23, 'zeroshot_top3': 0.664, 'zeroshot_top5': 1.144, 'zeroshot_top10': 2.216}
Training time 0:09:18


## Optimizer: Adam, Loss Function: CLIP

In [1]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/adam_clip_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt adam \
    --ita_type clip \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
Start training
Train Epoch: [0]  [  0/781]  eta: 0:52:49  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 11.8487  avg_image_tau: 0.0100  avg_text_tau: 0.0100  cur_eta: 0.0000  grad_tau_image: 0.0000  grad_tau_text: 0.0000  b_I: 0.0000  b_T: 0.0000  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 4.0580  data: 0.7402  max mem: 9358
Train Epoch: [0]  [ 50/781]  eta: 0:04:32  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 7.2009  avg_image_tau: 0.0100  avg_text_tau: 0.0100  cur_eta: 0.0000  grad_tau_image: 0.0000  grad_tau_text: 0.0000  b_I: 0.0000  b_T: 0.0000  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 0.2999  data: 0.0001  max mem: 9358
Train Epoch: [0]  [100/781]  eta: 0:03:49  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 6.5192  avg_image_tau: 0.0100  avg_text_tau: 0.0100  cur_eta: 0.0000  grad_tau_image: 0

In [2]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/adam_clip_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt adam \
    --ita_type clip \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30 \
    --evaluate --checkpoint './output/adam_clip_cc3m_g0.8_e30/checkpoint_30.pth' \
    --zs_dataset imagenet --zs_datafolder ./datasets/imagenet/val

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
load checkpoint from ./output/adam_clip_cc3m_g0.8_e30/checkpoint_30.pth
Start training
Computing features for evaluation...
Evaluation time 0:00:17
coco val: {'txt_r1': 3.34, 'txt_r5': 11.36, 'txt_r10': 18.06, 'txt_r_mean': 10.92, 'img_r1': 3.0349074333240034, 'img_r5': 9.840457435323284, 'img_r10': 15.64236874725099, 'img_r_mean': 9.505911205299425, 'r_mean': 10.212955602649712}
zeroshot: {'zeroshot_top1': 4.712, 'zeroshot_top3': 9.738, 'zeroshot_top5': 13.04, 'zeroshot_top10': 18.416}
Training time 0:02:49


## Optimizer: Adam, Loss Function: cyclip

In [3]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/adam_cyclip_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt adam \
    --ita_type cyclip \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
Start training
Train Epoch: [0]  [  0/781]  eta: 0:53:48  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 20.7276  avg_image_tau: 0.0100  avg_text_tau: 0.0100  cur_eta: 0.0000  grad_tau_image: 0.0000  grad_tau_text: 0.0000  b_I: 0.0000  b_T: 0.0000  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 4.1336  data: 0.7412  max mem: 9358
Train Epoch: [0]  [ 50/781]  eta: 0:04:36  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 8.9034  avg_image_tau: 0.0100  avg_text_tau: 0.0100  cur_eta: 0.0000  grad_tau_image: 0.0000  grad_tau_text: 0.0000  b_I: 0.0000  b_T: 0.0000  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 0.3052  data: 0.0001  max mem: 9358
Train Epoch: [0]  [100/781]  eta: 0:03:52  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 6.5416  avg_image_tau: 0.0100  avg_text_tau: 0.0100  cur_eta: 0.0000  grad_tau_image: 0

In [4]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/adam_cyclip_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt adam \
    --ita_type cyclip \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30 \
    --evaluate --checkpoint './output/adam_cyclip_cc3m_g0.8_e30/checkpoint_30.pth' \
    --zs_dataset imagenet --zs_datafolder ./datasets/imagenet/val

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
load checkpoint from ./output/adam_cyclip_cc3m_g0.8_e30/checkpoint_30.pth
Start training
Computing features for evaluation...
Evaluation time 0:00:16
coco val: {'txt_r1': 3.92, 'txt_r5': 12.6, 'txt_r10': 19.18, 'txt_r_mean': 11.9, 'img_r1': 3.2548282618257427, 'img_r5': 10.41625014994602, 'img_r10': 16.36210964052941, 'img_r_mean': 10.011062684100391, 'r_mean': 10.955531342050197}
zeroshot: {'zeroshot_top1': 4.044, 'zeroshot_top3': 9.064, 'zeroshot_top5': 12.536, 'zeroshot_top10': 18.496}
Training time 0:08:48


## Optimizer: Adam, Loss Function: vicreg

In [5]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/adam_vicreg_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt adam \
    --ita_type vicreg \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
Start training
Train Epoch: [0]  [  0/781]  eta: 0:55:41  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 24.4050  avg_image_tau: 0.0000  avg_text_tau: 0.0000  cur_eta: 0.0000  grad_tau_image: 0.0000  grad_tau_text: 0.0000  b_I: 0.0000  b_T: 0.0000  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 4.2782  data: 0.8481  max mem: 9358
Train Epoch: [0]  [ 50/781]  eta: 0:04:36  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 22.7305  avg_image_tau: 0.0000  avg_text_tau: 0.0000  cur_eta: 0.0000  grad_tau_image: 0.0000  grad_tau_text: 0.0000  b_I: 0.0000  b_T: 0.0000  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 0.3008  data: 0.0002  max mem: 9358
Train Epoch: [0]  [100/781]  eta: 0:03:51  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 22.5984  avg_image_tau: 0.0000  avg_text_tau: 0.0000  cur_eta: 0.0000  grad_tau_image:

In [6]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/adam_vicreg_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt adam \
    --ita_type vicreg \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30 \
    --evaluate --checkpoint './output/adam_vicreg_cc3m_g0.8_e30/checkpoint_30.pth' \
    --zs_dataset imagenet --zs_datafolder ./datasets/imagenet/val

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
load checkpoint from ./output/adam_vicreg_cc3m_g0.8_e30/checkpoint_30.pth
Start training
Computing features for evaluation...
Evaluation time 0:00:16
coco val: {'txt_r1': 0.66, 'txt_r5': 2.54, 'txt_r10': 4.5, 'txt_r_mean': 2.566666666666667, 'img_r1': 0.631772561877724, 'img_r5': 2.491103202846975, 'img_r10': 4.402415130553001, 'img_r_mean': 2.5084302984259, 'r_mean': 2.5375484825462835}
zeroshot: {'zeroshot_top1': 1.284, 'zeroshot_top3': 3.43, 'zeroshot_top5': 5.466, 'zeroshot_top10': 9.416}
Training time 0:03:31


## Optimizer: Adam, Loss Function: onlineclr

In [7]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/adam_onlineclr_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt adam \
    --ita_type onlineclr \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
Start training
Train Epoch: [0]  [  0/781]  eta: 0:54:49  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 0.2747  avg_image_tau: 0.0000  avg_text_tau: 0.0000  cur_eta: 0.0000  grad_tau_image: 0.0000  grad_tau_text: 0.0000  b_I: 0.0000  b_T: 0.0000  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 4.2121  data: 0.7570  max mem: 9358
Train Epoch: [0]  [ 50/781]  eta: 0:04:38  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 0.1167  avg_image_tau: 0.0000  avg_text_tau: 0.0000  cur_eta: 0.0000  grad_tau_image: 0.0000  grad_tau_text: 0.0000  b_I: 0.0000  b_T: 0.0000  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 0.3043  data: 0.0001  max mem: 9358
Train Epoch: [0]  [100/781]  eta: 0:03:53  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 0.1531  avg_image_tau: 0.0000  avg_text_tau: 0.0000  cur_eta: 0.0000  grad_tau_image: 0.

In [8]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/adam_onlineclr_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt adam \
    --ita_type onlineclr \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30 \
    --evaluate --checkpoint './output/adam_onlineclr_cc3m_g0.8_e30/checkpoint_30.pth' \
    --zs_dataset imagenet --zs_datafolder ./datasets/imagenet/val

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
load checkpoint from ./output/adam_onlineclr_cc3m_g0.8_e30/checkpoint_30.pth
Start training
Computing features for evaluation...
Evaluation time 0:00:15
coco val: {'txt_r1': 0.02, 'txt_r5': 0.04, 'txt_r10': 0.06, 'txt_r_mean': 0.04, 'img_r1': 0.019992802591067216, 'img_r5': 0.09996401295533608, 'img_r10': 0.19992802591067216, 'img_r_mean': 0.10662828048569183, 'r_mean': 0.07331414024284591}
zeroshot: {'zeroshot_top1': 0.1, 'zeroshot_top3': 0.3, 'zeroshot_top5': 0.5, 'zeroshot_top10': 1.0}
Training time 0:02:50
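
The zero-shot numbers for this run equal what uniform random guessing scores on a 1000-class problem (top-k accuracy of k/1000), and a `txt_r1` of 0.02 corresponds to a single hit among the 5000 validation images. This may indicate that training collapsed under these hyperparameters. The sketch below flags such runs; the helper names and the tolerance are assumptions made for illustration.

```python
def chance_topk(k, num_classes=1000):
    """Top-k accuracy, in percent, of uniform random guessing."""
    return 100.0 * k / num_classes

def is_chance_level(zeroshot, num_classes=1000, tol=0.05):
    """True if every reported top-k accuracy is within tol of random guessing."""
    for key, value in zeroshot.items():
        k = int(key.replace("zeroshot_top", ""))
        if abs(value - chance_topk(k, num_classes)) > tol:
            return False
    return True

# Values copied from the Adam/onlineclr and Adam/CLIP evaluation outputs above.
adam_onlineclr = {'zeroshot_top1': 0.1, 'zeroshot_top3': 0.3, 'zeroshot_top5': 0.5, 'zeroshot_top10': 1.0}
adam_clip = {'zeroshot_top1': 4.712, 'zeroshot_top3': 9.738, 'zeroshot_top5': 13.04, 'zeroshot_top10': 18.416}

print(is_chance_level(adam_onlineclr))  # True
print(is_chance_level(adam_clip))       # False
```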


## Optimizer: Adam, Loss Function: isogclr_new

In [15]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/adam_isogclr_new_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt adam \
    --ita_type isogclr_new \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
Start training
Train Epoch: [0]  [  0/781]  eta: 2:33:46  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 0.2059  avg_image_tau: 0.0100  avg_text_tau: 0.0100  cur_eta: 0.0000  grad_tau_image: 4.7022  grad_tau_text: 4.2080  b_I: 5.0000  b_T: 3.2695  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 11.8135  data: 1.7720  max mem: 9454
Train Epoch: [0]  [ 50/781]  eta: 0:10:02  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 0.1023  avg_image_tau: 0.0100  avg_text_tau: 0.0100  cur_eta: 0.0000  grad_tau_image: 4.9888  grad_tau_text: 4.8112  b_I: 5.0000  b_T: 3.7519  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 0.6211  data: 0.3173  max mem: 9454
Train Epoch: [0]  [100/781]  eta: 0:07:49  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 0.0974  avg_image_tau: 0.0100  avg_text_tau: 0.0100  cur_eta: 0.0000  grad_tau_image: 5

In [16]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/adam_isogclr_new_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt adam \
    --ita_type isogclr_new \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30 \
    --evaluate --checkpoint './output/adam_isogclr_new_cc3m_g0.8_e30/checkpoint_30.pth' \
    --zs_dataset imagenet --zs_datafolder ./datasets/imagenet/val

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
load checkpoint from ./output/adam_isogclr_new_cc3m_g0.8_e30/checkpoint_30.pth
Start training
Computing features for evaluation...
Evaluation time 0:00:42
coco val: {'txt_r1': 0.04, 'txt_r5': 0.52, 'txt_r10': 1.0, 'txt_r_mean': 0.52, 'img_r1': 0.11595825502818985, 'img_r5': 0.5318085489223879, 'img_r10': 0.9756487664440802, 'img_r_mean': 0.5411385234648859, 'r_mean': 0.5305692617324429}
zeroshot: {'zeroshot_top1': 0.24, 'zeroshot_top3': 0.746, 'zeroshot_top5': 1.206, 'zeroshot_top10': 2.274}
Training time 0:03:19


## Optimizer: Adam, Loss Function: isogclr_new_v2

In [17]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/adam_isogclr_new_v2_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt adam \
    --ita_type isogclr_new_v2 \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
Start training
Train Epoch: [0]  [  0/781]  eta: 0:55:03  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 0.2201  avg_image_tau: 0.0050  avg_text_tau: 0.0050  cur_eta: 0.0300  grad_tau_image: 5.1772  grad_tau_text: 4.2097  b_I: 0.0000  b_T: 0.0000  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 4.2299  data: 0.8231  max mem: 9454
Train Epoch: [0]  [ 50/781]  eta: 0:04:38  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 0.1079  avg_image_tau: 0.0050  avg_text_tau: 0.0050  cur_eta: 0.0300  grad_tau_image: 6.7579  grad_tau_text: 4.9895  b_I: 0.0000  b_T: 0.0000  v: 0.0000  lamda: 0.0000  weights_image_pos: 0.0000  weights_text_pos: 0.0000  time: 0.3046  data: 0.0001  max mem: 9454
Train Epoch: [0]  [100/781]  eta: 0:03:53  lr: 0.000010  lr_temp_net: 0.00000100  loss_ita: 0.1000  avg_image_tau: 0.0050  avg_text_tau: 0.0050  cur_eta: 0.0300  grad_tau_image: 7.

In [18]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/adam_isogclr_new_v2_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt adam \
    --ita_type isogclr_new_v2 \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30 \
    --evaluate --checkpoint './output/adam_isogclr_new_v2_cc3m_g0.8_e30/checkpoint_30.pth' \
    --zs_dataset imagenet --zs_datafolder ./datasets/imagenet/val

Creating retrieval dataset
len of train_dataset: 100000
len of coco val: 5000
Creating model
load checkpoint from ./output/adam_isogclr_new_v2_cc3m_g0.8_e30/checkpoint_30.pth
Start training
Computing features for evaluation...
Evaluation time 0:00:16
coco val: {'txt_r1': 0.02, 'txt_r5': 0.04, 'txt_r10': 0.06, 'txt_r_mean': 0.04, 'img_r1': 0.019992802591067216, 'img_r5': 0.09996401295533608, 'img_r10': 0.19992802591067216, 'img_r_mean': 0.10662828048569183, 'r_mean': 0.07331414024284591}
zeroshot: {'zeroshot_top1': 0.1, 'zeroshot_top3': 0.3, 'zeroshot_top5': 0.5, 'zeroshot_top10': 1.0}
Training time 0:05:11
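
To compare runs at a glance, the reported `r_mean` and `zeroshot_top1` values can be collected and sorted. The numbers in the sketch below are copied from the twelve evaluation outputs above, from SGD with cyclip through Adam with isogclr_new_v2, and rounded to two decimals. The tuple layout is only illustrative.

```python
# (optimizer, ita_type, r_mean, zeroshot_top1), rounded from the outputs above.
runs = [
    ("sgd", "cyclip", 22.62, 16.86),
    ("sgd", "vicreg", 6.49, 2.43),
    ("sgd", "onlineclr", 2.78, 1.50),
    ("sgd", "isogclr_new", 4.93, 2.80),
    ("sgd", "isogclr_new_v2", 3.24, 1.82),
    ("adam", "sogclr", 0.51, 0.23),
    ("adam", "clip", 10.21, 4.71),
    ("adam", "cyclip", 10.96, 4.04),
    ("adam", "vicreg", 2.54, 1.28),
    ("adam", "onlineclr", 0.07, 0.10),
    ("adam", "isogclr_new", 0.53, 0.24),
    ("adam", "isogclr_new_v2", 0.07, 0.10),
]

# Rank by mean retrieval recall, best first.
ranked = sorted(runs, key=lambda run: run[2], reverse=True)
print(f"{'opt':<6}{'ita_type':<16}{'r_mean':>8}{'top1':>8}")
for opt, ita_type, r_mean, top1 in ranked:
    print(f"{opt:<6}{ita_type:<16}{r_mean:>8.2f}{top1:>8.2f}")

# Losses that appear with both optimizers in the list above.
by_key = {(opt, ita): r_mean for opt, ita, r_mean, _ in runs}
shared = sorted({ita for _, ita, _, _ in runs
                 if ("sgd", ita) in by_key and ("adam", ita) in by_key})
sgd_wins = [ita for ita in shared if by_key[("sgd", ita)] > by_key[("adam", ita)]]
print(f"SGD has the higher r_mean for {len(sgd_wins)} of {len(shared)} shared losses")
```

With these settings, every loss listed with both optimizers reaches a higher `r_mean` with SGD than with Adam.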


## Optimizer: RAdam, Loss Function: SogCLR

In [None]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/radam_sogclr_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt radam \
    --ita_type sogclr \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30

In [None]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/radam_sogclr_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt radam \
    --ita_type sogclr \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 --sched cosine \
    --no-distributed \
    --epochs 30 \
    --evaluate --checkpoint './output/radam_sogclr_cc3m_g0.8_e30/checkpoint_30.pth' \
    --zs_dataset imagenet --zs_datafolder ./datasets/imagenet/val
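
Each training cell is followed by an evaluation cell that repeats the same flags and appends `--evaluate`, `--checkpoint`, and the zero-shot options. The sketch below builds both argument lists from one place so the two cannot drift apart. `build_commands` is a hypothetical helper; the flags are the ones used in the cells of this notebook, and `--momentum 0.9` is added only for SGD, as in the cells above.

```python
import shlex

def build_commands(opt, ita_type, epochs=30, gamma=0.8):
    """Return (train_args, eval_args) for one optimizer/loss pair (hypothetical helper)."""
    out_dir = f"output/{opt}_{ita_type}_cc3m_g{gamma}_e{epochs}"
    train = [
        "python3", "./iSogCLR/bimodal_exps/clip.py",
        "--data_path", "./datasets",
        "--ann_path", "./clip_train",
        "--train_file", "cc3m_train_subset.json",
        "--train_image_root", "cc3m_subset_100k",
        "--output_dir", out_dir,
        "--init_model",
        "--use_amp",
        "--opt", opt,
        "--ita_type", ita_type,
        "--tau_init", "0.01",
        "--sogclr_gamma", str(gamma),
        "--eta_init", "0.03",
        "--sched", "cosine",
        "--no-distributed",
        "--epochs", str(epochs),
    ]
    if opt == "sgd":
        train += ["--momentum", "0.9"]
    # Evaluation reuses every training flag and points at the last checkpoint.
    evaluate = train + [
        "--evaluate",
        "--checkpoint", f"./{out_dir}/checkpoint_{epochs}.pth",
        "--zs_dataset", "imagenet",
        "--zs_datafolder", "./datasets/imagenet/val",
    ]
    return train, evaluate

train_cmd, eval_cmd = build_commands("radam", "sogclr")
print("CUDA_VISIBLE_DEVICES=0 " + shlex.join(train_cmd))
print("CUDA_VISIBLE_DEVICES=0 " + shlex.join(eval_cmd))
```

The printed strings can be pasted into a cell after `!`, or the lists can be passed to `subprocess.run` with `CUDA_VISIBLE_DEVICES` set in the environment.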

## Optimizer: RAdam, Loss Function: isogclr_new

In [None]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/radam_isogclr_new_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt radam \
    --ita_type isogclr_new \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 \
    --sched cosine \
    --no-distributed \
    --epochs 30

In [None]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/radam_isogclr_new_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt radam \
    --ita_type isogclr_new \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 \
    --sched cosine \
    --no-distributed \
    --epochs 30 \
    --evaluate \
    --checkpoint './output/radam_isogclr_new_cc3m_g0.8_e30/checkpoint_30.pth' \
    --zs_dataset imagenet \
    --zs_datafolder ./datasets/imagenet/val

## Optimizer: RAdam, Loss Function: isogclr_new_v2

In [None]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/radam_isogclr_new_v2_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt radam \
    --ita_type isogclr_new_v2 \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 \
    --sched cosine \
    --no-distributed \
    --epochs 30

In [None]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/radam_isogclr_new_v2_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt radam \
    --ita_type isogclr_new_v2 \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 \
    --sched cosine \
    --no-distributed \
    --epochs 30 \
    --evaluate \
    --checkpoint './output/radam_isogclr_new_v2_cc3m_g0.8_e30/checkpoint_30.pth' \
    --zs_dataset imagenet \
    --zs_datafolder ./datasets/imagenet/val

## Optimizer: RAdam, Loss Function: clip

In [None]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/radam_clip_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt radam \
    --ita_type clip \
    --tau_init 0.01 \
    --eta_init 0.03 \
    --sched cosine \
    --no-distributed \
    --epochs 30

In [None]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/radam_clip_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt radam \
    --ita_type clip \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 \
    --sched cosine \
    --no-distributed \
    --epochs 30 \
    --evaluate \
    --checkpoint './output/radam_clip_cc3m_g0.8_e30/checkpoint_30.pth' \
    --zs_dataset imagenet \
    --zs_datafolder ./datasets/imagenet/val

## Optimizer: RAdam, Loss Function: cyclip

In [None]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/radam_cyclip_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt radam \
    --ita_type cyclip \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 \
    --sched cosine \
    --no-distributed \
    --epochs 30

In [None]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/radam_cyclip_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt radam \
    --ita_type cyclip \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 \
    --sched cosine \
    --no-distributed \
    --epochs 30 \
    --evaluate \
    --checkpoint './output/radam_cyclip_cc3m_g0.8_e30/checkpoint_30.pth' \
    --zs_dataset imagenet \
    --zs_datafolder ./datasets/imagenet/val

## Optimizer: RAdam, Loss Function: vicreg

In [None]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/radam_vicreg_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt radam \
    --ita_type vicreg \
    --tau_init 0.01 \
    --eta_init 0.03 \
    --sched cosine \
    --no-distributed \
    --epochs 30

In [None]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/radam_vicreg_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt radam \
    --ita_type vicreg \
    --tau_init 0.01 \
    --eta_init 0.03 \
    --sched cosine \
    --no-distributed \
    --epochs 30 \
    --evaluate \
    --checkpoint './output/radam_vicreg_cc3m_g0.8_e30/checkpoint_30.pth' \
    --zs_dataset imagenet \
    --zs_datafolder ./datasets/imagenet/val

## Optimizer: RAdam, Loss Function: onlineclr

In [None]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/radam_onlineclr_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt radam \
    --ita_type onlineclr \
    --tau_init 0.01 \
    --eta_init 0.03 \
    --sched cosine \
    --no-distributed \
    --epochs 30

In [None]:
!CUDA_VISIBLE_DEVICES=0 python3 ./iSogCLR/bimodal_exps/clip.py \
    --data_path ./datasets \
    --ann_path ./clip_train \
    --train_file cc3m_train_subset.json \
    --train_image_root cc3m_subset_100k \
    --output_dir output/radam_onlineclr_cc3m_g0.8_e30 \
    --init_model \
    --use_amp \
    --opt radam \
    --ita_type onlineclr \
    --tau_init 0.01 \
    --sogclr_gamma 0.8 \
    --eta_init 0.03 \
    --sched cosine \
    --no-distributed \
    --epochs 30 \
    --evaluate \
    --checkpoint './output/radam_onlineclr_cc3m_g0.8_e30/checkpoint_30.pth' \
    --zs_dataset imagenet \
    --zs_datafolder ./datasets/imagenet/val