# AIOK Model Adapter DEMO
Model Adapter extends AIOk optimized models with knowledge transfer technology. It is a convenient framework that can be used to reduce training and inference time, or data labeling cost, by efficiently utilizing public advanced models and datasets from many domains. Model Adapter mainly contains three components serving different cases: Finetuner, Distiller, and Domain Adapter.

# Content
* ### 1. [Framework](#1-framework)
* ### 2. [Environment Setup](#2-environment-setup)
* ### 3. [Launch training](#3-launch-training)
* ### 4. [Optimizations](#Optimizations)
* ### 5. [Performance Overview](#Performance-overview)

## 1. Framework


Transfer Learning Kit is a general and convenient framework for transferring knowledge from a pretrained model and/or source-domain data to a target task. Its objectives are:
* Transfer knowledge from a pretrained model with the same or a different network structure, which greatly speeds up training without accuracy regression.
* Transfer knowledge from source-domain data without target labels.

The hierarchy of Transfer Learning Kit is shown below. It has 5 key components:

1. Backbone Factory: creates a backbone network, either predefined or user-provided, that makes the basic prediction.
2. Task Finetunner: creates a pretrained finetuning schema (called “finetunner”) to transfer knowledge from a pretrained model to a target model with the same network structure.
3. Domain Adapter: creates a domain adaptation net (called “adapter”) to transfer knowledge from the source domain to the target domain.
4. Knowledge Distiller: creates a knowledge distillation net (called “distiller”) to transfer knowledge from a teacher model to the target model.
5. Transferrable Model: creates a customized, transferrable model that wraps the backbone, adapter, and distiller.

![Framework](../doc/imgs/framework.png)

### 1.1 Finetunner
Transfer knowledge from a pretrained model to a target model with the same network structure.

* Pretrained models are produced by a pretraining process, i.e., training a specific model on a specific dataset; such models may come from DE-NAS, PyTorch, TensorFlow, or HuggingFace.
* Finetunner retrieves the pretrained model with the same network structure and copies its weights into the corresponding layers of the target model, instead of randomly initializing the target model.
* Finetunner can greatly improve training speed and usually achieves better performance.

![Finetunner](../doc/imgs/finetunner.png)
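The weight-copying step can be sketched in plain Python. This is a minimal illustration under stated assumptions, not TLK's implementation: the dictionaries stand in for PyTorch `state_dict`s, and `copy_matching_weights` is a hypothetical helper. A tensor is copied only when both its name and its shape match, so a classifier head trained for a different number of classes (for example, ImageNet-21k vs. CIFAR100) keeps its random initialization. In PyTorch the same idea is commonly expressed by filtering the pretrained `state_dict` and calling `model.load_state_dict(filtered, strict=False)`.

```python
from typing import Dict, List, Tuple

# Stand-in for a PyTorch state_dict: parameter name -> (shape, flat list of values).
StateDict = Dict[str, Tuple[Tuple[int, ...], List[float]]]


def copy_matching_weights(pretrained: StateDict, target: StateDict) -> Tuple[List[str], List[str]]:
    """Copy pretrained tensors into `target` when both name and shape match.

    Returns (copied, skipped). Skipped entries keep their existing (random) init.
    """
    copied: List[str] = []
    skipped: List[str] = []
    for name, (shape, _values) in list(target.items()):
        source = pretrained.get(name)
        if source is not None and source[0] == shape:
            target[name] = (shape, list(source[1]))  # copy values, don't alias
            copied.append(name)
        else:
            skipped.append(name)
    return copied, skipped


# Hypothetical example: the backbone layer matches, the classifier head does not.
pretrained = {
    "conv1.weight": ((2, 2), [0.1, 0.2, 0.3, 0.4]),
    "fc.weight": ((3, 2), [0.0] * 6),   # head trained for 3 classes
}
target = {
    "conv1.weight": ((2, 2), [9.0] * 4),
    "fc.weight": ((5, 2), [9.0] * 10),  # new task has 5 classes
}
copied, skipped = copy_matching_weights(pretrained, target)
print(copied, skipped)
```

Matching on shape as well as name is what lets the same pretrained checkpoint serve target tasks with different label sets.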

### 1.2 Distiller
Transfer knowledge from a heavy model (teacher) to a light one (student) with a different structure.

* The teacher is a large model pretrained on a specific dataset that holds sufficient knowledge for the task, while the student has a much smaller structure. The distiller trains the student not only on the dataset but also with the help of the teacher’s knowledge.
* The distiller leverages the knowledge in existing pretrained large models while using much less training time. It can also significantly improve the convergence speed and prediction accuracy of a small model, which is very helpful for inference.

![Distiller](../doc/imgs/distiller.png)
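For reference, the classic knowledge-distillation objective (Hinton et al.) can be sketched as below. This is an assumption about what a `kd` distiller computes, not TLK's code: the temperature of 4.0 and all helper names are illustrative. The 0.1/0.9 weighting of the hard-label loss and the distillation loss mirrors the `experiment.loss` values (`backbone: 0.1`, `distiller: 0.9`) printed in the distillation runs later in this demo.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits: List[float], teacher_logits: List[float],
            temperature: float = 4.0) -> float:
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_teacher, p_student))
    return kl * temperature ** 2


def cross_entropy(logits: List[float], label: int) -> float:
    return -math.log(softmax(logits)[label])


def student_loss(student_logits: List[float], teacher_logits: List[float], label: int,
                 w_backbone: float = 0.1, w_distiller: float = 0.9,
                 temperature: float = 4.0) -> float:
    """Weighted sum of the hard-label loss and the distillation loss."""
    return (w_backbone * cross_entropy(student_logits, label)
            + w_distiller * kd_loss(student_logits, teacher_logits, temperature))


print(student_loss([2.0, 0.5, -1.0], [3.0, 0.0, -2.0], label=0))
```

The softened teacher distribution carries information about which wrong classes are "almost right", which is what the student gains beyond the hard labels.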

### 1.3 Adapter
Transfer knowledge from a source domain (cheap labels) to a target domain (label-free).

* Directly applying a pretrained model to the target domain often fails because of covariate shift and label shift, while labeling can be expensive in some domains and delays model deployment, which makes fine-tuning impractical.
* The adapter aims to reuse transferable knowledge with the help of another labeled dataset for the same learning task, that is, to achieve better generalization with a small labeled target dataset, or competitive performance on a label-free target dataset.

![Adapter](../doc/imgs/adapter.png)

### 1.4 Transferrable Model

A transferrable model is a container that holds a backbone (the original model), a finetunner, an adapter, and a distiller; it is used to enhance the backbone with transfer learning ability.

We can use `make_transferrable` to make a model transferrable:

```
transferrable_model = make_transferrable (model, loss, finetunner, distiller, adapter,...)
```
We can then train `transferrable_model` just like the original model, with the help of transfer learning.
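The idea of a transferrable model as a loss-combining container can be sketched as follows. `TransferrableModelSketch` is a hypothetical class, not the object returned by `make_transferrable`; it only illustrates how per-component losses could be weighted, using the same keys as the `experiment.loss` config section (`backbone`, `distiller`, `adapter`) printed in the training logs below. A component whose weight is `0.0` is simply skipped.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# A loss function takes a batch (any object) and returns a scalar loss.
LossFn = Callable[[object], float]


@dataclass
class TransferrableModelSketch:
    """Hypothetical container pairing a backbone loss with optional transfer losses."""
    backbone_loss: LossFn
    distiller_loss: Optional[LossFn] = None
    adapter_loss: Optional[LossFn] = None
    # Defaults match plain training: only the backbone loss contributes.
    weights: Dict[str, float] = field(
        default_factory=lambda: {"backbone": 1.0, "distiller": 0.0, "adapter": 0.0})

    def loss(self, batch: object) -> float:
        terms = {
            "backbone": self.backbone_loss,
            "distiller": self.distiller_loss,
            "adapter": self.adapter_loss,
        }
        total = 0.0
        for name, fn in terms.items():
            weight = self.weights.get(name, 0.0)
            if fn is not None and weight != 0.0:
                total += weight * fn(batch)
        return total


# Usage: distillation-style weighting (backbone 0.1, distiller 0.9) with dummy losses.
model = TransferrableModelSketch(
    backbone_loss=lambda batch: 2.0,
    distiller_loss=lambda batch: 1.0,
    weights={"backbone": 0.1, "distiller": 0.9, "adapter": 0.0},
)
print(model.loss(batch=None))
```

Keeping the weights in configuration rather than code is what lets the same wrapper express plain training, finetuning, distillation, or adaptation.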

## 2. Environment Setup

1. Build the Docker image:
   ```
   cd Dockerfile-ubuntu18.04 && docker build -t aidk-pytorch110 . -f DockerfilePytorch110 && cd .. && yes | docker container prune && yes | docker image prune
   ```
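2. Run the Docker container, replacing `${pretrained_model_path}`, `${output_path}`, `${tensorboard_path}`, `${dataset_path}`, and `${tlk_code_path}` with the corresponding host directories: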
   ```
   docker run --rm -t -d --name transfer_learning_kit --privileged --network host --shm-size=2g --device=/dev/dri \
   -v ${pretrained_model_path}:/home/vmagent/app/data/pretrained \
   -v ${output_path}:/home/vmagent/app/data/model \
   -v ${tensorboard_path}:/home/vmagent/app/data/tensorboard \
   -v ${dataset_path}:/home/vmagent/app/data/dataset \
   -v ${tlk_code_path}:/home/vmagent/app/TLK \
   -w /home/vmagent/app/TLK \
   aidk-pytorch110 /bin/bash
   ``` 
3. Enter the container with `docker exec -it transfer_learning_kit /bin/bash`.

4. Start the Jupyter Notebook and TensorBoard services:
   ```
   source /opt/intel/oneapi/setvars.sh --ccl-configuration=cpu_icc --force
   conda activate pytorch-1.10.0
   pip install jupyter
   nohup jupyter notebook --notebook-dir=/home/vmagent/app/TLK --ip=0.0.0.0 --port=8899 --allow-root &
   nohup tensorboard --logdir /home/vmagent/app/data/model_saved --host=0.0.0.0 --port=6006 & 
   ```
   You can now open the TLK demo at `http://${hostname}:8899/` and view the TensorBoard logs at `http://${hostname}:6006`.

## 3. Launch training

### 3.1 Finetuner

1. Run the script to train ResNet50 (or MobileNetV3/VitBase) from scratch on the CIFAR100 dataset for 1 epoch:
   ```
   python main_finetunner_cifar.py ../config/baseline/cifar100_resnet50_CosinLR.yaml --opts solver.epochs 1
   ```
2. Run the script to finetune ResNet50 (or MobileNetV3/VitBase) from an ImageNet-21k pretrained model on the CIFAR100 dataset for 1 epoch:
   ```
   python main_finetunner_cifar.py ../config/finetuner/cifar100_res50PretrainI21k.yaml --opts solver.epochs 1
   ```

TensorBoard shows the result after 1 epoch of training: finetuning achieves **81.19%** validation accuracy, while training from scratch achieves only **10.84%**.
![Finetuner_Result1](../doc/imgs/finetuner_result.png)

All of these models improve dramatically with finetuning after only 1 or 2 epochs:
- MobileNetV3 (1 epoch): from 6.69% (training from scratch) to 70.67% (training with finetuning).
- ResNet50 (1 epoch): from 10.84% (training from scratch) to 81.77% (training with finetuning).
- VitBase (2 epochs): from 14.6% (training from scratch) to 64.83% (training with finetuning).

![Finetuner_Result2](../doc/imgs/finetuner_result2.png)

**Training resnet50 on CIFAR100 from scratch:**

```
%run main.py --cfg ../config/baseline/cifar100_resnet50_CosinLR.yaml \
--opts solver.epochs 1 dataset.path /mnt/DP_disk1/dataset
```

```
No Trial
Model Name:resnet50
Data Name:cifar100
Transfer Learning Strategy:
Enable DDP:False
Training epochs:1
adapter:
  feature_layer_name: x
  feature_size: 500
  type: ''
dataset:
  data_drop_last: false
  num_workers: 1
  path: /mnt/DP_disk1/dataset
  test:
    batch_size: 128
  test_transform: pretrainI21k
  train_transform: pretrainI21k
  type: cifar100
  val:
    batch_size: 128
distiller:
  check_logits: false
  feature_layer_name: x
  feature_size: ''
  logits_path: ''
  logits_topk: 0
  save_logits: false
  save_logits_start_epoch: 1
  teacher:
    is_frozen: true
    pretrain: ''
    type: resnet50_v2
  type: ''
  use_saved_logits: false
experiment:
  log_interval_step: 10
  loss:
    adapter: 0.0
    backbone: 1.0
    distiller: 0.0
  model_save: /home/vmagent/app/data/model
  model_save_interval: 40
  project: finetuner
  seed: 0
  strategy: ''
  tag: cifar100_res50_PretrainI21k
  tensorboard_dir: /home/vmagent/app/data/model/finetuner/cifar100_res50_PretrainI21k/tensorbo

Extracting /mnt/DP_disk1/dataset/cifar-100-python.tar.gz to /mnt/DP_disk1/dataset
Files already downloaded and verified
Model params:  23712932
Epoch [1] lr: [0.00753]
[2022-10-24 02:10:24]  epoch(1) step (0/391) Train: loss = 5.6154;	acc = 0.0000
[2022-10-24 02:10:40]  epoch(1) step (10/391) Train: loss = 5.0343;	acc = 0.0234
[2022-10-24 02:10:48]  epoch(1) step (20/391) Train: loss = 4.8966;	acc = 0.0234
[2022-10-24 02:10:56]  epoch(1) step (30/391) Train: loss = 5.4148;	acc = 0.0234
[2022-10-24 02:11:04]  epoch(1) step (40/391) Train: loss = 5.3916;	acc = 0.0078
[2022-10-24 02:11:12]  epoch(1) step (50/391) Train: loss = 5.4395;	acc = 0.0078
[2022-10-24 02:11:20]  epoch(1) step (60/391) Train: loss = 5.2647;	acc = 0.0078
[2022-10-24 02:11:27]  epoch(1) step (70/391) Train: loss = 4.9104;	acc = 0.0312
[2022-10-24 02:11:35]  epoch(1) step (80/391) Train: loss = 4.9845;	acc = 0.0469
[2022-10-24 02:11:43]  epoch(1) step (90/391) Train: loss = 4.6626;	acc = 0.0156
[2022-10-24 02:11:50]

Best Epoch: 1
Epoch 1 took 357.83730149269104 seconds
Total seconds:357.838774
2022-10-24 02:16:21 0/79
2022-10-24 02:16:24 10/79
2022-10-24 02:16:27 20/79
2022-10-24 02:16:29 30/79
2022-10-24 02:16:32 40/79
2022-10-24 02:16:35 50/79
2022-10-24 02:16:37 60/79
2022-10-24 02:16:40 70/79
[2022-10-24 02:16:42]  epoch(0) Test: acc = 0.1084;	loss = 3.9488
Total seconds:21.774717
Totally take 482.4181442260742 seconds
```
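To pull the final test metrics out of a log like the one above (for example, to compare runs without opening TensorBoard), a small regular expression is enough. This sketch assumes only the `epoch(N) Test: acc = ...; loss = ...` line format visible in the output above; `parse_test_result` is a hypothetical helper, not part of TLK.

```python
import re
from typing import Dict, Optional

# Matches e.g. "epoch(0) Test: acc = 0.1084; loss = 3.9488" (a tab or spaces may follow ';').
TEST_LINE = re.compile(r"epoch\((\d+)\) Test: acc = ([\d.]+);\s*loss = ([\d.]+)")


def parse_test_result(log_text: str) -> Optional[Dict[str, float]]:
    """Return the last test epoch/accuracy/loss found in a training log, or None."""
    matches = TEST_LINE.findall(log_text)
    if not matches:
        return None
    epoch, acc, loss = matches[-1]
    return {"epoch": int(epoch), "acc": float(acc), "loss": float(loss)}


log = (
    "[2022-10-24 02:11:43]  epoch(1) step (90/391) Train: loss = 4.6626;\tacc = 0.0156\n"
    "[2022-10-24 02:16:42]  epoch(0) Test: acc = 0.1084;\tloss = 3.9488\n"
)
print(parse_test_result(log))
```

Training lines use the opposite field order (`loss` before `acc`) and the word `Train`, so they are not picked up.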


**Training resnet50 on CIFAR100 with pretraining:**

```
%run main.py --cfg ../config/finetuner/cifar100_res50PretrainI21k.yaml \
--opts solver.epochs 1
```

```
No Trial
Model Name:resnet50
Data Name:cifar100
Transfer Learning Strategy:OnlyFinetuneStrategy
Enable DDP:False
Training epochs:1
adapter:
  feature_layer_name: x
  feature_size: 500
  type: ''
dataset:
  data_drop_last: false
  num_workers: 1
  path: /home/vmagent/app/data/dataset
  test:
    batch_size: 128
  test_transform: pretrainI21k
  train_transform: pretrainI21k
  type: cifar100
  val:
    batch_size: 128
distiller:
  check_logits: false
  feature_layer_name: x
  feature_size: ''
  logits_path: ''
  logits_topk: 0
  save_logits: false
  save_logits_start_epoch: 1
  teacher:
    is_frozen: true
    pretrain: ''
    type: resnet50_v2
  type: ''
  use_saved_logits: false
experiment:
  log_interval_step: 10
  loss:
    adapter: 0.0
    backbone: 1.0
    distiller: 0.0
  model_save: /home/vmagent/app/data/model
  model_save_interval: 40
  project: finetuner
  seed: 0
  strategy: OnlyFinetuneStrategy
  tag: cifar100_res50_PretrainI21k
  tensorboard_dir: /home/vmagent/app/data/model
```

### 3.2 Distiller

1. Run the script to train ResNet18 from scratch on the CIFAR100 dataset:
   ```
   python main.py --cfg ../config/demo/cifar100_resnet18.yaml
   ```
2. Run the script to distill ResNet50 into ResNet18 on the CIFAR100 dataset:
   ```
   python main.py --cfg ../config/demo/cifar100_kd_res50_res18.yaml
   ```

With the distiller, ResNet18 reaches **81.46%** accuracy in only 4.1 h, whereas training from scratch reaches only **75.98%** after 7.3 h.
![Distiller_Result1](../doc/imgs/kd_res50_res18.png)

The distiller also applies to other teacher/student pairs, such as ViT to ResNet18 or ResNet50 to a DE-NAS-generated CNN. In each of these cases, distillation both speeds up convergence and improves accuracy:
- ResNet18 (distillation from ResNet50): 1.8x training time speedup and +5.5% accuracy improvement.
- ResNet18 (distillation from ViT): 158x training time speedup and +3.7% accuracy improvement.
- DE-NAS generated CNN (distillation from ResNet50): 1.7x training time speedup and +0.1% accuracy improvement.


![Distiller_Result2](../doc/imgs/kd_3models.png)
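The ResNet50 to ResNet18 figures in the first bullet follow directly from the numbers quoted earlier in this section: 7.3 h divided by 4.1 h for the speedup, and 81.46% minus 75.98% for the accuracy gain. The short check below only redoes that arithmetic; it introduces no new measurements.

```python
# Numbers quoted in this section for ResNet18 on CIFAR100.
scratch_hours, scratch_acc = 7.3, 75.98   # training from scratch
distill_hours, distill_acc = 4.1, 81.46   # distillation from ResNet50

speedup = scratch_hours / distill_hours   # about 1.78, reported as 1.8x
acc_gain = distill_acc - scratch_acc      # about 5.48 points, reported as +5.5%

print(f"speedup: {speedup:.1f}x, accuracy gain: +{acc_gain:.1f} points")
```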

**Training resnet18 on CIFAR100 from scratch (for 1 epoch):**

```
%run main.py --cfg ../config/demo/cifar100_resnet18.yaml --opts solver.epochs 1
```

```
No Trial
Model Name:resnet18_cifar
Data Name:cifar100
Transfer Learning Strategy:
Enable DDP:False
Training epochs:1
dataset:
  data_drop_last: false
  num_workers: 4
  path: /home/vmagent/app/data/dataset/cifar
  test:
    batch_size: 128
  test_transform: default
  train_transform: default
  type: cifar100
  val:
    batch_size: 128
experiment:
  log_interval_step: 10
  loss:
    adapter: 0.0
    backbone: 1.0
    distiller: 0.0
  model_save: /home/vmagent/app/data/model
  model_save_interval: 40
  project: demo
  seed: 0
  strategy: ''
  tag: cifar100_res18
  tensorboard_dir: /home/vmagent/app/data/tensorboard/cifar100_res18resnet18_cifar_cifar100
  tensorboard_filename_suffix: ''
optimize:
  enable_ipex: false
profiler:
  active: 2
  activities: cpu
  repeat: 1
  skip_first: 1
  trace_file_inference: /home/vmagent/app/data/model/demo/cifar100_res18/profile/test_profile_resnet18_cifar_cifar100_1666759998
  trace_file_training: /home/vmagent/app/data/model/demo/cifar100_res18/profile
```

**Training resnet18 with distillation from ResNet50 on CIFAR100 (for 1 epoch):**

```
%run main.py --cfg ../config/demo/cifar100_kd_res50_res18.yaml --opts solver.epochs 1
```

```
No Trial
Model Name:resnet18_cifar
Data Name:cifar100
Transfer Learning Strategy:OnlyDistillationStrategy
Enable DDP:False
Training epochs:1
dataset:
  data_drop_last: false
  num_workers: 4
  path: /home/vmagent/app/data/dataset/cifar
  test:
    batch_size: 128
  test_transform: default
  train_transform: denascnn
  type: cifar100
  val:
    batch_size: 128
distiller:
  check_logits: false
  feature_layer_name: x
  feature_size: ''
  logits_path: /home/vmagent/app/data/model/demo/cifar100_res50/logits
  logits_topk: 0
  save_logits: false
  save_logits_start_epoch: 1
  teacher:
    frozen: true
    pretrain: /home/vmagent/app/data/model/demo/cifar100_res50/cifar100_res50_pretrain_imagenet21k.pth
    type: resnet50
  type: kd
  use_saved_logits: true
experiment:
  log_interval_step: 10
  loss:
    adapter: 0.0
    backbone: 0.1
    distiller: 0.9
  model_save: /home/vmagent/app/data/model
  model_save_interval: 40
  project: demo
  seed: 0
  strategy: OnlyDistillationStrategy
  tag: c

2022-10-26 05:01:57 0/79
2022-10-26 05:01:58 10/79
2022-10-26 05:02:00 20/79
2022-10-26 05:02:01 30/79
2022-10-26 05:02:02 40/79
2022-10-26 05:02:03 50/79
2022-10-26 05:02:04 60/79
2022-10-26 05:02:05 70/79
[2022-10-26 05:02:06]  epoch(0) Test: acc = 0.2310;	loss = 3.3793
Total seconds:9.284982
Totally take 191.62841939926147 seconds
```


**Training resnet18 with distillation from VIT on CIFAR100 (for 1 epoch):**

```
%run main.py --cfg ../config/demo/cifar100_kd_vit_res18.yaml --opts solver.epochs 1
```

```
No Trial
Model Name:resnet18_cifar
Data Name:cifar100
Transfer Learning Strategy:OnlyDistillationStrategy
Enable DDP:False
Training epochs:1
dataset:
  data_drop_last: false
  num_workers: 4
  path: /home/vmagent/app/data/dataset/cifar
  test:
    batch_size: 128
  test_transform: vit_train
  train_transform: vit_train
  type: cifar100
  val:
    batch_size: 128
distiller:
  check_logits: false
  feature_layer_name: x
  feature_size: ''
  logits_path: /home/vmagent/app/data/model/demo/cifar100_vit/logits
  logits_topk: 0
  save_logits: false
  save_logits_start_epoch: 1
  teacher:
    frozen: true
    pretrain: 'true'
    type: vit_base_224_in21k_ft_cifar100
  type: kd
  use_saved_logits: true
experiment:
  log_interval_step: 10
  loss:
    adapter: 0.0
    backbone: 0.1
    distiller: 0.9
  model_save: /home/vmagent/app/data/model
  model_save_interval: 40
  project: demo
  seed: 0
  strategy: OnlyDistillationStrategy
  tag: cifar100_kd_vit_res18
  tensorboard_dir: /home/vmagent/app/d
```

### 3.3 Adapter

#### 3.3.1 Task Description
* In this demo, we show how to use domain adaptation to transfer knowledge in medical image semantic segmentation.
* Our source domain is the AMOS dataset (download AMOS data from [here](https://amos22.grand-challenge.org/Dataset/)), which provides 500 CT and 100 MRI scans with voxel-level annotations of 15 abdominal organs: the spleen, right kidney, left kidney, gallbladder, esophagus, liver, stomach, aorta, inferior vena cava, pancreas, right adrenal gland, left adrenal gland, duodenum, bladder, and prostate/uterus.
* Our target domain is the KiTS dataset (download KiTS data from [here](https://github.com/neheller/kits19)), which provides 300 CT scans with voxel-level annotations of the kidneys and kidney tumors.
* Our task is to explore reliable kidney semantic segmentation methods with the help of the labeled AMOS dataset and the unlabeled KiTS dataset; the evaluation metric is the kidney Dice score on the target domain.
* As the following figure shows, **even without target labels, the adapter achieves a 10.67x training speedup while keeping a 93% performance ratio.**

![adapter_result_plot](../doc/imgs/adapter_result_plot.png)
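The evaluation metric, the Dice score, measures the overlap between a predicted mask P and a ground-truth mask T as 2|P ∩ T| / (|P| + |T|). A minimal per-class sketch over flattened voxel labels is shown below. Treating label `1` as kidney and returning 1.0 when both masks are empty are assumptions for illustration; nnU-Net's own evaluation code may handle such edge cases differently.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int], label: int = 1) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|) for one class over flattened voxel labels."""
    if len(pred) != len(target):
        raise ValueError("prediction and target must have the same number of voxels")
    p = [v == label for v in pred]
    t = [v == label for v in target]
    intersection = sum(a and b for a, b in zip(p, t))
    denominator = sum(p) + sum(t)
    if denominator == 0:
        return 1.0  # both masks empty: counted as perfect agreement here
    return 2.0 * intersection / denominator


# Toy 1-D "volume": label 1 marks kidney voxels (hypothetical encoding).
print(dice_score([0, 1, 1, 0, 1], [0, 1, 0, 0, 1]))
```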

#### 3.3.2 Domain Adaptation from AMOS to KiTS
- We first pretrain the model on the AMOS dataset and later use this pretrained model for parameter initialization in domain adaptation.
- We use [3D-UNet](https://arxiv.org/abs/1606.06650) as the model.
- We then apply a domain adaptation algorithm to transfer knowledge from the AMOS dataset to the KiTS dataset.
- We use a DANN-like model architecture; the DANN algorithm is illustrated below:

![dann](../doc/imgs/dann.png)

- Notice:
    - We do not use **any labels** from the target domain KiTS; we only use labels from the source domain AMOS for training.
    - *For demonstration, we only train for 1 epoch.*
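The key ingredient of DANN is a gradient reversal layer between the feature extractor and the domain classifier: it is the identity in the forward pass and multiplies gradients by -λ in the backward pass, so the feature extractor learns features that confuse the domain classifier while the segmentation head is trained on source labels only. The framework-free sketch below illustrates just that layer and the λ schedule proposed in the DANN paper; whether TLK's DANN-like adapter uses this exact schedule is an assumption, and the class and function names are hypothetical.

```python
import math
from typing import List


class GradientReversal:
    """Identity on the forward pass; scales gradients by -lambd on the backward pass."""

    def __init__(self, lambd: float) -> None:
        self.lambd = lambd

    def forward(self, features: List[float]) -> List[float]:
        return list(features)  # features pass through unchanged

    def backward(self, grad_output: List[float]) -> List[float]:
        # Flip (and scale) the domain-classifier gradient before it reaches the feature extractor.
        return [-self.lambd * g for g in grad_output]


def dann_lambda(progress: float, gamma: float = 10.0) -> float:
    """DANN schedule: 2 / (1 + exp(-gamma * p)) - 1, with p in [0, 1] the training progress."""
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0


grl = GradientReversal(lambd=dann_lambda(0.5))
print(grl.forward([0.3, -1.2]), grl.backward([1.0, -2.0]))
```

Ramping λ up from 0 keeps the noisy early domain-classifier gradients from disrupting the feature extractor at the start of training.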

```
!cd /home/vmagent/app/TLK/src/task/medical_segmentation && sh scripts/run_all.sh
```

```
###############################################
For that I will be using the following source data configuration:
num_classes:  15
modalities:  {0: 'CT'}
use_mask_for_norm OrderedDict([(0, False)])
keep_only_largest_region None
min_region_size_per_class None
min_size_per_class None
normalization_schemes OrderedDict([(0, 'CT')])
stages...

stage:  0
{'batch_size': 2, 'num_pool_per_axis': [4, 5, 5], 'patch_size': array([ 80, 160, 160]), 'median_patient_size_in_voxels': array([140, 264, 264]), 'current_spacing': array([3.22, 1.62, 1.62]), 'original_spacing': array([3.22, 1.62, 1.62]), 'do_dummy_2D_data_aug': False, 'pool_op_kernel_sizes': [[2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [1, 2, 2]], 'conv_kernel_sizes': [[3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]]}


I am using source data from this folder:  /home/vmagent/app/dataset/nnUNet_preprocessed/Task508_AMOS_kidney/nnUNetData_plans_v2.1_trgSp_kits19
###############################################
#################

2022-10-14 08:03:19.386149: train loss : 1.0404
2022-10-14 08:05:40.979750: validation loss: 0.0087
2022-10-14 08:05:40.981636: Average global foreground Dice: [0.0]
2022-10-14 08:05:40.981980: (interpret this as an estimate for the Dice of the different classes. This is not exact.)
2022-10-14 08:05:41.279021: lr index 0: 0.001982
2022-10-14 08:05:41.279208: lr index 1: 0.00991
2022-10-14 08:05:41.279391: This epoch took 3509.259070 s

2022-10-14 08:05:41.280476: saving checkpoint...
2022-10-14 08:05:41.633380: done, saving took 0.35 seconds
case_00004 (2, 80, 309, 309)
debug: mirroring True mirror_axes (0, 1, 2)
step_size: 0.5
do mirror: True
data shape: (1, 80, 309, 309)
patch size: [ 80 160 160]
steps (x, y, and z): [[0], [0, 74, 149], [0, 74, 149]]
number of tiles: 9
computing Gaussian
done
prediction done
force_separate_z: None interpolation order: 1
separate z: True lowres axis [0]
separate z, order in z is 0 order inplane is 1
2022-10-14 08:07:06.740799: finished prediction
2022
```
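The inference log above reports `step_size: 0.5`, a patch size of `[ 80 160 160]`, a data shape of `(1, 80, 309, 309)`, and 9 tiles at offsets `[[0], [0, 74, 149], [0, 74, 149]]`. Those numbers can be reproduced with the sliding-window rule below, which follows our reading of nnU-Net's step computation: target stride = `step_size * patch_size`, number of steps = ceil((image - patch) / stride) + 1, then the steps are spread evenly so the last tile ends exactly at the image border. This is an illustrative re-implementation with a hypothetical function name, not code taken from TLK.

```python
import math
from typing import List, Sequence


def sliding_window_steps(patch_size: Sequence[int], image_size: Sequence[int],
                         step_size: float) -> List[List[int]]:
    """Start offsets of overlapping patches along each axis."""
    steps = []
    for patch, image in zip(patch_size, image_size):
        if image < patch:
            raise ValueError("image must be at least as large as the patch")
        target_stride = patch * step_size
        num_steps = math.ceil((image - patch) / target_stride) + 1
        max_offset = image - patch
        # Spread the steps evenly so the last patch ends exactly at the border.
        actual_stride = max_offset / (num_steps - 1) if num_steps > 1 else 0
        # Python's round() rounds halves to even, so 74.5 -> 74, matching the log.
        steps.append([int(round(actual_stride * i)) for i in range(num_steps)])
    return steps


steps = sliding_window_steps([80, 160, 160], [80, 309, 309], step_size=0.5)
num_tiles = math.prod(len(s) for s in steps)
print(steps, num_tiles)
```

With `step_size: 0.5`, neighbouring tiles overlap by roughly half a patch, which is what the Gaussian weighting in the log ("computing Gaussian") then blends.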

#### 3.3.3 Visualization of Data and Segmentations
- Download the following files from the server:

   - Images from: `${nnUNet_raw_data_base}/nnUNet_raw_data/Task507_KiTS_kidney/imagesTr/`

   - Segmentations from: `${nnUNet_raw_data_base}/nnUNet_raw_data/Task507_KiTS_kidney/labelsTr/`

   - Predictions from: `/home/vmagent/app/dataset/prediction`


- After downloading these files, you can visualize them with any volumetric visualization program.
For this, we recommend [MITK](https://www.mitk.org/wiki/The_Medical_Imaging_Interaction_Toolkit_(MITK)), which already has some great [tutorials](https://www.mitk.org/wiki/Tutorials).
    - If you have not already downloaded it, here is the [MITK Download Link](https://www.mitk.org/wiki/Downloads).
    
- Here is a demonstration of the visualization result from MITK on the KiTS dataset:
![KiTS_visualization](../doc/imgs/KiTS_visualization.png)

