# Transformer Tutorial


In this tutorial, you will learn:

- How to build a Transformer or MLP model
- How to use a Transformer or MLP model for classification


## Step 1: Installation

The following shows how to install mmclassification from scratch: check out the v0.16.0 tag, create and activate a conda virtual environment, then install PyTorch, mmcv-full, and mmclassification itself.
```shell
git clone

# from inside the cloned mmclassification directory
git checkout -b dev-tutorial v0.16.0
# expected output: Switched to a new branch 'dev-tutorial'

# create an environment named 'comp3340' with python=3.8, then activate it
conda create -n comp3340 python=3.8
conda activate comp3340

conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge

pip install mmcv-full==1.3.8 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.8.0/index.html

cd /work/path/to/mmclassification
pip install -e .

```

### Check Installation
```
python -c "import mmcls; print(mmcls.__version__)"

# Installation succeeded if this prints:
0.16.0
```

In [1]:
!python -c "import mmcls; print(mmcls.__version__)"

0.23.2


## Step 2: Prepare the flower dataset and build the dataloader


In `mmclassification/mmcls/datasets`, we follow `imagenet.py`, which defines the ImageNet dataset class, to write a new dataset class file, `flowers.py`. Because our data directory follows the ImageNet layout, we only need to copy `imagenet.py`, rename the class, and rewrite **CLASSES**:



```python
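import numpy as np

from .base_dataset import BaseDataset
from .builder import DATASETS
# find_folders and get_samples are defined in imagenet.py in the v0.16.0
# layout checked out in Step 1; other versions may place them elsewhere.
from .imagenet import find_folders, get_samples


@DATASETS.register_module()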
class Flowers(BaseDataset):

    IMG_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm', '.tif')
    CLASSES = [
        'daffodil', 'snowdrop', 'lilyValley', 'bluebell', 'crocus', 'iris',
        'tigerlily', 'tulip', 'fritillary', 'sunflower', 'daisy', 'colts foot',
        'dandelion', 'cowslip', 'buttercup', 'wind flower', 'pansy'
    ]

    def load_annotations(self):
        if self.ann_file is None:
            folder_to_idx = find_folders(self.data_prefix)
            samples = get_samples(
                self.data_prefix,
                folder_to_idx,
                extensions=self.IMG_EXTENSIONS)
            if len(samples) == 0:
                raise (RuntimeError('Found 0 files in subfolders of: '
                                    f'{self.data_prefix}. '
                                    'Supported extensions are: '
                                    f'{",".join(self.IMG_EXTENSIONS)}'))

            self.folder_to_idx = folder_to_idx
        elif isinstance(self.ann_file, str):
            with open(self.ann_file) as f:
                samples = [x.strip().rsplit(' ', 1) for x in f.readlines()]
        else:
            raise TypeError('ann_file must be a str or None')
        self.samples = samples

        data_infos = []
        for filename, gt_label in self.samples:
            info = {'img_prefix': self.data_prefix}
            info['img_info'] = {'filename': filename}
            info['gt_label'] = np.array(gt_label, dtype=np.int64)
            data_infos.append(info)
        return data_infos
```
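
When `ann_file` is given, `load_annotations` reads one sample per line: a file path relative to `data_prefix`, a space, then the integer label. The sketch below replays that parsing step on made-up lines (the file names are hypothetical) to show why `rsplit(' ', 1)` is used: it splits on the last space only, so a path that itself contains a space stays intact. Labels are converted with `int()` here to keep the sketch dependency-free; the class above uses `np.array(..., dtype=np.int64)`.

```python
# Hypothetical annotation lines in the "<relative path> <label>" format.
lines = [
    "daisy/image_0001.jpg 10\n",
    "colts foot/image_0002.jpg 11\n",  # the folder name contains a space
]

# Same parsing expression as in load_annotations.
samples = [x.strip().rsplit(' ', 1) for x in lines]

data_infos = []
for filename, gt_label in samples:
    data_infos.append({
        'img_info': {'filename': filename},
        'gt_label': int(gt_label),  # the real class stores an np.int64 array
    })

for info in data_infos:
    print(info['img_info']['filename'], '->', info['gt_label'])
```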

We also need to import the `Flowers` class in `mmcls/datasets/__init__.py`:

```diff
 from .voc import VOC
+from .flowers import Flowers
 
 __all__ = [
     'BaseDataset', 'ImageNet', 'CIFAR10', 'CIFAR100', 'MNIST', 'FashionMNIST',
     'VOC', 'MultiLabelDataset', 'build_dataloader', 'build_dataset', 'Compose',
     'DistributedSampler', 'ConcatDataset', 'RepeatDataset',
-    'ClassBalancedDataset', 'DATASETS', 'PIPELINES'
+    'ClassBalancedDataset', 'DATASETS', 'PIPELINES', 'Flowers',
 ]

```


## Step 3: Configuration for Vision Transformer and Dataset


We now work in the `mmclassification/configs/_base_` directory.

### Set data config
a. In `_base_/datasets`, we add `flowers_bs32.py` to describe the basic data config.
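
The tutorial does not list the contents of `flowers_bs32.py`. A minimal sketch, assuming it mirrors the stock `imagenet_bs32.py` from mmclassification 0.x with `dataset_type` switched to the `Flowers` class from Step 2. The `data/flowers/...` paths, the reuse of the validation split for testing, and `samples_per_gpu=32` (inferred only from the file name) are assumptions, not values taken from the original setup; note that the learning-rate formula in the schedule config uses 16, so adjust the batch size to whatever you actually train with.

```python
# _base_/datasets/flowers_bs32.py (sketch; paths below are placeholders)
dataset_type = 'Flowers'  # the class registered in Step 2

# ImageNet mean/std, as used by the stock ImageNet configs
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='RandomResizedCrop', size=224),
    dict(type='RandomFlip', flip_prob=0.5, direction='horizontal'),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='ImageToTensor', keys=['img']),
    dict(type='ToTensor', keys=['gt_label']),
    dict(type='Collect', keys=['img', 'gt_label']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='Resize', size=(256, -1)),
    dict(type='CenterCrop', crop_size=224),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='ImageToTensor', keys=['img']),
    dict(type='Collect', keys=['img']),
]

data = dict(
    samples_per_gpu=32,  # assumption based on the "bs32" file name
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        data_prefix='data/flowers/train',  # placeholder path
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        data_prefix='data/flowers/val',  # placeholder path
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        data_prefix='data/flowers/val',  # placeholder: reuses the val split
        pipeline=test_pipeline))

evaluation = dict(interval=1, metric='accuracy')
```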


### Building Vision Transformer
b. In `_base_/models`, we add `vit_base_flowers.py` to describe the basic model config.

```python
model = dict(
    type='ImageClassifier',
    backbone=dict(
        return_tuple=False,
        type='VisionTransformer',
        num_layers=12,
        embed_dim=768,
        num_heads=12,
        img_size=224,
        patch_size=16,
        in_channels=3,
        feedforward_channels=3072,
        drop_rate=0.1,
        attn_drop_rate=0.),
    neck=None,
    head=dict(
        type='VisionTransformerClsHead',
        num_classes=17,  # modify
        in_channels=768,
        hidden_dim=3072,
        loss=dict(type='LabelSmoothLoss', label_smooth_val=0.1),
        topk=(1, 5),
    ),
    train_cfg=dict(
        augments=dict(type='BatchMixup', alpha=0.2, num_classes=17,  # modify
                      prob=1.))
)
```
Note that we use the `ViT-Base` model and set `num_classes` to 17, since the flower dataset has 17 categories.
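
The backbone settings also fix the sequence length the transformer processes. As a quick arithmetic check, assuming the standard ViT layout of non-overlapping patches plus one class token (the config itself does not spell this out):

```python
# Values copied from the backbone config.
img_size, patch_size, in_channels = 224, 16, 3
embed_dim, num_heads = 768, 12

patches_per_side = img_size // patch_size          # patches along one image side
num_patches = patches_per_side ** 2                # patches per image
seq_len = num_patches + 1                          # plus one class token (assumed)
patch_dim = patch_size * patch_size * in_channels  # raw values per patch
head_dim = embed_dim // num_heads                  # channels per attention head

print(patches_per_side, num_patches, seq_len, patch_dim, head_dim)
```

With these settings each image becomes 14 x 14 = 196 patches, so 197 tokens with the class token, and each of the 12 heads works on 64 channels.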

### Set training config
c. In `_base_/schedules`, we add `flowers_bs32.py`:
```python
paramwise_cfg = dict(
    norm_decay_mult=0.0,
    bias_decay_mult=0.0,
    custom_keys={
        '.absolute_pos_embed': dict(decay_mult=0.0),
        '.relative_position_bias_table': dict(decay_mult=0.0)
    })

# reference setting: batch size 128 per GPU on 8 GPUs gives
# lr = 5e-4 * 128 * 8 / 512 = 0.001; below, lr is scaled linearly for a total batch size of 16
optimizer = dict(
    type='AdamW',
    # lr=5e-4 * 128 * 8 / 512,
    lr=5e-4 * 16 / 512,
    weight_decay=0.05,
    eps=1e-8,
    betas=(0.9, 0.999),
    paramwise_cfg=paramwise_cfg)
optimizer_config = dict(grad_clip=dict(max_norm=5.0))

# learning policy
lr_config = dict(
    policy='CosineAnnealing',
    by_epoch=False,
    min_lr_ratio=1e-2,
    warmup='linear',
    warmup_ratio=1e-3,
    warmup_iters=20 * 1252,
    warmup_by_epoch=False)

runner = dict(type='EpochBasedRunner', max_epochs=300)

```
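
To see what these numbers mean in practice, the sketch below evaluates the schedule by hand. It assumes that linear warmup multiplies the regular (cosine-annealed) learning rate by `1 - (1 - iter / warmup_iters) * (1 - warmup_ratio)`, which is our reading of mmcv's behaviour rather than a quotation of its code, and it takes 75 iterations per epoch from the `[10/75]` counters in the training log shown in the Train section. `cosine_lr` and `lr_at` are hypothetical helper names.

```python
import math

base_lr = 5e-4 * 16 / 512      # linear scaling rule for a total batch size of 16
warmup_iters = 20 * 1252       # value from lr_config
warmup_ratio = 1e-3
min_lr_ratio = 1e-2
iters_per_epoch = 75           # assumption: read off the training log
max_epochs = 300
max_iters = iters_per_epoch * max_epochs


def cosine_lr(cur_iter):
    """Cosine annealing from base_lr down to base_lr * min_lr_ratio."""
    target = base_lr * min_lr_ratio
    cos_out = math.cos(math.pi * cur_iter / max_iters) + 1
    return target + 0.5 * (base_lr - target) * cos_out


def lr_at(cur_iter):
    """Regular lr, scaled by linear warmup while cur_iter < warmup_iters."""
    regular = cosine_lr(cur_iter)
    if cur_iter >= warmup_iters:
        return regular
    k = (1 - cur_iter / warmup_iters) * (1 - warmup_ratio)
    return regular * (1 - k)


print(f"base lr: {base_lr:.4e}")
print(f"warmup length: {warmup_iters / iters_per_epoch:.1f} epochs")
print(f"lr at iteration 84 (epoch 2, step 10): {lr_at(84):.3e}")
```

Under these assumptions the value at iteration 84 agrees with the `lr: 6.799e-08` entry in the log. The same arithmetic shows that `20 * 1252` warmup iterations span roughly 334 epochs at 75 iterations per epoch, which is longer than `max_epochs=300`, so `warmup_iters` may need to be reduced for a dataset this small.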

### Set saving config
d. In `_base_/default_runtime.py`, we configure checkpoint saving and logging.
```python
# checkpoint saving
checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
    interval=100,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable

dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
```

Then we create a new config file, `mmclassification/configs/vision_transformer/vit_base_patch16_224_flowers.py`, which pulls in the four config files above through `_base_`:
```python
_base_ = [
    '../_base_/models/vit_base_flowers.py',
    '../_base_/datasets/flowers_bs32.py',
    '../_base_/schedules/flowers_bs32.py',
    '../_base_/default_runtime.py'
]
```
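
As a simplified illustration of what `_base_` inheritance does: the base files are combined into one dictionary, and any keys set in the child file are merged on top, recursing into nested dicts. This is a plain-Python sketch of the idea, not mmcv's actual implementation; `merge` and the sample dicts are hypothetical, and the child override is included only to show the mechanism (the config file above sets nothing besides `_base_`).

```python
def merge(child, base):
    """Return base updated with child, merging nested dicts key by key."""
    merged = dict(base)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(value, merged[key])
        else:
            merged[key] = value
    return merged


# Two stand-ins for files listed in _base_ (contents are made up).
base_model = dict(
    model=dict(head=dict(type='VisionTransformerClsHead',
                         num_classes=1000, in_channels=768)))
base_runtime = dict(log_level='INFO')

# Base files must not define the same top-level key twice.
bases = {}
for part in (base_model, base_runtime):
    duplicated = bases.keys() & part.keys()
    if duplicated:
        raise KeyError(f'Duplicate keys among bases: {duplicated}')
    bases.update(part)

# A child config that overrides a single nested value.
child = dict(model=dict(head=dict(num_classes=17)))
cfg = merge(child, bases)
print(cfg['model']['head'])
print(cfg['log_level'])
```

Only `num_classes` changes in the merged result; `type` and `in_channels` are inherited from the base dictionary.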

## Train

From the `mmclassification` directory, run:
```shell
python tools/train.py configs/vision_transformer/vit_base_patch16_224_flowers.py
```



In [2]:
!python tools/train.py --config 'configs/vision_transformer/vit_base_patch16_224_flowers.py'

2022-10-01 18:02:36,902 - mmcls - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.8.13 (default, Mar 28 2022, 11:38:47) [GCC 7.5.0]
CUDA available: True
GPU 0: GeForce RTX 2080 Ti
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.0, V11.0.221
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.8.0
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-genco

Logs:

```
2021-10-11 17:16:01,923 - mmcls - INFO - Epoch [2][10/75]       lr: 6.799e-08, eta: 1:06:39, time: 0.355, data_time: 0.236, memory: 4064, loss: 2.8331, grad_norm: 7.4339
2021-10-11 17:16:03,150 - mmcls - INFO - Epoch [2][20/75]       lr: 7.422e-08, eta: 1:04:26, time: 0.123, data_time: 0.003, memory: 4064, loss: 2.8331, grad_norm: 7.2677
2021-10-11 17:16:04,365 - mmcls - INFO - Epoch [2][30/75]       lr: 8.045e-08, eta: 1:02:35, time: 0.121, data_time: 0.003, memory: 4064, loss: 2.8331, grad_norm: 7.4220
2021-10-11 17:16:05,570 - mmcls - INFO - Epoch [2][40/75]       lr: 8.668e-08, eta: 1:01:02, time: 0.121, data_time: 0.003, memory: 4064, loss: 2.8331, grad_norm: 7.1345 
2021-10-11 17:16:06,838 - mmcls - INFO - Epoch [2][50/75]       lr: 9.292e-08, eta: 0:59:54, time: 0.127, data_time: 0.004, memory: 4064, loss: 2.8330, grad_norm: 7.3180
2021-10-11 17:16:08,087 - mmcls - INFO - Epoch [2][60/75]       lr: 9.915e-08, eta: 0:58:53, time: 0.125, data_time: 0.004, memory: 4064, loss: 2.8330, grad_norm: 7.2354
2021-10-11 17:16:09,308 - mmcls - INFO - Epoch [2][70/75]       lr: 1.054e-07, eta: 0:57:56, time: 0.122, data_time: 0.004, memory: 4064, loss: 2.8330, grad_norm: 7.3601
2021-10-11 17:16:09,999 - mmcls - INFO - Saving checkpoint at 2 epochs
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 170/170, 181.1 task/s, elapsed: 1s, ETA:     0s
2021-10-11 17:16:12,832 - mmcls - INFO - Epoch(val) [2][11] accuracy_top-1: 20.5882, accuracy_top-5: 64.7059

```

## Evaluation

```bash
python tools/test.py configs/vision_transformer/vit_base_patch16_224_flowers.py path/to/checkpoint.pth
```

In [4]:
!python tools/test.py \
    --config 'configs/vision_transformer/vit_base_patch16_224_flowers.py' \
    --checkpoint 'work_dirs/vit_base_patch16_224_flowers/epoch_299.pth' \
    --out 'work_dirs/vit_base_patch16_224_flowers/test.json'

load checkpoint from local path: work_dirs/vit_base_patch16_224_flowers/epoch_299.pth
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 170/170, 76.5 task/s, elapsed: 2s, ETA:     0s
dumping results to work_dirs/vit_base_patch16_224_flowers/test.json
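
The dumped JSON can be post-processed without mmclassification. A minimal sketch, assuming the file has the `class_scores` key that appears in the output printed later in this section, with one list of 17 scores per test image. The sample row is the first row of that output, and `top_k` is a hypothetical helper. It returns class indices only; how an index maps to a flower name depends on how labels were assigned (sorted folder names or an annotation file), so that mapping is left out.

```python
def top_k(scores, k=5):
    """Indices of the k largest scores, highest first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# First row of "class_scores" from the dumped test.json.
first_row = [
    0.09865140169858932, 0.0200875923037529, 0.1563376486301422,
    0.043313417583703995, 0.0022775682155042887, 0.013058098964393139,
    0.0034248025622218847, 0.06047721207141876, 0.00935336109250784,
    0.0032029771246016026, 0.007380681578069925, 0.029721667990088463,
    0.06148378923535347, 0.3236308693885803, 0.14174070954322815,
    0.014410450123250484, 0.01144773792475462,
]

print('top-1 index:', top_k(first_row, k=1))
print('top-5 indices:', top_k(first_row, k=5))
```

For this row the highest score sits at index 13, and the top-5 indices are 13, 2, 14, 0, and 12.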


In [5]:
import json
# use a context manager so the file is closed after loading
with open('work_dirs/vit_base_patch16_224_flowers/test.json') as f:
    test_json = json.load(f)
print(json.dumps(test_json, indent=4))

{
    "class_scores": [
        [
            0.09865140169858932,
            0.0200875923037529,
            0.1563376486301422,
            0.043313417583703995,
            0.0022775682155042887,
            0.013058098964393139,
            0.0034248025622218847,
            0.06047721207141876,
            0.00935336109250784,
            0.0032029771246016026,
            0.007380681578069925,
            0.029721667990088463,
            0.06148378923535347,
            0.3236308693885803,
            0.14174070954322815,
            0.014410450123250484,
            0.01144773792475462
        ],
        [
            0.2000880390405655,
            0.006210176274180412,
            0.010773859918117523,
            0.009493306279182434,
            0.003892306936904788,
            0.011853661388158798,
            0.0015901351580396295,
            0.0550769604742527,
            0.0022648037411272526,
            0.018999123945832253,
            0.009432148188352585,
     