<a href="https://colab.research.google.com/github/ValentinCord/DL_TimeSformer/blob/main/TimeSformer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Main Resources**

---


*   MMAction2 tutorial: https://github.com/open-mmlab/mmaction2/blob/master/demo/mmaction2_tutorial.ipynb
*   MMAction2 TimeSformer example (Kaggle): https://www.kaggle.com/code/thousandtie/mmaction2-timesformer-fold1-ucf101
*   MMAction2 tools: https://github.com/open-mmlab/mmaction2/tree/master/tools


# **Install MMAction2**

---



In [1]:
!nvcc -V # Check nvcc version
!gcc --version # Check GCC version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.



In [2]:
# Install dependencies (cu111 wheels are built for CUDA 11.1; nvcc above reports 11.2)
!pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html

# Install mmcv-full so that its CUDA operators are available
!pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/index.html

# Install mmaction2
!rm -rf mmaction2
!git clone https://github.com/open-mmlab/mmaction2.git
%cd mmaction2

!pip install -e .

# Install some optional requirements
!pip install -r requirements/optional.txt

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in links: https://download.pytorch.org/whl/torch_stable.html
Collecting torch==1.9.0+cu111
  Downloading https://download.pytorch.org/whl/cu111/torch-1.9.0%2Bcu111-cp38-cp38-linux_x86_64.whl (2041.3 MB)

In [3]:
# Check PyTorch installation
import torch, torchvision
print('Torch : ', torch.__version__, torch.cuda.is_available())

# Check MMAction2 installation
import mmaction
print('Mmaction2 : ', mmaction.__version__)

# Check MMCV installation
from mmcv.ops import get_compiling_cuda_version, get_compiler_version
print('Cuda compiler : ', get_compiling_cuda_version())
print('Compiler : ', get_compiler_version())

Torch :  1.9.0+cu111 True
Mmaction2 :  0.24.1
Cuda compiler :  11.1
Compiler :  GCC 7.3




# **Custom Dataset**

---



In [4]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive
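
`VideoDataset` reads plain-text annotation files in which every line holds a video path (relative to `data_prefix`) and an integer label, separated by a space. The sketch below shows one way such `train.txt`, `val.txt` and `test.txt` files could be generated. It assumes a hypothetical layout with one sub-folder of `.mp4` files per class; the class names, split ratios and helper names are illustrative and not taken from this notebook.

```python
import random
from pathlib import Path


def build_annotations(video_root, class_names, splits=(0.7, 0.15), seed=0):
    """Return 'relative/path label' lines grouped into train/val/test."""
    root = Path(video_root)
    lines = []
    for label, name in enumerate(class_names):
        # Assumption: videos of class `name` live in <video_root>/<name>/*.mp4
        for video in sorted((root / name).glob('*.mp4')):
            lines.append(f'{video.relative_to(root).as_posix()} {label}')
    random.Random(seed).shuffle(lines)  # fixed seed keeps the split reproducible
    n_train = int(len(lines) * splits[0])
    n_val = int(len(lines) * splits[1])
    return {
        'train': lines[:n_train],
        'val': lines[n_train:n_train + n_val],
        'test': lines[n_train + n_val:],  # whatever remains
    }


def write_annotations(annotations, out_dir):
    """Write one <split>.txt file per split into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for split, lines in annotations.items():
        (out / f'{split}.txt').write_text('\n'.join(lines) + '\n')
```

For the Drive layout used in this notebook, a call might look like `write_annotations(build_annotations('/content/gdrive/MyDrive/action_video/', class_names), '/content/gdrive/MyDrive/action_annotations/')`, with `class_names` listing the four class folders.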


In [5]:
%%writefile timesformer.py

_base_ = ['./configs/_base_/default_runtime.py']

# model settings
model = dict(
    type='Recognizer3D',
    backbone=dict(
        type='TimeSformer',
        pretrained=  # noqa: E251
        'https://download.openmmlab.com/mmaction/recognition/timesformer/vit_base_patch16_224.pth',  # noqa: E501
        num_frames=8,
        img_size=224,
        patch_size=16,
        embed_dims=768,
        in_channels=3,
        dropout_ratio=0.,
        transformer_layers=None,
        # attention scheme: 'divided_space_time', 'joint_space_time' or 'space_only'
        attention_type='space_only',
        norm_cfg=dict(type='LN', eps=1e-6)),
    cls_head=dict(type='TimeSformerHead', num_classes=4, in_channels=768),
    # model training and testing settings
    train_cfg=None,
    test_cfg=dict(average_clips='prob'))

# dataset settings
dataset_type = 'VideoDataset'
data_root = '/content/gdrive/MyDrive/action_video/'
data_root_val = '/content/gdrive/MyDrive/action_video/'
ann_file_train = '/content/gdrive/MyDrive/action_annotations/train.txt'
ann_file_val = '/content/gdrive/MyDrive/action_annotations/val.txt'
ann_file_test = '/content/gdrive/MyDrive/action_annotations/test.txt'

img_norm_cfg = dict(
    mean=[127.5, 127.5, 127.5], std=[127.5, 127.5, 127.5], to_bgr=False)
train_pipeline = [
    dict(type='DecordInit'),
    dict(type='SampleFrames', clip_len=8, frame_interval=32, num_clips=1),
    dict(type='DecordDecode'),
    dict(type='RandomRescale', scale_range=(256, 320)),
    dict(type='RandomCrop', size=224),
    dict(type='Flip', flip_ratio=0.5),
    dict(
        type='Normalize',
        mean=[127.5, 127.5, 127.5],
        std=[127.5, 127.5, 127.5],
        to_bgr=False),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
val_pipeline = [
    dict(type='DecordInit'),
    dict(
        type='SampleFrames',
        clip_len=8,
        frame_interval=32,
        num_clips=1,
        test_mode=True),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='CenterCrop', crop_size=224),
    dict(
        type='Normalize',
        mean=[127.5, 127.5, 127.5],
        std=[127.5, 127.5, 127.5],
        to_bgr=False),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
test_pipeline = [
    dict(type='DecordInit'),
    dict(
        type='SampleFrames',
        clip_len=8,
        frame_interval=32,
        num_clips=1,
        test_mode=True),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 224)),
    dict(type='ThreeCrop', crop_size=224),
    dict(
        type='Normalize',
        mean=[127.5, 127.5, 127.5],
        std=[127.5, 127.5, 127.5],
        to_bgr=False),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
data = dict(
    videos_per_gpu=4,  # batch size per GPU; lower it if GPU memory runs out
    workers_per_gpu=2,
    test_dataloader=dict(videos_per_gpu=1),
    train=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        data_prefix=data_root,
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix=data_root_val,
        pipeline=val_pipeline),
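    test=dict(
        type=dataset_type,
        # NOTE: testing reuses the validation annotations; point ann_file at
        # ann_file_test to evaluate on the separate test split defined above.
        ann_file=ann_file_val,
        data_prefix=data_root_val,
        pipeline=test_pipeline))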
evaluation = dict(
    interval=1, metrics=['top_k_accuracy', 'mean_class_accuracy'])

# optimizer
optimizer = dict(
    type='SGD',
    lr=0.005,
    momentum=0.9,
    paramwise_cfg=dict(
        custom_keys={
            '.backbone.cls_token': dict(decay_mult=0.0),
            '.backbone.pos_embed': dict(decay_mult=0.0),
            '.backbone.time_embed': dict(decay_mult=0.0)
        }),
    weight_decay=1e-4,
    nesterov=True)  # lr inherited from the 8-GPU reference config; a single GPU is used here
optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))

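# learning policy
# With total_epochs = 5, neither decay step (5 and 10) is reached,
# so the learning rate stays at 0.005 for the whole run.
lr_config = dict(policy='step', step=[5, 10])
total_epochs = 5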

# runtime settings
checkpoint_config = dict(interval=3)
work_dir = './work_dirs/timesformer_divST_8x32x1_15e_kinetics400_rgb'

Writing timesformer.py
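
The `SampleFrames` steps in the config use `clip_len=8` and `frame_interval=32`, so a single clip spans (8 - 1) * 32 + 1 = 225 frames. The sketch below illustrates what that means for videos of different lengths. It is a simplified illustration, not MMAction2's exact offset computation: the function name is hypothetical, the clip is simply centred, and indices wrap around when the video is shorter than the span.

```python
def sample_frame_indices(num_frames, clip_len=8, frame_interval=32):
    """Frame indices of one centred clip; indices wrap for short videos."""
    span = (clip_len - 1) * frame_interval + 1  # frames covered by one clip
    # Centre the clip when the video is long enough, otherwise start at frame 0.
    start = (num_frames - span) // 2 if num_frames >= span else 0
    return [(start + i * frame_interval) % num_frames for i in range(clip_len)]


print(sample_frame_indices(300))  # [37, 69, 101, 133, 165, 197, 229, 261]
print(sample_frame_indices(100))  # [0, 32, 64, 96, 28, 60, 92, 24]
```

In the second call the video is shorter than the clip span, so the later indices wrap back to the start. If most videos in a dataset are short, a smaller `frame_interval` may be worth trying.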


In [6]:
from mmcv import Config
cfg = Config.fromfile('/content/mmaction2/timesformer.py')

cfg.setdefault('omnisource', False)
cfg.seed = 0
cfg.gpu_ids = range(0, 1)
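
The learning rate of 0.005 in the config comes from an 8-GPU setup, whereas this notebook trains on a single GPU with 4 videos per batch. A common heuristic, the linear scaling rule, scales the learning rate in proportion to the total batch size. The sketch below applies it under the assumption that the reference run used 8 videos per GPU (64 videos in total); that figure and the helper name `scale_lr` are assumptions, not something stated in this notebook.

```python
def scale_lr(base_lr, base_batch_size, num_gpus, videos_per_gpu):
    """Linear scaling rule: keep lr proportional to the total batch size."""
    return base_lr * (num_gpus * videos_per_gpu) / base_batch_size


# Assumption: the reference schedule used 8 GPUs x 8 videos per GPU = 64 videos.
scaled = scale_lr(0.005, base_batch_size=64, num_gpus=1, videos_per_gpu=4)
print(scaled)

# With the config loaded above, the value could be applied as:
# cfg.optimizer.lr = scaled
```

This is only a starting point. The run logged in this notebook kept the unscaled 0.005, so the scaled value would need to be validated on the dataset.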

# **Train the recognizer**

---



In [7]:
import os.path as osp
from mmaction.datasets import build_dataset
from mmaction.models import build_model
from mmaction.apis import train_model
import mmcv

# Build the dataset
datasets = [build_dataset(cfg.data.train)]

# Build the recognizer
model = build_model(
    cfg.model,
    train_cfg=cfg.get('train_cfg'),
    test_cfg=cfg.get('test_cfg'))

# Create work_dir
mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))
train_model(
    model,
    datasets,
    cfg,
    distributed=False,
    validate=True)

2023-01-21 16:12:12,442 - mmaction - INFO - load model from: https://download.openmmlab.com/mmaction/recognition/timesformer/vit_base_patch16_224.pth


load checkpoint from http path: https://download.openmmlab.com/mmaction/recognition/timesformer/vit_base_patch16_224.pth


Downloading: "https://download.openmmlab.com/mmaction/recognition/timesformer/vit_base_patch16_224.pth" to /root/.cache/torch/hub/checkpoints/vit_base_patch16_224.pth



2023-01-21 16:12:25,845 - mmaction - INFO - Start running, host: root@70a004a853ea, work_dir: /content/mmaction2/work_dirs/timesformer_divST_8x32x1_15e_kinetics400_rgb
2023-01-21 16:12:25,846 - mmaction - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH   ) StepLrUpdaterHook                  
(NORMAL      ) CheckpointHook                     
(LOW         ) EvalHook                           
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
before_train_epoch:
(VERY_HIGH   ) StepLrUpdaterHook                  
(LOW         ) IterTimerHook                      
(LOW         ) EvalHook                           
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
before_train_iter:
(VERY_HIGH   ) StepLrUpdaterHook                  
(LOW         ) IterTimerHook                      
(LOW         ) EvalHook                           
 -------------------- 
after_train_iter:
(ABOVE_NORMAL) OptimizerHook              

[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 163/163, 2.3 task/s, elapsed: 70s, ETA:     0s

2023-01-21 16:23:09,907 - mmaction - INFO - Evaluating top_k_accuracy ...
2023-01-21 16:23:09,913 - mmaction - INFO - 
top1_acc	0.6258
top5_acc	1.0000
2023-01-21 16:23:09,914 - mmaction - INFO - Evaluating mean_class_accuracy ...
2023-01-21 16:23:09,924 - mmaction - INFO - 
mean_acc	0.5452
2023-01-21 16:23:14,562 - mmaction - INFO - Now best checkpoint is saved as best_top1_acc_epoch_1.pth.
2023-01-21 16:23:14,564 - mmaction - INFO - Best top1_acc is 0.6258 at 1 epoch.
2023-01-21 16:23:14,568 - mmaction - INFO - Epoch(val) [1][41]	top1_acc: 0.6258, top5_acc: 1.0000, mean_class_accuracy: 0.5452
2023-01-21 16:23:40,471 - mmaction - INFO - Epoch [2][20/329]	lr: 5.000e-03, eta: 0:36:13, time: 1.294, data_time: 0.131, memory: 6478, top1_acc: 0.6250, top5_acc: 1.0000, loss_cls: 0.8082, loss: 0.8082, grad_norm: 2.9033
2023-01-21 16:24:03,121 - mmaction - INFO - Epoch [2][40/329]	lr: 5.000e-03, eta: 0:35:02, time: 1.132, data_time: 0.001, memory: 6478, top1_acc: 0.6125, top5_acc: 1.0000, loss_

[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 163/163, 9.4 task/s, elapsed: 17s, ETA:     0s

2023-01-21 16:29:43,778 - mmaction - INFO - Evaluating top_k_accuracy ...
2023-01-21 16:29:43,784 - mmaction - INFO - 
top1_acc	0.6380
top5_acc	1.0000
2023-01-21 16:29:43,785 - mmaction - INFO - Evaluating mean_class_accuracy ...
2023-01-21 16:29:43,792 - mmaction - INFO - 
mean_acc	0.5338
2023-01-21 16:29:43,888 - mmaction - INFO - The previous best checkpoint /content/mmaction2/work_dirs/timesformer_divST_8x32x1_15e_kinetics400_rgb/best_top1_acc_epoch_1.pth was removed
2023-01-21 16:29:47,825 - mmaction - INFO - Now best checkpoint is saved as best_top1_acc_epoch_2.pth.
2023-01-21 16:29:47,827 - mmaction - INFO - Best top1_acc is 0.6380 at 2 epoch.
2023-01-21 16:29:47,835 - mmaction - INFO - Epoch(val) [2][41]	top1_acc: 0.6380, top5_acc: 1.0000, mean_class_accuracy: 0.5338
2023-01-21 16:30:12,706 - mmaction - INFO - Epoch [3][20/329]	lr: 5.000e-03, eta: 0:22:28, time: 1.242, data_time: 0.127, memory: 6478, top1_acc: 0.6000, top5_acc: 1.0000, loss_cls: 0.7659, loss: 0.7659, grad_norm:

[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 163/163, 9.4 task/s, elapsed: 17s, ETA:     0s

2023-01-21 16:36:20,411 - mmaction - INFO - Evaluating top_k_accuracy ...
2023-01-21 16:36:20,413 - mmaction - INFO - 
top1_acc	0.6503
top5_acc	1.0000
2023-01-21 16:36:20,416 - mmaction - INFO - Evaluating mean_class_accuracy ...
2023-01-21 16:36:20,418 - mmaction - INFO - 
mean_acc	0.5704
2023-01-21 16:36:20,478 - mmaction - INFO - The previous best checkpoint /content/mmaction2/work_dirs/timesformer_divST_8x32x1_15e_kinetics400_rgb/best_top1_acc_epoch_2.pth was removed
2023-01-21 16:36:24,356 - mmaction - INFO - Now best checkpoint is saved as best_top1_acc_epoch_3.pth.
2023-01-21 16:36:24,359 - mmaction - INFO - Best top1_acc is 0.6503 at 3 epoch.
2023-01-21 16:36:24,367 - mmaction - INFO - Epoch(val) [3][41]	top1_acc: 0.6503, top5_acc: 1.0000, mean_class_accuracy: 0.5704
2023-01-21 16:36:49,863 - mmaction - INFO - Epoch [4][20/329]	lr: 5.000e-03, eta: 0:13:47, time: 1.271, data_time: 0.136, memory: 6478, top1_acc: 0.6500, top5_acc: 1.0000, loss_cls: 0.7453, loss: 0.7453, grad_norm:

[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 163/163, 9.4 task/s, elapsed: 17s, ETA:     0s

2023-01-21 16:42:52,696 - mmaction - INFO - Evaluating top_k_accuracy ...
2023-01-21 16:42:52,701 - mmaction - INFO - 
top1_acc	0.6074
top5_acc	1.0000
2023-01-21 16:42:52,702 - mmaction - INFO - Evaluating mean_class_accuracy ...
2023-01-21 16:42:52,708 - mmaction - INFO - 
mean_acc	0.5649
2023-01-21 16:42:52,710 - mmaction - INFO - Epoch(val) [4][41]	top1_acc: 0.6074, top5_acc: 1.0000, mean_class_accuracy: 0.5649
2023-01-21 16:43:17,605 - mmaction - INFO - Epoch [5][20/329]	lr: 5.000e-03, eta: 0:06:25, time: 1.244, data_time: 0.128, memory: 6478, top1_acc: 0.6500, top5_acc: 1.0000, loss_cls: 0.6963, loss: 0.6963, grad_norm: 1.9360
2023-01-21 16:43:40,097 - mmaction - INFO - Epoch [5][40/329]	lr: 5.000e-03, eta: 0:06:00, time: 1.125, data_time: 0.001, memory: 6478, top1_acc: 0.6500, top5_acc: 1.0000, loss_cls: 0.7440, loss: 0.7440, grad_norm: 1.9745
2023-01-21 16:44:02,393 - mmaction - INFO - Epoch [5][60/329]	lr: 5.000e-03, eta: 0:05:34, time: 1.115, data_time: 0.001, memory: 6478, to

[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 163/163, 9.3 task/s, elapsed: 17s, ETA:     0s

2023-01-21 16:49:25,065 - mmaction - INFO - Evaluating top_k_accuracy ...
2023-01-21 16:49:25,067 - mmaction - INFO - 
top1_acc	0.7239
top5_acc	1.0000
2023-01-21 16:49:25,070 - mmaction - INFO - Evaluating mean_class_accuracy ...
2023-01-21 16:49:25,072 - mmaction - INFO - 
mean_acc	0.6606
2023-01-21 16:49:25,178 - mmaction - INFO - The previous best checkpoint /content/mmaction2/work_dirs/timesformer_divST_8x32x1_15e_kinetics400_rgb/best_top1_acc_epoch_3.pth was removed
2023-01-21 16:49:29,142 - mmaction - INFO - Now best checkpoint is saved as best_top1_acc_epoch_5.pth.
2023-01-21 16:49:29,145 - mmaction - INFO - Best top1_acc is 0.7239 at 5 epoch.
2023-01-21 16:49:29,148 - mmaction - INFO - Epoch(val) [5][41]	top1_acc: 0.7239, top5_acc: 1.0000, mean_class_accuracy: 0.6606
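
The per-epoch validation results are spread across the log above. A minimal sketch for collecting them is shown below: it extracts the `Epoch(val)` lines with a regular expression and reports the best epoch. The sample text is copied from the output above; the helper name `val_top1_by_epoch` is illustrative.

```python
import re

# Matches e.g. "Epoch(val) [3][41]<tab>top1_acc: 0.6503"
VAL_LINE = re.compile(r'Epoch\(val\) \[(\d+)\]\[\d+\]\s+top1_acc: ([\d.]+)')


def val_top1_by_epoch(log_text):
    """Map epoch number -> validation top-1 accuracy."""
    return {int(m.group(1)): float(m.group(2))
            for m in VAL_LINE.finditer(log_text)}


log_text = """
2023-01-21 16:23:14,568 - mmaction - INFO - Epoch(val) [1][41]\ttop1_acc: 0.6258, top5_acc: 1.0000, mean_class_accuracy: 0.5452
2023-01-21 16:29:47,835 - mmaction - INFO - Epoch(val) [2][41]\ttop1_acc: 0.6380, top5_acc: 1.0000, mean_class_accuracy: 0.5338
2023-01-21 16:36:24,367 - mmaction - INFO - Epoch(val) [3][41]\ttop1_acc: 0.6503, top5_acc: 1.0000, mean_class_accuracy: 0.5704
2023-01-21 16:42:52,710 - mmaction - INFO - Epoch(val) [4][41]\ttop1_acc: 0.6074, top5_acc: 1.0000, mean_class_accuracy: 0.5649
2023-01-21 16:49:29,148 - mmaction - INFO - Epoch(val) [5][41]\ttop1_acc: 0.7239, top5_acc: 1.0000, mean_class_accuracy: 0.6606
"""

acc = val_top1_by_epoch(log_text)
best_epoch = max(acc, key=acc.get)
print(best_epoch, acc[best_epoch])  # 5 0.7239
```

Validation accuracy is not monotonic here (it dips at epoch 4), which is why keeping only the best checkpoint, as the log shows, is useful.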


# **Test the recognizer**

---



In [8]:
from mmaction.apis import single_gpu_test
from mmaction.datasets import build_dataloader
from mmcv.parallel import MMDataParallel

# Build a test dataloader
dataset = build_dataset(cfg.data.test, dict(test_mode=True))
data_loader = build_dataloader(
        dataset,
        videos_per_gpu=2,
        workers_per_gpu=cfg.data.workers_per_gpu,
        dist=False,
        shuffle=False)
model = MMDataParallel(model, device_ids=[0])
outputs = single_gpu_test(model, data_loader)


# Copy the evaluation settings so cfg stays untouched and the cell can be re-run
eval_config = dict(cfg.evaluation)
eval_config.pop('interval', None)
eval_res = dataset.evaluate(outputs, **eval_config)
for name, val in eval_res.items():
    print(f'{name}: {val:.04f}')

[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 163/163, 3.2 task/s, elapsed: 51s, ETA:     0s
Evaluating top_k_accuracy ...

top1_acc	0.6994
top5_acc	1.0000

Evaluating mean_class_accuracy ...

mean_acc	0.6180
top1_acc: 0.6994
top5_acc: 1.0000
mean_class_accuracy: 0.6180
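
Two points help when reading these numbers. With only 4 classes, `top5_acc` is always 1.0, because the five highest-scoring classes necessarily include the true one. And `mean_class_accuracy` averages the accuracy of each class separately, so it falls below `top1_acc` when the less frequent classes are recognised worse. The sketch below illustrates the difference on hypothetical, imbalanced toy data; it is a plain-Python illustration of the two definitions, not MMAction2's implementation.

```python
from collections import defaultdict


def top1_accuracy(labels, preds):
    """Fraction of samples whose predicted class equals the label."""
    return sum(y == p for y, p in zip(labels, preds)) / len(labels)


def mean_class_accuracy(labels, preds):
    """Average of the per-class accuracies (each class weighs the same)."""
    hits, counts = defaultdict(int), defaultdict(int)
    for y, p in zip(labels, preds):
        counts[y] += 1
        hits[y] += int(y == p)
    return sum(hits[c] / counts[c] for c in counts) / len(counts)


# Hypothetical toy data: 8 videos of class 0, 2 videos of class 1.
labels = [0] * 8 + [1] * 2
preds = [0] * 8 + [0, 1]  # one of the two class-1 videos is misclassified

print(top1_accuracy(labels, preds))        # 0.9
print(mean_class_accuracy(labels, preds))  # 0.75
```

The gap between 0.6994 and 0.6180 above suggests a similar effect: some classes are likely recognised noticeably worse than others.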


In [None]:
!python tools/analysis/analyze_logs.py plot_curve /content/mmaction2/work_dirs/timesformer_divST_8x32x1_15e_kinetics400_rgb/None.log.json \
--keys top1_acc \
--out /content/mmaction2/work_dirs/timesformer_divST_8x32x1_15e_kinetics400_rgb/results.pdf \
--legend top1_acc