<a href="https://colab.research.google.com/github/albivaltzew/DsworksEqualAI/blob/main/mmaction2_train.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# MMAction2 Tutorial

Welcome to MMAction2! This is the official Colab tutorial for using MMAction2. In this tutorial, you will learn how to:
- Perform inference with an MMAction2 recognizer.
- Train a new recognizer on a new dataset.


Let's start!

In [16]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [17]:
# importing shutil module
import shutil

# Full path of
# the archive file
filename = "/content/drive/MyDrive/DSWorks_Equal_AI/slovo_split.zip"

# Target directory
extract_dir = "/content/mmaction2/tools/data/slovo"

# Format of archive file
archive_format = "zip"

# Unpack the archive file
shutil.unpack_archive(filename, extract_dir, archive_format)
print("Archive file unpacked successfully.")

Archive file unpacked successfully.


In [18]:
import os

## Install MMAction2

In [1]:
# Check nvcc version
!nvcc -V
# Check GCC version
!gcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.



In [2]:
# install dependencies: (if your colab has CUDA 11.8)
%pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Looking in indexes: https://download.pytorch.org/whl/cu118


In [3]:
# install MMEngine and MMCV using MIM
%pip install -U openmim
!mim install mmengine
!mim install "mmcv>=2.0.0"

# Install mmaction2
!rm -rf mmaction2
!git clone https://github.com/open-mmlab/mmaction2.git -b main
%cd mmaction2

!pip install -e .

# Install some optional requirements
!pip install -r requirements/optional.txt

Collecting openmim
  Downloading openmim-0.3.9-py2.py3-none-any.whl (52 kB)
Collecting colorama (from openmim)
  Downloading colorama-0.4.6-py2.py3-none-any.whl (25 kB)
Collecting model-index (from openmim)
  Downloading model_index-0.1.11-py3-none-any.whl (34 kB)
Collecting opendatalab (from openmim)
  Downloading opendatalab-0.0.10-py3-none-any.whl (29 kB)
Collecting ordered-set (from model-index->openmim)
  Downloading ordered_set-4.1.0-py3-none-any.whl (7.6 kB)
Collecting pycryptodome (from opendatalab->openmim)
  Downloading pycryptodome-3.19.0-cp35-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB)
Collecting openxlab (from opendatalab->openmim)
  Downloading openxlab-0.0.28-py3-none-any.whl (297 kB)

Looking in links: https://download.openmmlab.com/mmcv/dist/cu118/torch2.1.0/index.html
Collecting mmengine
  Downloading mmengine-0.9.0-py3-none-any.whl (449 kB)
Collecting addict (from mmengine)
  Downloading addict-2.4.0-py3-none-any.whl (3.8 kB)
Collecting yapf (from mmengine)
  Downloading yapf-0.40.2-py3-none-any.whl (254 kB)
Installing collected packages: addict, yapf, mmengine
Successfully installed addict-2.4.0 mmengine-0.9.0 yapf-0.40.2
Looking in links: https://download.openmmlab.com/mmcv/dist/cu118/torch2.1.0/index.html
Collecting mmcv>=2.0.0
  Downloading https://download.openmmlab.com/mmcv/dist/cu118/torch2.1.0/mmcv-2.1.0-cp310-cp310-manylinux1_x86_64.whl (99.3 MB)

In [4]:
# Check Pytorch installation
import torch, torchvision
print(torch.__version__, torch.cuda.is_available())

# Check MMAction2 installation
import mmaction
print(mmaction.__version__)

# Check MMCV installation
from mmcv.ops import get_compiling_cuda_version, get_compiler_version
print(get_compiling_cuda_version())
print(get_compiler_version())

# Check MMEngine installation
from mmengine.utils.dl_utils import collect_env
print(collect_env())

2.1.0+cu118 True
1.2.0
11.8
GCC 9.3
OrderedDict([('sys.platform', 'linux'), ('Python', '3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0]'), ('CUDA available', True), ('numpy_random_seed', 2147483648), ('GPU 0', 'Tesla T4'), ('CUDA_HOME', '/usr/local/cuda'), ('NVCC', 'Cuda compilation tools, release 11.8, V11.8.89'), ('GCC', 'x86_64-linux-gnu-gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0'), ('PyTorch', '2.1.0+cu118'), ('PyTorch compiling details', 'PyTorch built with:\n  - GCC 9.3\n  - C++ Version: 201703\n  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications\n  - Intel(R) MKL-DNN v3.1.1 (Git Hash 64f6bcbcbab628e96f33a62c3e975f8535a7bde4)\n  - OpenMP 201511 (a.k.a. OpenMP 4.5)\n  - LAPACK is enabled (usually provided by MKL)\n  - NNPACK is enabled\n  - CPU capability usage: AVX2\n  - CUDA Runtime 11.8\n  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_

## Perform inference with a MMAction2 recognizer
MMAction2 already provides high-level APIs for inference and training.

In [5]:
# !mkdir checkpoints
# !wget -c https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
#       -O checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth

--2023-10-25 16:32:45--  https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth
Resolving download.openmmlab.com (download.openmmlab.com)... 8.38.121.207, 8.38.121.209, 8.38.121.210, ...
Connecting to download.openmmlab.com (download.openmmlab.com)|8.38.121.207|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 97579339 (93M) [application/octet-stream]
Saving to: ‘checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth’


2023-10-25 16:32:59 (7.36 MB/s) - ‘checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth’ saved [97579339/97579339]



In [19]:
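# The `mkdir checkpoints` command above is commented out, so make sure the target directory exists
os.makedirs("/content/mmaction2/checkpoints", exist_ok=True)
shutil.copy("/content/drive/MyDrive/DSWorks_Equal_AI/baseline/mvit32.2_small.pth",
            "/content/mmaction2/checkpoints/mvit32.2_small.pth")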

'/content/mmaction2/checkpoints/mvit32.2_small.pth'

In [20]:
shutil.copy("/content/drive/MyDrive/DSWorks_Equal_AI/baseline/mvit32.2_small_config.py",
            "/content/mmaction2/configs/recognition/mvit/mvit32.2_small_config.py")

'/content/mmaction2/configs/recognition/mvit/mvit32.2_small_config.py'

In [35]:
!python /content/drive/MyDrive/DSWorks_Equal_AI/baseline/solution.py

Loads checkpoint by local backend from path: /content/drive/MyDrive/DSWorks_Equal_AI/baseline/mvit32.2_small.pth
100% 3235/3235 [14:06<00:00,  3.82it/s]


# Copied Solution.py

In [59]:
import os

from tqdm import tqdm
from glob import glob
import pandas as pd
import numpy as np
import cv2
from mmaction.datasets.transforms.loading import DecordInit, SampleFrames, DecordDecode
from mmaction.datasets.transforms.processing import Resize, CenterCrop
from mmaction.datasets.transforms.formatting import FormatShape, PackActionInputs
from mmaction.datasets.transforms.wrappers import PytorchVideoWrapper
from mmaction.apis import inference_recognizer, init_recognizer
from mmengine.dataset import Compose
from mmcv.transforms import BaseTransform


CHECKPOINT = "/content/drive/MyDrive/DSWorks_Equal_AI/baseline/mvit32.2_small.pth"
CONFIG = "/content/drive/MyDrive/DSWorks_Equal_AI/baseline/mvit32.2_small_config.py"
# DATASET_DIR = "/content/mmaction2/tools/data/slovo/slovo_split/val"

DATASET_DIR = '/content/drive/MyDrive/DSWorks_Equal_AI/test'
OUTPUT_FILE = "/content/predicts.csv"
DEVICE = "cuda:0"


class SquarePadding(BaseTransform):
    """Pad every frame symmetrically with a constant value up to `out_shape` (height, width)."""

    def __init__(self, out_shape):
        self.out_shape = out_shape

    def transform(self, results):
        imgs = results['imgs']
        in_shape = results['img_shape']
        out_shape = self.out_shape
        # Per-side padding as (horizontal, vertical); an odd difference is rounded down
        padding = (int((out_shape[1] - in_shape[1]) / 2), int((out_shape[0] - in_shape[0]) / 2))

        def pad_func(x):
            # copyMakeBorder takes the border sizes in the order top, bottom, left, right
            return cv2.copyMakeBorder(x, padding[1], padding[1], padding[0], padding[0], cv2.BORDER_CONSTANT, value=114)

        padded_images = [pad_func(img) for img in imgs]
        results['imgs'] = padded_images
        results['img_shape'] = out_shape
        return results



if __name__ == "__main__":
    videos = glob(os.path.join(DATASET_DIR, "*.mp4"))

    shape = (300, 300)

    test_pipeline = Compose([
        DecordInit(io_backend='disk'),
        SampleFrames(
            clip_len=32,
            frame_interval=2,
            num_clips=1,
            test_mode=True,
            out_of_bound_opt='repeat_last'
        ),
        DecordDecode(),
        Resize(scale=shape),
        SquarePadding(out_shape=shape),
        CenterCrop(crop_size=224),
        FormatShape(input_format='NCTHW'),
        PackActionInputs(),
    ])

    model = init_recognizer(CONFIG, CHECKPOINT, device=DEVICE)
    model.eval()

    names = []
    predicts = []
    for video in tqdm(videos):
        name = os.path.basename(video).replace(".mp4", "")
        names.append(name)
        predicted = inference_recognizer(model, video, test_pipeline)
        predicted_class = int(predicted.pred_label.item())
        predicts.append(predicted_class)





    result_df = pd.DataFrame.from_dict({"attachment_id":names, "class_indx":predicts})

    result_df.to_csv(OUTPUT_FILE, sep="\t", index=False)




Loads checkpoint by local backend from path: /content/drive/MyDrive/DSWorks_Equal_AI/baseline/mvit32.2_small.pth


100%|██████████| 5/5 [00:01<00:00,  3.75it/s]


In [60]:
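from operator import itemgetter

# Rank the class scores of the last processed video and keep the five highest
pred_scores = predicted.pred_score.tolist()
score_tuples = tuple(zip(range(len(pred_scores)), pred_scores))
score_sorted = sorted(score_tuples, key=itemgetter(1), reverse=True)
top5_label = score_sorted[:5]

# NOTE: this file is the validation annotation list, not a class-name map, so each
# entry printed below is the annotation line whose line index equals the class index.
label = '/content/mmaction2/tools/data/slovo/slovo_split/slovo_val_video.txt'
with open(label) as f:
    labels = [x.strip() for x in f]
results = [(labels[k[0]], k[1]) for k in top5_label]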

In [61]:
print('The top-5 labels with corresponding scores are:')
for result in results:
    print(f'{result[0]}: ', result[1])

The top-5 labels with corresponding scores are:
fbeabc17-148e-4b51-a668-390e2229505a.mp4 344:  0.6083210110664368
c44e01bc-1b56-4a13-b837-62decdfb9fad.mp4 376:  0.22765898704528809
ceb9747c-98cb-4afc-b676-272b337ac0b7.mp4 301:  0.015209250152111053
4106f36b-1018-4de4-b538-b483b3ad3831.mp4 294:  0.014321260154247284
5ac15375-a9fc-4fb2-83fa-7620cdbe4118.mp4 411:  0.011500399559736252


## Train a recognizer on customized dataset

To train a new recognizer, there are usually three things to do:
1. Support a new dataset
2. Modify the config
3. Train a new recognizer

### Support a new dataset

In this tutorial, we give an example of converting the data into the format of an existing dataset. Other methods and more advanced usage can be found in the [doc](/docs/tutorials/new_dataset.md).

First, let's download a tiny dataset obtained from [Kinetics-400](https://deepmind.com/research/open-source/open-source-datasets/kinetics/). We select 30 videos with their labels as the training set and 10 videos with their labels as the test set.

In [None]:
# Check the directory structure of the tiny data

# Install tree first
!apt-get -q install tree
!tree /content/mmaction2/tools/data/slovo/slovo_split

In [None]:
# After downloading the data, we need to check the annotation format
!cat /content/mmaction2/tools/data/slovo/slovo_split/slovo_train_video.txt

According to the format defined in [`VideoDataset`](./datasets/video_dataset.py), each line describes one sample video as a filepath and a label, separated by whitespace.
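As a minimal sketch of that line format, the snippet below parses annotation lines into `(filepath, label)` pairs. The helper `parse_video_annotations` is hypothetical and not part of MMAction2; the two sample lines are taken from the validation annotation entries printed earlier in this notebook.

```python
from typing import List, Tuple


def parse_video_annotations(lines: List[str]) -> List[Tuple[str, int]]:
    """Parse 'filepath label' lines into (filepath, label) pairs.

    Hypothetical helper for illustration only; MMAction2's VideoDataset
    performs its own parsing internally.
    """
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        # Split once from the right so the label is always the last field
        path, label = line.rsplit(maxsplit=1)
        samples.append((path, int(label)))
    return samples


example_lines = [
    "fbeabc17-148e-4b51-a668-390e2229505a.mp4 344",
    "c44e01bc-1b56-4a13-b837-62decdfb9fad.mp4 376",
]
samples = parse_video_annotations(example_lines)
for path, label in samples:
    print(path, label)
```

Running a check like this over the whole annotation file is a cheap way to catch malformed lines before training starts.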

### Modify the config

In the next step, we need to modify the config for the training.
To accelerate the process, we finetune a recognizer using a pre-trained recognizer.

In [12]:


from mmengine import Config

cfg = Config.fromfile('./configs/recognition/tsn/tsn_imagenet-pretrained-r50_8xb32-1x1x3-100e_kinetics400-rgb.py')

Given a config that trains a TSN model on the full Kinetics-400 dataset, we need to modify some values to use it for training TSN on the Kinetics-400 Tiny dataset.
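Among the values changed in the next cell are the batch size and the learning rate, which are scaled down together. The arithmetic can be sketched as below. The function `scale_for_single_gpu` is a hypothetical helper, and the base learning rate of `0.01` is an assumption used for illustration (check `cfg.optim_wrapper.optimizer.lr` for the real value); the per-GPU batch size of 32 and the 8 GPUs come from the `8xb32` part of the config name.

```python
def scale_for_single_gpu(base_lr, base_batch_size, num_gpus, batch_divisor):
    """Shrink the per-GPU batch size and scale the LR by the same overall factor.

    Hypothetical helper: it mirrors the arithmetic of the config cell,
    it is not an MMAction2 or MMEngine API.
    """
    new_batch_size = base_batch_size // batch_divisor
    # Going from `num_gpus` GPUs to one GPU and dividing the batch by
    # `batch_divisor` shrinks the total batch by num_gpus * batch_divisor.
    new_lr = base_lr / num_gpus / batch_divisor
    return new_batch_size, new_lr


# base_lr=0.01 is an assumed value for illustration
batch_size, lr = scale_for_single_gpu(
    base_lr=0.01, base_batch_size=32, num_gpus=8, batch_divisor=16)
print(batch_size, lr)
```

With these numbers the total batch drops from 8 x 32 = 256 videos to 2, a factor of 128, and the learning rate is divided by the same factor.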


In [13]:
from mmengine.runner import set_random_seed

# Modify dataset type and path
cfg.data_root = 'kinetics400_tiny/train/'
cfg.data_root_val = 'kinetics400_tiny/val/'
cfg.ann_file_train = 'kinetics400_tiny/kinetics_tiny_train_video.txt'
cfg.ann_file_val = 'kinetics400_tiny/kinetics_tiny_val_video.txt'


cfg.test_dataloader.dataset.ann_file = 'kinetics400_tiny/kinetics_tiny_val_video.txt'
cfg.test_dataloader.dataset.data_prefix.video = 'kinetics400_tiny/val/'

cfg.train_dataloader.dataset.ann_file = 'kinetics400_tiny/kinetics_tiny_train_video.txt'
cfg.train_dataloader.dataset.data_prefix.video = 'kinetics400_tiny/train/'

cfg.val_dataloader.dataset.ann_file = 'kinetics400_tiny/kinetics_tiny_val_video.txt'
cfg.val_dataloader.dataset.data_prefix.video  = 'kinetics400_tiny/val/'


# Modify num classes of the model in cls_head
cfg.model.cls_head.num_classes = 2
# We can use the pre-trained TSN model
cfg.load_from = './checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth'

# Set up working dir to save files and logs.
cfg.work_dir = './tutorial_exps'

# The original learning rate (LR) is set for 8-GPU training.
# We use a single GPU and also shrink the batch size by 16x, so the LR is divided by 8 and by 16.
cfg.train_dataloader.batch_size = cfg.train_dataloader.batch_size // 16
cfg.val_dataloader.batch_size = cfg.val_dataloader.batch_size // 16
cfg.optim_wrapper.optimizer.lr = cfg.optim_wrapper.optimizer.lr / 8 / 16
cfg.train_cfg.max_epochs = 10

cfg.train_dataloader.num_workers = 2
cfg.val_dataloader.num_workers = 2
cfg.test_dataloader.num_workers = 2

# We can initialize the logger for training and have a look
# at the final config used for training
print(f'Config:\n{cfg.pretty_text}')


Config:
ann_file_train = 'kinetics400_tiny/kinetics_tiny_train_video.txt'
ann_file_val = 'kinetics400_tiny/kinetics_tiny_val_video.txt'
auto_scale_lr = dict(base_batch_size=256, enable=False)
data_root = 'kinetics400_tiny/train/'
data_root_val = 'kinetics400_tiny/val/'
dataset_type = 'VideoDataset'
default_hooks = dict(
    checkpoint=dict(
        interval=3, max_keep_ckpts=3, save_best='auto', type='CheckpointHook'),
    logger=dict(ignore_last=False, interval=20, type='LoggerHook'),
    param_scheduler=dict(type='ParamSchedulerHook'),
    runtime_info=dict(type='RuntimeInfoHook'),
    sampler_seed=dict(type='DistSamplerSeedHook'),
    sync_buffers=dict(type='SyncBuffersHook'),
    timer=dict(type='IterTimerHook'))
default_scope = 'mmaction'
env_cfg = dict(
    cudnn_benchmark=False,
    dist_cfg=dict(backend='nccl'),
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0))
file_client_args = dict(io_backend='disk')
load_from = './checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_

### Train a new recognizer

Finally, let's initialize the dataset and recognizer, then train a new recognizer!

In [14]:
import os.path as osp
import mmengine
from mmengine.runner import Runner

# Create work_dir
mmengine.mkdir_or_exist(osp.abspath(cfg.work_dir))

# build the runner from config
runner = Runner.from_cfg(cfg)

# start training
runner.train()

10/25 16:34:56 - mmengine - INFO - 
------------------------------------------------------------
System environment:
    sys.platform: linux
    Python: 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0]
    CUDA available: True
    numpy_random_seed: 156649256
    GPU 0: Tesla T4
    CUDA_HOME: /usr/local/cuda
    NVCC: Cuda compilation tools, release 11.8, V11.8.89
    GCC: x86_64-linux-gnu-gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
    PyTorch: 2.1.0+cu118
    PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.1.1 (Git Hash 64f6bcbcbab628e96f33a62c3e975f8535a7bde4)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.8
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60

Downloading: "https://download.pytorch.org/models/resnet50-11ad3fa6.pth" to /root/.cache/torch/hub/checkpoints/resnet50-11ad3fa6.pth


10/25 16:34:59 - mmengine - INFO - These parameters in pretrained checkpoint are not loaded: {'fc.weight', 'fc.bias'}
Loads checkpoint by local backend from path: ./checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth
The model and loaded state dict do not match exactly

size mismatch for cls_head.fc_cls.weight: copying a param with shape torch.Size([400, 2048]) from checkpoint, the shape in current model is torch.Size([2, 2048]).
size mismatch for cls_head.fc_cls.bias: copying a param with shape torch.Size([400]) from checkpoint, the shape in current model is torch.Size([2]).
10/25 16:34:59 - mmengine - INFO - Load checkpoint from ./checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth
10/25 16:34:59 - mmengine - INFO - Checkpoints will be saved to /content/mmaction2/tutorial_exps.
10/25 16:35:03 - mmengine - INFO - Exp name: tsn_imagenet-pretrained-r50_8xb32-1x1x3-100e_kinetics400-rgb_20231025_163456
10/25 16:35:03 - mmengine - INFO - Epoch(train)  [1][

Recognizer2D(
  (data_preprocessor): ActionDataPreprocessor()
  (backbone): ResNet(
    (conv1): ConvModule(
      (conv): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
      (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (activate): ReLU(inplace=True)
    )
    (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (layer1): Sequential(
      (0): Bottleneck(
        (conv1): ConvModule(
          (conv): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (activate): ReLU(inplace=True)
        )
        (conv2): ConvModule(
          (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (activate): ReLU(inplace=True)
        

### Understand the log
From the log, we can get a basic understanding of the training process and see how well the recognizer is trained.

First, the ResNet-50 backbone pre-trained on ImageNet is loaded. This is common practice, since training from scratch is more costly. The log shows that all the weights of the ResNet-50 backbone are loaded except `fc.bias` and `fc.weight`.

Second, since the dataset we are using is small, we load a pre-trained TSN model and finetune it for action recognition.
The original TSN is trained on the original Kinetics-400 dataset, which contains 400 classes, but the Kinetics-400 Tiny dataset has only 2 classes. Therefore, the last FC layer of the pre-trained TSN has a different weight shape and is not used.

Third, after training, the recognizer is evaluated with the default evaluation. The results show that the recognizer achieves 100% top-1 accuracy and 100% top-5 accuracy on the val dataset.

Not bad!
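The `size mismatch` messages in the log can be understood with a small sketch. It is a simplified illustration, not MMEngine's loading code: it compares parameter shapes by name and separates those that can be copied from those that must be skipped. The helper `split_loadable` is hypothetical; the classification-head shapes are the ones printed in the log, and the backbone entry is derived from the first convolution in the model summary.

```python
def split_loadable(checkpoint_shapes, model_shapes):
    """Return (loadable, skipped) parameter names based on shape agreement.

    Hypothetical helper that imitates non-strict checkpoint loading
    using plain shape tuples instead of real tensors.
    """
    loadable, skipped = [], []
    for name, shape in checkpoint_shapes.items():
        if model_shapes.get(name) == shape:
            loadable.append(name)
        else:
            skipped.append(name)
    return loadable, skipped


# Shapes stored in the Kinetics-400 checkpoint (400 classes)
checkpoint_shapes = {
    "backbone.conv1.conv.weight": (64, 3, 7, 7),
    "cls_head.fc_cls.weight": (400, 2048),
    "cls_head.fc_cls.bias": (400,),
}
# Shapes of the model built with num_classes = 2
model_shapes = {
    "backbone.conv1.conv.weight": (64, 3, 7, 7),
    "cls_head.fc_cls.weight": (2, 2048),
    "cls_head.fc_cls.bias": (2,),
}

loadable, skipped = split_loadable(checkpoint_shapes, model_shapes)
print("loaded:", loadable)
print("skipped:", skipped)
```

The skipped head parameters keep their fresh initialization and are learned during finetuning, while the backbone starts from the pre-trained weights.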

## Test the trained recognizer

After finetuning the recognizer, let's check the prediction results!

In [15]:
runner.test()

10/25 16:36:03 - mmengine - INFO - Epoch(test) [10/10]    acc/top1: 1.0000  acc/top5: 1.0000  acc/mean1: 1.0000  data_time: 0.1359  time: 0.8529


{'acc/top1': 1.0, 'acc/top5': 1.0, 'acc/mean1': 1.0}
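
To make the `acc/top1` and `acc/top5` numbers above concrete, here is a minimal sketch of how top-k accuracy can be computed from per-class scores. The function `top_k_accuracy` and the toy scores and labels are hypothetical and chosen only for illustration; this is not the metric implementation used by MMAction2.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for sample_scores, label in zip(scores, labels):
        # Class indices sorted by descending score
        ranked = sorted(range(len(sample_scores)),
                        key=lambda i: sample_scores[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy two-class scores for three samples (made-up values)
scores = [[0.9, 0.1], [0.3, 0.7], [0.6, 0.4]]
labels = [0, 1, 1]

top1 = top_k_accuracy(scores, labels, k=1)
top5 = top_k_accuracy(scores, labels, k=5)
print(top1, top5)
```

Note that with only 2 classes every label falls inside the top 5, so top-5 accuracy is 100% by construction; top-1 accuracy is the informative number for this dataset.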