# References
* [YutaroOgawa's GitHub](https://github.com/YutaroOgawa/pytorch_advanced/blob/master/9_video_classification_eco/9-4_3_ECO_DataLoader.ipynb) ← code reference
* [ActivityNet/Crawler/Kinetics](https://github.com/activitynet/ActivityNet/tree/master/Crawler/Kinetics) ← baseline, run with Python 2


# 1. Prerequisites

## 1-1. get the example repository 


In [1]:
!git init . 
!git remote add -f origin https://github.com/DoranLyong/Kinetics-400-tutorial
!git config core.sparseCheckout true

Initialized empty Git repository in /content/.git/
Updating origin
remote: Enumerating objects: 39, done.
remote: Counting objects: 100% (39/39), done.
remote: Compressing objects: 100% (31/31), done.
remote: Total 39 (delta 2), reused 33 (delta 2), pack-reused 0
Unpacking objects: 100% (39/39), done.
From https://github.com/DoranLyong/Kinetics-400-tutorial
 * [new branch]      main       -> origin/main


In [2]:
!echo "dataset" >> .git/info/sparse-checkout
!echo "utils" >> .git/info/sparse-checkout

!cat .git/info/sparse-checkout

dataset
utils
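The two `echo` lines append path patterns to `.git/info/sparse-checkout`, so the following pull only materializes the `dataset` and `utils` directories. Below is a minimal Python sketch of the same step, assuming the repository is already initialized and `core.sparseCheckout` is enabled as above. The helper name `add_sparse_patterns` is hypothetical; unlike repeated `echo >>` calls, it skips patterns that are already listed, so re-running the cell does not duplicate lines.

```python
import os


def add_sparse_patterns(repo_dir, patterns):
    """Append patterns to <repo_dir>/.git/info/sparse-checkout (hypothetical helper).

    Patterns already present are skipped; returns the full pattern list.
    """
    info_dir = os.path.join(repo_dir, ".git", "info")
    os.makedirs(info_dir, exist_ok=True)
    path = os.path.join(info_dir, "sparse-checkout")

    content = ""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            content = f.read()
    existing = [line.strip() for line in content.splitlines() if line.strip()]

    with open(path, "a", encoding="utf-8") as f:
        if content and not content.endswith("\n"):
            f.write("\n")  # keep the previous last pattern on its own line
        for pattern in patterns:
            if pattern not in existing:
                f.write(pattern + "\n")
                existing.append(pattern)
    return existing
```

In this notebook the call would be `add_sparse_patterns(".", ["dataset", "utils"])`.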


In [3]:
!git remote add Kinetics-400-tutorial https://github.com/DoranLyong/Kinetics-400-tutorial
!git pull Kinetics-400-tutorial main

From https://github.com/DoranLyong/Kinetics-400-tutorial
 * branch            main       -> FETCH_HEAD
 * [new branch]      main       -> Kinetics-400-tutorial/main


## 1-2. installing FFmpeg 
* used to extract frames from the video files

In [4]:
!sudo apt update 
!sudo apt install -y ffmpeg 

[33m0% [Working][0m            Get:1 http://ppa.launchpad.net/c2d4u.team/c2d4u4.0+/ubuntu bionic InRelease [15.9 kB]
[33m0% [Connecting to archive.ubuntu.com (91.189.91.39)] [Connecting to security.ub[0m                                                                               Hit:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease
[33m0% [Connecting to archive.ubuntu.com (91.189.91.39)] [Connecting to security.ub[0m                                                                               Get:3 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/ InRelease [3,626 B]
[33m0% [Connecting to archive.ubuntu.com (91.189.91.39)] [Connecting to security.ub[0m[33m0% [2 InRelease gpgv 1,575 B] [Connecting to archive.ubuntu.com (91.189.91.39)][0m                                                                               Hit:4 http://ppa.launchpad.net/cran/libgit2/ubuntu bionic InRelease
[33m0% [2 InRelease gpgv 1,575 

In [5]:
!ffmpeg -version

ffmpeg version 3.4.8-0ubuntu0.2 Copyright (c) 2000-2020 the FFmpeg developers
built with gcc 7 (Ubuntu 7.5.0-3ubuntu1~18.04)
configuration: --prefix=/usr --extra-version=0ubuntu0.2 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --enable-gpl --disable-stripping --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librubberband --enable-librsvg --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq 

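# 2. Data preparation
Change the default `project_root` path in `utils/extract_frames.py` to your project root (`/content` here, as `pwd` shows below). Because it is a `click` option, it can also be passed on the command line as `--project_root /content`. The relevant part of the script: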
```python
(...)
@click.command()
@click.option('--project_root', required=True, 
                default=osp.join('/content'),
                help="Root path for dataset")
(...)

```

In [6]:
!pwd  # project root path 

/content


In [8]:
# Extract frames of video data 
!python ./utils/extract_frames.py

project_root: /content
dataset_path: dataset/kinetics_videos
class_list: ['arm wrestling', 'bungee jumping']
Extracting frames for /content/dataset/kinetics_videos/arm wrestling/5JzkrOVhPOw_000027_000037.mp4
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/content/dataset/kinetics_videos/arm wrestling/5JzkrOVhPOw_000027_000037.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf56.40.101
  Duration: 00:00:10.01, start: 0.000000, bitrate: 1842 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1280x720, 1741 kb/s, 30 fps, 30 tbr, 15360 tbn, 60 tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, mono, fltp, 93 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> png (native))
Press [q] to stop, [?] for help
Output #0, image2, to '/content
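The log shows FFmpeg decoding each `.mp4` into PNG images (`h264 -> png`), and the dataset below reads frames with `img_tmpl='{:05d}.png'`. The body of `utils/extract_frames.py` is not shown here, so the following is only a sketch under stated assumptions: each video is decoded into a sibling directory named after the file without its extension (matching the paths returned by `make_datapath_list` in section 3), and frames are written as `00001.png`, `00002.png`, and so on, since FFmpeg's image2 muxer numbers output files from 1 by default. The function names and FFmpeg arguments are illustrative, not the script's actual ones.

```python
import os
import subprocess


def build_extract_cmd(video_path, out_dir):
    """FFmpeg command writing every frame of video_path as out_dir/00001.png, ..."""
    # image2 output numbering starts at 1 by default, matching img_tmpl='{:05d}.png'
    return ["ffmpeg", "-i", video_path, os.path.join(out_dir, "%05d.png")]


def extract_all(dataset_path, run=True):
    """Walk <dataset_path>/<class>/<video>.mp4 and extract frames for each video."""
    cmds = []
    for class_name in sorted(os.listdir(dataset_path)):
        class_dir = os.path.join(dataset_path, class_name)
        if not os.path.isdir(class_dir):
            continue
        for fname in sorted(os.listdir(class_dir)):
            if not fname.endswith(".mp4"):
                continue
            video_path = os.path.join(class_dir, fname)
            # e.g. ".../arm wrestling/5JzkrOVhPOw_000027_000037"
            out_dir = os.path.splitext(video_path)[0]
            os.makedirs(out_dir, exist_ok=True)
            cmd = build_extract_cmd(video_path, out_dir)
            cmds.append(cmd)
            if run:
                subprocess.run(cmd, check=True)  # raises if FFmpeg fails
    return cmds
```

Passing the command as a list (rather than a shell string) avoids quoting problems with class names such as `arm wrestling` that contain spaces.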

# 3. PyTorch Dataset and DataLoader 

In [11]:
import os 
import os.path as osp 
import glob 
import csv 

import numpy as np 
from PIL import Image 
import matplotlib.pyplot as plt 

import torch 
import torch.nn as nn 
import torch.utils.data as data 

import torchvision

from utils.kinetics400_dataloader import make_datapath_list, get_label_id_dictionary, VideoTransform, VideoDataset

## 3-1. get labels 

In [14]:
label_dictionary_path = './dataset/anno/kinetics_400_label_dicitionary.csv'
label_id_dict, id_label_dict = get_label_id_dictionary(label_dictionary_path)
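`get_label_id_dictionary` returns two mappings: label → ID and ID → label. The CSV's column layout is not shown in this notebook, so the sketch below assumes, hypothetically, a header row with `class_label` and `label_id` columns holding 0-based IDs, and builds both dictionaries with the standard `csv` module. The name `read_label_dicts` is illustrative.

```python
import csv
import io


def read_label_dicts(f):
    """Build (label_id_dict, id_label_dict) from a CSV file object.

    Assumes hypothetical columns 'class_label' and 'label_id' (0-based IDs).
    """
    label_id_dict, id_label_dict = {}, {}
    for row in csv.DictReader(f):
        label, idx = row["class_label"], int(row["label_id"])
        label_id_dict[label] = idx
        id_label_dict[idx] = label
    return label_id_dict, id_label_dict


# Usage with an in-memory CSV instead of the real file
sample_csv = io.StringIO("class_label,label_id\nabseiling,0\nair drumming,1\n")
label_id, id_label = read_label_dicts(sample_csv)
print(label_id)  # {'abseiling': 0, 'air drumming': 1}
```

For a real file, open it with `open(path, encoding="utf-8", newline="")` before passing it in, as the `csv` module recommends.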

In [15]:
#print(id_label_dict)
print(label_id_dict)

{'abseiling': 0, 'air drumming': 1, 'answering questions': 2, 'applauding': 3, 'applying cream': 4, 'archery': 5, 'arm wrestling': 6, 'arranging flowers': 7, 'assembling computer': 8, 'auctioning': 9, 'baby waking up': 10, 'baking cookies': 11, 'balloon blowing': 12, 'bandaging': 13, 'barbequing': 14, 'bartending': 15, 'beatboxing': 16, 'bee keeping': 17, 'belly dancing': 18, 'bench pressing': 19, 'bending back': 20, 'bending metal': 21, 'biking through snow': 22, 'blasting sand': 23, 'blowing glass': 24, 'blowing leaves': 25, 'blowing nose': 26, 'blowing out candles': 27, 'bobsledding': 28, 'bookbinding': 29, 'bouncing on trampoline': 30, 'bowling': 31, 'braiding hair': 32, 'breading or breadcrumbing': 33, 'breakdancing': 34, 'brush painting': 35, 'brushing hair': 36, 'brushing teeth': 37, 'building cabinet': 38, 'building shed': 39, 'bungee jumping': 40, 'busking': 41, 'canoeing or kayaking': 42, 'capoeira': 43, 'carrying baby': 44, 'cartwheeling': 45, 'carving pumpkin': 46, 'catchin

## 3-2. get video list 

In [19]:
dataset_root = "./dataset/kinetics_videos"
video_list = make_datapath_list(dataset_root)

print(video_list)

['./dataset/kinetics_videos/arm wrestling/BdMiTo_OtnU_000024_000034', './dataset/kinetics_videos/arm wrestling/5JzkrOVhPOw_000027_000037', './dataset/kinetics_videos/arm wrestling/C4lCVBZ3ux0_000028_000038', './dataset/kinetics_videos/arm wrestling/ehLnj7pXnYE_000027_000037', './dataset/kinetics_videos/bungee jumping/TUvSX0pYu4o_000002_000012', './dataset/kinetics_videos/bungee jumping/zkXOcxGnUhs_000025_000035', './dataset/kinetics_videos/bungee jumping/dAeUFSdYG1I_000010_000020', './dataset/kinetics_videos/bungee jumping/b6yQZjPE26c_000023_000033']
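Each entry is a per-video directory `<dataset_root>/<class>/<video_id>`, i.e. the frame folders created in step 2 (no `.mp4` extension). A rough equivalent with `glob`, assuming this two-level layout; `list_frame_dirs` is a hypothetical name, and unlike the list above, which comes back in filesystem order, it sorts its result.

```python
import glob
import os.path as osp


def list_frame_dirs(dataset_root):
    """Return every <dataset_root>/<class>/<video_id> directory, sorted."""
    candidates = glob.glob(osp.join(dataset_root, "*", "*"))
    # keep only directories; the original .mp4 files sit next to them
    return sorted(p for p in candidates if osp.isdir(p))
```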


## 3-3. torch dataset 

In [20]:
# Preprocessing settings
resize, crop_size = 224, 224
mean, std = [104, 117, 123], [1, 1, 1]
video_transform = VideoTransform(resize, crop_size, mean, std)


# Build the Dataset
# num_segments determines how many segments each video is split into
val_dataset = VideoDataset(video_list, label_id_dict, num_segments=16,
                           phase="val", transform=video_transform, img_tmpl='{:05d}.png')

  "Argument interpolation should be of type InterpolationMode instead of int. "


In [21]:
# Example: fetch one sample
# Each sample is (imgs_transformed, label, label_id, dir_path)
index = 0
sample = val_dataset[index]  # same as val_dataset.__getitem__(index)

print(sample[0].shape)  # video tensor
print(sample[1])  # label
print(sample[2])  # label ID
print(sample[3])  # video path

torch.Size([16, 3, 224, 224])
arm wrestling
6
./dataset/kinetics_videos/arm wrestling/BdMiTo_OtnU_000024_000034
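The first dimension of `torch.Size([16, 3, 224, 224])` equals `num_segments=16`. How `VideoDataset` picks those 16 frames is not shown in this notebook; a common TSN/ECO-style scheme, sketched below as an assumption, splits a clip's frames into `num_segments` equal segments and takes the middle frame of each. Frame numbers are 1-based here so they fit names like `00001.png` produced by `img_tmpl='{:05d}.png'`. Both function names are hypothetical.

```python
def center_segment_indices(num_frames, num_segments):
    """1-based index of the middle frame of each of num_segments equal segments."""
    if num_frames <= 0 or num_segments <= 0:
        raise ValueError("num_frames and num_segments must be positive")
    tick = num_frames / num_segments  # segment length in frames (may be fractional)
    return [int(tick / 2 + tick * i) + 1 for i in range(num_segments)]


def frame_filenames(num_frames, num_segments, img_tmpl="{:05d}.png"):
    """File names of the sampled frames, e.g. '00010.png'."""
    return [img_tmpl.format(i) for i in center_segment_indices(num_frames, num_segments)]


# A 10 s clip at 30 fps has about 300 frames
print(frame_filenames(300, 16)[:3])  # ['00010.png', '00029.png', '00047.png']
```

Taking the segment centers makes validation deterministic; training pipelines of this kind often pick a random offset inside each segment instead.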


## 3-4. torch dataloader 

In [22]:
batch_size = 8

val_dataloader = data.DataLoader(val_dataset,
                                 batch_size=batch_size,
                                 shuffle=False)

In [23]:
# Sanity check
batch_iterator = iter(val_dataloader)  # convert to an iterator
imgs_transformeds, labels, label_ids, dir_path = next(batch_iterator)  # fetch the first batch

print(imgs_transformeds.shape)

torch.Size([8, 16, 3, 224, 224])
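The leading 8 is the batch dimension: `DataLoader`'s default collate function stacks the eight `[16, 3, 224, 224]` video tensors into `[batch, frames, channels, height, width]`. With `drop_last=False` (the default), the last batch is smaller whenever the dataset size is not a multiple of `batch_size`. The pure-Python helper below, hypothetical and not part of the repository, computes the batch shapes to expect.

```python
import math


def expected_batch_shapes(num_samples, batch_size, sample_shape, drop_last=False):
    """Shapes the default collate produces for each batch of tensors of sample_shape."""
    if drop_last:
        num_batches = num_samples // batch_size
    else:
        num_batches = math.ceil(num_samples / batch_size)
    shapes = []
    for b in range(num_batches):
        size = min(batch_size, num_samples - b * batch_size)
        shapes.append((size, *sample_shape))
    return shapes


print(expected_batch_shapes(8, 8, (16, 3, 224, 224)))  # [(8, 16, 3, 224, 224)]
print(expected_batch_shapes(8, 3, (16, 3, 224, 224)))
# [(3, 16, 3, 224, 224), (3, 16, 3, 224, 224), (2, 16, 3, 224, 224)]
```

With the 8 videos here and `batch_size=8`, the whole set fits in one batch, which is why a single `next()` call returns everything.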
