https://github.com/facebookresearch/pytorchvideo
https://pytorch.org/hub/facebookresearch_pytorchvideo_slowfast/

# SlowFast

*Author: FAIR PyTorchVideo*

**SlowFast networks pretrained on the Kinetics 400 dataset**


### Example Usage

#### Imports

Install PyTorchVideo and load the model:

In [1]:
!pip install pytorchvideo

Collecting pytorchvideo
  Downloading pytorchvideo-0.1.2.tar.gz (115 kB)
Collecting fvcore
  Downloading fvcore-0.1.5.post20210825.tar.gz (49 kB)
Collecting av
  Downloading av-8.0.3-cp37-cp37m-manylinux2010_x86_64.whl (37.2 MB)
Collecting parameterized
  Downloading parameterized-0.8.1-py2.py3-none-any.whl (26 kB)
Collecting iopath
  Downloading iopath-0.1.9-py3-none-any.whl (27 kB)
Collecting yacs>=0.1.6
  Downloading yacs-0.1.8-py3-none-any.whl (14 kB)
Collecting pyyaml>=5.1
  Downloading PyYAML-5.4.1-cp37-cp37m-manylinux1_x86_64.whl (636 kB)
Collecting portalocker
  Downloading portalocker-2.3.2-py2.py3-none-any.whl (15 kB)
Building wheels for collected packages: pytorchvideo, fvcore
  Building wheel for pytorchvideo (setup.py)

In [2]:
import torch
# Choose the `slowfast_r50` model 
model = torch.hub.load('facebookresearch/pytorchvideo', 'slowfast_r50', pretrained=True)

Downloading: "https://github.com/facebookresearch/pytorchvideo/archive/master.zip" to /root/.cache/torch/hub/master.zip
Downloading: "https://dl.fbaipublicfiles.com/pytorchvideo/model_zoo/kinetics/SLOWFAST_8x8_R50.pyth" to /root/.cache/torch/hub/checkpoints/SLOWFAST_8x8_R50.pyth



Import remaining functions:

In [3]:
from typing import Dict
import json
import urllib.request
from torchvision.transforms import Compose, Lambda
from torchvision.transforms._transforms_video import (
    CenterCropVideo,
    NormalizeVideo,
)
from pytorchvideo.data.encoded_video import EncodedVideo
from pytorchvideo.transforms import (
    ApplyTransformToKey,
    ShortSideScale,
    UniformTemporalSubsample,
    UniformCropVideo
) 

  "The _functional_video module is deprecated. Please use the functional module instead."
  "The _transforms_video module is deprecated. Please use the transforms module instead."


#### Setup

Set the model to eval mode and move it to the desired device.

In [4]:
# Set to GPU or CPU
device = "cpu"
model = model.eval()
model = model.to(device)

Download the id to label mapping for the Kinetics 400 dataset on which the torch hub models were trained. This will be used to get the category label names from the predicted class ids.

In [5]:
json_url = "https://dl.fbaipublicfiles.com/pyslowfast/dataset/class_names/kinetics_classnames.json"
json_filename = "kinetics_classnames.json"
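# urllib.URLopener exists only in Python 2; fall back to the Python 3 API otherwise.
try:
    urllib.URLopener().retrieve(json_url, json_filename)
except AttributeError:
    urllib.request.urlretrieve(json_url, json_filename)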
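
The same download pattern is used again later for the example video. A minimal Python 3 sketch of a reusable helper is shown here. The name `download_if_missing` is hypothetical (it is not part of PyTorchVideo), and skipping files that already exist is an assumption about what is convenient when cells are re-run.

```python
import os
import urllib.request


def download_if_missing(url: str, filename: str) -> str:
    """Download `url` to `filename` unless the file is already present."""
    if not os.path.exists(filename):
        urllib.request.urlretrieve(url, filename)
    return filename
```

With this helper, the cell above would reduce to `download_if_missing(json_url, json_filename)`.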

In [6]:
with open(json_filename, "r") as f:
    kinetics_classnames = json.load(f)

# Create an id to label name mapping
kinetics_id_to_classname = {}
for k, v in kinetics_classnames.items():
    kinetics_id_to_classname[v] = str(k).replace('"', "")
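
The loop above inverts a name-to-id dictionary into an id-to-name dictionary and strips literal double-quote characters from the names. The sketch below shows the same inversion as a dict comprehension on a toy dictionary. The entries and ids in `toy_classnames` are illustrative stand-ins, not values read from `kinetics_classnames.json`.

```python
# Toy stand-in for the loaded JSON; names and ids are illustrative only.
toy_classnames = {'"archery"': 5, '"riding a bike"': 267, "motorcycling": 198}

# Invert name -> id into id -> name, dropping embedded double quotes.
id_to_classname = {
    class_id: str(name).replace('"', "")
    for name, class_id in toy_classnames.items()
}

print(id_to_classname[5])  # archery
```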

#### Define input transform

In [7]:
side_size = 256
mean = [0.45, 0.45, 0.45]
std = [0.225, 0.225, 0.225]
crop_size = 256
num_frames = 32
sampling_rate = 2
frames_per_second = 30
slowfast_alpha = 4
num_clips = 10
num_crops = 3

class PackPathway(torch.nn.Module):
    """
    Transform for converting video frames into a list of tensors: [slow_pathway, fast_pathway].
    """
    def __init__(self):
        super().__init__()
        
    def forward(self, frames: torch.Tensor):
        fast_pathway = frames
        # Perform temporal sampling from the fast pathway.
        slow_pathway = torch.index_select(
            frames,
            1,
            torch.linspace(
                0, frames.shape[1] - 1, frames.shape[1] // slowfast_alpha
            ).long(),
        )
        frame_list = [slow_pathway, fast_pathway]
        return frame_list

transform = ApplyTransformToKey(
    key="video",
    transform=Compose(
        [
            UniformTemporalSubsample(num_frames),
            Lambda(lambda x: x/255.0),
            NormalizeVideo(mean, std),
            ShortSideScale(
                size=side_size
            ),
            CenterCropVideo(crop_size),
            PackPathway()
        ]
    ),
)

# The duration of the input clip is also specific to the model.
clip_duration = (num_frames * sampling_rate)/frames_per_second
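
To make the numbers concrete, the sketch below reproduces in plain Python which frames `PackPathway` would keep for the slow pathway, along with the resulting clip duration. `linspace_indices` is a hypothetical helper that imitates `torch.linspace(...).long()` by truncating evenly spaced positions. It is meant as an illustration under that assumption, and floating-point rounding inside torch could differ in edge cases.

```python
def linspace_indices(num_frames: int, num_samples: int) -> list:
    """Evenly spaced frame indices from 0 to num_frames - 1, truncated to int."""
    if num_samples == 1:
        return [0]
    last = num_frames - 1
    return [int(i * last / (num_samples - 1)) for i in range(num_samples)]


num_frames = 32
slowfast_alpha = 4
sampling_rate = 2
frames_per_second = 30

# The fast pathway keeps every frame; the slow pathway keeps 1/alpha of them.
fast_indices = list(range(num_frames))
slow_indices = linspace_indices(num_frames, num_frames // slowfast_alpha)

# Seconds of source video covered by one clip.
clip_duration = (num_frames * sampling_rate) / frames_per_second

print(slow_indices)             # [0, 4, 8, 13, 17, 22, 26, 31]
print(round(clip_duration, 3))  # 2.133
```

With these settings the slow pathway receives 8 frames and the fast pathway 32, both drawn from the same clip of roughly 2.13 seconds.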

#### Run Inference

Download an example video.

In [8]:
url_link = "https://dl.fbaipublicfiles.com/pytorchvideo/projects/archery.mp4"
video_path = 'archery.mp4'
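# urllib.URLopener exists only in Python 2; fall back to the Python 3 API otherwise.
try:
    urllib.URLopener().retrieve(url_link, video_path)
except AttributeError:
    urllib.request.urlretrieve(url_link, video_path)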

In [9]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [10]:
ls /content/drive/MyDrive/MSVD/MSVD/videos/

00jrXRMlZOY_0_10.avi     -DRy7rBg0IQ_31_37.avi    pUPKsHTDZTo_70_85.avi
02Z-kuB3IaM_2_13.avi     dtn0PuxgfkM_0_5.avi      PuQVs2Ch1LY_5_15.avi
04Gt01vatkk_248_265.avi  dtwXtwJByYk_5_14.avi     Puv_4NtflqE_26_34.avi
04Gt01vatkk_308_321.avi  DuMkW35BwK8_43_47.avi    -pUwIypksfE_13_23.avi
05gNigkqfNU_11_23.avi    DvYN53KBDr0_51_62.avi    pW9DFPqoIsI_26_50.avi
05gNigkqfNU_24_32.avi    dZBIdRGKRhM_13_32.avi    pzq5fPfsPZg_145_160.avi
05gNigkqfNU_25_34.avi    E2r6nnkwl0c_6_18.avi     pzq5fPfsPZg_29_33.avi
05gNigkqfNU_78_84.avi    e3XkmpNcSt4_8_19.avi     pzq5fPfsPZg_51_57.avi
05Gtb7_9tLU_0_9.avi      e40bBP0_AbE_64_67.avi    q3I3R_gqy8M_34_37.avi
06CbMa0kDr8_3_13.avi     E4k0Aylzdyo_97_104.avi   q3I3R_gqy8M_38_42.avi
08pVpBq706k_175_212.avi  e4QGnppJ-ys_6_14.avi     q5ZRMvjzhXQ_15_29.avi
0bSz70pYAP0_5_15.avi     E61HNXjgyqA_22_32.avi    Q6HuQEIJqcA_9_16.avi
0GXq1An3yHI_22_35.avi    E6sqA9QtV5I_195_201.avi  q6vz80UkVtw_0_7.avi
0hyZ__3YhZc_279_283.avi  e996zZ0uV_A_152_163.avi  Q7H9mI9dtMY_20_3

In [18]:
video_path = '/content/drive/MyDrive/MSVD/MSVD/videos/DIebwNHGjm8_27_38.avi'

In [19]:
# Display the video
from moviepy.editor import VideoFileClip

# Unused alternative clip; the video displayed below is `video_path`.
path = '/content/drive/MyDrive/MSVD/MSVD/videos/-Cv5LsqKUXc_17_25.avi'

clip = VideoFileClip(video_path)
clip.ipython_display(width=280)



Load the video and transform it to the input format required by the model.

In [20]:
# Select the clip to load by specifying its start and end times in seconds.
# start_sec should correspond to where the action occurs in the video.
start_sec = 0
end_sec = start_sec + clip_duration

# Initialize an EncodedVideo helper class and load the video
video = EncodedVideo.from_path(video_path)

# Load the desired clip
video_data = video.get_clip(start_sec=start_sec, end_sec=end_sec)

# Apply a transform to normalize the video input
video_data = transform(video_data)

# Move the inputs to the desired device
inputs = video_data["video"]
inputs = [i.to(device)[None, ...] for i in inputs]
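
The cell above evaluates a single clip starting at `start_sec = 0`, although `num_clips = 10` is defined in the transform parameters. One possible way to choose several clip windows spread evenly over a video is sketched below. `clip_windows` is a hypothetical helper, not a PyTorchVideo API and not necessarily how the authors sample clips, and the video duration used in the example is an assumed value.

```python
def clip_windows(video_duration: float, clip_duration: float, num_clips: int):
    """Return (start_sec, end_sec) pairs spread evenly over the video."""
    if num_clips < 1:
        raise ValueError("num_clips must be at least 1")
    # Latest start that still leaves room for a full clip (0 for short videos).
    max_start = max(video_duration - clip_duration, 0.0)
    if num_clips == 1:
        return [(0.0, clip_duration)]
    step = max_start / (num_clips - 1)
    return [(i * step, i * step + clip_duration) for i in range(num_clips)]


# Assumed 10-second video, 2-second clips, 5 windows.
for start_sec, end_sec in clip_windows(10.0, 2.0, 5):
    print(start_sec, end_sec)
```

Each pair could be passed to `video.get_clip(start_sec=..., end_sec=...)` in place of the single window used above.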

#### Get Predictions

In [21]:
# Pass the input clip through the model (gradients are not needed for inference)
with torch.no_grad():
    preds = model(inputs)

# Get the predicted classes
post_act = torch.nn.Softmax(dim=1)
preds = post_act(preds)
pred_classes = preds.topk(k=5).indices[0]

# Map the predicted classes to the label names
pred_class_names = [kinetics_id_to_classname[int(i)] for i in pred_classes]
print("Top 5 predicted labels: %s" % ", ".join(pred_class_names))

Top 5 predicted labels: motorcycling, riding a bike, riding mountain bike, riding scooter, riding unicycle
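
For readers unfamiliar with the post-processing step, the sketch below shows in plain Python what a softmax followed by a top-k selection computes. The logits are made-up scores for four hypothetical classes, and `softmax` and `top_k` are illustrative helpers rather than the torch implementations.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, k):
    """Indices of the k largest values, highest first."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]


toy_logits = [1.0, 3.0, 0.5, 2.0]  # hypothetical scores for 4 classes
probs = softmax(toy_logits)
print(top_k(probs, k=2))  # [1, 3]
```

Softmax preserves the ordering of the scores, so the top-k classes are the same with or without it. Its role is to turn raw scores into probabilities that sum to 1.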


### Model Description
The SlowFast architectures follow [1]. The pretrained weights use the 8x8 setting
(frame length x sample rate) and were trained on the Kinetics 400 dataset.

| arch     | depth | frame length x sample rate | top 1 | top 5 | FLOPs (G) | Params (M) |
| -------- | ----- | -------------------------- | ----- | ----- | --------- | ---------- |
| SlowFast | R50   | 8x8                        | 76.94 | 92.69 | 65.71     | 34.57      |
| SlowFast | R101  | 8x8                        | 77.90 | 93.27 | 127.20    | 62.83      |


### References
[1] Christoph Feichtenhofer et al., "SlowFast Networks for Video Recognition," https://arxiv.org/pdf/1812.03982.pdf