# Load Video Data

This tutorial demonstrates how to load and preprocess AVI video data using the [UCF101 human action dataset](https://www.tensorflow.org/datasets/catalog/ucf101).

**About the dataset:** \
The original dataset contains realistic action videos collected from YouTube, spanning 101 categories (playing cello, brushing teeth, applying eye makeup, and many more).

**What we will learn:** 
1. Load the data from a zip file
2. Read sequences of frames from video files
3. Visualize video data
4. Wrap the frame generator in a `tf.data.Dataset`

In [3]:
## installing required libraries
# remotezip to inspect the contents of a zip file
# tqdm to use a progress bar
# opencv to process video files
# tensorflow_docs for embedding data in a jupyter notebook

!pip install remotezip tqdm opencv-python
!pip install -q git+https://github.com/tensorflow/docs


In [4]:
# importing the required libraries
import tqdm
import random
import pathlib
import itertools
import collections

import os
import cv2
import numpy as np
import remotezip as rz

import tensorflow as tf

# some modules to display an animation using imageio
import imageio
from IPython import display
from urllib import request
from tensorflow_docs.vis import embed

In [5]:
# Download a subset of the categories
# UCF101 dataset contains 101 categories of different actions.
# we will use only a subset in this demonstration

URL = 'https://storage.googleapis.com/thumos14_files/UCF101_videos.zip'

In [13]:
# The above URL contains a zip file.
# We can remotely examine the contents of this zip file, like this

def list_files_from_zip_url(zip_url):
    '''
    List the files in the dataset, given a URL with the zip file.

    Args:
    zip_url: A URL from which the files can be extracted

    Returns:
    List of the file paths contained in the zip archive.
    '''
    files = []
    # Use a name other than `zip` to avoid shadowing the built-in function.
    with rz.RemoteZip(zip_url) as zip_file:
        for zip_info in zip_file.infolist():
            files.append(zip_info.filename)
    return files


files = list_files_from_zip_url(URL)
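
The `infolist()` pattern used above can also be tried offline with Python's standard `zipfile` module. The sketch below is only an illustration: it builds a tiny in-memory archive whose entry names and contents are made up, then lists the entries and keeps the `.avi` ones. The helper name `list_files_in_zip` is hypothetical and not part of the tutorial.

```python
import io
import zipfile

# Build a small in-memory zip archive; names and contents are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode='w') as archive:
    archive.writestr('UCF101/v_ApplyEyeMakeup_g01_c01.avi', b'fake video bytes')
    archive.writestr('UCF101/v_Archery_g01_c01.avi', b'fake video bytes')
    archive.writestr('UCF101/readme.txt', b'not a video')


def list_files_in_zip(file_obj):
    """Return the names of all entries in a seekable zip file object."""
    with zipfile.ZipFile(file_obj) as archive:
        return [info.filename for info in archive.infolist()]


buffer.seek(0)
local_files = list_files_in_zip(buffer)

# Keep only the video files, mirroring the filter applied to the real listing.
avi_files = [f for f in local_files if f.endswith('.avi')]
print(local_files)
print(len(avi_files))
```

Each entry returned by `infolist()` is a `ZipInfo` object, so other metadata such as `file_size` is available in the same loop if needed.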

In [17]:
files = [f for f in files if f.endswith('.avi')]
print("Number of video files: ", len(files))
files[:10]

Number of video files:  13320


['UCF101/v_ApplyEyeMakeup_g01_c01.avi',
 'UCF101/v_ApplyEyeMakeup_g01_c02.avi',
 'UCF101/v_ApplyEyeMakeup_g01_c03.avi',
 'UCF101/v_ApplyEyeMakeup_g01_c04.avi',
 'UCF101/v_ApplyEyeMakeup_g01_c05.avi',
 'UCF101/v_ApplyEyeMakeup_g01_c06.avi',
 'UCF101/v_ApplyEyeMakeup_g02_c01.avi',
 'UCF101/v_ApplyEyeMakeup_g02_c02.avi',
 'UCF101/v_ApplyEyeMakeup_g02_c03.avi',
 'UCF101/v_ApplyEyeMakeup_g02_c04.avi']

Note from the filenames above that the class name is included in the filename of each video.

To extract the class name from a filename, we define a function called `get_class`.

In [18]:
def get_class(filename):
    '''
    Retrieves the name of the class that the given file belongs to.

    Args:
    filename: Name of the file in the UCF101 dataset

    Returns:
    Class that the file belongs to
    '''
    # e.g. 'UCF101/v_ApplyEyeMakeup_g01_c01.avi' -> 'ApplyEyeMakeup'
    return filename.split('_')[-3]
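
A quick check of `get_class` on a few filenames can confirm that it behaves as expected. The function is repeated here so the snippet runs on its own. The first two sample names come from the listing above; the third is an assumed example that follows the same `v_<Class>_gXX_cYY.avi` naming pattern.

```python
def get_class(filename):
    # Same logic as the function above: take the third field from the end
    # after splitting the filename on underscores.
    return filename.split('_')[-3]


samples = [
    'UCF101/v_ApplyEyeMakeup_g01_c01.avi',
    'UCF101/v_ApplyEyeMakeup_g02_c04.avi',
    'UCF101/v_Archery_g01_c01.avi',  # assumed example, same naming pattern
]

classes = [get_class(name) for name in samples]
print(classes)
```

Indexing from the end means the directory prefix does not affect the result. The approach does assume that the class name itself contains no underscore; a class name with one would be returned only in part.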

We also need to know which files belong to which class, in a format we can easily use. Hence, we create a function called `get_files_per_class`, which converts the list of all files into a dictionary listing the files for each class.

In [None]:
def get_files_per_class(files):
    '''
    Retrieve the files that belong to each class.

    Args:
    files: list of files in the dataset

    Returns:
    a dictionary of class names (key) and files (values).
    '''
    files_for_class = collections.defaultdict(list)