# Introduction to Video Classification and Human Activity Recognition

This notebook contains the code from [this tutorial](https://learnopencv.com/introduction-to-video-classification-and-human-activity-recognition/) by Taha Anwar.

The goals of this tutorial are to learn:
1. Video preprocessing
2. How to feed video data to a neural network
3. Video classification

This tutorial is a prerequisite to [this TensorFlow tutorial](https://youtu.be/DjQFwJGnRDY?list=PLQY2H8rRoyvwmjfn7hM-Yg_6RIyoMnKQx) by Shilpa Kancharla.

In this tutorial we will go over a number of approaches to building a video classifier for human activity recognition.

## 1. Understanding Human Activity Recognition

- **Task:** Classifying or predicting the activity or action that a person is performing is called Activity Recognition.

- How is it different from a normal classification task? \
For human activity recognition, a single data point is usually not enough to predict the action correctly. You cannot simply predict frame by frame, because the action is defined by how a series of frames evolves. **So, it is a time-series classification problem**, and you need data from a series of timesteps.

- How was Human Activity Recognition traditionally solved?
  1. (Most common and effective technique) Attach a wearable sensor (for example, a smartphone) to a person, then train a temporal model such as an LSTM on the sensor output. \
  Here, the accelerometer and gyroscope readings are used to train a model that outputs one of these six classes: walking, walking upstairs, walking downstairs, sitting, standing, laying.

  2. Image Classification
     1. Simple image classification on a frame-by-frame basis. This can work well because the model also learns the environmental context. See the tutorial webpage for an explanation.
     2. Simple image classification on a frame-by-frame basis, with the results averaged over a number of frames (a moving average).

        
  3. Video Classification \
     These methods take a short video clip as input and output the activity being performed.
     1. Method 1: Single-Frame CNN, then averaging
     2. Method 2: Late Fusion
     3. Method 3: Early Fusion
     4. Method 4: Using CNN with LSTM
     5. Method 5: Using pose detection and LSTM
     6. Method 6: Using Optical Flow and CNN
     7. Method 7: Using SlowFast Networks
     8. Method 8: Using 3D CNNs / Slow Fusion


- Types of Activity Recognition Problems
  1. Simple Activity Recognition
  2. Temporal Activity Recognition/Localization
  3. Spatio-Temporal Detection
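
As noted above, activity recognition is a time-series classification problem: the model needs a series of timesteps rather than a single reading or frame. The sketch below shows one common way to prepare such input, by cutting a long stream into fixed-length, overlapping windows. It is an illustration only: the helper name `make_windows`, the window length, the step size, and the synthetic 6-channel "sensor" data are all assumptions, not part of the original tutorial.

```python
import numpy as np


def make_windows(signal, window_len, step):
    """Split a (timesteps, channels) array into fixed-length windows.

    Hypothetical helper for illustration. Returns an array of shape
    (num_windows, window_len, channels). The signal must contain at
    least `window_len` timesteps.
    """
    starts = range(0, signal.shape[0] - window_len + 1, step)
    return np.stack([signal[s:s + window_len] for s in starts])


# Synthetic stand-in for sensor readings: 100 timesteps, 6 channels
# (for example, 3 accelerometer axes and 3 gyroscope axes).
rng = np.random.default_rng(0)
readings = rng.normal(size=(100, 6))

windows = make_windows(readings, window_len=20, step=10)
print(windows.shape)  # (9, 20, 6)
```

Each window is one training example for a temporal model such as an LSTM. The same idea carries over to video, where the stream is a sequence of frames instead of sensor readings.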

## 2. Video Classification using Keras

We will build a basic video classification system using Keras:
1. First, create a normal image classifier.
2. Then, implement a moving average technique over its predictions.
3. Finally, create a single-frame CNN video classifier.
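
To make step 2 concrete, here is a minimal sketch of a moving average over per-frame predictions. It assumes some classifier has already produced a probability vector for each frame. The function name `rolling_average_predictions`, the window size, and the toy probabilities are hypothetical and only serve as an illustration.

```python
from collections import deque

import numpy as np


def rolling_average_predictions(frame_probs, window_size):
    """Smooth per-frame class probabilities with a moving average.

    frame_probs: iterable of 1-D probability vectors, one per frame.
    Returns the predicted class index for every frame, computed from
    the mean of the last `window_size` probability vectors.
    """
    window = deque(maxlen=window_size)  # old entries drop out automatically
    labels = []
    for probs in frame_probs:
        window.append(np.asarray(probs, dtype=float))
        mean_probs = np.stack(list(window)).mean(axis=0)
        labels.append(int(np.argmax(mean_probs)))
    return labels


# Toy example with two classes: the second frame is a one-off misprediction.
toy_probs = [[0.9, 0.1], [0.4, 0.6], [0.9, 0.1], [0.8, 0.2]]
raw_labels = [int(np.argmax(p)) for p in toy_probs]
smoothed_labels = rolling_average_predictions(toy_probs, window_size=3)
print(raw_labels)       # [0, 1, 0, 0]
print(smoothed_labels)  # [0, 0, 0, 0]
```

Averaging removes the flicker caused by a single misclassified frame, at the cost of reacting more slowly when the activity really changes.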

In [2]:
!pip install pafy youtube-dl moviepy

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting pafy
  Downloading pafy-0.5.5-py2.py3-none-any.whl (35 kB)
Collecting youtube-dl
  Downloading youtube_dl-2021.12.17-py2.py3-none-any.whl (1.9 MB)

In [5]:
# importing the libraries
import os

import numpy as np
import random
import tensorflow as tf

In [6]:
# set numpy, python and tensorflow seeds to get consistent results
seed_constant = 23
np.random.seed(seed_constant)
random.seed(seed_constant)
tf.random.set_seed(seed_constant)

#### Download and Extract the Dataset

The dataset we will use is [UCF50 - Action Recognition Data Set](https://www.crcv.ucf.edu/data/UCF50.php).

UCF50 is an action recognition dataset with the following properties:
- 50 action categories consisting of realistic YouTube videos
- 25 groups of videos per action category
- An average of 133 videos per action category
- An average of 199 frames per video
- An average frame width of 320 pixels
- An average frame height of 240 pixels
- An average of 26 frames per second per video
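
As a rough back-of-the-envelope check, the averages listed above give a feel for the scale of the data. The numbers below are estimates derived only from those averages, not exact dataset statistics, and the 3 color channels with one byte per value are an assumption.

```python
# Approximate figures taken from the UCF50 averages listed above.
num_classes = 50
avg_videos_per_class = 133
avg_frames_per_video = 199
frame_height, frame_width = 240, 320
avg_fps = 26
channels = 3  # assumption: RGB frames stored as uint8 (1 byte per value)

approx_total_videos = num_classes * avg_videos_per_class
approx_clip_seconds = avg_frames_per_video / avg_fps
# Size of one fully decoded video, before any resizing or frame sampling.
bytes_per_video = avg_frames_per_video * frame_height * frame_width * channels

print(f"~{approx_total_videos} videos in total")
print(f"~{approx_clip_seconds:.1f} s per video")
print(f"~{bytes_per_video / 1e6:.1f} MB per decoded video")
```

At roughly 46 MB per decoded clip, holding every frame of every video in memory is impractical, which is one reason to resize frames and use only a subset of frames per video.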

In [10]:
!wget -nc --no-check-certificate https://www.crcv.ucf.edu/data/UCF50.rar
!unrar x UCF50.rar -inul -y

'wget' is not recognized as an internal or external command,
operable program or batch file.
'unrar' is not recognized as an internal or external command,
operable program or batch file.
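
The error output above suggests the cell ran in a Windows shell, where `wget` and `unrar` are not available by default. As a possible workaround for the download step, the sketch below uses only Python's standard library. The helper name `download_if_missing` is hypothetical. Unlike `wget --no-check-certificate`, it keeps certificate verification on, so it may fail if the server's certificate cannot be verified. Extracting the `.rar` archive still requires an external tool, which this sketch does not cover.

```python
import os
import urllib.request


def download_if_missing(url, dest_path):
    """Download `url` to `dest_path` unless the file already exists.

    Hypothetical helper that mirrors the skip-if-present behavior of
    `wget -nc`. Returns True if a download happened, False otherwise.
    """
    if os.path.exists(dest_path):
        return False
    urllib.request.urlretrieve(url, dest_path)
    return True
```

It could then be called as `download_if_missing("https://www.crcv.ucf.edu/data/UCF50.rar", "UCF50.rar")`. Note that an interrupted download leaves a partial file behind, which has to be deleted before retrying.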
