<a href="https://colab.research.google.com/github/DevavratSinghBisht/neural-networks/blob/main/neural-networks/6.VideoData(CNN)/Video_Classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Video Classification

* @author: Devavrat Singh Bisht
* Dataset: YouTube DataSet Annotated
* Click [here](https://www.crcv.ucf.edu/data/YouTube_DataSet_Annotated.zip) to download the whole YouTube DataSet.
* Click [here](https://www.crcv.ucf.edu/data/UCF_YouTube_Action.php) to visit the website where you can download this and many other similar datasets.
* Note: I have reduced the dataset to the 3 classes listed below to cut the dataset size and the computation needed to fit it. The original dataset contains 11 classes.

In this session we will do video classification.
There are 3 classes/types of videos:
* Walking
* Horse Riding
* Biking

As a video is made up of frames, we will take multiple frames from a single video and build a convolutional network using Conv3D layers to predict the class of the video.

The videos in our dataset are short, about 10 seconds on average. Taking 5 frames from each video therefore seems good enough for our learning purpose, as we do not have access to much computing power.
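
To make "taking 5 frames" concrete, here is a minimal sketch of one way to choose frame indices that are spread across a clip using an integer stride. It follows the same rule the data generator later in this notebook applies. The helper name `evenly_spaced_indices` and the 25 frames-per-second figure are assumptions for illustration, not part of the original notebook.

```python
def evenly_spaced_indices(n_frames, n_samples=5):
    """Return n_samples frame indices spread across a clip of n_frames frames."""
    if n_frames < 1 or n_samples < 1:
        raise ValueError("need at least one frame and one sample")
    if n_samples == 1:
        return [0]
    # Integer stride between sampled frames. For clips shorter than
    # n_samples the stride is 0, so the first frame is simply repeated.
    stride = (n_frames - 1) // (n_samples - 1)
    return [j * stride for j in range(n_samples)]


# A 10-second clip at an assumed 25 frames per second has 250 frames.
print(evenly_spaced_indices(250))
```

Because the stride is rounded down, the last sampled frame can fall slightly before the end of the clip, which is harmless for this purpose.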

Building a model that fits video data well needs a huge dataset and a lot of computation. Our main aim in this notebook is therefore to understand the concept; nonetheless, we will also try to optimize the model a little.

## Importing Libraries

In [55]:
# you can ignore this
# connecting to drive
from google.colab import drive
drive.mount('/content/drive/')

Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).


In [56]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv3D, Flatten, Dense, Dropout, BatchNormalization

import os
import cv2
import random
import numpy as np

## Data Loading and Preprocessing

We will create a data generator (also called a data loader). This class reads video data directly from disk and takes exactly 5 frames from each video. Because every frame is resized to (32, 32, 3) regardless of the shape of the original frames, each batch has shape (batch size, 5, 32, 32, 3).

One 4D array represents one video, and stacking these 4D arrays into a batch gives a 5D array.
As we have only 3 classes, our target variable has shape (batch size, 3), a 2D array as usual. The only thing that changes here is the independent variable, i.e. X.
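
The shapes described above can be checked with a small NumPy sketch. Random values stand in for real pixels and labels, and all variable names are illustrative only.

```python
import numpy as np

batch_size, n_frames, height, width, channels = 32, 5, 32, 32, 3
n_classes = 3

# One video: a 4D array of frames (random values stand in for real pixels).
video = np.random.rand(n_frames, height, width, channels)

# One batch: videos stacked along a new leading axis, giving a 5D array.
X = np.stack([video] * batch_size)

# One-hot targets: row i has a single 1 in the column of its class.
labels = np.random.randint(0, n_classes, size=batch_size)
y = np.eye(n_classes, dtype=int)[labels]

print(video.shape)  # (5, 32, 32, 3)
print(X.shape)      # (32, 5, 32, 32, 3)
print(y.shape)      # (32, 3)
```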

In [29]:
class DataGenerator(tf.keras.utils.Sequence):
  'Generates data for Keras'
  def __init__(self, dataset_path, batch_size=32, dim=(5, 32, 32, 3), vid_per_class = 21*3):
    self.dataset_path = dataset_path
    # sort so that the class index -> directory mapping is the same for every generator
    self.dir_list = sorted(os.listdir(dataset_path))
    self.n_classes = len(self.dir_list)
    self.batch_size = batch_size
    self.dim = dim
    self.frame_per_vid, self.height, self.width, self.channels = self.dim
    self.dataset_len = 0
    self.vid_per_class = vid_per_class

    # for dir in self.dir_list:
    #   dir_path = self.dataset_path + '/' + dir
    #   #print(os.listdir(dir_path))
    #   self.dataset_len = self.dataset_len + len(os.listdir(dir_path))

    # self.dataset_len = self.dataset_len * self.dim[0]

    self.dataset_len = self.n_classes * self.vid_per_class
      

  def __len__(self):
    'Denotes the number of batches per epoch'
    return int(np.floor(self.dataset_len / self.batch_size))

  def __getitem__(self, index):
    'Generate one batch of data'
    # Generate data
    # note: `index` is not used, every batch is sampled at random
    X, y = self.__data_generation()

    return X, y

  def __data_generation(self):
    'Generates data containing batch_size samples' # X : (batch_size, *dim), where dim already includes the channels
    # Initialization
    X = np.zeros((self.batch_size, *self.dim))
    y = np.zeros((self.batch_size, self.n_classes), dtype=int)

    # Generate data
    for i in range(self.batch_size):
      #print(i)

      frame_list = []

      # pick a random class; randint includes both end points
      class_no = random.randint(0, self.n_classes-1)

      # pick a random video from that class directory
      vid_dir_path = os.path.join(self.dataset_path, self.dir_list[class_no])
      vid_path = os.path.join(vid_dir_path, random.choice(os.listdir(vid_dir_path)))

      cam = cv2.VideoCapture(vid_path)

      # read every frame of the video into memory
      while True:
        ret, frame = cam.read()
        if not ret:
          break
        # cv2.resize expects dsize as (width, height)
        frame = cv2.resize(frame, (self.width, self.height), interpolation=cv2.INTER_NEAREST)
        frame_list.append(frame)

      cam.release()  # free the video file handle
      
      # stride between sampled frames so that they span the whole video
      multiplier = (len(frame_list)-1)//(self.frame_per_vid-1)

      for j in range(self.frame_per_vid):
        X[i, j, :, :, :] = frame_list[j*multiplier]

      y[i, class_no] = 1 

    X = X/255

    return X, y

## Model Building

In [45]:
model = Sequential([
    Conv3D(4, kernel_size=(2, 8, 8), input_shape=(5, 32, 32, 3), activation='relu'),
    Dropout(0.5), # for regularization
    BatchNormalization(), # listed after the first two Dropout layers in the summary output
    Conv3D(16, kernel_size=(2, 8, 8), activation='relu'),
    Dropout(0.5),
    BatchNormalization(),
    Conv3D(32, kernel_size=(1, 16, 16), activation='relu'),
    Dropout(0.5),
    Flatten(),
    Dense(256, activation='relu'),
    Dense(3, activation='softmax')
])

In [46]:
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=["accuracy"])
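
As a rough illustration of what the `categorical_crossentropy` loss and the `accuracy` metric measure, here is a minimal NumPy sketch of the textbook formulas on two made-up predictions. It is not Keras's own implementation, which may differ in details such as clipping and reduction; the function name and the sample values are assumptions for illustration.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# Two samples, three classes: one-hot targets and softmax-like predictions.
y_true = np.array([[1, 0, 0], [0, 1, 0]])
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])

loss = categorical_crossentropy(y_true, y_pred)
# Accuracy: fraction of samples whose most probable class is the true class.
accuracy = float(np.mean(y_true.argmax(axis=1) == y_pred.argmax(axis=1)))
print(loss, accuracy)
```

Only the probability assigned to the true class contributes to the loss, so confident correct predictions push it towards zero.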

In [47]:
model.summary()

Model: "sequential_7"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv3d_11 (Conv3D)           (None, 4, 25, 25, 4)      1540      
_________________________________________________________________
dropout (Dropout)            (None, 4, 25, 25, 4)      0         
_________________________________________________________________
batch_normalization (BatchNo (None, 4, 25, 25, 4)      16        
_________________________________________________________________
conv3d_12 (Conv3D)           (None, 3, 18, 18, 16)     8208      
_________________________________________________________________
dropout_1 (Dropout)          (None, 3, 18, 18, 16)     0         
_________________________________________________________________
batch_normalization_1 (Batch (None, 3, 18, 18, 16)     64        
_________________________________________________________________
conv3d_13 (Conv3D)           (None, 3, 3, 3, 32)      
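
The output shapes and parameter counts in the summary can be reproduced by hand. The sketch below assumes stride 1 and 'valid' padding, which are the Keras defaults for `Conv3D`; the helper name `conv3d_valid` is hypothetical.

```python
def conv3d_valid(input_shape, filters, kernel_size):
    """Output shape and parameter count of a stride-1 Conv3D with 'valid' padding.

    input_shape is (frames, height, width, channels), without the batch axis.
    """
    *spatial, in_channels = input_shape
    # With 'valid' padding each axis shrinks by (kernel size - 1).
    out_spatial = [s - k + 1 for s, k in zip(spatial, kernel_size)]
    kernel_volume = 1
    for k in kernel_size:
        kernel_volume *= k
    # One weight per kernel cell per input channel per filter, plus one bias per filter.
    params = kernel_volume * in_channels * filters + filters
    return (*out_spatial, filters), params


shape = (5, 32, 32, 3)
for filters, kernel in [(4, (2, 8, 8)), (16, (2, 8, 8)), (32, (1, 16, 16))]:
    shape, params = conv3d_valid(shape, filters, kernel)
    print(shape, params)
```

The first two results match the `conv3d_11` and `conv3d_12` rows of the summary (1540 and 8208 parameters). For the third convolution the same formula gives an output of (3, 3, 3, 32) with 131104 parameters.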

## Model Training

In [48]:
train_datagen = DataGenerator('/content/drive/MyDrive/Study/DL/Video Classification/Datset/Train')
val_datagen = DataGenerator('/content/drive/MyDrive/Study/DL/Video Classification/Datset/Val', batch_size=4, vid_per_class= 2*3)
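
With these settings, the number of batches Keras draws per epoch follows from `DataGenerator.__len__`. The sketch below mirrors that calculation and assumes each split directory contains exactly 3 class folders; the helper name `batches_per_epoch` is hypothetical.

```python
import math

def batches_per_epoch(n_classes, vid_per_class, batch_size):
    """Mirror DataGenerator.__len__: floor(dataset_len / batch_size)."""
    dataset_len = n_classes * vid_per_class
    return math.floor(dataset_len / batch_size)

# Assumes each split directory holds exactly 3 class folders.
print(batches_per_epoch(3, 21 * 3, 32))  # training generator defaults
print(batches_per_epoch(3, 2 * 3, 4))    # validation generator settings
```

Under that assumption the training generator yields 5 batches per epoch and the validation generator yields 4.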

In [49]:
model.fit(train_datagen, validation_data=val_datagen, epochs=50)

Epoch 1/50
...
Epoch 50/50


<tensorflow.python.keras.callbacks.History at 0x7f4660668e80>