# Objective: generate video from text description

- Download the UCF-101 dataset (https://www.crcv.ucf.edu/data/UCF101.php), an action recognition dataset of videos labeled with their action classes.

- Download MoCoGAN code (conditional GAN model for video generation, with categorical condition): https://github.com/DLHacks/mocogan

- Train MoCoGAN on UCF-101: the resulting model will be able to generate videos from action classes

- Download S2VT pre-trained model (video-to-text model): https://vsubhashini.github.io/s2vt.html

  - Note: S2VT uses the Caffe library. Install the library (you don't need to train the model).

- Run S2VT on each video in UCF-101 to obtain its text description, then create a dataset with input = text description and output = action class.

- Train an LSTM classifier (similar to the one used in class for sentiment analysis) to classify text descriptions into actions. Report the performance.

- The final model works by taking a text description from the user, converting it into an action class with the LSTM model, and passing that action class as the condition to MoCoGAN.
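The dataset-building step in the list above can be sketched as follows. This is a minimal illustration, assuming the action class is the name of each video's parent folder and that the S2VT captions have already been collected into a dictionary. The `captions` mapping, its sentences, the second video path, and the function name `build_text_dataset` are hypothetical placeholders, not actual S2VT output.

```python
import os


def build_text_dataset(captions):
    """Pair each caption with an integer label derived from the video's folder name.

    captions: dict mapping a video path to its text description.
    Returns (samples, class_to_idx), where samples is a list of (text, label).
    """
    # Assumption: the parent folder of each video is its action class.
    classes = sorted({os.path.basename(os.path.dirname(p)) for p in captions})
    class_to_idx = {name: i for i, name in enumerate(classes)}

    samples = []
    for path in sorted(captions):
        action = os.path.basename(os.path.dirname(path))
        samples.append((captions[path], class_to_idx[action]))
    return samples, class_to_idx


# Hypothetical captions standing in for S2VT output.
captions = {
    "resized_data/Haircut/v_Haircut_g20_c05.mp4": "a man is cutting hair",
    "resized_data/Basketball/v_Basketball_g01_c01.mp4": "a man is playing basketball",
}
samples, class_to_idx = build_text_dataset(captions)
print(class_to_idx)
print(samples)
```

The resulting `(text, label)` pairs are the kind of input a text classifier would be trained on; tokenization and batching are left out of this sketch.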

Note: the model will not be precise in generating video details, since conditioning is based on the class only. 
For example:

"I'm running in the park" -> action: running

"My dog is running on the beach" -> action: running

The output videos therefore reflect only the action, not the context.
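The limitation described in the note can be illustrated with a small sketch. It assumes the class condition is a one-hot vector, and it replaces the trained LSTM with a plain keyword lookup so that the example stays self-contained. The `ACTIONS` list and the functions `classify_text`, `one_hot`, and `text_to_condition` are hypothetical names, not part of MoCoGAN or S2VT.

```python
# Hypothetical class list; the real one would be the UCF-101 action labels.
ACTIONS = ["running", "jumping", "swimming"]


def classify_text(text):
    """Placeholder for the trained LSTM classifier: plain keyword lookup."""
    lowered = text.lower()
    for idx, action in enumerate(ACTIONS):
        if action in lowered:
            return idx
    raise ValueError(f"no known action found in: {text!r}")


def one_hot(index, size):
    """Encode a class index as a one-hot list (assumed form of the condition)."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def text_to_condition(text):
    """Map a sentence to the class condition that would be passed to the generator."""
    return one_hot(classify_text(text), len(ACTIONS))


cond_a = text_to_condition("I'm running in the park")
cond_b = text_to_condition("My dog is running on the beach")
print(cond_a, cond_b, cond_a == cond_b)
```

Both sentences produce the same condition, so a generator that sees only this vector has no way to distinguish the park from the beach.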

In [9]:
import os
import torch
import numpy as np
from glob import glob
from torch.utils.data import DataLoader

import skvideo.io

In [6]:
data_dir = "/home/carlo/Documents/Cognitive Computing/Text2VideoGAN/mocogan/resized_data"
image_dir_name = "*"
image_paths = glob(os.path.join(data_dir, image_dir_name, "*"))

# Fall back to the alternative location if nothing was found at the first path.
if not image_paths:
    data_dir = "/home/carlo/Documenti/Text2VideoGAN/mocogan/resized_data"
    image_paths = glob(os.path.join(data_dir, image_dir_name, "*"))
    
image_paths[0]

'/home/carlo/Documenti/Text2VideoGAN/mocogan/resized_data/Haircut/v_Haircut_g20_c05.mp4'

In [28]:
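# vread returns the frames as a (T, H, W, C) array with values in 0..255.
original_video = skvideo.io.vread(image_paths[0], 96, 96)
# Reorder to (C, T, H, W) and scale the pixel values to [0, 1].
video = original_video.transpose(3, 0, 1, 2) / 255.0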

## Trying to display a video changed by transposing it

In [36]:

# vwrite takes frames in (T, H, W, C) layout, so the untransposed array is written here.
dirToSave = "./"
fileName = "transposedVideo.mp4"
filepath = os.path.join(dirToSave, fileName)

toSaveVideo = original_video.astype(np.uint8)
skvideo.io.vwrite(filepath, toSaveVideo)
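The cell above writes `original_video`, not the transposed `video`. To save the transposed, rescaled array instead, the transformation would first have to be undone. The sketch below shows one way to do that round trip on a synthetic clip, so it runs without a video file. The helper names `to_model_layout` and `to_video_layout` are hypothetical.

```python
import numpy as np


def to_model_layout(frames):
    """(T, H, W, C) uint8 frames -> (C, T, H, W) floats in [0, 1]."""
    return frames.transpose(3, 0, 1, 2) / 255.0


def to_video_layout(video):
    """(C, T, H, W) floats in [0, 1] -> (T, H, W, C) uint8 frames."""
    # transpose(1, 2, 3, 0) is the inverse of transpose(3, 0, 1, 2).
    frames = video.transpose(1, 2, 3, 0) * 255.0
    # Round before casting so values such as 254.9999 do not truncate to 254.
    return np.rint(frames).astype(np.uint8)


# Synthetic clip standing in for a real video: 4 frames of 96x96 RGB.
rng = np.random.default_rng(0)
clip = rng.integers(0, 256, size=(4, 96, 96, 3), dtype=np.uint8)

video = to_model_layout(clip)
restored = to_video_layout(video)
print(video.shape, restored.shape, np.array_equal(clip, restored))
```

An array restored this way has the same layout and dtype as `toSaveVideo`, so it could be passed to `skvideo.io.vwrite` in the same manner.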



## Import the necessary dependencies to train the model

In [7]:
import imageio
import pylab
import math
import numpy as np
import skvideo.io
from glob import glob
filenames = glob("/home/carlo/Documenti/Text2VideoGAN/mocogan/resized_data/*/*")

def readImageio(filename, frame=0, show=False):
    """Return a single frame of the video as an array, optionally displaying it."""
    reader = imageio.get_reader(filename, 'ffmpeg')
    
    #nframes = math.floor(vid.get_meta_data()['fps'] * vid.get_meta_data()['duration'])
    #shape = vid.get_meta_data()['size']
    #n_channels = 3

    #for i, im in enumerate(vid):
        #print(i, type(im), type(np.asarray(im)), np.asarray(im).shape)

    image = reader.get_data(frame)
        
    if show:
        fig = pylab.figure()
        fig.suptitle('image', fontsize=20)
        pylab.imshow(image)
        pylab.show()
        
    return np.asarray(image)

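def composeVideoImageio(filename):
    """Read every frame of a video with imageio into one (T, H, W, 3) array."""
    reader = imageio.get_reader(filename, 'ffmpeg')
    meta = reader.get_meta_data()

    # fps * duration is only an estimate of the number of frames.
    nframes = math.ceil(meta['fps'] * meta['duration'])
    # imageio reports size as (width, height), while frames are indexed (height, width).
    width, height = meta['size']

    videodata = np.empty((nframes, height, width, 3))

    for idx, img in enumerate(reader):
        videodata[idx, :, :, :] = img

    return videodata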
        

res = composeVideoImageio(filenames[0])
res_ = skvideo.io.vread(filenames[0])
print(res_.shape, res.shape)

"""for arrs in (res - res_):
    for arr in arrs:
        for els in arr:
            for el in els:
                assert el == 0.0"""

skmean = res_.mean()
imgmean = res.mean()

print(f"Imageio: {imgmean}, Skvideo: {skmean}")

assert np.isclose(skmean - imgmean, 0)

(201, 96, 96, 3) (201, 96, 96, 3)
Imageio: 79.99903747322416, Skvideo: 79.99903747322416
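Matching means are a necessary condition for the two readers to agree, but not a sufficient one: two arrays can share a mean and still differ element by element. The sketch below illustrates this on toy arrays and shows `np.array_equal` as a vectorized alternative to the commented-out nested loop. The arrays and the helper name `same_video` are hypothetical.

```python
import numpy as np

a = np.array([[0.0, 2.0], [4.0, 6.0]])
b = np.array([[6.0, 4.0], [2.0, 0.0]])  # same values, different positions


def same_video(x, y):
    """True only if both arrays have the same shape and identical elements."""
    return bool(np.array_equal(x, y))


print(np.isclose(a.mean(), b.mean()))  # the means agree
print(same_video(a, b))                # the contents do not
print(same_video(a, a.copy()))         # an exact copy does
```

Applied to the cell above, the equivalent check would be `same_video(res, res_)`, assuming both readers return the same number of frames.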
