# Text To Video

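This notebook walks through the project in its normal operating mode: a text description is mapped to an action class, and a video of that class is generated.

First, let's import the generator model and instantiate it.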

In [1]:
from mocogan.models import VideoGenerator

n_channels      = 3
dim_z_content   = 50
dim_z_category  = 2 #101
dim_z_motion    = 10
video_length    = 16
cuda            = True

trained_classes = {"Surfing" : 1, "PlayingPiano": 2}

gen = VideoGenerator(n_channels, dim_z_content, dim_z_category, dim_z_motion, video_length, cuda, class_to_idx = trained_classes)
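
The dimension arguments above hint at how the latent input for each frame is laid out. In the MoCoGAN design, a content code is shared by all frames of a video, a one-hot category code selects the action, and a motion code changes from frame to frame. The block below is only a minimal sketch of that layout using plain Python lists. `frame_latents` is a hypothetical helper, not part of the `mocogan` package, and it samples the motion codes independently, whereas the actual generator is expected to produce them with a recurrent network.

```python
import random

def frame_latents(dim_z_content, dim_z_category, dim_z_motion,
                  n_frames, category, seed=0):
    """Build one latent vector per frame: [content | one-hot category | motion]."""
    rng = random.Random(seed)
    # Content code: sampled once and reused for every frame.
    content = [rng.gauss(0, 1) for _ in range(dim_z_content)]
    # Category code: one-hot vector, also constant across frames.
    one_hot = [1.0 if i == category else 0.0 for i in range(dim_z_category)]
    frames = []
    for _ in range(n_frames):
        # Motion code: a fresh sample per frame (simplification, see lead-in).
        motion = [rng.gauss(0, 1) for _ in range(dim_z_motion)]
        frames.append(content + one_hot + motion)
    return frames

latents = frame_latents(50, 2, 10, n_frames=16, category=0)
print(len(latents), len(latents[0]))  # 16 62
```

Each frame vector has length 50 + 2 + 10 = 62, and only the last 10 entries differ between frames.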

## Paths
Let's define the paths used to retrieve the saved model state.

In [2]:
import os

# os.getcwd() returns the working directory without relying on IPython's `!pwd` shell magic
current_path = os.getcwd()

trained_path = os.path.join(current_path, 'mocogan', 'trained_models')

## Load saved State
Now load the saved state into the generator.

In [3]:
from mocogan.trainer import loadState

loadState(80, gen, path = trained_path)

## Load LSTM Model

Let's load the LSTM model that predicts the action category from a natural-language description.

In [4]:
import torch.nn as nn
import os
from glob import glob
from TextToClass.models import LSTM
from TextToClass.dataloading import TextLoader

rnnType     = nn.LSTM
rnnSize     = 512
embedSize   = 256
itemLength  = 10
loadEpoch   = 75


dataset_path = os.path.join(current_path, 'caffe', 'examples', 's2vt', 'results', '[!val]*')
dataset_path = glob(dataset_path)[0]

dataset = TextLoader(dataset_path, item_length = itemLength)

network = LSTM(rnnType, rnnSize, embedSize, dataset.vocabulary )

network.loadState(loadEpoch)
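
One detail in the cell above deserves care. In glob patterns, `[!val]` is a character class that rejects the single characters `v`, `a`, and `l` in the first position; it does not reject the prefix `val`. The sketch below uses `fnmatch`, the matching logic that `glob` builds on, with made-up file names to show the effect. The names are hypothetical and are not the actual contents of the results directory.

```python
from fnmatch import fnmatchcase

pattern = '[!val]*'

# Hypothetical file names, for illustration only.
names = ['train_sentences.txt', 'val_sentences.txt',
         'labels.txt', 'annotations.txt']

# Keep only the names whose first character is not 'v', 'a', or 'l'.
matches = [n for n in names if fnmatchcase(n, pattern)]
print(matches)  # ['train_sentences.txt']
```

Note also that `glob` does not guarantee any ordering, so taking element `[0]` is only safe when exactly one file matches.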

## Generate Video from your input

Now enter a description and let's generate a video.

The first cell takes your input and reports the predicted class.

The second cell generates and saves the video.

In [5]:
import torch

humanDescription     = input('Put your input here: > ')

try:
    toForwardDescription = dataset.prepareTxtForTensor(humanDescription)
    results              = network(torch.tensor(toForwardDescription).unsqueeze_(0))
    _, actionIDx         = results.max(1)
    actionClassName      = dataset.getClassNameFromIndex(actionIDx.item() + 1)
    print(f'Predicted class is {actionClassName}')

except KeyError:
    print('Sorry, that word is not in the vocabulary. Please try again.')

Put your input here: > A group of men are dancing and running on the ice
Predicted class is bandmarching
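
The prediction step in the cell above relies on two things: `results.max(1)` returns both the maximum values and their 0-based positions, and the code adds 1 before looking up the class name, which suggests the class labels are 1-based. The following is a minimal sketch of that lookup in plain Python. `predict_class` and the `index_to_name` mapping are hypothetical stand-ins, not the `TextLoader` API.

```python
def predict_class(scores, index_to_name):
    """Return the name of the highest-scoring class; labels are assumed 1-based."""
    # 0-based position of the largest score (an argmax).
    best = max(range(len(scores)), key=lambda i: scores[i])
    # Shift by one to match the assumed 1-based class labels.
    return index_to_name[best + 1]

# Hypothetical label mapping, for illustration only.
index_to_name = {1: 'Surfing', 2: 'PlayingPiano', 3: 'BandMarching'}

print(predict_class([0.1, 0.3, 2.4], index_to_name))  # BandMarching
```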


In [6]:
from mocogan.trainer import save_video

mean   = (100.99800554447337/255, 96.7195209000943/255, 89.63882431650443/255)
std    = (72.07041943699456/255, 70.41506399740703/255, 71.55581999303428/255)

gen      = gen.cuda()

n_videos = 1
n_frames = 25 * 3 # 3s

save_path =  current_path

actionIDx       = torch.tensor(dim_z_category - 2) if actionIDx.item() >= dim_z_category else actionIDx
actionClassName = gen.getCorrectClassName(actionIDx.item() + 1)

print(f'MoCoGAN was trained on fewer categories, the video class that will be created is {actionClassName}')

fakeVideo, _ = gen.sample_videos(n_videos, n_frames, [actionIDx.item() + 1])
fakeVideo    = fakeVideo[0].detach().cpu().numpy().transpose(1, 2, 3, 0)
save_video(fakeVideo, actionClassName, 0, std, mean, save_path)

MoCoGAN was trained on fewer categories, the video class that will be created is Surfing
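
The text classifier can predict classes that the generator was never trained on, which is why the cell above replaces any out-of-range index before sampling. With `dim_z_category = 2`, the expression `dim_z_category - 2` evaluates to 0, so every unknown class falls back to the first trained class. The sketch below reproduces that behaviour in plain Python; `clamp_class_index` is a hypothetical helper, and the fallback index is fixed at 0 to match this two-class setup.

```python
def clamp_class_index(predicted_idx, n_trained):
    """Keep the index if the generator knows it, otherwise fall back to 0."""
    return predicted_idx if predicted_idx < n_trained else 0

trained = {'Surfing': 1, 'PlayingPiano': 2}   # same mapping as trained_classes
idx_to_class = {v: k for k, v in trained.items()}

for predicted in (0, 1, 5):
    idx = clamp_class_index(predicted, len(trained))
    # Class labels are 1-based, hence the +1 in the lookup.
    print(predicted, '->', idx_to_class[idx + 1])
```

An index of 5 is out of range for two trained classes, so it maps to `Surfing`, which is consistent with the output shown above.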


In [8]:
from IPython.display import Video

Video(f'./fake_{actionClassName}_epoch-0.mp4', embed=True)