# Text 2 Video 

This notebook imports the dependencies needed to show the results of conditioning MoCoGAN on action-class labels.

## Import dependencies

The first cell imports the model classes `Generator_I` and `GRU`.

In [1]:
from mocogan.models import Generator_I, GRU

print(Generator_I.__doc__)


        Constructor
        -----------
        The constructor of Generator_I takes 6 arguments, all optional.
        
        nc:         integer, default= 3
            Num channels of the image to produce.
        
        ngf:        integer, default= 64
            Parameter of the ConvTranspose2d Layers.
        
        nz:         integer, default= 60
            Number of samples for the noise.
            
        ngpu:       integer, default= 1
            Number of GPU on which the model will run.
            
        nClasses:   integer, default= 102
            Number of classes on which the Embedding module will work.
            
        batch_size: integer, default = 16
            Batch size for each argument that will be passed to the model.
            
    
    


## Explanation

A docstring has been added to the `Generator_I` class to document its constructor.

As the docstring shows, all arguments are optional; the defaults are the parameters chosen by the creators of MoCoGAN.

It is enough to keep the defaults for everything except `batch_size` when creating the `Generator_I` object.

The next cell prints how the model passes data through its layers.


Be aware that the `forward` method of the Generator requires two arguments: `noise` and `label`.


1. The first `Sequential` passes the labels given as arguments (action classes) through an Embedding layer that outputs $\left\lfloor \dfrac{nClasses}{16} \right\rfloor$ features (6 for the default 102 classes). These features then go through a fully connected layer and a Rectified Linear Unit.

2. Before entering the second `Sequential`, the input noise and the output of the first step are concatenated. The result goes through a fully connected layer that outputs $ngf \times 4$ features for the transposed-convolution network.

3. The last block is the same as in vanilla MoCoGAN: it applies transposed convolutions to produce an image.
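
The widths quoted in steps 1 and 2 can be checked with a little arithmetic. The sketch below assumes the embedding width comes from integer division, which agrees with the `Embedding(102, 6)` layer in the printed model; the helper name `label_branch_sizes` is hypothetical and not part of `mocogan`.

```python
def label_branch_sizes(n_classes=102, ngf=64):
    """Return (embedding width, combined feature width) for steps 1 and 2.

    Assumption: the embedding width is n_classes // 16 (integer division).
    """
    embedding_dim = n_classes // 16   # 102 // 16 = 6
    combined_features = ngf * 4       # 64 * 4 = 256
    return embedding_dim, combined_features


print(label_branch_sizes())  # (6, 256)
```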


In [2]:
gen_i = Generator_I(batch_size = 16)
gen_i

Generator_I(
  (label_sequence): Sequential(
    (0): Embedding(102, 6)
    (1): Linear(in_features=6, out_features=60, bias=True)
    (2): ReLU(inplace)
  )
  (combine_sequence): Sequential(
    (0): Linear(in_features=272, out_features=256, bias=True)
  )
  (main): Sequential(
    (0): ConvTranspose2d(60, 512, kernel_size=(6, 6), stride=(1, 1), bias=False)
    (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace)
    (3): ConvTranspose2d(512, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU(inplace)
    (6): ConvTranspose2d(256, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (7): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (8): ReLU(inplace)
    (9): ConvTranspose2d(128, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (10
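
The spatial size after each transposed convolution follows the standard `ConvTranspose2d` output-size formula for dilation 1 and no output padding: `out = (in - 1) * stride - 2 * padding + kernel_size`. The sketch below applies it to the four `ConvTranspose2d` layers visible in the printout, assuming a 1x1 input (the noise tensor is shaped `(nz, 1, 1)`); the function name is illustrative.

```python
def conv_transpose_out(size, kernel, stride=1, padding=0):
    """Output size of a transposed convolution (dilation 1, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


# (kernel, stride, padding) of the ConvTranspose2d layers shown in the printout
layers = [(6, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1)]

size = 1  # assumed 1x1 spatial input
sizes = []
for kernel, stride, padding in layers:
    size = conv_transpose_out(size, kernel, stride, padding)
    sizes.append(size)

print(sizes)  # [6, 12, 24, 48]
```

The printout is truncated after these layers. One more layer with kernel 4, stride 2 and padding 1 would give 96, which would agree with `img_size = 96`, but that layer is not shown here.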

## Noise

The following cells define the helper functions that create the noise fed to the network.

In [3]:
import os
import torch
import skvideo.io
import numpy as np
import torch.nn as nn
from torch.autograd import Variable  # used by gen_z below

'''Define trained_path'''
trained_path = !pwd
trained_path = os.path.join(trained_path[0], "mocogan", "trained_models")

'''Import variables from train.py'''
img_size = 96
nc = 3
ndf = 64 # from dcgan
ngf = 64
d_E = 10
hidden_size = 100 # guess
d_C = 50
d_M = d_E
nz  = d_C + d_M
criterion = nn.BCELoss()

T = 16 # Hyperparameter: number of frames per clip passed to the discriminator.
n_frames = 4 * 25 # 4 seconds of video

batch_size = 16 # Videos generated per batch; only the first one uses the chosen class (see fakeLabels below).

cuda = torch.cuda.is_available()

gru = GRU(d_E, hidden_size, gpu = cuda)

# for input noises to generate fake video
# note that noises are trimmed randomly from n_frames to T for efficiency
def trim_noise(noise):
    
    start = np.random.randint(0, noise.size(1) - (T+1))
    end = start + T
    
    return noise[:, start:end, :, :, :]


def gen_z(n_frames, batch_size = batch_size):
    
    z_C = Variable(torch.randn(batch_size, d_C))
    #  repeat z_C to (batch_size, n_frames, d_C)
    z_C = z_C.unsqueeze(1).repeat(1, n_frames, 1)
    eps = Variable(torch.randn(batch_size, d_E))
    if cuda:
        z_C, eps = z_C.cuda(), eps.cuda()

    gru.initHidden(batch_size)
    # notice that 1st dim of gru outputs is seq_len, 2nd is batch_size
    z_M = gru(eps, n_frames).transpose(1, 0)
    z = torch.cat((z_M, z_C), 2)  # z.size() => (batch_size, n_frames, nz)
    
    return z.view(batch_size, n_frames, nz, 1, 1)


def save_video(fake_video, actionClass, baseDir):
    outputdata = fake_video * 255
    outputdata = outputdata.astype(np.uint8)
    dir_path = os.path.join(baseDir, 'mocogan', 'generated_videos')
    file_path = os.path.join(dir_path, f'{actionClass}.mp4')
    skvideo.io.vwrite(file_path, outputdata)
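
The noise layout built by `gen_z` and the window picked by `trim_noise` can be illustrated without PyTorch. The sketch below is a simplified, pure-Python stand-in for a single video: the motion part is drawn independently for each frame instead of coming from the GRU, and the names `make_noise` and `trim` are hypothetical.

```python
import random

d_C, d_M, T, n_frames = 50, 10, 16, 100


def make_noise(n_frames, rng):
    """One video's noise: a per-frame motion vector followed by a shared content vector."""
    z_C = [rng.gauss(0, 1) for _ in range(d_C)]      # sampled once, shared by all frames
    frames = []
    for _ in range(n_frames):
        z_M = [rng.gauss(0, 1) for _ in range(d_M)]  # stand-in for the GRU output
        frames.append(z_M + z_C)                     # same order as torch.cat((z_M, z_C), 2)
    return frames


def trim(frames, rng):
    """Keep a random window of T consecutive frames, mirroring trim_noise."""
    start = rng.randrange(0, len(frames) - (T + 1))  # same bounds as np.random.randint
    return frames[start:start + T]


rng = random.Random(0)
z = make_noise(n_frames, rng)
clip = trim(z, rng)
print(len(z), len(z[0]), len(clip))  # 100 60 16
```

Each frame vector has `nz = d_C + d_M = 60` entries. The last 50 entries are identical across frames (content), while the first 10 change from frame to frame (motion).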

In [4]:
'''Load trained weights'''
def load():
    #dis_i.load_state_dict(torch.load(trained_path + '/Discriminator_I.model'))
    #dis_v.load_state_dict(torch.load(trained_path + '/Discriminator_V.model'))
    gen_i.load_state_dict(torch.load(trained_path + '/Generator_I_epoch-44.model'))
    gru.load_state_dict(torch.load(trained_path + '/GRU_epoch-44.model'))
    #optim_Di.load_state_dict(torch.load(trained_path + '/Discriminator_I.state'))
    #optim_Dv.load_state_dict(torch.load(trained_path + '/Discriminator_V.state'))
    #optim_Gi.load_state_dict(torch.load(trained_path + '/Generator_I.state'))
    #optim_GRU.load_state_dict(torch.load(trained_path + '/GRU.state'))

    
load()
'''Move models to GPU'''
if cuda:
    gen_i.cuda()
    gru.cuda()
    
'''Change to evaluation mode'''
gen_i.eval(); gru.eval()

GRU(
  (gru): GRUCell(10, 100)
  (drop): Dropout(p=0)
  (linear): Linear(in_features=100, out_features=10, bias=True)
  (bn): BatchNorm1d(10, eps=1e-05, momentum=0.1, affine=False, track_running_stats=True)
)

## Put input

In the following cell you can enter either a number between 1 and 101 or a class name.

In [5]:
try:
    actionClass = int(input("Choose a number between 1 and 101 or just put a name.\n"))
    assert 0 < actionClass < 102

except ValueError:
    # TODO: check whether the entered name is contained in the class dictionary
    pass

except AssertionError:
    print("Please put a Number between 1 and 101.")

Choose a number between 1 and 101 or just put a name.
3
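
The `ValueError` branch above is left as a placeholder for the class-name lookup. A minimal sketch of how such a lookup could work is shown below; the `name_to_index` dictionary and its two entries are made up for illustration, since the notebook does not define the real mapping.

```python
def resolve_action_class(raw, name_to_index, n_classes=101):
    """Turn user input into a class index in 1..n_classes, or None if it is invalid."""
    raw = raw.strip()
    try:
        index = int(raw)
    except ValueError:
        # Not a number: treat it as a class name (hypothetical dictionary lookup)
        return name_to_index.get(raw)
    return index if 1 <= index <= n_classes else None


# Hypothetical mapping; the real class names are not listed in this notebook
example_names = {"ClassA": 1, "ClassB": 2}

print(resolve_action_class("3", example_names))       # 3
print(resolve_action_class("ClassB", example_names))  # 2
print(resolve_action_class("500", example_names))     # None
```

Returning `None` for invalid input lets the caller decide whether to ask again or stop, instead of leaving `actionClass` undefined.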


In [6]:
from torch.autograd import Variable

# Random labels for the whole batch; only the first video gets the chosen class
fakeLabels = torch.randint(0, 101, (batch_size,), dtype=torch.long)
fakeLabels[0] = actionClass

label = fakeLabels

if cuda:
    label = label.cuda()

batch_size = 16
    
Z = gen_z(n_frames, batch_size)  # Z.size() => (batch_size, n_frames, nz, 1, 1)
# trim => (batch_size, T, nz, 1, 1)
Z = trim_noise(Z)
# generate videos
Z = Z.contiguous().view(batch_size*T, nz, 1, 1)

fake_videos = gen_i(Z, label)
print(fake_videos.shape)
fake_videos = fake_videos.view(batch_size, T, nc, img_size, img_size)
# transpose => (batch_size, nc, T, img_size, img_size)
fake_videos = fake_videos.transpose(2, 1)
# img sampling
fake_img = fake_videos[:, :, np.random.randint(0, T), :, :]

torch.Size([256, 3, 96, 96])
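
The printed shape has 256 rows because the generator receives one noise vector per frame: `batch_size * T = 16 * 16`. The sketch below shows how a flat row index maps back to a video and a frame once `view(batch_size, T, ...)` is applied, assuming the usual row-major layout of a contiguous tensor; `unflatten_index` is an illustrative helper.

```python
batch_size, T = 16, 16


def unflatten_index(i, T=T):
    """Map a row of the (batch_size * T, ...) tensor to (video, frame)."""
    return divmod(i, T)


print(batch_size * T)        # 256
print(unflatten_index(0))    # (0, 0)
print(unflatten_index(17))   # (1, 1)
print(unflatten_index(255))  # (15, 15)
```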


In [7]:
currentDir = !pwd
currentDir = currentDir[0]

for idx, video in enumerate(fake_videos):
    save_video(video.data.cpu().numpy().transpose(1, 2, 3, 0), f"{actionClass}-{idx}", currentDir)
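
The `transpose(1, 2, 3, 0)` call reorders each video from channels-first to a frames-first, channels-last layout (frames, height, width, channels) before it is written to disk. The sketch below applies the same permutation to a shape tuple; `permute_shape` is a hypothetical helper used only to show the bookkeeping.

```python
def permute_shape(shape, axes):
    """Shape that numpy's transpose(*axes) would produce for an array of `shape`."""
    return tuple(shape[a] for a in axes)


nc, T, img_size = 3, 16, 96
video_shape = (nc, T, img_size, img_size)        # one element of fake_videos
print(permute_shape(video_shape, (1, 2, 3, 0)))  # (16, 96, 96, 3)
```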