# Predicting from a pre-trained motion-generating model
In this notebook we will generate motion sequences from a Mixture Density Recurrent Neural Network (MDRNN).


It is common to explore multiple sampling approaches when generating output from generative deep neural networks for creative applications. Choosing suitable sampling parameters can make or break the realism and perceived creative merit of the output. The process of selecting suitable sampling parameters is often task-specific and under-reported in many publications, which can make the results hard to reproduce.

Here, we will explore some of the most common sampling techniques in the context of generating human body movement, specifically dance movement. The notebook loads an MDRNN trained on a dataset of improvised dance motion capture data, from which it is possible to generate novel movement sequences. Systematically examining the different sampling strategies helps us understand how the sampling parameters affect motion generation, and provides evidence for their utility in creative applications.

In [1]:
from tensorflow.compat.v1 import keras
from tensorflow.compat.v1.keras import backend as K
from tensorflow.compat.v1.keras.layers import Dense, Input
import tensorflow.compat.v1 as tf
import numpy as np
import mdn
import datetime

import csv
from IPython.display import HTML, display

## Load a small dataset to use when generating new motion

In [2]:
datafolder = 'examples.npz'

loaded = np.load(datafolder)
ex = loaded['x']

print('Loaded ', ex.shape[0], 'examples.')

Loaded  14 examples.


## Load the pre-trained model

In [3]:
HIDDEN_UNITS1 = 1024 
HIDDEN_UNITS2 = 512
HIDDEN_UNITS3 = 256
N_MIXES =  4 # number of mixture components
INPUT_DIMS = 66 # 22 joints * 3 
OUTPUT_DIMS = 66  # Number of real-values predicted by each mixture component
SEQ_LEN = 256 # Number of frames in an example
freq = 30 # frame rate of data

opt = keras.optimizers.Adam(learning_rate=1e-5)

model_name = 'trained_mdrnn'


decoder = keras.Sequential()
decoder.add(keras.layers.LSTM(HIDDEN_UNITS1, batch_input_shape=(1,SEQ_LEN,INPUT_DIMS), return_sequences=True, stateful=True))
decoder.add(keras.layers.LSTM(HIDDEN_UNITS2, batch_input_shape=(1,SEQ_LEN,INPUT_DIMS), return_sequences=True, stateful=True))
decoder.add(keras.layers.LSTM(HIDDEN_UNITS3, stateful=True))
decoder.add(mdn.MDN(OUTPUT_DIMS, N_MIXES))
decoder.compile(loss=mdn.get_mixture_loss_func(OUTPUT_DIMS,N_MIXES), optimizer=opt)
decoder.summary()

decoder.load_weights(model_name+'.h5') # load weights independently from file

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm (LSTM)                  (1, 256, 1024)            4468736   
_________________________________________________________________
lstm_1 (LSTM)                (1, 256, 512)             3147776   
_________________________________________________________________
lstm_2 (LSTM)                (1, 256)                  787456    
_________________________________________________________________
mdn (MDN)                    (1, 532)                  136724    
=================================================================
Total params: 8,540,692
Trainable params: 8,540,692
Non-trainable params: 0
_________________________________________________________________


## Functions for generating predictions and animations

In [4]:
def shift(arr, num, fill_value=np.nan):
    result = np.empty_like(arr)
    if num > 0:
        result[:num] = fill_value
        result[num:] = arr[:-num]
    elif num < 0:
        result[num:] = fill_value
        result[:num] = arr[-num:]
    else:
        result[:] = arr
    return result


def predict_sequence(model, advancing=False, pi=1e-10, sigma=1e-10, frames=256, primer_idx=0, select_mix=False, use_priming=False, mix=0):
    motion = []
    idx = primer_idx
    pred_on = ex[idx,:,:] # initial input window: SEQ_LEN frames from example idx
    
    
    for i in range(frames):
        reshaped_pred_on = tf.reshape(pred_on,[1,SEQ_LEN,OUTPUT_DIMS])
        params = model.predict(reshaped_pred_on, steps=1)  # use the model passed in, not the global
        
        if select_mix:
            pred = mdn.sample_from_output_select_mix(params[0], OUTPUT_DIMS, N_MIXES, temp=pi, sigma_temp=sigma, mix=mix)
        else:
            pred = mdn.sample_from_output(params[0], OUTPUT_DIMS, N_MIXES, temp=pi, sigma_temp=sigma)
            
        motion.append(pred.reshape((OUTPUT_DIMS,)))
        
        if use_priming:
            if i%SEQ_LEN==0:
                idx += 1
            pred_on = shift(pred_on, -1, fill_value=ex[idx,i%SEQ_LEN,:])
        else:
            pred_on = shift(pred_on, -1,  fill_value=pred)
    
    motion = np.array(motion)
    
    date_string = datetime.datetime.today().strftime('%Y%m%d')
    fn = date_string+ '-pi_temp-' +str(pi) + '-sig_temp-' + str(sigma) + "-mix-" + str(mix) + "-primer_idx-" + str(primer_idx)
    if use_priming:
        fn = fn+'-priming_experiment'
    if select_mix:
        fn = fn+'-mix_experiment'
    
    print('Generated motion sequence with filename ', fn)
    write_ex_to_tsv(motion,fn)
    
    return fn


In [5]:
def write_ex_to_tsv(ex, fn):
    num_frames = ex.shape[0]
    marker_names = ['MARKER_NAMES','Head','neck','lsho','lelb','lwri','lhan','rsho','relb','rwri','rhan','t10','root','lhip','lknee','lank','lfoot','ltoe','rhip','rknee','rank','rfoot','rtoe']
    
    
    with open('tsv/'+fn+'.tsv', 'wt', newline='') as out_file:  # newline='' as recommended for csv.writer
        tsv_writer = csv.writer(out_file, delimiter='\t')
        tsv_writer.writerow(['NO_OF_FRAMES', num_frames])
        tsv_writer.writerow(['NO_OF_CAMERAS', 0])
        tsv_writer.writerow(['NO_OF_MARKERS', 22])
        tsv_writer.writerow(['FREQUENCY', freq])
        tsv_writer.writerow(['NO_OF_ANALOG', 0])
        tsv_writer.writerow(['ANALOG_FREQUENCY', 0])
        tsv_writer.writerow(['DESCRIPTION--', ''])
        tsv_writer.writerow(['TIME_STAMP--', ''])
        tsv_writer.writerow(['DATA_INCLUDED', '3D'])
        tsv_writer.writerow(marker_names)
        
        for frame in range(num_frames):
            tsv_writer.writerow(ex[frame,:])
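
To check a generated file before building an animation, the TSV can be read back into an array. The following is a minimal sketch, not part of the original notebook: `read_motion_tsv` and `HEADER_ROWS` are hypothetical names, the header is assumed to be the ten rows written above, and the reshape assumes each marker's three coordinates are stored next to each other. The demo writes a small synthetic file with a stand-in header so that it runs on its own.

```python
import csv
import os
import tempfile

import numpy as np

HEADER_ROWS = 10  # nine metadata rows plus the MARKER_NAMES row (assumed layout)


def read_motion_tsv(path, n_markers=22):
    """Load frames from a TSV in the layout above as (frames, markers, 3)."""
    data = np.loadtxt(path, delimiter='\t', skiprows=HEADER_ROWS, ndmin=2)
    return data.reshape(data.shape[0], n_markers, 3)


# Round-trip check on a synthetic file with a stand-in header.
frames = np.random.default_rng(0).normal(size=(5, 66))
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.tsv')
    with open(path, 'wt', newline='') as out_file:
        writer = csv.writer(out_file, delimiter='\t')
        for row in range(HEADER_ROWS):
            writer.writerow(['HEADER_%d' % row, ''])
        writer.writerows(frames)
    restored = read_motion_tsv(path)

print(restored.shape)
```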

In [6]:
def display_animation(var_mp4):
    var_mp4 = 'mp4/' + var_mp4 + '.mp4'
    link_t = "<div align='middle'><video width='80%' controls><source src='{href}' type='video/mp4'></video></div>"
    html = HTML(link_t.format(href=var_mp4))
    display(html)

# Sampling with temperature adjustment
When sampling from our trained model we can choose to alter the value of two temperature parameters, the $\pi$-temperature and the $\sigma$-temperature. 



### Examining $\pi$:
The $\pi$-temperature adjusts the probability of selecting a certain mixture component.
By reweighting the mixture components we make it more, or less, likely that we will sample from a given component. High $\pi$-temperatures reweight the probability of sampling from each component in such a way that each component is an equally likely choice, while sufficiently low temperatures will ensure that only a single component is selected, as it will have a probability of 1 while all other components have a probability of 0.
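
The effect of the $\pi$-temperature can be illustrated with a small stand-alone sketch. It assumes that the temperature divides the mixture logits before a softmax, which is a common convention; the `mdn` module used in this notebook may differ in detail. The function name `reweight_pi` and the logits are made up for illustration.

```python
import numpy as np


def reweight_pi(pi_logits, temp=1.0):
    """Softmax over mixture logits with a temperature divisor (assumed scheme)."""
    scaled = np.asarray(pi_logits, dtype=np.float64) / temp
    scaled -= scaled.max()  # subtract the max for numerical stability
    weights = np.exp(scaled)
    return weights / weights.sum()


# Hypothetical logits for four mixture components.
logits = np.array([2.0, 1.0, 0.5, -1.0])

for temp in (1e-10, 0.5, 10.0):
    print(temp, np.round(reweight_pi(logits, temp), 3))
```

With the lowest temperature all of the probability mass lands on the component with the largest logit, while the highest temperature gives weights that are close to uniform.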


Here we sample from the MDN with three different values for the $\pi$-temperature. The $\sigma$-temperature and starting frame are kept the same for each example.


#### Low $\pi$

In [7]:
low_pi = predict_sequence(decoder,frames=256,pi=1e-10,sigma=1e-10, primer_idx=10)

Generated motion sequence with filename  20200501-pi_temp-1e-10-sig_temp-1e-10-mix-0-primer_idx-10


In [8]:
%get low_pi
build_animation(low_pi);

In [9]:
display_animation(low_pi)

#### Medium $\pi$

In [10]:
mid_pi = predict_sequence(decoder,frames=256,pi=0.5,sigma=1e-10, primer_idx=10)

Generated motion sequence with filename  20200501-pi_temp-0.5-sig_temp-1e-10-mix-0-primer_idx-10


In [11]:
%get mid_pi
build_animation(mid_pi);

In [37]:
display_animation(mid_pi)

#### High $\pi$

In [13]:
high_pi = predict_sequence(decoder,frames=256,pi=10,sigma=1e-10, primer_idx=10)

Generated motion sequence with filename  20200501-pi_temp-10-sig_temp-1e-10-mix-0-primer_idx-10


In [14]:
%get high_pi
build_animation(high_pi);

In [38]:
display_animation(high_pi)

### Examining $\sigma$:
Here we sample from the MDN with different values for the $\sigma$-temperature. 
The previous section showed the result of sampling with a low $\sigma$-temperature, so we will skip that here.
The $\pi$-temperature and starting frame are kept the same for each example.


Adjusting the $\sigma$-temperature affects the width of each mixture component by scaling the learned distribution by this temperature parameter. A high $\sigma$-temperature allows for samples further from the learned mean of each mixture component to be selected. 
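
As a rough illustration of this scaling, the sketch below draws samples from a single diagonal Gaussian component. It assumes that the temperature multiplies the component's variance; the exact scaling inside the `mdn` module may differ. `sample_component`, the mean and the variances are hypothetical stand-ins for one learned component.

```python
import numpy as np


def sample_component(mu, var, sigma_temp=1.0, n=1, rng=None):
    """Draw n samples from a diagonal Gaussian with variance var * sigma_temp."""
    rng = np.random.default_rng() if rng is None else rng
    mu = np.asarray(mu, dtype=np.float64)
    std = np.sqrt(np.asarray(var, dtype=np.float64) * sigma_temp)
    return mu + std * rng.standard_normal((n, mu.shape[0]))


rng = np.random.default_rng(0)
mu = np.zeros(66)        # hypothetical component mean (22 joints * 3)
var = np.full(66, 0.01)  # hypothetical learned variances

for sigma_temp in (1e-10, 1e-4, 10.0):
    draws = sample_component(mu, var, sigma_temp, n=200, rng=rng)
    print(sigma_temp, draws.std())
```

The printed spread grows with the temperature: at `1e-10` the samples are practically equal to the mean, while at `10` they scatter widely around it.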

#### Medium $\sigma$

In [16]:
mid_sig = predict_sequence(decoder,frames=256,pi=1e-10,sigma=1e-4, primer_idx=10)

Generated motion sequence with filename  20200501-pi_temp-1e-10-sig_temp-0.0001-mix-0-primer_idx-10


In [17]:
%get mid_sig
build_animation(mid_sig);

In [39]:
display_animation(mid_sig)

#### High $\sigma$

In [19]:
high_sig = predict_sequence(decoder,frames=256,pi=1e-10,sigma=10, primer_idx=10)

Generated motion sequence with filename  20200501-pi_temp-1e-10-sig_temp-10-mix-0-primer_idx-10


In [20]:
%get high_sig
build_animation(high_sig);

In [40]:
display_animation(high_sig)

# Sampling from isolated mixture components
For these experiments, we disregard the $\pi$-temperature and instead manually select which of the 4 mixture components to sample from.
This ensures that each new frame is sampled from a single component.
We observed in the previous section that the entire position of the body changed as we sampled with a higher $\pi$-temperature, indicating that individual components emphasise different features.


To examine this more closely, the $\sigma$-temperature is kept at a low value so that we sample close to the mean of each component, and each sequence is given the same starting position.
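
To make the idea of picking one component concrete, the sketch below splits a flat MDN output vector of length 532 (2 * 4 * 66 + 4, matching the model summary) into per-component means, scales and mixture logits. The layout `[all means | all scales | mixture logits]` is an assumption about how the output is ordered, and `split_params` and `component_mean` are hypothetical helpers, not functions of the `mdn` module. With a very low $\sigma$-temperature, a sample from a selected component is approximately its mean, which is what `component_mean` returns.

```python
import numpy as np

OUTPUT_DIMS = 66
N_MIXES = 4


def split_params(params, output_dims=OUTPUT_DIMS, n_mixes=N_MIXES):
    """Split a flat MDN output into means, scales and mixture logits.

    Assumes the layout [all means | all scales | mixture logits].
    """
    params = np.asarray(params)
    block = n_mixes * output_dims
    mus = params[:block].reshape(n_mixes, output_dims)
    sigmas = params[block:2 * block].reshape(n_mixes, output_dims)
    pi_logits = params[2 * block:]
    return mus, sigmas, pi_logits


def component_mean(params, mix):
    """Return the mean of one manually chosen component, ignoring the weights."""
    mus, _, _ = split_params(params)
    return mus[mix]


# Stand-in for one row of the model output: 2 * 4 * 66 + 4 = 532 values.
fake_params = np.arange(2 * N_MIXES * OUTPUT_DIMS + N_MIXES, dtype=np.float64)
mus, sigmas, pi_logits = split_params(fake_params)
print(mus.shape, sigmas.shape, pi_logits.shape)  # (4, 66) (4, 66) (4,)
print(component_mean(fake_params, mix=2)[:3])    # [132. 133. 134.]
```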

In [22]:
fnmix0 = predict_sequence(decoder,frames=256, select_mix=True, mix=0, primer_idx=10)
fnmix1 = predict_sequence(decoder,frames=256, select_mix=True, mix=1, primer_idx=10)
fnmix2 = predict_sequence(decoder,frames=256, select_mix=True, mix=2, primer_idx=10)
fnmix3 = predict_sequence(decoder,frames=256, select_mix=True, mix=3, primer_idx=10)

Generated motion sequence with filename  20200501-pi_temp-1e-10-sig_temp-1e-10-mix-0-primer_idx-10-mix_experiment
Generated motion sequence with filename  20200501-pi_temp-1e-10-sig_temp-1e-10-mix-1-primer_idx-10-mix_experiment
Generated motion sequence with filename  20200501-pi_temp-1e-10-sig_temp-1e-10-mix-2-primer_idx-10-mix_experiment
Generated motion sequence with filename  20200501-pi_temp-1e-10-sig_temp-1e-10-mix-3-primer_idx-10-mix_experiment


In [23]:
%get fnmix0 fnmix1 fnmix2 fnmix3
build_animation(fnmix0);
build_animation(fnmix1);
build_animation(fnmix2);
build_animation(fnmix3);

In [24]:
display_animation(fnmix0)
display_animation(fnmix1)
display_animation(fnmix2)
display_animation(fnmix3)

# Sampling with priming
When generating motion with priming, a movement sequence that was not used in training is given as input to the model. The next frame is then generated and the process is repeated. The model always predicts the next frame for a previously unseen real sequence, as opposed to non-primed sampling, wherein the model's previous predictions become part of the sequence used to generate the following frame. The examples used for priming here are taken from two performances by different individuals and were withheld during training.


The first example, *primer A*, was performed to a rhythmic musical stimulus with a strong beat. 
The second example, *primer B*, was performed to a slow, non-rhythmic musical stimulus. 

- primer idx 0-6 = Primer A. Performed to an upbeat song with a strong pulse.
- primer idx 7-13 = Primer B. Performed to a slow, melodic song without a strong pulse.
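
The difference between primed and non-primed generation described above comes down to what is appended to the sliding input window after each prediction. The sketch below is a simplified illustration with a toy stand-in for the model; `slide`, `generate` and `toy_model` are hypothetical names, and the way primer frames are indexed here is simpler than in `predict_sequence`.

```python
import numpy as np


def slide(window, new_frame):
    """Drop the oldest frame and append new_frame at the end of the window."""
    return np.concatenate([window[1:], new_frame[None, :]], axis=0)


def generate(predict_next, primer, n_frames, use_priming=False, window_len=4):
    """Generate n_frames with a sliding input window.

    predict_next is a stand-in for the model: it maps a window to the next frame.
    """
    window = primer[:window_len]
    generated = []
    for i in range(n_frames):
        frame = predict_next(window)
        generated.append(frame)
        if use_priming:
            # Primed: the window is refilled with the next recorded frame.
            window = slide(window, primer[window_len + i])
        else:
            # Non-primed: the model's own prediction is fed back in.
            window = slide(window, frame)
    return np.array(generated)


def toy_model(window):
    """Toy predictor: the last frame of the window plus one."""
    return window[-1] + 1.0


primer = np.zeros((12, 2))  # toy recording that never moves

free = generate(toy_model, primer, n_frames=5)
primed = generate(toy_model, primer, n_frames=5, use_priming=True)
print(free[:, 0])    # [1. 2. 3. 4. 5.]
print(primed[:, 0])  # [1. 1. 1. 1. 1.]
```

In the non-primed run the toy predictions build on each other and drift away from the recording, whereas the primed run is pulled back to the recorded frames at every step.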

## Original recordings: Primer A

In [25]:
write_ex_to_tsv(ex[4,:,:],'primerA')
primerA = 'primerA'

In [26]:
%get primerA
build_animation(primerA);

In [41]:
display_animation('primerA')

### Primed on A

In [28]:
primed_on_A = predict_sequence(decoder,frames=256, use_priming=True, primer_idx=3)

Generated motion sequence with filename  20200502-pi_temp-1e-10-sig_temp-1e-10-mix-0-primer_idx-3-priming_experiment


In [29]:
%get primed_on_A
build_animation(primed_on_A);

In [30]:
display_animation(primed_on_A)

## Original recordings: Primer B

In [31]:
write_ex_to_tsv(ex[8,:,:],'primerB')
primerB = 'primerB'

In [32]:
%get primerB
build_animation(primerB);

In [33]:
display_animation('primerB')

### Primed on B

In [34]:
primed_on_B = predict_sequence(decoder,frames=256, use_priming=True, primer_idx=7)

Generated motion sequence with filename  20200502-pi_temp-1e-10-sig_temp-1e-10-mix-0-primer_idx-7-priming_experiment


In [35]:
%get primed_on_B
build_animation(primed_on_B);

In [36]:
display_animation(primed_on_B)