# Human-motion Synthesis through Physically-inspired Machine Learning Models

#### Diego A. Agudelo España.

#### Tutor: Mauricio A. Álvarez, PhD.

## General Aim

* To develop a general methodology for computer-generation of human motion figures based on
mechanistically inspired machine learning methods. 

**Current Specific Aim**:
*  To develop a non-parametric sequential dynamical model for describing the time evolution of motor primitives in motion capture data and humanoid robotics performing different tasks.

Our methodologies will use general non-linear
regression methods that incorporate flexible and soft mechanistic assumptions for combining models
of basic movements either hierarchically or sequentially.

* **Data driven approach (weakly mechanistic)**:
    * If data is scarce relative to the complexity of the model, the model may be unable to make accurate predictions.
* **Purely mechanistic models (strongly mechanistic)**:
    * A mechanistic model can enable accurate predictions even in regions where there is no available training data, but the model could be extremely complex.
* **Proposed Approach (hybrid system)**:
    * Based on the observation that a weak mechanistic assumption underlies a data-driven model.
    * The key is to retain sufficient flexibility in our model to be able to fit the system even when the mechanistic assumptions are not rigorously fulfilled.
    
What is a motor primitive? It is the basic building block that complex movements are made up of.

## Proposed Approach 

* **Motor primitive representation**: Latent Force Models (LFMs).
* **Sequential dynamical model**: Hidden Markov Models (HMMs).

A single LFM is not suitable for the problem of interest: the data involve discontinuities in the latent forces and discrete changes, while the GP prior imposes smoothness.

## Second Order Latent Force Models.

$$\frac{d^2 y_d(t)}{dt^2} + C_d \frac{d y_d(t)}{dt} + B_d y_d(t) = \sum_{q=1}^Q S_{d,q} u_q(t)$$

If Gaussian process priors are assumed over the latent forcing functions with RBF covariance functions then the output functions are jointly governed by a Gaussian process as well. Formally,

$$Y \sim GP(0, K(x, x'; \Theta))$$

where $Y$ represents the stacked system of output functions, $K$ represents the second order latent force model kernel (Álvarez et al., 2009) and $\Theta$ represents the set of kernel hyperparameters (i.e., differential equation coefficients and lengthscales).

* Gaussian Process priors with RBF covariance are assumed over the latent forcing functions.
* In these models the system is forced by latent functions (hence LFM).
* The general framework in LFMs is to combine a mechanistic model with a probabilistic prior over a latent variable/function.
* This allows combining dimensionality reduction with systems of differential equations.
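To make the generative picture concrete, the following is a minimal sketch, assuming a single output ($D = 1$) and a single latent force ($Q = 1$). It draws $u(t)$ from a GP with RBF covariance and integrates the second order equation numerically with zero initial conditions. The function names, the coefficient values and the semi-implicit Euler integrator are illustrative assumptions; they are not part of the original method, which works with the closed-form LFM kernel instead of simulation.

```python
import numpy as np

def rbf_kernel(t1, t2, lengthscale=1.0):
    """RBF covariance between two 1-D arrays of time points."""
    diff = t1[:, None] - t2[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)

def simulate_lfm_output(t, u, C, B, S):
    """Integrate y'' + C y' + B y = S u(t) on an equally spaced grid.

    Uses semi-implicit Euler with y(0) = y'(0) = 0 (an assumption made
    for this sketch only).
    """
    dt = t[1] - t[0]
    y = np.zeros_like(t)
    v = 0.0  # velocity dy/dt
    for n in range(len(t) - 1):
        a = S * u[n] - C * v - B * y[n]  # acceleration from the ODE
        v = v + dt * a
        y[n + 1] = y[n] + dt * v
    return y

rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 200)

# Draw one latent force from a zero-mean GP with RBF covariance.
# The small jitter keeps the Cholesky factorization numerically stable.
K_u = rbf_kernel(t, t, lengthscale=1.0) + 1e-6 * np.eye(len(t))
u = np.linalg.cholesky(K_u) @ rng.standard_normal(len(t))

# Hypothetical damper (C), spring (B) and sensitivity (S) values.
y = simulate_lfm_output(t, u, C=1.0, B=4.0, S=1.0)
```

Changing `C` and `B` changes how the same latent force is filtered into the output, which is the sense in which each motor primitive can carry its own mechanistic parameters.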

## HMM + LFM.

The whole system is modelled as a Hidden Markov Model (**HMM**) where each hidden state represents a different motor primitive. The emission distribution for each hidden state is represented by a Latent Force Model (**LFM**). The corresponding graphical model is the following:

<img src="http://d29qn7q9z0j1p6.cloudfront.net/content/roypta/371/1984/20120222/F4.large.jpg?width=800&height=600&carousel=1" >

$$\text{Figure from (Bishop, 2013).}$$

## HMM + LFM.

For now the following assumptions are made:

* There is only one latent force governing the movements.
* The number of sample locations belonging to each LFM is the same and is denoted by $N$. All of them are assumed to be equally spaced.
* The number of segments is fixed and is equal to $W$.
* Each segment (i.e., hidden state, LFM) can potentially have a different set of parameters (e.g., damper constants, spring constants, etc.).
* The hidden state $z_n$ only depends on the state $z_{n-1}$ and this dependency is represented via the transition probability distribution $A_{z_{n-1}}$.
* The emission $Y_i = \{y^i_1, y^i_2, \dots, y^i_N\}$ for the sample locations $X_i = \{x^i_1, x^i_2, \dots, x^i_N\}$ only depends on the current hidden state $z_i$.
   

## HMM + LFM.

Formally, the joint probability of $Y = \{ Y_1, Y_2, \dots, Y_W \}$ and $Z = \{ z_1, z_2, \dots, z_W \}$ given $X = \{ X_1, X_2, \dots, X_W \}$ and $\Theta = \{ \theta_1, \theta_2, \dots, \theta_W \}$ is:

$$p(Z, Y | X, \Theta) = p(z_1 | s) p(Y_1 | z_1, \theta_1, X_1) \prod_{i = 2}^{W} p(z_i | z_{i - 1}, A) p(Y_i | z_i, \theta_i, X_i)$$

where $A$ is the hidden state transition probability matrix, $s$ is the initial state probability distribution and $\theta_i$ is the set of parameters which characterize the i-th LFM.

The emission process is performed as follows:

$$p(Y_i | z_i, \theta_i, X_i) = \mathcal{N}(f_i(X_i), I \sigma^2)$$

where:

$$f_i(x) \sim \mathcal{GP}(0, k_f(x, x'; \theta_i))$$

and $k_f$ represents the second order LFM kernel.
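The generative process described by the joint distribution and the emission model can be sketched by ancestral sampling. This is only an illustration under stated assumptions: an RBF kernel with a per-state lengthscale stands in for the second order LFM kernel $k_f$ (whose closed form is not reproduced here), and the names `sample_hmm_gp`, `lengthscales` and all numeric values are hypothetical.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale):
    """Stand-in for the LFM kernel k_f (assumption: RBF, one lengthscale per state)."""
    diff = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)

def sample_hmm_gp(s, A, lengthscales, X, sigma, rng):
    """Sample hidden states Z and emissions Y segment by segment.

    s: initial state distribution, shape (K,)
    A: transition matrix, A[j, k] = p(z_i = k | z_{i-1} = j), shape (K, K)
    X: sample locations, shape (W, N), one row per segment
    """
    W, N = X.shape
    Z = np.empty(W, dtype=int)
    Y = np.empty((W, N))
    for i in range(W):
        probs = s if i == 0 else A[Z[i - 1]]
        Z[i] = rng.choice(len(s), p=probs)
        # Marginalizing f_i gives Y_i ~ N(0, K_f + sigma^2 I).
        K_y = rbf_kernel(X[i], X[i], lengthscales[Z[i]]) + sigma**2 * np.eye(N)
        Y[i] = rng.multivariate_normal(np.zeros(N), K_y)
    return Z, Y

rng = np.random.default_rng(1)
K, W, N = 2, 5, 20
s = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
X = np.tile(np.linspace(0.0, 1.0, N), (W, 1))  # equally spaced locations per segment
Z, Y = sample_hmm_gp(s, A, lengthscales=[0.1, 0.5], X=X, sigma=0.1, rng=rng)
```

Each row of `Y` is one segment generated by the primitive selected in `Z`, matching the assumption that an emission depends only on the current hidden state.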


## How to infer the model from data?

The model is fit by maximizing the expected complete-data log likelihood over $\Theta$ (this implies taking kernel derivatives w.r.t. the hyperparameters). The complete-data log likelihood is:

$$ \ln{p(Z, Y | X, \Theta)} = \sum_{k=1}^K z_{1,k} \ln{s_k} + \sum_{n=2}^W \sum_{k=1}^K \sum_{j=1}^K  z_{n, k} z_{n - 1, j} \ln{A_{j,k}} + \sum_{n=1}^W \sum_{k=1}^K z_{n,k} \ln{p(Y_n |\theta_k, X_n)} $$

The maximization w.r.t. the parameters $s_k$ and $A_{j,k}$ is performed using the standard EM algorithm. Notice that $s$, $A$ and the emission parameters appear in separate terms, so they are not coupled in the optimization process.
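As an illustration of the standard EM updates for $s_k$ and $A_{j,k}$, here is a minimal sketch of a scaled forward-backward pass followed by the closed-form M-step, following the usual textbook recursions for HMMs. It assumes the per-segment log emission likelihoods $\ln p(Y_n | \theta_k, X_n)$ have already been computed and stored in a hypothetical array `log_b`; random values are used below purely as placeholders.

```python
import numpy as np

def forward_backward(log_b, s, A):
    """Scaled forward-backward pass.

    log_b[n, k] = ln p(Y_n | theta_k, X_n), shape (W, K).
    Returns gamma[n, k] = p(z_n = k | data) and
    xi[n, j, k] = p(z_n = j, z_{n+1} = k | data).
    """
    W, K = log_b.shape
    # Subtract the row-wise max before exponentiating; the per-row
    # constant cancels in gamma and xi.
    b = np.exp(log_b - log_b.max(axis=1, keepdims=True))
    alpha = np.empty((W, K))
    beta = np.ones((W, K))
    c = np.empty(W)  # scaling factors
    alpha[0] = s * b[0]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for n in range(1, W):
        alpha[n] = b[n] * (alpha[n - 1] @ A)
        c[n] = alpha[n].sum()
        alpha[n] /= c[n]
    for n in range(W - 2, -1, -1):
        beta[n] = (A @ (b[n + 1] * beta[n + 1])) / c[n + 1]
    gamma = alpha * beta
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * (b[1:] * beta[1:])[:, None, :]) / c[1:, None, None]
    return gamma, xi

def m_step_s_A(gamma, xi):
    """Closed-form updates: s_k from gamma at n = 1, A_{j,k} from summed xi."""
    s_new = gamma[0] / gamma[0].sum()
    A_new = xi.sum(axis=0)
    A_new /= A_new.sum(axis=1, keepdims=True)
    return s_new, A_new

rng = np.random.default_rng(0)
W, K = 6, 3
log_b = 5.0 * rng.normal(size=(W, K))       # placeholder emission log likelihoods
s = np.full(K, 1.0 / K)
A = np.full((K, K), 0.1) + 0.7 * np.eye(K)  # rows sum to one
gamma, xi = forward_backward(log_b, s, A)
s_new, A_new = m_step_s_A(gamma, xi)
```

The same `gamma` values are the weights that multiply the emission log likelihoods when the hyperparameters $\theta_k$ are updated.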

## Maximization step for emission parameters.

Here the aim is to optimize the following term w.r.t. the hyperparameters $\theta_k$:

$$ E = \sum_{k=1}^K  \sum_{n=1}^W \gamma(z_{n,k}) \ln{p(Y_n |\theta_k, X_n)} $$

where:

$$\gamma(z_{i,j}) = \langle z_{i,j} \rangle_{p(Z | Y, X, \Theta^{old})} = p(z_{i,j} = 1 | Y, X, \Theta^{old} ) = p(z_i = j | Y, X, \Theta^{old} )$$

In order to use gradient ascent, it is necessary to take derivatives w.r.t. the kernel hyperparameters. Assuming $\theta_k = \{\theta_{k,1}, \dots, \theta_{k,h} \}$ we have (Rasmussen and Williams, 2006) <cite data-cite="rasmussen2006gaussian"></cite>:

$$ \frac{\partial E}{\partial \theta_{k,j}} =  \sum_{n=1}^W \gamma(z_{n,k}) \frac{\partial \ln p(Y_n | \theta_k, X_n)}{\partial \theta_{k,j}}$$

$$ \text{with}$$

$$ \frac{\partial \ln p(Y_n | \theta_k, X_n)}{\partial \theta_{k,j}} = \frac{1}{2} \operatorname{tr} \left( (\alpha \alpha^T - K_y^{-1}) \frac{\partial K_y}{\partial \theta_{k,j}} \right) \text{ with } \alpha = K_y^{-1} Y_n$$

Here $K_y = k_f(X_n, X_n; \theta_k) + \sigma^2 I$ is the covariance of the noisy outputs. The derivative $\frac{\partial K_y}{\partial \theta_{k,j}}$ is taken elementwise and depends on the specific form of the kernel $k_f$.
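The trace identity can be checked numerically on a simple case. The sketch below is an assumption-laden illustration: it uses an RBF kernel with a single lengthscale as a stand-in for $k_f$ (the LFM kernel derivatives are more involved and are not reproduced here), and it computes the $\gamma$-weighted gradient for one hidden state $k$. All function names and numeric values are hypothetical.

```python
import numpy as np

def rbf_and_grad(x, lengthscale):
    """RBF kernel matrix and its elementwise derivative w.r.t. the lengthscale."""
    d2 = (x[:, None] - x[None, :]) ** 2
    K = np.exp(-0.5 * d2 / lengthscale**2)
    dK = K * d2 / lengthscale**3
    return K, dK

def log_marginal_and_grad(x, y, lengthscale, sigma):
    """ln p(y | theta, x) for a zero-mean GP and its lengthscale derivative."""
    N = len(x)
    K_f, dK = rbf_and_grad(x, lengthscale)
    K_y = K_f + sigma**2 * np.eye(N)
    L = np.linalg.cholesky(K_y)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K_y^{-1} y
    log_lik = (-0.5 * y @ alpha
               - np.log(np.diag(L)).sum()
               - 0.5 * N * np.log(2 * np.pi))
    K_inv = np.linalg.solve(L.T, np.linalg.solve(L, np.eye(N)))
    # 0.5 * tr((alpha alpha^T - K_y^{-1}) dK_y/dtheta)
    grad = 0.5 * np.trace((np.outer(alpha, alpha) - K_inv) @ dK)
    return log_lik, grad

def weighted_objective_grad(X, Y, gamma_k, lengthscale, sigma):
    """dE/dtheta for one state k: sum_n gamma(z_{n,k}) * d ln p(Y_n | theta_k, X_n)."""
    return sum(g * log_marginal_and_grad(x, y, lengthscale, sigma)[1]
               for g, x, y in zip(gamma_k, X, Y))

rng = np.random.default_rng(0)
W, N = 4, 15
X = np.tile(np.linspace(0.0, 1.0, N), (W, 1))
Y = np.sin(2 * np.pi * X) + 0.1 * rng.standard_normal((W, N))
gamma_k = np.array([0.9, 0.1, 0.8, 0.2])  # placeholder responsibilities for state k
grad_E = weighted_objective_grad(X, Y, gamma_k, lengthscale=0.3, sigma=0.1)
```

A gradient ascent step would then move the lengthscale in the direction of `grad_E`; in practice a positivity constraint (for example optimizing the log of the hyperparameter) is a common choice, though the source does not specify one.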

## Currently working on

* Inferring the standard Latent Force Model (LFM) parameters from data.
* Plugging the LFM inference into the standard EM algorithm for HMMs.