## Learning to read faces

In the previous example, we saw how to hand-specify a "theory" of how people feel in response to outcomes. But other processes, such as how emotions translate into facial expressions, are impractical to specify by hand. We can posit a theory that feeling an emotion 'causes' someone to display certain facial expressions (e.g. *happy* 'causes' smiling, *anger* 'causes' a furrowing of the eyebrows), but it is not practical to spell out the possible muscular configurations that accompany each emotion, or the variance in those configurations. Thus, from a computational standpoint, we also need to be able to learn how to "read" unstructured data such as images of faces (or audio, video, or text).

In this example, we extend the model from the previous example to demonstrate how we can learn to "read" emotions from faces using deep neural networks, but within a probabilistic generative model.

We consider the following graphical model:

<div style="width: 300px; margin: auto;">![Graphical Model](images/graphicalModel_SSVAE.png)</div>

(For simplicity, we have left out the other parameters in the "appraisal" part of the model.)

We have added an observed variable, **Facial Expressions**, which is 'caused' by **Emotion Ratings**. The transformation from ratings to expressions, $P_{\theta}(\text{face }|\text{ ratings})$, is parameterized by $\theta$. But a face has many other aspects that are not determined by the emotion, such as face shape, gender, and race, and it would be ideal to model those aspects separately from the emotion. One way to do this is to add a latent variable, $z$, that captures these non-emotional features, and then learn a model $P_{\theta}(\text{face }| \text{ ratings}, z)$.
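
To make the structure explicit, here is one way to write the factorization implied by the graphical model. This is a reading of the diagram rather than a formula given in the original: it assumes that $z$ has its own prior and is independent of the ratings a priori, and it omits the appraisal parameters as the figure does.

$$P(\text{ratings}, z, \text{face} \mid \text{outcome}) = P(\text{ratings} \mid \text{outcome}) \, P(z) \, P_{\theta}(\text{face} \mid \text{ratings}, z)$$

Under this reading, the first factor is the appraisal model from the previous example, and only the last factor needs to be learnt from face images.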


The parameters $\theta$ can be learnt using a technique called stochastic variational inference (SVI), which we saw briefly in the last example as well. 


If we had only the model $z \rightarrow \text{face}$, a deep generative model with a single latent cause, we could fit it using variational inference: this would be an example of a variational autoencoder (VAE; Kingma & Welling, 2014).

Modern probabilistic programming languages can perform SVI [32] automatically with a small amount of input from the modeler. Historically, SVI required deriving a quantity called the evidence lower bound (ELBO) by hand. The ELBO is maximized during training, much like how a loss function is minimized in many machine learning approaches. Exact inference would require the posterior distribution (e.g. $P(z \mid \text{face})$), which is often intractable; the ELBO instead uses a variational distribution (in our case, $q(z \mid \text{face})$) that approximates the posterior and can itself be parameterized by a neural network. Trained in this manner, the model is a semi-supervised variant of the variational autoencoder (VAE) [39], a popular generative model that has received significant attention in the deep learning community.
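
To build intuition for the ELBO, here is a minimal sketch on a toy one-dimensional model rather than on faces. The model ($z \sim N(0, 1)$, $x \mid z \sim N(z, 1)$) and the helper names `log_normal` and `elbo_estimate` are assumptions made for illustration only. The toy model is chosen because its exact posterior and evidence are known in closed form, so we can check that the ELBO never exceeds $\log P(x)$ and matches it when $q$ equals the true posterior.

```python
import math
import random


def log_normal(x, mean, var):
    """Log density of a univariate normal with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def elbo_estimate(x, q_mean, q_var, n_samples=5000, seed=0):
    """Monte Carlo ELBO for the toy model z ~ N(0, 1), x | z ~ N(z, 1),
    with variational distribution q(z) = N(q_mean, q_var)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        z = rng.gauss(q_mean, math.sqrt(q_var))
        log_joint = log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0)
        total += log_joint - log_normal(z, q_mean, q_var)
    return total / n_samples


x = 1.0
# For this toy model the evidence is available exactly: x ~ N(0, 2).
log_evidence = log_normal(x, 0.0, 2.0)
# The true posterior is z | x ~ N(x / 2, 1 / 2); using it as q closes the gap.
elbo_exact_q = elbo_estimate(x, x / 2, 0.5)
# Using the prior as q is a poorer approximation, so the bound is looser.
elbo_prior_q = elbo_estimate(x, 0.0, 1.0)

print(log_evidence, elbo_exact_q, elbo_prior_q)
```

In the face model the evidence cannot be computed this way, which is why training maximizes the ELBO with respect to both $\theta$ and the parameters of $q$.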

In [None]:
from __future__ import division, print_function, absolute_import

%matplotlib inline

import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

import pyro
import pyro.distributions as dist
from pyro.distributions import Normal
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam

#### Dataset

We use the same dataset as in the previous example, but now consider only the "facial expression only" trials, in which participants saw only a facial expression and rated how they thought the character felt.

Here is an example face:

In [None]:
%matplotlib inline

import os

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable
from torch.nn import functional as F

import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO, TraceEnum_ELBO, config_enumerate
from pyro.optim import Adam

from torchvision.transforms import ToPILImage
from skimage import io, transform
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms, utils, datasets

from visdom import Visdom

#from utils.vae_plots import plot_llk, plot_vae_samples

import matplotlib.pyplot as plt
from scipy.special import expit

import pandas as pd

from PIL import ImageMath 
from PIL import Image

from pyro.contrib.examples.util import print_and_log, set_seed
import pyro.poutine as poutine

from utils.custom_mlp import MLP, Exp
from utils.mnist_cached import  mkdir_p, setup_data_loaders
#from utils.vae_plots import plot_conditional_samples_ssvae, plot_vae_samples

In [None]:
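# display an example image file
from torchvision.transforms import ToTensor  # ToPILImage is imported above
facedatadir = os.path.join(os.path.abspath('..'), "CognitionData", "more_faces", "small")
# ToTensor and ToPILImage are transform classes: instantiate them, then call the instance
t1 = ToTensor()(io.imread(os.path.join(facedatadir, "imgs", "f_0.png")))
#t1 = ToTensor()(io.imread(os.path.join(datadir, "imgs", "face_3_0_1.png")))
ToPILImage()(t1)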

In [None]:
class SemiSupervisedVAE():
    def condition(self, outcomes, emotion, image):
        # predict the mean emotion ratings from the outcomes
        prediction_mean = self.outcomesToEmotion(outcomes)
        # condition on the observed emotion ratings
        emo = pyro.sample("emo", dist.Normal(prediction_mean, 1), obs=emotion)
        # sample z from a standard normal prior
        # (assumes self.z_dim, the size of z, is set elsewhere in the class)
        prior_location = emo.new_zeros(emo.size(0), self.z_dim)
        prior_scale = emo.new_ones(emo.size(0), self.z_dim)
        z = pyro.sample("z", dist.Normal(prior_location, prior_scale))
        # generate the face using emotion and z,
        # conditioned on the observed image
        zAndEmo = torch.cat((z, emo), 1)
        loc = self.zAndEmosToFace_Decoder(zAndEmo)
        pyro.sample("face", dist.Bernoulli(loc), obs=image)


Note: the auxiliary code and helper functions can be moved into a separate .py file that is imported.