## Learning to read emotions in words using a Semi-Supervised Variational Autoencoder

In this example, we use a semi-supervised variant of the VAE to train the model with emotion ratings as well.

This example is adapted from the Pyro [SSVAE Tutorial](http://pyro.ai/examples/ss-vae.html)

The difference from the normal VAE is that now there is an additional observed **Emotion Ratings**. The parameters $\theta$ now parameterize the transformation from **ratings** and $z$ to **word embeddings** $p_{\theta}(\text{embedding}| \text{ rating}, z)$.

One way to think about this model, coming from the VAE, is that the addition of the emotion ratings will encourage the model to learn $\theta$ that reflect variance in the word embeddings due to emotions.

This model is a **semi-supervised variant of the VAE**, or SSVAE (Kingma et al, 2014; Siddharth et al, 2017).
We discuss the SSVAE formulation at a high level, below, and we invite the interested reader to check the reference list at the bottom for the more theoretical details of inference in the SSVAE.) 

We will be using the same Dataset as before: we have a set of (Face, Rating) paired observations (with ~18 unique faces) and a set of 203 unlablled Faces we used for the VAE. 

### Semi-supervised VAE

The model is similar to the VAE in that there are `encoders` and `decoders`. The main difference is that, in addition to the `decoder`, there are now 2 `encoders`:

- `decoder` "goes from" the (latent $z$ and ratings) to word embeddings, i.e. $p_\theta(\text{embedding }|z, \text{ratings})$)
- `encoder_y` "goes from" the embedding to the ratings, i.e., $q(\text{ratings } | \text{ embedding})$
- `encoder_z` "goes from" (embedding and ratings) to the latent $z$, i.e., $q(z, | \text{ ratings, embedding})$

In our SSVAE model, we have a ``.model()`` and ``.guide()`` that are similar to the VAE: they define the reconstruction loss that the model is trained against (Kingma & Welling, 2014). 

We add an additional ``.model_rating()`` and ``.guide_rating()`` to add a "supervised" loss to guide the model to learn from the supervised examples (Kingma et al, 2014).

```
class SemiSupervisedVAE():
    def model(self, emotion, embedding):
        # condition on the observed data
        emo = pyro.sample("emo", Normal(emo_prior_loc, emo_prior_scale), obs=emotion)
        # sample z given priors
        z = pyro.sample("z", dist.Normal(prior_location, prior_scale))
        # generate the face using emotion and z, and condition on observed image
        loc = self.decoder(torch.cat((z, emo), 1))           
        pyro.sample("word", dist.Bernoulli(loc), obs=embedding)
        
    def model_rating(self, embedding, emotion=None):
        # Extra term to yield an auxiliary loss that we do gradient descent on
        if emotion is not None:
            emo_mean, emo_scale = self.encoder_y(embedding)
            pyro.sample("emo_aux", dist.Normal(emo_mean, emo_scale), obs=emotion)

```

##### Preamble

This first chunk of code imports the necessary python packages and functions that we will use

In [33]:
#from __future__ import division, print_function, absolute_import
from __future__ import print_function

%matplotlib inline

import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable
from torch.nn import functional as F
from torch.utils.data import Dataset, DataLoader

import pyro
import pyro.distributions as dist
from pyro.distributions import Normal
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam


from torchvision import transforms, utils, datasets
from torchvision.transforms import ToPILImage
from skimage import io, transform
from scipy.special import expit
from PIL import Image
from matplotlib.pyplot import imshow

from pyro.contrib.examples.util import print_and_log, set_seed
import pyro.poutine as poutine
# custom helperCode for this tutorial, in helperCode.py
import helperCode
from utils.custom_mlp import MLP, Exp


from visdom import Visdom

#from utils.vae_plots import plot_llk, plot_vae_samples
from utils.mnist_cached import  mkdir_p, setup_data_loaders
from utils.vae_plots import plot_conditional_samples_ssvae, plot_vae_samples

EMBED_SIZE = 50
IMG_WIDTH = 100
IMG_SIZE = IMG_WIDTH*IMG_WIDTH*3
BATCH_SIZE = 32
DEFAULT_HIDDEN_DIMS = [100, 100] # [200,200] #[500, 500]
DEFAULT_Z_DIM = 10 # 50 #2

#### Word embeddings

Now we will load the GloVe word embeddings. (Warning: may take up to a minute...)

In [34]:
embed_path = os.path.join(os.path.abspath('..'), "glove", "glove.6B.50d.txt")

def load_glove_embeddings(path):
    print("Loading GloVe embeddings")
    with open(path,'r') as f:
        model = {}
        for line in f:
            split_line = line.split()
            word = split_line[0]
            embedding = np.array([float(val) for val in split_line[1:]], dtype=np.float32)
            model[word] = embedding
        print("Done.",len(model)," words loaded!")
        return model

embeddings = load_glove_embeddings(embed_path)

Loading GloVe embeddings
Done. 400000  words loaded!


#### Dataset

We will using dataset with only utterances and emotion ratings. Here are the utterances used in the experiment:



In [35]:
# Data location
dataset_path = os.path.join(os.path.abspath(".."), "CognitionData", "dataSecondExpt_utteranceOnly.csv")
expdata = pd.read_csv(dataset_path)

# Print utterances
utterances = list(sorted(pd.unique(expdata.loc[:]["utterance"])))
print(utterances)


['awesome', 'cool', 'damn', 'dang', 'man', 'meh', 'oh', 'wow', 'yay', 'yikes']


This next chunk defines a function to read in the data, and stores the data in `word_emotion_dataset`. Each observation consists of a 8-dimensional emotion rating vector and an accompanying word embedding.

In [36]:
class WordEmotionDataset(Dataset):
    """Word emotion dataset."""
    
    def __init__(self, csv_file, embeddings, transform=None):
        """
        Args:
            csv_file (string): Path to the experiment csv file 
            embeddings (dict): Dictionary of pre-trained word embeddings.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """
        self.expdata = pd.read_csv(csv_file)
        self.embeddings = embeddings
        self.transform = transform

    def __len__(self):
        return len(self.expdata)

    def __getitem__(self, idx):
        # Load and normalize the emotion data
        emotions = np.array(self.expdata.iloc[idx]["happy":"disapp"], np.float32)
        emotions = (emotions-1)/8
        
        word = self.expdata.iloc[idx]["utterance"]
        try:
            embed = self.embeddings[word]
            if self.transform:
                embed = self.transform(embed)
        except:
            print(word)
            raise

        return word, embed, emotions

# Transform to Tensor for use by PyTorch
data_transform = torch.from_numpy

# Read in datafile.
print("Reading in dataset...")

word_emotion_dataset = WordEmotionDataset(csv_file=dataset_path, 
                                          embeddings=embeddings, 
                                          transform=data_transform)
word_emotion_loader = torch.utils.data.DataLoader(word_emotion_dataset,
                                                  batch_size=BATCH_SIZE, shuffle=True,
                                                  num_workers=4)

N_samples = len(word_emotion_dataset)
print("Number of observations:", N_samples)

# Taking a sample observation
word1, embed1, emo1 = word_emotion_dataset[np.random.randint(0, N_samples)]
print("Sample Observation: ")
print("Utterance:", word1)
print("Embedding:")
print(embed1)
print("Ratings:")
for k, v in zip(helperCode.EMOTION_VAR_NAMES, emo1):
    print("{:10}:   {}".format(k,v))

Reading in dataset...
Number of observations: 495
Sample Observation: 
Utterance: dang
Embedding:
tensor([-0.3506, -0.6451,  0.0278, -0.1339,  0.0142, -1.4423,  0.5744,
        -0.1708, -0.1624, -0.1450, -0.5040, -0.4346,  0.0917, -0.7354,
         0.0677, -0.6634,  0.6884,  1.0008,  0.4845,  0.6128, -0.6672,
        -0.5989,  0.2892,  0.3741,  1.3327, -0.6727, -0.7735,  0.4228,
        -0.5093, -1.0273,  0.1071, -0.1422,  0.1344,  1.2473, -0.4950,
        -0.2869,  0.1091,  0.9635,  0.2762,  1.0672, -0.5012,  0.4027,
        -0.2211, -0.4713,  0.6056, -1.1590, -0.1890, -0.4710,  1.0820,
         0.7326])
Ratings:
happy     :   0.25
sad       :   0.75
anger     :   0.125
surprise  :   0.375
disgust   :   0.625
fear      :   0.125
content   :   0.25
disapp    :   0.875


Now we load dataset of additional exclamation words for unsupervised learning.

In [42]:
class WordDataset(Dataset):
    """Random word dataset."""
    
    def __init__(self, csv_file, embeddings, transform=None):
        """
        Args:
            size (int): Number of random words to use 
            embeddings (dict): Dictionary of pre-trained word embeddings.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """
        self.words = pd.read_csv(csv_file, header=None)
        self.embeddings = embeddings
        self.transform = transform

    def __len__(self):
        return len(self.words)

    def __getitem__(self, idx):
        # Load and normalize the emotion data        
        word = self.words.iloc[idx][0]
        try:
            embed = self.embeddings[word]
            if self.transform:
                embed = self.transform(embed)
        except:
            print(word)
            raise

        return word, embed, 0

exclamations_path = os.path.join(os.path.abspath(".."), "CognitionData", "exclamations.csv")
more_words_dataset = WordDataset(csv_file=exclamations_path, 
                                 embeddings=embeddings,
                                 transform=data_transform)
more_words_loader = torch.utils.data.DataLoader(more_words_dataset,
                                                batch_size=BATCH_SIZE, shuffle=True,
                                                num_workers=4)

print("Number of observations:", len(more_words_dataset))

# Taking a sample observation
word2, embed2, _ = more_words_dataset[np.random.randint(0, len(more_words_dataset))]
print("Sample Observation: ")
print("Utterance:", word2)
print("Embedding:")
print(embed2)

# Define dictionary of data loaders
data_loaders = {"supervised": word_emotion_loader , "unsupervised": more_words_loader }

Number of observations: 121
Sample Observation: 
Utterance: damn
Embedding:
tensor([-0.5741, -0.0868,  0.0718, -0.6580,  0.2905, -0.3830,  0.0166,
         0.1121, -0.3213,  0.6598, -0.5088,  0.5223, -0.1437,  0.4856,
         0.3916,  0.0484,  1.0275,  0.9643, -0.1664,  0.0795, -0.5766,
         0.3243,  0.2332,  0.3038,  1.1071, -1.1531, -1.3940,  0.7610,
         1.6428, -1.3568,  0.9815,  0.2852,  0.0057,  0.8859, -0.9513,
        -0.1090,  0.6705, -0.3748,  0.0386, -0.1464,  0.2357, -0.4880,
        -0.1487,  0.6436, -0.0459,  0.0955, -0.4051,  0.1312, -0.0415,
         0.9727])


### Model

In [43]:
class SSVAE(nn.Module):
    """
    This class encapsulates the parameters (neural networks), models & guides needed to train a
    semi-supervised variational auto-encoder.
    Modified from https://github.com/uber/pyro/blob/dev/examples/vae/ss_vae_M2.py

    :param output_size: size of the tensor representing the ratings 
                        in our case, emotion ratings is an 8 dimensional vector in [0,1]
                        global constant: helperCode.EMOTION_VAR_DIM (= 8)
    :param input_size: size of the tensor representing the word embedding
                        defaults to 300 (the largest size in GloVe)
    :param z_dim: size of the tensor representing the latent random variable z
    :param hidden_layers: a tuple (or list) of MultiLayer Perceptron (MLP) layers 
                          to be used in the neural networks
                          representing the parameters of the distributions in our model
    :param use_cuda: use GPUs for faster training
    :param aux_loss_multiplier: the multiplier to use with the auxiliary loss
    """
    def __init__(self, output_size=helperCode.EMOTION_VAR_DIM, input_size=EMBED_SIZE, 
                 z_dim=DEFAULT_Z_DIM, hidden_layers=DEFAULT_HIDDEN_DIMS,
                 config_enum=None, use_cuda=False, aux_loss_multiplier=None):
        super(SSVAE, self).__init__()
        self.output_size = output_size
        self.input_size = input_size
        self.z_dim = z_dim
        self.hidden_layers = hidden_layers
        self.allow_broadcast = config_enum == 'parallel'
        self.use_cuda = use_cuda
        self.aux_loss_multiplier = aux_loss_multiplier
        
        # define the neural networks used later in the model and the guide.
        # these networks are MLPs (multi-layered perceptrons or simple feed-forward networks)
        # where the provided activation parameter is used on every linear layer except
        # for the output layer where we use the provided output_activation parameter
        
        # self.encoder_y = nn.Sequential(
        #       nn.Conv2d(in_channels=3, out_channels=32, kernel_size=4, stride=2, padding=1, bias=False)
        # )

        
        # a split in the final layer's size is used for multiple outputs
        # and potentially applying separate activation functions on them
        # e.g. in this network the final output is of size [z_dim, z_dim]
        # to produce a mean and scale parameter for a Normal distribution, 
        # and we can apply different activations [None, Exp] on them

        # encoder_y goes from embeddings to ratings
        self.encoder_y = MLP([self.input_size] +
                             self.hidden_layers + 
                             [[self.output_size, self.output_size]],
                             activation=nn.Tanh,
                             output_activation=[None, Exp],
                             allow_broadcast=self.allow_broadcast,
                             use_cuda=self.use_cuda)

        # encoder_z goes from [embeddings, ratings] to z
        self.encoder_z = MLP([self.input_size + self.output_size] +
                             self.hidden_layers + [[z_dim, z_dim]],
                             activation=nn.ReLU,
                             output_activation=[None, Exp],
                             allow_broadcast=self.allow_broadcast,
                             use_cuda=self.use_cuda)

        # decoder goes from [z, emotion ratings] to the embedding.
        self.decoder = MLP([z_dim + self.output_size] +
                           self.hidden_layers[::-1] +
                           [[self.input_size, self.input_size]],
                           activation=nn.ReLU,
                           output_activation=nn.Sigmoid,
                           allow_broadcast=self.allow_broadcast,
                           use_cuda=self.use_cuda)
        
        # using GPUs for faster training of the networks
        if self.use_cuda:
            self.cuda()
            
    def model(self, xs, ys=None, beta=1.0):
        """
        The model corresponds to the following generative process:
        p(z)     = normal(0,I)              # Prior on the latent variable z
        p(y)     = normal(.5, .05)          # Emotion Rating corresponding to word embedding
        p(x|y,z) = normal(decoder(y,z))  # Producing an image
                                              decoder is a neural network

        :param xs: a batch of word embeddings
        :param ys: (optional) a batch of emotion ratings.
                   if ys is not provided, will treat as unsupervised, sample from prior.
        :param beta: scale parameter that weights the KL divergence in the ELBO
                     also sometimes called annealing.
        :return: None
        """
        # register this pytorch module and all of its sub-modules with pyro
        pyro.module("ssvae", self)

        batch_size = xs.size(0)
        # inform Pyro that the variables in the batch of xs, ys are conditionally independent
        with pyro.iarange("data"):

            # sample the latent z from the (constant) prior, z ~ Normal(0,I)
            z_prior_mean  = torch.zeros(size=[batch_size, self.z_dim])
            z_prior_scale = torch.exp(torch.zeros(size=[batch_size, self.z_dim]))
            with poutine.scale(scale=beta):
                zs = pyro.sample("z", dist.Normal(z_prior_mean, z_prior_scale).independent(1))

            # if the label y (emotion rating) is not provided, sample from the
            # constant prior, otherwise, observe the value (i.e. score it against the constant prior)
            y_prior_mean  = torch.ones(size=[batch_size, self.output_size]) *0.5 #/ (1.0 * self.output_size)
            y_prior_scale = torch.ones(size=[batch_size, self.output_size]) *0.05
            if ys is None:
                ys = pyro.sample("y", dist.Normal(y_prior_mean, y_prior_scale).independent(1))
            else:
                ys = pyro.sample("y", dist.Normal(y_prior_mean, y_prior_scale).independent(1), obs=ys)
                
            # Finally, we can condition on observing the word embedding,
            #    using the latent z and emotion rating y in the 
            #    parametrized distribution p(x|y,z) = bernoulli(decoder(y,z))
            #    where decoder is a neural network
            
            x_mean, x_scale = self.decoder.forward([zs, ys])
            pyro.sample("x", dist.Normal(x_mean, x_scale).independent(1), obs=xs)
            
            # return the mean and variance of the word embedding distribution
            return x_mean, x_scale

    def guide(self, xs, ys=None, beta=1.0):
        """
        The guide corresponds to the following:
        q(y|x)   = normal(encoder_y(x))   # infer emotion rating from word embedding
        q(z|x,y) = normal(encoder_z(x,y)) # infer z from embedding and the emotion rating

        :param xs: a batch of word vectors
        :param ys: (optional) a batch of emotion ratings.
                   if ys is not provided, will treat as unsupervised
        :param beta: not used here, but left to match the call signature of self.model()
        :return: None
        """
        # inform Pyro that the variables in the batch of xs, ys are conditionally independent
        with pyro.iarange("data"):

            # if the emotion rating is not provided, 
            #    sample with the variational distribution
            #    q(y|x) = Normal(encoder_y(x))
            if ys is None:
                y_mean, y_scale = self.encoder_y.forward(xs)
                #scale_ys = xs.new_ones([batch_size, self.output_size])*0.05
                ys = pyro.sample("y", dist.Normal(y_mean, y_scale).independent(1))
                
            # Sample (and score) the latent z with the variational
            #   distribution q(z|x,y) = normal(loc(x,y),scale(x,y))
            #   where loc(.), scale(.) are given by encoder_z()
                        
            z_mean, z_scale = self.encoder_z.forward([xs, ys])
            with poutine.scale(scale=beta): 
                pyro.sample("z", dist.Normal(z_mean, z_scale).independent(1))

    def model_rating(self, xs, ys=None, beta=None):
        """
        this model is used to add an auxiliary (supervised) loss as described in
        Kingma et al. (2014), "Semi-Supervised Learning with Deep Generative Models".
        
        This is to ensure that the model learns from the supervised examples.
        q(y|x) = normal(encoder_y(x))
        
        :param xs:   word embedding
        :param ys:   emotion rating
        :param beta: not used here, but left to match the call signature of self.model()
        """
        # register all pytorch (sub)modules with pyro
        pyro.module("ssvae", self)

        # inform Pyro that the variables in the batch of xs, ys are conditionally independent
        with pyro.iarange("data"):
            # this is the extra term to yield an auxiliary loss that we do gradient descent on
            if ys is not None:
                y_mean, y_scale = self.encoder_y.forward(xs)
                with pyro.poutine.scale(scale=self.aux_loss_multiplier):
                    pyro.sample("y_aux", dist.Normal(y_mean, y_scale).independent(1), obs=ys)

    def guide_rating(self, xs, ys=None, beta=None):
        """
        dummy guide function to accompany model_rating() in inference
        This guide function is empty, because model_rating() has no latent random variables
        (i.e., model_rating() has no pyro.sample() calls that are not conditioned on observations)
        """
        pass
    
    # define a helper function to assign ratings to faces
    def rate(self, xs):
        """
        assign emotion ratings (ys) to a word embedding (or a batch of them)

        :param xs: a batch of word vectors
        :return:   a batch of the corresponding emotion ratings (ys)
        """
        # use the trained model q(y|x) = normal(encoder_y(x))
        # compute the emotion ratings for the image(s)
        y_mean, y_scale = self.encoder_y.forward(xs)
        return y_mean
    
    # define a helper function for reconstructing word vectors
    def reconstruct_word(self, x):
        # encode word vector x. This function assumes that x is a single vector, 
        # but as the encoders and decoders take in batches, we have to resize x:
        xs = x.view(1, EMBED_SIZE)
        y_mean, y_scale = self.encoder_y.forward(xs)
        ys = dist.Normal(y_mean, y_scale).sample()
        z_mean, z_scale = self.encoder_z.forward([xs, ys])
        # sample in latent space
        zs = dist.Normal(z_mean, z_scale).sample()
        # decode the word (note we don't sample in the word embedding space)
        x_mean, x_scale = self.decoder([zs, ys])
        return x_mean

#### Training

Next we define the parameters of our training session, and set up the model and inference algorithm. This part is identical to the linear regression and VAE example.

The key difference is that we have 2 losses. 

- The unsupervised loss from the vanilla VAE, which is captured by ```loss_basic = SVI(ssvae.model, ssvae.guide, optimizer, loss=Trace_ELBO())``` 
- and the supervised loss, or auxillary loss, captured by ```loss_aux = SVI(ssvae.model_rating, ssvae.guide_rating, optimizer, loss=Trace_ELBO())```

At each loop in the training, we take one "basic" step and one "auxillary" step, to train the model with both losses together.

`num_epochs` below is set to 1000. If you just want to check if the code works, set it to 1.

In [63]:
class Args:
    learning_rate = 5e-6#5e-4
    num_epochs = 1000 #1
    hidden_layers = DEFAULT_HIDDEN_DIMS
    z_dim = DEFAULT_Z_DIM
    seed = 1
    beta_1 = 0.900
    aux_loss = True 
    aux_loss_multiplier = 10 #50.0
    cuda = False
    # enum_discrete = None #"parallel"#"sequential" #"parallel"
    #visdom_flag = True
    #visualize = True
    #logfile = "./tmp.log"
    
args = Args()


pyro.clear_param_store()

unsup_num = len(more_words_dataset)
sup_num = len(word_emotion_dataset)


if args.seed is not None:
    set_seed(args.seed, args.cuda)

# batch_size: number of images (and labels) to be considered in a batch
ssvae = SSVAE(output_size=helperCode.EMOTION_VAR_DIM, input_size=EMBED_SIZE, 
              z_dim=args.z_dim,
              hidden_layers=args.hidden_layers,
              use_cuda=args.cuda,
              #config_enum=args.enum_discrete,
              aux_loss_multiplier=args.aux_loss_multiplier)

# setup the optimizer
adam_params = {"lr": args.learning_rate, "betas": (args.beta_1, 0.999)}
optimizer = Adam(adam_params)

# set up the loss(es) for inference. 

## wrapping the guide in config_enumerate builds the loss as a sum
## by enumerating each class label for the sampled discrete categorical distribution in the model
##guide = config_enumerate(ssvae.guide, args.enum_discrete)
##loss_basic = SVI(ssvae.model, guide, optimizer, loss=TraceEnum_ELBO(max_iarange_nesting=2))

loss_basic = SVI(ssvae.model, ssvae.guide, optimizer, loss=Trace_ELBO())

# build a list of all losses considered
losses = [loss_basic]

# aux_loss: whether to use the auxiliary loss from (Kingma et al, 2014 NIPS paper)
if args.aux_loss:
    loss_aux = SVI(ssvae.model_rating, ssvae.guide_rating, optimizer, loss=Trace_ELBO())
    losses.append(loss_aux)

#### Setting up Inference

This next chunk defines some helper functions to help run the inference

In [57]:
def getBeta(epoch, gamma=1e-2, c=800):
    # this is a helper function to compute an annealing parameter beta for a given epoch.
    # beta starts off near 0 and gradually increases, reaching 1 at epoch=c
    return float(expit(gamma*(epoch - c)))

def run_inference_for_epoch(data_loaders, losses, epoch, gamma=1e-2, c=800, cuda=False):
    """
    runs the inference algorithm for an epoch
    returns the values of all losses separately on supervised and unsupervised parts
    """
    num_losses = len(losses)

    # compute number of batches for an epoch
    # don't use all the sup_batches
    sup_batches = len(data_loaders["supervised"])
    unsup_batches = len(data_loaders["unsupervised"])
    batches_per_epoch = sup_batches + unsup_batches

    # initialize variables to store loss values
    epoch_losses_sup = [0.] * num_losses
    epoch_losses_unsup = [0.] * num_losses

    # setup the iterators for training data loaders
    sup_iter = iter(data_loaders["supervised"])
    unsup_iter = iter(data_loaders["unsupervised"])

    # random order
    is_sups = [1]*sup_batches + [0]*unsup_batches
    is_sups = np.random.permutation(is_sups)
    
    # annealing factor
    beta = getBeta(epoch, gamma, c)
    
    for i in range(batches_per_epoch):
        #if i%10 == 0:
        #    print("Epoch", epoch, ": on batch", i, "out of", batches_per_epoch)
        # whether this batch is supervised or not
        is_supervised = is_sups[i]
        
        # extract the corresponding batch
        if is_supervised:
            _, xs, ys = next(sup_iter)
        else:
            _, xs, ys = next(unsup_iter)
            
        if cuda:
            ys = ys.cuda()
            xs = xs.cuda()
        
        batchsize = xs.size(0)
        xs = xs.view(batchsize, -1)
        
        # run the inference for each loss with supervised or un-supervised
        # data as arguments
        for loss_id in range(num_losses):
            if is_supervised:
                new_loss = losses[loss_id].step(xs, ys, beta=beta)
                epoch_losses_sup[loss_id] += new_loss
            else:
                new_loss = losses[loss_id].step(xs, beta=beta)
                epoch_losses_unsup[loss_id] += new_loss

    # return the values of all losses
    return epoch_losses_sup, epoch_losses_unsup

def get_prediction_error(data_loader, rating_fn):
    """
    compute the prediction error over the supervised training set or the testing set
    """
    predictions, actuals = [], []

    # use the appropriate data loader
    for (_, xs, ys) in data_loader:
        # use classification function to compute all predictions for each batch
        batchsize = xs.size(0)
        xs = xs.view(batchsize, -1)
        ys = ys.view(batchsize, -1)
        predictions.append(rating_fn(xs))
        actuals.append(ys)
        
    #abs(predictions[i] - actuals[i]) for i in range(len(predictions))

    predErrors = [abs(pred_i - act_i) for pred_i, act_i in zip(predictions, actuals)]
    meanPredError = sum([torch.sum(pe) for pe in predErrors]) / (len(predictions) * BATCH_SIZE * helperCode.EMOTION_VAR_DIM)

    return meanPredError, predictions, actuals

#### Training loop

This next chunk of code runs the training over `num_epochs` epochs.

In [61]:
try:
    # run inference for a certain number of epochs
    for i in range(0, args.num_epochs):

        # get the losses for an epoch
        epoch_losses_sup, epoch_losses_unsup = \
            run_inference_for_epoch(data_loaders, losses, epoch=i, cuda=args.cuda)

        # compute average epoch losses i.e. losses per example
        avg_epoch_losses_sup = map(lambda v: v / sup_num, epoch_losses_sup)
        avg_epoch_losses_unsup = map(lambda v: v / unsup_num, epoch_losses_unsup)

        # store the loss and validation/testing accuracies in the logfile
        #str_loss_sup = " ".join(map(str, avg_epoch_losses_sup))
        #str_loss_unsup = " ".join(map(str, avg_epoch_losses_unsup))
        str_loss_sup = "Supervised Recon Loss: " + str(avg_epoch_losses_sup[0]) + \
        "\n     Auxillary Loss: " + str(avg_epoch_losses_sup[1])
        str_loss_unsup = "\n     Unsupervised Recon Loss: " + str(avg_epoch_losses_unsup[0])

        str_print = "Epoch {} : Avg {}".format(i, "{} {}".format(str_loss_sup, str_loss_unsup))
        
        predErr, _, _ = get_prediction_error(data_loaders["supervised"], ssvae.rate)
        str_print += "\n     Train set prediction error: {}".format(predErr)
        print(str_print)
finally:
    print("Done")


Epoch 0 : Avg Supervised Recon Loss: 254.834145383
     Auxillary Loss: 24.7626493204 
     Unsupervised Recon Loss: 198.879734609
     Train set prediction error: 0.267566591501
Epoch 1 : Avg Supervised Recon Loss: 254.704253641
     Auxillary Loss: 24.7177454476 
     Unsupervised Recon Loss: 217.643848618
     Train set prediction error: 0.267618685961
Epoch 2 : Avg Supervised Recon Loss: 254.629784051
     Auxillary Loss: 24.707728962 
     Unsupervised Recon Loss: 212.504246217
     Train set prediction error: 0.267602622509
Epoch 3 : Avg Supervised Recon Loss: 254.619062994
     Auxillary Loss: 24.7017939404 
     Unsupervised Recon Loss: 195.908650075
     Train set prediction error: 0.267508149147
Epoch 4 : Avg Supervised Recon Loss: 254.492827343
     Auxillary Loss: 24.5949660792 
     Unsupervised Recon Loss: 210.318757231
     Train set prediction error: 0.267289429903
Epoch 5 : Avg Supervised Recon Loss: 254.44397866
     Auxillary Loss: 24.7571736037 
     Unsupervised Re

Epoch 46 : Avg Supervised Recon Loss: 253.325107154
     Auxillary Loss: 24.451592586 
     Unsupervised Recon Loss: 197.037636661
     Train set prediction error: 0.267407536507
Epoch 47 : Avg Supervised Recon Loss: 253.471947615
     Auxillary Loss: 24.545330903 
     Unsupervised Recon Loss: 194.45055013
     Train set prediction error: 0.267382949591
Epoch 48 : Avg Supervised Recon Loss: 253.389884767
     Auxillary Loss: 24.5368772565 
     Unsupervised Recon Loss: 195.165609663
     Train set prediction error: 0.267446011305
Epoch 49 : Avg Supervised Recon Loss: 253.625229861
     Auxillary Loss: 24.3845253068 
     Unsupervised Recon Loss: 193.519747995
     Train set prediction error: 0.267155647278
Epoch 50 : Avg Supervised Recon Loss: 253.559460905
     Auxillary Loss: 24.391738121 
     Unsupervised Recon Loss: 181.931977525
     Train set prediction error: 0.26722022891
Epoch 51 : Avg Supervised Recon Loss: 253.491277177
     Auxillary Loss: 24.271677468 
     Unsupervised 

Epoch 92 : Avg Supervised Recon Loss: 252.78758871
     Auxillary Loss: 24.0921936652 
     Unsupervised Recon Loss: 182.965644894
     Train set prediction error: 0.266956448555
Epoch 93 : Avg Supervised Recon Loss: 252.555046828
     Auxillary Loss: 24.2238960651 
     Unsupervised Recon Loss: 196.867554591
     Train set prediction error: 0.266971766949
Epoch 94 : Avg Supervised Recon Loss: 252.635490255
     Auxillary Loss: 24.1987534032 
     Unsupervised Recon Loss: 189.800661142
     Train set prediction error: 0.267051517963
Epoch 95 : Avg Supervised Recon Loss: 252.601739388
     Auxillary Loss: 24.135079432 
     Unsupervised Recon Loss: 195.893381815
     Train set prediction error: 0.266975522041
Epoch 96 : Avg Supervised Recon Loss: 252.457338081
     Auxillary Loss: 24.1648933179 
     Unsupervised Recon Loss: 194.821112909
     Train set prediction error: 0.266975551844
Epoch 97 : Avg Supervised Recon Loss: 252.546986846
     Auxillary Loss: 23.9538481741 
     Unsupervi

Epoch 138 : Avg Supervised Recon Loss: 251.920346947
     Auxillary Loss: 23.6863769531 
     Unsupervised Recon Loss: 187.960445765
     Train set prediction error: 0.266231477261
Epoch 139 : Avg Supervised Recon Loss: 251.909332516
     Auxillary Loss: 23.5740852741 
     Unsupervised Recon Loss: 194.871940833
     Train set prediction error: 0.266178011894
Epoch 140 : Avg Supervised Recon Loss: 252.004213662
     Auxillary Loss: 23.6780655061 
     Unsupervised Recon Loss: 193.742679012
     Train set prediction error: 0.266364037991
Epoch 141 : Avg Supervised Recon Loss: 251.888443241
     Auxillary Loss: 23.7203677399 
     Unsupervised Recon Loss: 184.702578911
     Train set prediction error: 0.266224205494
Epoch 142 : Avg Supervised Recon Loss: 251.926455321
     Auxillary Loss: 23.6460186583 
     Unsupervised Recon Loss: 182.494531938
     Train set prediction error: 0.266168773174
Epoch 143 : Avg Supervised Recon Loss: 251.988914184
     Auxillary Loss: 23.6731902151 
     U

Epoch 184 : Avg Supervised Recon Loss: 251.518402957
     Auxillary Loss: 23.4004222523 
     Unsupervised Recon Loss: 165.019363202
     Train set prediction error: 0.265828430653
Epoch 185 : Avg Supervised Recon Loss: 251.509177729
     Auxillary Loss: 23.619587353 
     Unsupervised Recon Loss: 184.421595903
     Train set prediction error: 0.265909910202
Epoch 186 : Avg Supervised Recon Loss: 251.620427406
     Auxillary Loss: 23.6095945416 
     Unsupervised Recon Loss: 180.051288084
     Train set prediction error: 0.265752911568
Epoch 187 : Avg Supervised Recon Loss: 251.508482302
     Auxillary Loss: 23.4631377866 
     Unsupervised Recon Loss: 184.181116069
     Train set prediction error: 0.265798002481
Epoch 188 : Avg Supervised Recon Loss: 251.44067314
     Auxillary Loss: 23.2972908405 
     Unsupervised Recon Loss: 171.770207866
     Train set prediction error: 0.265589058399
Epoch 189 : Avg Supervised Recon Loss: 251.432476596
     Auxillary Loss: 23.3551874026 
     Uns

Epoch 230 : Avg Supervised Recon Loss: 251.323380282
     Auxillary Loss: 22.8694917313 
     Unsupervised Recon Loss: 170.005310532
     Train set prediction error: 0.264539420605
Epoch 231 : Avg Supervised Recon Loss: 251.196389703
     Auxillary Loss: 23.0205834591 
     Unsupervised Recon Loss: 174.603585532
     Train set prediction error: 0.264747172594
Epoch 232 : Avg Supervised Recon Loss: 251.301895961
     Auxillary Loss: 23.1326044873 
     Unsupervised Recon Loss: 174.753029304
     Train set prediction error: 0.264814168215
Epoch 233 : Avg Supervised Recon Loss: 251.198019567
     Auxillary Loss: 22.9732799183 
     Unsupervised Recon Loss: 179.960939491
     Train set prediction error: 0.264685630798
Epoch 234 : Avg Supervised Recon Loss: 251.277465307
     Auxillary Loss: 22.8613720826 
     Unsupervised Recon Loss: 188.293890872
     Train set prediction error: 0.264567583799
Epoch 235 : Avg Supervised Recon Loss: 251.362108268
     Auxillary Loss: 22.9718698829 
     U

Epoch 276 : Avg Supervised Recon Loss: 251.106894553
     Auxillary Loss: 22.6211834532 
     Unsupervised Recon Loss: 182.018854605
     Train set prediction error: 0.263669520617
Epoch 277 : Avg Supervised Recon Loss: 251.165377744
     Auxillary Loss: 22.5408778335 
     Unsupervised Recon Loss: 170.3581791
     Train set prediction error: 0.263465940952
Epoch 278 : Avg Supervised Recon Loss: 251.119608958
     Auxillary Loss: 22.5507843942 
     Unsupervised Recon Loss: 173.035972841
     Train set prediction error: 0.263536304235
Epoch 279 : Avg Supervised Recon Loss: 251.150634502
     Auxillary Loss: 22.4040599476 
     Unsupervised Recon Loss: 178.948898876
     Train set prediction error: 0.263431787491
Epoch 280 : Avg Supervised Recon Loss: 250.999464295
     Auxillary Loss: 22.4342257413 
     Unsupervised Recon Loss: 171.40250899
     Train set prediction error: 0.263448506594
Epoch 281 : Avg Supervised Recon Loss: 251.047045953
     Auxillary Loss: 22.3192623871 
     Unsu

Epoch 322 : Avg Supervised Recon Loss: 251.19304168
     Auxillary Loss: 21.9325994626 
     Unsupervised Recon Loss: 176.155886045
     Train set prediction error: 0.261823236942
Epoch 323 : Avg Supervised Recon Loss: 251.063448404
     Auxillary Loss: 22.0025228481 
     Unsupervised Recon Loss: 171.600235037
     Train set prediction error: 0.26188442111
Epoch 324 : Avg Supervised Recon Loss: 251.14048469
     Auxillary Loss: 21.8752147944 
     Unsupervised Recon Loss: 163.712102088
     Train set prediction error: 0.261818647385
Epoch 325 : Avg Supervised Recon Loss: 251.115852267
     Auxillary Loss: 21.8941663954 
     Unsupervised Recon Loss: 175.749951183
     Train set prediction error: 0.26175314188
Epoch 326 : Avg Supervised Recon Loss: 251.030344805
     Auxillary Loss: 21.817401308 
     Unsupervised Recon Loss: 177.672183596
     Train set prediction error: 0.261603593826
Epoch 327 : Avg Supervised Recon Loss: 251.047380202
     Auxillary Loss: 21.8893383912 
     Unsupe

Epoch 368 : Avg Supervised Recon Loss: 251.146179615
     Auxillary Loss: 21.1112988096 
     Unsupervised Recon Loss: 174.181585799
     Train set prediction error: 0.259270519018
Epoch 369 : Avg Supervised Recon Loss: 251.102920181
     Auxillary Loss: 21.0425670277 
     Unsupervised Recon Loss: 164.226239555
     Train set prediction error: 0.259231835604
Epoch 370 : Avg Supervised Recon Loss: 251.186973445
     Auxillary Loss: 21.1015658908 
     Unsupervised Recon Loss: 168.536463568
     Train set prediction error: 0.259359449148
Epoch 371 : Avg Supervised Recon Loss: 251.186920695
     Auxillary Loss: 21.0309892751 
     Unsupervised Recon Loss: 176.706102722
     Train set prediction error: 0.259042590857
Epoch 372 : Avg Supervised Recon Loss: 251.134729509
     Auxillary Loss: 20.8330419675 
     Unsupervised Recon Loss: 160.090481165
     Train set prediction error: 0.258813917637
Epoch 373 : Avg Supervised Recon Loss: 251.073397665
     Auxillary Loss: 21.0476612438 
     U

Epoch 414 : Avg Supervised Recon Loss: 251.219457682
     Auxillary Loss: 20.106326818 
     Unsupervised Recon Loss: 154.138973827
     Train set prediction error: 0.256203621626
Epoch 415 : Avg Supervised Recon Loss: 251.36327574
     Auxillary Loss: 20.0140800399 
     Unsupervised Recon Loss: 164.727317266
     Train set prediction error: 0.256100445986
Epoch 416 : Avg Supervised Recon Loss: 251.336156302
     Auxillary Loss: 20.0727393565 
     Unsupervised Recon Loss: 174.730546616
     Train set prediction error: 0.256147503853
Epoch 417 : Avg Supervised Recon Loss: 251.305925058
     Auxillary Loss: 20.1251020952 
     Unsupervised Recon Loss: 167.213087196
     Train set prediction error: 0.256095856428
Epoch 418 : Avg Supervised Recon Loss: 251.29517861
     Auxillary Loss: 20.0317712032 
     Unsupervised Recon Loss: 158.279706395
     Train set prediction error: 0.256017029285
Epoch 419 : Avg Supervised Recon Loss: 251.22392147
     Auxillary Loss: 19.935879979 
     Unsupe

Epoch 460 : Avg Supervised Recon Loss: 251.453739931
     Auxillary Loss: 18.7958280005 
     Unsupervised Recon Loss: 154.958427547
     Train set prediction error: 0.252106398344
Epoch 461 : Avg Supervised Recon Loss: 251.470513915
     Auxillary Loss: 18.6353573269 
     Unsupervised Recon Loss: 153.921787286
     Train set prediction error: 0.251753091812
Epoch 462 : Avg Supervised Recon Loss: 251.47454386
     Auxillary Loss: 18.5820441969 
     Unsupervised Recon Loss: 144.457669802
     Train set prediction error: 0.251682192087
Epoch 463 : Avg Supervised Recon Loss: 251.529869028
     Auxillary Loss: 18.6030791928 
     Unsupervised Recon Loss: 161.745563145
     Train set prediction error: 0.251640200615
Epoch 464 : Avg Supervised Recon Loss: 251.508785526
     Auxillary Loss: 18.6045748316 
     Unsupervised Recon Loss: 149.014621743
     Train set prediction error: 0.25144520402
Epoch 465 : Avg Supervised Recon Loss: 251.524991037
     Auxillary Loss: 18.6144651471 
     Uns

Epoch 506 : Avg Supervised Recon Loss: 251.639936681
     Auxillary Loss: 17.0193864919 
     Unsupervised Recon Loss: 161.52942912
     Train set prediction error: 0.24660860002
Epoch 507 : Avg Supervised Recon Loss: 251.614334482
     Auxillary Loss: 16.9771787701 
     Unsupervised Recon Loss: 149.468540058
     Train set prediction error: 0.246405050159
Epoch 508 : Avg Supervised Recon Loss: 251.724095611
     Auxillary Loss: 16.9586979105 
     Unsupervised Recon Loss: 146.717192106
     Train set prediction error: 0.246311888099
Epoch 509 : Avg Supervised Recon Loss: 251.801983826
     Auxillary Loss: 16.9935549958 
     Unsupervised Recon Loss: 162.53954353
     Train set prediction error: 0.246223703027
Epoch 510 : Avg Supervised Recon Loss: 251.825264364
     Auxillary Loss: 16.7800396312 
     Unsupervised Recon Loss: 151.141208972
     Train set prediction error: 0.245947018266
Epoch 511 : Avg Supervised Recon Loss: 251.730146179
     Auxillary Loss: 16.8518946792 
     Unsu

Epoch 552 : Avg Supervised Recon Loss: 251.853328335
     Auxillary Loss: 15.046863841 
     Unsupervised Recon Loss: 164.403115816
     Train set prediction error: 0.24014775455
Epoch 553 : Avg Supervised Recon Loss: 251.964485307
     Auxillary Loss: 15.2089660336 
     Unsupervised Recon Loss: 154.722116297
     Train set prediction error: 0.24036501348
Epoch 554 : Avg Supervised Recon Loss: 251.914908276
     Auxillary Loss: 15.0476072215 
     Unsupervised Recon Loss: 147.879064024
     Train set prediction error: 0.239995032549
Epoch 555 : Avg Supervised Recon Loss: 251.904872047
     Auxillary Loss: 15.1047074751 
     Unsupervised Recon Loss: 155.001910375
     Train set prediction error: 0.239964216948
Epoch 556 : Avg Supervised Recon Loss: 251.861993221
     Auxillary Loss: 14.923568479 
     Unsupervised Recon Loss: 150.8149232
     Train set prediction error: 0.239637762308
Epoch 557 : Avg Supervised Recon Loss: 252.068922871
     Auxillary Loss: 14.9608058737 
     Unsuper

Epoch 598 : Avg Supervised Recon Loss: 252.257114938
     Auxillary Loss: 12.6033016012 
     Unsupervised Recon Loss: 147.86914031
     Train set prediction error: 0.23275667429
Epoch 599 : Avg Supervised Recon Loss: 252.038091945
     Auxillary Loss: 12.6640781942 
     Unsupervised Recon Loss: 147.926301121
     Train set prediction error: 0.232597485185
Epoch 600 : Avg Supervised Recon Loss: 252.244935843
     Auxillary Loss: 12.504941004 
     Unsupervised Recon Loss: 159.776850235
     Train set prediction error: 0.232430905104
Epoch 601 : Avg Supervised Recon Loss: 252.255374885
     Auxillary Loss: 12.5271986065 
     Unsupervised Recon Loss: 146.26306332
     Train set prediction error: 0.232105657458
Epoch 602 : Avg Supervised Recon Loss: 252.223125616
     Auxillary Loss: 12.502878455 
     Unsupervised Recon Loss: 150.889172578
     Train set prediction error: 0.232172563672
Epoch 603 : Avg Supervised Recon Loss: 252.101317261
     Auxillary Loss: 12.3674507218 
     Unsupe

Epoch 644 : Avg Supervised Recon Loss: 252.558123656
     Auxillary Loss: 10.1739918179 
     Unsupervised Recon Loss: 141.50153231
     Train set prediction error: 0.224892392755
Epoch 645 : Avg Supervised Recon Loss: 252.565480816
     Auxillary Loss: 9.86531947743 
     Unsupervised Recon Loss: 147.841198409
     Train set prediction error: 0.224356263876
Epoch 646 : Avg Supervised Recon Loss: 252.408202354
     Auxillary Loss: 9.94786592734 
     Unsupervised Recon Loss: 144.73617084
     Train set prediction error: 0.224464312196
Epoch 647 : Avg Supervised Recon Loss: 252.558947392
     Auxillary Loss: 9.8756026952 
     Unsupervised Recon Loss: 151.548137444
     Train set prediction error: 0.224379882216
Epoch 648 : Avg Supervised Recon Loss: 252.456390234
     Auxillary Loss: 9.90947418213 
     Unsupervised Recon Loss: 145.470757224
     Train set prediction error: 0.224259585142
Epoch 649 : Avg Supervised Recon Loss: 252.639991814
     Auxillary Loss: 9.79958932472 
     Unsu

Epoch 690 : Avg Supervised Recon Loss: 252.728949938
     Auxillary Loss: 7.86347750076 
     Unsupervised Recon Loss: 151.129449797
     Train set prediction error: 0.218021973968
Epoch 691 : Avg Supervised Recon Loss: 252.87196542
     Auxillary Loss: 7.81026538887 
     Unsupervised Recon Loss: 143.791127134
     Train set prediction error: 0.217807665467
Epoch 692 : Avg Supervised Recon Loss: 252.694137342
     Auxillary Loss: 7.71126975628 
     Unsupervised Recon Loss: 150.729551173
     Train set prediction error: 0.217670351267
Epoch 693 : Avg Supervised Recon Loss: 252.627137017
     Auxillary Loss: 7.72649260242 
     Unsupervised Recon Loss: 147.180302581
     Train set prediction error: 0.217387929559
Epoch 694 : Avg Supervised Recon Loss: 252.724216438
     Auxillary Loss: 7.56809553281 
     Unsupervised Recon Loss: 145.869702489
     Train set prediction error: 0.217430174351
Epoch 695 : Avg Supervised Recon Loss: 252.898502088
     Auxillary Loss: 7.83972415539 
     Un

Epoch 736 : Avg Supervised Recon Loss: 252.965151962
     Auxillary Loss: 5.74648322288 
     Unsupervised Recon Loss: 140.865586872
     Train set prediction error: 0.211734846234
Epoch 737 : Avg Supervised Recon Loss: 253.062486899
     Auxillary Loss: 5.64043200714 
     Unsupervised Recon Loss: 138.165189128
     Train set prediction error: 0.211503073573
Epoch 738 : Avg Supervised Recon Loss: 253.030744518
     Auxillary Loss: 5.69809314458 
     Unsupervised Recon Loss: 145.635610344
     Train set prediction error: 0.211517453194
Epoch 739 : Avg Supervised Recon Loss: 253.034391261
     Auxillary Loss: 5.6916229749 
     Unsupervised Recon Loss: 143.936171666
     Train set prediction error: 0.211454689503
Epoch 740 : Avg Supervised Recon Loss: 253.051315377
     Auxillary Loss: 5.68986177926 
     Unsupervised Recon Loss: 143.95680571
     Train set prediction error: 0.211267381907
Epoch 741 : Avg Supervised Recon Loss: 253.100287728
     Auxillary Loss: 5.51786351252 
     Uns

Epoch 782 : Avg Supervised Recon Loss: 252.972023827
     Auxillary Loss: 3.84063725616 
     Unsupervised Recon Loss: 134.991621727
     Train set prediction error: 0.206577375531
Epoch 783 : Avg Supervised Recon Loss: 253.289409599
     Auxillary Loss: 3.91587316532 
     Unsupervised Recon Loss: 134.321882043
     Train set prediction error: 0.206765666604
Epoch 784 : Avg Supervised Recon Loss: 253.298233772
     Auxillary Loss: 3.77247463766 
     Unsupervised Recon Loss: 134.147029656
     Train set prediction error: 0.206438675523
Epoch 785 : Avg Supervised Recon Loss: 253.187977492
     Auxillary Loss: 3.73782319734 
     Unsupervised Recon Loss: 142.243465266
     Train set prediction error: 0.206111013889
Epoch 786 : Avg Supervised Recon Loss: 253.126524862
     Auxillary Loss: 3.72737291509 
     Unsupervised Recon Loss: 140.590741244
     Train set prediction error: 0.206316262484
Epoch 787 : Avg Supervised Recon Loss: 253.35805886
     Auxillary Loss: 3.64261163268 
     Un

Epoch 828 : Avg Supervised Recon Loss: 253.140395594
     Auxillary Loss: 2.38375261384 
     Unsupervised Recon Loss: 131.556285953
     Train set prediction error: 0.202773720026
Epoch 829 : Avg Supervised Recon Loss: 253.194799866
     Auxillary Loss: 2.43831623347 
     Unsupervised Recon Loss: 137.124944829
     Train set prediction error: 0.202761501074
Epoch 830 : Avg Supervised Recon Loss: 253.254647241
     Auxillary Loss: 2.25513194306 
     Unsupervised Recon Loss: 129.234690201
     Train set prediction error: 0.20239251852
Epoch 831 : Avg Supervised Recon Loss: 253.224852005
     Auxillary Loss: 2.38904154999 
     Unsupervised Recon Loss: 136.67707913
     Train set prediction error: 0.202654972672
Epoch 832 : Avg Supervised Recon Loss: 253.268062723
     Auxillary Loss: 2.30238249808 
     Unsupervised Recon Loss: 129.712274851
     Train set prediction error: 0.202480584383
Epoch 833 : Avg Supervised Recon Loss: 253.169777749
     Auxillary Loss: 2.20797860502 
     Uns

Epoch 874 : Avg Supervised Recon Loss: 253.336528493
     Auxillary Loss: 1.35591887657 
     Unsupervised Recon Loss: 132.210496382
     Train set prediction error: 0.199846252799
Epoch 875 : Avg Supervised Recon Loss: 253.289604557
     Auxillary Loss: 1.24168231849 
     Unsupervised Recon Loss: 131.346363541
     Train set prediction error: 0.199649065733
Epoch 876 : Avg Supervised Recon Loss: 253.203464053
     Auxillary Loss: 1.22813422848 
     Unsupervised Recon Loss: 140.663318161
     Train set prediction error: 0.199676409364
Epoch 877 : Avg Supervised Recon Loss: 253.134126297
     Auxillary Loss: 1.18094838653 
     Unsupervised Recon Loss: 140.447946028
     Train set prediction error: 0.19944062829
Epoch 878 : Avg Supervised Recon Loss: 253.053706591
     Auxillary Loss: 1.10583414791 
     Unsupervised Recon Loss: 133.759529807
     Train set prediction error: 0.199316516519
Epoch 879 : Avg Supervised Recon Loss: 252.983431236
     Auxillary Loss: 1.08189380723 
     Un

Epoch 920 : Avg Supervised Recon Loss: 252.944997552
     Auxillary Loss: 0.057238430447 
     Unsupervised Recon Loss: 135.761861943
     Train set prediction error: 0.19716784358
Epoch 921 : Avg Supervised Recon Loss: 253.051947823
     Auxillary Loss: 0.0702552564216 
     Unsupervised Recon Loss: 128.611640489
     Train set prediction error: 0.196985632181
Epoch 922 : Avg Supervised Recon Loss: 252.776786511
     Auxillary Loss: -0.102064418793 
     Unsupervised Recon Loss: 137.301739969
     Train set prediction error: 0.196851074696
Epoch 923 : Avg Supervised Recon Loss: 252.986398038
     Auxillary Loss: -0.131053756945 
     Unsupervised Recon Loss: 133.777669859
     Train set prediction error: 0.196895852685
Epoch 924 : Avg Supervised Recon Loss: 252.796284469
     Auxillary Loss: -0.112061089217 
     Unsupervised Recon Loss: 131.389801845
     Train set prediction error: 0.196935772896
Epoch 925 : Avg Supervised Recon Loss: 253.031841733
     Auxillary Loss: -0.0822142745

Epoch 965 : Avg Supervised Recon Loss: 252.774937516
     Auxillary Loss: -0.785434110237 
     Unsupervised Recon Loss: 123.095648647
     Train set prediction error: 0.195084705949
Epoch 966 : Avg Supervised Recon Loss: 252.719160261
     Auxillary Loss: -0.781666175765 
     Unsupervised Recon Loss: 128.356854021
     Train set prediction error: 0.195090949535
Epoch 967 : Avg Supervised Recon Loss: 252.7311246
     Auxillary Loss: -0.856939715087 
     Unsupervised Recon Loss: 135.874394693
     Train set prediction error: 0.194944784045
Epoch 968 : Avg Supervised Recon Loss: 252.655243999
     Auxillary Loss: -0.834446643097 
     Unsupervised Recon Loss: 131.282487538
     Train set prediction error: 0.194853052497
Epoch 969 : Avg Supervised Recon Loss: 252.562527928
     Auxillary Loss: -0.901760590679 
     Unsupervised Recon Loss: 131.102895248
     Train set prediction error: 0.194858625531
Epoch 970 : Avg Supervised Recon Loss: 252.586828305
     Auxillary Loss: -0.9346763071

Because the model above takes a while to train, you can use the following two chunks to save or load a model. We assume that you'll skip this save step and load the model from `trained_models`.

In [64]:
# save model
savemodel = True
if savemodel:
    pyro.get_param_store().save('word_ssvae_pretrained.save')

In [69]:
loadmodel = False
if loadmodel:
    pyro.get_param_store().load('word_ssvae_pretrained.save')
    pyro.module("ssvae", ssvae, update_module_params=True)

In [68]:
pyro.get_param_store().get_all_param_names()

['ssvae$$$encoder_y.sequential_mlp.3.bias',
 'ssvae$$$decoder.sequential_mlp.3.weight',
 'ssvae$$$encoder_y.sequential_mlp.1.bias',
 'ssvae$$$decoder.sequential_mlp.1.weight',
 'ssvae$$$decoder.sequential_mlp.3.bias',
 'ssvae$$$decoder.sequential_mlp.5.1.0.bias',
 'ssvae$$$encoder_z.sequential_mlp.5.1.0.bias',
 'ssvae$$$encoder_y.sequential_mlp.1.weight',
 'ssvae$$$encoder_z.sequential_mlp.5.0.0.weight',
 'ssvae$$$encoder_y.sequential_mlp.5.0.0.bias',
 'ssvae$$$encoder_y.sequential_mlp.3.weight',
 'ssvae$$$encoder_z.sequential_mlp.5.0.0.bias',
 'ssvae$$$encoder_z.sequential_mlp.3.weight',
 'ssvae$$$encoder_z.sequential_mlp.1.weight',
 'ssvae$$$decoder.sequential_mlp.5.1.0.weight',
 'ssvae$$$encoder_y.sequential_mlp.5.0.0.weight',
 'ssvae$$$encoder_z.sequential_mlp.5.1.0.weight',
 'ssvae$$$encoder_y.sequential_mlp.5.1.0.weight',
 'ssvae$$$encoder_y.sequential_mlp.5.1.0.bias',
 'ssvae$$$decoder.sequential_mlp.5.0.0.weight',
 'ssvae$$$decoder.sequential_mlp.1.bias',
 'ssvae$$$encoder_z.se

Now we reconstruct the utterances and evaluate their predictions and cosine similarity.

In [102]:
def cosine_sim(a, b):
    a_norm = a / a.norm()
    b_norm = b / b.norm()    
    return torch.dot(a_norm, b_norm)

eval_training = True

## Use training set as samples
if eval_training:
    train_words = ['awesome', 'cool', 'damn', 'dang', 'man', 'meh', 'oh', 'wow', 'yay', 'yikes']
    samples = [(w, torch.from_numpy(embeddings[w]), 0) for w in train_words]
else:
    ## Select samples randomly
    NUM_SAMPLES = 10
    indices = np.random.randint(0, len(more_words_dataset), NUM_SAMPLES)
    samples = [more_words_dataset[i] for i in indices]

print("Reconstruction similarity and ratings")
for word, embed, _ in samples:
    # Reconstruct the embedding
    recon_embed = ssvae.reconstruct_word(embed).view(-1)
    # Find cosine similarity
    sim = cosine_sim(embed, recon_embed).detach().numpy()
    # Predict ratings
    ratings = ssvae.rate(embed).detach().numpy()
    print("{:8} : {:10}".format(word, sim))
    print(ratings)

Reconstruction similarity and ratings
awesome  : 0.100971713662
[-0.05897579 -0.08185055  0.00404333  0.06529614  0.07893325  0.03559387
  0.08187602  0.04322229]
cool     : 0.226481407881
[-0.05899695 -0.0818296   0.00401297  0.06534441  0.07892848  0.03557191
  0.08190591  0.04322892]
damn     : 0.126131772995
[-0.0589525  -0.08185185  0.00402749  0.06529296  0.07894272  0.03559731
  0.08186757  0.04320458]
dang     : -0.0274714380503
[-0.05896634 -0.08183733  0.00409978  0.06528778  0.0789484   0.03562738
  0.0819075   0.04320452]
man      : 0.0732271373272
[-0.05895166 -0.08183766  0.00404894  0.06530266  0.07893724  0.03560038
  0.08188415  0.04321766]
meh      : -0.0291567258537
[-0.05905218 -0.08195414  0.00405192  0.06532442  0.07897616  0.0355177
  0.08186737  0.04320359]
oh       : 0.201870530844
[-0.05894198 -0.08189965  0.00402689  0.06532466  0.07894249  0.03556963
  0.08188675  0.04316571]
wow      : 0.0394622795284
[-0.05897043 -0.08187162  0.00403401  0.0653179   0.0789

-----

Written by: Desmond Ong (desmond.c.ong@gmail.com) and Harold Soh (hsoh@comp.nus.edu.sg)

References:

Pyro [VAE tutorial](http://pyro.ai/examples/vae.html), [SSVAE tutorial](http://pyro.ai/examples/ss-vae.html)

Hoffman, M. D., Blei, D. M., Wang, C., & Paisley, J. (2013). Stochastic
variational inference. *The Journal of Machine Learning Research*, 14(1),
1303-1347.

Kingma, D. P., Mohamed, S., Rezende, D. J., & Welling, M. (2014). Semi-supervised learning with deep generative models. In *Advances in Neural Information Processing Systems*, pp. 3581-3589. https://arxiv.org/abs/1406.5298

Kingma, D. P., & Welling, M. (2014). Auto-encoding variational bayes. Auto-Encoding Variational Bayes. In *The International Conference on Learning Representations*. https://arxiv.org/abs/1312.6114


Narayanaswamy, S., Paige, T. B., van de Meent, J. W., Desmaison, A., Goodman, N. D., Kohli, P., Wood, F. & Torr, P. (2017). Learning Disentangled Representations with Semi-Supervised Deep Generative Models. In *Advances in Neural Information Processing Systems*, pp. 5927-5937. https://arxiv.org/abs/1706.00400

Data from https://github.com/desmond-ong/affCog, from the following paper:

Ong, D. C., Zaki, J., & Goodman, N. D. (2015). Affective Cognition: Exploring lay theories of emotion. *Cognition*, 143, 141-162.