# Conditioning and Supervision

While variational auto-encoders can be trained in an entirely unsupervised way, the resulting latent space is not constrained to any explicitly defined structure. Hence, without further constraints, interaction with the model is limited to direct exploration of its latent space. Metadata can therefore be used to impose a target structure on the model, either to shape the explored space or to condition the generation in order to control the desired output.


<img src="img/conds.png" style="width: 500px;"/>
<center> <b>(a)</b> unsupervised case, <b>(b)</b> conditioned case, <br/>
<b>(c)</b> deterministic semi-supervised case, <b>(d)</b> variational semi-supervised case </center>

## Classification

If the metadata for a given task is available for the entire dataset, a classification loss can be added to the model's loss, using a discriminator that takes the latent space as input. Provided that the discriminator has a small capacity, this forces the latent space to be easily separable by the classifier, so that data sharing the same label are clustered together.
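
As a minimal, framework-free sketch of this idea (not the vschaos `Classification` criterion, whose internals are not shown here), the combined objective can be written as the negative ELBO plus a cross-entropy term computed by a low-capacity classifier on the latent code. The names `linear_classifier`, `total_loss`, and `class_weight` are hypothetical and chosen for illustration only, and the negative ELBO value is a made-up number.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Negative log-probability of the true label under softmax(logits)."""
    return -math.log(softmax(logits)[label])

def linear_classifier(z, weights, biases):
    """One logit per class, w_k . z + b_k: a deliberately low-capacity discriminator."""
    return [sum(w_i * z_i for w_i, z_i in zip(w, z)) + b
            for w, b in zip(weights, biases)]

def total_loss(neg_elbo, z, label, weights, biases, class_weight=1.0):
    """Negative ELBO plus a weighted classification loss on the latent code z."""
    logits = linear_classifier(z, weights, biases)
    return neg_elbo + class_weight * cross_entropy(logits, label)

# Toy example: a 2-d latent code and two classes separated along the first axis.
z = [2.0, 0.0]
weights = [[1.0, 0.0], [-1.0, 0.0]]
biases = [0.0, 0.0]
loss_correct = total_loss(10.0, z, 0, weights, biases)  # z lies on the class-0 side
loss_wrong = total_loss(10.0, z, 1, weights, biases)    # same z, labelled as class 1
print(loss_correct < loss_wrong)
```

Because the classifier is linear, the only way to lower the classification term is to move latent codes of the same class to the same side of its decision boundaries, which is the clustering effect described above.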


In [None]:
import torch, vschaos
import vschaos.distributions as dist
import vschaos.vaes as vaes
from vschaos.data import dataset_from_torchvision, Flatten

from torchvision.transforms import Lambda, ToTensor
from vschaos.vaes import VanillaVAE

from vschaos.criterions import ELBO, Classification
from vschaos.monitor import PCA
from vschaos.train import SimpleTrainer, train_model

# import MNIST
transforms = [Lambda(lambda x: x / 255. + 0.01*torch.randn_like(x.float())), Flatten(-2)]
dataset = dataset_from_torchvision('MNIST', transforms=transforms)

# make VAE
input_params = {'dim':784, 'dist': dist.Normal}
class_params = {'dim':10, 'dist':dist.Categorical}
hidden_params = {'dim':800, 'nlayers':2, 'batch_norm':'batch'}
latent_params = {'dim':8, "dist":dist.Normal}
vae = vaes.VanillaVAE(input_params, latent_params, hidden_params=hidden_params)

# initialize optimizer
optim_params = {'optimizer':'Adam', 'optimArgs':{'lr':1e-5}, 'scheduler':'ReduceLROnPlateau'}
vae.init_optimizer(optim_params)

# defining the losses
elbo = ELBO(beta=1.0, warmup=20)
classifier = Classification(latent_params, "class", class_params,
                            layer=0, hidden_params={'dim':50, 'nlayers':2}, optim_params = {'lr':1e-5})
loss = elbo + classifier

# The Trainer object handles training, monitoring, and automatic saving during the training process.
plots = {}
plots['reconstructions'] = {'preprocess': False, "n_points":15, 'plot_multihead':True, 'label':['class']}
plots['latent_space'] = {'preprocess':False, 'transformation':PCA, 'tasks':'class', 'balanced':True, 'n_points':3000, 'label':['class'], 'batch_size':512}
plots['confusion'] = {'classifiers':classifier}
trainer = SimpleTrainer(vae, dataset, loss, tasks=["class"], plots=plots, use_tensorboard="runs/")

train_options = {'epochs':100, 'save_epochs':20, 'results_folder':'tutorial_3',  'batch_size':64}
train_model(trainer, train_options, save_with={'transforms':dataset.classes})

## Conditioning

While adding classification losses to the ELBO may help organize the latent space, it does not provide a way to enforce the target class of the generation. An efficient way to constrain the generative process $p(\mathbf{x}|\mathbf{z})$ is to _condition_ this distribution on the incoming label information, so that we instead model the distributions $p(\mathbf{x}|\mathbf{z, y})$ and/or $q(\mathbf{z}|\mathbf{x, y})$. Computationally, this can be done simply by concatenating the label information to the respective inputs of the decoder and/or the encoder. Conditioning only the decoder keeps a single inference process shared by all classes, while conditioning the encoder provides separate latent spaces for every class.
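
The concatenation step can be sketched without any framework, assuming the label is encoded as a one-hot vector (an assumption for illustration; the text does not say how vschaos encodes it). The helpers `one_hot` and `conditioned_input` are hypothetical names, not part of vschaos.

```python
def one_hot(label, n_classes):
    """Encode an integer label as a one-hot list of floats."""
    if not 0 <= label < n_classes:
        raise ValueError(f"label {label} outside [0, {n_classes})")
    return [1.0 if k == label else 0.0 for k in range(n_classes)]

def conditioned_input(z, label, n_classes):
    """Concatenate a latent code with its one-hot label: the input of p(x | z, y)."""
    return list(z) + one_hot(label, n_classes)

# Same sizes as in this tutorial: an 8-d latent code and 10 MNIST classes.
z = [0.1 * i for i in range(8)]
dec_in_3 = conditioned_input(z, 3, 10)
dec_in_7 = conditioned_input(z, 7, 10)
print(len(dec_in_3))  # 8 latent dimensions + 10 label dimensions
```

Holding `z` fixed and changing the label only changes the last 10 entries of the decoder input, which is what allows a decoder-conditioned model to generate a chosen class from any latent position.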


In [None]:
import torch, vschaos
import vschaos.distributions as dist
import vschaos.vaes as vaes
from vschaos.data.data_generic import dataset_from_torchvision
from vschaos.data import Flatten  # used by the transforms below

from torchvision.transforms import Lambda, ToTensor
from vschaos.vaes import VanillaVAE

from vschaos.criterions import ELBO, Classification
from vschaos.monitor import PCA
from vschaos.train import SimpleTrainer, train_model

# import MNIST
transforms = [Lambda(lambda x: x / 255. + 0.01*torch.randn_like(x.float())), Flatten(-2)]
dataset = dataset_from_torchvision('MNIST', transforms=transforms)

# make VAE
input_params = {'dim':784, 'dist': dist.Normal}
class_params = {'dim':10, 'dist':dist.Categorical}
hidden_params = {'encoder':{'dim':800, 'nlayers':2, 'batch_norm':'batch'}, 
                 'decoder':{'dim':800, 'nlayers':2, 'batch_norm':'batch',
                            'label_params':{'class':class_params}}} # we condition by adding the label_params keyword
latent_params = {'dim':8, "dist":dist.Normal}
vae = vaes.VanillaVAE(input_params, latent_params, hidden_params=hidden_params)
print(vae.decoders)

optim_params = {'optimizer':'Adam', 'optimArgs':{'lr':1e-5}, 'scheduler':'ReduceLROnPlateau'}
vae.init_optimizer(optim_params)
elbo = ELBO(beta=1.0, warmup=20)

# The Trainer object handles training, monitoring, and automatic saving during the training process.
plots = {}
plots['reconstructions'] = {'preprocess': False, "n_points":15, 'plot_multihead':True, 'label':['class']}
plots['latent_space'] = {'preprocess':False, 'transformation':PCA, 'tasks':'class', 'balanced':True, 'n_points':3000, 'label':['class'], 'batch_size':512}
trainer = SimpleTrainer(vae, dataset, elbo, tasks=["class"], plots=plots, use_tensorboard="runs/")

train_options = {'epochs':100, 'save_epochs':20, 'results_folder':'tutorial_3',  'batch_size':64}
train_model(trainer, train_options, save_with={'transforms':dataset.classes})

## Semi-supervision

While conditioning the encoding / decoding modules of the variational auto-encoder is a very efficient way to constrain the generation to a given class, it requires labels to be available for the entire dataset.
In their article [_Semi-supervised Learning with Deep Generative Models_](http://papers.nips.cc/paper/5352-semi-supervised-learning-with-deep-generative-models.pdf), Kingma et al. proposed an efficient way to perform _semi-supervised_ training with VAEs, using the label if it is available and inferring it if it is missing. This way, label information can be thought of as _discrete latent variables_, modeled using multinomial distributions, that are inferred when the corresponding label is missing. In that case, the ELBO is computed for every possible label and averaged with weights given by the distribution $q(\mathbf{y|x})$, which amounts to a closed-form expectation. Otherwise, the true label is used as an input for the decoder.
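
The two cases can be sketched as follows, assuming the per-label ELBO values have already been computed by the model (the numbers below are made up). The entropy term $H(q(\mathbf{y|x}))$ follows the unlabeled bound in Kingma et al.; whether `SemiSupervisedELBO` in vschaos includes it is not stated here, and the extra classification term that the paper adds for labeled data is left out for brevity. The function names are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy of a discrete distribution, skipping zero-probability entries."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def unlabeled_bound(elbo_per_label, q_y):
    """Closed-form expectation E_{q(y|x)}[ELBO(x, y)] plus the entropy of q(y|x)."""
    if len(elbo_per_label) != len(q_y):
        raise ValueError("need one ELBO value per possible label")
    expected = sum(q * e for q, e in zip(q_y, elbo_per_label))
    return expected + entropy(q_y)

def semi_supervised_bound(elbo_per_label, q_y, label=None):
    """Use the true label when it is available, otherwise marginalize over q(y|x)."""
    if label is not None:
        return elbo_per_label[label]
    return unlabeled_bound(elbo_per_label, q_y)

# Toy example with two classes and made-up ELBO values.
elbos = [-100.0, -120.0]
q_y = [0.5, 0.5]
print(semi_supervised_bound(elbos, q_y, label=1))  # labeled case: ELBO of the true label
print(semi_supervised_bound(elbos, q_y))           # unlabeled case: weighted average + entropy
```

Since the number of classes is small, the sum over labels is exact and no sampling of the discrete variable is required; the cost is one decoder pass per possible label for each unlabeled example.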



In [None]:
import torch, vschaos
import vschaos.distributions as dist
import vschaos.vaes as vaes
from vschaos.data.data_generic import dataset_from_torchvision
from vschaos.data import Flatten  # used by the transforms below

from torchvision.transforms import Lambda, ToTensor
from vschaos.vaes import VanillaVAE

from vschaos.criterions import SemiSupervisedELBO, Classification
from vschaos.monitor import PCA
from vschaos.train import SimpleTrainer, train_model

# import MNIST
transforms = [Lambda(lambda x: x / 255. + 0.01*torch.randn_like(x.float())), Flatten(-2)]
dataset = dataset_from_torchvision('MNIST', transforms=transforms)

# make VAE
input_params = {'dim':784, 'dist': dist.Normal}
class_params = {'dim':10, 'dist':dist.Categorical}
# linked encoders and decoders should be used here to enforce the information sharing between categorical and latent
hidden_params = {'dim':800, 'nlayers':2, 'batch_norm':'batch', 'linked':True}

# we perform semi-supervision by adding a latent subdivision at first layer (don't forget the nested list, otherwise
# the model will consider it as a second stochastic layer) and specifying the corresponding task
latent_params = [[{'dim':8, "dist":dist.Normal}, 
                  {'dim':class_params['dim'], 'dist':dist.Multinomial, 'task':'class'}]]

vae = vaes.VanillaVAE(input_params, latent_params, hidden_params=hidden_params)

optim_params = {'optimizer':'Adam', 'optimArgs':{'lr':1e-5}, 'scheduler':'ReduceLROnPlateau'}
vae.init_optimizer(optim_params)
elbo = SemiSupervisedELBO(beta=1.0, warmup=20)

# The Trainer object handles training, monitoring, and automatic saving during the training process.
plots = {}
plots['reconstructions'] = {'preprocess': False, "n_points":15, 'plot_multihead':True, 'label':['class']}
plots['latent_space'] = {'preprocess':False, 'transformation':PCA, 'tasks':'class', 'balanced':True, 'n_points':3000, 'label':['class'], 'batch_size':512}

# the semi_supervision keyword has to be added, such that the trainer will alternate between the supervised
#    and semi supervised case. 
trainer = SimpleTrainer(vae, dataset, elbo, tasks=["class"], plots=plots, use_tensorboard="runs/",
                       semi_supervision=["class"], semi_supervision_dropout = 0.2)

train_options = {'epochs':100, 'save_epochs':20, 'results_folder':'tutorial_3',  'batch_size':64}
train_model(trainer, train_options, save_with={'transforms':dataset.classes})