remark YO
leonardblier committed Jun 28, 2018
1 parent 829975b commit 376c6f7
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion docs/README.md
@@ -15,7 +15,8 @@ experiments on the importance of the recurrent network in the training process.
*World Models* introduces a *model-based* approach to reinforcement learning. It revolves around a three-part model, comprising:

1. A Variational Auto-Encoder (VAE, Kingma et al., 2014)[2], a generative model that learns both an encoder and a decoder. The encoder's task is to compress the input images into a compact latent representation. The decoder's task is to recover the original image from the latent representation.
-2. A Mixture-Density Recurrent Network (MDN-RNN, Graves, 2013)[3], trained to predict the latent encoding of the next frame given past latent encodings and actions. The mixture-density network outputs a gaussian mixture observational density at each time step, allowing for multi-modal model predictions.
+2. A Mixture-Density Recurrent Network (MDN-RNN, Graves, 2013)[3], trained to predict the latent encoding of the next frame given past latent encodings and actions. The mixture-density network outputs a Gaussian mixture for predicting
+the distribution density of the next observation.
3. A simple linear Controller (C). It takes as input both the latent encoding of the current frame and the hidden state of the MDN-RNN given past latents and actions, and outputs an action. It is trained to maximize the cumulative reward using the Covariance-Matrix Adaptation Evolution Strategy ([CMA-ES](http://www.cmap.polytechnique.fr/~nikolaus.hansen/cmaartic.pdf), Hansen, 2006)[4], a generic black-box optimization algorithm.
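The last two components above can be sketched concretely. Below is a minimal NumPy illustration of an MDN output head (here simplified to an isotropic Gaussian mixture, rather than the per-dimension mixtures typically used) and a linear controller acting on the concatenated latent and hidden state. All dimensions and weights are hypothetical placeholders, not the trained model from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, not the paper's configuration.
LATENT = 4    # dimension of the VAE latent z
HIDDEN = 8    # dimension of the MDN-RNN hidden state h
ACTIONS = 2   # action dimension
K = 3         # number of Gaussian mixture components

def mdn_head(h, W):
    """Map the RNN hidden state to mixture parameters (pi, mu, sigma).

    Returns mixture weights pi (K,), component means mu (K, LATENT) and
    per-component scalar stds sigma (K,) -- a simplified isotropic mixture.
    """
    out = (W @ h).reshape(K, 2 + LATENT)
    logit_pi, log_sigma, mu = out[:, 0], out[:, 1], out[:, 2:]
    pi = np.exp(logit_pi - logit_pi.max())
    pi /= pi.sum()                       # softmax -> valid mixture weights
    sigma = np.exp(log_sigma)            # exponentiate -> positive stds
    return pi, mu, sigma

def mixture_density(z_next, pi, mu, sigma):
    """Density of the next latent z_next under the Gaussian mixture."""
    d = z_next.shape[0]
    diff2 = ((z_next - mu) ** 2).sum(axis=1)          # (K,) squared distances
    norm = (2 * np.pi * sigma ** 2) ** (d / 2)        # isotropic normalizers
    comp = np.exp(-diff2 / (2 * sigma ** 2)) / norm   # (K,) component densities
    return float((pi * comp).sum())

def controller(z, h, Wc, bc):
    """Linear controller: action = tanh(Wc [z; h] + bc)."""
    return np.tanh(Wc @ np.concatenate([z, h]) + bc)

# Random parameters stand in for trained weights (CMA-ES would optimize Wc, bc).
W_mdn = 0.1 * rng.normal(size=(K * (2 + LATENT), HIDDEN))
Wc = 0.1 * rng.normal(size=(ACTIONS, LATENT + HIDDEN))
bc = np.zeros(ACTIONS)

z, h = rng.normal(size=LATENT), rng.normal(size=HIDDEN)
pi, mu, sigma = mdn_head(h, W_mdn)
p = mixture_density(rng.normal(size=LATENT), pi, mu, sigma)
a = controller(z, h, Wc, bc)
```

Because the controller is a small linear map, its full parameter vector (`Wc` flattened plus `bc`) is low-dimensional, which is what makes a gradient-free optimizer like CMA-ES practical for training it.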

Below is a figure from the original paper explaining the architecture.
