remark YO
leonardblier committed Jun 28, 2018
1 parent 829975b commit 376c6f7
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion docs/README.md
@@ -15,7 +15,8 @@ experiments on the importance of the recurrent network in the training process.
*World Models* introduces a *model-based* approach to reinforcement learning. It revolves around a three-part model, comprising:

1. A Variational Auto-Encoder (VAE, Kingma et al., 2014)[2], a generative model that learns both an encoder and a decoder. The encoder's task is to compress the input images into a compact latent representation. The decoder's task is to recover the original image from the latent representation.
-2. A Mixture-Density Recurrent Network (MDN-RNN, Graves, 2013)[3], trained to predict the latent encoding of the next frame given past latent encodings and actions. The mixture-density network outputs a gaussian mixture observational density at each time step, allowing for multi-modal model predictions.
+2. A Mixture-Density Recurrent Network (MDN-RNN, Graves, 2013)[3], trained to predict the latent encoding of the next frame given past latent encodings and actions. The mixture-density network outputs a Gaussian mixture for predicting
+the distribution density of the next observation.
3. A simple linear Controller (C). It takes as input both the latent encoding of the current frame and the hidden state of the MDN-RNN given past latents and actions, and outputs an action. It is trained to maximize the cumulative reward using the Covariance-Matrix Adaptation Evolution Strategy ([CMA-ES](http://www.cmap.polytechnique.fr/~nikolaus.hansen/cmaartic.pdf), Hansen, 2006)[4], a generic black-box optimization algorithm.
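The last two components above can be sketched concretely. Below is a minimal NumPy illustration of an MDN output head (here simplified to an isotropic Gaussian mixture, rather than the per-dimension mixtures typically used) and a linear controller acting on the concatenated latent and hidden state. All dimensions and weights are hypothetical placeholders, not the trained model from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, not the paper's configuration.
LATENT = 4    # dimension of the VAE latent z
HIDDEN = 8    # dimension of the MDN-RNN hidden state h
ACTIONS = 2   # action dimension
K = 3         # number of Gaussian mixture components

def mdn_head(h, W):
    """Map the RNN hidden state to mixture parameters (pi, mu, sigma).

    Returns mixture weights pi (K,), component means mu (K, LATENT) and
    per-component scalar stds sigma (K,) -- a simplified isotropic mixture.
    """
    out = (W @ h).reshape(K, 2 + LATENT)
    logit_pi, log_sigma, mu = out[:, 0], out[:, 1], out[:, 2:]
    pi = np.exp(logit_pi - logit_pi.max())
    pi /= pi.sum()                       # softmax -> valid mixture weights
    sigma = np.exp(log_sigma)            # exponentiate -> positive stds
    return pi, mu, sigma

def mixture_density(z_next, pi, mu, sigma):
    """Density of the next latent z_next under the Gaussian mixture."""
    d = z_next.shape[0]
    diff2 = ((z_next - mu) ** 2).sum(axis=1)          # (K,) squared distances
    norm = (2 * np.pi * sigma ** 2) ** (d / 2)        # isotropic normalizers
    comp = np.exp(-diff2 / (2 * sigma ** 2)) / norm   # (K,) component densities
    return float((pi * comp).sum())

def controller(z, h, Wc, bc):
    """Linear controller: action = tanh(Wc [z; h] + bc)."""
    return np.tanh(Wc @ np.concatenate([z, h]) + bc)

# Random parameters stand in for trained weights (CMA-ES would optimize Wc, bc).
W_mdn = 0.1 * rng.normal(size=(K * (2 + LATENT), HIDDEN))
Wc = 0.1 * rng.normal(size=(ACTIONS, LATENT + HIDDEN))
bc = np.zeros(ACTIONS)

z, h = rng.normal(size=LATENT), rng.normal(size=HIDDEN)
pi, mu, sigma = mdn_head(h, W_mdn)
p = mixture_density(rng.normal(size=LATENT), pi, mu, sigma)
a = controller(z, h, Wc, bc)
```

Because the controller is a small linear map, its full parameter vector (`Wc` flattened plus `bc`) is low-dimensional, which is what makes a gradient-free optimizer like CMA-ES practical for training it.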

Below is a figure from the original paper explaining the architecture.
