Fix minor typos in VAE notes
andrewk1 committed Nov 16, 2018
1 parent 9cdce6d commit 181f6a6
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions vae/index.md
@@ -251,7 +251,7 @@ where $$\Sigma^{1/2}$$ is the Cholesky decomposition of $$\Sigma$$. For simplici
Amortized Variational Inference
==============

-A noticable limitation of black-box variational inference is that **Step 1** executes an optimization subroutine that is computationally expensive. Recall that the goal of the **Step 1** is to find
+A noticeable limitation of black-box variational inference is that **Step 1** executes an optimization subroutine that is computationally expensive. Recall that the goal of the **Step 1** is to find
{% math %}
\begin{align}
\lambda^* = \argmax_{\lambda\in \Lambda} \ELBO(\bx; \theta, \lambda).
@@ -277,7 +277,7 @@ and rewrite the optimization problem as
\max_{\phi } \sum_{\bx \in \D} \ELBO(\bx; \theta, \phi).
\end{align}
{% endmath %}
-It is also worth noting that optimizing $$\phi$$ over the entire dataset as a *subroutine* everytime we sample a new mini-batch is clearly not reasonable. However, if we believe that $$f_\phi$$ is capable of quickly adapting to a close-enough approximation of $$\lambda^\ast$$ given the current choice of $$\theta$$, then we can interleave the optimization $$\phi$$ and $$\theta$$. The yields the following procedure, where for each mini-batch $$\M = \set{\bx^{(1)}, \ldots, \bx^{(m)}}$$, we perform the following two updates jointly
+It is also worth noting that optimizing $$\phi$$ over the entire dataset as a *subroutine* every time we sample a new mini-batch is clearly not reasonable. However, if we believe that $$f_\phi$$ is capable of quickly adapting to a close-enough approximation of $$\lambda^\ast$$ given the current choice of $$\theta$$, then we can interleave the optimization $$\phi$$ and $$\theta$$. The yields the following procedure, where for each mini-batch $$\M = \set{\bx^{(1)}, \ldots, \bx^{(m)}}$$, we perform the following two updates jointly
{% math %}
\begin{align}
\phi &\gets \phi + \tilde{\nabla}_\phi \sum_{\bx \in \M} \ELBO(\bx; \theta, \phi) \\
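For concreteness, the interleaved update in the hunk above can be sketched in code. The following is a minimal, illustrative PyTorch example that is not part of the notes or the commit: the encoder plays the role of $$f_\phi$$, producing the variational parameters $$\lambda = (\mu, \log\sigma^2)$$ of $$q(z \mid x)$$, the decoder parameterizes $$p_\theta(x \mid z)$$, and each mini-batch takes one joint gradient step on a Monte Carlo estimate of the ELBO in $$\phi$$ and $$\theta$$. The network sizes, the Bernoulli decoder, and the standard-normal prior are all assumptions made for the sketch.

```python
# Sketch of amortized variational inference with interleaved phi/theta updates.
# All names and architectural choices here are illustrative assumptions.
import torch
import torch.nn as nn

x_dim, z_dim = 784, 16

# f_phi: maps x to the parameters (mu, log sigma^2) of q(z | x)
encoder = nn.Sequential(nn.Linear(x_dim, 128), nn.ReLU(), nn.Linear(128, 2 * z_dim))
# p_theta(x | z): maps z to Bernoulli logits over x
decoder = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(), nn.Linear(128, x_dim))

# One optimizer over both parameter sets realizes the joint update per mini-batch.
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

def elbo(x):
    mu, log_var = encoder(x).chunk(2, dim=-1)
    z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)      # reparameterized sample from q(z | x)
    log_px_z = -nn.functional.binary_cross_entropy_with_logits(
        decoder(z), x, reduction="none").sum(-1)                   # log p_theta(x | z)
    kl = 0.5 * (mu.pow(2) + log_var.exp() - log_var - 1).sum(-1)   # KL(q(z | x) || N(0, I))
    return (log_px_z - kl).mean()

for step in range(1000):
    x_batch = torch.rand(64, x_dim).round()   # stand-in mini-batch M; use real data in practice
    loss = -elbo(x_batch)                     # ascend the ELBO in phi and theta jointly
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Gradients for both parameter sets flow through the same reparameterized sample, so the single `opt.step()` corresponds to performing the two gradient-ascent updates on $$\phi$$ and $$\theta$$ jointly for each mini-batch, as described above.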
