Commit

vae fix
aditya-grover committed Nov 8, 2018
1 parent 58c6ad7 commit ff34897
Showing 2 changed files with 7 additions and 8 deletions.
8 changes: 4 additions & 4 deletions docs/vae/index.html
@@ -157,7 +157,7 @@ <h1 id="learning-directed-latent-variable-models">Learning Directed Latent Varia
\end{align}
</script></div>

- <p>As we have seen previously, optimizing an empirical estimate of the KL divergence is equivalent to maximizing the marginal log-likelihood <script type="math/tex">\log p(\bz)</script> over <script type="math/tex">\D</script></p>
+ <p>As we have seen previously, optimizing an empirical estimate of the KL divergence is equivalent to maximizing the marginal log-likelihood <script type="math/tex">\log p(\bx)</script> over <script type="math/tex">\D</script></p>
<div class="mathblock"><script type="math/tex; mode=display">
\begin{align}
\max_{p \in \P_{\bx, \bz}} \sum_{\bx \in \D} \log p(\bx) = \sum_{\bx \in \D} \log\int p(\bx, \bz) \d \bz.
@@ -176,7 +176,7 @@ <h1 id="learning-directed-latent-variable-models">Learning Directed Latent Varia

<p>Next, we introduce a variational family <script type="math/tex">\Q</script> of distributions that approximate the true, but intractable posterior <script type="math/tex">p(\bz \mid \bx)</script>. Henceforth, we will assume a parametric setting where any distribution in the model family <script type="math/tex">\P_{\bx, \bz}</script> is specified via a set of parameters <script type="math/tex">\theta \in \Theta</script> and distributions in the variational family <script type="math/tex">\Q</script> are specified via a set of parameters <script type="math/tex">\lambda \in \Lambda</script>.</p>

- <p>Given <script type="math/tex">\P_{\bx, \bz}</script> and <script type="math/tex">\Q</script>, we note that the following relationships hold true<sup id="fnref:2"><a href="#fn:2" class="footnote">1</a></sup> for any <script type="math/tex">\bx</script> and all variational distributions <script type="math/tex">q_\lambda(\bz) \in \Q</script></p>
+ <p>Given <script type="math/tex">\P_{\bx, \bz}</script> and <script type="math/tex">\Q</script>, we note that the following relationships hold true<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup> for any <script type="math/tex">\bx</script> and all variational distributions <script type="math/tex">q_\lambda(\bz) \in \Q</script></p>

<div class="mathblock"><script type="math/tex; mode=display">
\begin{align}
@@ -348,8 +348,8 @@ <h1 id="amortized-variational-inference">Amortized Variational Inference</h1>
<h1 id="footnotes">Footnotes</h1>
<div class="footnotes">
<ol>
- <li id="fn:2">
- <p>The first equality only holds if the support of <script type="math/tex">q</script> includes that of <script type="math/tex">p</script>. If not, it is an inequality. <a href="#fnref:2" class="reversefootnote">&#8617;</a></p>
+ <li id="fn:1">
+ <p>The first equality only holds if the support of <script type="math/tex">q</script> includes that of <script type="math/tex">p</script>. If not, it is an inequality. <a href="#fnref:1" class="reversefootnote">&#8617;</a></p>
</li>
</ol>
</div>
7 changes: 3 additions & 4 deletions vae/index.md
@@ -84,7 +84,7 @@ One way to measure how closely $$p(\bx, \bz)$$ fits the observed dataset $$\D$$
\end{align}
{% endmath %}

- As we have seen previously, optimizing an empirical estimate of the KL divergence is equivalent to maximizing the marginal log-likelihood $$\log p(\bz)$$ over $$\D$$
+ As we have seen previously, optimizing an empirical estimate of the KL divergence is equivalent to maximizing the marginal log-likelihood $$\log p(\bx)$$ over $$\D$$
{% math %}
\begin{align}
\max_{p \in \P_{\bx, \bz}} \sum_{\bx \in \D} \log p(\bx) = \sum_{\bx \in \D} \log\int p(\bx, \bz) \d \bz.
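To see why this objective is hard to work with, a naive Monte Carlo sketch of the integral above helps. The model here is hypothetical, chosen only so the exact answer is known: with $$z \sim \mathcal{N}(0, 1)$$ and $$x \mid z \sim \mathcal{N}(z, 1)$$, the marginal is $$x \sim \mathcal{N}(0, 2)$$, and the estimator simply averages $$p(x \mid z_i)$$ over prior samples $$z_i \sim p(z)$$.

```python
import math
import random

def normal_pdf(v, mean, var):
    # Density of N(mean, var) at v
    return math.exp(-(v - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def log_marginal_mc(x, n=200_000, seed=0):
    # log p(x) = log E_{z ~ p(z)}[p(x | z)], approximated by averaging
    # the likelihood over n samples from the prior z ~ N(0, 1)
    rng = random.Random(seed)
    total = sum(normal_pdf(x, rng.gauss(0.0, 1.0), 1.0) for _ in range(n))
    return math.log(total / n)

x = 1.5
exact = math.log(normal_pdf(x, 0.0, 2.0))  # analytic marginal N(0, 2)
approx = log_marginal_mc(x)
assert abs(approx - exact) < 0.05          # close on this 1-D toy model
```

In one dimension the estimator is accurate, but its variance deteriorates rapidly as $$\bz$$ becomes high-dimensional, since almost all prior samples then contribute negligible likelihood.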
@@ -106,7 +106,7 @@ Rather than maximizing the log-likelihood directly, an alternate is to instead c
Next, we introduce a variational family $$\Q$$ of distributions that approximate the true, but intractable posterior $$p(\bz \mid \bx)$$. Henceforth, we will assume a parametric setting where any distribution in the model family $$\P_{\bx, \bz}$$ is specified via a set of parameters $$\theta \in \Theta$$ and distributions in the variational family $$\Q$$ are specified via a set of parameters $$\lambda \in \Lambda$$.
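As one concrete (hypothetical) instance of such a family, $$\Q$$ could be the diagonal Gaussians over $$\bz$$, with variational parameters $$\lambda = (\mu, \log\sigma)$$; a minimal sketch:

```python
import math
import random

class DiagGaussian:
    """Diagonal Gaussian q_lambda(z), with lambda = (mu, log_sigma)."""

    def __init__(self, mu, log_sigma):
        self.mu, self.log_sigma = mu, log_sigma

    def sample(self, rng):
        # z = mu + sigma * eps, with eps ~ N(0, I)
        return [m + math.exp(s) * rng.gauss(0.0, 1.0)
                for m, s in zip(self.mu, self.log_sigma)]

    def log_prob(self, z):
        # Sum of per-dimension Gaussian log-densities
        return sum(-0.5 * math.log(2 * math.pi) - s
                   - 0.5 * ((v - m) / math.exp(s)) ** 2
                   for v, m, s in zip(z, self.mu, self.log_sigma))

rng = random.Random(0)
q = DiagGaussian(mu=[0.0, 1.0], log_sigma=[0.0, 0.0])  # sigma = 1 per dim
z = q.sample(rng)
# At the mean, the log-density equals the peak value -(d/2) log(2*pi)
assert abs(q.log_prob([0.0, 1.0]) - (-math.log(2 * math.pi))) < 1e-9
```

Optimizing over $$\lambda$$ then means adjusting `mu` and `log_sigma` to best match the posterior, which is the role the variational objective below assigns to $$q_\lambda$$.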


- Given $$\P_{\bx, \bz}$$ and $$\Q$$, we note that the following relationships hold true[^2] for any $$\bx$$ and all variational distributions $$q_\lambda(\bz) \in \Q$$
+ Given $$\P_{\bx, \bz}$$ and $$\Q$$, we note that the following relationships hold true[^1] for any $$\bx$$ and all variational distributions $$q_\lambda(\bz) \in \Q$$

{% math %}
\begin{align}
@@ -289,5 +289,4 @@ rather than running BBVI's **Step 1** as a subroutine. By leveraging the learnab

Footnotes
==============
- [^1]: Computing the marginal likelihood $$p(\bx)$$ is at least as difficult as computing the posterior $$p(\bz \mid \bx)$$ since by definition $$p(\bz \mid \bx) = p(\bx, \bz) / p(\bx)$$.
- [^2]: The first equality only holds if the support of $$q$$ includes that of $$p$$. If not, it is an inequality.
+ [^1]: The first equality only holds if the support of $$q$$ includes that of $$p$$. If not, it is an inequality.
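One way to see this footnote's failure mode is on a toy discrete example (all numbers hypothetical): the importance-sampling identity $$p(\bx) = \mathbb{E}_{q}\left[p(\bx, \bz)/q(\bz)\right]$$ recovers $$p(\bx)$$ exactly when the support of $$q$$ covers that of $$p(\bx, \cdot)$$, and strictly underestimates it when it does not.

```python
# Toy discrete latent z in {0, 1}, for one fixed observation x
p_joint = {0: 0.3, 1: 0.2}        # hypothetical values of p(x, z)
p_x = sum(p_joint.values())       # p(x) = 0.5

def is_estimate(q):
    # E_q[p(x, z) / q(z)], summing only over the support of q
    return sum(qz * p_joint[z] / qz for z, qz in q.items() if qz > 0)

q_full = {0: 0.6, 1: 0.4}         # support covers both states
q_bad  = {0: 1.0, 1: 0.0}         # puts no mass on z = 1

assert abs(is_estimate(q_full) - p_x) < 1e-12  # equality holds
assert is_estimate(q_bad) < p_x                # strict inequality: 0.3 < 0.5
```

The mass that $$p(\bx, \cdot)$$ places outside the support of $$q$$ is simply never seen by the expectation, so the identity degrades to a lower bound.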
