<!DOCTYPE HTML>
<html lang='en'>
<head>
<meta charset="utf-8">
<link rel="stylesheet" href="/style.css">
<link href='http://fonts.googleapis.com/css?family=Scada' rel='stylesheet' type='text/css'>
<link href='http://fonts.googleapis.com/css?family=Oranienbaum' rel='stylesheet' type='text/css'>

<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-47801434-1', 'chogg.name');
ga('send', 'pageview');
</script>
<meta content='nanoc 3.7.3' name='generator'>
<title>
Stumbling upon simplicity: the Cholesky decomposition
[Charles R. Hogg III]
</title>
</head>
<body>
<div id='title'>
<a title="Main page" href="/">Charles R. Hogg III</a>
<div id='nav'>
<ul>
<li>
<a title="About the author, the website, etc." href="/about/">About</a>
</li>
<li>
<a title="Personal blog" href="/blog/">Blog</a>
</li>
<li>
<a title="Things I'm working on" href="/projects/">Projects</a>
</li>
<li>
<a title="Ways to contact me online" href="/contact/">Contact</a>
</li>
</ul>
</div>
</div>
<hr>
<div id='main'>
<div id='post'>
<h1>
Stumbling upon simplicity: the Cholesky decomposition
</h1>
<aside class='posted_at'>
Posted at 2015-03-30 23:06
</aside>
<aside class='tldr'>
<b>tl;dr:</b>
Sometimes things only seem complicated because we've never taken the time to play with them. Once I stopped treating the Cholesky decomposition as a black box, I found that it was surprisingly easy to compute.
</aside>
<article>
<p>The <a href="http://en.wikipedia.org/wiki/Cholesky_decomposition">Cholesky decomposition</a> is a key concept in matrix algebra. It's a kind of "matrix square root": a lower-triangular matrix <span class="math">\(L\)</span> which gives back the original matrix <span class="math">\(K\)</span> when multiplied with its transpose: <span class="math">\[L L^\intercal = K.\]</span></p>
<p>It's very useful for working with high-dimensional Gaussian distributions. <span class="math">\(L\)</span> takes independent one-dimensional Gaussian samples (i.e., the simplest possible case), and turns them into samples from the multidimensional distribution whose covariance matrix is <span class="math">\(K\)</span>.</p>
<p>For example, take the highly-correlated matrix <span class="math">\[K = \left[\begin{array}{cc}1 &amp; 0.8 \\ 0.8 &amp; 1\end{array}\right].\]</span> Its Cholesky decomposition is <span class="math">\[L = \left[\begin{array}{cc}1 &amp; 0 \\ 0.8 &amp; 0.6\end{array}\right],\]</span> as we can easily verify: <span class="math">\[L L^\intercal = \left[\begin{array}{cc}1 &amp; 0 \\ 0.8 &amp; 0.6\end{array}\right] \left[\begin{array}{cc}1 &amp; 0.8 \\ 0 &amp; 0.6\end{array}\right] = \left[\begin{array}{cc}1 &amp; 0.8 \\ 0.8 &amp; 1\end{array}\right] = K.\]</span> If we now use <span class="math">\(L\)</span> to multiply <em>independent</em> normal samples, it turns them into <em>correlated</em> samples with a correlation of 0.8.</p>
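<p>(A minimal setup sketch, so the snippet below is self-contained: it defines <span class="math">\(K\)</span> and <span class="math">\(L\)</span> in R and double-checks the product.)</p>
<pre class="sourceCode r"><code class="sourceCode r">K &lt;- matrix(c(1, 0.8, 0.8, 1), nrow=2)
L &lt;- matrix(c(1, 0.8, 0, 0.6), nrow=2)  # The Cholesky factor from above.
L %*% t(L)  # Reproduces K.</code></pre>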
<pre class="sourceCode r"><code class="sourceCode r">num_points &lt;-<span class="st"> </span><span class="dv">200</span>
independent_random &lt;-<span class="st"> </span><span class="kw">matrix</span>(<span class="dt">nrow=</span><span class="dv">2</span>, <span class="kw">rnorm</span>(<span class="dt">n=</span><span class="dv">2</span> *<span class="st"> </span>num_points))
correlated_random &lt;-<span class="st"> </span>L %*%<span class="st"> </span>independent_random</code></pre>
<div class="figure">
<img src="correlated_plot.png" alt="Plot of the correlated random samples generated above">
</div>
<p>What's missing in the above is any notion of how to <em>calculate</em> <span class="math">\(L\)</span>, given <span class="math">\(K\)</span>. If we have a good software library, we can just use it, and treat this calculation as a black box. But if we want to dig deeper, we're in luck: it turns out we already have enough information to compute it from scratch.</p>
<p>Everything here is elementary, and I'm sure it's quite well known. I'm only blogging it to share my utter delight when I stumbled across its underlying simplicity.</p>
<h2 id="adding-one-more-row">Adding "one more row"</h2>
<p>In the appendix for <a href="https://github.com/chiphogg/paper_gaussian_oscillators">my animation paper</a>, I asked the question: given <span class="math">\(n\)</span> draws from a Gaussian with a particular covariance function, can we efficiently compute the <span class="math">\((n+1)\)</span>st?</p>
<p>If we have the first <span class="math">\(n\)</span> draws, that means we have their covariance matrix <span class="math">\(K_n\)</span> and its Cholesky decomposition <span class="math">\(L_n\)</span>, so that <span class="math">\(L_n L_n^\intercal = K_n\)</span>. We also have the covariance of every old point with the new point, a vector we'll call <span class="math">\(k_n\)</span>, as well as the new point's variance <span class="math">\(\sigma_{n + 1}^2\)</span>. Together, these form the new covariance matrix<a href="#fn1" class="footnoteRef" id="fnref1"><sup>1</sup></a>, <span class="math">\[ K_{n + 1} =
\left[\begin{array}{cc}
K_n &amp; k_n \\
k_n^\intercal &amp; \sigma_{n + 1}^2
\end{array}\right]
\]</span></p>
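<p>(As an aside, assembling this block matrix in R is a one-liner. A sketch, assuming <code>K_n</code>, the vector <code>k_n</code>, and the scalar <code>sigma_sq</code> are already in scope:)</p>
<pre class="sourceCode r"><code class="sourceCode r"># Build K_{n+1} from its blocks: K_n (n x n), k_n (length n), sigma_sq (scalar).
K_next &lt;- rbind(cbind(K_n, k_n), c(k_n, sigma_sq))</code></pre>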
<p>What we <em>need</em> is the new row that makes <span class="math">\(L_n\)</span> into <span class="math">\(L_{n + 1}\)</span>, where <span class="math">\(L_{n + 1} L_{n + 1}^\intercal = K_{n + 1}\)</span>. We can break this row into two pieces: a vector <span class="math">\(\ell_n^\intercal\)</span> (corresponding to contributions from all the old points), and a scalar <span class="math">\(\gamma_{n + 1}\)</span> (corresponding to the new point). This gives us a block matrix equation: <span class="math">\[
\left[\begin{array}{cc}
L_n &amp; 0 \\
\ell_n^\intercal &amp; \gamma_{n + 1}
\end{array}\right]
\left[\begin{array}{cc}
L_n^\intercal &amp; \ell_n \\
0 &amp; \gamma_{n + 1}
\end{array}\right]
= \left[\begin{array}{cc}
K_n &amp; k_n \\
k_n^\intercal &amp; \sigma_{n + 1}^2
\end{array}\right]
\]</span> What interesting equations fall out of this?</p>
<ol style="list-style-type: decimal">
<li><span class="math">\(L_n L_n^\intercal = K_n\)</span></li>
</ol>
<ul>
<li>Well, we knew that already.</li>
</ul>
<ol start="2" style="list-style-type: decimal">
<li><span class="math">\(L_n \ell_n = k_n\)</span></li>
</ol>
<ul>
<li>
<p>Now, this looks promising! <span class="math">\(L_n\)</span> and <span class="math">\(k_n\)</span> are known, so we can solve for <span class="math">\(\ell_n\)</span>.</p>
<p>It gets better: remember that <span class="math">\(L_n\)</span> is lower-triangular, so the equation looks something like this: <span class="math">\[
\left[\begin{array}{cccc}
L_{11} &amp; 0 &amp; 0 &amp; \cdots \\
L_{21} &amp; L_{22} &amp; 0 &amp; \cdots \\
L_{31} &amp; L_{32} &amp; L_{33} &amp; \cdots \\
\vdots &amp; \vdots &amp; \vdots &amp; \ddots \\
\end{array}\right]
\left[\begin{array}{c}
\ell_1 \\
\ell_2 \\
\ell_3 \\
\vdots
\end{array}\right]
= \left[\begin{array}{c}
k_1 \\
k_2 \\
k_3 \\
\vdots
\end{array}\right]
\]</span> (Everything is known except the <span class="math">\(\ell\)</span>s.)</p>
<p>In the first row, <span class="math">\(\ell_1\)</span> is the only unknown; we have <span class="math">\(\ell_1 = k_1 / L_{11}\)</span>. The second row uses only <span class="math">\(\ell_1\)</span> and <span class="math">\(\ell_2\)</span>, but we already solved for <span class="math">\(\ell_1\)</span>: again, only one unknown, which is easy to solve.</p>
<p>Going down the line, we can solve for every element of <span class="math">\(\ell\)</span>, without even doing any matrix manipulations.</p>
</li>
</ul>
<ol start="3" style="list-style-type: decimal">
<li><span class="math">\(\ell_n^\intercal \ell_n + \gamma_{n + 1}^2 = \sigma_{n + 1} ^ 2\)</span></li>
</ol>
<ul>
<li>Since we already figured out <span class="math">\(\ell_n\)</span> above, and we know <span class="math">\(\sigma_{n + 1}\)</span>, this equation is just as easy as the previous ones.</li>
</ul>
<p>So: yes, we <em>can</em> compute the next row of the Cholesky decomposition if we know all the previous rows.</p>
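<p>In R, that recipe fits in a few lines. This is only a sketch (<code>AddCholeskyRow()</code> is a name I made up, not a library function), but <code>forwardsolve()</code> is base R's routine for exactly the lower-triangular substitution in step 2.</p>
<pre class="sourceCode r"><code class="sourceCode r"># Sketch: extend L_n by one row, given the new covariances k_n and the new
# point's variance sigma_sq.
AddCholeskyRow &lt;- function(L_n, k_n, sigma_sq) {
  ell &lt;- forwardsolve(L_n, k_n)         # Step 2: solve L_n ell = k_n.
  gamma &lt;- sqrt(sigma_sq - sum(ell^2))  # Step 3.
  rbind(cbind(L_n, 0), c(ell, gamma))   # Assemble L_{n+1}.
}</code></pre>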
<h2 id="this-smells-like-induction">This smells like induction</h2>
<p>We now know how to get <span class="math">\(L_{n + 1}\)</span> if we have <span class="math">\(L_n\)</span>, but how do we get <span class="math">\(L_n\)</span> in the first place? One way is to call a library function -- always a good default strategy. This works pretty well, until <a href="https://github.com/jstat/jstat/blob/b3a72f52917403e948f628dfebd645f8cb925c52/src/linearalgebra.js#L284-L286">it doesn't</a>.</p>
<p>Another approach would be to start with <span class="math">\(L_{n - 1}\)</span>, and use what we just figured out to add the final row. Of course, then we'd <em>need</em> <span class="math">\(L_{n - 1}\)</span>, but we could get it from <span class="math">\(L_{n - 2}\)</span>...</p>
<p>At this point, I'm reminded of <a href="http://en.wikipedia.org/wiki/Mathematical_induction">mathematical induction</a>: a method to generate "valid for any positive integer" proofs based on two ingredients. One ingredient is a method for getting from step <span class="math">\(n\)</span> to step <span class="math">\((n + 1)\)</span> for any <span class="math">\(n\)</span>; we have that already. The other ingredient is a <em>base case</em>: an answer for the smallest possible <span class="math">\(n\)</span> which <em>doesn't</em> depend on any earlier stages.</p>
<p>In our case, the base case turns out to be trivial. The simplest matrix which could have a Cholesky decomposition is a <span class="math">\((1 \times 1)\)</span> matrix; let's say <span class="math">\(K = \left[ k \right]\)</span> for some <span class="math">\(k &gt; 0\)</span>. Then <span class="math">\(L = \left[ \sqrt{k} \right]\)</span> -- what could be easier!</p>
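<p>In code, the induction is just as short. Another sketch, reusing the hypothetical <code>AddCholeskyRow()</code> from above:</p>
<pre class="sourceCode r"><code class="sourceCode r"># Sketch: the whole decomposition by induction, starting from the 1x1 base case.
CholeskyByInduction &lt;- function(K) {
  L &lt;- matrix(sqrt(K[1, 1]))
  for (m in seq_len(nrow(K))[-1]) {  # i.e., 2, 3, ..., n (empty when n == 1).
    L &lt;- AddCholeskyRow(L, K[m, 1:(m - 1)], K[m, m])
  }
  L
}</code></pre>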
<p>We can put this all together to compute the elements of <span class="math">\(L\)</span> one at a time. Each new element uses <span class="math">\(K\)</span> and the elements of <span class="math">\(L\)</span> we've <em>already</em> computed. Let's update our notation for consistency: we'll replace the <span class="math">\(\ell\)</span>s and <span class="math">\(k\)</span>s with the corresponding indices in <span class="math">\(L\)</span> and <span class="math">\(K\)</span>. The matrix equation for a typical intermediate stage looks like this: <span class="math">\[
\left[\begin{array}{cc}
L_{11} &amp; 0 \\
L_{21} &amp; L_{22}
\end{array}\right]
\left[\begin{array}{c} L_{31} \\ L_{32} \end{array}\right]
= \left[ \begin{array}{c} K_{31} \\ K_{32} \end{array} \right]
.\]</span> This gives us the "pre-diagonal" elements of the third row, <span class="math">\(L_{31}\)</span> and <span class="math">\(L_{32}\)</span>. We'll then get <span class="math">\(L_{33}\)</span> in terms of <span class="math">\(K_{33}\)</span> and these same pre-diagonal elements, and this gives us everything we'll need to compute the <em>fourth</em> row. So it goes, until we have the whole matrix.</p>
<p>I started out trying to extend existing Cholesky matrices, and I got the ability to compute <em>any</em> Cholesky matrix thrown in for free.</p>
<h2 id="testing-it-out">Testing it out</h2>
<p>Let's turn these ideas into actual code.</p>
<pre class="sourceCode r"><code class="sourceCode r"><span class="co"># This doesn't check that K actually *has* a Cholesky decomposition (not every</span>
<span class="co"># matrix does). Give it some garbage, and it will happily return a nonsense</span>
<span class="co"># answer (if it doesn't crash first!).</span>
<span class="co">#</span>
<span class="co"># That's OK; the point is to see if it gives the right answer for suitable</span>
<span class="co"># matrices.</span>
<span class="co">#</span>
<span class="co"># For this reason (and many others!), you shouldn't be using this algorithm in</span>
<span class="co"># the real world: like the blog post it's part of, it's only good for improving</span>
<span class="co"># your understanding of the concept.</span>
MyCholesky &lt;-<span class="st"> </span>function(K) {
n &lt;-<span class="st"> </span><span class="kw">nrow</span>(K)

<span class="co"># Start out with a matrix which is all zeroes, except for the upper-left</span>
<span class="co"># element. (We know that element is the square root of the corresponding</span>
<span class="co"># element in K, since that's the base case.)</span>
L &lt;-<span class="st"> </span><span class="kw">matrix</span>(<span class="dv">0</span>, <span class="dt">nrow=</span>n, <span class="dt">ncol=</span>n)
L[<span class="dv">1</span>, <span class="dv">1</span>] &lt;-<span class="st"> </span><span class="kw">sqrt</span>(K[<span class="dv">1</span>, <span class="dv">1</span>])
if (n ==<span class="st"> </span><span class="dv">1</span>) { <span class="kw">return</span>(L) }

<span class="co"># Compute each row's values.</span>
for (row in <span class="dv">2</span>:n) {
<span class="co"># This loop computes all the elements *before* the diagonal (which</span>
<span class="co"># corresponds to step 2 above).</span>
for (col in <span class="dv">1</span>:(row -<span class="st"> </span><span class="dv">1</span>)) {
<span class="co"># sum_to_subtract is the contribution of the elements we've previously</span>
<span class="co"># computed in this new row.</span>
sum_to_subtract &lt;-<span class="st"> </span><span class="dv">0</span>
if (col &gt;<span class="st"> </span><span class="dv">1</span>) {
i &lt;-<span class="st"> </span><span class="dv">1</span>:(col -<span class="st"> </span><span class="dv">1</span>)
sum_to_subtract &lt;-<span class="st"> </span>L[col, i] %*%<span class="st"> </span>L[row, i]
}
L[row, col] &lt;-<span class="st"> </span>(K[row, col] -<span class="st"> </span>sum_to_subtract) /<span class="st"> </span>L[col, col]
}
<span class="co"># Now compute the element on the main diagonal (corresponding to step 3</span>
<span class="co"># above).</span>
i &lt;-<span class="st"> </span><span class="dv">1</span>:(row -<span class="st"> </span><span class="dv">1</span>)
L[row, row] &lt;-<span class="st"> </span><span class="kw">sqrt</span>(K[row, row] -<span class="st"> </span>L[row, i] %*%<span class="st"> </span>L[row, i])
}
<span class="kw">return</span> (L)
}</code></pre>
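<p>A quick sanity check before the real test: feeding it the <span class="math">\((2 \times 2)\)</span> matrix from the introduction should reproduce the <span class="math">\(L\)</span> we verified by hand.</p>
<pre class="sourceCode r"><code class="sourceCode r">MyCholesky(matrix(c(1, 0.8, 0.8, 1), nrow=2))
# Should give back rows (1, 0) and (0.8, 0.6): the L from the introduction.</code></pre>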
<p>To test this, I'll construct a random <span class="math">\((20 \times 20)\)</span> covariance matrix. I'll compute its Cholesky decomposition twice -- once with my method, and once with R's built-in <code>chol()</code> function -- and see how close the two answers are.</p>
<pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Generate some points.</span>
x &lt;-<span class="st"> </span><span class="kw">sort</span>(<span class="kw">rnorm</span>(<span class="dt">n=</span>N))

<span class="co"># Construct a matrix from a covariance function which I know is valid.</span>
<span class="co"># (We'll add some noise on the diagonal to help it converge).</span>
K &lt;-<span class="st"> </span><span class="kw">outer</span>(x, x,
function(a, b) {
<span class="kw">exp</span>(-(a -<span class="st"> </span>b)^<span class="dv">2</span>) +<span class="st"> </span><span class="kw">ifelse</span>(a==b, <span class="fl">0.01</span>, <span class="fl">0.0</span>)
})

<span class="co"># What are the biggest and smallest differences between my method,</span>
<span class="co"># and R's built-in method I know is correct?</span>
<span class="kw">range</span>(<span class="kw">MyCholesky</span>(K) -<span class="st"> </span><span class="kw">t</span>(<span class="kw">chol</span>(K)))</code></pre>
<pre><code>## [1] -2.193e-15 2.665e-15</code></pre>
<p>So the biggest difference was smaller than <span class="math">\(10^{-14}\)</span>. Looks pretty good to me!</p>
<h2 id="conclusions">Conclusions</h2>
<p>Don't be intimidated by theorems with <a href="https://biostat-lists.wustl.edu/sympa/arc/s-news/2000-10/msg00172.html">hard-to-pronounce</a> names. They often contain extremely simple ideas.</p>
<div class="footnotes">
<hr>
<ol>
<li id="fn1"><p>Note that this is a <a href="http://en.wikipedia.org/wiki/Block_matrix">block matrix</a>: one whose elements stand for other matrices. Specifically, <span class="math">\(K_n\)</span> is an <span class="math">\((n \times n)\)</span> matrix; <span class="math">\(k_n\)</span> is <span class="math">\((n \times 1)\)</span> (i.e., a vector), and <span class="math">\(\sigma_{n + 1}^2\)</span> is <span class="math">\((1 \times 1)\)</span> (i.e., a scalar).<a href="#fnref1"></a></p></li>
</ol>
</div>

</article>
</div>

</div>
<div id='source'>
<a href='https://github.com/chiphogg/chogg_name/tree/master/content/blog/cholesky.Rmd'>
Page source on GitHub
</a>
</div>
</body>
</html>
