<!DOCTYPE HTML>
<html lang='en'>
<head>
<meta charset="utf-8">
<link rel="stylesheet" href="/style.css">
<link href='http://fonts.googleapis.com/css?family=Scada' rel='stylesheet' type='text/css'>
<link href='http://fonts.googleapis.com/css?family=Oranienbaum' rel='stylesheet' type='text/css'>

<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-47801434-1', 'chogg.name');
ga('send', 'pageview');
</script>
<meta content='nanoc 3.7.3' name='generator'>
<title>
Stumbling upon simplicity: the Cholesky decomposition
[Charles R. Hogg III]
</title>
</head>
<body>
<div id='title'>
<a title="Main page" href="/">Charles R. Hogg III</a>
<div id='nav'>
<ul>
<li>
<a title="About the author, the website, etc." href="/about/">About</a>
</li>
<li>
<a title="Personal blog" href="/blog/">Blog</a>
</li>
<li>
<a title="Things I'm working on" href="/projects/">Projects</a>
</li>
<li>
<a title="Ways to contact me online" href="/contact/">Contact</a>
</li>
</ul>
</div>
</div>
<hr>
<div id='main'>
<div id='post'>
<h1>
Stumbling upon simplicity: the Cholesky decomposition
</h1>
<aside class='posted_at'>
Posted at 2015-03-30 23:06
</aside>
<aside class='tldr'>
<b>tl;dr:</b>
Sometimes things only seem complicated because we've never taken the time to play with them. Once I stopped treating the Cholesky decomposition as a black box, I found that it was surprisingly easy to compute.
</aside>
<article>
<p>The <a href="http://en.wikipedia.org/wiki/Cholesky_decomposition">Cholesky decomposition</a> is a key concept in matrix algebra. It's a kind of "matrix square root": a lower-triangular matrix <span class="math">\(L\)</span> which gives back the original matrix <span class="math">\(K\)</span> when multiplied with its transpose: <span class="math">\[L L^\intercal = K.\]</span></p>
<p>It's very useful for working with high-dimensional Gaussian distributions. <span class="math">\(L\)</span> takes independent one-dimensional Gaussian samples (i.e., the simplest possible case), and turns them into samples from the multidimensional distribution whose covariance matrix is <span class="math">\(K\)</span>: if <span class="math">\(z\)</span> has identity covariance, then <span class="math">\(Lz\)</span> has covariance <span class="math">\(L I L^\intercal = K\)</span>.</p>
<p>For example, take the highly-correlated matrix <span class="math">\[K = \left[\begin{array}{cc}1 & 0.8 \\ 0.8 & 1\end{array}\right].\]</span> Its Cholesky decomposition is <span class="math">\[L = \left[\begin{array}{cc}1 & 0 \\ 0.8 & 0.6\end{array}\right],\]</span> as we can easily verify: <span class="math">\[L L^\intercal = \left[\begin{array}{cc}1 & 0 \\ 0.8 & 0.6\end{array}\right] \left[\begin{array}{cc}1 & 0.8 \\ 0 & 0.6\end{array}\right] = \left[\begin{array}{cc}1 & 0.8 \\ 0.8 & 1\end{array}\right] = K.\]</span> If we now use <span class="math">\(L\)</span> to multiply <em>independent</em> normal samples, it turns them into <em>correlated</em> samples with a correlation of 0.8.</p>
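<p>We can check that multiplication in R directly (this also defines the <code>L</code> used in the sampling code below):</p>
<pre class="sourceCode r"><code class="sourceCode r"># The example factor from above, entered by hand so we can verify it
# numerically.
L <- matrix(c(1.0, 0.0,
              0.8, 0.6), nrow=2, byrow=TRUE)
L %*% t(L)  # Gives back K: rows (1.0, 0.8) and (0.8, 1.0).</code></pre>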
<pre class="sourceCode r"><code class="sourceCode r">num_points <-<span class="st"> </span><span class="dv">200</span>
independent_random <-<span class="st"> </span><span class="kw">matrix</span>(<span class="dt">nrow=</span><span class="dv">2</span>, <span class="kw">rnorm</span>(<span class="dt">n=</span><span class="dv">2</span> *<span class="st"> </span>num_points))
correlated_random <-<span class="st"> </span>L %*%<span class="st"> </span>independent_random</code></pre>
<div class="figure">
<img src="correlated_plot.png">
</div>
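<p>As a quick sanity check, the empirical correlation of those samples should land near 0.8, up to sampling noise:</p>
<pre class="sourceCode r"><code class="sourceCode r"># Each *row* of correlated_random is one coordinate, so transpose before
# computing the sample correlation matrix; the off-diagonal entries should be
# close to 0.8.
cor(t(correlated_random))</code></pre>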
<p>What's missing in the above is any notion of how to <em>calculate</em> <span class="math">\(L\)</span>, given <span class="math">\(K\)</span>. If we have a good software library, we can just use it, and treat this calculation as a black box. But if we want to dig deeper, we're in luck: it turns out we already have enough information to compute it from scratch.</p>
<p>Everything here is elementary, and I'm sure it's quite well known. I'm only blogging it to share my utter delight when I stumbled across its underlying simplicity.</p>
<h2 id="adding-one-more-row">Adding "one more row"</h2>
<p>In the appendix for <a href="https://github.com/chiphogg/paper_gaussian_oscillators">my animation paper</a>, I asked the question: given <span class="math">\(n\)</span> draws from a Gaussian with a particular covariance function, can we efficiently compute the <span class="math">\((n+1)\)</span>st?</p>
<p>If we have the first <span class="math">\(n\)</span> draws, that means we have their covariance matrix <span class="math">\(K_n\)</span> and its Cholesky decomposition <span class="math">\(L_n\)</span>, so that <span class="math">\(L_n L_n^\intercal = K_n\)</span>. We also have the covariance of every old point with the new point, a vector we'll call <span class="math">\(k_n\)</span>, as well as the new point's variance <span class="math">\(\sigma_{n + 1}^2\)</span>. Together, these form the new covariance matrix<a href="#fn1" class="footnoteRef" id="fnref1"><sup>1</sup></a>: <span class="math">\[ K_{n + 1} =
\left[\begin{array}{cc}
K_n & k_n \\
k_n^\intercal & \sigma_{n + 1}^2
\end{array}\right].
\]</span></p>
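<p>In R, assembling a block matrix like this is a one-liner with <code>rbind()</code> and <code>cbind()</code>. A minimal sketch (the helper's name and argument names are mine, matching the dimensions in the footnote):</p>
<pre class="sourceCode r"><code class="sourceCode r"># Build K_{n+1} from its blocks: K_n is the old (n x n) covariance matrix,
# k_n is the length-n vector of covariances with the new point, and sigma_sq
# is the new point's variance.
GrowCovariance <- function(K_n, k_n, sigma_sq) {
  rbind(cbind(K_n, k_n),
        c(k_n, sigma_sq))
}</code></pre>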
<p>What we <em>need</em> is the new row that makes <span class="math">\(L_n\)</span> into <span class="math">\(L_{n + 1}\)</span>, where <span class="math">\(L_{n + 1} L_{n + 1}^\intercal = K_{n + 1}\)</span>. We can break this row into two pieces: a vector <span class="math">\(\ell_n^\intercal\)</span> (corresponding to contributions from all the old points), and a scalar <span class="math">\(\gamma_{n + 1}\)</span> (corresponding to the new point). This gives us a block matrix equation: <span class="math">\[
\left[\begin{array}{cc}
L_n & 0 \\
\ell_n^\intercal & \gamma_{n + 1}
\end{array}\right]
\left[\begin{array}{cc}
L_n^\intercal & \ell_n \\
0 & \gamma_{n + 1}
\end{array}\right]
= \left[\begin{array}{cc}
K_n & k_n \\
k_n^\intercal & \sigma_{n + 1}^2
\end{array}\right]
\]</span> What interesting equations fall out of this?</p>
<ol style="list-style-type: decimal">
<li><span class="math">\(L_n L_n^\intercal = K_n\)</span></li>
</ol>
<ul>
<li>Well, we knew that already.</li>
</ul>
<ol start="2" style="list-style-type: decimal">
<li><span class="math">\(L_n \ell_n = k_n\)</span></li>
</ol>
<ul>
<li>
<p>Now, this looks promising! <span class="math">\(L_n\)</span> and <span class="math">\(k_n\)</span> are known, so we can solve for <span class="math">\(\ell_n\)</span>.</p>
<p>It gets better: remember that <span class="math">\(L_n\)</span> is lower-triangular, so the equation looks something like this: <span class="math">\[
\left[\begin{array}{cccc}
L_{11} & 0 & 0 & \cdots \\
L_{21} & L_{22} & 0 & \cdots \\
L_{31} & L_{32} & L_{33} & \cdots \\
\vdots & \vdots & \vdots & \ddots \\
\end{array}\right]
\left[\begin{array}{c}
\ell_1 \\
\ell_2 \\
\ell_3 \\
\vdots
\end{array}\right]
= \left[\begin{array}{c}
k_1 \\
k_2 \\
k_3 \\
\vdots
\end{array}\right]
\]</span> (Everything is known except the <span class="math">\(\ell\)</span>s.)</p>
<p>In the first row, <span class="math">\(\ell_1\)</span> is the only unknown, so <span class="math">\(\ell_1 = k_1 / L_{11}\)</span>. The second row involves only <span class="math">\(\ell_1\)</span> and <span class="math">\(\ell_2\)</span>, and we've already solved for <span class="math">\(\ell_1\)</span>: again, there's only one unknown, which is easy to solve for.</p>
<p>Going down the line, we can solve for every element of <span class="math">\(\ell\)</span>, without even doing any matrix manipulations. (This procedure is known as forward substitution.)</p>
</li>
</ul>
<ol start="3" style="list-style-type: decimal">
<li><span class="math">\(\ell_n^\intercal \ell_n + \gamma_{n + 1}^2 = \sigma_{n + 1}^2\)</span></li>
</ol>
<ul>
<li>Since we already figured out <span class="math">\(\ell_n\)</span> above, and we know <span class="math">\(\sigma_{n + 1}\)</span>, this equation is just as easy as the previous ones: <span class="math">\(\gamma_{n + 1} = \sqrt{\sigma_{n + 1}^2 - \ell_n^\intercal \ell_n}\)</span>.</li>
</ul>
<p>So: yes, we <em>can</em> compute the next row of the Cholesky decomposition if we know all the previous rows.</p>
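<p>Here's that update as R code -- a minimal sketch, with names of my own choosing. Base R's <code>forwardsolve()</code> solves lower-triangular systems, which is exactly the "going down the line" substitution from step 2.</p>
<pre class="sourceCode r"><code class="sourceCode r"># Extend a Cholesky factor by one row.
#   L_n:      Cholesky factor of the old covariance matrix K_n.
#   k_n:      vector of covariances of the old points with the new point.
#   sigma_sq: the new point's variance.
ExtendCholesky <- function(L_n, k_n, sigma_sq) {
  ell <- forwardsolve(L_n, k_n)         # Step 2: solve L_n ell = k_n.
  gamma <- sqrt(sigma_sq - sum(ell^2))  # Step 3: the new diagonal element.
  rbind(cbind(L_n, 0), c(ell, gamma))   # Append the zero column and new row.
}</code></pre>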
<h2 id="this-smells-like-induction">This smells like induction</h2>
<p>We now know how to get <span class="math">\(L_{n + 1}\)</span> if we have <span class="math">\(L_n\)</span>, but how do we get <span class="math">\(L_n\)</span> in the first place? One way is to call a library function -- always a good default strategy. This works pretty well, until <a href="https://github.com/jstat/jstat/blob/b3a72f52917403e948f628dfebd645f8cb925c52/src/linearalgebra.js#L284-L286">it doesn't</a>.</p>
<p>Another approach would be to start with <span class="math">\(L_{n - 1}\)</span>, and use what we just figured out to add the final row. Of course, then we'd <em>need</em> <span class="math">\(L_{n - 1}\)</span>, but we could get it from <span class="math">\(L_{n - 2}\)</span>...</p>
<p>At this point, I'm reminded of <a href="http://en.wikipedia.org/wiki/Mathematical_induction">mathematical induction</a>: a method for generating "valid for any positive integer" proofs from two ingredients. One ingredient is a method for getting from step <span class="math">\(n\)</span> to step <span class="math">\((n + 1)\)</span> for any <span class="math">\(n\)</span>; we have that already. The other ingredient is a <em>base case</em>: an answer for the smallest possible <span class="math">\(n\)</span> which <em>doesn't</em> depend on any earlier stages.</p>
<p>In our case, the base case turns out to be trivial. The simplest matrix which could have a Cholesky decomposition is a <span class="math">\((1 \times 1)\)</span> matrix; let's say <span class="math">\(K = \left[ k \right]\)</span> for some <span class="math">\(k > 0\)</span>. Then <span class="math">\(L = \left[ \sqrt{k} \right]\)</span> -- what could be easier!</p>
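<p>In code, the induction becomes a loop. Here's a sketch built on the <code>ExtendCholesky()</code> helper from above (again, the names are mine):</p>
<pre class="sourceCode r"><code class="sourceCode r"># Build the whole factor by induction: start from the (1 x 1) base case, then
# apply the "one more row" update until we run out of rows.
CholeskyByInduction <- function(K) {
  L <- matrix(sqrt(K[1, 1]))
  for (n in seq_len(nrow(K) - 1)) {
    L <- ExtendCholesky(L, K[1:n, n + 1], K[n + 1, n + 1])
  }
  L
}</code></pre>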
<p>We can put this all together to compute the elements of <span class="math">\(L\)</span> one at a time. Each new element uses <span class="math">\(K\)</span> and the elements of <span class="math">\(L\)</span> we've <em>already</em> computed. Let's update our notation for consistency: we'll replace the <span class="math">\(\ell\)</span>s and <span class="math">\(k\)</span>s with the corresponding elements of <span class="math">\(L\)</span> and <span class="math">\(K\)</span>. The matrix equation for a typical intermediate stage looks like this: <span class="math">\[
\left[\begin{array}{cc}
L_{11} & 0 \\
L_{21} & L_{22}
\end{array}\right]
\left[\begin{array}{c} L_{31} \\ L_{32} \end{array}\right]
= \left[ \begin{array}{c} K_{31} \\ K_{32} \end{array} \right]
.\]</span> This gives us the "pre-diagonal" elements of the third row, <span class="math">\(L_{31}\)</span> and <span class="math">\(L_{32}\)</span>. We'll then get <span class="math">\(L_{33}\)</span> in terms of <span class="math">\(K_{33}\)</span> and these same pre-diagonal elements, and this gives us everything we'll need to compute the <em>fourth</em> row. So it goes, until we have the whole matrix.</p>
<p>I started out trying to extend existing Cholesky matrices, and I got the ability to compute <em>any</em> Cholesky matrix thrown in for free.</p>
<h2 id="testing-it-out">Testing it out</h2>
<p>Let's turn these ideas into actual code.</p>
<pre class="sourceCode r"><code class="sourceCode r"><span class="co"># This doesn't check that K actually *has* a Cholesky decomposition (not every</span>
<span class="co"># matrix does). Give it some garbage, and it will happily return a nonsense</span>
<span class="co"># answer (if it doesn't crash first!).</span>
<span class="co">#</span>
<span class="co"># That's OK; the point is to see if it gives the right answer for suitable</span>
<span class="co"># matrices.</span>
<span class="co">#</span>
<span class="co"># For this reason (and many others!), you shouldn't be using this algorithm in</span>
<span class="co"># the real world: like the blog post it's part of, it's only good for improving</span>
<span class="co"># your understanding of the concept.</span>
MyCholesky <-<span class="st"> </span>function(K) {
  n <-<span class="st"> </span><span class="kw">nrow</span>(K)

  <span class="co"># Start out with a matrix which is all zeroes, except for the upper-left</span>
  <span class="co"># element. (We know that element is the square root of the corresponding</span>
  <span class="co"># element in K, since that's the base case.)</span>
  L <-<span class="st"> </span><span class="kw">matrix</span>(<span class="dv">0</span>, <span class="dt">nrow=</span>n, <span class="dt">ncol=</span>n)
  L[<span class="dv">1</span>, <span class="dv">1</span>] <-<span class="st"> </span><span class="kw">sqrt</span>(K[<span class="dv">1</span>, <span class="dv">1</span>])
  if (n ==<span class="st"> </span><span class="dv">1</span>) { <span class="kw">return</span>(L) }

  <span class="co"># Compute each row's values.</span>
  for (row in <span class="dv">2</span>:n) {
    <span class="co"># This loop computes all the elements *before* the diagonal (which</span>
    <span class="co"># corresponds to step 2 above).</span>
    for (col in <span class="dv">1</span>:(row -<span class="st"> </span><span class="dv">1</span>)) {
      <span class="co"># sum_to_subtract is the contribution of the elements we've previously</span>
      <span class="co"># computed in this new row.</span>
      sum_to_subtract <-<span class="st"> </span><span class="dv">0</span>
      if (col ><span class="st"> </span><span class="dv">1</span>) {
        i <-<span class="st"> </span><span class="dv">1</span>:(col -<span class="st"> </span><span class="dv">1</span>)
        sum_to_subtract <-<span class="st"> </span>L[col, i] %*%<span class="st"> </span>L[row, i]
      }
      L[row, col] <-<span class="st"> </span>(K[row, col] -<span class="st"> </span>sum_to_subtract) /<span class="st"> </span>L[col, col]
    }
    <span class="co"># Now compute the element on the main diagonal (corresponding to step 3</span>
    <span class="co"># above).</span>
    i <-<span class="st"> </span><span class="dv">1</span>:(row -<span class="st"> </span><span class="dv">1</span>)
    L[row, row] <-<span class="st"> </span><span class="kw">sqrt</span>(K[row, row] -<span class="st"> </span>L[row, i] %*%<span class="st"> </span>L[row, i])
  }
  <span class="kw">return</span>(L)
}</code></pre>
<p>To test this, I'll construct a random <span class="math">\((20 \times 20)\)</span> covariance matrix. I'll compute its Cholesky decomposition twice -- once with my method, and once with R's built-in <code>chol()</code> function -- and see how close the two answers are. (Note that <code>chol()</code> returns the <em>upper</em>-triangular factor, which is why the code below compares against its transpose.)</p>
<pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Generate some points.</span>
N <-<span class="st"> </span><span class="dv">20</span>
x <-<span class="st"> </span><span class="kw">sort</span>(<span class="kw">rnorm</span>(<span class="dt">n=</span>N))

<span class="co"># Construct a matrix from a covariance function which I know is valid.</span>
<span class="co"># (We'll add some noise on the diagonal to keep it safely positive-definite.)</span>
K <-<span class="st"> </span><span class="kw">outer</span>(x, x,
           function(a, b) {
             <span class="kw">exp</span>(-(a -<span class="st"> </span>b)^<span class="dv">2</span>) +<span class="st"> </span><span class="kw">ifelse</span>(a ==<span class="st"> </span>b, <span class="fl">0.01</span>, <span class="fl">0.0</span>)
           })

<span class="co"># What are the biggest and smallest differences between my method</span>
<span class="co"># and R's built-in method, which I know is correct?</span>
<span class="kw">range</span>(<span class="kw">MyCholesky</span>(K) -<span class="st"> </span><span class="kw">t</span>(<span class="kw">chol</span>(K)))</code></pre>
<pre><code>## [1] -2.193e-15 2.665e-15</code></pre>
<p>So the biggest difference was smaller than <span class="math">\(10^{-14}\)</span>. Looks pretty good to me!</p>
<h2 id="conclusions">Conclusions</h2>
<p>Don't be intimidated by theorems with <a href="https://biostat-lists.wustl.edu/sympa/arc/s-news/2000-10/msg00172.html">hard-to-pronounce</a> names. They often contain extremely simple ideas.</p>
<div class="footnotes">
<hr>
<ol>
<li id="fn1"><p>Note that this is a <a href="http://en.wikipedia.org/wiki/Block_matrix">block matrix</a>: one whose elements stand for other matrices. Specifically, <span class="math">\(K_n\)</span> is an <span class="math">\((n \times n)\)</span> matrix; <span class="math">\(k_n\)</span> is <span class="math">\((n \times 1)\)</span> (i.e., a vector); and <span class="math">\(\sigma_{n + 1}^2\)</span> is <span class="math">\((1 \times 1)\)</span> (i.e., a scalar).<a href="#fnref1">↩</a></p></li>
</ol>
</div>

</article>
</div>

</div>
<div id='source'>
<a href='https://github.com/chiphogg/chogg_name/tree/master/content/blog/cholesky.Rmd'>
Page source on GitHub
</a>
</div>
</body>
</html>