From e47447346eca9084f10059b676ba50357b3005ee Mon Sep 17 00:00:00 2001 From: mmcky Date: Wed, 6 Jan 2021 16:37:17 +1100 Subject: [PATCH 1/2] update sources from lecture-python (dbd838d) using sphinx-tomyst (b06cacb) --- lectures/ar1_processes.md | 4 +- lectures/cake_eating_numerical.md | 8 +- lectures/cass_koopmans_1.md | 6 +- lectures/cass_koopmans_2.md | 4 +- lectures/finite_markov.md | 6 +- lectures/heavy_tails.md | 4 +- lectures/ifp.md | 9 +- lectures/ifp_advanced.md | 6 +- lectures/inventory_dynamics.md | 4 +- lectures/jv.md | 2 +- lectures/kalman.md | 2 +- lectures/kesten_processes.md | 2 +- lectures/likelihood_ratio_process.md | 4 +- lectures/linear_algebra.md | 2 +- lectures/lln_clt.md | 8 +- lectures/mccall_fitted_vfi.md | 4 +- lectures/mccall_model.md | 4 +- lectures/mccall_model_with_separation.md | 2 +- lectures/multi_hyper.md | 2 +- lectures/multivariate_normal.md | 223 ++++++++++++++++++++++- lectures/optgrowth.md | 2 +- lectures/samuelson.md | 13 +- lectures/short_path.md | 2 +- lectures/sir_model.md | 4 +- lectures/time_series_with_matrices.md | 4 +- lectures/wald_friedman.md | 3 +- lectures/wealth_dynamics.md | 2 +- 27 files changed, 270 insertions(+), 66 deletions(-) diff --git a/lectures/ar1_processes.md b/lectures/ar1_processes.md index c3dffa1bc..65b596051 100644 --- a/lectures/ar1_processes.md +++ b/lectures/ar1_processes.md @@ -109,7 +109,7 @@ series $\{ X_t\}$. To see this, we first note that $X_t$ is normally distributed for each $t$. -This is immediate form {eq}`ar1_ma`, since linear combinations of independent +This is immediate from {eq}`ar1_ma`, since linear combinations of independent normal random variables are normal. Given that $X_t$ is normally distributed, we will know the full distribution @@ -212,7 +212,7 @@ In fact it's easy to show that such convergence will occur, regardless of the in To see this, we just have to look at the dynamics of the first two moments, as given in {eq}`dyn_tm`. -When $|a| < 1$, these sequence converge to the respective limits +When $|a| < 1$, these sequences converge to the respective limits ```{math} :label: mu_sig_star diff --git a/lectures/cake_eating_numerical.md b/lectures/cake_eating_numerical.md index 8ad096748..f9b1eae7f 100644 --- a/lectures/cake_eating_numerical.md +++ b/lectures/cake_eating_numerical.md @@ -100,7 +100,7 @@ Let's write this a bit more mathematically. ### The Bellman Operator We introduce the **Bellman operator** $T$ that takes a function v as an -argument and returns a new function $Tv$ defined by. +argument and returns a new function $Tv$ defined by $$ Tv(x) = \max_{0 \leq c \leq x} \{u(c) + \beta v(x - c)\} @@ -118,7 +118,7 @@ v$ converges to the solution to the Bellman equation. ### Fitted Value Function Iteration -Both consumption $c$ and the state variable $x$ are continous. +Both consumption $c$ and the state variable $x$ are continuous. This causes complications when it comes to numerical work. @@ -419,7 +419,7 @@ ax.legend() plt.show() ``` -The fit is reasoable but not perfect. +The fit is reasonable but not perfect. We can improve it by increasing the grid size or reducing the error tolerance in the value function iteration routine. @@ -509,7 +509,7 @@ modification in the exercise above). ### Exercise 1 -We need to create a class to hold our primitives and return the right hand side of the bellman equation. +We need to create a class to hold our primitives and return the right hand side of the Bellman equation. 
We will use [inheritance](https://en.wikipedia.org/wiki/Inheritance_%28object-oriented_programming%29) to maximize code reuse. diff --git a/lectures/cass_koopmans_1.md b/lectures/cass_koopmans_1.md index 843804b43..1297e55f1 100644 --- a/lectures/cass_koopmans_1.md +++ b/lectures/cass_koopmans_1.md @@ -26,14 +26,14 @@ kernelspec: ## Overview -This lecture and in {doc}`Cass-Koopmans Competitive Equilibrium ` describe a model that Tjalling Koopmans {cite}`Koopmans` +This lecture and lecture {doc}`Cass-Koopmans Competitive Equilibrium ` describe a model that Tjalling Koopmans {cite}`Koopmans` and David Cass {cite}`Cass` used to analyze optimal growth. The model can be viewed as an extension of the model of Robert Solow described in [an earlier lecture](https://lectures.quantecon.org/py/python_oop.html) but adapted to make the saving rate the outcome of an optimal choice. -(Solow assumed a constant saving rate determined outside the model). +(Solow assumed a constant saving rate determined outside the model.) We describe two versions of the model, one in this lecture and the other in {doc}`Cass-Koopmans Competitive Equilibrium `. @@ -696,7 +696,7 @@ its steady state value most of the time. plot_paths(pp, 0.3, k_ss/3, [250, 150, 50, 25], k_ss=k_ss); ``` -Different colors in the above graphs are associated +Different colors in the above graphs are associated with different horizons $T$. Notice that as the horizon increases, the planner puts $K_t$ diff --git a/lectures/cass_koopmans_2.md b/lectures/cass_koopmans_2.md index 2ac409a66..b5258c1cc 100644 --- a/lectures/cass_koopmans_2.md +++ b/lectures/cass_koopmans_2.md @@ -397,7 +397,7 @@ verify** approach. In this lecture {doc}`Cass-Koopmans Planning Model `, we computed an allocation $\{\vec{C}, \vec{K}, \vec{N}\}$ that solves the planning problem. -(This allocation will constitute the **Big** $K$ to be in the presence instance of the *Big** $K$ **, little** $k$ trick +(This allocation will constitute the **Big** $K$ to be in the present instance of the *Big** $K$ **, little** $k$ trick that we'll apply to a competitive equilibrium in the spirit of [this lecture](https://lectures.quantecon.org/py/rational_expectations.html#) and [this lecture](https://lectures.quantecon.org/py/dyn_stack.html#).) @@ -597,7 +597,7 @@ representative household living in a competitive equilibrium. We now turn to the problem faced by a firm in a competitive equilibrium: -If we plug in {eq}`eq-pl` into {eq}`Zero-profits` for all t, we +If we plug {eq}`eq-pl` into {eq}`Zero-profits` for all t, we get $$ diff --git a/lectures/finite_markov.md b/lectures/finite_markov.md index 6d33d8dd2..6058ca908 100644 --- a/lectures/finite_markov.md +++ b/lectures/finite_markov.md @@ -603,7 +603,7 @@ We'll come back to this a bit later. ### Aperiodicity -Loosely speaking, a Markov chain is called periodic if it cycles in a predictible way, and aperiodic otherwise. +Loosely speaking, a Markov chain is called periodic if it cycles in a predictable way, and aperiodic otherwise. Here's a trivial example with three states @@ -771,7 +771,7 @@ with the unit eigenvalue $\lambda = 1$. A more stable and sophisticated algorithm is implemented in [QuantEcon.py](http://quantecon.org/quantecon-py). -This is the one we recommend you use: +This is the one we recommend you to use: ```{code-cell} python3 P = [[0.4, 0.6], @@ -1023,7 +1023,7 @@ A topic of interest for economics and many other disciplines is *ranking*. 
Let's now consider one of the most practical and important ranking problems --- the rank assigned to web pages by search engines. -(Although the problem is motivated from outside of economics, there is in fact a deep connection between search ranking systems and prices in certain competitive equilibria --- see {cite}`DLP2013`) +(Although the problem is motivated from outside of economics, there is in fact a deep connection between search ranking systems and prices in certain competitive equilibria --- see {cite}`DLP2013`.) To understand the issue, consider the set of results returned by a query to a web search engine. diff --git a/lectures/heavy_tails.md b/lectures/heavy_tails.md index a1e35c461..9ed870eae 100644 --- a/lectures/heavy_tails.md +++ b/lectures/heavy_tails.md @@ -186,7 +186,7 @@ where $\mu := \mathbb E X_i = \int x F(x)$ is the common mean of the sample. The condition $\mathbb E | X_i | = \int |x| F(x) < \infty$ holds in most cases but can fail if the distribution $F$ is very heavy tailed. -For example, it fails for the Cauchy distribution +For example, it fails for the Cauchy distribution. Let's have a look at the behavior of the sample mean in this case, and see whether or not the LLN is still valid. @@ -590,7 +590,7 @@ $$ 2^{1/\alpha} = \exp(\mu) $$ -which we solve for $\mu$ and $\sigma$ given $\alpha = 1.05$ +which we solve for $\mu$ and $\sigma$ given $\alpha = 1.05$. Here is code that generates the two samples, produces the violin plot and prints the mean and standard deviation of the two samples. diff --git a/lectures/ifp.md b/lectures/ifp.md index 0ec789c33..429759d72 100644 --- a/lectures/ifp.md +++ b/lectures/ifp.md @@ -48,7 +48,7 @@ model ` and yet differs in important ways. For example, the choice problem for the agent includes an additive income term that leads to an occasionally binding constraint. -Moreover, in this and the following lectures, we will inject more realisitic +Moreover, in this and the following lectures, we will inject more realistic features such as correlated shocks. To solve the model we will use Euler equation based time iteration, which proved @@ -194,7 +194,7 @@ strict inequality $u' (c_t) > \beta R \, \mathbb{E}_t u'(c_{t+1})$ can occur because $c_t$ cannot increase sufficiently to attain equality. (The lower boundary case $c_t = 0$ never arises at the optimum because -$u'(0) = \infty$) +$u'(0) = \infty$.) With some thought, one can show that {eq}`ee00` and {eq}`ee01` are equivalent to @@ -409,8 +409,7 @@ Next we provide a function to compute the difference ```{math} :label: euler_diff_eq -u'(c) -- \max \left\{ +u'(c) - \max \left\{ \beta R \, \mathbb E_z (u' \circ \sigma) \, [R (a - c) + \hat Y, \, \hat Z] \, , \; @@ -629,7 +628,7 @@ shocks. Your task is to investigate how this measure of aggregate capital varies with the interest rate. -Following tradition, put the price (i.e., interest rate) is on the vertical axis. +Following tradition, put the price (i.e., interest rate) on the vertical axis. On the horizontal axis put aggregate capital, computed as the mean of the stationary distribution given the interest rate. 
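(The code sketch below is an editorial illustration of the plotting pattern this exercise asks for, not part of the lecture source: `mean_assets` is a placeholder you would replace with code that solves the household problem at interest rate $r$ and returns the mean of the stationary asset distribution, and the curve it returns here is made up purely so the figure can be drawn.)

```python
import numpy as np
import matplotlib.pyplot as plt

def mean_assets(r):
    # Placeholder: replace with code that solves the household problem at
    # interest rate r and returns the mean of the stationary asset
    # distribution.  The curve below is made up, for illustration only.
    return np.exp(60 * r) - 1

r_vals = np.linspace(0, 0.04, 20)
k_vals = [mean_assets(r) for r in r_vals]

fig, ax = plt.subplots()
ax.plot(k_vals, r_vals)            # aggregate capital on the horizontal axis
ax.set_xlabel('aggregate capital')
ax.set_ylabel('interest rate')     # the price goes on the vertical axis
plt.show()
```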
diff --git a/lectures/ifp_advanced.md b/lectures/ifp_advanced.md index e9587a3c7..bfc5b434d 100644 --- a/lectures/ifp_advanced.md +++ b/lectures/ifp_advanced.md @@ -250,7 +250,7 @@ It can be shown that We now have a clear path to successfully approximating the optimal policy: choose some $\sigma \in \mathscr C$ and then iterate with $K$ until -convergence (as measured by the distance $\rho$) +convergence (as measured by the distance $\rho$). ### Using an Endogenous Grid @@ -325,7 +325,7 @@ $$ L(z, \hat z) := P(z, \hat z) \int R(\hat z, x) \phi(x) dx $$ -This indentity is proved in {cite}`ma2020income`, where $\phi$ is the +This identity is proved in {cite}`ma2020income`, where $\phi$ is the density of the innovation $\zeta_t$ to returns on assets. (Remember that $\mathsf Z$ is a finite set, so this expression defines a matrix.) @@ -618,7 +618,7 @@ For example, we will pass in the solutions `a_star, σ_star` along with `ifp`, even though it would be more natural to just pass in `ifp` and then solve inside the function. -The reason we do this is because `solve_model_time_iter` is not +The reason we do this is that `solve_model_time_iter` is not JIT-compiled. ```{code-cell} python3 diff --git a/lectures/inventory_dynamics.md b/lectures/inventory_dynamics.md index e8afcbc60..f1ee5f7c7 100644 --- a/lectures/inventory_dynamics.md +++ b/lectures/inventory_dynamics.md @@ -34,7 +34,7 @@ follow so-called s-S inventory dynamics. Such firms 1. wait until inventory falls below some level $s$ and then -1. order sufficent quantities to bring their inventory back up to capacity $S$. +1. order sufficient quantities to bring their inventory back up to capacity $S$. These kinds of policies are common in practice and also optimal in certain circumstances. @@ -176,7 +176,7 @@ fixed $T$. We will do this by generating many draws of $X_T$ given initial condition $X_0$. -With these draws of $X_T$ we can build up a picture of its distribution $\psi_T$ +With these draws of $X_T$ we can build up a picture of its distribution $\psi_T$. Here's one visualization, with $T=50$. diff --git a/lectures/jv.md b/lectures/jv.md index b4852beee..6ad4f5ee7 100644 --- a/lectures/jv.md +++ b/lectures/jv.md @@ -223,7 +223,7 @@ class JVWorker: ``` The function `operator_factory` takes an instance of this class and returns a -jitted version of the Bellman operator `T`, ie. +jitted version of the Bellman operator `T`, i.e. $$ Tv(x) diff --git a/lectures/kalman.md b/lectures/kalman.md index f45319d0e..a36d2e2cc 100644 --- a/lectures/kalman.md +++ b/lectures/kalman.md @@ -499,7 +499,7 @@ Conditions under which a fixed point exists and the sequence $\{\Sigma_t\}$ conv A sufficient (but not necessary) condition is that all the eigenvalues $\lambda_i$ of $A$ satisfy $|\lambda_i| < 1$ (cf. e.g., {cite}`AndersonMoore2005`, p. 77). -(This strong condition assures that the unconditional distribution of $x_t$ converges as $t \rightarrow + \infty$) +(This strong condition assures that the unconditional distribution of $x_t$ converges as $t \rightarrow + \infty$.) In this case, for any initial choice of $\Sigma_0$ that is both non-negative and symmetric, the sequence $\{\Sigma_t\}$ in {eq}`kalman_sdy` converges to a non-negative symmetric matrix $\Sigma$ that solves {eq}`kalman_dare`. 
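(The convergence claim above is easy to check numerically. The sketch below is an editorial illustration rather than lecture source: it assumes the recursion in {eq}`kalman_sdy` is the standard predictive-covariance Riccati iteration $\Sigma_{t+1} = A \Sigma_t A' - A \Sigma_t G' (G \Sigma_t G' + R)^{-1} G \Sigma_t A' + C C'$, and the matrices $A, G, C, R$ are made-up examples with the eigenvalues of $A$ inside the unit circle.)

```python
import numpy as np

# Made-up example: the eigenvalues of A (0.8 and 0.5) lie inside the unit circle
A = np.array([[0.8, 0.1],
              [0.0, 0.5]])
G = np.array([[1.0, 0.0]])          # noisy observation of the first state only
C = np.array([[0.5], [0.3]])
R = np.array([[0.25]])

Σ = np.zeros((2, 2))                # any non-negative symmetric Σ_0 will do
for t in range(1, 1001):
    S = G @ Σ @ G.T + R
    Σ_next = A @ Σ @ A.T - A @ Σ @ G.T @ np.linalg.solve(S, G @ Σ @ A.T) + C @ C.T
    if np.max(np.abs(Σ_next - Σ)) < 1e-12:
        break
    Σ = Σ_next

print(f"converged after {t} iterations to\n{Σ_next}")
```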
diff --git a/lectures/kesten_processes.md b/lectures/kesten_processes.md index 42dd64db6..7297ea0c3 100644 --- a/lectures/kesten_processes.md +++ b/lectures/kesten_processes.md @@ -496,7 +496,7 @@ s_{t+1} = e_{t+1} \mathbb{1}\{s_t < \bar s\} + Here -* the state variable $s_t$ is represents productivity (which is a proxy +* the state variable $s_t$ represents productivity (which is a proxy for output and hence firm size), * the IID sequence $\{ e_t \}$ is thought of as a productivity draw for a new entrant and diff --git a/lectures/likelihood_ratio_process.md b/lectures/likelihood_ratio_process.md index 2d6edfb6a..78e23d20f 100644 --- a/lectures/likelihood_ratio_process.md +++ b/lectures/likelihood_ratio_process.md @@ -254,7 +254,7 @@ But it would be too challenging for us to that here simply by applying a standa The reason is that the distribution of $L\left(w^{t}\right)$ is extremely skewed for large values of $t$. -Because the probabilty density in the right tail is close to $0$, it just takes too much computer time to sample enough points from the right tail. +Because the probability density in the right tail is close to $0$, it just takes too much computer time to sample enough points from the right tail. Instead, the following code just illustrates that the unconditional means of $l(w_t)$ are $1$. @@ -498,7 +498,7 @@ Notice that as $t$ increases, we are assured a larger probability of detection and a smaller probability of false alarm associated with a given discrimination threshold $c$. -As $t \rightarrow + \infty$, we approach the the perfect detection +As $t \rightarrow + \infty$, we approach the perfect detection curve that is indicated by a right angle hinging on the green dot. For a given sample size $t$, a value discrimination threshold $c$ determines a point on the receiver operating diff --git a/lectures/linear_algebra.md b/lectures/linear_algebra.md index a03f8626e..57f4c4904 100644 --- a/lectures/linear_algebra.md +++ b/lectures/linear_algebra.md @@ -1290,7 +1290,7 @@ $$ (Q + B'PB)u + B'PAx = 0 $$ -which is the first-order condition for maximizing L w.r.t. u. +which is the first-order condition for maximizing $L$ w.r.t. $u$. Thus, the optimal choice of u must satisfy diff --git a/lectures/lln_clt.md b/lectures/lln_clt.md index dcfe39265..06d57420e 100644 --- a/lectures/lln_clt.md +++ b/lectures/lln_clt.md @@ -385,7 +385,7 @@ To this end, we now perform the following simulation Here's some code that does exactly this for the exponential distribution $F(x) = 1 - e^{- \lambda x}$. -(Please experiment with other choices of $F$, but remember that, to conform with the conditions of the CLT, the distribution must have a finite second moment) +(Please experiment with other choices of $F$, but remember that, to conform with the conditions of the CLT, the distribution must have a finite second moment.) (sim_one)= ```{code-cell} python3 @@ -437,7 +437,7 @@ random variable, the distribution of $Y_n$ will smooth out into a bell-shaped cu The next figure shows this process for $X_i \sim f$, where $f$ was specified as the convex combination of three different beta densities. -(Taking a convex combination is an easy way to produce an irregular shape for $f$) +(Taking a convex combination is an easy way to produce an irregular shape for $f$.) 
In the figure, the closest density is that of $Y_1$, while the furthest is that of $Y_5$ @@ -650,7 +650,7 @@ n \to \infty This theorem is used frequently in statistics to obtain the asymptotic distribution of estimators --- many of which can be expressed as functions of sample means. -(These kinds of results are often said to use the "delta method") +(These kinds of results are often said to use the "delta method".) The proof is based on a Taylor expansion of $g$ around the point $\mu$. @@ -741,7 +741,7 @@ n \| \mathbf Q ( \bar{\mathbf X}_n - \boldsymbol \mu ) \|^2 where $\chi^2(k)$ is the chi-squared distribution with $k$ degrees of freedom. -(Recall that $k$ is the dimension of $\mathbf X_i$, the underlying random vectors) +(Recall that $k$ is the dimension of $\mathbf X_i$, the underlying random vectors.) Your second exercise is to illustrate the convergence in {eq}`lln_ctc` with a simulation. diff --git a/lectures/mccall_fitted_vfi.md b/lectures/mccall_fitted_vfi.md index f6bc2137b..0f3d6c6aa 100644 --- a/lectures/mccall_fitted_vfi.md +++ b/lectures/mccall_fitted_vfi.md @@ -320,7 +320,7 @@ The exercises ask you to explore the solution and how it changes with parameters Use the code above to explore what happens to the reservation wage when the wage parameter $\mu$ changes. -Use the default parameters and $\mu$ in `mu_vals = np.linspace(0.0, 2.0, 15)` +Use the default parameters and $\mu$ in `mu_vals = np.linspace(0.0, 2.0, 15)`. Is the impact on the reservation wage as you expected? @@ -338,7 +338,7 @@ support. Use `s_vals = np.linspace(1.0, 2.0, 15)` and `m = 2.0`. -State how you expect the reservation wage vary with $s$. +State how you expect the reservation wage to vary with $s$. Now compute it. Is this as you expected? diff --git a/lectures/mccall_model.md b/lectures/mccall_model.md index a5e916ac6..59288f041 100644 --- a/lectures/mccall_model.md +++ b/lectures/mccall_model.md @@ -296,7 +296,7 @@ Step 4: if the deviation is larger than some fixed tolerance, set $v = v'$ and g Step 5: return $v$. -Let $\{ v_k \}$ denote the sequence genererated by this algorithm. +Let $\{ v_k \}$ denote the sequence generated by this algorithm. This sequence converges to the solution to {eq}`odu_pv2` as $k \to \infty$, which is the value function $v^*$. @@ -321,7 +321,7 @@ itself via ``` (A new vector $Tv$ is obtained from given vector $v$ by evaluating -the r.h.s. at each $i$) +the r.h.s. at each $i$.) The element $v_k$ in the sequence $\{v_k\}$ of successive approximations corresponds to $T^k v$. diff --git a/lectures/mccall_model_with_separation.md b/lectures/mccall_model_with_separation.md index 73237ffdb..06b0428af 100644 --- a/lectures/mccall_model_with_separation.md +++ b/lectures/mccall_model_with_separation.md @@ -124,7 +124,7 @@ If he rejects, then he receives unemployment compensation $c$. The process then repeats. -(Note: we do not allow for job search while employed---this topic is taken up in a {doc}`later lecture `) +(Note: we do not allow for job search while employed---this topic is taken up in a {doc}`later lecture `.) ## Solving the Model diff --git a/lectures/multi_hyper.md b/lectures/multi_hyper.md index 161804f3f..ca274ed2f 100644 --- a/lectures/multi_hyper.md +++ b/lectures/multi_hyper.md @@ -77,7 +77,7 @@ $$ To evaluate whether the selection procedure is **color blind** the administrator wants to study whether the particular realization of $X$ drawn can plausibly be said to be a random draw from the probability distribution that is implied by the **color blind** hypothesis. 
-The appropriate probability distribution is the one described [here](https://en.wikipedia.org/wiki/Hypergeometric_distribution) +The appropriate probability distribution is the one described [here](https://en.wikipedia.org/wiki/Hypergeometric_distribution). Let's now instantiate the administrator's problem, while continuing to use the colored balls metaphor. diff --git a/lectures/multivariate_normal.md b/lectures/multivariate_normal.md index d9b660d32..d4a4a3d6d 100644 --- a/lectures/multivariate_normal.md +++ b/lectures/multivariate_normal.md @@ -37,11 +37,12 @@ In this lecture, you will learn formulas for We will use the multivariate normal distribution to formulate some classic models: -* a **factor analytic model** on an intelligence quotient, i.e., IQ -* a **factor analytic model** or two independent inherent abilities, mathematical and verbal. +* a **factor analytic model** of an intelligence quotient, i.e., IQ +* a **factor analytic model** of two independent inherent abilities, mathematical and verbal. * a more general factor analytic model * PCA as an approximation to a factor analytic model * time series generated by linear stochastic difference equations +* optimal linear filtering theory ## The Multivariate Normal Distribution @@ -429,7 +430,7 @@ one-dimensional measure of intelligence called IQ from a list of test scores. The $i$th test score $y_i$ equals the sum of an unknown -scalar IQ $\theta$ and a random variables $w_{i}$. +scalar IQ $\theta$ and a random variable $w_{i}$. $$ y_{i} = \theta + \sigma_y w_i, \quad i=1,\dots, n @@ -636,7 +637,7 @@ plt.show() ``` The solid blue line in the plot above shows $\hat{\mu}_{\theta}$ -as function of the number of test scores that we have recorded and +as a function of the number of test scores that we have recorded and conditioned on. The blue area shows the span that comes from adding or deducing @@ -717,8 +718,8 @@ Then we can write \theta = \mu_{\theta} + c_1 \epsilon_1 + c_2 \epsilon_2 + \dots + c_n \epsilon_n + c_{n+1} \epsilon_{n+1} ``` -The mutual orthogonality of the $\epsilon_i$’s provides us an -informative way to interpret them in light of equation (1). +The mutual orthogonality of the $\epsilon_i$’s provides us with an +informative way to interpret them in light of equation {eq}`mnv_1`. Thus, relative to what is known from tests $i=1, \ldots, n-1$, $c_i \epsilon_i$ is the amount of **new information** about @@ -727,7 +728,7 @@ $\theta$ brought by the test number $i$. Here **new information** means **surprise** or what could not be predicted from earlier information. -Formula (1) also provides us with an enlightening way to express +Formula {eq}`mnv_1` also provides us with an enlightening way to express conditional means and conditional variances that we computed earlier. In particular, @@ -780,9 +781,9 @@ Evidently, the Cholesky factorization is automatically computing the population **regression coefficients** and associated statistics that are produced by our `MultivariateNormal` class. -And they are doing it **recursively**. +The Cholesky factorization is computing things **recursively**. -Indeed, in formula (1), +Indeed, in formula {eq}`mnv_1`, - the random variable $c_i \epsilon_i$ is information about $\theta$ that is not contained by the information in @@ -1527,7 +1528,7 @@ B @ y ``` The fraction of variance in $y_{t}$ explained by the first two -principal component can be computed as below. +principal components can be computed as below. 
```{code-cell} python3 𝜆_tilde[:2].sum() / 𝜆_tilde.sum() @@ -1855,3 +1856,205 @@ be if people did not have perfect foresight but were optimally predicting future dividends on the basis of the information $y_t, y_{t-1}$ at time $t$. +## Filtering Foundations + +Assume that $x_0$ is an $n \times 1$ random vector and that +$y_0$ is a $p \times 1$ random vector determined by the +*observation equation* + +$$ +y_0 = G x_0 + v_0 , \quad x_0 \sim {\mathcal N}(\hat x_0, \Sigma_0), \quad v_0 \sim {\mathcal N}(0, R) +$$ + +where $v_0$ is orthogonal to $x_0$, $G$ is a +$p \times n$ matrix, and $R$ is a $p \times p$ +positive definite matrix. + +We consider the problem of someone who *observes* $y_0$, who does +not observe $x_0$, who knows $\hat x_0, \Sigma_0, G, R$ – +and therefore knows the joint probability distribution of the vector +$\begin{bmatrix} x_0 \cr y_0 \end{bmatrix}$ – and who wants to +infer $x_0$ from $y_0$ in light of what he knows about that +joint probability distribution. + +Therefore, the person wants to construct the probability distribution of +$x_0$ conditional on the random vector $y_0$. + +The joint distribution of +$\begin{bmatrix} x_0 \cr y_0 \end{bmatrix}$ is multivariate normal +${\mathcal N}(\mu, \Sigma)$ with + +$$ +\mu = \begin{bmatrix} \hat x_0 \cr G \hat x_0 \end{bmatrix} , \quad + \Sigma = \begin{bmatrix} \Sigma_0 & \Sigma_0 G' \cr + G \Sigma_0 & G \Sigma_0 G' + R \end{bmatrix} +$$ + +By applying an appropriate instance of the above formulas for the mean vector $\hat \mu_1$ and covariance matrix +$\hat \Sigma_{11}$ of $z_1$ conditional on $z_2$, we find that the probability distribution of +$x_0$ conditional on $y_0$ is +${\mathcal N}(\tilde x_0, \tilde \Sigma_0)$ where + +$$ +\begin{aligned} \beta_0 & = \Sigma_0 G' (G \Sigma_0 G' + R)^{-1} \cr +\tilde x_0 & = \hat x_0 + \beta_0 ( y_0 - G \hat x_0) \cr + \tilde \Sigma_0 & = \Sigma_0 - \Sigma_0 G' (G \Sigma_0 G' + R)^{-1} G \Sigma_0 + \end{aligned} +$$ + +### Step toward dynamics + +Now suppose that we are in a time series setting and that we have the +one-step state transition equation + +$$ +x_1 = A x_0 + C w_1 , \quad w_1 \sim {\mathcal N}(0, I ) +$$ + +where $A$ is an $n \times n$ matrix and $C$ is an +$n \times m$ matrix. + +It follows that the probability distribution of $x_1$ conditional +on $y_0$ is + +$$ +x_1 | y_0 \sim {\mathcal N}(A \tilde x_0 , A \tilde \Sigma_0 A' + C C' ) +$$ + +Define + +$$ +\begin{aligned} \hat x_1 & = A \tilde x_0 \cr + \Sigma_1 & = A \tilde \Sigma_0 A' + C C' +\end{aligned} +$$ + +### Dynamic version + +Suppose now that for $t \geq 0$, +$\{x_{t+1}, y_t\}_{t=0}^\infty$ are governed by the equations + +$$ +\begin{aligned} +x_{t+1} & = A x_t + C w_{t+1} \cr +y_t & = G x_t + v_t +\end{aligned} +$$ + +where as before $x_0 \sim {\mathcal N}(\hat x_0, \Sigma_0)$, +$w_{t+1}$ is the $t+1$th component of an i.i.d. stochastic +process distributed as $w_{t+1} \sim {\mathcal N}(0, I)$, and +$v_t$ is the $t$th component of an i.i.d. process +distributed as $v_t \sim {\mathcal N}(0, R)$ and the +$\{w_{t+1}\}_{t=0}^\infty$ and $\{v_t\}_{t=0}^\infty$ +processes are orthogonal at all pairs of dates. 
+ +The logic and +formulas that we applied above imply that the probability distribution +of $x_t$ conditional on +$y_0, y_1, \ldots , y_{t-1} = y^{t-1}$ is + +$$ +x_t | y^{t-1} \sim {\mathcal N}(A \tilde x_t , A \tilde \Sigma_t A' + C C' ) +$$ + +where $\{\tilde x_t, \tilde \Sigma_t\}_{t=1}^\infty$ can be +computed by iterating on the following equations starting from +$t=1$ and initial conditions for +$\tilde x_0, \tilde \Sigma_0$ computed as we have above: + +$$ +\begin{aligned} \Sigma_t & = A \tilde \Sigma_{t-1} A' + C C' \cr + \hat x_t & = A \tilde x_{t-1} \cr +\beta_t & = \Sigma_t G' (G \Sigma_t G' + R)^{-1} \cr +\tilde x_t & = \hat x_t + \beta_t ( y_t - G \hat x_t) \cr + \tilde \Sigma_t & = \Sigma_t - \Sigma_t G' (G \Sigma_t G' + R)^{-1} G \Sigma_t + \end{aligned} +$$ + +We can use the Python class *MultivariateNormal* to construct examples. + +Here is an example for a single period problem at time $0$ + +```{code-cell} python3 +G = np.array([[1., 3.]]) +R = np.array([[1.]]) + +x0_hat = np.array([0., 1.]) +Σ0 = np.array([[1., .5], [.3, 2.]]) + +μ = np.hstack([x0_hat, G @ x0_hat]) +Σ = np.block([[Σ0, Σ0 @ G.T], [G @ Σ0, G @ Σ0 @ G.T + R]]) +``` + +```{code-cell} python3 +# construction of the multivariate normal instance +multi_normal = MultivariateNormal(μ, Σ) +``` + +```{code-cell} python3 +multi_normal.partition(2) +``` + +```{code-cell} python3 +# the observation of y +y0 = 2.3 + +# conditional distribution of x0 +μ1_hat, Σ11 = multi_normal.cond_dist(0, y0) +μ1_hat, Σ11 +``` + +```{code-cell} python3 +A = np.array([[0.5, 0.2], [-0.1, 0.3]]) +C = np.array([[2.], [1.]]) + +# conditional distribution of x1 +x1_cond = A @ μ1_hat +Σ1_cond = C @ C.T + A @ Σ11 @ A.T +x1_cond, Σ1_cond +``` + +### Code for Iterating + +Here is code for solving a dynamic filtering problem by iterating on our +equations, followed by an example. + +```{code-cell} python3 +def iterate(x0_hat, Σ0, A, C, G, R, y_seq): + + p, n = G.shape + + T = len(y_seq) + x_hat_seq = np.empty((T+1, n)) + Σ_hat_seq = np.empty((T+1, n, n)) + + x_hat_seq[0] = x0_hat + Σ_hat_seq[0] = Σ0 + + for t in range(T): + xt_hat = x_hat_seq[t] + Σt = Σ_hat_seq[t] + μ = np.hstack([xt_hat, G @ xt_hat]) + Σ = np.block([[Σt, Σt @ G.T], [G @ Σt, G @ Σt @ G.T + R]]) + + # filtering + multi_normal = MultivariateNormal(μ, Σ) + multi_normal.partition(n) + x_tilde, Σ_tilde = multi_normal.cond_dist(0, y_seq[t]) + + # forecasting + x_hat_seq[t+1] = A @ x_tilde + Σ_hat_seq[t+1] = C @ C.T + A @ Σ_tilde @ A.T + + return x_hat_seq, Σ_hat_seq +``` + +```{code-cell} python3 +iterate(x0_hat, Σ0, A, C, G, R, [2.3, 1.2, 3.2]) +``` + +The iterative algorithm just described is a version of the celebrated **Kalman filter**. + +We describe the Kalman filter and some applications of it in {doc}`A First Look at the Kalman Filter ` + diff --git a/lectures/optgrowth.md b/lectures/optgrowth.md index 1afa7bf19..8da671358 100644 --- a/lectures/optgrowth.md +++ b/lectures/optgrowth.md @@ -771,7 +771,7 @@ utility specification. Setting $\gamma = 1.5$, compute and plot an estimate of the optimal policy. -Time how long this function takes to run, so you can compare it to faster code developed in the {doc}`next lecture ` +Time how long this function takes to run, so you can compare it to faster code developed in the {doc}`next lecture `. 
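(For the timing part of the exercise, the generic helper sketched below is one option; it is an editorial illustration, and the dummy workload should be replaced by the call that actually computes the policy. In a Jupyter notebook the `%%time` cell magic does the same job.)

```python
from time import perf_counter

def time_call(f, *args, **kwargs):
    """Run f once and return (result, elapsed time in seconds)."""
    t0 = perf_counter()
    result = f(*args, **kwargs)
    return result, perf_counter() - t0

# Replace this dummy workload with the call that computes the optimal policy
result, elapsed = time_call(lambda: sum(i * i for i in range(10**6)))
print(f"elapsed: {elapsed:.3f} seconds")
```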
(og_ex2)= ### Exercise 2 diff --git a/lectures/samuelson.md b/lectures/samuelson.md index 391f677e2..90e2bd354 100644 --- a/lectures/samuelson.md +++ b/lectures/samuelson.md @@ -76,11 +76,12 @@ represent a model of national output based on three components: $t$ equals a constant called the *accelerator coefficient* times the difference in output between period $t-1$ and $t-2$. -- the idea that consumption plus investment plus government purchases - constitute *aggregate demand,* which automatically calls forth an - equal amount of *aggregate supply*. -(To read about linear difference equations see [here](https://en.wikipedia.org/wiki/Linear_difference_equation) or chapter IX of {cite}`Sargent1987`) +Consumption plus investment plus government purchases +constitute *aggregate demand,* which automatically calls forth an +equal amount of *aggregate supply*. + +(To read about linear difference equations see [here](https://en.wikipedia.org/wiki/Linear_difference_equation) or chapter IX of {cite}`Sargent1987`.) Samuelson used the model to analyze how particular values of the marginal propensity to consume and the accelerator coefficient might @@ -102,7 +103,7 @@ This modification makes national output become governed by a second-order gives rise to recurrent irregular business cycles. (To read about stochastic linear difference equations see chapter XI of -{cite}`Sargent1987`) +{cite}`Sargent1987`.) ## Details @@ -284,7 +285,7 @@ $$ $$ (To read about the polar form, see -[here](https://www.varsitytutors.com/hotmath/hotmath_help/topics/polar-form-of-a-complex-number)) +[here](https://www.khanacademy.org/math/precalculus/x9e81a4f98389efdf:complex/x9e81a4f98389efdf:complex-mul-div-polar/a/complex-number-polar-form-review)) Given **initial conditions** $Y_{-1}, Y_{-2}$, we want to generate a **solution** of the difference equation {eq}`second_stochastic2`. diff --git a/lectures/short_path.md b/lectures/short_path.md index d3639a9ef..424dbcb88 100644 --- a/lectures/short_path.md +++ b/lectures/short_path.md @@ -67,7 +67,7 @@ We wish to travel from node (vertex) A to node G at minimum cost * Arrows (edges) indicate the movements we can take. * Numbers on edges indicate the cost of traveling that edge. -(Graphs such as the one above are called **weighted directed graphs**) +(Graphs such as the one above are called **weighted `directed graphs `_**.) Possible interpretations of the graph include diff --git a/lectures/sir_model.md b/lectures/sir_model.md index 907076ef6..ee493cc84 100644 --- a/lectures/sir_model.md +++ b/lectures/sir_model.md @@ -276,7 +276,7 @@ As expected, lower effective transmission rates defer the peak of infections. They also lead to a lower peak in current cases. -Here is cumulative cases, as a fraction of population: +Here are cumulative cases, as a fraction of population: ```{code-cell} ipython3 plot_paths(c_paths, labels) @@ -333,7 +333,7 @@ for η in η_vals: c_paths.append(c_path) ``` -This is current cases under the different scenarios: +These are current cases under the different scenarios: ```{code-cell} ipython3 plot_paths(i_paths, labels) diff --git a/lectures/time_series_with_matrices.md b/lectures/time_series_with_matrices.md index 09011da8f..ec3bf5ce2 100644 --- a/lectures/time_series_with_matrices.md +++ b/lectures/time_series_with_matrices.md @@ -75,7 +75,7 @@ But actually, it is a collection of $T$ simultaneous linear equations in the $T$ variables $y_1, y_2, \ldots, y_T$. 
**Note:** To be able to solve a second-order linear difference -equations, we require two **boundary conditions** that can take the form +equation, we require two **boundary conditions** that can take the form either of two **initial conditions** or two **terminal conditions** or possibly one of each. @@ -219,7 +219,7 @@ plt.show() ## Adding a random term To generate some excitement, we'll follow in the spirit of the great economists -Eugen Slusky and Ragnar Frisch and replace our original second-order difference +Eugen Slutsky and Ragnar Frisch and replace our original second-order difference equation with the following **second-order stochastic linear difference equation**: diff --git a/lectures/wald_friedman.md b/lectures/wald_friedman.md index e57fbae3f..2b50109b5 100644 --- a/lectures/wald_friedman.md +++ b/lectures/wald_friedman.md @@ -745,7 +745,7 @@ wf = WaldFriedman(c=2.5) simulation_plot(wf) ``` -Increased cost per draw has induced the decision-maker to take less draws before deciding. +Increased cost per draw has induced the decision-maker to take fewer draws before deciding. Because he decides with less, the percentage of time he is correct drops. @@ -940,3 +940,4 @@ We'll dig deeper into some of the ideas used here in the following lectures: * {doc}`this lecture ` discusses the role of likelihood ratio processes in **Bayesian learning** * {doc}`this lecture ` returns to the subject of this lecture and studies whether the Captain's hunch that the (frequentist) decision rule that the Navy had ordered him to use can be expected to be better or worse than the rule sequential rule that Abraham Wald designed + diff --git a/lectures/wealth_dynamics.md b/lectures/wealth_dynamics.md index a84fbcd54..aedf88544 100644 --- a/lectures/wealth_dynamics.md +++ b/lectures/wealth_dynamics.md @@ -545,7 +545,7 @@ For the values of the tail index, use `a_vals = np.linspace(1, 10, 25)`. Use sample of size 1,000 for each $a$ and the sampling method for generating Pareto draws employed in the discussion of Lorenz curves for the Pareto distribution. -To the extend that you can, interpret the monotone relationship between the +To the extent that you can, interpret the monotone relationship between the Gini index and $a$. ### Exercise 2 From 634a14aaaabc63642ba7d77157c5e6e2253cf228 Mon Sep 17 00:00:00 2001 From: mmcky Date: Wed, 6 Jan 2021 16:48:57 +1100 Subject: [PATCH 2/2] fix end of file space issue --- lectures/wald_friedman.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/lectures/wald_friedman.md b/lectures/wald_friedman.md index 2b50109b5..49f073a24 100644 --- a/lectures/wald_friedman.md +++ b/lectures/wald_friedman.md @@ -939,5 +939,4 @@ We'll dig deeper into some of the ideas used here in the following lectures: * {doc}`this lecture ` describes **likelihood ratio processes** and their role in frequentist and Bayesian statistical theories * {doc}`this lecture ` discusses the role of likelihood ratio processes in **Bayesian learning** * {doc}`this lecture ` returns to the subject of this lecture and studies whether the Captain's hunch that the (frequentist) decision rule - that the Navy had ordered him to use can be expected to be better or worse than the rule sequential rule that Abraham Wald designed - + that the Navy had ordered him to use can be expected to be better or worse than the rule sequential rule that Abraham Wald designed \ No newline at end of file