lectures/blackwell_kihlstrom.md (+1, -1)

@@ -962,7 +962,7 @@ The Blackwell order says that, absent costs, more information is always better f
 With costs, the consumer chooses quality investment $\theta$ to maximize *net value*.
 
-If quality investment translates into experiment accuracy with diminishing returns — say, accuracy $\phi(\theta) = 1 - e^{-a\theta}$ for a rate parameter $a$ — then the marginal value of information eventually decreases in $\theta$.
+If quality investment translates into experiment accuracy with diminishing returns -- say, accuracy $\phi(\theta) = 1 - e^{-a\theta}$ for a rate parameter $a$ -- then the marginal value of information eventually decreases in $\theta$.
 
 With a convex cost $c(\theta) = c \, \theta^2$, the increasing marginal cost eventually overtakes the declining marginal value, producing an interior optimum.
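
To make the interior optimum concrete, here is a minimal numerical sketch -- not the lecture's code; it assumes the value of information equals the accuracy $\phi(\theta) = 1 - e^{-a\theta}$ and picks illustrative values $a = 2$, $c = 0.5$:

```python
import numpy as np
from scipy.optimize import minimize_scalar

a, c = 2.0, 0.5  # illustrative accuracy rate and cost parameters (assumed)

def net_value(theta):
    # value of information, proxied by accuracy 1 - e^{-a θ}, minus convex cost c θ²
    return (1 - np.exp(-a * theta)) - c * theta**2

res = minimize_scalar(lambda th: -net_value(th), bounds=(0.0, 5.0), method="bounded")
print(f"interior optimum: theta* = {res.x:.3f}, net value = {net_value(res.x):.3f}")
```

The first-order condition $a e^{-a\theta} = 2c\theta$ pins down the same $\theta^*$: declining marginal value meets rising marginal cost.
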
lectures/chow_business_cycles.md (+3, -3)

@@ -351,9 +351,9 @@ The second equation is the discrete Lyapunov equation for $\Gamma_0$.
 > But in reality the cycles ... are generally not damped.
 > How can the maintenance of the swings be explained?
 > ... One way which I believe is particularly fruitful and promising is to study what would become of the solution of a determinate dynamic system if it were exposed to a stream of erratic shocks ...
-> Thus, by connecting the two ideas: (1) the continuous solution of a determinate dynamic system and (2) the discontinuous shocks intervening and supplying the energy that may maintain the swings—we get a theoretical setup which seems to furnish a rational interpretation of those movements which we have been accustomed to see in our statistical time data.
+> Thus, by connecting the two ideas: (1) the continuous solution of a determinate dynamic system and (2) the discontinuous shocks intervening and supplying the energy that may maintain the swings--we get a theoretical setup which seems to furnish a rational interpretation of those movements which we have been accustomed to see in our statistical time data.
 >
-> — Ragnar Frisch (1933) {cite}`frisch33`
+> -- Ragnar Frisch (1933) {cite}`frisch33`
 
 Chow's main insight is that oscillations in the deterministic system are *neither necessary nor sufficient* for producing "cycles" in the stochastic system.
@@ -1408,7 +1408,7 @@ plt.show()
 As $v$ increases, eigenvalues approach the unit circle: oscillations become more persistent in the time domain (left), and the spectral peak becomes sharper in the frequency domain (right).
 
-Complex roots produce a pronounced peak at interior frequencies—the spectral signature of business cycles.
+Complex roots produce a pronounced peak at interior frequencies--the spectral signature of business cycles.
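
A hedged sketch of that spectral signature, assuming a simple AR(2) whose complex roots have modulus $r$ and angle $\omega_0$ (parameter values are illustrative, not the lecture's):

```python
import numpy as np

r, omega0 = 0.9, np.pi / 3                   # illustrative modulus and angle of roots
phi1, phi2 = 2 * r * np.cos(omega0), -r**2   # AR(2): y_t = φ1 y_{t-1} + φ2 y_{t-2} + ε_t

omegas = np.linspace(0.01, np.pi, 500)
z = np.exp(-1j * omegas)
# spectral density, up to the constant σ²/(2π): 1 / |1 - φ1 z - φ2 z²|²
spectrum = 1.0 / np.abs(1 - phi1 * z - phi2 * z**2) ** 2

peak = omegas[np.argmax(spectrum)]
print(f"spectral peak at omega = {peak:.3f} (roots placed at omega0 = {omega0:.3f})")
```

Pushing $r$ toward 1 sharpens the interior peak, matching the description of the figure above.
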
lectures/hansen_singleton_1982.md (+1, -1)

@@ -225,7 +225,7 @@ The vector $z_t$ plays the role of **instruments**.
 The conditional Euler equation $E_t[M_{t+1}R_{t+1}^i - 1] = 0$ says that the pricing error is unpredictable given *everything* in the agent's time-$t$ information set.
 
-That is a very strong restriction — it says the pricing error is orthogonal to every time-$t$ measurable random variable.
+That is a very strong restriction -- it says the pricing error is orthogonal to every time-$t$ measurable random variable.
 
 We cannot use the entire information set in practice, but we can pick any finite collection of time-$t$ observable variables $z_t$ and the orthogonality must still hold.
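
A minimal sketch of how those orthogonality conditions become GMM moments -- the CRRA kernel $M_{t+1} = \beta (c_{t+1}/c_t)^{\alpha}$ and the array names are assumptions for illustration, not the lecture's code:

```python
import numpy as np

def gmm_moments(params, c_growth, R, z):
    """Sample moments (1/T) Σ_t (β g_{t+1}^α R_{t+1} - 1) z_t.

    c_growth and R are length-T arrays of gross consumption growth and gross
    returns at t+1; z is a T-by-k array of time-t instruments (for example a
    constant and lagged variables).  At the true (β, α) every entry should
    be close to zero, by the orthogonality restriction above.
    """
    beta, alpha = params
    pricing_error = beta * c_growth**alpha * R - 1.0  # M_{t+1} R_{t+1} - 1
    return pricing_error @ z / len(R)                 # one moment per instrument
```

GMM then chooses $(\beta, \alpha)$ to make a quadratic form in these sample moments as small as possible.
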
lectures/hansen_singleton_1983.md (+2, -2)

@@ -36,7 +36,7 @@ kernelspec:
 > rational expectations econometrics. A rational expectations equilibrium is a
 > likelihood function. Maximize it.
 >
-> — An Interview with Thomas J. Sargent {cite}`evans2005interview`
+> -- An Interview with Thomas J. Sargent {cite}`evans2005interview`
 
 ## Overview
@@ -1869,7 +1869,7 @@ Our estimates reproduce the pattern that {cite:t}`MehraPrescott1985` later calle
 - *Low estimated risk aversion:* The estimated $\hat\alpha$ values (and thus risk aversion $-\hat\alpha$) from the table above are similar to those in {cite:t}`hansen1983stochastic`, who report $\hat\alpha$ between $-0.32$ and $-1.25$.
 
-- *Tiny return predictability:* The unrestricted-VAR $R_R^2$ values are comparable to the 0.02 to 0.06 range in {cite:t}`hansen1983stochastic`— the predictable component of stock returns is small relative to the unpredictable component.
+- *Tiny return predictability:* The unrestricted-VAR $R_R^2$ values are comparable to the 0.02 to 0.06 range in {cite:t}`hansen1983stochastic` -- the predictable component of stock returns is small relative to the unpredictable component.
 
 - *Strong rejection for Treasury bills:* The Euler-equation restrictions are decisively rejected for the nominally risk-free Treasury bill return, just as in Table 4 of {cite:t}`hansen1983stochastic`.
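
For readers who want to see what the $R_R^2$ statistic in the changed line measures, here is a minimal sketch under assumed data -- `returns` and the lagged predictors `Z` are hypothetical arrays, not the lecture's series:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 400
Z = rng.standard_normal((T, 2))                   # lagged predictors (hypothetical)
returns = 0.1 * Z[:, 0] + rng.standard_normal(T)  # mostly unpredictable returns

X = np.column_stack([np.ones(T), Z])              # regressors with an intercept
beta_hat, *_ = np.linalg.lstsq(X, returns, rcond=None)
resid = returns - X @ beta_hat
r2 = 1 - resid.var() / returns.var()
print(f"R^2 = {r2:.3f}")  # small, in the spirit of the 0.02 to 0.06 range
```
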
lectures/inventory_q.md (+10, -10)

@@ -35,7 +35,7 @@ A firm must decide how much stock to order each period, facing uncertain demand
 We approach the problem in two ways.
 
 First, we solve it exactly using dynamic programming, assuming full knowledge of
-the model — the demand distribution, cost parameters, and transition dynamics.
+the model -- the demand distribution, cost parameters, and transition dynamics.
 
 Second, we show how a manager can learn the optimal policy from experience alone, using [Q-learning](https://en.wikipedia.org/wiki/Q-learning).
@@ -475,15 +475,15 @@ All the manager needs to observe at each step is:
 4. the discount factor $\beta$, which is determined by the interest rate, and
 5. the next inventory level $X_{t+1}$ (which they can read off the warehouse).
 
-These are all directly observable quantities — no model knowledge is required.
+These are all directly observable quantities -- no model knowledge is required.
 
 
 ### The Q-table and the role of the max
 
 It is important to understand how the update rule relates to the manager's
 actions.
 
-The manager maintains a **Q-table**— a lookup table storing an estimate $q_t(x,
+The manager maintains a **Q-table**-- a lookup table storing an estimate $q_t(x,
 a)$ for every state-action pair $(x, a)$.
 
 At each step, the manager is in some state $x$ and must choose a specific action
@@ -492,7 +492,7 @@ and next state $X_{t+1}$, and updates *that one entry* $q_t(x, a)$ of the
 table using the rule above.
 
 It is tempting to read the $\max_{a'}$ in the update rule as prescribing the
-manager's next action — that is, to interpret the update as saying "move to
+manager's next action -- that is, to interpret the update as saying "move to
 state $X_{t+1}$ and take an action in $\argmax_{a'} q_t(X_{t+1}, a')$."
 
 But the $\max$ plays a different role.
@@ -512,7 +512,7 @@ The rule governing how the manager chooses actions is called the **behavior poli
 Because the $\max$ in the update target always points toward $q^*$
 regardless of how the manager selects actions, the behavior policy affects only
-which $(x, a)$ entries get visited — and hence updated — over time.
+which $(x, a)$ entries get visited -- and hence updated -- over time.
 
 In the reinforcement learning literature, this property is called **off-policy**
 learning: the convergence target ($q^*$) does not depend on the behavior policy.
@@ -521,8 +521,8 @@ As long as every $(x, a)$ pair is visited infinitely often (so that every entry
 of the Q-table receives infinitely many updates) and the learning rates satisfy
 standard conditions (see below), the Q-table converges to $q^*$.
 
-The behavior policy affects the *speed* of convergence — visiting important
-state-action pairs more frequently leads to faster learning — but not the
+The behavior policy affects the *speed* of convergence -- visiting important
+state-action pairs more frequently leads to faster learning -- but not the
 *limit*.
 
 In practice, we want the manager to mostly take good actions (to earn reasonable
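
A minimal sketch of the off-policy point in these hunks -- the behavior policy below is ε-greedy, but any sufficiently exploratory rule leaves the update target unchanged; all sizes and parameters are illustrative, not the lecture's:

```python
import numpy as np

rng = np.random.default_rng(42)
n_states, n_actions = 30, 30    # illustrative grid of inventory levels / order sizes
beta, lr, eps = 0.98, 0.1, 0.1  # discount factor, learning rate, exploration rate

q = np.zeros((n_states, n_actions))  # the Q-table

def behavior_policy(x):
    # how actions are *chosen*: ε-greedy here, but this only affects which
    # (x, a) entries get visited, not what the update converges to
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(q[x]))

def q_update(x, a, reward, x_next):
    # the max over a' defines the update *target*; it is not a prescription
    # for the next action actually taken
    target = reward + beta * np.max(q[x_next])
    q[x, a] += lr * (target - q[x, a])
```
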
@@ -555,11 +555,11 @@ The stochastic demand shocks naturally drive the manager across different invent
 A simple but powerful technique for accelerating learning is **optimistic initialization**: instead of starting the Q-table at zero, we initialize every entry to a value above the true optimum.
 
-Because every untried action looks optimistically good, the agent is "disappointed" whenever it tries one — the update pulls that entry down toward reality. This drives the agent to try other actions (which still look optimistically high), producing broad exploration of the state-action space early in training.
+Because every untried action looks optimistically good, the agent is "disappointed" whenever it tries one -- the update pulls that entry down toward reality. This drives the agent to try other actions (which still look optimistically high), producing broad exploration of the state-action space early in training.
 
 This idea is sometimes called **optimism in the face of uncertainty** and is widely used in both bandit and reinforcement learning settings.
 
-In our problem, the value function $v^*$ ranges from about 13 to 18. We initialize the Q-table at 20 — modestly above the true maximum — to ensure optimistic exploration without being so extreme as to distort learning.
+In our problem, the value function $v^*$ ranges from about 13 to 18. We initialize the Q-table at 20 -- modestly above the true maximum -- to ensure optimistic exploration without being so extreme as to distort learning.

@@ ... @@
-The Q-learning loop runs for `n_steps` total steps in a single continuous trajectory — just as a real manager would learn from the ongoing stream of data.
+The Q-learning loop runs for `n_steps` total steps in a single continuous trajectory -- just as a real manager would learn from the ongoing stream of data.
 
 At specified step counts (given by `snapshot_steps`), we record the current greedy policy.
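
Continuing the sketch above, optimistic initialization and the snapshot bookkeeping might look as follows; `env_step` is a hypothetical stand-in for the lecture's inventory dynamics, not its actual code:

```python
def env_step(x, a):
    # hypothetical inventory transition: serve random demand, then receive order a
    d = int(rng.geometric(0.3)) - 1            # demand this period
    sales = min(d, x)
    x_next = min(x - sales + a, n_states - 1)  # stock is capped at the grid edge
    reward = sales - 0.2 * a - 0.1 * x_next    # sales minus order and holding costs
    return reward, x_next

q[:] = 20.0                                    # optimistic start, above v* of 13 to 18

n_steps = 200_000                              # illustrative horizon
snapshot_steps = [1_000, 10_000, 100_000, n_steps]
snapshots, x = {}, 0

for t in range(1, n_steps + 1):                # one continuous trajectory
    a = behavior_policy(x)
    reward, x_next = env_step(x, a)
    q_update(x, a, reward, x_next)
    x = x_next
    if t in snapshot_steps:
        snapshots[t] = np.argmax(q, axis=1)    # record the current greedy policy
```
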

@@ ... @@
-**Definition** A **Markov perfect equilibrium** of the duopoly model is a pair of value functions $(v_1, v_2)$ and a pair of policy functions $(f_1, f_2)$ such that, for each $i \in \{1, 2\}$ and each possible state,
+```{prf:definition} Markov Perfect Equilibrium
+:label: def-markov-perfect-equilibrium
+
+A **Markov perfect equilibrium** of the duopoly model is a pair of value functions $(v_1, v_2)$ and a pair of policy functions $(f_1, f_2)$ such that, for each $i \in \{1, 2\}$ and each possible state,
 
 * The value function $v_i$ satisfies Bellman equation {eq}`game4`.
 * The maximizer on the right side of {eq}`game4` equals $f_i(q_i, q_{-i})$.
@@ -150,6 +153,7 @@ The adjective "Markov" denotes that the equilibrium decision rules depend only o
 "Perfect" means complete, in the sense that the equilibrium is constructed by backward induction and hence builds in optimizing behavior for each firm at all possible future states.
 
 * These include many states that will not be reached when we iterate forward on the pair of equilibrium strategies $f_i$ starting from a given initial state.