lectures/blackwell_kihlstrom.md (+1, -1)

@@ -962,7 +962,7 @@ The Blackwell order says that, absent costs, more information is always better f
 With costs, the consumer chooses quality investment $\theta$ to maximize *net value*.
 
-If quality investment translates into experiment accuracy with diminishing returns — say, accuracy $\phi(\theta) = 1 - e^{-a\theta}$ for a rate parameter $a$ — then the marginal value of information eventually decreases in $\theta$.
+If quality investment translates into experiment accuracy with diminishing returns -- say, accuracy $\phi(\theta) = 1 - e^{-a\theta}$ for a rate parameter $a$ -- then the marginal value of information eventually decreases in $\theta$.
 
 With a convex cost $c(\theta) = c \, \theta^2$, the increasing marginal cost eventually overtakes the declining marginal value, producing an interior optimum.
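
To make the interior optimum concrete, here is a minimal numerical sketch -- not the lecture's code; it assumes the value of information equals the accuracy $\phi(\theta) = 1 - e^{-a\theta}$ and picks illustrative values $a = 2$, $c = 0.5$:

```python
import numpy as np
from scipy.optimize import minimize_scalar

a, c = 2.0, 0.5  # illustrative accuracy rate and cost parameters (assumed)

def net_value(theta):
    # value of information, proxied by accuracy 1 - e^{-a θ}, minus convex cost c θ²
    return (1 - np.exp(-a * theta)) - c * theta**2

res = minimize_scalar(lambda th: -net_value(th), bounds=(0.0, 5.0), method="bounded")
print(f"interior optimum: theta* = {res.x:.3f}, net value = {net_value(res.x):.3f}")
```

The first-order condition $a e^{-a\theta} = 2c\theta$ pins down the same $\theta^*$: declining marginal value meets rising marginal cost.
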
lectures/chow_business_cycles.md (+3, -3)

@@ -351,9 +351,9 @@ The second equation is the discrete Lyapunov equation for $\Gamma_0$.
 > But in reality the cycles ... are generally not damped.
 > How can the maintenance of the swings be explained?
 > ... One way which I believe is particularly fruitful and promising is to study what would become of the solution of a determinate dynamic system if it were exposed to a stream of erratic shocks ...
-> Thus, by connecting the two ideas: (1) the continuous solution of a determinate dynamic system and (2) the discontinuous shocks intervening and supplying the energy that may maintain the swings—we get a theoretical setup which seems to furnish a rational interpretation of those movements which we have been accustomed to see in our statistical time data.
+> Thus, by connecting the two ideas: (1) the continuous solution of a determinate dynamic system and (2) the discontinuous shocks intervening and supplying the energy that may maintain the swings--we get a theoretical setup which seems to furnish a rational interpretation of those movements which we have been accustomed to see in our statistical time data.
 >
-> — Ragnar Frisch (1933) {cite}`frisch33`
+> -- Ragnar Frisch (1933) {cite}`frisch33`
 
 Chow's main insight is that oscillations in the deterministic system are *neither necessary nor sufficient* for producing "cycles" in the stochastic system.
@@ -1408,7 +1408,7 @@ plt.show()
 As $v$ increases, eigenvalues approach the unit circle: oscillations become more persistent in the time domain (left), and the spectral peak becomes sharper in the frequency domain (right).
 
-Complex roots produce a pronounced peak at interior frequencies—the spectral signature of business cycles.
+Complex roots produce a pronounced peak at interior frequencies--the spectral signature of business cycles.
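
A hedged sketch of that spectral signature, assuming a simple AR(2) whose complex roots have modulus $r$ and angle $\omega_0$ (parameter values are illustrative, not the lecture's):

```python
import numpy as np

r, omega0 = 0.9, np.pi / 3                   # illustrative modulus and angle of roots
phi1, phi2 = 2 * r * np.cos(omega0), -r**2   # AR(2): y_t = φ1 y_{t-1} + φ2 y_{t-2} + ε_t

omegas = np.linspace(0.01, np.pi, 500)
z = np.exp(-1j * omegas)
# spectral density, up to the constant σ²/(2π): 1 / |1 - φ1 z - φ2 z²|²
spectrum = 1.0 / np.abs(1 - phi1 * z - phi2 * z**2) ** 2

peak = omegas[np.argmax(spectrum)]
print(f"spectral peak at omega = {peak:.3f} (roots placed at omega0 = {omega0:.3f})")
```

Pushing $r$ toward 1 sharpens the interior peak, matching the description of the figure above.
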
lectures/hansen_singleton_1982.md (+1, -1)

@@ -225,7 +225,7 @@ The vector $z_t$ plays the role of **instruments**.
 The conditional Euler equation $E_t[M_{t+1}R_{t+1}^i - 1] = 0$ says that the pricing error is unpredictable given *everything* in the agent's time-$t$ information set.
 
-That is a very strong restriction — it says the pricing error is orthogonal to every time-$t$ measurable random variable.
+That is a very strong restriction -- it says the pricing error is orthogonal to every time-$t$ measurable random variable.
 
 We cannot use the entire information set in practice, but we can pick any finite collection of time-$t$ observable variables $z_t$ and the orthogonality must still hold.
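
A minimal sketch of how those orthogonality conditions become GMM moments -- the CRRA kernel $M_{t+1} = \beta (c_{t+1}/c_t)^{\alpha}$ and the array names are assumptions for illustration, not the lecture's code:

```python
import numpy as np

def gmm_moments(params, c_growth, R, z):
    """Sample moments (1/T) Σ_t (β g_{t+1}^α R_{t+1} - 1) z_t.

    c_growth and R are length-T arrays of gross consumption growth and gross
    returns at t+1; z is a T-by-k array of time-t instruments (for example a
    constant and lagged variables).  At the true (β, α) every entry should
    be close to zero, by the orthogonality restriction above.
    """
    beta, alpha = params
    pricing_error = beta * c_growth**alpha * R - 1.0  # M_{t+1} R_{t+1} - 1
    return pricing_error @ z / len(R)                 # one moment per instrument
```

GMM then chooses $(\beta, \alpha)$ to make a quadratic form in these sample moments as small as possible.
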
lectures/hansen_singleton_1983.md (+2, -2)

@@ -36,7 +36,7 @@ kernelspec:
 > rational expectations econometrics. A rational expectations equilibrium is a
 > likelihood function. Maximize it.
 >
-> — An Interview with Thomas J. Sargent {cite}`evans2005interview`
+> -- An Interview with Thomas J. Sargent {cite}`evans2005interview`
 
 ## Overview
@@ -1869,7 +1869,7 @@ Our estimates reproduce the pattern that {cite:t}`MehraPrescott1985` later calle
 - *Low estimated risk aversion:* The estimated $\hat\alpha$ values (and thus risk aversion $-\hat\alpha$) from the table above are similar to those in {cite:t}`hansen1983stochastic`, who report $\hat\alpha$ between $-0.32$ and $-1.25$.
 
-- *Tiny return predictability:* The unrestricted-VAR $R_R^2$ values are comparable to the 0.02 to 0.06 range in {cite:t}`hansen1983stochastic`— the predictable component of stock returns is small relative to the unpredictable component.
+- *Tiny return predictability:* The unrestricted-VAR $R_R^2$ values are comparable to the 0.02 to 0.06 range in {cite:t}`hansen1983stochastic` -- the predictable component of stock returns is small relative to the unpredictable component.
 
 - *Strong rejection for Treasury bills:* The Euler-equation restrictions are decisively rejected for the nominally risk-free Treasury bill return, just as in Table 4 of {cite:t}`hansen1983stochastic`.
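
For readers who want to see what the $R_R^2$ statistic in the changed line measures, here is a minimal sketch under assumed data -- `returns` and the lagged predictors `Z` are hypothetical arrays, not the lecture's series:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 400
Z = rng.standard_normal((T, 2))                   # lagged predictors (hypothetical)
returns = 0.1 * Z[:, 0] + rng.standard_normal(T)  # mostly unpredictable returns

X = np.column_stack([np.ones(T), Z])              # regressors with an intercept
beta_hat, *_ = np.linalg.lstsq(X, returns, rcond=None)
resid = returns - X @ beta_hat
r2 = 1 - resid.var() / returns.var()
print(f"R^2 = {r2:.3f}")  # small, in the spirit of the 0.02 to 0.06 range
```
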
lectures/inventory_q.md (+10, -10)

@@ -35,7 +35,7 @@ A firm must decide how much stock to order each period, facing uncertain demand
 We approach the problem in two ways.
 
 First, we solve it exactly using dynamic programming, assuming full knowledge of
-the model — the demand distribution, cost parameters, and transition dynamics.
+the model -- the demand distribution, cost parameters, and transition dynamics.
 
 Second, we show how a manager can learn the optimal policy from experience alone, using [Q-learning](https://en.wikipedia.org/wiki/Q-learning).
@@ -475,15 +475,15 @@ All the manager needs to observe at each step is:
 4. the discount factor $\beta$, which is determined by the interest rate, and
 5. the next inventory level $X_{t+1}$ (which they can read off the warehouse).
 
-These are all directly observable quantities — no model knowledge is required.
+These are all directly observable quantities -- no model knowledge is required.
 
 
 ### The Q-table and the role of the max
 
 It is important to understand how the update rule relates to the manager's
 actions.
 
-The manager maintains a **Q-table**— a lookup table storing an estimate $q_t(x,
+The manager maintains a **Q-table**-- a lookup table storing an estimate $q_t(x,
 a)$ for every state-action pair $(x, a)$.
 
 At each step, the manager is in some state $x$ and must choose a specific action
@@ -492,7 +492,7 @@ and next state $X_{t+1}$, and updates *that one entry* $q_t(x, a)$ of the
 table using the rule above.
 
 It is tempting to read the $\max_{a'}$ in the update rule as prescribing the
-manager's next action — that is, to interpret the update as saying "move to
+manager's next action -- that is, to interpret the update as saying "move to
 state $X_{t+1}$ and take an action in $\argmax_{a'} q_t(X_{t+1}, a')$."
 
 But the $\max$ plays a different role.
@@ -512,7 +512,7 @@ The rule governing how the manager chooses actions is called the **behavior poli
 Because the $\max$ in the update target always points toward $q^*$
 regardless of how the manager selects actions, the behavior policy affects only
-which $(x, a)$ entries get visited — and hence updated — over time.
+which $(x, a)$ entries get visited -- and hence updated -- over time.
 
 In the reinforcement learning literature, this property is called **off-policy**
 learning: the convergence target ($q^*$) does not depend on the behavior policy.
@@ -521,8 +521,8 @@ As long as every $(x, a)$ pair is visited infinitely often (so that every entry
 of the Q-table receives infinitely many updates) and the learning rates satisfy
 standard conditions (see below), the Q-table converges to $q^*$.
 
-The behavior policy affects the *speed* of convergence — visiting important
-state-action pairs more frequently leads to faster learning — but not the
+The behavior policy affects the *speed* of convergence -- visiting important
+state-action pairs more frequently leads to faster learning -- but not the
 *limit*.
 
 In practice, we want the manager to mostly take good actions (to earn reasonable
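
A minimal sketch of the off-policy point in these hunks -- the behavior policy below is ε-greedy, but any sufficiently exploratory rule leaves the update target unchanged; all sizes and parameters are illustrative, not the lecture's:

```python
import numpy as np

rng = np.random.default_rng(42)
n_states, n_actions = 30, 30    # illustrative grid of inventory levels / order sizes
beta, lr, eps = 0.98, 0.1, 0.1  # discount factor, learning rate, exploration rate

q = np.zeros((n_states, n_actions))  # the Q-table

def behavior_policy(x):
    # how actions are *chosen*: ε-greedy here, but this only affects which
    # (x, a) entries get visited, not what the update converges to
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(q[x]))

def q_update(x, a, reward, x_next):
    # the max over a' defines the update *target*; it is not a prescription
    # for the next action actually taken
    target = reward + beta * np.max(q[x_next])
    q[x, a] += lr * (target - q[x, a])
```
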
@@ -555,11 +555,11 @@ The stochastic demand shocks naturally drive the manager across different invent
 A simple but powerful technique for accelerating learning is **optimistic initialization**: instead of starting the Q-table at zero, we initialize every entry to a value above the true optimum.
 
-Because every untried action looks optimistically good, the agent is "disappointed" whenever it tries one — the update pulls that entry down toward reality. This drives the agent to try other actions (which still look optimistically high), producing broad exploration of the state-action space early in training.
+Because every untried action looks optimistically good, the agent is "disappointed" whenever it tries one -- the update pulls that entry down toward reality. This drives the agent to try other actions (which still look optimistically high), producing broad exploration of the state-action space early in training.
 
 This idea is sometimes called **optimism in the face of uncertainty** and is widely used in both bandit and reinforcement learning settings.
 
-In our problem, the value function $v^*$ ranges from about 13 to 18. We initialize the Q-table at 20 — modestly above the true maximum — to ensure optimistic exploration without being so extreme as to distort learning.
+In our problem, the value function $v^*$ ranges from about 13 to 18. We initialize the Q-table at 20 -- modestly above the true maximum -- to ensure optimistic exploration without being so extreme as to distort learning.

@@ ... @@
-The Q-learning loop runs for `n_steps` total steps in a single continuous trajectory — just as a real manager would learn from the ongoing stream of data.
+The Q-learning loop runs for `n_steps` total steps in a single continuous trajectory -- just as a real manager would learn from the ongoing stream of data.
 
 At specified step counts (given by `snapshot_steps`), we record the current greedy policy.
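
Continuing the sketch above, optimistic initialization and the snapshot bookkeeping might look as follows; `env_step` is a hypothetical stand-in for the lecture's inventory dynamics, not its actual code:

```python
def env_step(x, a):
    # hypothetical inventory transition: serve random demand, then receive order a
    d = int(rng.geometric(0.3)) - 1            # demand this period
    sales = min(d, x)
    x_next = min(x - sales + a, n_states - 1)  # stock is capped at the grid edge
    reward = sales - 0.2 * a - 0.1 * x_next    # sales minus order and holding costs
    return reward, x_next

q[:] = 20.0                                    # optimistic start, above v* of 13 to 18

n_steps = 200_000                              # illustrative horizon
snapshot_steps = [1_000, 10_000, 100_000, n_steps]
snapshots, x = {}, 0

for t in range(1, n_steps + 1):                # one continuous trajectory
    a = behavior_policy(x)
    reward, x_next = env_step(x, a)
    q_update(x, a, reward, x_next)
    x = x_next
    if t in snapshot_steps:
        snapshots[t] = np.argmax(q, axis=1)    # record the current greedy policy
```
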

@@ ... @@
-**Definition** A **Markov perfect equilibrium** of the duopoly model is a pair of value functions $(v_1, v_2)$ and a pair of policy functions $(f_1, f_2)$ such that, for each $i \in \{1, 2\}$ and each possible state,
+```{prf:definition} Markov Perfect Equilibrium
+:label: def-markov-perfect-equilibrium
+
+A **Markov perfect equilibrium** of the duopoly model is a pair of value functions $(v_1, v_2)$ and a pair of policy functions $(f_1, f_2)$ such that, for each $i \in \{1, 2\}$ and each possible state,
 
 * The value function $v_i$ satisfies Bellman equation {eq}`game4`.
 * The maximizer on the right side of {eq}`game4` equals $f_i(q_i, q_{-i})$.
@@ -150,6 +153,7 @@ The adjective "Markov" denotes that the equilibrium decision rules depend only o
 "Perfect" means complete, in the sense that the equilibrium is constructed by backward induction and hence builds in optimizing behavior for each firm at all possible future states.
 
 * These include many states that will not be reached when we iterate forward on the pair of equilibrium strategies $f_i$ starting from a given initial state.