Here's the function that computes a $v$-greedy policy:
```{code-cell} ipython3
@jax.jit
def get_greedy(v, model):
    return jnp.argmax(B(v, model), axis=2)
```
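To make the shape conventions concrete, here is a toy sketch; the hard-coded array below is an assumption standing in for the output of `B(v, model)`, with actions on the last axis:

```python
import jax.numpy as jnp

# Stand-in values (an assumption, not the lecture's model): one entry for
# each (wealth, income, action) triple, so actions live on axis 2.
vals = jnp.array([[[1.0, 3.0], [2.0, 0.0]],
                  [[0.5, 0.5], [4.0, 1.0]]])   # shape (2, 2, 2)

# A v-greedy policy picks, at each state pair, the action with the
# highest value.
sigma = jnp.argmax(vals, axis=2)
print(sigma.tolist())  # [[1, 0], [0, 0]]
```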
Now we define the policy operator $T_\sigma$, which is the Bellman operator with
policy $\sigma$ fixed.
162
162
163
For a given policy $\sigma$, the policy operator is defined by
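As a minimal sketch of how $T_\sigma$ can be implemented in this setting: the placeholder `B` below is an assumption that stands in for the lecture's `B`, which (as in `get_greedy` above) returns values with actions on axis 2. The policy operator then selects along, rather than maximizes over, the action axis:

```python
import jax
import jax.numpy as jnp

def B(v, model):
    # Placeholder (assumption): stands in for the lecture's B, which returns
    # one value for each (wealth, income, action) triple. Here "model" is
    # simply a precomputed (n_w, n_y, n_a) array of such values.
    return model

@jax.jit
def T_sigma(v, sigma, model):
    # Apply the Bellman operator with the action fixed by sigma: pick out
    # the entry that sigma dictates on the action axis (axis 2) instead
    # of maximizing over it.
    return jnp.take_along_axis(B(v, model),
                               sigma[..., None], axis=2).squeeze(axis=2)
```

With this convention, `T_sigma(v, get_greedy(v, model), model)` coincides with applying the Bellman operator itself, since the greedy policy attains the maximum over actions.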
```{code-cell} ipython3
ax.set_title('OPI execution time vs step size m')
plt.show()
```
Here's a summary of the results:
* When $m=1$, OPI is slightly slower than VFI, even though they should be mathematically equivalent, due to small inefficiencies associated with extra function calls.
* OPI outperforms VFI for a very large range of $m$ values.
* For very large $m$, OPI performance begins to degrade as we spend too much
  time iterating the policy operator.
This demonstrates that there's a "sweet spot" for the OPI step size $m$ that balances policy updates against value function iterations.
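The trade-off can be seen in miniature in a self-contained sketch. The two-state, two-action MDP below is an assumption for illustration only (not the lecture's income fluctuation model), and `opi` follows the same pattern as the lecture's algorithm: one greedy policy update followed by $m$ applications of the policy operator:

```python
import jax.numpy as jnp

# Toy two-state, two-action MDP (an assumption for illustration only,
# not the lecture's income fluctuation model).
R = jnp.array([[1.0, 0.0],                  # R[s, a]: rewards
               [0.0, 2.0]])
P = jnp.array([[[1.0, 0.0], [0.0, 1.0]],    # P[s, a, s']: transitions
               [[1.0, 0.0], [0.0, 1.0]]])
beta = 0.9                                  # discount factor

def B(v):
    # Bellman right-hand side for every (state, action) pair
    return R + beta * (P @ v)

def get_greedy(v):
    return jnp.argmax(B(v), axis=1)

def T_sigma(v, sigma):
    # Policy operator: evaluate B(v) at the actions chosen by sigma
    return jnp.take_along_axis(B(v), sigma[:, None], axis=1).squeeze(axis=1)

def opi(v, m, tol=1e-8, max_iter=10_000):
    # Optimistic policy iteration: one greedy update, then m applications
    # of the policy operator, repeated until the value change is small.
    for _ in range(max_iter):
        sigma = get_greedy(v)
        v_new = v
        for _ in range(m):
            v_new = T_sigma(v_new, sigma)
        if jnp.max(jnp.abs(v_new - v)) < tol:
            return v_new, sigma
        v = v_new
    return v, get_greedy(v)

v_star, sigma_star = opi(jnp.zeros(2), m=10)
```

Setting `m=1` recovers VFI, while larger `m` spends more work evaluating each fixed policy; in this toy problem the iterates converge to $v^* = (18, 20)$ with the second action optimal in both states.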
## Exercises
```{exercise}
:label: ifp_opi_ex1
The speed gains achieved by OPI are quite robust to parameter changes.

Confirm this by experimenting with different parameter values for the income process ($\rho$ and $\nu$).

Measure how they affect the relative performance of VFI vs OPI.
```