
Robustness

In addition to what's in Anaconda, this lecture will need the following libraries:

!pip install --upgrade quantecon

Overview

This lecture modifies a Bellman equation to express a decision-maker's doubts about transition dynamics.

His specification doubts make the decision-maker want a robust decision rule.

Robust means insensitive to misspecification of transition dynamics.

The decision-maker has a single approximating model.

He calls it approximating to acknowledge that he doesn't completely trust it.

He fears that outcomes will actually be determined by another model that he cannot describe explicitly.

All that he knows is that the actual data-generating model is in some (uncountable) set of models that surrounds his approximating model.

He quantifies the discrepancy between his approximating model and the genuine data-generating model by using a quantity called entropy.

(We'll explain what entropy means below)

He wants a decision rule that will work well enough no matter which of those other models actually governs outcomes.

This is what it means for his decision rule to be "robust to misspecification of an approximating model".

This may sound like too much to ask for, but ….

… a secret weapon is available to design robust decision rules.

The secret weapon is max-min control theory.

A value-maximizing decision-maker enlists the aid of an (imaginary) value-minimizing model chooser to construct bounds on the value attained by a given decision rule under different models of the transition dynamics.

The original decision-maker uses those bounds to construct a decision rule with an assured performance level, no matter which model actually governs outcomes.

Note

In reading this lecture, please don't think that our decision-maker is paranoid when he conducts a worst-case analysis. By designing a rule that works well against a worst-case, his intention is to construct a rule that will work well across a set of models.

Let's start with some imports:

import pandas as pd
import numpy as np
from scipy.linalg import eig
import matplotlib.pyplot as plt
%matplotlib inline
import quantecon as qe

Sets of Models Imply Sets Of Values

Our "robust" decision-maker wants to know how well a given rule will work when he does not know a single transition law ….

… he wants to know sets of values that will be attained by a given decision rule F under a set of transition laws.

Ultimately, he wants to design a decision rule F that shapes these sets of values in ways that he prefers.

With this in mind, consider the following graph, which relates to a particular decision problem to be explained below

The figure shows a value-entropy correspondence for a particular decision rule F.

The shaded set is the graph of the correspondence, which maps entropy to a set of values associated with a set of models that surround the decision-maker's approximating model.

Here

Value refers to a sum of discounted rewards obtained by applying the decision rule F when the state starts at some fixed initial state x₀.
Entropy is a non-negative number that measures the size of a set of models surrounding the decision-maker's approximating model.
- Entropy is zero when the set includes only the approximating model, indicating that the decision-maker completely trusts the approximating model.
- Entropy is bigger, and the set of surrounding models is bigger, the less the decision-maker trusts the approximating model.

The shaded region indicates that for all models having entropy less than or equal to the number on the horizontal axis, the value obtained will be somewhere within the indicated set of values.

Now let's compare sets of values associated with two different decision rules, F_r and F_b.

In the next figure,

The red set shows the value-entropy correspondence for decision rule F_r.
The blue set shows the value-entropy correspondence for decision rule F_b.

The blue correspondence is skinnier than the red correspondence.

This conveys the sense in which the decision rule F_b is more robust than the decision rule F_r.

Here "more robust" means that the set of values is less sensitive to increasing misspecification, as measured by entropy.

Notice that the less robust rule F_r promises higher values for small misspecifications (small entropy).

(But it is more fragile in the sense that it is more sensitive to perturbations of the approximating model)

Below we'll explain in detail how to construct these sets of values for a given F, but for now ….

Here is a hint about the secret weapons we'll use to construct these sets

We'll use some min problems to construct the lower bounds
We'll use some max problems to construct the upper bounds

We will also describe how to choose F to shape the sets of values.

This will involve crafting a skinnier set at the cost of a lower level (at least for low values of entropy).

Inspiring Video

If you want to understand more about why one serious quantitative researcher is interested in this approach, we recommend Lars Peter Hansen's Nobel lecture.

Other References

Our discussion in this lecture is based on

Hansen and Sargent (2000)
Hansen and Sargent (2008)

The Model

For simplicity, we present ideas in the context of a class of problems with linear transition laws and quadratic objective functions.

To fit in with our earlier lecture on LQ control, we will treat loss minimization rather than value maximization.

To begin, recall the infinite horizon LQ problem, where an agent chooses a sequence of controls {u_t} to minimize

$$\sum_{t=0}^{\infty} \beta^t \left\{ x_t' R x_t + u_t' Q u_t \right\}$$

subject to the linear law of motion

$$x_{t+1} = A x_t + B u_t + C w_{t+1}, \qquad t = 0, 1, 2, \ldots$$

As before,

x_t is n × 1, A is n × n
u_t is k × 1, B is n × k
w_t is j × 1, C is n × j
R is n × n and Q is k × k

Here x_t is the state, u_t is the control, and w_t is a shock vector.

For now, we take $\{w_t\} := \{w_t\}_{t=1}^{\infty}$ to be deterministic, that is, a single fixed sequence.

We also allow for model uncertainty on the part of the agent solving this optimization problem.

In particular, the agent takes w_t = 0 for all t ≥ 0 as a benchmark model but admits the possibility that this model might be wrong.

As a consequence, she also considers a set of alternative models expressed in terms of sequences {w_t} that are "close" to the zero sequence.

She seeks a policy that will do well enough for a set of alternative models whose members are pinned down by sequences {w_t}.

Soon we'll quantify the quality of a model specification in terms of the maximal size of the expression $\sum_{t=0}^{\infty} \beta^{t+1}w_{t+1}' w_{t+1}$.

Constructing More Robust Policies

If our agent takes {w_t} as a given deterministic sequence, then, drawing on intuition from earlier lectures on dynamic programming, we can anticipate Bellman equations such as

$$J_{t-1}(x) = \min_u \{ x' R x + u' Q u + \beta \, J_t(Ax + Bu + Cw_t) \}$$

(Here J depends on t because the sequence {w_t} is not recursive)

Our tool for studying robustness is to construct a rule that works well even if an adverse sequence {w_t} occurs.

In our framework, "adverse" means "loss increasing".

As we'll see, this will eventually lead us to construct the Bellman equation

$$J(x) = \min_u \max_w \{ x' R x + u' Q u + \beta \, [ J(Ax + Bu + Cw) - \theta w'w ] \}$$

Notice that we've added the penalty term − θw′w.

Since w′w = ∥w∥², this term becomes influential when w moves away from the origin.

The penalty parameter θ controls how much we penalize the maximizing agent for "harming" the minimizing agent.

By raising θ more and more, we increasingly limit the ability of the maximizing agent to distort outcomes relative to the approximating model.

So bigger θ is implicitly associated with smaller distortion sequences {w_t}.

Analyzing the Bellman Equation

So what does J in the robust Bellman equation above look like?

As with the ordinary LQ control model, J takes the form J(x) = x′Px for some symmetric positive definite matrix P.

One of our main tasks will be to analyze and compute the matrix P.

Related tasks will be to study associated feedback rules for $u_t$ and $w_{t+1}$.

First, using matrix calculus, you will be able to verify that

$$\begin{aligned} \max_w &\{ (Ax + B u + C w)' P (Ax + B u + C w) - \theta w'w \} \\ & \hspace{20mm} = (Ax + Bu)' \mathcal D(P) (Ax + Bu) \end{aligned}$$

where

$$\mathcal D(P) := P + PC (\theta I - C' P C)^{-1} C' P$$

and I is a j × j identity matrix. Substituting this expression for the maximum into the robust Bellman equation yields

$$x' P x = \min_u \{ x' R x + u' Q u + \beta \, (Ax + Bu)' \mathcal D(P) (Ax + Bu) \}$$

Using similar mathematics, the solution to this minimization problem is u = − Fx where $F := (Q + \beta B' \mathcal D(P) B)^{-1} \beta B' \mathcal D(P) A$.
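The inner maximization identity can be checked numerically. The following is a minimal sketch, assuming small illustrative matrices that are not taken from the lecture (`P`, `C`, `theta` and the vector `y`, which stands in for Ax + Bu, are all hypothetical). It compares the closed form (Ax + Bu)′𝒟(P)(Ax + Bu) with the objective evaluated at the first-order-condition maximizer and with a brute-force grid search over scalar w.

```python
import numpy as np

def d_operator(P, C, theta):
    """D(P) = P + P C (theta I - C'PC)^{-1} C'P."""
    j = C.shape[1]
    M = theta * np.eye(j) - C.T @ P @ C
    return P + P @ C @ np.linalg.solve(M, C.T @ P)

# Hypothetical values, chosen only for illustration
P = np.array([[2.0, 0.5],
              [0.5, 1.0]])
C = np.array([[0.3],
              [0.1]])
theta = 5.0                      # large enough that theta*I - C'PC > 0
y = np.array([[1.0], [-2.0]])    # stands in for Ax + Bu

def objective(w):
    """(y + Cw)' P (y + Cw) - theta w'w for a j x 1 vector w."""
    z = y + C @ w
    return (z.T @ P @ z - theta * w.T @ w).item()

# Maximizer from the first-order condition C'P(y + Cw) = theta w
w_star = np.linalg.solve(theta * np.eye(1) - C.T @ P @ C, C.T @ P @ y)
closed_form = (y.T @ d_operator(P, C, theta) @ y).item()

# Brute-force check over a fine grid of scalar w values
grid = np.linspace(-1.0, 1.0, 20001)
brute_force = max(objective(np.array([[w]])) for w in grid)

print(closed_form, objective(w_star), brute_force)
```

The three printed numbers should agree up to the grid resolution. The check only makes sense when θI − C′PC is positive definite, since otherwise the inner problem has no finite maximum.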

Substituting this minimizer back into the minimization problem above and working through the algebra gives x′Px = x′ℬ(𝒟(P))x for all x, or, equivalently,

P = ℬ(𝒟(P))

where 𝒟 is the operator defined above and

$$\mathcal B(P) := R - \beta^2 A' P B (Q + \beta B' P B)^{-1} B' P A + \beta A' P A$$

The operator ℬ is the standard (i.e., non-robust) LQ Bellman operator, and P = ℬ(P) is the standard matrix Riccati equation coming from the Bellman equation; see the discussion in the lecture on LQ control.

Under some regularity conditions (see Hansen and Sargent (2008)), the operator ℬ ∘ 𝒟 has a unique positive definite fixed point, which we denote below by P̂.

A robust policy, indexed by θ, is u = − F̂x where

$$\hat F := (Q + \beta B' \mathcal D(\hat P) B)^{-1} \beta B' \mathcal D(\hat P) A$$

We also define

$$\hat K := (\theta I - C' \hat P C)^{-1} C' \hat P (A - B \hat F)$$

The interpretation of K̂ is that $w_{t+1} = \hat K x_t$ on the worst-case path of {x_t}, in the sense that this vector is the maximizer of the inner maximization problem above evaluated at the fixed rule u = − F̂x.

Note that P̂, F̂, K̂ are all determined by the primitives and θ.

Note also that if θ is very large, then 𝒟 is approximately equal to the identity mapping.

Hence, when θ is large, P̂ and F̂ are approximately equal to their standard LQ values.

Furthermore, when θ is large, K̂ is approximately equal to zero.

Conversely, smaller θ is associated with greater fear of model misspecification and greater concern for robustness.
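To make the roles of P̂, F̂ and K̂ concrete, here is a minimal sketch that iterates P ↦ ℬ(𝒟(P)) from P = 0 until it settles, then reads off F̂ and K̂ from the formulas above. The primitives are hypothetical, the function names (`solve_rule` and so on) are ours rather than part of any library, and plain fixed-point iteration is assumed to converge for these values; it is not claimed to be how QuantEcon.py solves the problem.

```python
import numpy as np

def d_operator(P, C, theta):
    """D(P) = P + P C (theta I - C'PC)^{-1} C'P."""
    j = C.shape[1]
    return P + P @ C @ np.linalg.solve(theta * np.eye(j) - C.T @ P @ C, C.T @ P)

def b_operator(P, A, B, R, Q, beta):
    """B(P) = R - beta^2 A'PB (Q + beta B'PB)^{-1} B'PA + beta A'PA."""
    S = Q + beta * B.T @ P @ B
    return (R - beta**2 * A.T @ P @ B @ np.linalg.solve(S, B.T @ P @ A)
            + beta * A.T @ P @ A)

def solve_rule(A, B, C, R, Q, beta, theta=None, tol=1e-10, max_iter=10_000):
    """Iterate P -> B(D(P)) from P = 0; theta=None skips D (ordinary LQ)."""
    distort = (lambda P: P) if theta is None else (lambda P: d_operator(P, C, theta))
    P = np.zeros_like(A)
    for _ in range(max_iter):
        P_new = b_operator(distort(P), A, B, R, Q, beta)
        converged = np.max(np.abs(P_new - P)) < tol
        P = P_new
        if converged:
            break
    DP = distort(P)
    F = np.linalg.solve(Q + beta * B.T @ DP @ B, beta * B.T @ DP @ A)
    if theta is None:
        K = np.zeros((C.shape[1], A.shape[0]))
    else:
        K = np.linalg.solve(theta * np.eye(C.shape[1]) - C.T @ P @ C,
                            C.T @ P @ (A - B @ F))
    return P, F, K

# Hypothetical primitives, chosen only for illustration
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[0.1], [0.2]])
R = np.eye(2)
Q = np.array([[1.0]])
beta = 0.95

P_lq, F_lq, _ = solve_rule(A, B, C, R, Q, beta)
P_hat, F_hat, K_hat = solve_rule(A, B, C, R, Q, beta, theta=50.0)
P_large, F_large, K_large = solve_rule(A, B, C, R, Q, beta, theta=1e10)

fixed_point_gap = np.max(np.abs(
    P_hat - b_operator(d_operator(P_hat, C, 50.0), A, B, R, Q, beta)))

print("F (ordinary LQ):", F_lq)
print("F (theta = 50): ", F_hat)
print("F (theta large):", F_large)
print("K (theta = 50): ", K_hat)
print("K (theta large):", K_large)
```

With the very large θ the rule should be numerically indistinguishable from the ordinary LQ rule and K̂ should be close to zero, in line with the remarks above.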

Robustness as Outcome of a Two-Person Zero-Sum Game

What we have done above can be interpreted in terms of a two-person zero-sum game in which F̂, K̂ are Nash equilibrium objects.

Agent 1 is our original agent, who seeks to minimize loss in the LQ program while admitting the possibility of misspecification.

Agent 2 is an imaginary malevolent player.

Agent 2's malevolence helps the original agent to compute bounds on his value function across a set of models.

We begin with agent 2's problem.

Agent 2's Problem

Agent 2

knows a fixed policy F specifying the behavior of agent 1, in the sense that u_t = − Fx_t for all t
responds by choosing a shock sequence {w_t} from a set of paths sufficiently close to the benchmark sequence {0, 0, 0, …}

A natural way to say "sufficiently close to the zero sequence" is to restrict the summed inner product $\sum_{t=1}^{\infty} w_t' w_t$ to be small.

However, to obtain a time-invariant recursive formulation, it turns out to be convenient to restrict a discounted inner product

$$\sum_{t=1}^{\infty} \beta^t w_t' w_t \leq \eta$$

Now let F be a fixed policy, and let J_F(x₀, w) be the present-value cost of that policy given sequence w := {w_t} and initial condition x₀ ∈ ℝⁿ.

Substituting − Fx_t for u_t in the LQ objective above, this value can be written as

$$J_F(x_0, \mathbf w) := \sum_{t=0}^{\infty} \beta^t x_t' (R + F' Q F) x_t$$

where

$$x_{t+1} = (A - BF) x_t + C w_{t+1}$$

and the initial condition x₀ is as specified on the left side of the expression for J_F.

Agent 2 chooses w to maximize agent 1's loss J_F(x₀, w) subject to the discounted constraint above.

Using a Lagrangian formulation, we can express this problem as

$$\max_{\mathbf w} \sum_{t=0}^{\infty} \beta^t \left\{ x_t' (R + F' Q F) x_t - \beta \theta (w_{t+1}' w_{t+1} - \eta) \right\}$$

where {x_t} satisfies the law of motion above and θ is a Lagrange multiplier on the discounted constraint.

For the moment, let's take θ as fixed, allowing us to drop the constant βθη term in the objective function, and hence write the problem as

$$\max_{\mathbf w} \sum_{t=0}^{\infty} \beta^t \left\{ x_t' (R + F' Q F) x_t - \beta \theta w_{t+1}' w_{t+1} \right\}$$

or, equivalently,

$$\min_{\mathbf w} \sum_{t=0}^{\infty} \beta^t \left\{ -x_t' (R + F' Q F) x_t + \beta \theta w_{t+1}' w_{t+1} \right\}$$

subject to the same law of motion.

What's striking about this optimization problem is that it is once again an LQ discounted dynamic programming problem, with w = {w_t} as the sequence of controls.

The expression for the optimal policy can be found by applying the usual LQ formula (see the lecture on LQ control).

We denote it by K(F, θ), with the interpretation $w_{t+1} = K(F, \theta) x_t$.

The remaining step for agent 2's problem is to set θ to enforce the discounted constraint, which can be done by choosing θ = θ_η such that

$$\beta \sum_{t=0}^{\infty} \beta^t x_t' K(F, \theta_\eta)' K(F, \theta_\eta) x_t = \eta$$

Here x_t is given by the law of motion under policy F, which in this case becomes $x_{t+1} = (A - BF + CK(F, \theta)) x_t$.
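The discounted sum on the left side of the condition that pins down θ_η is a quadratic form in x₀, so it can be computed without simulating. Below is a minimal sketch, assuming a given closed-loop matrix `A_cl` (standing in for A − BF + CK) and a given `K`; both are hypothetical numbers, not outputs of the model. It solves O = βK′K + βA_cl′ O A_cl by vectorization and compares x₀′Ox₀ with a long truncated sum.

```python
import numpy as np

def discounted_entropy(A_cl, K, beta, x0):
    """
    Return beta * sum_t beta^t x_t' K'K x_t with x_{t+1} = A_cl x_t,
    computed as x0' O x0 where O = beta K'K + beta A_cl' O A_cl.
    """
    n = A_cl.shape[0]
    # Row-major vec identity: vec(A' O A) = kron(A', A') vec(O)
    lhs = np.eye(n * n) - beta * np.kron(A_cl.T, A_cl.T)
    O = np.linalg.solve(lhs, (beta * K.T @ K).reshape(-1)).reshape(n, n)
    return (x0.T @ O @ x0).item()

def simulated_entropy(A_cl, K, beta, x0, T=2000):
    """Truncated version of sum_t beta^{t+1} w_{t+1}' w_{t+1}, w_{t+1} = K x_t."""
    x, total = x0.copy(), 0.0
    for t in range(T):
        w_next = K @ x
        total += beta ** (t + 1) * (w_next.T @ w_next).item()
        x = A_cl @ x
    return total

# Hypothetical closed-loop matrix and worst-case rule, for illustration only
A_cl = np.array([[0.9, 0.1],
                 [0.05, 0.7]])
K = np.array([[0.02, -0.01]])
beta = 0.95
x0 = np.array([[1.0], [0.5]])

closed = discounted_entropy(A_cl, K, beta, x0)
simulated = simulated_entropy(A_cl, K, beta, x0)
print(closed, simulated)
```

The closed form is valid only when the discounted sum converges, which requires the spectral radius of √β·A_cl to be below one. Finding θ_η then amounts to a one-dimensional search over θ for the value at which this quantity equals η.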

Using Agent 2's Problem to Construct Bounds on the Value Sets

The Lower Bound

Define the minimized value of agent 2's minimization problem above as R_θ(x₀, F).

Because "minimizers minimize" we have

$$R_\theta(x_0, F) \leq \sum_{t=0}^\infty \beta^t \left\{ - x_t' (R + F' Q F) x_t \right\} + \beta \theta \sum_{t=0}^\infty \beta^t w_{t+1}' w_{t+1},$$

where $x_{t+1} = (A - BF + CK(F, \theta)) x_t$ and x₀ is a given initial condition.

This inequality in turn implies the inequality

$$R_\theta(x_0, F) - \theta \ {\rm ent} \leq \sum_{t=0}^\infty \beta^t \left\{ - x_t' (R + F' Q F) x_t \right\}$$

where

$${\rm ent} := \beta \sum_{t=0}^\infty \beta^t w_{t+1}' w_{t+1}$$

The left side of this inequality, viewed as a function of entropy, is a straight line with slope − θ.

Technically, it is a "separating hyperplane".

At a particular value of entropy, the line is tangent to the lower bound of values as a function of entropy.

In particular, the lower bound on the left side of the inequality is attained when

$${\rm ent} = \beta \sum_{t=0}^{\infty} \beta^t x_t' K(F, \theta)' K(F, \theta) x_t$$

To construct the lower bound on the set of values associated with all perturbations w satisfying the discounted constraint at a given entropy level, we proceed as follows:

For a given θ, solve agent 2's minimization problem.
Compute the minimized value R_θ(x₀, F) and the associated entropy using the preceding formula for ent.
Compute the lower bound on the value function $R_\theta(x_0, F) - \theta \ {\rm ent}$ and plot it against ${\rm ent}$.
Repeat the preceding three steps for a range of values of θ to trace out the lower bound.

Note

This procedure sweeps out a set of separating hyperplanes indexed by different values for the Lagrange multiplier θ.
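The steps above can be sketched for a scalar system, where every object has a closed form. This is only an illustration under assumed primitives (`A`, `B`, `C`, `R`, `Q`, `F` and `x0` are hypothetical numbers), and it relies on the fact, stated later in the lecture, that the worst-case loss matrix P_F solves P_F = R + F′QF + β(A − BF)′𝒟(P_F)(A − BF), so that R_θ(x₀, F) = −x₀′P_F x₀.

```python
import numpy as np

# Hypothetical scalar primitives, chosen only for illustration
A, B, C = 0.9, 1.0, 1.0
R, Q, beta = 1.0, 1.0, 0.95
F, x0 = 0.5, 1.0

a0 = A - B * F        # closed-loop coefficient under the approximating model
m = R + F * Q * F     # per-period loss coefficient under policy F

def lower_bound_point(theta, tol=1e-13, max_iter=100_000):
    """Return (entropy, bound, realized value) for one multiplier theta."""
    P = 0.0
    for _ in range(max_iter):
        DP = P + P * C * C * P / (theta - C * P * C)   # scalar D(P)
        P_new = m + beta * a0 * DP * a0
        converged = abs(P_new - P) < tol
        P = P_new
        if converged:
            break
    K = C * P * a0 / (theta - C * P * C)   # worst-case rule w_{t+1} = K x_t
    a = a0 + C * K                         # closed loop under the distortion
    assert beta * a ** 2 < 1, "theta too small: discounted sums diverge"
    ent = beta * K ** 2 * x0 ** 2 / (1 - beta * a ** 2)
    R_theta = -P * x0 ** 2                 # minimized penalized objective
    realized = -m * x0 ** 2 / (1 - beta * a ** 2)   # value on worst-case path
    return ent, R_theta - theta * ent, realized

thetas = np.array([5.0, 10.0, 20.0, 50.0, 200.0])
points = [lower_bound_point(th) for th in thetas]
for th, (ent, bound, realized) in zip(thetas, points):
    print(f"theta={th:6.1f}  entropy={ent:.6f}  bound={bound:.6f}  realized={realized:.6f}")
```

For each θ the bound R_θ − θ·ent coincides with the value realized along the worst-case path, which is the tangency described above. Plotting `bound` against `ent` across the θ grid traces the lower edge of the value set for this F.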

The Upper Bound

To construct an upper bound we use a very similar procedure.

We simply replace agent 2's minimization problem above with the maximization problem

$$V_{\tilde \theta}(x_0, F) = \max_{\mathbf w} \sum_{t=0}^{\infty} \beta^t \left\{ -x_t' (R + F' Q F) x_t - \beta \tilde \theta w_{t+1}' w_{t+1} \right\}$$

where now θ̃ > 0 penalizes the choice of w with larger entropy.

(Notice that θ̃ = − θ relative to agent 2's minimization problem above)

Because "maximizers maximize" we have

$$V_{\tilde \theta}(x_0, F) \geq \sum_{t=0}^\infty \beta^t \left\{ - x_t' (R + F' Q F) x_t \right\} - \beta \tilde \theta \sum_{t=0}^\infty \beta^t w_{t+1}' w_{t+1}$$

which in turn implies the inequality

$$V_{\tilde \theta}(x_0, F) + \tilde \theta \ {\rm ent} \geq \sum_{t=0}^\infty \beta^t \left\{ - x_t' (R + F' Q F) x_t \right\}$$

where

$${\rm ent} \equiv \beta \sum_{t=0}^\infty \beta^t w_{t+1}' w_{t+1}$$

The left side of this inequality, viewed as a function of entropy, is a straight line with slope θ̃.

The upper bound on the left side of the inequality is attained when

$${\rm ent} = \beta \sum_{t=0}^{\infty} \beta^t x_t' K(F, \tilde \theta)' K(F, \tilde \theta) x_t$$

To construct the upper bound on the set of values associated with all perturbations w with a given entropy, we proceed much as we did for the lower bound:

For a given θ̃, solve the maximization problem above.
Compute the maximized value V_θ̃(x₀, F) and the associated entropy using the preceding formula for ent.
Compute the upper bound on the value function $V_{\tilde \theta}(x_0, F) + \tilde \theta \ {\rm ent}$ and plot it against ${\rm ent}$.
Repeat the preceding three steps for a range of values of θ̃ to trace out the upper bound.

Reshaping the Set of Values

Now in the interest of reshaping these sets of values by choosing F, we turn to agent 1's problem.

Agent 1's Problem

Now we turn to agent 1, who solves

$$\min_{\{u_t\}} \sum_{t=0}^{\infty} \beta^t \left\{ x_t' R x_t + u_t' Q u_t - \beta \theta w_{t+1}' w_{t+1} \right\}$$

where $\{w_{t+1}\}$ satisfies $w_{t+1} = K x_t$.

In other words, agent 1 minimizes

$$\sum_{t=0}^{\infty} \beta^t \left\{ x_t' (R - \beta \theta K' K ) x_t + u_t' Q u_t \right\}$$

subject to

$$x_{t+1} = (A + CK) x_t + B u_t$$

Once again, the expression for the optimal policy can be found in the lecture on LQ control. We denote it by F̃.

Nash Equilibrium

Clearly, the F̃ we have obtained depends on K, which, in agent 2's problem, depended on an initial policy F.

Holding all other parameters fixed, we can represent this relationship as a mapping Φ, where

F̃ = Φ(K(F, θ))

The map F ↦ Φ(K(F, θ)) corresponds to a situation in which

agent 1 uses an arbitrary initial policy F
agent 2 best responds to agent 1 by choosing K(F, θ)
agent 1 best responds to agent 2 by choosing F̃ = Φ(K(F, θ))

As you may have already guessed, the robust policy F̂ defined above is a fixed point of the map F ↦ Φ(K(F, θ)).

In particular, for any given θ,

K(F̂, θ) = K̂, where K̂ is as defined above
Φ(K̂) = F̂

A sketch of the proof is given in the appendix.

The Stochastic Case

Now we turn to the stochastic case, where the sequence {w_t} is treated as an IID sequence of random vectors.

In this setting, we suppose that our agent is uncertain about the conditional probability distribution of $w_{t+1}$.

The agent takes the standard normal distribution N(0, I) as the baseline conditional distribution, while admitting the possibility that other "nearby" distributions prevail.

These alternative conditional distributions of $w_{t+1}$ might depend nonlinearly on the history x_s, s ≤ t.

To implement this idea, we need a notion of what it means for one distribution to be near another one.

Here we adopt a very useful measure of closeness for distributions known as the relative entropy, or Kullback-Leibler divergence.

For densities p, q, the Kullback-Leibler divergence of q from p is defined as

$$D_{KL} (p, q) := \int \ln \left[ \frac{p(x)}{q(x)} \right] p(x) \, dx$$
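As a quick illustration of this definition, the sketch below evaluates the integral numerically for p = N(μ, σ²) and q = N(0, 1) and compares it with the standard closed form for two univariate Gaussians, ½(σ² + μ² − 1) − ln σ. The helper names and the integration grid are our own choices for the example.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def kl_numeric(mu, sigma, lo=-20.0, hi=20.0, n=400_001):
    """Riemann-sum approximation of D_KL(p, q), p = N(mu, sigma^2), q = N(0, 1)."""
    x = np.linspace(lo, hi, n)
    dx = x[1] - x[0]
    p = normal_pdf(x, mu, sigma)
    # ln(p/q) written out analytically to avoid 0/0 in the far tails
    log_ratio = -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma) + 0.5 * x ** 2
    return np.sum(p * log_ratio) * dx

def kl_closed_form(mu, sigma):
    """Closed form for D_KL(N(mu, sigma^2), N(0, 1))."""
    return 0.5 * (sigma ** 2 + mu ** 2 - 1.0) - np.log(sigma)

for mu, sigma in [(0.0, 1.0), (0.5, 1.0), (0.5, 1.2)]:
    print(mu, sigma, kl_numeric(mu, sigma), kl_closed_form(mu, sigma))
```

The divergence is zero when the two densities coincide and grows with both a shift in the mean and a change in the variance, which is why it serves as a measure of distance from the baseline.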

Using this notation, we replace the deterministic robust Bellman equation with the stochastic analog

$$J(x) = \min_u \max_{\psi \in \mathcal P} \left\{ x' R x + u' Q u + \beta \left[ \int J(Ax + Bu + Cw) \, \psi(dw) - \theta D_{KL}(\psi, \phi) \right] \right\}$$

Here 𝒫 represents the set of all densities on $\mathbb R^j$ and ϕ is the benchmark distribution N(0, I).

The distribution ψ is chosen as the least desirable conditional distribution in terms of next period outcomes, while taking into account the penalty term θD_KL(ψ, ϕ).

This penalty term plays a role analogous to the one played by the deterministic penalty θw′w in the deterministic robust Bellman equation, since it discourages large deviations from the benchmark.

Solving the Model

The maximization problem in the stochastic Bellman equation appears highly nontrivial. After all, we are maximizing over an infinite dimensional space consisting of the entire set of densities.

However, it turns out that the solution is tractable, and in fact also falls within the class of normal distributions.

First, we note that J has the form J(x) = x′Px + d for some positive definite matrix P and constant real number d.

Moreover, it turns out that if $I - \theta^{-1} C' P C$ is nonsingular, then

$$\begin{aligned} \max_{\psi \in \mathcal P} &\left\{ \int (Ax + B u + C w)' P (Ax + B u + C w) \, \psi(dw) - \theta D_{KL}(\psi, \phi) \right\} \\ & \hspace{20mm} = (Ax + Bu)' \mathcal D (P) (Ax + Bu) + \kappa(\theta, P) \end{aligned}$$

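where

$$\kappa(\theta, P) := \theta \ln [ \det(I - \theta^{-1} C' P C)^{-1} ]$$

and the maximizer is the Gaussian distribution

$$\psi = N \left( (\theta I - C' P C)^{-1} C' P (Ax + Bu), \, (I - \theta^{-1} C' P C)^{-1} \right)$$

Substituting the expression for the maximum into the stochastic Bellman equation and using J(x) = x′Px + d gives

$$x' P x + d = \min_u \left\{ x' R x + u' Q u + \beta \, (Ax + Bu)' \mathcal D(P) (Ax + Bu) + \beta \, [d + \kappa(\theta, P)] \right\}$$

Since constant terms do not affect minimizers, the minimizer is the same as in the deterministic problem, leading to

$$x' P x + d = x' \mathcal B(\mathcal D(P)) x + \beta \, [d + \kappa(\theta, P)]$$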

To solve this Bellman equation, we take P̂ to be the positive definite fixed point of ℬ ∘ 𝒟.

In addition, we take d̂ as the real number solving $d = \beta \, [d + \kappa(\theta, \hat P)]$, which is

$$\hat d := \frac{\beta}{1 - \beta} \kappa(\theta, \hat P)$$

The robust policy in this stochastic case is the minimizer in the Bellman equation above, which is once again u = − F̂x with F̂ defined as in the deterministic case.

Substituting the robust policy into the maximizing Gaussian distribution above, we obtain the worst-case shock distribution:

$$w_{t+1} \sim N \left( \hat K x_t, \, (I - \theta^{-1} C' \hat P C)^{-1} \right)$$

where K̂ is defined as in the deterministic case.

Note that the mean of the worst-case shock distribution is equal to the same worst-case $w_{t+1}$ as in the earlier deterministic setting.

Computing Other Quantities

Before turning to implementation, we briefly outline how to compute several other quantities of interest.

Worst-Case Value of a Policy

One thing we will be interested in doing is holding a policy fixed and computing the discounted loss associated with that policy.

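So let F be a given policy and let J_F(x) be the associated loss, which, by analogy with the stochastic Bellman equation above, satisfies

$$J_F(x) = \max_{\psi \in \mathcal P} \left\{ x' (R + F' Q F) x + \beta \left[ \int J_F((A - BF) x + Cw) \, \psi(dw) - \theta D_{KL}(\psi, \phi) \right] \right\}$$

Writing J_F(x) = x′P_Fx + d_F and applying the same argument used to derive the equation for x′Px + d above, we get

$$x' P_F x + d_F = x' (R + F' Q F) x + \beta \left[ x' (A - BF)' \mathcal D(P_F) (A - BF) x + d_F + \kappa(\theta, P_F) \right]$$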

To solve this we take P_F to be the fixed point

P_F = R + F′QF + β(A − BF)′𝒟(P_F)(A − BF)

and

$$d_F := \frac{\beta}{1 - \beta} \kappa(\theta, P_F) = \frac{\beta}{1 - \beta} \theta \ln [ \det(I - \theta^{-1} C' P_F C)^{-1} ]$$

If you skip ahead to the appendix, you will be able to verify that − P_F is the solution to the Bellman equation in agent 2's problem discussed above. We use this in our computations.
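A minimal sketch of this calculation follows, assuming hypothetical primitives and an arbitrary (not optimal) policy `F`. It iterates the equation for P_F from zero, then computes κ(θ, P_F) with a log-determinant and d_F from the formula above. The function names are ours, and this is not presented as the implementation used by QuantEcon.py.

```python
import numpy as np

def d_operator(P, C, theta):
    """D(P) = P + P C (theta I - C'PC)^{-1} C'P."""
    j = C.shape[1]
    return P + P @ C @ np.linalg.solve(theta * np.eye(j) - C.T @ P @ C, C.T @ P)

def kappa(P, C, theta):
    """kappa(theta, P) = theta * ln det[(I - C'PC / theta)^{-1}]."""
    j = C.shape[1]
    sign, logdet = np.linalg.slogdet(np.eye(j) - C.T @ P @ C / theta)
    assert sign > 0, "theta is too small for this P"
    return -theta * logdet

def evaluate_policy_sketch(F, A, B, C, R, Q, beta, theta,
                           tol=1e-10, max_iter=10_000):
    """Iterate P_F = R + F'QF + beta (A-BF)' D(P_F) (A-BF), then form d_F."""
    A_F = A - B @ F
    base = R + F.T @ Q @ F
    P = np.zeros_like(A)
    for _ in range(max_iter):
        P_new = base + beta * A_F.T @ d_operator(P, C, theta) @ A_F
        converged = np.max(np.abs(P_new - P)) < tol
        P = P_new
        if converged:
            break
    d = beta / (1 - beta) * kappa(P, C, theta)
    return P, d

# Hypothetical primitives and policy, chosen only for illustration
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[0.1], [0.2]])
R = np.eye(2)
Q = np.array([[1.0]])
F = np.array([[0.1, 0.3]])
beta, theta = 0.95, 50.0

P_F, d_F = evaluate_policy_sketch(F, A, B, C, R, Q, beta, theta)
A_F = A - B @ F
residual = np.max(np.abs(
    P_F - (R + F.T @ Q @ F + beta * A_F.T @ d_operator(P_F, C, theta) @ A_F)))
print(P_F, d_F, residual)
```

The worst-case loss from initial state x is then x′P_F x + d_F. The constant d_F is positive whenever C′P_F C is nonzero, reflecting the extra loss from the inflated worst-case shock variance.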

Implementation

The QuantEcon.py package provides a class called RBLQ for implementation of robust LQ optimal control.

The code can be found on GitHub.

Here is a brief description of the methods of the class

d_operator() and b_operator() implement 𝒟 and ℬ respectively
robust_rule() and robust_rule_simple() both solve for the triple F̂, K̂, P̂, as described in the definitions of F̂ and K̂ above and the surrounding discussion
- robust_rule() is more efficient
- robust_rule_simple() is more transparent and easier to follow
K_to_F() and F_to_K() solve the decision problems of agent 1 and agent 2 respectively
compute_deterministic_entropy() computes the left-hand side of the condition that determines θ_η in agent 2's problem
evaluate_F() computes the loss and entropy associated with a given policy; see the section Computing Other Quantities

Application

Let us consider a monopolist similar to the one in the lecture on LQ control, but now facing model uncertainty.

The inverse demand function is p_t = a₀ − a₁y_t + d_t

where

$$d_{t+1} = \rho d_t + \sigma_d w_{t+1}, \quad \{w_t\} \stackrel{\textrm{IID}}{\sim} N(0,1)$$

and all parameters are strictly positive.

The period return function for the monopolist is

$$r_t = p_t y_t - \gamma \frac{(y_{t+1} - y_t)^2}{2} - c y_t$$

Its objective is to maximize expected discounted profits, or, equivalently, to minimize $\mathbb E \sum_{t=0}^\infty \beta^t (- r_t)$.

To form a linear regulator problem, we take the state and control to be

$$\begin{aligned} x_t = \begin{bmatrix} 1 \\ y_t \\ d_t \end{bmatrix} \quad \text{and} \quad u_t = y_{t+1} - y_t \end{aligned}$$

Setting b := (a₀ − c)/2 we define

$$\begin{aligned} R = - \begin{bmatrix} 0 & b & 0 \\ b & -a_1 & 1/2 \\ 0 & 1/2 & 0 \end{bmatrix} \quad \text{and} \quad Q = \gamma / 2 \end{aligned}$$

For the transition matrices, we set

$$\begin{aligned} A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \rho \end{bmatrix}, \qquad B = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \qquad C = \begin{bmatrix} 0 \\ 0 \\ \sigma_d \end{bmatrix} \end{aligned}$$
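A quick way to confirm that these matrices encode the monopolist's problem is to check, at a few random points, that x′Rx + u′Qu equals −r_t and that Ax + Bu + Cw reproduces the laws of motion for y and d. The sketch below uses the parameter values listed in this section; for brevity, B and C are stored as one-dimensional arrays here, which is a simplification made only for this check.

```python
import numpy as np

# Parameter values from this section
a_0, a_1, rho, sigma_d, c, gamma = 100, 0.5, 0.9, 0.05, 2, 50.0
b = (a_0 - c) / 2

R = -np.array([[0., b,    0.],
               [b,  -a_1, 0.5],
               [0., 0.5,  0.]])
Q = gamma / 2
A = np.array([[1., 0., 0.],
              [0., 1., 0.],
              [0., 0., rho]])
B = np.array([0., 1., 0.])        # 1-D here, for this check only
C = np.array([0., 0., sigma_d])   # 1-D here, for this check only

def period_return(y, d, u):
    """r_t = p_t y_t - gamma u_t^2 / 2 - c y_t with u_t = y_{t+1} - y_t."""
    p = a_0 - a_1 * y + d
    return p * y - gamma * u ** 2 / 2 - c * y

rng = np.random.default_rng(0)
max_loss_gap, max_state_gap = 0.0, 0.0
for _ in range(5):
    y, d, u, w = rng.normal(size=4)
    x = np.array([1.0, y, d])
    loss = x @ R @ x + Q * u ** 2
    max_loss_gap = max(max_loss_gap, abs(loss + period_return(y, d, u)))
    x_next = A @ x + B * u + C * w
    target = np.array([1.0, y + u, rho * d + sigma_d * w])
    max_state_gap = max(max_state_gap, np.max(np.abs(x_next - target)))

print(max_loss_gap, max_state_gap)
```

Both gaps should be zero up to floating-point rounding, confirming that minimizing the quadratic loss is the same as maximizing discounted profits.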

Our aim is to compute the value-entropy correspondences shown above.

The parameters are

a₀ = 100, a₁ = 0.5, ρ = 0.9, σ_d = 0.05, β = 0.95, c = 2, γ = 50.0

The standard normal distribution for w_t is understood as the agent's baseline, with uncertainty parameterized by θ.

We compute value-entropy correspondences for two policies

The no concern for robustness policy F₀, which is the ordinary LQ loss minimizer.
A "moderate" concern for robustness policy F_b, with θ = 0.02.

The code for producing the graph shown above, with blue being for the robust policy, is as follows

# Model parameters

a_0 = 100
a_1 = 0.5
ρ = 0.9
σ_d = 0.05
β = 0.95
c = 2
γ = 50.0

θ = 0.02  # "moderate" concern for robustness, as stated in the text
ac = (a_0 - c) / 2.0

# Define LQ matrices

R = np.array([[0.,   ac,   0.],
            [ac, -a_1,  0.5],
            [0.,  0.5,  0.]])

R = -R  # For minimization
Q = γ / 2

A = np.array([[1., 0., 0.],
            [0., 1., 0.],
            [0., 0., ρ]])
B = np.array([[0.],
            [1.],
            [0.]])
C = np.array([[0.],
            [0.],
            [σ_d]])

# ----------------------------------------------------------------------- #
#                                 Functions
# ----------------------------------------------------------------------- #


def evaluate_policy(θ, F):

    """
    Given θ (scalar, dtype=float) and policy F (array_like), returns the
    value associated with that policy under the worst case path for {w_t},
    as well as the entropy level.
    """

    rlq = qe.robustlq.RBLQ(Q, R, A, B, C, β, θ)
    K_F, P_F, d_F, O_F, o_F = rlq.evaluate_F(F)
    x0 = np.array([[1.], [0.], [0.]])
    value = - x0.T @ P_F @ x0 - d_F
    entropy = x0.T @ O_F @ x0 + o_F
    # value and entropy are 1 x 1 arrays; extract Python floats explicitly
    return [value.item(), entropy.item()]


def value_and_entropy(emax, F, bw, grid_size=1000):

    """
    Compute the value function and entropy levels for a θ path
    increasing until it reaches the specified target entropy value.

    Parameters
    ==========
    emax: scalar
        The target entropy value

    F: array_like
        The policy function to be evaluated

    bw: str
        A string specifying whether the implied shock path follows best
        or worst assumptions. The only acceptable values are 'best' and
        'worst'.

    Returns
    =======
    df: pd.DataFrame
        A pandas DataFrame containing the value function and entropy
        values up to the emax parameter. The columns are 'value' and
        'entropy'.
    """

    if bw == 'worst':
        θs = 1 / np.linspace(1e-8, 1000, grid_size)
    else:
        θs = -1 / np.linspace(1e-8, 1000, grid_size)

    df = pd.DataFrame(index=θs, columns=('value', 'entropy'))

    for θ in θs:
        df.loc[θ] = evaluate_policy(θ, F)
        if df.loc[θ, 'entropy'] >= emax:
            break

    df = df.dropna(how='any')
    return df


# ------------------------------------------------------------------------ #
#                                    Main
# ------------------------------------------------------------------------ #


# Compute the optimal rule
optimal_lq = qe.lqcontrol.LQ(Q, R, A, B, C, beta=β)
Po, Fo, do = optimal_lq.stationary_values()

# Compute a robust rule given θ
baseline_robust = qe.robustlq.RBLQ(Q, R, A, B, C, β, θ)
Fb, Kb, Pb = baseline_robust.robust_rule()

# Check the positive definiteness of the worst-case covariance matrix to
# ensure that θ exceeds the breakdown point (the matrix is j x j, where
# j is the number of columns of C)
test_matrix = np.identity(C.shape[1]) - (C.T @ Pb @ C) / θ
eigenvals = np.linalg.eigvalsh(test_matrix)
assert (eigenvals > 0).all(), 'θ below breakdown point.'


emax = 1.6e6

optimal_best_case = value_and_entropy(emax, Fo, 'best')
robust_best_case = value_and_entropy(emax, Fb, 'best')
optimal_worst_case = value_and_entropy(emax, Fo, 'worst')
robust_worst_case = value_and_entropy(emax, Fb, 'worst')

fig, ax = plt.subplots()

ax.set_xlim(0, emax)
ax.set_ylabel("Value")
ax.set_xlabel("Entropy")
ax.grid()

for axis in 'x', 'y':
    plt.ticklabel_format(style='sci', axis=axis, scilimits=(0, 0))

plot_args = {'lw': 2, 'alpha': 0.7}

colors = 'r', 'b'

df_pairs = ((optimal_best_case, optimal_worst_case),
            (robust_best_case, robust_worst_case))


class Curve:

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __call__(self, z):
        return np.interp(z, self.x, self.y)


egrid = np.linspace(0, emax, 100)

for color, df_pair in zip(colors, df_pairs):
    curves = []
    for df in df_pair:
        # Interpolate each bound on a common entropy grid and plot it
        x, y = df['entropy'], df['value']
        x, y = (np.asarray(a, dtype='float') for a in (x, y))
        curve = Curve(x, y)
        ax.plot(egrid, curve(egrid), color=color, **plot_args)
        curves.append(curve)
    # Shade the region between the best-case and worst-case curves
    ax.fill_between(egrid,
                    curves[0](egrid),
                    curves[1](egrid),
                    color=color, alpha=0.1)

plt.show()

Here's another such figure, with θ = 0.002 instead of 0.02

Can you explain the different shape of the value-entropy correspondence for the robust policy?

Appendix

We sketch the proof only of the first claim in the section on Nash equilibrium, which is that, for any given θ, K(F̂, θ) = K̂, where K̂ is as defined in the main text.

This is the content of the next lemma.

Lemma. If P̂ is the fixed point of the map ℬ ∘ 𝒟 and F̂ is the robust policy defined in the main text, then

$$K(\hat F, \theta) = (\theta I - C' \hat P C)^{-1} C' \hat P (A - B \hat F)$$

Proof: As a first step, observe that when F = F̂, the Bellman equation associated with agent 2's LQ minimization problem is

$$\tilde P = -R - \hat F' Q \hat F - \beta^2 (A - B \hat F)' \tilde P C (\beta \theta I + \beta C' \tilde P C)^{-1} C' \tilde P (A - B \hat F) + \beta (A - B \hat F)' \tilde P (A - B \hat F)$$

(revisit the lecture on LQ control if you don't know where this equation comes from) and the optimal policy is

$$w_{t+1} = - \beta (\beta \theta I + \beta C' \tilde P C)^{-1} C' \tilde P (A - B \hat F) x_t$$

Suppose for a moment that − P̂ solves this Bellman equation.

In this case, the policy becomes

$$w_{t+1} = (\theta I - C' \hat P C)^{-1} C' \hat P (A - B \hat F) x_t$$

which is exactly the claim in the lemma.

Hence it remains only to show that − P̂ solves the Bellman equation, or, in other words,

$$\hat P = R + \hat F' Q \hat F + \beta (A - B \hat F)' \hat P C (\theta I - C' \hat P C)^{-1} C' \hat P (A - B \hat F) + \beta (A - B \hat F)' \hat P (A - B \hat F)$$

Using the definition of 𝒟, we can rewrite the right-hand side more simply as

R + F̂′QF̂ + β(A − BF̂)′𝒟(P̂)(A − BF̂)

Although it involves a substantial amount of algebra, it can be shown that the latter is just P̂.

(Hint: Use the fact that P̂ = ℬ(𝒟(P̂)))
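Short of doing the algebra, the claim can be checked numerically. The sketch below, under hypothetical primitives and with helper names of our own, computes P̂ by iterating ℬ ∘ 𝒟, forms F̂, and then verifies three things: that R + F̂′QF̂ + β(A − BF̂)′𝒟(P̂)(A − BF̂) reproduces P̂, that −P̂ satisfies agent 2's Bellman equation, and that the resulting policy for w coincides with K̂.

```python
import numpy as np

def d_operator(P, C, theta):
    j = C.shape[1]
    return P + P @ C @ np.linalg.solve(theta * np.eye(j) - C.T @ P @ C, C.T @ P)

def b_operator(P, A, B, R, Q, beta):
    S = Q + beta * B.T @ P @ B
    return (R - beta**2 * A.T @ P @ B @ np.linalg.solve(S, B.T @ P @ A)
            + beta * A.T @ P @ A)

# Hypothetical primitives, chosen only for illustration
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[0.1], [0.2]])
R = np.eye(2)
Q = np.array([[1.0]])
beta, theta = 0.95, 50.0
I_j = np.eye(C.shape[1])

# Fixed point of B∘D by simple iteration (assumed to converge here)
P_hat = np.zeros_like(A)
for _ in range(10_000):
    P_new = b_operator(d_operator(P_hat, C, theta), A, B, R, Q, beta)
    converged = np.max(np.abs(P_new - P_hat)) < 1e-11
    P_hat = P_new
    if converged:
        break

DP = d_operator(P_hat, C, theta)
F_hat = np.linalg.solve(Q + beta * B.T @ DP @ B, beta * B.T @ DP @ A)
A_F = A - B @ F_hat
K_hat = np.linalg.solve(theta * I_j - C.T @ P_hat @ C, C.T @ P_hat @ A_F)

# Claim at the end of the proof: the right-hand side equals P_hat
rhs = R + F_hat.T @ Q @ F_hat + beta * A_F.T @ DP @ A_F
gap_P = np.max(np.abs(rhs - P_hat))

# Agent 2's Bellman equation evaluated at P_tilde = -P_hat
P_t = -P_hat
M = beta * theta * I_j + beta * C.T @ P_t @ C
bellman_rhs = (-R - F_hat.T @ Q @ F_hat
               - beta**2 * A_F.T @ P_t @ C @ np.linalg.solve(M, C.T @ P_t @ A_F)
               + beta * A_F.T @ P_t @ A_F)
gap_bellman = np.max(np.abs(bellman_rhs - P_t))

# Agent 2's policy at P_tilde = -P_hat versus K_hat
K_agent2 = -beta * np.linalg.solve(M, C.T @ P_t @ A_F)
gap_K = np.max(np.abs(K_agent2 - K_hat))

print(gap_P, gap_bellman, gap_K)
```

All three gaps should be at the level of the iteration tolerance or below. A numerical check on one example is of course not a proof, but it is a useful sanity test before working through the algebra.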
