## 4.1 Value at Risk

A natural question to ask when making a risky investment is: how bad can things get?

The problem with this question is that there's no limit to the range of possibilities. We could speculate that evil aliens with hopelessly superior technology land on Earth and destroy all human life. This is not a very satisfying answer unless you're a science fiction fan. It's highly unrealistic, and if it does happen, there's nothing we can do about it. Such extreme speculation is not a guide to actions we should take now.

A more interesting question is: how bad are the things that can realistically happen that we might have a chance of planning for and doing something about? This seems more down-to-earth, but we need to know what "realistically" means.

A measure called _Value at Risk_ (or VaR$^1$) was created to answer the question “how bad can things get _realistically_?” It's now widely used and, in some cases, financial regulators require it to be computed and acted on.

VaR is simply a percentile of a probability distribution of wealth$^2$. Let $F_t(w)$ be the cumulative distribution function for the wealth of an institution or portfolio at some time $t$ years in the future. ($t$ might be denominated in other time units like days and is often left implicit.) For a probability $0\leq p\leq 1$, we find the amount of wealth $w_p$ so that $F_t(w_p)=1-p$. $w_p$ is called the _$t$-year $p$-Value at Risk_.

For example, if a bank wants to estimate how bad things can get in its trading operations, it would let $w$ be the net value of its trading capital at the end of a day. The 99% one-day VaR for its trading operations would be the level that the end-of-day trading capital is expected to exceed 99 days out of 100.
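As a rough illustration of the percentile definition, the sketch below finds $w_p$ for a bank whose end-of-day trading capital is assumed to be normally distributed. The normal model and the figures (mean 100, standard deviation 2, in millions of dollars) are hypothetical and chosen only to make the example concrete.

```python
from statistics import NormalDist

p = 0.99
# Hypothetical model of end-of-day trading capital, in $ millions
capital = NormalDist(mu=100.0, sigma=2.0)

# Model-based level: solve F(w_p) = 1 - p
w_p = capital.inv_cdf(1 - p)

# Empirical level: the (1 - p) sample quantile of simulated end-of-day capital
samples = sorted(capital.samples(100_000, seed=1))
w_p_empirical = samples[int((1 - p) * len(samples))]

# Share of simulated days on which capital ended above w_p
share_above = sum(w > w_p for w in samples) / len(samples)

print(round(w_p, 3), round(w_p_empirical, 3), round(share_above, 4))
```

Under these assumptions the simulated capital should end above $w_p$ on roughly 99 days out of 100, which is what the definition asks for.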

The obvious problem with VaR is: what happens on the $100^{th}$ day? Suppose a bank decrees that the 99% one-day VaR of its trading capital has to be no more than a certain low but tolerable amount. The bank checks its various trades and finds that everything is OK. But suppose that someone at the bank is running a strategy where every day, she sells \\$100,000 worth of options that have only a 1-in-200 chance of paying off. If the options don't pay off the bank will keep the \\$100,000, but if they do pay off, the bank will owe \\$1,000,000,000. The 99% VaR does not "see" the 99.5% quantile where the options strategy pays off (i.e. loses); it only "sees" the \\$100,000 profits and deems the trade riskless. But the chance that the bank owes a billion dollars is $1-.995^n$ where $n$ is the number of days the strategy is employed; in about six months, there will be a greater-than-50% chance that the bank will owe a billion dollars.
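The six-month figure can be checked with a few lines of arithmetic. This sketch assumes, as the formula $1-.995^n$ does, that each day's options pay off independently; the function name `prob_owe` is illustrative.

```python
import math

p_payoff = 1 / 200  # daily chance that the sold options pay off

def prob_owe(n_days):
    """Chance of at least one payoff over n_days independent trading days."""
    return 1 - (1 - p_payoff) ** n_days

# Smallest n for which 1 - 0.995**n exceeds 50%
n_half = math.ceil(math.log(0.5) / math.log(1 - p_payoff))

print(n_half, prob_owe(n_half - 1), prob_owe(n_half))
```

The threshold comes out at 139 trading days, which is between six and seven months at roughly 21 trading days per month.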

This is an example of _Goodhart's Law_, named after British economist Charles Goodhart's statement in 1974: "Any statistical relationship will break down when used for policy purposes." Goodhart was talking about Frank Knight's second category of risk &mdash; statistical probability (section 1.1) &mdash; in the context of public policy targets set by the Bank of England, the central bank of the United Kingdom. As soon as economic actors know that there is a target, they will alter their behavior so that (for example) events that only happened 1% of the time in the past now happen more often.

Goodhart's Law was extended by [J&oacute;n Dan&iacute;elsson](https://doi.org/10.1016/S0378-4266(02)00263-7) to say "A risk model breaks down when used for its intended purpose." Dan&iacute;elsson pointed out that some people assume in error that
>...the role of the risk forecaster is akin to a meteorologist's job, who can forecast the weather, but not influence it. [But] if risk measurements influence people's behavior, it is inappropriate to assume market prices follow an independent stochastic process.

Our example bank tried to define "how bad can things get _realistically_" to exclude far-fetched scenarios like alien invasions by drawing a line at 99% VaR. But as soon as that line was drawn, people were encouraged to do things just on the other side of that line. We will see that there are better measures than VaR, but any measure is subject to gaming. That does not mean that metrics should be abandoned: risk management should be made as rigorous as possible, but no more rigorous. If metric rigor is getting in the way of common sense, metric rigor needs to step aside.

The more formal definition of Value at Risk (VaR) is:
$$VaR(p)=\inf\{x\mid Pr(-X>x)\leq 1-p\}\tag{4.1}$$

$X$ is the random variable of interest. Let’s assume $X$ is in P&L terms, so $-X$ is the distribution of (positive) losses from $X$. The definition (say for $p=99\%$) says we look for the smallest loss $x$ that is exceeded at most 1% of the time. If $X$ has a continuous cumulative distribution function $F$, then we can write
$$VaR(p)=-F^{-1}(1-p);\hspace{2em}VaR(p)=x \iff \int_{-\infty}^{-x}dF(y)=1-p\tag{4.2}$$
_Expected Shortfall_ is an average of the “bad” VaRs:
$$ES(p)=\frac{1}{1-p}\int_p^1{VaR(z)dz}\tag{4.3}$$

If there is a continuous cdf $F$, then we can define _cVaR, or conditional Value at Risk_, as the expected value given that you are in the VaR tail:
$$cVaR(p)=\frac{-1}{1-p}\int_{-\infty}^{-VaR(p)}{ydF(y)}\tag{4.4}$$
Note that when there is a continuous cdf $F$, the change of variable $1-z=F(y)$ in the definition of $ES(p)$ makes it equivalent to $cVaR(p)$.

The $\frac{1}{1-p}$ factor on the right of (4.3) and (4.4) normalizes the probability density functions so they integrate to one in the tail. For ES, the pdf is uniform, while for cVaR the pdf is the pdf of the random variable X.
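A numerical check of the equivalence is sketched below for a P&L that is assumed to be standard normal (an illustrative choice, not something the text requires). ES is computed by integrating (4.3) with a midpoint rule. cVaR uses the fact that, for the standard normal density $\phi$, $\int_{-\infty}^{-v}y\,\phi(y)\,dy=-\phi(v)$, so (4.4) reduces to $\phi(VaR(p))/(1-p)$.

```python
from statistics import NormalDist

Z = NormalDist()  # assumed P&L distribution: standard normal
p = 0.99

def var(z):
    """VaR(z) = -F^{-1}(1 - z), as in (4.2)."""
    return -Z.inv_cdf(1 - z)

# ES via (4.3): midpoint rule for the integral of VaR(z) over [p, 1]
n = 20_000
step = (1 - p) / n
es = sum(var(p + (i + 0.5) * step) for i in range(n)) * step / (1 - p)

# cVaR via (4.4), using the closed-form tail integral for the standard normal
v = var(p)
cvar = Z.pdf(v) / (1 - p)

print(round(v, 4), round(es, 4), round(cvar, 4))
```

The two tail measures should agree up to the discretization error of the midpoint rule, and both exceed the 99% VaR because they average over the whole tail beyond it.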

Discrete distributions can sometimes cause problems. Consider for example the toss of a weighted coin that loses \\$100 2% of the time and gains \\$1 the other 98% of the time. Here $f$ is a probability mass function: $f(-100)=.02$ and $f(1)=.98$. From (4.1) we see that $VaR(p)=100$ for $p>98\%$, while $VaR(p)=-1$ (a gain of \\$1 is a loss of $-1$) for $0<p\leq 98\%$.

We now calculate 99% ES and 99% cVaR for this distribution.
- For the ES calculation, the $VaR(z)$ in the integral is a constant 100 over the whole range $99\%\leq z\leq 1$. So $ES(99\%)=VaR(99\%)=100$.
- For the cVaR calculation, the integral in (4.4) ranges from $-\infty$ to $-VaR(99\%)=-100$. The $dF(y)$ term equals $f(y)dy$ where $f$ is the probability mass function, which in this range is only nonzero at $y=-100$ where it equals .02. So $cVaR(99\%)=\frac{-1}{.01}(-100)(.02)=200$.

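So ES and cVaR don't agree here. The mismatch arises because the tail event $X\leq -VaR(99\%)$ has probability 2%, not the 1% that the $\frac{1}{1-p}$ normalization in (4.4) assumes. Usually some kind of interpolation or extrapolation can be applied between probability mass points to fix things up, assuming that such manipulation does not irrevocably depart from the underlying process.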
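The coin example can be reproduced with exact rational arithmetic. The sketch below is one way to apply (4.1), (4.3) and (4.4) directly to a probability mass function; the helper names `var`, `es` and `cvar` are illustrative and do not come from any library.

```python
from fractions import Fraction as Fr

# P&L outcome -> probability for the weighted coin
outcomes = {Fr(-100): Fr(2, 100), Fr(1): Fr(98, 100)}

def var(p):
    """Smallest loss x with Pr(-X > x) <= 1 - p, as in (4.1)."""
    for x in sorted(-y for y in outcomes):  # candidate losses, ascending
        tail = sum(q for y, q in outcomes.items() if -y > x)
        if tail <= 1 - p:
            return x

def es(p):
    """(4.3): VaR(z) is a step function of z, so integrate it piece by piece."""
    total, lo = Fr(0), Fr(0)
    for x in sorted(-y for y in outcomes):
        hi = lo + outcomes[-x]                  # VaR(z) = x for lo < z <= hi
        total += x * max(Fr(0), hi - max(lo, p))
        lo = hi
    return total / (1 - p)

def cvar(p):
    """(4.4) with dF(y) replaced by the probability mass at each y."""
    v = var(p)
    return -sum(y * q for y, q in outcomes.items() if y <= -v) / (1 - p)

p = Fr(99, 100)
print(var(p), es(p), cvar(p))
```

At $p=99\%$ this gives a VaR of 100, an ES of 100 and a cVaR of 200, matching the hand calculation.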

<br><br>
<font size=1>
$^1$ The abbreviation VaR is case-sensitive: usually "Var" means statistical variance (the square of standard deviation), while "VaR" means value at risk.
<br>$^2$ VaR might also be in P&L (profit&loss, i.e. change in wealth) or percentage terms.</font>

## 4.2 Coherent Risk

In the late 1990s, four co-authors [Artzner, Delbaen, Eber and Heath ("ADEH")](https://people.math.ethz.ch/~delbaen/ftp/preprints/CoherentMF.pdf) postulated a series of mathematical axioms that, they claimed, any reasonable ("coherent") measure of financial risk should have. They were trying to avoid damaging situations like the example of the trading desk selling options. We have of course already seen a reasonable axiomatic system – VNM utility theory – that is mathematically powerful, but doesn’t really capture the range of human economic responses. ADEH is another one – the axioms are reasonable but humans aren’t. But it can still help to avoid some damaging situations.

ADEH's definition of risk is more accounting-oriented than the essentially probabilistic Knight definitions. For ADEH, risk is how much cash has to be put into a portfolio to make it acceptable. An "acceptable" portfolio is one that passes regulatory or other hurdles. So to ADEH, risk is a peril &mdash; the chance of having an unacceptable portfolio &mdash; and the metric for risk is how much capital you have to set aside in order to mitigate the peril.

For example, if your portfolio contains the options trading strategy noted above &mdash; the one where you could lose \\$1,000,000,000 one out of 200 times &mdash; losing a billion dollars would probably be unacceptable. Maybe "acceptable" means never having less than \\$100,000,000. If you added \\$1,100,000,000 in cash to that portfolio, you'd eliminate the possibility of ending up below the threshold, making the portfolio (or at least that part of the portfolio) acceptable. So the ADEH-type risk of the options trade might be 1.1 billion dollars.

More formally, ADEH defined a _risk measure_ as a mapping from a random variable $X$ to the real numbers. $X$ could be an amount of money; a change in an amount of money; or a rate of return. The outcome space $\Omega$ (which ADEH assume is finite) encompasses all relevant economic conditions one period forward.

ADEH assumed there is a single risk-free rate $R_f$ (a nonrandom random variable specifying how much wealth \\$1 invested in a sure thing will produce one period forward). They put forward four axioms that sensible risk measures $\rho$ must satisfy.
- Axiom T (**Translation Invariance**):
$\rho(X+\alpha R_f)=\rho(X)-\alpha$. This says we can decrease our risk by adding cash to our portfolio. ADEH point out that this means $\rho(X+\rho(X)R_f)=0$, i.e. $\rho(X)$ is the amount of cash that will eliminate risk.
- Axiom S (**Subadditivity**):
$\rho(X+Y)\leq\rho(X)+\rho(Y)$. This says diversification helps, or at least doesn’t hurt, to reduce risk. ADEH note that if this were not true, entities (trading desks, banks, companies, portfolios) would be encouraged to split up into pieces to give the appearance of having less risk. 
- Axiom PH (**Positive Homogeneity**):
If $\lambda\geq0$, then $\rho(\lambda X)=\lambda\rho(X)$. We know from Axiom S that $\rho(nX)\leq n\rho(X)$. This says that equality holds, and fills in between the integers. In effect this ignores liquidity risk, which ADEH acknowledge; they are focused on whether an amount of money is sufficient or not.
- Axiom M (**Monotonicity**): 
If $X\leq Y$, then $\rho(Y)\leq \rho(X)$. Here $X\leq Y$ means that the random variable $Y$ statewise dominates the random variable $X$ (section 1.6). So this axiom says that having more money is less risky than having less money.

A risk measure satisfying these four axioms is called _coherent_.
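To make the axioms concrete, the sketch below spot-checks them on a small finite outcome space for one simple measure: the worst-case loss $\rho(X)=-\min_\omega X(\omega)$, with $R_f=1$ assumed. This measure is picked only because it is easy to compute; it is not ADEH's example, and random trials illustrate the axioms rather than prove them.

```python
import random

random.seed(0)
N_STATES = 6  # size of the finite outcome space

def rho(x):
    """Worst-case loss over all states: rho(X) = -min X (R_f = 1 assumed)."""
    return -min(x)

def rand_var():
    """A random variable as a list of integer payoffs, one per state."""
    return [random.randint(-50, 50) for _ in range(N_STATES)]

def axioms_hold(trials=1000):
    for _ in range(trials):
        x, y = rand_var(), rand_var()
        alpha = random.randint(-20, 20)
        lam = random.randint(0, 5)
        # Y dominates X statewise: add a nonnegative amount in every state
        y_dom = [xi + random.randint(0, 10) for xi in x]
        ok = (
            rho([xi + alpha for xi in x]) == rho(x) - alpha                 # Axiom T
            and rho([xi + yi for xi, yi in zip(x, y)]) <= rho(x) + rho(y)  # Axiom S
            and rho([lam * xi for xi in x]) == lam * rho(x)                # Axiom PH
            and rho(y_dom) <= rho(x)                                       # Axiom M
        )
        if not ok:
            return False
    return True

print(axioms_hold())
```

Integer payoffs are used so that the equalities in Axioms T and PH can be tested exactly rather than up to floating-point tolerance.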

To see whether you've understood coherent risk, see if you can resolve the following puzzle: Suppose $X$ is a random variable that gives us more money than we have now under any circumstances; say $X(0)=0$ where 0 is the current time, and $X(1)=2$ or $X(1)=4$ with equal probability at time 1 in the future. Let $Y=2X$. Axiom PH says $\rho(Y)=2\rho(X)$. But Axiom M says $\rho(Y)\leq \rho(X)$. How can both statements be true?

The axioms of subadditivity and positive homogeneity are sometimes replaced by a single, weaker axiom:
- Axiom C (**Convexity**):
If $1\geq\lambda\geq 0$, then $\rho(\lambda X+(1-\lambda)Y)\leq\lambda\rho(X)+(1-\lambda)\rho(Y)$.

A risk measure satisfying translation invariance, convexity, and monotonicity is called a _convex_ risk measure.
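To see why convexity is the weaker requirement, note that Axioms S and PH together imply Axiom C. For $0\leq\lambda\leq 1$:
$$\rho(\lambda X+(1-\lambda)Y)\leq\rho(\lambda X)+\rho((1-\lambda)Y)=\lambda\rho(X)+(1-\lambda)\rho(Y)$$
The inequality is Axiom S and the equality is Axiom PH applied to each term. The converse does not hold in general, so every coherent risk measure is convex, but a convex risk measure need not be coherent.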

We've already seen an unfortunate failing of the VaR measure: it fails to detect rare but disastrous events whose probability is below the VaR threshold. But ADEH give another example that shows that VaR isn't convex, so it certainly isn't coherent.

Suppose we have two digital options on a stock. The first one, A, costs h at time 0 and pays 1000 at time 1 if the value of the stock is over some H, and 0 otherwise. The second digital option, B, costs l (lower case L) at time 0 and pays 1000 at time 1 if the value of the stock is under some L, and 0 otherwise. Here $L<H$.

The payoff profile of buying both A and B: you get 1000 if the stock ends up under L or over H, and nothing in between. Suppose we choose L and H so that $Pr(Stock<L)=Pr(Stock>H)=0.8\%$ (80 bps). A is a lottery ticket that pays off 1000 less than 1% of the time, and zero the rest of the time. Its expected value is 8, and any risk-averse utility function is going to value it less than 8. Even a risk-loving but reasonable utility function will value it closer to 8 than to 1000. Similarly for option B.

If the risk-free return $R_f$ is 1 (risk-free rate 0), then there is no time value of money to worry about, and the 99% money VaR of writing (going short) 2 A’s is -2h, because 99.2% of the time the options do not pay off and you pocket 2h from writing them. Having positive money is negative value at risk. Similarly the 99% money VaR of writing 2 B’s is -2l.

However, suppose you write one A and one B. The middle ground between L and H – the place where neither pays off – has $100\%-.8\%-.8\%=98.4\%$ probability. There is a $1.6\%$ chance the stock will end up either very low (less than L, where B pays off) or very high (greater than H, where A pays off). The 99% money VaR of writing one A and one B is 1000-l-h, a (large) positive number.

So this is another example of VaR not seeing concentrated risk. One A and one B is a better-diversified portfolio than two A’s. In effect the VaR measure encourages banks to double up on their low-probability bets (or triple up or quadruple up…). 

You can’t hide problems in the tail with expected shortfall/cVaR – they get averaged in. ES/cVaR does the right thing in the digital options example. It can be shown that expected shortfall (and therefore cVaR when there is a continuous cdf F) is a coherent risk measure.
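The digital options example can be worked through numerically. The sketch below uses hypothetical premiums $h=l=7$ (the text leaves them unspecified, saying only that a risk-averse buyer values each option below its expected payoff of 8) and a three-state outcome space: stock below L, between L and H, or above H. The helpers `var` and `es` apply (4.1) and (4.3) to the discrete distribution and are illustrative names.

```python
from fractions import Fraction as Fr

P_TAIL = Fr(8, 1000)                     # Pr(Stock < L) = Pr(Stock > H) = 80 bps
h = l = Fr(7)                            # hypothetical premiums for A and B
probs = [P_TAIL, 1 - 2 * P_TAIL, P_TAIL] # states: below L, middle, above H

def short_pnl(n_a, n_b):
    """P&L per state from writing n_a A's and n_b B's (risk-free rate 0)."""
    premium = n_a * h + n_b * l
    return [premium - 1000 * n_b, premium, premium - 1000 * n_a]

def var(pnl, p):
    """Smallest loss x with Pr(-X > x) <= 1 - p, as in (4.1)."""
    for x in sorted(set(-y for y in pnl)):
        tail = sum(q for y, q in zip(pnl, probs) if -y > x)
        if tail <= 1 - p:
            return x

def es(pnl, p):
    """(4.3), integrating the step function VaR(z) over [p, 1]."""
    total, lo = Fr(0), Fr(0)
    for x, q in sorted((-y, q) for y, q in zip(pnl, probs)):
        hi = lo + q
        total += x * max(Fr(0), hi - max(lo, p))
        lo = hi
    return total / (1 - p)

p = Fr(99, 100)
for name, pos in [("2 A", (2, 0)), ("2 B", (0, 2)), ("A + B", (1, 1))]:
    pnl = short_pnl(*pos)
    print(name, var(pnl, p), es(pnl, p))
```

With these premiums the 99% VaR is -14 for each doubled-up position and 986 for the mixed one, so VaR penalizes the diversified book. The 99% ES is 1586 for each doubled-up position and 986 for the mixed one, which respects convexity: the mix is no riskier than the average of the two concentrated positions.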


# Review of Bayes' Rule

Bayes' Rule is simple to state:
$$Pr(E\mid F)=\frac{Pr(F\mid E)Pr(E)}{Pr(F)}$$
It is essentially a restatement of the definition of conditional probability in section 1.6. Despite this uncomplicated derivation, Bayes' Rule is profound. We'll walk through an example to get an intuitive feel for Bayes' Rule.
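The derivation takes one line, assuming the usual definition of conditional probability, $Pr(E\mid F)=Pr(E\cap F)/Pr(F)$ for $Pr(F)>0$. Applying the same definition with the roles of $E$ and $F$ swapped gives $Pr(E\cap F)=Pr(F\mid E)Pr(E)$, so
$$Pr(E\mid F)=\frac{Pr(E\cap F)}{Pr(F)}=\frac{Pr(F\mid E)Pr(E)}{Pr(F)}$$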

Suppose you suspect you have a rare disease that makes you unable to compute your personal utility function. Fortunately there's a very accurate test for this condition. You go to the doctor and take the test, which comes back positive. You are now terrified that you won't be able to handle a meeting with generous billionaires as in section 1.4.2, and contemplate entering an intensive treatment program for the condition.

We need to be more precise: by "rare," we mean that only one in ten thousand people has the disease. And by "very accurate," we mean that the test is right 99% of the time.

The relevant outcome space $\Omega$ in this case is the set of roughly 7.5 billion people in the world. $\Omega$ can be divided into four disjoint events:
- Test says you have disease and you have disease;
- Test says you have disease and you don't have disease;
- Test says you don't have disease and you don't have disease; and
- Test says you don't have disease and you do have disease.

From the $10^{-4}$ probability of having the disease, we know that 750,000 people in the world have the disease. This allows us to start filling in a table:

| Outcomes |  Have Disease |  Don't Have |  Total  |
|----------|:-------------:|------------:|:-------:|
| Test+    |               |             |         |
| Test-    |               |             |         |
| Total    |  750,000      | ~7.499Bn    |  7.5Bn  |

Of the 750,000 who have the disease, 99% are properly diagnosed and 1% (7,500) are misdiagnosed as not having the disease. Similarly for the 7.499Bn who don't have the disease. So we can completely fill in the table:

| Outcomes |  Have Disease |  Don't Have   |     Total     |
|----------|:-------------:|--------------:|:-------------:|
| Test+    |  742,500      |  74,992,500   |   75,735,000  |
| Test-    |    7,500      | 7,424,257,500 | 7,424,265,000 |
| Total    |  750,000      | 7,499,250,000 | 7,500,000,000 |

With a positive test result, you are in the Test+ row. But the vast majority of people with a positive test result are test mistakes in the "don't have" column. In fact even with a positive test result, your chance of having the disease is only about 1%.

If $n=|\Omega|$ is the size of the outcome space, then we know
- Event E (have disease) has size $10^{-4}n=750,000$.
- Event W (wrong test result) has size $10^{-2}n$
- Event F (positive test result) has size $(10^{-2}+10^{-4}-2\cdot10^{-6})n$   
Event F's size is not directly given; it is the sum of the two cells in the Test+ row of the 2x2 matrix: the correct positives $E\cap F$ plus the false positives $(\Omega\setminus E)\cap F$.

| Outcomes | Have Disease | Don't Have | Total |
|----------|:------------:|-----------:|:-----:|
| Test+    | $E\cap F$ | $(\Omega\setminus E)\cap F$ | $F$ |
| Test-    | $E\cap (\Omega\setminus F)$ | $(\Omega\setminus E)\cap (\Omega\setminus F)$ | $\Omega\setminus F$ |
| Total    | $E$ | $\Omega\setminus E$ | $\Omega$ |

In terms of probabilities, $p_E=10^{-4}$ is the probability of having the disease, and $p_W=10^{-2}$ is the probability of a wrong test result. Then the table looks like:

| Outcomes | Have Disease | Don't Have | Total |
|----------|:------------:|-----------:|:-----:|
| Test+    | $p_E(1-p_W)$ | $(1-p_E)p_W$ | $p_F=p_E+p_W-2p_E p_W$ |
| Test-    | $p_Ep_W$ | $(1-p_E)(1-p_W)$ | $1-p_E-p_W+2p_E p_W$ |
| Total    | $p_E$ | $1-p_E$ | $1$ |
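Plugging the probabilities into Bayes' Rule gives the chance of having the disease after a positive test. The sketch below uses exact fractions so that no rounding creeps in; the variable names are illustrative.

```python
from fractions import Fraction

p_E = Fraction(1, 10_000)  # Pr(have disease)
p_W = Fraction(1, 100)     # Pr(wrong test result)

# The two cells of the Test+ row
true_pos = p_E * (1 - p_W)
false_pos = (1 - p_E) * p_W
p_F = true_pos + false_pos          # Pr(positive test)

# Bayes' Rule: Pr(E | F) = Pr(F | E) Pr(E) / Pr(F)
p_F_given_E = 1 - p_W
posterior = p_F_given_E * p_E / p_F

print(p_F == p_E + p_W - 2 * p_E * p_W)
print(posterior, float(posterior))
```

The posterior comes out to exactly 1/102, a little under 1%: a positive result raises the chance of disease about a hundredfold from its $10^{-4}$ base rate, but false positives still outnumber true positives by roughly 100 to 1.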


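In [3]:

    # Test-/Don't Have cell of the table, in billions:
    # 99% correct results among the 99.99% of 7.5Bn people without the disease
    round(.99 * .9999 * 7.5, 7)

    7.4242575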