+ # Chapter 1. What is Risk?

Risk is _lack of information about the future._ A situation is risky if it has widely varying possible outcomes and we are unable to determine with high confidence which outcome will occur. A riskless or risk-free situation is one whose future is known exactly.

Humans outpace other species in the ability to worry, so there is a tendency to focus on whether or not negative outcomes might loom in the future. Indeed, most definitions of the noun "risk" stress the downside: for example, _the possibility that something unpleasant or unwelcome will happen_ is the first of five definitions of risk in the [Oxford English Dictionary](https://en.oxforddictionaries.com/definition/risk). This is a gloomy strain of our more general definition above: an earthquake is not a risk because we lack information about how enjoyable it will be. It is a risk because we lack information about whether or not this unequivocally unpleasant thing will happen.

Expressing your love for the first time to another person involves _the possibility that something unpleasant or unwelcome will happen:_ the other person may reject you. So why do people override their pounding pulses and stammer out their feelings? It is because the unknown future also includes positive outcomes: the other person may return your affection. So declaring your feelings is a risky situation, but you do it because there are good as well as bad possible outcomes.

The Oxford English Dictionary comes closer to encompassing positive outcomes in definition 4 of risk: _A person or thing regarded as likely to turn out well or badly in a particular context or respect. ‘Western banks regarded Romania as a good risk.’_ The phrase "good risk" in the example indicates that Western banks thought the good outcome (getting repaid by Romania) was sufficiently likely that they would do well to loan to the country.

So in practice there are two kinds of risk:
 - **perils**, where there are no positive outcomes. There is only (a) nothing happens; or (b) bad things happen. Earthquakes, forest fires, floods, hurricanes and other natural disasters are risks in this sense.
 - **gambles**, where there are both positive and negative outcomes. Investments - such as loaning money to Romania - are risks in this sense.

This class is about managing risk.
 - **Perils** are managed by avoidance (stay out of earthquake zones) and fortification (build strong structures).
 - **Gambles** require a more subtle approach, because both positive and negative outcomes have to be weighed against each other. Should I declare my love? Should I loan to Romania? These are tough questions.


+ # 1.1 Frank Knight's Formulation

In 1921, Frank Knight - then an Associate Professor of Economics at the University of Iowa; later the influential head of the University of Chicago's economics department - wrote [_Risk, Uncertainty, and Profit_](http://www.econlib.org/library/Knight/knRUP.html).

Knight delved into the philosophical nature of knowledge itself, but eventually got down to a very pragmatic list of ways in which we can organize our knowledge (or lack thereof) about the future:
>1. $\underline{\text{A priori probability}}$. Absolutely homogeneous classification of instances completely identical except for really indeterminate factors. This judgment of probability is on the same logical plane as the laws of mathematics.
>2. $\underline{\text{Statistical probability}}$. Empirical evaluation of the frequency of association between predicates, not analyzable into varying combinations of equally probable alternatives.
>3. $\underline{\text{Estimates}}$. The distinction here is that there is _no valid basis of any kind_ for classifying instances.
>
>(pp. 224-225).

An example of something with _a priori_ probability is the throw of a perfect die. As Knight says, "the mathematician can easily calculate the probability that any proposed distribution of results will come out of any given number of throws." That doesn't help us know which face will come up on the next throw of the die; of that we remain ignorant. But we know that betting even money on the same face coming up 50 times in a row is a bad idea.
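Knight's point is that for a perfect die the calculation can be done entirely on paper. The short Python sketch below is our own illustration, not Knight's: it assumes that "the same face" means whichever face shows on the first throw, so each of the other 49 throws must match it with probability 1/6. The helper name `prob_same_face_run` is hypothetical.

```python
from fractions import Fraction

# A perfect die: each face has a priori probability exactly 1/6.
P_FACE = Fraction(1, 6)

def prob_same_face_run(n_throws: int) -> Fraction:
    """Probability that n_throws throws of a perfect die all show the same face.

    The first throw may be anything; each of the remaining n_throws - 1
    throws must match it, independently, with probability 1/6.
    """
    return P_FACE ** (n_throws - 1)

p = prob_same_face_run(50)

# Even money: stake 1, receive 2 back if the run happens, 0 otherwise.
# Expected gain per unit staked is 2p - 1, which is essentially -1.
even_money_ev = 2 * p - 1
print(float(p), float(even_money_ev))
```

Exact fractions are used so that the tiny probability is not rounded away before the expected gain is computed.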

A little less information about the future attaches to Knight's second category, _statistical probability_. An example here might be the chance that a 40-year-old male dies in the next year. Life insurance companies have gathered extensive statistics about mortality rates among 40-year-old males. As long as they have a sufficiently large pool of insureds, they can make a reasonable guess as to what to charge for insuring a 40-year-old male life. But there is no _a priori_ mathematical model as there is with the throw of a die.

The final category has the least information. Here intuition must be used. Will my beloved return my affection if I declare it? There is no mathematical model, so the first category doesn't apply. Very few people will have been able to build large databases of outcomes of their previous declarations of love, so the second category doesn't apply either. The terrified lover is left only with hunches.

Knight condenses his three categories into two with this more succinct statement (p. 233):
>To preserve the distinction which has been drawn... between the measurable uncertainty and an unmeasurable one, we may use the term "risk" to designate the former and the term "uncertainty" for the latter.

+ # 1.2 Finite Probability Spaces

At about the same time as Frank Knight was developing his ideas, the foundations of modern probability theory were being constructed by people like Émile Borel, Henri Lebesgue, and Andrey Kolmogorov. But Knight's formulation is amenable to a simple finite treatment.

For his "risk" (measurable) category, he assumed a finite number of future outcomes $s_1, ..., s_n$ and a known set of associated probabilities $p_1, ..., p_n$ (where $\sum_{i=1}^n p_i = 1$ and each $p_i≥0$). In the more general language of probability theory, $\{s_1, ..., s_n\}$ is the _sample space_, often denoted $\Omega$.

An example of Knightian risk is an American roulette wheel. Such a wheel has 38 outcomes ($s_i$ = the ball falls in the slot numbered $i$ for $i=1,...,36$; $s_{37}$ = the ball falls in the slot labeled 0; and $s_{38}$ = the ball falls in the slot labeled $00$). So we have

$$\Omega_{roulette} = \{s_1, ..., s_{38}\} = \{1, ..., 36, 0, 00\} \tag{1.1}$$

Every slot is equally likely, so $p_i=\frac{1}{38}$ for $i=1,...,38$.

Casinos make money on their roulette wheels, just as they do at their dice tables. In both cases, bettors can wager on _events,_ which are combinations of outcomes. For example, in American roulette a bettor can place a bet on even - that is, a bet that pays off if the future outcome lies in the event
$$E=\{s_2,s_4,...,s_{36}\} \tag{1.2}.$$
Note that neither $s_{37}$ nor $s_{38}$ (neither 0 nor double-0) is contained in $E$. So the sum of the probabilities associated with the event $E$ is only $\frac{18}{38}\approx47.4\%$.

If you bet on even and it occurs, you get two dollars for every dollar you bet (your dollar back plus one dollar of winnings). If it doesn't occur, you lose the money you bet and get nothing. The expected value of a dollar bet is $\frac{18}{38}\cdot2+\frac{20}{38}\cdot0=\frac{18}{19}$, meaning the casino expects to make $\frac{1}{19}\approx5.3$ cents for every dollar bet on the even event. Put another way, the casino charges a fee of about 5.3% [per 4 minutes](https://www.roulettelife.com/index.php?topic=358.0) to provide the entertainment of betting on the even event.
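The even-bet calculation can be reproduced mechanically from the sample space $(1.1)$ and the event $(1.2)$. This is a minimal Python sketch, assuming a payout of 2 per unit staked on a win and 0 on a loss as described above; the names `OMEGA`, `EVEN`, `prob` and `expected_payout_per_dollar` are hypothetical.

```python
from fractions import Fraction

# American roulette sample space (1.1): slots 1..36 plus "0" and "00".
OMEGA = [str(i) for i in range(1, 37)] + ["0", "00"]
P = {s: Fraction(1, 38) for s in OMEGA}  # every slot equally likely

# The "even" event (1.2): 2, 4, ..., 36. Neither 0 nor 00 is included.
EVEN = {str(i) for i in range(2, 37, 2)}

def prob(event):
    """Probability of an event: the sum of the probabilities of its outcomes."""
    return sum((P[s] for s in event), Fraction(0))

def expected_payout_per_dollar(event, payout_if_win=2):
    """Expected amount returned on a 1-dollar bet paying payout_if_win on a win, 0 otherwise."""
    p_win = prob(event)
    return p_win * payout_if_win + (1 - p_win) * 0

p_even = prob(EVEN)                    # 18/38, which reduces to 9/19
ev = expected_payout_per_dollar(EVEN)  # 18/19
house_edge = 1 - ev                    # 1/19, roughly 0.0526
print(p_even, ev, float(house_edge))
```

Using `Fraction` keeps the results exact, so the output matches the hand calculation $\frac{18}{19}$ rather than a rounded decimal.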

Knightian risk is amenable to simple probabilistic calculations like this.

<span style="color:red">**Perhaps the most common mistake made in financial mathematics is to assume that the world of finance is a world of Knightian risk.**</span>

+ # 1.3 Knightian Uncertainty

Within the broad umbrella of risk=lack of information about the future, Knight identified a more difficult type: what he called "unmeasurable," and what economists now call Knightian Uncertainty. 

Knightian Uncertainty means either
* The list of outcomes $\Omega=\{s_1,...,s_n\}$ is known, but the associated probabilities $p_1,...,p_n$ are not; or
* The list of outcomes $\Omega$ is not known.

Declaring love is a situation where the outcomes are (broadly) known: acceptance or rejection. But there is _no valid basis of any kind_ to arrive at accurate probabilities.


John Maynard Keynes, in his [1937 "General Theory of Employment" paper (pp. 213-214)](https://macroeconomiauca.files.wordpress.com/2012/05/keynes_general_theory_of_employment_qje_1937.pdf), gave examples of Knightian Uncertainty:
>By "uncertain" knowledge, let me explain, I do not mean merely to distinguish what is known for certain from what is only probable. The game of roulette is not subject, in this sense, to uncertainty...The sense in which I am using the term is that in which the prospect of a European war is uncertain, or the price of copper and the rate of interest twenty years hence, or the obsolescence of a new invention, or the position of private wealth owners in the social system in 1970. About these matters there is no scientific basis on which to form any calculable probability whatsoever.

In Keynes’ examples, the outcomes of whether or not there will be “a European war” are known: yes or no. But the "prospect" (probability) is not. Or more precisely, in 1937 the probability _was_ not; now we know there was a European (in fact, World) war. But in 1937 some future states of the world contained a European war and some didn’t. Eventually ([by September 3, 1939 when France and Britain declared war on Germany](https://www.history.com/topics/world-war-ii/world-war-ii-history#section_2)) there were no longer any future states of the world that didn't have a European war in them.

Even deeper uncertainty attached to his last example, “the position of private wealth owners in the social system in 1970” – a date 33 years in the future from the time of his writing. Neither all the outcomes nor all the probabilities were known. Keynes was concerned about whether socialism, capitalism, or communism would prevail, but those broad groupings didn’t constitute a precise and exhaustive listing of the outcomes, let alone guide the choice of associated probabilities.

It is difficult to imagine how the upper-crust Keynes would have reacted if he had been told that in his country in 1970, a kind of socialist-tinged capitalism would be in place and one of the largest private wealth owners would be a 30-year-old commoner from Liverpool: John Lennon of the Beatles.

<span style="color:red">**The financial world is a world of Knightian Uncertainty, not of Knightian Risk.**</span>

Despite the fuzziness of Knightian Uncertainty, “Il faut parier.” (You have to make a bet – [Pascal's Wager](https://plato.stanford.edu/entries/pascal-wager/)). You can’t just freeze and do nothing because the future is uncertain. If you do, you’ve made a wager by doing nothing.

In many cases we approach problems as if they can be described by Knightian Risk. However, we must never forget that this is an approximation: a guide, a forensic tool to help frame our thinking and intuition as we make decisions about an uncertain future.

+ # 1.4 Making risky decisions

All the focus on classifying what may happen in the future is done so we can decide what to do now: set the payoffs on a roulette wheel; price a life insurance policy; declare love or pine away, mute.

We will build a framework for thinking about making such risky decisions. There are very few purely right or wrong answers; in many situations, personal preferences for safety vs. risk play an important role.

So we'll now ask several questions about what you might do in risky situations. Answer what seems right to you. Then we'll start building up a mechanism to think about these situations.

+ # 1.4.1 Reader Poll 1 - St. Petersburg Paradox

The St. Petersburg paradox was posed by Nicolas Bernoulli in 1713 and resolved by his cousin Daniel Bernoulli in a paper published in 1738 by the Imperial Academy of Sciences in St. Petersburg, where Daniel had earlier worked.

N. Bernoulli asked how much someone should pay to enter a doubling lottery.
In N. Bernoulli’s lottery, a coin is tossed. If it comes up heads, the participant gets \\$2 and the lottery is over. If it comes up tails, it is tossed a second time. If it then comes up heads, the participant gets \\$4 and the lottery is over. Otherwise there is a third toss with an \\$8 payoff if heads, and so forth, doubling the payoff every time; the lottery continues until the first head comes up. If the head comes up on the $i^{th}$ toss, the payoff is $\$2^i$.

N. Bernoulli pointed out that the expected value is infinite, since the probability of the lottery ending on the $i^{th}$ toss is $2^{-i}$. We might (anachronistically, since Knight was two hundred years in the future) label this Knightian Risk or _a priori_ probability, and evaluate all the outcomes and all the probabilities to find that the expected value to the participant of entering the lottery is
$$\sum_{i=1}^\infty 2^{-i} 2^i = \sum_{i=1}^\infty 1 = \infty$$
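The divergence is easy to see numerically. The Python sketch below is our own illustration with hypothetical helper names: `truncated_expected_value` computes the expected payoff if the game were capped at a given number of tosses (each term contributes exactly 1, so the partial sum equals the cap), and `play_once` simulates one lottery, treating a uniform draw below 0.5 as heads. The simulated averages depend on the random seed, which is fixed here only so the run is reproducible; such averages tend to drift upward erratically rather than settle, which is what an infinite expected value suggests.

```python
import random
from fractions import Fraction

def truncated_expected_value(max_tosses: int) -> Fraction:
    """Expected payoff if the lottery were stopped after max_tosses tosses.

    Each term 2**-i * 2**i equals 1, so the partial sum is max_tosses
    and grows without bound as the cap is raised.
    """
    return sum((Fraction(1, 2**i) * 2**i for i in range(1, max_tosses + 1)), Fraction(0))

def play_once(rng: random.Random) -> int:
    """Toss a fair coin until the first head; pay 2**i dollars if it lands on toss i."""
    i = 1
    while rng.random() >= 0.5:  # tails (probability 1/2): toss again
        i += 1
    return 2**i

rng = random.Random(0)  # fixed seed for reproducibility
for n in (10, 1_000, 100_000):
    avg = sum(play_once(rng) for _ in range(n)) / n
    print(n, avg)
```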

The lottery is risky - we don't know which toss of the coin will be the first head - but unequivocally positive: you will get at least \\$2 if you enter it. So it seems sensible that you would pay at least \\$2 to enter, since you'll get that half the time, and more than that the other half of the time. But it seems just as sensible that you would not pay infinity, or even a very large portion of your personal wealth, to enter this lottery.

We'll analyze this in more detail, but for now answer intuitively:

# How much would you pay to enter this lottery?

Record your answer as $A_{pete}$.

+ # 1.4.2 Reader Poll 2 - The (Fictional) Generous Billionaires

You are walking down the street and you run into Warren Buffett and Bill Gates.

![Warren Buffett](buffett.png)![Bill Gates](gates.png)

“Hello,” Bill Gates says, “We've decided to be generous to the next person we see, and that's you! But we have a disagreement about how to dispense our generosity. So we need you to decide between the following two offers:”

Warren Buffett says: “My offer is very simple. I'm just going to give you this check for \$500,000,000.”

Gates says, “But my offer is more interesting. I’m going to toss a fair coin – 50% chance of either heads or tails. If it comes up heads, I’ll give you \$1,000,000,000. But if it comes up tails, I’ll give you nothing.”

# You must choose only one. Decide whether you want to take Buffett’s offer or Gates’ offer.

Record your answer as $A_{generous}$ (Buffett or Gates).

+ # 1.4.2, continued: Changing Generosity

If you chose Buffett's offer (the check for \\$500,000,000):
>Assume that Gates's offer (\\$1,000,000,000 or 0 depending on coin toss) remains unchanged, but that Buffett is only offering a check for \\$400,000,000. Would you still take the check? What about a check for \\$200,000,000? What about \\$100,000,000? At some point you will switch - presumably if Buffett's offer is only \\$0.01, you'll take Gates's coin toss instead. So where's your switching point, i.e., what size of sure check from Buffett would put you just on the edge between that check and Gates's billion-or-zero coin toss? Record this number as $A_{switch}$.

If you chose Gates's offer (billion-or-zero coin toss):
>Assume that Gates's offer remains unchanged, but that Buffett offers a check for \\$1,000,000,000. At that point you are always at least as well off with Buffett, so presumably you would take Buffett's offer over Gates's. So somewhere on the way up between a Buffett offer of \\$500,000,000 and a Buffett offer of \\$1,000,000,000 you switched from Gates to Buffett. What is your switching point? Record this number as $A_{switch}$.

+ # 1.4.3 Reader Poll 3 - A Fictional Unpleasant Encounter

You are walking down a dark street when you run into a thug pointing a gun at you.

![Thug](thug.png)

“Hello,” the thug says, “Because I am a thug, I am going to do something that an economist might characterize as unequivocally negative.”

“I’m going to give you a choice. In Option A, I will break one of your fingers.”

“Ouch,” you say.

“But I’m going to give you another option. In Option B, I will toss a fair coin. If it comes up heads, you can leave unharmed. But if it comes up tails, I will break two of your fingers.”

Which would you choose? Record your answer as $A_{thug}$.

+ # 1.5 Some Basic Economics and Financial Terminology

We're now going to build up a mechanism to analyze your decisions. If you have already taken an economics class, then this section should be familiar to you. If you haven't already taken an economics class, then you should do so because this section won't teach you economics. This section is intended to clarify the terminology we will use.

Let's start with a definition: *Economics is the study of how people allocate human effort and natural resources.*

The outcome sets we will study will generally encompass a range of allocations of effort and resources. We will be focused on characterizing and dealing with the lack of knowledge of which allocations will be desirable and which allocations will be undesirable. That is, we will be concerned with the risks of allocation.

The allocation problem studied in economics is essentially a problem of human cooperation. Suppose there were only two people in the world; one a farmer and the other a hunter. The farmer alone may fail to survive a year in which the crop is infested by insects; the hunter alone may fail to survive a year in which game migrates elsewhere. But by pooling their efforts, they might survive on apples in a year when the crop is good and the hunting is bad. Next year perhaps the apple crop will be bad, but since they cooperated, the hunter will still be alive to bring in enough wild boar meat so they can both survive. By cooperating and diversifying their efforts, they can better deal with the unknowns of the future food supply.

Current world population is about [7.5 billion](https://www.census.gov/popclock/). Human efforts have branched out far beyond farming and hunting. How do we decide which things should get done and which things shouldn’t, and who should do what? How do we choreograph the efforts of 7.5 billion people?

Of course there is no global central decision-making authority that allocates the efforts of the 7.5 billion people in the world. On the other hand, it is not the case that each person is entirely free to decide what to do. That is obviously so in totalitarian countries where a central authority dictates behavior, but it is true even in countries that are nominally free. The world has a spectrum of systems to allocate human effort ranging from highly centrally planned to highly distributed.

While allocation systems vary widely in different parts of the world, virtually every economic system uses money as some part of the allocation process. If I wanted to become a professional opera singer, in most parts of the world there would be nothing preventing me from giving a recital where I promise to hit all nine high C's in the aria _Pour mon âme_ from Donizetti's _La fille du régiment_. But since I have a terrible singing voice and wouldn't be able to hit a single one of them, no one would attend - let alone pay for - my recital. If I devoted my efforts to giving recitals to which no one came, I would not have enough money for food, shelter or clothing. I would have to reallocate my efforts to something more useful and less painful to others.

Money isn't the only factor in allocation - if someone offers you a large amount of money to commit a crime, hopefully primarily ethics and secondarily fear of arrest will cause you to decline. So personal preferences and a moral, legal, and regulatory framework usually also play a big role in the allocation of effort.

To see how money directs effort and resources, consider the three functions through which it acts:

Most simply, money is a _medium of exchange_. In our two-person economy we imagined the farmer and the hunter exchanging apples and boar meat, but for any larger economy, exchange and barter are far less efficient than selling apples for cash to (say) a professional online gamer who stops by your orchard, and then using the cash to buy boar meat from a butcher who is miles away from your apples. Money streamlines the process of exchanging the results of people's productive efforts.

The second use of money is as a _store of value_. If you happen to be a great farmer and singlehandedly grow enough apples to feed 100 people for a year, you can convert the apples into money by selling them to many people. You can then use the money at a later time. For example, when you are too old to operate a farm, you will still be able to pay younger farmers and hunters to supply you with apples and boar meat. You have converted your very productive 100-people-year effort into money and stored it for the future.

That’s really rather remarkable when you think about it. You were very successful growing apples – but that was 40 years ago. Your apples are long gone. But you can still get a young person to go out and get meat for you. That’s because the young person trusts the money – the hunter can use your money to get someone else to do something for her, like programming a boar-finding app. As long as everyone trusts the money – i.e. believes they can use it to get other humans to expend effort on their behalf at different times and places – the money can be used to store value.

The final function of money is as _financial capital_. In modern economies, many people have gotten beyond hand-to-mouth existence. They have money beyond what is needed to satisfy their current - and maybe even their future - wants. They can begin to use the money to gain control of resources and to direct the productive efforts of other people.

For example, we imagined a farmer who had done well enough growing apples to buy meat not only now, but in his retirement years in the future. But beyond this, the farmer might have an idea for a better way to get meat: create a cattle ranch rather than hunt for it. Ranching domesticated cattle is vastly more efficient than hunting for wild animals. If the farmer has enough money after satisfying his present and future wants, he might start a cattle ranch and pay some other people to run the ranch for him. He hasn't quit his day job growing apples; he's invested money but not his direct productive effort in the ranch. When money is used in this way - to facilitate the production of something, rather than for current or future consumption - it's called _capital_. More broadly, capital is an item that increases the effectiveness of human effort or of natural resources.

Economists have argued for centuries over what exactly constitutes capital and who should benefit from it. At one end of the spectrum, capital's "economic value merely represents the power of one class to appropriate the earnings of another." ([Henry George](http://www.henrygeorge.org/pchp2.htm)). This kind of thinking leads to the banning or heavy restriction of private ownership of capital.

At the other end of the spectrum, simply called _capitalism_, the ownership of capital by private individuals is thought to foster innovation and progress. For example, our farmer has a strong incentive to come up with his cattle ranch idea and make sure that it works, since his money is at stake. So rather than appropriating his ranch employees' earnings, he is creating jobs for them and increasing food production efficiency. That benefits everyone, except maybe the farmer's boar-hunting friend, who is put out of business by the new ranch technology. Hopefully she can get one of the many new jobs created at the ranch.

Neither pure idea works in practice: both the banning of private ownership of capital, and laissez-faire (literally "let them do" [whatever they want to do]) capitalism have been tried. Ironically they both inevitably lead to a similar result - the untenable concentration of power in the hands of a few. Most economic systems in the world today consist of some form of regulated capitalism, aiming to get the benefits that come with private ownership of capital (motivation, innovation, and efficiency), while avoiding the pitfalls (concentration of power in the hands of a few plutocrats).

Returning to our ranch-owning farmer, he might be initially successful with his cattle ranch but he might decide that he could make an even bigger, more efficient, ranch if he had more money than he personally can raise. He might estimate that the ranch is currently worth \\$100,000, all of which he owns. But if he could get another \\$25,000 to buy more grazing land and cattle sheds, he thinks the value of the ranch will be even more than \\$125,000. So - noticing that the online gamer who keeps buying his apples appears to be prosperous - he might ask her if she's interested in owning a 20% portion of the ranch company in exchange for \\$25,000 of new capital. If all goes well, the new \\$25,000 does indeed make the ranch better and it becomes worth (say) \\$200,000 overall. The farmer's share is now worth \\$160,000; the gamer's, \\$40,000. The farmer and the gamer have converted their skills in producing apples and winning games - together with a good idea about meat production - into even more money.
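The ownership arithmetic in the ranch example can be checked in a few lines. This Python sketch assumes, as the example implies, that the gamer's new money buys in at the farmer's estimate of the ranch's current value, so her fraction is new capital divided by (current value + new capital). The function name `new_investor_fraction` is hypothetical, and the later valuation is the illustrative figure from the text.

```python
from fractions import Fraction

def new_investor_fraction(current_value, new_capital):
    """Ownership fraction bought by new_capital when the company is worth current_value beforehand."""
    value_after_raise = current_value + new_capital
    return Fraction(new_capital, value_after_raise)

gamer_share = new_investor_fraction(100_000, 25_000)  # 1/5, i.e. 20%
farmer_share = 1 - gamer_share                        # 4/5, i.e. 80%

later_value = 200_000  # the text's illustrative later valuation of the ranch
print(farmer_share * later_value, gamer_share * later_value)  # 160000 40000
```

Because the shares are fixed fractions, both owners' stakes scale together when the ranch's value rises from 125,000 to 200,000.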

Just as money's _medium of exchange_ function allowed more efficient allocation of human effort than barter or direct exchange, there are more efficient ways of allocating capital than a farmer having a one-on-one talk with a gamer. A _security_ is a claim on ownership of something. In the case of the ranch, the farmer initially owned all the _stock_ in the ranch company; _stock_ is a security that is a claim on ownership of a company. A _share_ is a unit of stock; the total number of shares representing all of the ownership of a company can vary. For example, at this writing the ownership of the world's largest company, Apple Inc., is divided into [4.8 billion shares](https://www.nasdaq.com/symbol/aapl/stock-report).

Security markets allow more efficient exchange of capital. Instead of having to find a specific investor and make a one-off exchange of shares for money, the cattle ranch entrepreneur could list shares of his ranch company on a _stock exchange_ and potentially have millions of investors from all over the world evaluating his company's prospects and deciding whether or not to direct capital to it.

Other securities include _commodity contracts_, which are claims on ownership of natural resources, and _bonds_, which are claims on specific monies that are expected to be generated by an activity in the future.

An investor in a company's stock participates along with the other stockholders in the company's success or failure. An investor in a company's bonds receives specific amounts at specific dates if the company is able to pay them, but doesn't receive anything over the agreed-on amounts if the company does well.

In securities markets, you can allocate some of your stored human effort to the item represented by the security. If the collective decisions made by everyone buying and selling securities are wise, then the productive efforts of much of the human race will be directed efficiently and the world will prosper and progress. If not, we will have a situation like the Great Depression of the 1930s or the Great Recession of 2008 and after.

This class focuses on quantitative models for the risks inherent in investing in securities, particularly portfolios of securities. But we should never forget that securities are just placeholders for human effort and natural resources. All the mathematical models we will discuss ultimately need to have a sensible effect on the allocation of human effort and natural resources.

+ # 1.6 Some Basic Probability Terminology

If you already know probability theory, then this section should be familiar to you. If you don't already know probability theory, you should take a class in probability: this section won't teach it to you. This section is intended to clarify the terminology we will use. At the end of this section we will have built up enough terminology so we can try to apply a standard concept in probability to the generous billionaire problem.

In probability theory, we start with a _sample space_ $\Omega$ which is the set of all things that can happen; i.e. all outcomes. For example $\Omega$ might be the set $\Omega_{roulette}$ of 38 American roulette outcomes that we considered in $(1.1)$ above.

An _event_ is a subset of the sample space, such as $\{2,4,\ldots,36\}$ representing the “even” event as in $(1.2)$. While we have given discrete finite examples, the sample space can in fact be continuous. In order to make both countable and uncountable sample spaces work sensibly, probability theorists insist that the set of all events be a sigma-algebra (denoted $\sigma$-algebra). A $\sigma$-algebra on $\Omega$ is a collection of subsets of $\Omega$ that (a) includes the empty set; (b) is closed under complement; and (c) is closed under countable union and intersection. The pair $(\Omega,S)$ is called a _measurable space_ when $S$ is a $\sigma$-algebra on $\Omega$.

A **probability measure $p$** on a measurable space $(\Omega,S)$ maps $S$ into the unit interval $[0,1]$ and satisfies $p(\emptyset)=0$ ($\emptyset$ is the empty set), $p(\Omega)=1$, and
$$p\bigl(\bigcup_{i} E_i\bigr) = \sum_{i} p(E_i) \tag{1.3}$$
where the $E_i$ are a countable collection of pairwise disjoint events in S. A _probability space_ is the triple $(\Omega,S,p)$ of sample space, associated sigma-algebra, and associated probability measure.
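For a finite sample space these definitions can be checked by brute force. The Python sketch below is our own illustration: it builds the probability space of a fair die with the power set as $S$, defines $p$ by adding up outcome masses, and verifies $p(\emptyset)=0$, $p(\Omega)=1$ and additivity over every pair of disjoint events (on a finite space, countable additivity reduces to this finite case). The helper names `power_set` and `p` are hypothetical.

```python
from fractions import Fraction
from itertools import chain, combinations

def power_set(omega):
    """All subsets of a finite sample space, as frozensets (the largest sigma-algebra on omega)."""
    items = list(omega)
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(c) for c in subsets]

# A fair die as a finite probability space (Omega, S, p) with S = power set of Omega.
OMEGA = frozenset(range(1, 7))
S = power_set(OMEGA)
mass = {w: Fraction(1, 6) for w in OMEGA}

def p(event):
    """Probability measure induced by the outcome masses."""
    return sum((mass[w] for w in event), Fraction(0))

# Check the defining properties on this finite space.
assert p(frozenset()) == 0 and p(OMEGA) == 1
assert all(0 <= p(E) <= 1 for E in S)
# Additivity (1.3) for every pair of disjoint events in S.
assert all(p(A | B) == p(A) + p(B) for A in S for B in S if not (A & B))
print(len(S))  # 64 events = 2**6
```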

The probability $p(E)$ of an event $E\in S$ is the event's _unconditional probability_. If we want to know the probability of an event, given that another event happened, we compute the event's _conditional probability_. For example, if you throw an unweighted six-sided die, you know before looking that the probability that the number two came up is $1/6$. But suppose you throw the die and I look at it before you do. I might tell you that an even number came up and ask you for the probability that the number was two. You now know the probability is $1/3$, not $1/6$.

Formally the _conditional probability_ of event E, given that event F has occurred, is written $p(E\mid F)$, and by definition equals
$$p(E\mid F)=\frac{p(E\cap F)}{p(F)}$$
In our example $F$ was the event that an even number came up ($p(F)=1/2$) and event $E$ was the number two coming up. ($p(E)=p(E\cap F)=1/6$). So $p(E\mid F)=\frac{1/6}{1/2}=\frac{1}{3}$.
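The die example can be recomputed by enumeration, directly from the definition $p(E\mid F)=p(E\cap F)/p(F)$. This is a minimal Python sketch with hypothetical names `prob` and `cond_prob`; it raises an error when $p(F)=0$, since the definition does not apply in that case.

```python
from fractions import Fraction

OMEGA = range(1, 7)
p = {w: Fraction(1, 6) for w in OMEGA}  # unweighted six-sided die

def prob(event):
    """Unconditional probability of an event (a set of outcomes)."""
    return sum((p[w] for w in event), Fraction(0))

def cond_prob(E, F):
    """Conditional probability p(E | F) = p(E ∩ F) / p(F)."""
    pF = prob(F)
    if pF == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(E & F) / pF

E = {2}        # the number two came up
F = {2, 4, 6}  # an even number came up
print(prob(E), cond_prob(E, F))  # 1/6 1/3
```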

If (a) the sample space $\Omega$ is finite or countably infinite; and (b) the $\sigma$-algebra is the power set of $\Omega$, then the probability measure is determined by a _probability mass function $p$_ that assigns a value $p(\omega)\geq0$ to each $\omega\in\Omega$, where $\sum_{\omega\in\Omega}p(\omega)=1$; the probability of any event $E$ is then $p(E)=\sum_{\omega\in E}p(\omega)$. If (a) is true but not (b), then there are one or more probability mass functions that are compatible with the probability measure.

For example, suppose $\Omega$ consists of the six outcomes of the throw of a single die, and $S=\{\emptyset,\{1\},\{2,3,4,5,6\},\Omega\}$. A probability measure that assigns $\frac{1}{6}$ to the event $\{1\}$ and $\frac{5}{6}$ to the event $\{2,3,4,5,6\}$ is compatible with the usual mass function that assigns $\frac{1}{6}$ to each outcome. It is also compatible with a mass function that assigns $\frac{1}{6}$ to each of $1,2,3,4;$ assigns $\frac{1}{3}$ to $5$; and zero to $6$.
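Compatibility can be tested mechanically: a mass function is compatible if it is a valid probability mass function and reproduces the measure on every event in the coarse $\sigma$-algebra $S$. The Python sketch below is our own illustration with a hypothetical helper `is_compatible`; it confirms both mass functions from the text and rejects a third (hypothetical) one that puts probability $\frac{1}{3}$ on the outcome 1.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))
# The coarse sigma-algebra from the text: only these four events are measurable.
S = [frozenset(), frozenset({1}), frozenset({2, 3, 4, 5, 6}), OMEGA]
measure = {
    frozenset(): Fraction(0),
    frozenset({1}): Fraction(1, 6),
    frozenset({2, 3, 4, 5, 6}): Fraction(5, 6),
    OMEGA: Fraction(1),
}

def is_compatible(mass):
    """True if mass is a valid pmf on OMEGA that reproduces the measure on every event in S."""
    if any(m < 0 for m in mass.values()) or sum(mass.values()) != 1:
        return False
    return all(sum((mass[w] for w in E), Fraction(0)) == measure[E] for E in S)

usual = {w: Fraction(1, 6) for w in OMEGA}
lopsided = {1: Fraction(1, 6), 2: Fraction(1, 6), 3: Fraction(1, 6),
            4: Fraction(1, 6), 5: Fraction(1, 3), 6: Fraction(0)}
heavy_one = {1: Fraction(1, 3), 2: Fraction(1, 6), 3: Fraction(1, 6),
             4: Fraction(1, 6), 5: Fraction(1, 6), 6: Fraction(0)}
print(is_compatible(usual), is_compatible(lopsided), is_compatible(heavy_one))  # True True False
```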

When the sample space is uncountable, the probability measure $p$ is defined on the elements of the $\sigma$-algebra but not necessarily on individual outcomes.

For our purposes a _random variable_ $X$ is a function that maps $\Omega$ into the real numbers ${\rm I\!R}$, where $\Omega$ is the sample space of a probability space $(\Omega,S,p)$. The intuition is easiest for a _discrete random variable_ where $\Omega$ is countable and $S$ is the power set of $\Omega$; in that case we can look at the values of the random variable on each outcome.

For example $\Omega$ might be a set of people taking a Physics 2 class; $S$ the power set of $\Omega$; and the probability mass function $p(\omega)=1/n$ where $n$ is the number of people in $\Omega$. The random variable $X(\omega)$ might be the score of person $\omega$ on $\omega$'s Physics 2 final exam. We could then write an expression like $Pr(X>90)$ to mean $p(\{\omega\mid X(\omega)>90\})=\sum_{X(\omega)>90}p(\omega)$. $Pr(X>90)$ tells us the probability that a person taking Physics 2 scored over 90 on the final exam.
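
Here is a sketch of $Pr(X>90)$ for a made-up five-person class. The names and scores are invented for illustration only.

```python
from fractions import Fraction

# Hypothetical class roster (invented data): person -> Physics 2 final exam score X(omega).
scores = {"Ana": 95, "Ben": 88, "Chen": 91, "Dee": 72, "Eli": 90}

n = len(scores)
p = {person: Fraction(1, n) for person in scores}   # p(omega) = 1/n for each person

def pr(event_on_score):
    """Total mass of the people whose score satisfies the predicate, i.e. Pr(X in b)."""
    return sum((p[w] for w, x in scores.items() if event_on_score(x)), Fraction(0))

print(pr(lambda x: x > 90))   # Ana and Chen scored over 90, so 2/5
```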

More generally a random variable is a _measurable function_. In this case there is a probability space $(\Omega,S,p)$ and a measurable space $(A,B)$, and $X$ maps $\Omega$ into $A$. Measurability means that for every $b\in B$, the preimage $\{\omega \mid X(\omega)\in b\}$ is an event in $S$.

For our purposes, when the range space $A$ is uncountable it is usually the real numbers ${\rm I\!R}$ or an interval on the real line. Usually $B$ is the _Borel $\sigma$-algebra_ of $A$, which is the smallest $\sigma$-algebra containing all of its open sets.

So more generally, a random variable $X$ is a measurable function with an associated domain probability space $(\Omega,S,p)$ and a range measurable space $(A,B)$. Further, there is an associated probability function $Pr$ (more precisely, $Pr_X$) defined as
$$Pr(X\in b)=p(\{\omega \mid X(\omega)\in b\})\text{ when } b\in B$$

For example $X(\omega)$ might be the logarithm of the price of an asset in economic scenario $\omega$, and the $b$ we're interested in might be the interval $(-\infty,5]$ indicating we want to know the probability that the log-price is less than or equal to 5. In that case we might use a notation like $Pr(X≤5)$.

In fact $Pr(X≤x)$ is called a _cumulative distribution function (cdf)_ for the random variable $X$, and is often also denoted $F(x)$ (or $F_X(x)$ if we want to be explicit about the underlying random variable). Because of the properties of probability measures, $F(x)$ is a nondecreasing function ranging from 0 to 1 over its domain, which is the real numbers or some subset of the real numbers.

If $F(x)$ is differentiable, then its derivative $pdf(x)=F^\prime(x)$ is called the _probability density function (pdf)_ of the random variable $X$. (Again, we would use notation like $pdf_X(x)$ if it isn't clear which random variable $X$ underlies the distribution.) When $X$ takes on only a countable number of values, $F(x)$ is a step function with no density. Instead, $Pr(X=x)$ is a _probability mass function_ giving the probability that the discrete value $x$ will be observed.

The most widely used probability distributions include the uniform distribution, with $F(x)=x$ for $0\le x\le 1$, and the standard normal distribution, with $pdf(x)=\frac{1}{\sqrt{2\pi}}\exp(-x^2/2)$.

The expectation operator $\mathbb{E}[]$ gives the average value of a function of a random variable over the random variable's probability distribution. More formally,

$$\mathbb{E}[f(X)]=\int f(x)\,pdf(x)\,dx = \int f(x)\,dF(x) \tag{1.4}$$

Note the distribution over which the expectation is taken is often not explicitly stated in the $\mathbb{E}[]$ notation. If it's not clear by context which pdf(x) or F(x) is being used, a superscript or a subscript is attached to $\mathbb{E}$.
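
A minimal numerical sketch of (1.4). It approximates the integral with the trapezoid rule over a truncated interval, which is an assumption. `expectation`, `normal_pdf`, and `uniform_pdf` are illustrative names. The checks use the standard normal and uniform distributions above.

```python
import math

def normal_pdf(x):
    """Standard normal density: exp(-x^2/2) / sqrt(2*pi)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def uniform_pdf(x):
    """Density of the uniform distribution on [0, 1], where F(x) = x."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def expectation(f, pdf, lo, hi, steps=20_000):
    """Approximate E[f(X)] = integral of f(x) * pdf(x) dx by the trapezoid rule on [lo, hi].
    Assumes pdf is (numerically) zero outside [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) * pdf(lo) + f(hi) * pdf(hi))
    for k in range(1, steps):
        x = lo + k * h
        total += f(x) * pdf(x)
    return total * h

total_prob = expectation(lambda x: 1.0, normal_pdf, -10.0, 10.0)    # total probability, close to 1
normal_var = expectation(lambda x: x * x, normal_pdf, -10.0, 10.0)  # E[X^2], close to 1 (mean is 0)
uniform_mean = expectation(lambda x: x, uniform_pdf, 0.0, 1.0)      # close to 1/2
print(total_prob, normal_var, uniform_mean)
```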

The average or mean value $\mathbb{E}[X]=\int{x\cdot pdf(x)dx}=\int{xdF(x)}$ is also called the first moment of the distribution and is often denoted $\overline{X}$. Often the Greek letter $\mu$ is used for the mean, i.e. $\mu_X=\overline{X}$, or just $\mu=\overline{X}$ when the context is clear.

Another metric that captures a central value of a distribution is its _median_, often denoted by the Greek letter $\nu$ ("nu"):
$$\nu_X=m \text{ such that} \int_{-\infty}^m pdf_{X}(x)dx=\int_m^{\infty} pdf_X(x)dx=\frac{1}{2} \tag{1.5}$$
That is, the median is the point where it's equally likely that a point of the distribution will be to its left or to its right on the real line. (1.5) works for continuous distributions; for discrete distributions some interpolation may be needed.
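
When the cdf is continuous and increasing near the median, (1.5) can be solved numerically by bisection on $F(m)=\frac{1}{2}$. The sketch below uses an exponential distribution with rate 1 purely as an illustration; it is not discussed in the text. Its median is $\ln 2\approx 0.693$ while its mean is 1, so mean and median differ for this right-stretched distribution.

```python
import math

def median_from_cdf(cdf, lo, hi, tol=1e-12):
    """Solve cdf(m) = 1/2 by bisection.

    Assumes cdf is continuous and nondecreasing with cdf(lo) < 1/2 <= cdf(hi).
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def exponential_cdf(x):
    """Illustrative example: exponential distribution with rate 1, F(x) = 1 - exp(-x) for x >= 0."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

m = median_from_cdf(exponential_cdf, 0.0, 50.0)
print(m, math.log(2))   # the two agree closely; the mean of this distribution is 1
```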

The $i^{th}$ (central) moment of the distribution is
$$m_i=\mathbb{E}[(X-\mu)^i]=\int{(x-\mu)^i pdf(x)dx} \tag{1.6}$$
We will often use the second central moment, called variance and sometimes written $Var(X)$. Here $Var(X)=m_2=\mathbb{E}[(X-\mu)^2]=\int{(x-\mu)^2 pdf(x)dx}$. The square root of variance is called standard deviation; usually when we refer to the _volatility_ of a distribution we mean its standard deviation. Often the Greek letter $\sigma$ is used for standard deviation, i.e. $Var(X)=\sigma^2$ or $\sigma^2(X)$.

Skewness is the scaled third central moment, $s=m_3/\sigma^3$. If the distribution is symmetric about its mean, then its skewness is zero. There is some ambiguity about the definition of skewness, with some authors looking at the difference between a distribution's mean and its median, all divided by standard deviation. This is (a) not the definition we will use; and (b) not equivalent to our (third-moment-based) definition. The general concept is similar - zero skew means some kind of symmetry; positive skew means a distribution that tends to stretch out more to the right than to the left - but the two definitions can differ on specific distributions.

While any number of moments of a distribution can be computed, many distributions of practical use are adequately summarized by their first four moments. The scaled fourth central moment ($m_4/\sigma^{4}$) is called _kurtosis_. Kurtosis is a unitless quantity that is often compared to the kurtosis of a normal distribution for context. The term _excess kurtosis_ is often used, meaning kurtosis minus three, since three is the kurtosis of a normal distribution. In fact kurtosis is so often reported relative to a normal distribution that sometimes writers say "kurtosis" when they mean "excess kurtosis," which of course can be rather confusing. We will try to set context when we use this term. A distribution with positive excess kurtosis is called _leptokurtic_ or _fat-tailed_, meaning it has more probability attached to unusual observations than a normal distribution. A distribution with zero excess kurtosis is _mesokurtic_, and a _thin-tailed_ distribution with negative excess kurtosis is called _platykurtic_.
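
This sketch computes the quantities defined above for a discrete distribution, with outcomes and probabilities given as parallel lists. `summary` is a hypothetical helper. For a fair die the skewness is 0 and the kurtosis is about 1.73, so the excess kurtosis is negative and the die is platykurtic. The two-point example is an added illustration of positive skew.

```python
import math

def summary(values, probs):
    """Return (mean, variance, skewness, kurtosis) of a discrete distribution.
    Skewness is m3 / sigma^3 and kurtosis is m4 / sigma^4, where m_i = E[(X - mu)^i]."""
    pairs = list(zip(values, probs))
    mu = sum(p * x for x, p in pairs)

    def central_moment(i):
        return sum(p * (x - mu) ** i for x, p in pairs)

    var = central_moment(2)
    sigma = math.sqrt(var)
    return mu, var, central_moment(3) / sigma**3, central_moment(4) / sigma**4

# Fair six-sided die: symmetric, thin-tailed.
mu, var, skew, kurt = summary(range(1, 7), [1 / 6] * 6)
print(mu, var, skew, kurt, kurt - 3)   # kurt - 3 is the excess kurtosis

# A distribution that stretches to the right: 0 with probability 0.9, 10 with probability 0.1.
_, _, skew2, _ = summary([0, 10], [0.9, 0.1])
print(skew2)   # positive skew (8/3)
```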

A random variable $X$ _first order stochastically dominates_ another random variable $Y$ if $F_Y(x)≥F_X(x)$ for all scalars x. (There are other kinds of stochastic dominance, but if we say just "stochastic dominance," we will mean first order.) Equivalently, $X$ stochastically dominates $Y$ if $\mathbb{E}[f(X)]≥\mathbb{E}[f(Y)]$ for all increasing functions $f$ where the expectation is defined.

Intuitively, $X$ stochastically dominates $Y$ when $Y$ has more probability associated with low outcomes than $X$ does. Eventually both cumulative distribution functions must reach their highest value, 1. But $Y$ is always in at least as much of a hurry to get to 1 as $X$, meaning $Y$ is always at least as likely as $X$ to have a disappointing result.

Does that help us resolve the generous billionaire choice? If $X$ was the sure check for \\$500,000,000 and $Y$ was the coin toss for \\$1,000,000,000 or zero, then neither dominates the other. $X$'s cumulative distribution function is $F_X(x)=0$ for $x<500,000,000$ and $F_X(x)=1$ otherwise; $Y$'s cdf is $F_Y(x)=0$ for $x<0$, $F_Y(x)=\frac{1}{2}$ for $0\leq x < 1,000,000,000$, and $F_Y(x)=1$ for $x \geq 1,000,000,000$. So $F_Y(x)>F_X(x)$ between 0 and 500,000,000, but then the inequality goes the other way between 500,000,000 and 1,000,000,000.

If the sure check was for \\$1,000,000,000 ($Z$), then $Z$ would stochastically dominate the coin toss $Y$. In fact this $Z$ _statewise dominates_ $Y$, which is an even stronger condition; $Z$ is better than (or equal to) $Y$ in every future state of the world. Certainly any sentient decision maker would prefer a statewise dominant random variable.

Suppose the generous billionaires have only one coin between them and they both make offers referencing a toss of that coin. Warren Buffett offers to give you \\$100,000,000 if the coin comes up tails, and nothing if it comes up heads (random variable $W$). Bill Gates makes his same offer - nothing if the coin comes up tails, \\$1,000,000,000 if heads (random variable $Y$). $Y$ stochastically dominates $W$, but it doesn't statewise dominate. Still, no matter what attitude you have toward risk and reward, you should prefer the stochastically dominant offer - since the future states are arbitrary, the stochastically dominant offer will be statewise dominant in a re-ordering of states.
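
The billionaire offers can be checked mechanically. This sketch represents each offer as a map from the two equally likely coin states to a payoff. `fosd` and `statewise` are illustrative helpers. The cdfs are step functions that only jump at payoff values, so comparing them at those values is enough to check the cdf inequality for all $x$.

```python
from fractions import Fraction as Fr

# Each offer maps a coin state to a payoff; the two states are equally likely.
states = {"heads": Fr(1, 2), "tails": Fr(1, 2)}
X = {"heads": 500_000_000, "tails": 500_000_000}      # sure check for $500,000,000
Y = {"heads": 1_000_000_000, "tails": 0}              # coin toss for $1,000,000,000 or zero
Z = {"heads": 1_000_000_000, "tails": 1_000_000_000}  # sure check for $1,000,000,000
W = {"heads": 0, "tails": 100_000_000}                # $100,000,000 on tails only

def cdf(rv, x):
    """F(x) = Pr(rv <= x)."""
    return sum((p for s, p in states.items() if rv[s] <= x), Fr(0))

def fosd(a, b):
    """True if a first-order stochastically dominates b, i.e. F_b(x) >= F_a(x) for all x.
    Both cdfs are step functions jumping only at payoff values, so those points suffice."""
    points = set(a.values()) | set(b.values())
    return all(cdf(b, x) >= cdf(a, x) for x in points)

def statewise(a, b):
    """True if a pays at least as much as b in every state."""
    return all(a[s] >= b[s] for s in states)

print(fosd(X, Y), fosd(Y, X))        # False False: neither dominates
print(fosd(Z, Y), statewise(Z, Y))   # True True
print(fosd(Y, W), statewise(Y, W))   # True False
```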

Unfortunately these observations don't help with the original billionaire choice problem, since the original offers were designed so that there wasn't stochastic dominance. To frame choices where there isn't clear dominance, an additional mechanism is needed.

+ # 1.7 Utility Theory

Utility theory is a disciplined way to assess tradeoffs between risk and reward. This will be our first exploration of a mathematical framework that attempts to describe and predict human financial behavior.

The concept appears to have been invented in 1728 by [Gabriel Cramer](https://www.economicshelp.org/blog/glossary/expected-utility-theory/), whom you may already know through Cramer's Rule for solving systems of linear equations. Writing to Nicolaus Bernoulli (Daniel Bernoulli's cousin), he summed it up as follows:
>the mathematicians estimate money in proportion to its quantity, and men of good sense in proportion to the usage that they may make of it.

Since Cramer was himself a mathematician, his apparent separation of "men of good sense" and "the mathematicians" is odd. He favored a square root function for utility - in other words \\$25 is not 25 times as useful as \\$1; it is only 5 times as useful.

Applying Cramer's square root utility function to the St. Petersburg lottery, we could find its expected (Cramer) utility rather than its expected value. A win of $\$2^i$ only has "usage" or utility of $\$2^{i/2}$, so summing probabilities times utilities gives
$$\sum_{i=1}^\infty 2^{-i} (2^i)^{\frac{1}{2}} = \sum_{i=1}^\infty 2^{-i/2}= \sqrt{2}+1 \approx 2.41$$
Thus entering the St. Petersburg lottery has the same usefulness as you would get from $2.41^2=5.83$ sure dollars. So (assuming you rate usefulness with a square root utility function) the St. Petersburg lottery is worth the same to you as \\$5.83, not infinity. If your $A_{pete}$ from our class poll above was close to \\$5.83, then you may have (probably not consciously) been using a square root utility function.
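
A quick numerical check of the Cramer calculation. It truncates the infinite sum at a large number of terms, which is harmless here because the terms $2^{-i/2}$ shrink geometrically. `st_petersburg_expected_utility` is an illustrative name.

```python
import math

def st_petersburg_expected_utility(u, terms=200):
    """Truncated sum over i = 1..terms of 2^-i * u(2^i); the lottery pays
    2^i dollars with probability 2^-i."""
    return sum(2.0 ** -i * u(2.0 ** i) for i in range(1, terms + 1))

eu = st_petersburg_expected_utility(math.sqrt)
ce = eu ** 2   # invert the square root utility to get the equivalent sure dollars
print(eu, math.sqrt(2) + 1, ce)   # eu matches sqrt(2) + 1; ce is about 5.83
```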

In 1738 [Daniel Bernoulli](https://www.jstor.org/stable/1909829) (who learned of Cramer's observation of ten years earlier through his cousin Nicolaus) pursued the idea further, stating
>The determination of the value of an item must not be based on the price, but rather on the utility it yields…. There is no doubt that a gain of one thousand ducats is more significant to the pauper than to a rich man though both gain the same amount.

(You can still get historical reproductions of ducats from the [Austrian Mint](https://www.muenzeoesterreich.at/eng/Produkte/1-Ducat) for €129, or about \\$160. The gold content of a ducat hasn't changed much, so roughly speaking, Bernoulli's "one thousand ducats" is the equivalent of \\$160,000 today.)

Daniel Bernoulli expressed two ideas in the quote above - his first echoes Cramer's observation of ten years earlier.  In the same paper, Bernoulli suggested a logarithmic utility function which is still widely used; this seems to capture preferences more realistically than Cramer's square root function, which is more aggressive. With a logarithmic utility function, the usefulness of $\$2^i$ is $i$. (We might as well use base-2 logarithms here; any other base just applies a constant scale factor which cancels out when comparing utilities to make decisions.) So the expected logarithmic utility is
$$\sum_{i=1}^\infty 2^{-i} i = 2$$
The sure amount that has a log-2-utility of 2 is $2^2=\$4$. This amount tends to be the most popular with bidders for the St. Petersburg lottery, although both lower and higher values are also often observed.
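
Here is the same truncated-sum sketch with logarithmic utility. The certainty equivalent is computed with both base-2 and natural logarithms, which illustrates that the choice of base does not change the answer.

```python
import math

# Expected base-2 log utility: sum of 2^-i * log2(2^i) = sum of i / 2^i, truncated at 200 terms.
eu_log2 = sum(2.0 ** -i * math.log2(2.0 ** i) for i in range(1, 201))
# The same expectation with natural logs differs only by the constant factor ln 2.
eu_ln = sum(2.0 ** -i * math.log(2.0 ** i) for i in range(1, 201))

ce_log2 = 2.0 ** eu_log2   # invert log2 to get sure dollars
ce_ln = math.exp(eu_ln)    # invert ln to get sure dollars
print(eu_log2, ce_log2, ce_ln)   # about 2, 4, and 4
```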

So both Cramer and Bernoulli "solved" the St. Petersburg Paradox by explaining - through a utility function - why people would not bid infinity to enter it.

But Bernoulli's second quoted sentence above introduces a new idea: that the game would likely not be seen in isolation – it would have to be evaluated in the context of the gambler’s entire wealth, so the true answer is more complicated. Suppose the generous billionaires hadn't been so generous and had only offered a choice between (a) a sure check for \\$0.50 and (b) a coin toss for \\$1.00 or zero. Probably neither amount of the diminished choice is significant compared to your usual spending patterns, so you might take the coin toss just because \\$1.00 is a little more noticeable than \\$0.50. So, as Bernoulli noted, context matters.

While Cramer and Daniel Bernoulli were convincing when they explained why no one bids infinity for the St. Petersburg Paradox, they omitted a simpler and more practical consideration. According to the bank [Credit Suisse](https://www.credit-suisse.com/corporate/en/articles/news-and-expertise/global-wealth-report-2018-us-and-china-in-the-lead-201810.html), in 2018 the wealth of the world was \\$317 trillion. That's about two to the $48^{th}$ power. So even if the world agreed to pool its wealth to pay you (if necessary) in the St. Petersburg lottery, the sum stops at $i=48$. In reality the economic agent offering you the St. Petersburg lottery is not the entire world, so the sum stops quite a bit earlier than $i=48$. So the supposed infinite value of the St. Petersburg Paradox does not arise from any lottery that could be played in practice.
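
To make the finite-wealth argument concrete, suppose the payer can pay at most $2^{48}$ dollars, so outcome $i$ (probability $2^{-i}$) pays $\min(2^i, 2^{48})$. Under that assumption each of the first 48 terms contributes one dollar and the capped tail adds about one more. The expected value is then only about \\$49. `capped_st_petersburg_value` is a hypothetical helper.

```python
def capped_st_petersburg_value(cap_exponent, terms=200):
    """Expected dollar value of the St. Petersburg lottery when the payer can pay at
    most 2**cap_exponent dollars: outcome i has probability 2**-i and pays
    min(2**i, 2**cap_exponent). The sum is truncated at `terms` outcomes."""
    cap = 2 ** cap_exponent
    return sum(min(2 ** i, cap) / 2 ** i for i in range(1, terms + 1))

print(capped_st_petersburg_value(48))   # about 49
```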

+ ## 1.7.1 Von Neumann Morgenstern Utility Theory

In 1944, utility theory was formalized into a mathematical discipline by John von Neumann and Oskar Morgenstern (“VNM”). VNM established a full axiomatic system around the general intuition of utility theory in [Theory of Games and Economic Behavior](https://archive.org/details/in.ernet.dli.2015.215284/page/n5).

As we saw above, a utility function can describe some behaviors reasonably. But it's implausible to think that economic agents are continually computing utility functions as they make choices. (An economic agent takes economic actions and is generally a person, a group of people, a company or other organization, a government, or an algorithm.) But it's certainly true that economic agents have preferences - they prefer (say) billionaire offer 1 to billionaire offer 2. So we can assume that economic agents can express a preference between pairs of probability distributions assigning probabilities to different levels of wealth. More generally, agents can express preferences between probability distributions that include non-monetary outcomes (“I prove P=NP;” “Caltech beats UCLA in basketball;” etc.).

VNM analyzed preferences by starting with a finite, discrete formulation – essentially the same as the formulation of Knightian Risk. They assumed:

- There is a situation we wish to evaluate with $n$ mutually exclusive outcomes of interest, which we can denote by $s_1, s_2, …, s_n$.

     - A very simple setup might be the toss of a coin, with two possible outcomes – heads ($s_1$) or tails ($s_2$). When the outcomes are monetary amounts, VNM called them “prizes.”

     - If we wanted to be extremely ambitious, we could enumerate every possible future state of the universe in our (very large) finite set of outcomes, assuming that there are a finite number of future states of the universe$^1$. While that wouldn't be a particularly practical endeavor, it does show that the simple finite framework is sufficiently flexible for most real-world problems.

- There is a set of probabilities associated with future outcomes – that is, a set of nonnegative real numbers $p_1, p_2, …, p_n$ that sum to one where $p_i$ is the probability of outcome $s_i$ occurring. VNM used the term _lottery_ to denote a set of probabilities.

    - This is a discrete probability measure, which together with the outcomes forms a discrete random variable.

- An economic agent makes choices between lotteries (probability vectors) with a preference relation.

For example, there might be three outcomes: $s_1$=you receive \$0; $s_2$=you receive \$500,000,000; and $s_3$=you receive \$1,000,000,000. You may be asked to decide between two probability vectors, $p_1=(0,1,0)$ and $p_2=(.5,0,.5)$.

$p_1$ means you definitely receive \$500,000,000, while $p_2$ means you have a 50-50 chance of getting either \$1,000,000,000 or nothing.
This is a reframing of your generous billionaire choice.

More generally, let $s$ be the $n$-vector whose entries contain all possible future outcomes. Define $\Delta(s)$ as the set of all lotteries on $s$; that is, the set of all probability $n$-vectors $p$ where $p^{\prime}u=1$ and all elements of $p$ are non-negative. (In this class $u$ is the vector of all ones, with dimension determined by context. Prime ($\prime$) denotes transpose. So $p^{\prime}u$ is the dot product of $p$ and $u$, i.e. the sum of the entries of $p$.)

Clearly $\Delta(s)$ is convex: if $p_1, p_2\in\Delta(s)$, then for any $0\le\alpha\le 1$ we must have $\alpha p_1+(1-\alpha)p_2\in\Delta(s)$.
Note that we haven’t actually used $s$ in the definition of $\Delta(s)$, so for any vector of $n$ outcomes we get the same set of lotteries, namely the portion of the hyperplane $p^{\prime}u=1$ with each $p_i\ge 0$ in $n$-space ${\rm I\!R}^n$ (the _standard simplex_). It will however be convenient to keep in mind a set of outcomes $s$ when looking at $\Delta(s)$.
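
This sketch checks membership in $\Delta(s)$ and the convexity claim, with lotteries stored as plain Python lists. `is_lottery` and `mix` are illustrative helpers. The tolerance on $p^{\prime}u=1$ is an assumption that absorbs floating-point rounding.

```python
def is_lottery(p, tol=1e-12):
    """True if p is in Delta(s): all entries nonnegative and p'u = 1 (u = all-ones vector)."""
    return all(x >= 0 for x in p) and abs(sum(p) - 1.0) <= tol

def mix(alpha, p, q):
    """Convex combination alpha*p + (1 - alpha)*q of two lotteries."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(p, q)]

p1 = [0.0, 1.0, 0.0]   # sure $500,000,000
p2 = [0.5, 0.0, 0.5]   # coin toss for $0 or $1,000,000,000
print(all(is_lottery(mix(a / 10, p1, p2)) for a in range(11)))   # True: every mixture is a lottery
```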

A ranking between lotteries in $\Delta(s)$ is a binary relation $\succeq$ on $\Delta(s)$, i.e. a subset $\succeq\,\subseteq\Delta(s)\times\Delta(s)$. For two lotteries $p,q\in\Delta(s)$ we write $p\succeq q$ if $p$ is preferred to or equivalent to $q$. (Equivalence is denoted by $\equiv$ and means both $p\succeq q$ and $q\succeq p$). We can write $p\succ q$ when $p\succeq q$ but it is not true that $q\succeq p$.

The agent's decision process is encoded in the $\succeq$ function. Therefore we need to determine the properties of such preference functions.

<br><br>
<font size=1>$^1$It has been estimated that a cubic meter of space can take on at most 10 to the 10 to the 70 configurations (https://phys.org/news/2015-03-universe-finite-infinite.html). If the universe is finite (whether it is or not may never be known) then there are at most a finite number of configurations of the universe. In any case the number of cubic meters of space that matter to humans during the next several billion years is finite.</font>

+ ## 1.7.2 VNM Axioms and Theorem

VNM (p. 26 of _Theory of Games and Economic Behavior_) required that certain axioms apply to preference functions, arguing that these axioms were simply expressions of rationality. The first set of requirements (translated into modern terminology) indicated that preference functions are what are now called _total preorders$^2$_, satisfying these axioms:

1. _Transitivity._ For $p,q,r\in\Delta(s)$, $p\succeq q$ and $q\succeq r$ means $p\succeq r$.
2. _Connex._ For $p,q\in\Delta(s)$, either $p\succeq q$ or $q\succeq p$ or both; if both are true then we say $p\equiv q$.

A third property, _reflexivity_ ($p\succeq p$ so $p\equiv p$) follows from these two.

These axioms indicate that an economic agent has one of three mutually exclusive opinions about every pair $p$ and $q$ of lotteries; either $p$ is strictly preferred to $q$; or $q$ is strictly preferred to $p$; or $p$ and $q$ are equivalent. A lottery is equivalent to itself, but there can be other lotteries that are not the same but are equivalent. The existence of unequal equivalence is the difference between a total preorder and a total order. Transitivity means there are no circular preferences; if I like chocolate better than vanilla and vanilla better than strawberry, then I like chocolate better than strawberry.

While humans aren't always rational, we can generally assume that departures from these axioms will be quickly corrected. VNM adopted two other axioms that are sensible but not as basic as the first two. We've updated the language and format from their original presentation:

<span>3.</span> _Independence_. If $p, q, r\in\Delta(s)$ with $p\succeq q$ and $\alpha\in[0, 1]$, then $\alpha p+(1-\alpha)r\succeq \alpha q+(1-\alpha)r$.

Intuition: if $p$ is preferred to $q$, that preference remains unchanged when we mix in a piece of a third lottery $r$. If I like chocolate better than strawberry, then - no matter how I feel about vanilla - I like (a mixture of chocolate and vanilla) better than (the same mixture of strawberry and vanilla).

While this is not horribly implausible, it is also not as basic a guide to rational behavior as the previous axioms. It is certainly possible that there could be interactions between items that would alter the results. To get powerful results, VNM had to make powerful assumptions: in this case, they assumed that lotteries are independent of each other and there are no such interactions.

The last VNM axiom is even stronger:

<span>4.</span> _Continuity_. If $p, q, r \in\Delta(s)$ with $p\succeq q\succeq r$, then there is a scalar $\alpha\in[0,1]$ such that $\alpha p+(1-\alpha)r\equiv q$.

Intuition: If we have three ordered lotteries, we can create a linear mixture of the worst one ($r$) and the best one ($p$) that is equivalent to the one in the middle ($q$). So if I prefer chocolate to vanilla, and I prefer vanilla to strawberry, then there is some mixture of chocolate and strawberry that I like the same as vanilla.

This property requires that preferences don't just jump from one level to another, but vary smoothly between (for example) strawberry and chocolate without skipping over the vanilla satisfaction level. Like the independence axiom, this is not implausible but is also not entirely fundamental. I might hate the taste of mixtures and dislike even the smallest adulteration of my beloved chocolate, so there is no combination that gets me to the same satisfaction level as vanilla.

The two strong VNM axioms are not implausible, but basically enforce a kind of linearity of preference when combining lotteries. Eventually the strength of these axioms gets us into trouble - we'll show later that at some point going down this path departs from reasonably describing choices that most people make. But for now and to a first order, these axioms seem to give a sensible description of how people make economic choices. A preference function seems much more intuitive than a utility function. But (in Appendix A.1 p. 617ff of _Theory of Games and Economic Behavior_), VNM proved that utility functions and preference functions lead to the same results.

>**VNM Theorem.** An economic agent has a preference function $\succeq$ satisfying the four VNM axioms if and only if the agent has an associated utility function $U:\Delta(s)\rightarrow{\rm I\!R}$ where for any $p,q\in\Delta(s)$, $p\succeq q \text{ if and only if } U(p)\geq U(q)$.

More precisely, let $\Delta(s)$ be a convex subset of ${\rm I\!R^n}$. Let $\succeq$ be a binary relation on $\Delta(s)$. Then $\succeq$ satisfies the four axioms above if and only if there is a function $U:\Delta(s)\rightarrow{\rm I\!R}$ such that:
<br>a. U represents $\succeq$ (i.e. $\forall p, q\in\Delta(s)$, $p\succeq q \iff U(p)\geq U(q)$)
<br>b. U is affine (i.e. $\forall p,q\in\Delta(s), α\in(0, 1)$, $U(\alpha p + (1- \alpha)q) = \alpha U(p) + (1- \alpha)U(q)$)

Further, U is unique up to a positive linear transformation: If $V:\Delta(s)\rightarrow{\rm I\!R}$ also satisfies (a) and (b), then there are $b,c\in{\rm I\!R}$ (where $b > 0$) such that $V = bU + c$.

The proof of the VNM theorem is quite long although not particularly hard. We'll give a flavor of the proof but not the whole thing. The first thing VNM note is
>**Lemma** If $p,q\in\Delta(s)$ with $p\succ q$, and $\alpha,\beta\in [0,1]$ with $\alpha>\beta$, then $\alpha p + (1-\alpha)q \succ \beta p + (1-\beta)q$.

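**Proof of lemma** Define $\gamma=\frac{\beta}{\alpha}$; since $0\le\beta<\alpha\le 1$, $\gamma$ is in the unit interval. Since $p\succeq q$, the independence axiom (mixing in $q$ itself with weight $1-\alpha$) says $\alpha p+(1-\alpha) q \succeq \alpha q+(1-\alpha) q = q$. Let $r=\alpha p+(1-\alpha) q$; we can apply the independence axiom to $r\succeq q$ by mixing some (more) $r$ into both sides: $(1-\gamma)r + \gamma r\succeq (1-\gamma)q+\gamma r$. Collecting terms, the RHS is $(1-\gamma)q+\gamma\alpha p+\gamma(1-\alpha)q=\beta p + (1-\beta)q$ and the LHS is $r=\alpha p + (1-\alpha)q$, which proves the lemma with weak preference. (Getting the strict preference $\succ$ requires the strict form of independence, $p\succ q \Rightarrow \alpha p+(1-\alpha)r\succ\alpha q+(1-\alpha)r$ for $\alpha\in(0,1]$, applied in the same two steps.)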

Continuing along these lines, they prove that the map from the unit interval to linear combinations $\alpha p + (1-\alpha)q$ is one-to-one, monotone, and onto $\{r \mid p\succeq r\succeq q\}$. They then break down lotteries into combinations of 100% lotteries, i.e. they let $x_i$ be the lottery that gives outcome $s_i$ with 100% probability. Without loss of generality, we can assume the lotteries are ordered so that $x_n\succeq x_{n-1}\succeq ... \succeq x_1$. Start construction of the utility function $U$ by defining $U(x_1)=0$ and $U(x_n)=1$, unless $x_n\equiv x_1$ in which case the trivial utility function $U(x)=0$ for all lotteries $x$ works.

From the continuity axiom we know that there is an $\alpha_i$ so that $\alpha_i x_n + (1-\alpha_i)x_1\equiv x_i$ for any $1\leq i \leq n$. They set $U(x_i)=\alpha_i$. The 100% lotteries then become the basis for all the other lotteries, since a general lottery $x$ is an n-vector of probabilities $x=(p_1,...,p_n)$; we define $U(x)=\sum{p_i \alpha_i}$. The axioms and observations like the Lemma allow them to prove that this utility function encodes preference order.

So VNM proved more than the existence of a utility function $U$ that operates on lotteries and encodes preference order. As the construction of $U$ shows, we can instead use utility functions that operate on outcomes (which are often monetary amounts). So if $p=(p_1,...,p_n)$ is a lottery in $\Delta(s)$, a VNM preference function can be encoded as $\sum_{i=1}^n p_i u(s_i)$. We use the small-u notation to indicate a function that operates on outcomes, while the big-U notation indicates a function that operates on lotteries.
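
Here is a sketch of the construction for a hypothetical agent. Its continuity-axiom weights are assumed to be $\alpha=(0, 0.8, 1)$ for the outcomes \\$0, \\$500Mn, and \\$1Bn; the value 0.8 is made up for illustration. The utility of a lottery is then $\sum_i p_i\alpha_i$. The final check illustrates the affine property (b) of the theorem.

```python
# Hypothetical calibration: alpha[i] is the weight at which the agent is indifferent between
# outcome s_i for sure and alpha[i]*x_n + (1 - alpha[i])*x_1. Endpoints are fixed at 0 and 1.
alpha = [0.0, 0.8, 1.0]   # outcomes $0, $500Mn, $1Bn (0.8 is a made-up value)

def U(p):
    """VNM utility of lottery p: the p-weighted sum of the outcome utilities alpha[i]."""
    return sum(pi * ai for pi, ai in zip(p, alpha))

p1 = [0.0, 1.0, 0.0]   # sure $500Mn
p2 = [0.5, 0.0, 0.5]   # coin toss for $0 or $1Bn
print(U(p1), U(p2), U(p1) >= U(p2))   # 0.8 0.5 True: this agent prefers the sure check

# U is affine: mixing two lotteries mixes their utilities with the same weight.
a = 0.3
mixed = [a * x + (1 - a) * y for x, y in zip(p1, p2)]
print(abs(U(mixed) - (a * U(p1) + (1 - a) * U(p2))) < 1e-12)   # True
```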

<br><br>
<font size=1>$^2$Stanley, Richard P. (1997), _Enumerative Combinatorics, Vol. 2_, Cambridge Studies in Advanced Mathematics, 62, Cambridge University Press, p. 297.</font>

+ ## 1.7.3 Risk Aversion

A special kind of outcome set is monetary: each outcome state $s_i$ represents an amount of money. As we noted above, VNM called such outcomes _prizes_. A utility function on prizes maps real numbers to real numbers.

So in utility terms, your original meeting with the billionaires had three possible prizes: $s_1$=you receive \$0; $s_2$=you receive \$500Mn; and $s_3$=you receive \$1Bn.

We asked about the preference between two lotteries $p_1=(0,1,0)$ and $p_2=(.5,0,.5)$. $p_1$ is a sure thing – you definitely receive \\$500Mn. Risk is lack of knowledge about future outcomes of an action, but we know exactly what will happen with $p_1$. So there is no risk in $p_1$.

$p_2$ on the other hand is a risky lottery. Our poll asked whether or not $p_1\succ p_2$. We now know (assuming your preferences satisfy the VNM axioms, so that the VNM Theorem applies) that we can answer that by looking at whether or not
$$\sum_{i=1}^3 p_1(i) u(s_i) > \sum_{i=1}^3 p_2(i) u(s_i)$$
This is equivalent to asking if $u(500\text{Mn})>\frac{1}{2}u(0)+\frac{1}{2}u(1\text{Bn})$.

Given the VNM equivalence, we can introduce a more universal notation when outcomes are prizes. If we write $\mathbb{E}_p[x]$, where $p$ is a lottery, we mean the expected value of the lottery without utility,
$$\mathbb{E}_p[x]=\sum p_i s_i \tag{1.10}$$
For the St. Petersburg lottery, this quantity was infinite. For the risky billionaire coin toss, this quantity was \\$500Mn.

(1.10) is a scalar monetary amount, so if we have a utility function that operates on monetary amounts, we can apply it to (1.10): $u(\mathbb{E}_p[x])$ is the utility of definitely receiving the expected value of the lottery $p$.

On the other hand, we can evaluate the utility of the risky lottery with the expression
$$\mathbb{E}_p[u(x)]=\sum p_i u(s_i) \tag{1.11}$$

The difference is the order in which we apply probabilities and utilities – if we take expected value first, then we create a sure thing and then evaluate its utility. If we take utility first, then we have a variety of outcomes with different utilities and different probabilities.

We can characterize attitude towards risk as follows. For nonsure p’s:

- Risk aversion: $u(\mathbb{E}_p[x])>\mathbb{E}_p[u(x)]$

- Risk indifference or neutrality: $u(\mathbb{E}_p[x])=\mathbb{E}_p[u(x)]$

- Risk seeking: $u(\mathbb{E}_p[x])<\mathbb{E}_p[u(x)]$
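
This sketch applies the three-way classification to the billionaire coin toss. It uses the square root (concave), identity (affine), and square (convex) as illustrative utility functions. `risk_attitude` is a hypothetical helper, and the closeness tolerance is an assumption that absorbs rounding.

```python
import math

prizes = [0.0, 500e6, 1e9]   # $0, $500Mn, $1Bn
p2 = [0.5, 0.0, 0.5]         # the billionaire coin toss

def risk_attitude(u, p, x):
    """Classify u on the nonsure lottery p by comparing u(E_p[x]) with E_p[u(x)]."""
    u_of_mean = u(sum(pi * xi for pi, xi in zip(p, x)))
    mean_of_u = sum(pi * u(xi) for pi, xi in zip(p, x))
    if math.isclose(u_of_mean, mean_of_u, rel_tol=1e-12):
        return "neutral"
    return "averse" if u_of_mean > mean_of_u else "seeking"

print(risk_attitude(math.sqrt, p2, prizes))        # averse: square root is concave
print(risk_attitude(lambda w: w, p2, prizes))      # neutral: identity is affine
print(risk_attitude(lambda w: w * w, p2, prizes))  # seeking: square is convex for w >= 0
```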

In your billionaire meeting, $\mathbb{E}_{p_2}[x]$=\$500Mn, so we asked about the relationship between $u(500Mn)=u(\mathbb{E}_{p_2}[x])$ and $\mathbb{E}_{p_2}[u(x)]$ (the utility of taking the risk of the coin toss); in other words we asked whether you were risk averse, neutral, or risk seeking.

If your $A_{generous}$ choice was for the sure check, then you are risk-averse. Most people in fact are risk-averse. This seems to be a deep evolutionary tendency; in fact even other species demonstrate risk aversion. [Chen et al.](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=911244) set up a token-based economy for capuchin monkeys, and found that, once the monkeys understood the idea of token-based money, they preferred spending on definite rewards (one apple slice, say) over random rewards (a variable number of apple slices).

We should say that most people demonstrate _upside risk aversion_. Your $A_{thug}$ choice was probably for the coin toss, figuring that you had a 50% chance of escaping unharmed. While having two broken fingers is worse than having one broken finger, it probably isn't twice as bad. In other words, the negative utility grows more slowly than linear (risk neutral), just like positive utility. This probably made you _downside risk-seeking_. Since the utility of any number of broken fingers is negative, and assuming the utility of nothing happening (no broken fingers) is zero, your decision probably looked like $-.5|u_2|>-|u_1|$, or $|u_2|<2|u_1|$ (with obvious, if morbid, notation).

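Recall that a function $f$ is _concave_ if $f(ax + (1-a)y) \ge af(x) + (1-a)f(y)$ for all $x$, $y$ in its domain and all $a$ in $[0,1]$ (mnemonic: “inside the cave is better”). If the inequality goes the other way, the function is _convex_; if both sides are always equal the function is _affine_. When $f$ is twice differentiable, it is concave, affine, or convex according to whether its second derivative $f^{\prime\prime}$ is everywhere nonpositive, zero, or nonnegative. If the inequality is strict whenever $x\neq y$ and $0<a<1$, $f$ is _strictly_ concave (or strictly convex); a second derivative that is everywhere negative (or positive) is enough to guarantee this.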

It is not hard to see the following
>**Proposition**: A utility function $u$ on prizes is risk averse if and only if it is strictly concave; is risk indifferent iff affine; and risk seeking iff strictly convex.

We should really say "upside risk averse" in the Proposition since, as the thug example showed, the same concavity property that causes upside risk aversion will cause downside risk seeking. We had $-u_2<-2u_1$ (equivalently $|u_2|<2|u_1|$), where $u_i$ is the negative utility of $i$ broken fingers: the pain of two broken fingers is not as negative as double the pain of one broken finger. That's the same idea as \$1,000,000,000 not being twice as useful as \$500,000,000, but it causes the opposite result.

Given a utility function $u$ on prizes and a lottery $p$, the _certainty equivalent_ of $p$ is the number $m$ such that
$$u(m) = \mathbb{E}_p[u(x)]\tag{1.12}$$
$m$ is the sure prize that the utility function values equally with the risky set of outcomes in $p$. The amount $A_{switch}$ you recorded in the billionaire choice problem was your certainty equivalent of the billion dollar coin toss.

For most sensible purely financial situations, we can in effect introduce a fifth axiom of utility functions on prizes: people never prefer less money to more. The fact that people are sometimes philanthropic is not a counterexample, since the outcome set that takes philanthropy into account includes non-monetary outcomes. If the only outcomes are monetary, it is reasonably safe to assume more is preferred to less, and therefore utility functions on prizes are nondecreasing. For a differentiable scalar utility function $u$, this means $u$'s first derivative is non-negative: $u^{\prime}(x)\geq 0$.

This leads to another result:
>**Proposition:** Suppose the outcome set consists of an interval of the real line and $u$ is a continuous utility function on this interval. Then every lottery $p$ has at least one certainty equivalent. If $u$ is strictly increasing, then $p$ has at most one certainty equivalent.
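Because the Proposition guarantees exactly one certainty equivalent when $u$ is continuous and strictly increasing, $m$ in (1.12) can be found numerically by bisection. The following is a minimal Python sketch, assuming all prizes lie in a known interval $[lo, hi]$; the function name `certainty_equivalent` is our own. Applied to Cramer's square-root utility and the billion-dollar coin toss (\$1Bn or nothing, 50-50), it should recover the closed-form answer $(\tfrac{1}{2}\sqrt{10^9})^2 = 2.5\times 10^8$, i.e. \$250Mn.

```python
import math

def certainty_equivalent(u, lottery, lo, hi, iters=200):
    """Return m with u(m) = E_p[u(x)], found by bisection on [lo, hi].

    Assumes u is continuous and strictly increasing on [lo, hi], and that every
    prize in `lottery` (a list of (prize, probability) pairs) lies in [lo, hi].
    """
    target = sum(prob * u(prize) for prize, prob in lottery)  # E_p[u(x)]
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if u(mid) < target:
            lo = mid  # mid is worth less than the lottery: the CE is higher
        else:
            hi = mid  # mid is worth at least as much: the CE is at or below mid
    return 0.5 * (lo + hi)

# Billion-dollar coin toss: $1Bn or nothing, each with probability 1/2.
coin_toss = [(1_000_000_000, 0.5), (0, 0.5)]
ce = certainty_equivalent(math.sqrt, coin_toss, 0.0, 1e9)
print(round(ce))  # 250000000
```

Bisection is a natural choice here because it needs only continuity and monotonicity, exactly the hypotheses of the Proposition.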

For well-behaved utility functions, we can capture the degree of risk aversion with [Pratt's](http://www.jstor.org/stable/1913738) _coefficient of absolute risk aversion_, which is the negative of the second derivative divided by the first derivative of a (twice continuously differentiable, strictly increasing) utility function:
$$Coeff_{abs}(u,x)=-\frac{u^{\prime\prime}(x)}{u^{\prime}(x)}\tag{1.13}$$
The _coefficient of relative risk aversion_ is $Coeff_{rel}(u,x)=Coeff_{abs}(u,x)\cdot x$.

For example, Gabriel Cramer's utility function was $u(x)=\sqrt{x}$. So $u^{\prime}(x)=\frac{1}{2\sqrt{x}}$ and $u^{\prime\prime}(x)=-\frac{1}{4x^{3/2}}$, giving an absolute risk aversion of $\frac{1}{2x}$. For positive arguments $x$, the square root’s absolute risk aversion is therefore positive; that is, the square root is risk averse.

The square root’s relative risk aversion is ½, a constant. This means that the square root function belongs to the CRRA (Constant relative risk aversion) class of utility functions. CARA (Constant absolute risk aversion) is another important class.
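As a numerical check on Cramer's square root, one can approximate $u^{\prime}$ and $u^{\prime\prime}$ with central finite differences and evaluate (1.13) along with the relative coefficient. This is a sketch: the helper name `risk_aversion` and the step size (a fixed fraction of $x$) are our own choices. The absolute coefficient should track $\frac{1}{2x}$ and the relative coefficient should stay at ½.

```python
import math

def risk_aversion(u, x, rel_step=1e-3):
    """Approximate (absolute, relative) risk aversion of u at x > 0.

    Uses central finite differences with step h = rel_step * x, assumed small
    relative to the curvature of u near x.
    """
    h = rel_step * x
    u1 = (u(x + h) - u(x - h)) / (2 * h)             # ~ u'(x)
    u2 = (u(x + h) - 2 * u(x) + u(x - h)) / (h * h)  # ~ u''(x)
    absolute = -u2 / u1                              # equation (1.13)
    return absolute, absolute * x

for x in [1.0, 10.0, 100.0]:
    ara, rra = risk_aversion(math.sqrt, x)
    print(f"x={x:>6}: absolute={ara:.6f} vs 1/(2x)={1 / (2 * x):.6f}, relative={rra:.6f}")
```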

The coefficient of risk aversion captures our intuitive idea of risk aversion as follows:
>**Definition**. Given two utility functions $u$ and $v$, _$u$ is more risk averse than $v$_ if, for every lottery $p$ and sure amount $m$ with $\mathbb{E}_p[u(x)]≥u(m)$, we also have $\mathbb{E}_p[v(x)]≥v(m)$, and there exist a lottery $q$ and a sure amount $k$ such that $\mathbb{E}_q[v(x)]≥v(k)$ but $\mathbb{E}_q[u(x)]<u(k)$.

This definition says if $u$ is “willing” to take the risk on $p$, then $v$ is too, but there is some $q$ where $v$ is “willing” to take the risk but $u$ isn’t.

This intuitive concept of willingness to take gambles ties in to the numerical coefficient in the way one would hope:
>**Theorem (Pratt)**: $u$ is more risk averse than $v$ if and only if $u$’s coefficient of absolute risk aversion is at least as high as $v$’s at every $x$, and strictly higher at some $x$.
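To see the theorem at work, compare Cramer's $\sqrt{x}$ (absolute risk aversion $\frac{1}{2x}$) with Bernoulli's $\ln x$ (absolute risk aversion $\frac{1}{x}$, higher for every $x>0$). The log utility should then assign any nondegenerate lottery a lower certainty equivalent than the square root does. A minimal Python sketch, using a hypothetical 50-50 lottery between \$100 and \$10,000 and inverting each utility in closed form (the helper name `ce_closed_form` is our own):

```python
import math

# Hypothetical 50-50 lottery: $100 or $10,000.
lottery = [(100.0, 0.5), (10_000.0, 0.5)]

def ce_closed_form(u, u_inverse, lottery):
    """Solve u(m) = E_p[u(x)] using a known inverse of u."""
    return u_inverse(sum(prob * u(prize) for prize, prob in lottery))

expected_value = sum(prob * prize for prize, prob in lottery)
ce_sqrt = ce_closed_form(math.sqrt, lambda y: y * y, lottery)  # (0.5*10 + 0.5*100)^2
ce_log = ce_closed_form(math.log, math.exp, lottery)           # geometric mean of prizes
print(expected_value, ce_sqrt, round(ce_log, 6))  # 5050.0 3025.0 1000.0
```

The log utility is indifferent between the lottery and a sure \$1,000, the square root only at a sure \$3,025, and both certainty equivalents fall below the expected value of \$5,050.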

Suppose we have a binary lottery: with 50% probability it costs you nothing, and with 50% probability it takes \$10 from you. This is a losing lottery, so we have to pay people to enter it. Assume people have a utility function $u$.

If a person’s starting wealth is $w$, then the indifference point $c$, the minimum amount we have to pay people to enter this losing lottery, satisfies:
$$u(w)=0.5\,u(w+c)+0.5\,u(w+c-10)$$

- If the indifference point $c$ is the same for all wealth levels $w$, then $u$ is CARA (Constant Absolute Risk Aversion).
- If $c$ is increasing in $w$, then $u$ is IARA (Increasing Absolute Risk Aversion).
- If $c$ is decreasing in $w$, then $u$ is DARA (Decreasing Absolute Risk Aversion).

Most people act as if their utility function is DARA, which was Daniel Bernoulli’s insight back in 1738.
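The indifference equation above can be solved for $c$ numerically. Here is a minimal sketch using bisection, with two illustrative utilities we chose: a CARA utility $u(x)=-e^{-0.1x}$ (the parameter 0.1 is hypothetical) and Bernoulli's $\ln x$. Under CARA, $c$ should not depend on $w$; under log utility, $c$ should shrink as $w$ grows.

```python
import math

def indifference_payment(u, w, loss=10.0, iters=200):
    """Solve u(w) = 0.5*u(w + c) + 0.5*u(w + c - loss) for c by bisection.

    Assumes u is strictly increasing on [w - loss, w + loss], which places the
    root in [0, loss].
    """
    lo, hi = 0.0, loss
    for _ in range(iters):
        c = 0.5 * (lo + hi)
        if 0.5 * u(w + c) + 0.5 * u(w + c - loss) < u(w):
            lo = c  # not enough compensation yet
        else:
            hi = c
    return 0.5 * (lo + hi)

alpha = 0.1                               # hypothetical CARA parameter
cara = lambda x: -math.exp(-alpha * x)    # constant absolute risk aversion alpha
for w in [20.0, 100.0, 1000.0]:
    c_cara = indifference_payment(cara, w)
    c_log = indifference_payment(math.log, w)
    print(f"w={w:>7}: CARA c={c_cara:.4f}  log c={c_log:.4f}")
```

Under CARA the payment stays fixed at about \$6.20, while under log utility it falls toward \$5 (the risk-neutral answer) as wealth grows, which is the DARA pattern.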

Imagine another binary lottery: with 50% probability it costs you nothing, and with 50% probability it takes 10% of your wealth. Let $f$ be the fraction of existing wealth $w$ we have to pay to make a person indifferent to entering this lottery. Then the indifference fraction $f$ satisfies:
$$u(w)=0.5\,u(w(1+f))+0.5\,u(w(1+f-0.1))$$

- If $f$ is the same for all wealth levels $w$, then $u$ is CRRA (Constant Relative Risk Aversion).
- If $f$ is increasing in $w$, then $u$ is IRRA (Increasing Relative Risk Aversion).
- If $f$ is decreasing in $w$, then $u$ is DRRA (Decreasing Relative Risk Aversion).

Most people act as if their utility function is CRRA or IRRA.
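The same bisection idea applies to the indifference fraction $f$. In this sketch, log utility (relative risk aversion 1) should give the same $f$ at every wealth level, while the hypothetical CARA utility $u(x)=-e^{-0.1x}$, whose relative risk aversion $0.1x$ grows with wealth, should give an $f$ that rises with $w$.

```python
import math

def indifference_fraction(u, w, loss_fraction=0.1, iters=200):
    """Solve u(w) = 0.5*u(w*(1+f)) + 0.5*u(w*(1+f-loss_fraction)) for f by bisection.

    Assumes w > 0 and u strictly increasing, so the root lies in [0, loss_fraction].
    """
    lo, hi = 0.0, loss_fraction
    for _ in range(iters):
        f = 0.5 * (lo + hi)
        if 0.5 * u(w * (1 + f)) + 0.5 * u(w * (1 + f - loss_fraction)) < u(w):
            lo = f  # not enough compensation yet
        else:
            hi = f
    return 0.5 * (lo + hi)

cara = lambda x: -math.exp(-0.1 * x)  # hypothetical CARA utility, alpha = 0.1
for w in [20.0, 100.0, 1000.0]:
    print(f"w={w:>7}: log f={indifference_fraction(math.log, w):.5f}  "
          f"CARA f={indifference_fraction(cara, w):.5f}")
```

For log utility the equation reduces to $(1+f)(0.9+f)=1$, so $f\approx 0.0512$ regardless of $w$ (CRRA), while the CARA fractions increase with wealth (IRRA).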

Economists generally find that most plausible utility functions fall in the HARA class (Hyperbolic Absolute Risk Aversion). A utility function $u$ is HARA if its coefficient of absolute risk tolerance $-\frac{u^{\prime}(x)}{u^{\prime\prime}(x)}$ is a linear function of the form $ax+b$. Note that the coefficient of absolute risk tolerance is the reciprocal of the coefficient of absolute risk aversion.

Note that if $a=0$ we have CARA and if $b=0$ we have CRRA. If $a\neq 0$ and $a\neq 1$, then $u(x)=(a-1)^{-1}(ax+b)^{(a-1)/a}$ up to an affine transformation; if $a=1$, then $u(x)=\ln(x+b)$.
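A quick numerical sanity check of the closed form: for a few hypothetical parameter pairs $(a, b)$, approximate $u^{\prime}$ and $u^{\prime\prime}$ by finite differences and confirm that the risk tolerance $-u^{\prime}/u^{\prime\prime}$ matches $ax+b$. This only checks the formula at sample points where $ax+b>0$; it is not a proof.

```python
def hara_utility(a, b):
    """u(x) = (a*x + b)**((a-1)/a) / (a-1); valid where a*x + b > 0, a not in {0, 1}."""
    return lambda x: (a * x + b) ** ((a - 1) / a) / (a - 1)

def risk_tolerance(u, x, h=1e-3):
    """-u'(x)/u''(x) via central finite differences."""
    u1 = (u(x + h) - u(x - h)) / (2 * h)
    u2 = (u(x + h) - 2 * u(x) + u(x - h)) / (h * h)
    return -u1 / u2

for a, b in [(0.5, 1.0), (2.0, 1.0), (-1.0, 10.0)]:
    u = hara_utility(a, b)
    for x in [1.0, 2.0, 5.0]:
        print(f"a={a}, b={b}, x={x}: tolerance={risk_tolerance(u, x):.5f}, ax+b={a * x + b}")
```

For example, $a=2,\ b=1$ gives $u(x)=\sqrt{2x+1}$, and $a=-1,\ b=10$ gives the quadratic $u(x)=-\tfrac{1}{2}(10-x)^2$, increasing for $x<10$.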

+ ## 1.7.4 Drawbacks of Utility Theory

The mathematical scaffolding built on utility theory is powerful. Surely all this cleverness must give us a great guide to how people make economic decisions. If we could just get each person to paste a label on their forehead specifying their utility function, we could set about maximizing utility globally and move toward economic Nirvana for the human race.

Well, no. People have noticed various inconsistencies between utility theory and common sense. Utility remains a powerful concept and a good first-order guide to economic decision-making. But eventually it does depart from what people do. One point of departure is the overstrong set of assumptions about what constitutes rational decision making: while necessary to reason mathematically, some of these assumptions are debatable. Another point of departure comes from humans' stubborn insistence on being irrational.

Consider Daniel Bernoulli's framework: logarithmic utility of current wealth $w$. Since $\ln x \to -\infty$ as $x \to 0$, utility theory predicts that an economic actor will not take **any** gamble that carries a chance of losing her entire wealth or more, however large the possible gain. When $w$ is low, this rules out even modest gambles.

For example, if someone's current wealth is $w=\$100$, and she is offered a coin toss where tails means she loses \\$101 and heads means she wins \\$1,000,000,000,000, logarithmic utility (taking the log of zero or negative wealth to be $-\infty$) says she will decline the gamble since
$$\frac{1}{2}\ln(100-101)+\frac{1}{2}\ln(100+10^{12})=-\infty<\ln(100)$$
In fact there is no finite amount of money on the "win" side of the coin toss that will induce such an economic actor to take the gamble. But that's absurd: surely someone who is close to destitute would not mind being totally destitute, or even owing a little, in order to have a chance at a trillion dollars.
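The arithmetic can be reproduced in a few lines of Python. As in the inequality above, the sketch treats the logarithm of zero or negative wealth as $-\infty$. The optional `floor` argument is a hypothetical version of a patch for this example (clamping wealth at a minimum level before taking the log), with a \$1 floor chosen purely for illustration.

```python
import math

def log_utility(x, floor=None):
    """ln(x), with non-positive wealth mapped to -infinity.

    If `floor` is given (a hypothetical patch), wealth is first clamped to at least `floor`.
    """
    if floor is not None:
        x = max(x, floor)
    return math.log(x) if x > 0 else -math.inf

def takes_coin_toss(w, loss, gain, floor=None):
    """Does a log-utility agent with wealth w accept a 50-50 toss of -loss / +gain?"""
    gamble = 0.5 * log_utility(w - loss, floor) + 0.5 * log_utility(w + gain, floor)
    return gamble > log_utility(w, floor)

print(takes_coin_toss(100.0, 101.0, 1e12))             # False: -inf swamps any finite gain
print(takes_coin_toss(100.0, 101.0, 1e12, floor=1.0))  # True once wealth is floored at $1
```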

This particular example can be patched up: we can put a floor on the argument to the logarithm. But Rabin's Paradox (due to [Matthew Rabin](https://www.jstor.org/stable/2999450)) indicates this isn't just a technical problem with the logarithm; there is a fundamental problem that is difficult to fix. Rabin proves a theorem that says:
>within the expected-utility model, anything but virtual risk-neutrality over modest stakes implies manifestly unrealistic risk aversion over large stakes. This theorem is entirely "nonparametric," assuming nothing about the utility function except concavity.

Rabin shows inconsistencies like this:
>Suppose that, from any initial wealth level, a person turns down gambles where she loses \\$100 or gains \\$110, each with 50% probability. Then she will turn down 50-50 bets of losing \\$1,000 or gaining _any_ sum of money.

This is similar to our observation about a person with low wealth and a logarithmic utility function, but it applies to any concave (risk-averse) utility function. The conclusion from Rabin's theorem is that it's difficult to calibrate a utility function to make reasonable decisions over both modest amounts and large amounts.
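Rabin's theorem covers every concave utility function, but the effect is easy to see in one special case. The sketch below uses a CARA utility $u(x)=-e^{-\alpha x}$ (our choice; under CARA, wealth cancels out, so "from any initial wealth level" holds automatically), calibrates $\alpha$ by bisection so that the agent is just indifferent to the lose-\$100/gain-\$110 bet, and then checks the lose-\$1,000 bets. It illustrates the paradox for this one family and does not reproduce Rabin's general proof.

```python
import math

def cara_rejects(alpha, loss, gain):
    """True if u(x) = -exp(-alpha*x) strictly prefers to decline a 50-50 bet of
    losing `loss` or gaining `gain`. Wealth cancels under CARA, so the answer
    is the same at every initial wealth level."""
    return 0.5 * math.exp(alpha * loss) + 0.5 * math.exp(-alpha * gain) > 1.0

def calibrate_alpha(loss=100.0, gain=110.0, lo=1e-9, hi=1.0, iters=200):
    """Bisect for the alpha at which the agent is indifferent to the small bet.
    Returns the upper end of the bracket, which (just) rejects it."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cara_rejects(mid, loss, gain):
            hi = mid
        else:
            lo = mid
    return hi

alpha = calibrate_alpha()
print(f"alpha = {alpha:.6f}")
for gain in [1_000.0, 1e6, 1e12]:
    print(gain, cara_rejects(alpha, 1_000.0, gain))
```

Rejection of every gain follows because $\tfrac{1}{2}e^{1000\alpha}\geq 1$ once $\alpha\geq \ln 2/1000\approx 0.00069$: the loss branch alone then outweighs the status quo, and the calibrated $\alpha$ (roughly 0.0009) is above that threshold.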

Another problem is the possibility of _background noise_. It is rare that economic agents are presented with as pure a decision as the St. Petersburg Paradox, or the generous billionaires, or the probabilistic thug. In business decisions, risks of changes in the general economic environment may be as large as, or larger than, the risk of success or failure of a particular business venture. In fact, [Pomatto, Strack and Tamuz](http://www.its.caltech.edu/~lpomatto/fosd.pdf) have shown that if there are two random variables $X$ and $Y$ with $\mathbb{E}[X]>\mathbb{E}[Y],$ then we can always find a zero-mean background noise random variable $Z$ independent of $X$ and $Y$ so that $X+Z$ first-order stochastically dominates $Y+Z$. Thus the right kind of background noise can obliterate any risk considerations that might go into a decision between $X$ and $Y$.

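For example, suppose $X$ is the generous billionaire coin toss (\\$1Bn on heads, zero on tails), and $Y$ is a sure check for \\$400Mn. Then $\mathbb{E}[X]>\mathbb{E}[Y],$ but as we've seen, risk aversion will cause many people to choose $Y$ anyway. Now let $Z$ take \\$250Mn on heads and pay \\$250Mn on tails. Then $X+Z$ pays \\$750Mn and \\$250Mn on heads and tails, respectively. This first-order stochastically dominates $Y+Z$, which pays \\$150Mn and \\$650Mn on heads and tails, respectively: the worse outcome of $X+Z$ beats the worse outcome of $Y+Z$, and likewise for the better outcomes. Thus any minimally rational economic agent will prefer $X+Z$ to $Y+Z$, so the presence of the background noise $Z$ has befuddled any risk-averse decision. (This toy $Z$ is driven by the same coin as $X$, so unlike the $Z$ in the theorem it is not independent of $X$; the theorem guarantees that a suitable independent $Z$ exists, but it is generally more complicated.)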
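A short Python check of the dominance claim, with amounts in \$Mn. The helper `fosd` (our own name) tests first-order stochastic dominance by comparing cumulative distribution functions at every support point. As in the example, $Z$ is driven by the same coin as $X$; for contrast, the sketch also builds $X+Z$ with this $Z$ flipped on an independent coin.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)

def cdf(dist, t):
    """P(outcome <= t) for a discrete distribution [(value, probability), ...]."""
    return sum((p for v, p in dist if v <= t), Fraction(0))

def fosd(a, b):
    """True if a first-order stochastically dominates b: F_a(t) <= F_b(t) for all t,
    strictly for some t. Checking the pooled support points suffices because
    both CDFs are step functions."""
    points = sorted({v for v, _ in a} | {v for v, _ in b})
    gaps = [cdf(a, t) - cdf(b, t) for t in points]
    return all(g <= 0 for g in gaps) and any(g < 0 for g in gaps)

def add_independent(a, b):
    """Distribution of A + B when A and B are independent."""
    return [(va + vb, pa * pb) for (va, pa), (vb, pb) in product(a, b)]

# Amounts in $Mn.
X = [(1000, HALF), (0, HALF)]   # $1Bn on heads, nothing on tails
Y = [(400, Fraction(1))]        # sure $400Mn
# Z tied to X's coin: takes 250 on heads, pays 250 on tails.
X_plus_Z = [(1000 - 250, HALF), (0 + 250, HALF)]
Y_plus_Z = [(400 - 250, HALF), (400 + 250, HALF)]
# The same Z, but flipped on an independent coin.
Z = [(-250, HALF), (250, HALF)]
X_plus_Z_indep = add_independent(X, Z)

print(fosd(X, Y))                      # False
print(fosd(X_plus_Z, Y_plus_Z))        # True
print(fosd(X_plus_Z_indep, Y_plus_Z))  # False
```

With the independent coin, $X+Z$ can fall to $-250$, below anything $Y+Z$ can produce, so dominance fails; an independent $Z$ that does the job has to be constructed more carefully.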

Another inconsistency is called the _Allais problem_, described in a 1953 paper by [Maurice Allais](https://www.jstor.org/stable/1907921). Allais first poses (p. 527) a happy choice between
- Situation A: Definitely receive 100Mn
- Situation B: 10% chance of 500Mn; 89% chance of 100Mn; 1% chance of nothing.

He notes that most people prefer A over B. In utility theory terms this means $u(100)>.1u(500)+.89u(100)+.01u(0)$, or $.11u(100)>.1u(500)+.01u(0)$.

He then poses an apparently different choice between
- Situation C: 11% chance of receiving 100Mn; 89% chance of receiving nothing
- Situation D: 10% chance of receiving 500Mn; 90% chance of receiving nothing

He notes most people prefer D over C. In utility terms this means $.1u(500)+.9u(0)>.11u(100)+.89u(0)$, or $.1u(500)+.01u(0)>.11u(100)$. This is exactly the reverse of the inequality implied by preferring A over B, so no single utility function can account for both choices.
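The two inequalities are the same inequality with the sides swapped: $\mathbb{E}[u(A)]-\mathbb{E}[u(B)] = .11u(100)-.1u(500)-.01u(0) = \mathbb{E}[u(C)]-\mathbb{E}[u(D)]$. So for any utility function, preferring A to B forces preferring C to D. A small Python check over a few utility functions of our own choosing (prizes in millions; the CARA scales 10 and 50 are arbitrary) confirms that the two gaps coincide.

```python
import math

def eu(u, lottery):
    """Expected utility of a lottery given as [(prize, probability), ...]."""
    return sum(prob * u(prize) for prize, prob in lottery)

# Allais's four situations, prizes in millions.
A = [(100, 1.0)]
B = [(500, 0.10), (100, 0.89), (0, 0.01)]
C = [(100, 0.11), (0, 0.89)]
D = [(500, 0.10), (0, 0.90)]

utilities = {
    "linear": lambda x: x,
    "sqrt": math.sqrt,
    "log1p": math.log1p,
    "cara50": lambda x: -math.exp(-x / 50),
    "cara10": lambda x: -math.exp(-x / 10),
}
for name, u in utilities.items():
    gap_ab = eu(u, A) - eu(u, B)
    gap_cd = eu(u, C) - eu(u, D)
    print(f"{name:>7}: EU(A)-EU(B)={gap_ab:+.6f}  EU(C)-EU(D)={gap_cd:+.6f}")
```

The strongly concave `cara10` prefers A and C; the others prefer B and D. None produces the common A-and-D pattern.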

One way to address the Allais problem is to assume that people are systematically irrational in the way they perceive probabilities. Let $d(x)$ be a monotone increasing distortion function applied to probabilities (the unit interval $[0,1]$), where $d(0)=0$ and $d(1)=1$, but $d$ increases rapidly just above 0, levels off near 0.5 in the middle, and then increases rapidly again just before 1. If people act on $d(\text{probability})$ rather than on the stated probability, this can explain why someone would choose A over B and D over C: the 1% chance of nothing in B is overweighted and the 89% chance of 100Mn is underweighted, making B look worse than the sure thing, while 10% and 11% receive nearly the same weight, so the larger prize in D wins. The idea of systematic irrationality has been pursued deeply by _behavioral economists_, who have found both psychological theory and empirical observations to back the idea that economic humans are not only irrational, but are reliably irrational.
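To make this concrete, here is a sketch using one well-known inverse-S weighting function, the Tversky-Kahneman (1992) form $d(p)=p^{\gamma}/(p^{\gamma}+(1-p)^{\gamma})^{1/\gamma}$ with their gains estimate $\gamma=0.61$. It is only one possible choice of $d$ and does not level off at exactly 0.5. Applying $d$ separately to each stated probability is also a simplification (cumulative prospect theory weights ranked outcomes instead), and the utility $u(x)=x^{1/4}$, with prizes in millions, is a hypothetical choice.

```python
def tk_weight(p, gamma=0.61):
    """Tversky-Kahneman (1992) inverse-S probability weighting function."""
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)

def value(lottery, u, d):
    """Sum of d(probability) * u(prize); each stated probability is distorted separately."""
    return sum(d(prob) * u(prize) for prize, prob in lottery)

u = lambda x: x ** 0.25   # hypothetical concave utility, prizes in millions
identity = lambda p: p    # no distortion: plain expected utility

A = [(100, 1.0)]
B = [(500, 0.10), (100, 0.89), (0, 0.01)]
C = [(100, 0.11), (0, 0.89)]
D = [(500, 0.10), (0, 0.90)]

for name, d in [("undistorted", identity), ("distorted", tk_weight)]:
    first = "A" if value(A, u, d) > value(B, u, d) else "B"
    second = "C" if value(C, u, d) > value(D, u, d) else "D"
    print(f"{name}: prefers {first} and {second}")
```

With undistorted probabilities this agent picks B and D, consistent with expected utility; with distorted probabilities it picks A and D, the pattern Allais observed.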

These drawbacks tell us not to rely too heavily on utility theory alone. But when taken together with common sense, utility theory can be a powerful tool to help understand decision making in the presence of risk.