How much should you pay for information to help you make a decision?

Imagine that I run a business, and I'm thinking of making a new product. I think it will sell well, but it's probably worth running a survey or doing some user research to be more certain. How much should I spend on that research?  This scenario asks me to _value_ information.

If I'm going to spend \\$100,000 developing and selling it, that's the max I stand to lose, but if I'm already highly certain it will succeed, wouldn't it be excessive to spend \\$100,000 on research? This thought process implies that the _expected value_ of information is related to my expected loss. However, if I run the survey and learn that the product is actually highly likely to fail, then the _actual value_ of information turned out to be worth the \\$100,000 that I saved. 

In this post, I'm going to examine the question of how much one should pay for information, using a simple gambling game viewed through the lens of maths and probability.

## The dice game

I offer to play this game with you as many times as you'd like. I'll roll a dice: if it lands on 1 or 2, you lose \\$900. If it lands on 3 to 6, you win \\$60. Even though you're more likely to win than lose, the cost of losing is so great that your losses will far outstrip your winnings after a few games.

We can calculate the **Expected Value** of the game - how much you make or lose on average each game, by multiplying the value of each outcome with its probability, and adding the results.

| Outcome     | Probability | Value   | Expected Value |
|-------------|-------------|---------|----------------|
| 1, 2        | 1/3         | -\\$900 | -\\$300        |
| 3, 4, 5, 6  | 2/3         |  \\$60  | \\$40         |
| **Overall** | 1           |         | **-\\$260**    |

Or, as totally legit math notation, first defining the concept of a fair six-sided dice:

$$
\Pr(x)_D = \begin{cases} 1/6 &  x \in \{1, 2, 3, 4, 5, 6\} \\ 0 & \textrm{otherwise} \\ \end{cases}
$$

Then defining the game in terms of $x$ and computing its expected value:

$$
\begin{align}
F(x) &= \begin{cases}-900 & x \in \{1, 2\}  \\ 60 & x \in \{3, 4, 5, 6\} \end{cases} \\
E[F(x)] &= \sum_{x \in \{1..6\}} \Pr(x)F(x) \\
&= \frac{1}{6} (60 + 60 + 60 + 60 - 900 - 900) \\
&= -260
\end{align}
$$
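
If you'd rather check the arithmetic in code, here's a minimal Python sketch of the same calculation. The names `PAYOFF` and `expected_value` are my own, chosen for illustration, and it uses exact fractions so that 1/6 doesn't pick up any floating point noise.

```python
from fractions import Fraction

# Payoff for each face of a fair six-sided dice, as defined by the game.
PAYOFF = {1: -900, 2: -900, 3: 60, 4: 60, 5: 60, 6: 60}


def expected_value(payoff):
    """Sum of probability * value over six equally likely faces."""
    p_face = Fraction(1, 6)
    return sum(p_face * value for value in payoff.values())


game_ev = expected_value(PAYOFF)
print(game_ev)  # -260
```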

Given this setup, you would not want to play this game - that is your **default decision**. If you decide not to play, but I roll the dice anyway to see what would have happened and it lands on a 5, you miss out on \\$60. That's **Opportunity Loss**. Since the chance of that happening is 2/3, your **Expected Opportunity Loss** is \\$40 - the expected cost of being wrong, given your current decision.
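
As a rough sketch, the expected opportunity loss of either decision can be computed by comparing what you chose against the best choice in hindsight for each roll. The function name `expected_opportunity_loss` and its `play` flag are just illustrative.

```python
from fractions import Fraction

PAYOFF = {1: -900, 2: -900, 3: 60, 4: 60, 5: 60, 6: 60}
P_FACE = Fraction(1, 6)  # fair dice: each face is equally likely


def expected_opportunity_loss(play):
    """Average amount by which the chosen action falls short of the best one."""
    total = Fraction(0)
    for value in PAYOFF.values():
        chosen = value if play else 0  # not playing always pays 0
        best = max(value, 0)           # best action in hindsight for this roll
        total += P_FACE * (best - chosen)
    return total


eol_not_playing = expected_opportunity_loss(play=False)
eol_playing = expected_opportunity_loss(play=True)
print(eol_not_playing, eol_playing)  # 40 300
```

Not playing has an expected opportunity loss of \\$40, while playing has one of \\$300, which is another way of seeing why not playing is the better default.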

Here are the same definitions in handy table format:


| Term                      | Definition                                                                                             | Value in this game  |
|---------------------------|--------------------------------------------------------------------------------------------------------|---------------------|
| Default Decision          | The decision you would make given the information you have.                                            | Don't play!         |
| Opportunity Loss          | The amount you lose (or fail to make) when your decision turns out to be wrong.                        | \\$60               |
| Expected Opportunity Loss | The expected value of Opportunity Loss, i.e. the chance of each loss scenario multiplied by its value. | \\$60 * 2/3 = \\$40 |




## Perfect information

Now I'm going to give you the opportunity to learn the result of the game:

1. I roll the dice secretly,
2. For a price, I offer to tell you the result,
3. After buying (or not buying) the result, you decide whether to play.

Because I'm offering you _perfect information_ on the result of this game, you're never at risk of losing \\$900. You either:

- Don't buy the result, and don't play,
- Learn that you'll win and decide to play, or
- Learn that you'll lose and decide not to play.

In this situation, what's the absolute maximum you should pay to know the dice roll?

- Perhaps \\$60? No - if you pay \\$60, you'll make that money back 2/3 of the time, but lose it 1/3 of the time. Overall, you expect to lose \\$20 per game.
- Perhaps \\$0? That would be great for you, but I'm not gonna give you the information for free!

Let's derive this with #maths. I want to find $k$, such that the expected value of the game is at least \\$0 if you pay up to $k$ for perfect information. The game looks like this now:

$$
G(x) = \begin{cases}- k + 0 & x \in \{1, 2\} \\ - k + 60 & x \in \{3, 4, 5, 6\} \end{cases}
$$

Solving $E[G(x)] \geq 0$ for $k$:

$$
\begin{align}
0 &\leq E[G(x)] = \sum_{x \in \{1..6\}} \Pr(x)G(x) \\
&= \frac{2}{6}(-k) + \frac{4}{6}(- k + 60)\\
&= -k + \frac{4 \times 60}{6} = -k + 40 \\
\therefore k &\leq 40
\end{align}
$$

The absolute maximum you should pay is \\$40, at which point you expect to break even. If you can pay less, you'll make a profit. If you pay more, you'll lose in the long run. Here's the table for the expected value when you pay \\$40 for information.

| Outcome     | Probability |          Value                               | Expected Value |
|-------------|-------------|:---------------------------------------------|----------------|
| 1, 2        | 1/3         | -\\$40 + \\$0 = -\\$40 (decide not to play)  | -\\$13.33      |
| 3, 4, 5, 6  | 2/3         | -\\$40 + \\$60 = \\$20 (decide to play)      |  \\$13.33      |
| **Overall** | 1           |                                              | **\\$0**       |

Note that the expected value of _perfect_ information is equal to the expected opportunity loss.
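
To see the break-even point numerically, here's a small sketch that prices the game at a few different fees. It assumes you always buy the result and then play only on a winning roll; `value_with_perfect_info` is a made-up name for this post.

```python
from fractions import Fraction

PAYOFF = {1: -900, 2: -900, 3: 60, 4: 60, 5: 60, 6: 60}
P_FACE = Fraction(1, 6)  # fair dice: each face is equally likely


def value_with_perfect_info(price):
    """Expected value per game when you pay `price`, then play only winning rolls."""
    # max(value, 0): play when the roll pays out, otherwise sit out for 0.
    return sum(P_FACE * (max(value, 0) - price) for value in PAYOFF.values())


for price in (0, 20, 40, 60):
    print(price, value_with_perfect_info(price))
```

The expected value falls by exactly one dollar for every extra dollar of fee, crossing zero at \\$40 and reaching -\\$20 at \\$60.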


You can rarely hope for perfect information in complex scenarios, but we can at least say that you would never want to pay more for _imperfect_ information than for perfect information. This has an interesting consequence: **the _maximum_ you should be willing to pay for information is unrelated to the expected value of your default decision**. 

If we changed the game so that you lose \\$9 million on a 1 or 2, it doesn't matter, since with perfect information you'll never risk that loss anyway! On the other hand, if I occasionally lie to you, you definitely _would_ care if you stood to lose \\$9 million versus \\$900. The question is, how much would you care?

## Imperfect information

In a new and dangerous version of the game, I offer you information for $k$, but there's a chance $p$ that I'll lie to you. This will be represented by $Y$ (for "wh**y** would you lie to me?!").

$$
\Pr(y)_Y = \begin{cases} p & y = \text{lie} \\ 1 - p & y = \text{truth} \\ \end{cases}
$$

When I lie, you end up playing a losing game, or missing out on a winning game.

$$
H(x, y) = \begin{cases}
- k + 0 & x \in \{1, 2\} \text{ and } y = \text{truth} \\
- k - 900 & x \in \{1, 2\} \text{ and } y = \text{lie} \\
- k + 60 & x \in \{3, 4, 5, 6\} \text{ and } y = \text{truth} \\
- k + 0 & x \in \{3, 4, 5, 6\} \text{ and } y = \text{lie} \\
\end{cases}
$$

What is the maximum price $k$ you should pay, in terms of the unreliability $p$ of the information? Solving $E[H(x, y)] \geq 0$ for $k$ in terms of $p$:

$$
\begin{align}
0 &\leq E[H(x, y)] \\
&= \sum_{\substack{x \in \{1..6\} \\ y \in \{\text{truth}, \text{lie}\}}} \Pr(x)\Pr(y)H(x, y) \\
&= \frac{1}{3}(1-p)(-k) + \frac{1}{3}p(-k - 900) + \frac{2}{3}(1-p)(-k + 60) + \frac{2}{3}p(-k + 0)
\end{align}
$$

Putting that through the [magic math unicorn](https://www.wolframalpha.com/input/?i=%5Cfrac%7B1%7D%7B3%7D%281-p%29%28-k%29+%2B+%5Cfrac%7B1%7D%7B3%7Dp%28-k+-+900%29+%2B+%5Cfrac%7B2%7D%7B3%7D%281-p%29%28-k+%2B+60%29+%2B+%5Cfrac%7B2%7D%7B3%7Dp%28-k+%2B+0%29+%3E%3D+0):

$$
k \leq 40 - 340p
$$
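
The same result also falls out by hand, as a quick check on the unicorn. Every branch pays $-k$ and the four probabilities sum to 1, so the $k$ terms collapse and only the \\$900 and \\$60 terms depend on $p$:

$$
\begin{align}
0 &\leq -k\left[\tfrac{1}{3}(1-p) + \tfrac{1}{3}p + \tfrac{2}{3}(1-p) + \tfrac{2}{3}p\right] - 900 \cdot \tfrac{1}{3}p + 60 \cdot \tfrac{2}{3}(1-p) \\
&= -k - 300p + 40(1 - p) \\
&= -k + 40 - 340p
\end{align}
$$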

To understand this, consider the extremes:

- If I never lie ($p = 0$), I'm giving you perfect information. We've already covered that case: $k \leq \$40$, the expected opportunity loss when not playing the game.
- If I always lie ($p = 1$), then I'm perfectly misleading you. I'd have to _give you_ \\$300 per game to make up for your losses, which is the expected opportunity loss when playing the game!

Between $p = 0$ and $1$, the expected value of information varies linearly. If I lie to you less than $p = 40/340 \approx 12\%$ of the time, then there exists a price at which it's worth paying to play.
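
Here's a hedged sketch that recomputes the maximum price directly from the four cases of $H(x, y)$, rather than from the simplified formula. It assumes you always act on what I tell you (play if told you'll win, sit out if told you'll lose); `max_price` and `p_lie` are names I've made up for illustration.

```python
from fractions import Fraction

PAYOFF = {1: -900, 2: -900, 3: 60, 4: 60, 5: 60, 6: 60}
P_FACE = Fraction(1, 6)  # fair dice: each face is equally likely


def max_price(p_lie):
    """Break-even price k when the seller lies with probability p_lie."""
    total = Fraction(0)
    for value in PAYOFF.values():
        told_truth = max(value, 0)  # play winners, skip losers
        told_lie = min(value, 0)    # play losers, skip winners
        total += P_FACE * ((1 - p_lie) * told_truth + p_lie * told_lie)
    return total


break_even_p = Fraction(40, 340)  # where 40 - 340p hits zero
print(max_price(0))             # 40
print(max_price(1))             # -300
print(max_price(break_even_p))  # 0
```

The break-even lie rate works out to exactly 2/17, or about 11.8%. Above that, the information has negative value and you'd be better off ignoring me.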

Finally, we can also imagine a version where instead of truth or lies, I give you truth or nothing. When I give you nothing, you'll just stick with your default decision to not play. If there's even a small chance I might give you the truth then there should be _some price_ worth paying, but if I'll never tell you anything, then the information is clearly worth nothing. Symbolically, $k \leq 40 - 40p$.
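
That last formula can be derived the same way. As a sketch, reusing $p$ for the chance that I tell you nothing (my reading of the setup above): when I say nothing you don't play and only lose the fee, so the only positive payoff comes from a winning roll that I truthfully report.

$$
\begin{align}
0 &\leq \underbrace{\tfrac{2}{3}(1-p)(-k + 60)}_{\text{truth, winning roll}} + \underbrace{\tfrac{1}{3}(1-p)(-k)}_{\text{truth, losing roll}} + \underbrace{p(-k)}_{\text{nothing}} \\
&= -k + 40(1 - p) \\
\therefore k &\leq 40 - 40p
\end{align}
$$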

This suggests two categories of imperfect information: potentially useless information, and potentially misleading information.

**Potentially useless information** might influence you to move from your default decision, but only if that decision was wrong. For example: an office manager of a small company is deciding whether to provide a vegetarian option for lunch. If the number of vegetarians exceeds a threshold they'll include the option, but since they've not heard that anyone is vegetarian they might default to not doing so. In that situation, a survey of the office can only help, but it could be useless if too many vegetarians miss the survey.

**Potentially misleading information** might influence you to move from your default decision, even if it was _right_. Taking the product research example from the opening, perhaps we're so confident in the product that the default decision will be to go ahead with it, but we're willing to pay a bit to check that's not a mistake. If, by chance, a survey of potential customers is answered by an unrepresentative slice of the population who all hate it, it could persuade us to abandon the project even if it was a good idea.

## Bounding the value of information

### TODO: finish recap


Ultimately, these are simple examples which aren't going to help you put an exact price tag on a customer survey for a complex new product. Nevertheless, I believe these results form useful guidelines for thinking about how much to spend on measurement.


For the upper bound:

1. The _most_ you should spend is the expected opportunity loss of your default decision.

For the lower bound, it depends on whether the information could be _misleading_, or simply _useless_.

2. If the information 
