```python
import numpy as np
import matplotlib.pyplot as plt
```

<br>

# Basics
---

**Utility avoids confusing importance and likelihood**. To make good decisions, it is not enough to know the probability of each outcome: we must also know how desirable each outcome is.

<br>

### Maximum Expected Utility

The **utility** of an outcome is a numeric value that allows us to compare the relative desirability of outcomes: the utility function assigns a value $U(s)$ to each state $s$. If a rational agent has the choice between several actions, it should choose the action with the maximum expected utility:

&emsp; $a^* = \underset{a}{\operatorname{argmax}} \mathbb{E}_{s' \sim p(s, a)}[U(s')]$
&emsp; where $s'$ is the outcome of action $a$ at state $s$
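The rule can be sketched for a small discrete problem, assuming the outcome probabilities $p(s' \mid s, a)$ for the current state are known. The actions (`stay`, `cross`), probabilities and utility values below are hypothetical, chosen only to illustrate the computation.

```python
from typing import Dict

# Hypothetical model for a fixed current state s: {action: {outcome s': probability}}
transition: Dict[str, Dict[str, float]] = {
    "stay":  {"safe": 1.0},
    "cross": {"arrived": 0.9, "accident": 0.1},
}

# Hypothetical utilities U(s') of each outcome state
U = {"safe": 0.0, "arrived": 10.0, "accident": -100.0}

def expected_utility(outcomes: Dict[str, float]) -> float:
    """E_{s' ~ p(s, a)}[U(s')] for a single action."""
    return sum(p * U[s_next] for s_next, p in outcomes.items())

def best_action(transition: Dict[str, Dict[str, float]]) -> str:
    """a* = argmax_a E[U(s')]."""
    return max(transition, key=lambda a: expected_utility(transition[a]))

for action, outcomes in transition.items():
    print(action, expected_utility(outcomes))
print("a* =", best_action(transition))
```

Here `cross` is the likely success (90%), yet its expected utility is $0.9 \times 10 + 0.1 \times (-100) = -1 < 0$, so the agent stays: the rare but very undesirable outcome weighs more than its probability alone suggests.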

The **utility** corresponds to the state value $V(s)$ in reinforcement learning, while the **action value** $Q(s, a)$ corresponds to the expected utility of taking action $a$ in state $s$.

<br>

### Finding the utility

The choice of the utility function is critical: it defines which situations the agent will prefer. Successfully handling a problem requires finding an appropriate utility function, which is unfortunately not easy. Taking money as an example, utility does not grow linearly with the amount of money:

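* people tend to prefer securing 100,000 over tossing a coin for a chance to gain 250,000, even though the coin toss has a higher expected gain (0.5 × 250,000 = 125,000)
* people tend to prefer losing 100 every month over having a 1% chance of losing 10,000, which has the same expected loss (0.01 × 10,000 = 100)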

This behavior is called **risk aversion**, and it does not always apply. Some people are not risk-averse, and it also depends on how much money they have in their bank account: someone with plenty of money will likely take more chances, and so will someone with a big debt (in a desperate attempt to save their situation).

We can explain this behavior by modeling the utility of money as the logarithm of the total amount of money you would have after making your choice, with a negative utility for debts:

```python
def utility(money: float) -> float:
    """Logarithmic utility of the total amount of money; debts get a negative utility."""
    if money >= 0:
        return np.log(1 + money)
    return -np.log(1 + abs(money))

def will_bet(bank_account: float, secure_win: float, possible_win: float, prob_win: float) -> bool:
    """True if the bet has a higher expected utility than the secure win."""
    # Win `possible_win` with probability `prob_win`, otherwise win nothing
    bet_utility = (1 - prob_win) * utility(bank_account) + prob_win * utility(bank_account + possible_win)
    return utility(bank_account + secure_win) < bet_utility

print(will_bet(1000, 1000, 3000, 0.5))    # Will take the chance
print(will_bet(1000, 10000, 30000, 0.5))  # Will not take the chance
print(will_bet(-10000, 1000, 3000, 0.3))  # Will take the chance although the expected gain is lower (0.3 * 3000 = 900 < 1000)
```

```
True
False
True
```
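Another way to look at the same model is the **certainty equivalent**: the sure gain that gives the same utility as the gamble. The sketch below reuses the coin toss from the first example (0 or 250,000 with equal probability) and assumes the log utility from above, restricted here to non-negative wealth so that it can be inverted as $w = e^{u} - 1$. The function names are illustrative.

```python
import numpy as np

def log_utility(wealth: float) -> float:
    # Same log utility as above, only used here for non-negative wealth
    return np.log(1 + wealth)

def certainty_equivalent(wealth: float, gains, probs) -> float:
    """Sure gain ce such that U(wealth + ce) = sum_i p_i * U(wealth + gain_i)."""
    eu = sum(p * log_utility(wealth + g) for g, p in zip(gains, probs))
    # Invert U(w) = log(1 + w), i.e. w = exp(u) - 1
    return np.exp(eu) - 1 - wealth

# Coin toss: gain 0 or 250,000 with equal probability (expected gain 125,000)
for wealth in [0, 100_000, 1_000_000]:
    ce = certainty_equivalent(wealth, [0, 250_000], [0.5, 0.5])
    print(f"wealth={wealth:>9,}  certainty equivalent={ce:>10,.0f}")
```

With this model, someone with no savings values the coin toss at only about 500, someone with 100,000 values it at roughly 87,000 (so they prefer the secure 100,000), while someone with 1,000,000 values it at roughly 118,000 and would take the coin toss.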


<br>

### Composing utilities

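* for independent criteria, the utility of an outcome can be composed as the sum of the utilities of each criterion: $U(x_1, \dots, x_n) = \sum_i U_i(x_i)$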
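A minimal sketch of an additive utility, assuming the criteria are independent so that the per-criterion utilities can simply be summed. The apartment example, the criteria and their scoring functions are all hypothetical.

```python
from typing import Callable, Dict

# Hypothetical per-criterion utilities U_i(x_i) for choosing an apartment
criteria: Dict[str, Callable[[float], float]] = {
    "rent":    lambda rent: -rent / 100,       # every 100 of rent costs 1 point
    "commute": lambda minutes: -minutes / 10,  # every 10 minutes of commute costs 1 point
    "surface": lambda m2: m2 / 10,             # every 10 m2 adds 1 point
}

def additive_utility(option: Dict[str, float]) -> float:
    """U(x_1, ..., x_n) = sum_i U_i(x_i)."""
    return sum(u(option[name]) for name, u in criteria.items())

options = {
    "downtown": {"rent": 1500, "commute": 10, "surface": 40},
    "suburb":   {"rent": 1000, "commute": 45, "surface": 60},
}

for name, option in options.items():
    print(name, additive_utility(option))
best = max(options, key=lambda name: additive_utility(options[name]))
print("best:", best)
```

The additive form only makes sense if the value of one criterion does not change how much we care about another; otherwise the utilities cannot be evaluated separately.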

<br>

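### Post-decision disappointment

* example: if every action has a true expected utility of 0 but our utility estimates are random around 0, the action with the highest estimate will look better than it really is, so on average we will be disappointed after choosing it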
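The effect can be sketched with a small simulation, assuming the true expected utility of every action is 0 and each estimate adds independent Gaussian noise. The number of actions, the noise level and the number of trials are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

n_actions = 10      # hypothetical number of candidate actions
n_trials = 100_000  # number of simulated decisions
noise = 1.0         # standard deviation of the error on each utility estimate

# Every action has the same true expected utility: 0
true_utility = np.zeros(n_actions)

# Our estimates are the true utility plus zero-mean noise ("random around 0")
estimates = true_utility + rng.normal(0.0, noise, size=(n_trials, n_actions))

# In each trial, pick the action with the highest estimate
chosen = estimates.argmax(axis=1)
estimated_when_chosen = estimates[np.arange(n_trials), chosen]
realized = true_utility[chosen]

print(f"average estimated utility of the chosen action: {estimated_when_chosen.mean():.2f}")
print(f"average realized utility of the chosen action:  {realized.mean():.2f}")
```

The chosen action's estimate averages around 1.5 (the expected maximum of 10 standard normal draws), while its realized utility is always 0: selecting on noisy estimates biases the estimate of the winner upward.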

<br>

# Information Value Theory
---

* gathering information has a cost too: the agent should only gather it when the resulting increase in expected value exceeds that cost
* the value of information is the increase in the expected value of our decision obtained by observing the information before deciding
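As an illustration, the sketch below computes the value of **perfect** information (observing the unknown state exactly before deciding), a special case of the general idea: $\text{VPI} = \sum_{e} P(e) \max_a \mathbb{E}[U \mid a, e] - \max_a \mathbb{E}[U \mid a]$. The umbrella problem, its probabilities and its utilities are hypothetical.

```python
from typing import Dict

# Hypothetical problem: should we take an umbrella?
p_state = {"rain": 0.3, "sun": 0.7}  # prior over the unknown state
U_table = {                          # U(action, state), made-up values
    "umbrella":    {"rain": -1.0,  "sun": -1.0},
    "no_umbrella": {"rain": -10.0, "sun": 0.0},
}

def expected_utility_given(action: str, belief: Dict[str, float]) -> float:
    return sum(p * U_table[action][s] for s, p in belief.items())

def best_eu(belief: Dict[str, float]) -> float:
    """max_a EU(a) under a given belief over states."""
    return max(expected_utility_given(a, belief) for a in U_table)

# Decide now, without any information
eu_without_info = best_eu(p_state)

# Observe the state first, then take the best action for that state
eu_with_info = sum(p * best_eu({s: 1.0}) for s, p in p_state.items())
vpi = eu_with_info - eu_without_info

def worth_observing(cost: float) -> bool:
    """Gather the information only if its value exceeds its cost."""
    return vpi > cost

print(f"VPI = {vpi:.2f}")
print(worth_observing(0.5), worth_observing(1.0))
```

Without information the agent always takes the umbrella (expected utility $-1$); knowing the weather lets it leave the umbrella on sunny days, raising the expected utility to $-0.3$, so the forecast is worth $0.7$ and should only be bought if it costs less than that.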