In [1]:
%load_ext autoreload
%autoreload 2
import sys
sys.path.insert(0,'../../modules')

In [2]:
import numpy as np

# Utility from first principles
Consider a set of outcomes. We have a preference over each outcome, along with a probability of that outcome given our actions. We write $A>B$ if we prefer $A$ over $B$ and $A\sim B$ if we are indifferent between them; $A\geq B$ means that either $A>B$ or $A\sim B$. The probabilities are written with $p$ as usual.
### Imposing constraints on preferences
We assume that the following axioms hold: <br>
For any $A$ and $B$, either $A > B$ or $B > A$ or $A \sim B$ (completeness) <br>
If $A\geq B$ and $B\geq C$ then $A\geq C$ (transitivity) <br>
If $A\geq C\geq B$ then there is a probability $p$ such that the lottery giving $A$ with probability $p$ and $B$ with probability $1-p$ is exactly as preferred as getting $C$ for certain. (continuity) <br>
If $A\geq B$ then for any $C$ and any probability $p$, the lottery giving $A$ with probability $p$ and $C$ with probability $1-p$ is at least as preferred as the lottery giving $B$ with probability $p$ and $C$ with probability $1-p$. (independence) <br> <br>
From these axioms it follows that there is a utility function $U$ for which $U(A) > U(B)$ if and only if $A > B$, and $U(A) = U(B)$ if and only if $A \sim B$. <br> <br>
The axioms also imply that the utility of a lottery over outcomes is the expectation of the utility, $\sum_i p_i U(S_i)$, where the $S_i$ are the outcomes and $p_i$ their probabilities.

We can compare different actions by comparing their expected utility over all possible outcomes. For instance, suppose you have two coins in a game. Coin 1 ($C1$) is fair while Coin 2 ($C2$) has only a $25\%$ chance of heads. For the first coin you get $\$10$ if it lands heads and $\$5$ if it lands tails, while for the second coin you get $\$14$ for heads and $\$6$ otherwise. You can choose which coin to flip. Which one should you pick? <br>
If we say utility is equal to dollars, then the expected utility for the first coin is $0.5\times 10 + 0.5\times 5 = 7.5$. For the second coin the expected utility is $0.25 \times 14 + 0.75 \times 6 = 8$. Therefore the second coin is the better choice.
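
The arithmetic above can be checked exactly with a small helper. This is only a sketch: the function name `expected_utility` is a hypothetical helper introduced here for illustration, and utility is assumed to equal dollars, as in the text.

```python
import numpy as np

def expected_utility(probs, utilities):
    """Return sum_i p_i * U(S_i) for one lottery (hypothetical helper)."""
    probs = np.asarray(probs, dtype=float)
    utilities = np.asarray(utilities, dtype=float)
    return float(np.dot(probs, utilities))

# Outcomes are ordered as (heads, tails); utility is taken to be dollars.
coin1_eu = expected_utility([0.5, 0.5], [10, 5])
coin2_eu = expected_utility([0.25, 0.75], [14, 6])
print(coin1_eu, "coin 1 expected utility")
print(coin2_eu, "coin 2 expected utility")
```

Sampling the coins many times should give estimates that fluctuate around these exact values.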

In [20]:
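# Encode heads as 0 and tails as 1, then draw 1000 flips of each coin.
coin1_samples = np.random.choice([0,1],1000,p=[0.5,0.5])
# Average payoff for coin 1: $10 for heads, $5 for tails.
c1_ut = (10*np.sum(coin1_samples==0)+5*np.sum(coin1_samples==1))/1000
print(c1_ut,"coin 1 estimated utility")
coin2_samples = np.random.choice([0,1],1000,p=[0.25,0.75])
# Average payoff for coin 2: $14 for heads, $6 for tails.
c2_ut = (14*np.sum(coin2_samples==0)+6*np.sum(coin2_samples==1))/1000
print(c2_ut,"coin 2 estimated utility")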

7.51 coin 1 estimated utility
7.912 coin 2 estimated utility


In the above case we could use the expected dollars as a utility, but this isn't always the case. There are different approaches to risk. Compare two options with the same expected return: a $50\%$ chance of getting $\$100$ (and nothing otherwise) versus a $100\%$ chance of getting $\$50$. <br>
**Risk Neutral** behaviour is what we used above in the coin case. The expected return is all that matters, so the two options are equally good. <br>
**Risk Seeking** behaviour prefers the gamble over a sure thing with the same expected return. So, the first option is picked. <br>
**Risk Averse** behaviour prefers the second option, the $100\%$ chance of getting $\$50$. <br>
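
One common way to model these attitudes is through the shape of the utility function over money. The sketch below assumes a linear utility for risk neutral, a concave one (square root) for risk averse and a convex one (square) for risk seeking. These particular functions, and the name `preferred_option`, are illustrative assumptions and not part of the text above.

```python
import numpy as np

# The two options from the text: a 50% chance of $100 (else $0) vs. a sure $50.
gamble_probs = np.array([0.5, 0.5])
gamble_payoffs = np.array([100.0, 0.0])
sure_payoff = 50.0

# Assumed utility shapes: linear, concave and convex in dollars.
utilities = {
    "risk neutral": lambda x: x,
    "risk averse": np.sqrt,
    "risk seeking": np.square,
}

def preferred_option(u):
    """Compare the expected utility of the gamble with the sure payoff."""
    eu_gamble = float(np.dot(gamble_probs, u(gamble_payoffs)))
    eu_sure = float(u(sure_payoff))
    if np.isclose(eu_gamble, eu_sure):
        return "indifferent"
    return "gamble" if eu_gamble > eu_sure else "sure thing"

choices = {name: preferred_option(u) for name, u in utilities.items()}
for name, choice in choices.items():
    print(name, "->", choice)
```

Under these assumed utilities, both options have the same expected dollars, so any difference in the choice comes purely from the curvature of the utility function.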
In general we are interested in maximizing the expected utility of an action. Say we have actions $a$, resulting states $s'$ and observations $o$. The expected utility of taking action $a$ after observing $o$ is:
$$EU(a|o)=\sum_{s'}p(s'|a,o)U(s')$$
The best action is the $a$ which maximizes this function.
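
As a minimal sketch of this maximization, the coin game can be written as a table of $p(s'|a,o)$ with one row per action and one column per resulting state, for a single fixed observation $o$. The array layout and the names `transition_probs`, `state_utilities` and `best_action` are assumptions made here for illustration.

```python
import numpy as np

actions = ["flip coin 1", "flip coin 2"]

# Resulting states s' are the four possible payoffs; utility equals dollars.
state_utilities = np.array([10.0, 5.0, 14.0, 6.0])

# transition_probs[a, s'] = p(s' | a, o) for one fixed observation o.
transition_probs = np.array([
    [0.5, 0.5, 0.0, 0.0],    # coin 1: $10 or $5
    [0.0, 0.0, 0.25, 0.75],  # coin 2: $14 or $6
])

# EU(a|o) = sum over s' of p(s'|a,o) * U(s'), computed for every action at once.
expected_utilities = transition_probs @ state_utilities
best_action = actions[int(np.argmax(expected_utilities))]

print(expected_utilities)
print("best action:", best_action)
```

With a different observation $o$ the rows of the table would change, and the same `argmax` would then pick the best action for that observation.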