In [1]:
%load_ext autoreload
%autoreload 2
import sys
sys.path.insert(0,'../../modules')

In [2]:
import numpy as np
import factors
from factors_inference import sum_product_variable_elimination

# Decision Networks
We can implement the idea of utility using a directed PGM. The decision nodes are treated the same as any other variable, but we set the probability to 1 for whatever decision we are making. We then run an inference algorithm to get the distribution over the outcomes the utility depends on, and weight the utility function by that distribution to get an expected utility. The example below asks whether you should try to catch the bus, given that it might rain and you could be late for work.
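
In symbols, one way to write what is being computed (the notation is an assumption introduced here, not taken from the notebook: $d$ is the decision, $e$ the evidence, and $x$ an assignment to the variables the utility $U$ depends on):
$$EU(d \mid e) = \sum_{x} P(x \mid d, e)\, U(x), \qquad d^{*} = \arg\max_{d} EU(d \mid e)$$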

In [3]:
rain_factor = factors.Factor(["it will rain"],[2])
bus_factor = factors.Factor(["bus is early"],[2])
decision = factors.Factor(["try to catch the bus"],[2])
catch_bus = factors.Factor(["caught bus","bus is early","try to catch the bus"],[2,2,2])
arrive_dry = factors.Factor(["dry","it will rain","caught bus"],[2,2,2])
arrive_late = factors.Factor(["late","caught bus","try to catch the bus"],[2,2,2])
rain_factor.set_all([0.7,0.3]) # 30% chance of rain
bus_factor.set_all([0.85,0.15]) # 15% chance bus is early
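catch_bus.set_all([1.0,0.1,1.0,0.6,0.0,0.9,0.0,0.4]) # if we try: 90% chance of catching the bus when it is not early, 40% when it is early; never caught if we don't try
arrive_dry.set_all([0,0,0.95,0,1,1,0.05,1]) # always dry without rain or on the bus; 5% chance of staying dry in the rain without the bus
f = -1 # placeholder for impossible combinations (caught the bus without trying). Any non-NaN value works, since these entries always get zero probability.
arrive_late.set_all([0.95,0.1,f,0.95,0.05,0.9,f,0.05]) # 5% late when not trying or when the bus is caught; 90% late after trying and missing it
utility_factor = factors.Factor(["dry","late"],[2,2])
utility_factor.set_all([-2,-7,0,-5]) # wet: -2, late: -5, wet and late: -7, dry and on time: 0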
utility_relevant_nodes = [arrive_dry,arrive_late]
non_utility_relevent_factors = [rain_factor,bus_factor,catch_bus]

In [4]:
def run_decision_network(decision_node,utility_nodes,utility_factor,other_nodes,evidence_vars=None,evidence_vals=None):
    # Use None defaults to avoid sharing mutable default lists between calls.
    evidence_vars = [] if evidence_vars is None else evidence_vars
    evidence_vals = [] if evidence_vals is None else evidence_vals
    max_utility = -np.inf
    best_decision = -1
    print("decision",decision_node.names)
    print("evidence",dict(zip(evidence_vars,evidence_vals)))
    for option in decision_node.indexes:
        # Clamp the decision: probability 1 for this option, 0 for all others.
        decision_node.array = np.zeros(decision_node.array.shape)
        decision_node.set(option,1)
        # Sum out the decision and every non-utility variable.
        var_names_to_marginalize = [node.names[0] for node in [decision_node]+other_nodes]
        all_factors = [decision_node]+utility_nodes+other_nodes
        probs = sum_product_variable_elimination(all_factors,evidence_vars,evidence_vals,var_names_to_marginalize)
        # Expected utility: probability of each outcome times its utility, summed.
        total_utility = np.sum(factors.product(probs,utility_factor).array)
        print("choice",option,"UTILITY:",total_utility.round(5))
        if total_utility>max_utility:
            max_utility = total_utility
            best_decision = option
    return best_decision,max_utility

The algorithm is simple: <br> 
1. For each decision option, clamp the decision and infer the joint distribution over the variables the utility depends on (given any evidence).
2. Multiply the utility of each possible outcome by its probability, and sum the results to get the expected utility.
3. Make the decision with the maximum expected utility.
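
The three steps can also be sketched without the custom `factors` module, using plain NumPy. This is only an illustrative sketch: the helper names (`one_hot`, `expected_utility`, `best_decision`) are hypothetical, and it assumes the flat lists passed to `set_all` are laid out in row-major order with the last variable changing fastest.

```python
import numpy as np

# Probability tables copied from the example (index 0 = false, 1 = true).
# Assumption: flat lists are row-major, last listed variable changes fastest.
p_rain = np.array([0.7, 0.3])
p_early = np.array([0.85, 0.15])
# Indexed [caught, early, try]
p_caught = np.array([1.0, 0.1, 1.0, 0.6, 0.0, 0.9, 0.0, 0.4]).reshape(2, 2, 2)
# Indexed [dry, rain, caught]
p_dry = np.array([0, 0, 0.95, 0, 1, 1, 0.05, 1]).reshape(2, 2, 2)
# Indexed [late, caught, try]; impossible entries (caught without trying)
# are set to 0 here, they always meet zero probability anyway.
p_late = np.array([0.95, 0.1, 0.0, 0.95, 0.05, 0.9, 0.0, 0.05]).reshape(2, 2, 2)
# Indexed [dry, late]
utility = np.array([-2.0, -7.0, 0.0, -5.0]).reshape(2, 2)


def one_hot(index, size=2):
    """Distribution that puts probability 1 on a single value."""
    dist = np.zeros(size)
    dist[index] = 1.0
    return dist


def expected_utility(try_bus, rain=None):
    """Expected utility of one decision, optionally with rain observed."""
    p_try = one_hot(try_bus)
    # Rain has no parents, so observing it just replaces its prior.
    rain_dist = p_rain if rain is None else one_hot(rain)
    # Step 1: joint over (dry, late), summing out rain, early, try, caught.
    p_dry_late = np.einsum("r,e,t,cet,drc,lct->dl",
                           rain_dist, p_early, p_try, p_caught, p_dry, p_late)
    # Step 2: weight each outcome's utility by its probability and sum.
    return float(np.sum(p_dry_late * utility))


def best_decision(rain=None):
    """Step 3: pick the option with the maximum expected utility."""
    utilities = [expected_utility(option, rain) for option in (0, 1)]
    return int(np.argmax(utilities)), max(utilities)


for rain in (0, 1, None):
    print("rain =", rain, "->", best_decision(rain))
```

Under that ordering assumption, the sketch gives the same utilities as the notebook cells below, which is a useful check that the tables are being read the way the `factors` module reads them.
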

**Seeing it will not rain:** (definitely shouldn't try to catch the bus)

In [5]:
best_decision,max_utility = run_decision_network(decision,
                             utility_relevant_nodes,
                             utility_factor,
                             non_utility_relevent_factors,
                             evidence_vars=["it will rain"],
                             evidence_vals=[0])
print("best decision",best_decision)

decision ['try to catch the bus']
evidence {'it will rain': 0}
choice [0] UTILITY: -0.25
choice [1] UTILITY: -0.99375
best decision [0]


**Seeing it will rain:** (almost certainly better to try to catch the bus and avoid getting wet)

In [8]:
best_decision,max_utility = run_decision_network(decision,
                             utility_relevant_nodes,
                             utility_factor,
                             non_utility_relevent_factors,
                             evidence_vars=["it will rain"],
                             evidence_vals=[1])
print("best decision",best_decision)

decision ['try to catch the bus']
evidence {'it will rain': 1}
choice [0] UTILITY: -2.15
choice [1] UTILITY: -1.32625
best decision [1]


**Not knowing the weather:**

In [9]:
best_decision,max_utility = run_decision_network(decision,
                             utility_relevant_nodes,
                             utility_factor,
                             non_utility_relevent_factors)
print("best decision",best_decision)

decision ['try to catch the bus']
evidence {}
choice [0] UTILITY: -0.82
choice [1] UTILITY: -1.0935
best decision [0]


Decision diagrams are often drawn with the random variables as circles, the decisions as squares, and the utility as its own node, a diamond.

# Value of information
A very important question is whether it is worth obtaining a piece of information before making a decision, for example checking the weather report in the example above. What is the value of that information? To answer this we assume a rational actor who makes the best decision given the information available to them. In the example above, when I know nothing about the weather my best decision is not to try to catch the bus, with a maximum expected utility of $-0.82$. If I know it will not rain, my maximum expected utility is $-0.25$, and if I know it will rain, it is $-1.32625$. The probability of each observation also matters: to get the expected utility with the information, we multiply the maximum expected utility for each observation by the probability of that observation and sum the results. So in the above case, when the chance of rain is $30\%$: <br>
$$(-1.32625) \times 0.3 + (-0.25) \times 0.7 = -0.572875$$
So the expected utility when I can observe the weather before deciding is $-0.572875$. <br>
The value of information is simply the difference between this value and the maximum expected utility knowing nothing: $-0.572875-(-0.82)=0.247125$ <br>
So, with the information, I can expect my final utility to go up by $0.247125$.
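
The same arithmetic can be written as a few lines of Python. This is a minimal sketch using the maximum expected utilities printed by the cells above; the variable names are illustrative and not part of the notebook's modules.

```python
# Maximum expected utilities read off the runs above.
meu_no_info = -0.82                  # best choice without checking the weather
meu_given = {0: -0.25, 1: -1.32625}  # best choice when rain = 0 / rain = 1 is observed
p_observation = {0: 0.7, 1: 0.3}     # probability of each observation

# Expected utility if the weather is observed before deciding.
meu_with_info = sum(p_observation[r] * meu_given[r] for r in p_observation)

# Value of information: the gain over deciding with no information.
value_of_information = meu_with_info - meu_no_info

print("expected utility with information:", round(meu_with_info, 6))
print("value of information:", round(value_of_information, 6))
```

For a rational actor this value is never negative: observing the weather can only leave the best decision unchanged or improve it.
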