## RLDMUU 2025
#### Decision making 
jakub.tluczek@unine.ch

In [1]:
import numpy as np

In today's lab we consider the meteorologist problem. Assume there are $n$ weather stations, each of which forecasts the probability of rain. Initially we trust every station equally: the prior probability that a given station is the correct one is $\frac{1}{n}$, and at all times the beliefs over the stations sum to 1. The problem is repeated: each day we decide whether or not to take a coat. We are happy if we leave the coat at home and it doesn't rain, since we don't have to carry it around. If we take the coat, we are indifferent to whether it rains, because we won't get wet either way. However, we really dislike leaving the coat at home when it rains, since we risk getting sick. Our utility table can be summarized as follows:

| U | no rain | rain |
| - | --------| ---- |
| no coat | 1 | -10 |
| coat | 0 | 0 |
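
To get a feel for what this table implies, here is a minimal sketch that assumes a single, known rain probability (the lab itself works with several stations and beliefs over them). The helper `action_utilities` and the probabilities 0.05 and 0.2 are made up for illustration. Going without a coat has expected utility $1 \cdot (1-p) - 10p = 1 - 11p$, so under this table it only beats taking the coat when $p < \frac{1}{11}$.

```python
import numpy as np

# Utility table from above: rows = actions (no coat, coat), columns = outcomes (no rain, rain)
U = np.array([[1.0, -10.0],
              [0.0, 0.0]])

def action_utilities(p_rain):
    """Expected utility of each action for a given rain probability (illustrative helper)."""
    outcome_probs = np.array([1.0 - p_rain, p_rain])
    return U @ outcome_probs

# Rain probability below which leaving the coat at home has positive expected utility
threshold = 1.0 / 11.0

for p_rain in (0.05, 0.2):  # hypothetical rain probabilities, one on each side of the threshold
    eu = action_utilities(p_rain)
    choice = "no coat" if eu[0] > eu[1] else "coat"
    print(f"p_rain={p_rain}: utilities={eu}, choice={choice}")
```

Because getting wet is penalized ten times more heavily than carrying the coat is rewarded, even a modest chance of rain is enough to make the coat the better choice.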

Your task is to update the belief about the weather stations after observing each outcome (rain or no rain) and to pick the best corresponding action, both by calculating the expected utility over all stations and by using the MAP (maximum a posteriori) approach. First, implement the function `marginal_prediction`, which computes the marginal forecast for a given outcome.

In [121]:
def marginal_prediction(belief, forecast, outcome):
    # TODO: return the marginal prediction for a given outcome (rains or not in our case) for all the stations
    return belief @ forecast[:,outcome]
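
As a quick sanity check, the marginal prediction can be evaluated by hand for the first round of the simulation further down, where the three stations forecast rain with probability 0.1, 0.4 and 0.7 and the belief is still uniform. This is only a sketch: the variable names below are illustrative and the dot product is written inline rather than through `marginal_prediction`.

```python
import numpy as np

# Uniform prior over three stations
belief = np.ones(3) / 3

# Forecast table: one row per station, columns = (no rain, rain)
P = np.array([[0.9, 0.1],
              [0.6, 0.4],
              [0.3, 0.7]])

# Marginal probability of each outcome: belief-weighted average of the station forecasts
p_no_rain = belief @ P[:, 0]
p_rain = belief @ P[:, 1]

print(f"P(no rain) ~ {p_no_rain:.3f}, P(rain) ~ {p_rain:.3f}")
```

The two marginals sum to 1 because every row of the forecast table does.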

Then, calculate expected utilities, either using the forecasts (and associated priors!) from all stations, or just picking the one you trust the most with MAP.

In [122]:
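def expected_utility(belief, forecast, action, utility):
    # TODO: calculate the expected utility of an action, given the prior beliefs, forecast and utility matrix
    # Marginal probability of every outcome, weighting each station by its belief
    outcome_probs = np.array([marginal_prediction(belief, forecast, y) for y in range(forecast.shape[1])])
    # Sum over outcomes of P(outcome) * U(action, outcome)
    return float(outcome_probs @ np.asarray(utility)[action])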

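def expected_MAP_utility(belief, forecast, action, utility):
    # TODO: Calculate the expected utility, using only the data from the station you trust the most
    # belief is a NumPy array, so use np.argmax (arrays have no .index method)
    station = np.argmax(belief)
    forecast_probs = np.asarray(forecast)[station]
    # Sum over outcomes of P_station(outcome) * U(action, outcome)
    return float(forecast_probs @ np.asarray(utility)[action])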
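
The two quantities can also be written out explicitly. The notation here is introduced only for illustration and does not come from the lab: $b_i$ is the current belief in station $i$, $P_i(y)$ is that station's forecast probability of outcome $y$, and $U(a, y)$ is the utility of action $a$ when outcome $y$ occurs. Using all stations, the expected utility of an action averages over the marginal prediction:

$$\mathbb{E}[U \mid a] = \sum_{y} \Big( \sum_{i=1}^{n} b_i \, P_i(y) \Big) \, U(a, y)$$

With MAP, the inner sum is replaced by the forecast of the single most trusted station:

$$\mathbb{E}_{\text{MAP}}[U \mid a] = \sum_{y} P_{i^*}(y) \, U(a, y), \qquad i^* = \arg\max_i b_i$$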


Now it's time to pick the action that maximizes utility:

In [113]:
def best_action(belief, forecast, utility, use_map=False):
    # TODO: pick the action that gives you maximal expected utility. MAP flag indicates whether you use this method or not
    if use_map:
        # MAP: trust only the station with the highest belief
        outcome_probs = forecast[np.argmax(belief), :]
    else:
        # Full Bayesian: weight every station's forecast by its belief
        outcome_probs = belief @ forecast
    # Expected utility of each action (one entry per row of the utility matrix)
    expected_util_action = utility @ outcome_probs.T
    return np.argmax(expected_util_action)
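
The two decision rules do not always agree. The sketch below uses a made-up belief and forecast table (neither comes from the lab data) in which the most trusted station is optimistic while the others are not. It recomputes the expected utilities inline instead of calling `best_action`, so that it runs on its own.

```python
import numpy as np

# Utility matrix: rows = actions (0: no coat, 1: coat), columns = outcomes (no rain, rain)
U = np.array([[1.0, -10.0],
              [0.0, 0.0]])

# Hypothetical belief and forecasts, chosen so that the two rules disagree
belief = np.array([0.5, 0.3, 0.2])
P = np.array([[0.95, 0.05],
              [0.60, 0.40],
              [0.30, 0.70]])

# Expected utility of each action using every station, weighted by belief
eu_all = U @ (belief @ P)
# Expected utility of each action using only the most trusted station
eu_map = U @ P[np.argmax(belief)]

action_all = int(np.argmax(eu_all))
action_map = int(np.argmax(eu_map))
print(f"All stations: utilities={eu_all}, action={action_all}")
print(f"MAP station:  utilities={eu_map}, action={action_map}")
```

Here the marginal rain probability is about 0.285, well above $\frac{1}{11}$, so the full expected-utility rule takes the coat. The MAP rule sees only the 0.05 forecast of the most trusted station and leaves the coat at home.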

Finally, update the belief given the forecasts and actual outcome.

In [119]:
def update_belief(belief, forecast, outcome):
    # TODO: update the belief about whether the station is the one to be trusted or not. Return the belief
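    # Likelihood of the observed outcome under each station's forecast
    probabilities = forecast[:,outcome]
    # Bayes' rule: posterior = likelihood * prior / marginal probability of the outcome
    new_belief = probabilities*belief/(probabilities@belief)
    return new_belief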
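
The update is Bayes' rule applied to the stations: the new belief in a station is proportional to its old belief times the probability it assigned to the outcome that actually occurred. As a small worked check, the sketch below redoes the first round of the simulation by hand (uniform prior, rain forecasts of 0.1, 0.4 and 0.7, and no rain observed). Variable names are illustrative.

```python
import numpy as np

# Uniform prior over three stations
belief = np.ones(3) / 3

# Forecast table: one row per station, columns = (no rain, rain)
P = np.array([[0.9, 0.1],
              [0.6, 0.4],
              [0.3, 0.7]])

outcome = 0  # it did not rain

# Probability each station assigned to the observed outcome
likelihood = P[:, outcome]
# Bayes' rule, normalized by the marginal probability of the outcome
posterior = likelihood * belief / (likelihood @ belief)

print(posterior)
```

The result is close to `[0.5, 0.333, 0.167]`, which agrees with the first "New beliefs" line printed by the simulation: the station that gave the highest probability to "no rain" gains the most trust.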

In [120]:
from enum import Enum

class Outcome(Enum):
    NO_RAIN = 0
    RAIN = 1

T = 4 # time horizon
N = 3 # number of stations

# forecast table with rain probabilities (one column per time step)
# each row represents a station
forecasts = np.matrix('0.1 0.1 0.3 0.4; 0.4 0.1 0.6 0.7; 0.7 0.8 0.9 0.99')

n_outcomes = 2
# probability table placeholder for forecasts
P = np.zeros([N, n_outcomes])
# initial belief of 1/n for each station
belief = np.ones(N) / N
# actual events - whether it rained or not
rain = [Outcome.NO_RAIN.value, Outcome.NO_RAIN.value, Outcome.RAIN.value, Outcome.NO_RAIN.value]

for t in range(T):
    for model in range(N):
        # Filling up the probability table
        P[model,1] = forecasts[model,t] # the table predictions give rain probabilities
        P[model,0] = 1.0 - forecasts[model,t] # so no-rain probability is 1 - that
    probability_of_rain = marginal_prediction(belief, P, Outcome.RAIN.value)
    # declaring our utility matrix
    U  = np.matrix('1 -10; 0 0')
    # picking best actions
    action = best_action(belief, P, U)
    MAP_action = best_action(belief, P, U, use_map=True)
    print(f"RESULTS ROUND {t+1}")
    print(f"Best action: {action}\t MAP best action {MAP_action}")
    # updating beliefs
    belief = update_belief(belief, P, rain[t])
    print("New beliefs:")
    print(belief)


RESULTS ROUND 1
Best action: 1	 MAP best action 1
New beliefs:
[0.5        0.33333333 0.16666667]
RESULTS ROUND 2
Best action: 1	 MAP best action 1
New beliefs:
[0.57446809 0.38297872 0.04255319]
RESULTS ROUND 3
Best action: 1	 MAP best action 1
New beliefs:
[0.39130435 0.52173913 0.08695652]
RESULTS ROUND 4
Best action: 1	 MAP best action 1
New beliefs:
[0.59866962 0.39911308 0.00221729]
