# DX 704 Week 6 Project

This project will develop a treatment plan for a fictious illness "Twizzleflu".
Twizzleflu is a mild illness caused by a virus.
The main symptoms are a mild fever, fidgeting, and kicking the blankets off the bed or couch.
Mild dehydration has also been reported in more severe cases.
These symptoms typically last 1-2 weeks without treatment.
Word on the internet says that Twizzleflu can be cured faster by drinking copious orange juice, but this has not been supported by evidence so far.
You will be provided with a theoretical model of Twizzleflu modeled as a Markov decision process.
Based on the model, you will compute optimal treatment plans to optimize different criteria, and compare patient discomfort with the different plans.

The full project description, a template notebook, and raw data are available on GitHub: [Project 6 Materials](https://github.com/bu-cds-dx704/dx704-project-06).

We will model Twizzleflu as a Markov decision process.
The model transition probabilities are provided in the file "twizzleflu-transitions.tsv" and the expected rewards are in "twizzleflu-rewards.tsv".
The goal for Twizzleflu is to minimize the expected discomfort of the patient which is expressed as negative rewards in the file.

## Example Code

You may find it helpful to refer to these GitHub repositories of Jupyter notebooks for example code.

* https://github.com/bu-cds-omds/dx601-examples
* https://github.com/bu-cds-omds/dx602-examples
* https://github.com/bu-cds-omds/dx603-examples
* https://github.com/bu-cds-omds/dx704-examples

Any calculations demonstrated in code examples or videos may be found in these notebooks, and you are allowed to copy this example code in your homework answers.

## Part 1: Evaluate a Do Nothing Plan

One of the treatment actions is to do nothing.
Calculate the expected discomfort (not rewards) of a policy that always does nothing.

Hint: for this value calculation and later ones, use value iteration.
The analytical solution has difficulties in practice when there is no discount factor.

In [1]:
# YOUR CHANGES HERE

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Import twizzleflu-rewards.tsv and twizzleflu-transitions.tsv
rewards = pd.read_csv('twizzleflu-rewards.tsv', sep='\t')
transitions = pd.read_csv('twizzleflu-transitions.tsv', sep='\t')

In [2]:
rewards.head()

Unnamed: 0,action,state,reward
0,do-nothing,exposed-1,0.0
1,do-nothing,exposed-2,0.0
2,do-nothing,exposed-3,0.0
3,do-nothing,symptoms-1,-0.5
4,do-nothing,symptoms-2,-1.0


In [3]:
transitions.head()

Unnamed: 0,action,state,next_state,probability
0,do-nothing,exposed-1,exposed-2,0.8
1,do-nothing,exposed-1,recovered,0.2
2,do-nothing,exposed-2,exposed-3,0.8
3,do-nothing,exposed-2,recovered,0.2
4,do-nothing,exposed-3,symptoms-1,0.8


In [4]:
# one step state-action values from a value estimate.
# will use this a lot!

def compute_qT_once(R, P, gamma, v):
    # Rewards + gamma x expected future rewards for each action
        # R means immediate rewards
        # gamma means discount factor
        # P means transition probabilities
        # v means value function
        # R has shape (nA, nS)
        # P has shape (nA, nS, nS)
        # v has shape (nS,)
        # returns Q^T with shape (nA, nS)
    return R + gamma * P @ v


def iterate_values_once(R, P, gamma, v):
    # one step of value iteration
    # returns v_{i+1} given v_i
    # this represents the maximum reward over all actions from computing Q^T
    return np.max(compute_qT_once(R, P, gamma, v), axis=0)

def value_iteration(R, P, gamma, max_iterations=100, tolerance=0.001):
    # initial approximation v_0
        # Vector of all zeros
    v_old = np.zeros(R.shape[-1])

    # Iterate up to max_iterations (default 100)
    for i in range(max_iterations):
        # compute v_{i+1}
            # one step of value iteration
        v_new = iterate_values_once(R, P, gamma, v_old)

        # check if values did not change much (less than tolerance, default 0.001)
        if np.max(np.abs(v_new - v_old)) < tolerance:
            # If there was no significant change, return v_{i+1}
                # Only return if converged (meaning that values did not change much)
            return v_new
        # If there was significant change, continue to next iteration
            # Reassign v_{i+1} to v_i
        v_old = v_new

    # return v_{max_iterations} if not converged
    return v_old

In [22]:
# one step state-action values from a value estimate.
# will use this a lot!

def compute_qT_once_discomfort(R, P, gamma, v):
    # (C | -R) + gamma x expected future rewards for each action
        # C | -R means immediate discomfort
        # gamma means discount factor
        # P means transition probabilities
        # v means value function
        # R has shape (nA, nS)
        # P has shape (nA, nS, nS)
        # v has shape (nS,)
        # returns Q^T with shape (nA, nS)
    return -R + gamma * P @ v

def iterate_values_once_discomfort(R, P, gamma, v):
    # one step of value iteration
    # returns v_{i+1} given v_i
    # this represents the maximum reward over all actions from computing Q^T
    return np.min(compute_qT_once_discomfort(R, P, gamma, v), axis=0)

def value_iteration_discomfort(R, P, gamma, max_iterations=100, tolerance=0.001):
    # initial approximation v_0
        # Vector of all zeros
    v_old = np.zeros(R.shape[-1], dtype=float)

    # Iterate up to max_iterations (default 100)
    for i in range(max_iterations):
        # compute v_{i+1}
            # one step of value iteration
        v_new = iterate_values_once_discomfort(R, P, gamma, v_old)
        # check if values did not change much (less than tolerance, default 0.001)
        if np.max(np.abs(v_new - v_old)) < tolerance:
            # If there was no significant change, return v_{i+1}
                # Only return if converged (meaning that values did not change much)
            return v_new
        # If there was significant change, continue to next iteration
            # Reassign v_{i+1} to v_i
        v_old = v_new

    # return v_{max_iterations} if not converged
    return v_old

In [32]:
a_dn = "do-nothing"

# state index
states = sorted(set(rewards["state"]) | set(transitions["state"]) | set(transitions["next_state"]))
S = {s: i for i, s in enumerate(states)}
ns = len(states)
gamma = 1.0

R = np.zeros((1, ns), dtype=float)
P = np.zeros((1, ns, ns), dtype=float)

In [33]:
R

array([[0., 0., 0., 0., 0., 0., 0.]])

In [34]:
P

array([[[0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.]]])

In [35]:
rewards

Unnamed: 0,action,state,reward
0,do-nothing,exposed-1,0.0
1,do-nothing,exposed-2,0.0
2,do-nothing,exposed-3,0.0
3,do-nothing,symptoms-1,-0.5
4,do-nothing,symptoms-2,-1.0
5,do-nothing,symptoms-3,-0.5
6,do-nothing,recovered,0.0
7,drink-oj,exposed-1,0.0
8,drink-oj,exposed-2,0.0
9,drink-oj,exposed-3,0.0


In [36]:
# Fill R for do-nothing (one reward per state; average duplicates if any)
r_dn = (
    rewards[rewards["action"] == a_dn]
      .groupby("state", as_index=True)["reward"]
      .mean()
)
r_dn

state
exposed-1     0.0
exposed-2     0.0
exposed-3     0.0
recovered     0.0
symptoms-1   -0.5
symptoms-2   -1.0
symptoms-3   -0.5
Name: reward, dtype: float64

In [37]:
# Fill R
for s_name, r in r_dn.items():
    if s_name in S:
        R[0, S[s_name]] = float(r)
R

array([[ 0. ,  0. ,  0. ,  0. , -0.5, -1. , -0.5]])

In [38]:
# Fill P for do-nothing
    # Select only do-nothing transitions
subP = transitions[transitions["action"] == a_dn]
subP

Unnamed: 0,action,state,next_state,probability
0,do-nothing,exposed-1,exposed-2,0.8
1,do-nothing,exposed-1,recovered,0.2
2,do-nothing,exposed-2,exposed-3,0.8
3,do-nothing,exposed-2,recovered,0.2
4,do-nothing,exposed-3,symptoms-1,0.8
5,do-nothing,exposed-3,recovered,0.2
6,do-nothing,symptoms-1,symptoms-1,0.7
7,do-nothing,symptoms-1,symptoms-2,0.3
8,do-nothing,symptoms-2,symptoms-2,0.7
9,do-nothing,symptoms-2,symptoms-3,0.3


In [39]:
# Sum probabilities for duplicate pairs
subP_agg = (
    subP
      .groupby(["state", "next_state"], as_index=False)["probability"]
      .sum()
)
subP_agg

Unnamed: 0,state,next_state,probability
0,exposed-1,exposed-2,0.8
1,exposed-1,recovered,0.2
2,exposed-2,exposed-3,0.8
3,exposed-2,recovered,0.2
4,exposed-3,recovered,0.2
5,exposed-3,symptoms-1,0.8
6,recovered,recovered,1.0
7,symptoms-1,symptoms-1,0.7
8,symptoms-1,symptoms-2,0.3
9,symptoms-2,symptoms-2,0.7


In [40]:
# For each state, normalize outgoing probabilities
    # If none, default to self-loop
for s_name in states:
    # Get state index
    s_idx = S[s_name]
    # Get outgoing transition rows
    rows = subP_agg[subP_agg["state"] == s_name]

    # Normalize probabilities
    if len(rows) == 0:
        # No outgoing transitions specified: stay in place
        P[0, s_idx, s_idx] = 1.0
        continue

    # Normalize probabilities by dividing by total
    total = rows["probability"].sum()
    if total <= 0:
        # Degenerate row: also make it a self-loop
        P[0, s_idx, s_idx] = 1.0
        continue

    # Fill in normalized probabilities
    for _, row in rows.iterrows():
        sp_name = row["next_state"]
        if sp_name in S:
            P[0, s_idx, S[sp_name]] = float(row["probability"]) / float(total)

In [41]:
P

array([[[0. , 0.8, 0. , 0.2, 0. , 0. , 0. ],
        [0. , 0. , 0.8, 0.2, 0. , 0. , 0. ],
        [0. , 0. , 0. , 0.2, 0.8, 0. , 0. ],
        [0. , 0. , 0. , 1. , 0. , 0. , 0. ],
        [0. , 0. , 0. , 0. , 0.7, 0.3, 0. ],
        [0. , 0. , 0. , 0. , 0. , 0.7, 0.3],
        [0. , 0. , 0. , 0.3, 0. , 0. , 0.7]]])

In [43]:
v_optimal_discomfort = value_iteration_discomfort(R, P, gamma)
print("Optimal value function (expected discomfort) for gamma=1.0:")
print(v_optimal_discomfort)
print()

# Pack as dictionary mapping state to value
state_values_discomfort = {states[i]: v for i, v in enumerate(v_optimal_discomfort)}
print("State values (expected discomfort) for gamma=1.0:")
print(state_values_discomfort)

# Convert directly to a dataframe
do_nothing_discomfort = pd.DataFrame(list(state_values_discomfort.items()), columns=['state', 'expected_discomfort'])
do_nothing_discomfort

Optimal value function (expected discomfort) for gamma=1.0:
[3.41097781 4.26449121 5.33132703 0.         6.66481885 4.99977911
 1.66665378]

State values (expected discomfort) for gamma=1.0:
{'exposed-1': np.float64(3.410977809744379), 'exposed-2': np.float64(4.264491212473884), 'exposed-3': np.float64(5.331327031522923), 'recovered': np.float64(0.0), 'symptoms-1': np.float64(6.664818853984066), 'symptoms-2': np.float64(4.99977911446515), 'symptoms-3': np.float64(1.6666537816771334)}


Unnamed: 0,state,expected_discomfort
0,exposed-1,3.410978
1,exposed-2,4.264491
2,exposed-3,5.331327
3,recovered,0.0
4,symptoms-1,6.664819
5,symptoms-2,4.999779
6,symptoms-3,1.666654


Save the expected discomfort by state to a file "do-nothing-discomfort.tsv" with columns state and expected_discomfort.

In [44]:
# YOUR CHANGES HERE

do_nothing_discomfort.to_csv('do-nothing-discomfort.tsv', sep='\t', index=False)

Submit "do-nothing-discomfort.tsv" in Gradescope.

## Part 2: Compute an Optimal Treatment Plan

Compute an optimal treatment plan for Twizzleflu.
It should minimize the expected discomfort (maximize the rewards).

In [73]:
# one step state-action values from a value estimate.
# will use this a lot!

def compute_qT_once(R, P, gamma, v):
    return R + gamma * P @ v

def iterate_values_once(R, P, gamma, v):
    return np.max(compute_qT_once(R, P, gamma, v), axis=0)

def value_iteration(R, P, gamma, max_iterations=100, tolerance=0.001):
    # initial approximation v_0 for each action and reward
    v_old = np.zeros(R.shape[-1], dtype=float)

    # Iterate for each action
    for a in range(R.shape[0]):
        # Iterate for each iteration
        for i in range(max_iterations):
            # compute v_{i+1} for each action
            v_new = iterate_values_once(R[a:a+1, :], P[a:a+1, :, :], gamma, v_old)
            # check if values did not change much (less than tolerance, default 0.001)
            if np.max(np.abs(v_new - v_old)) < tolerance:
                # If there was no significant change, return v_{i+1}
                return v_new
            # If there was significant change, continue to next iteration
            # Reassign v_{i+1} to v_i
            v_old = v_new

    # return v_{max_iterations}
    return v_old

In [45]:
S, ns, gamma

({'exposed-1': 0,
  'exposed-2': 1,
  'exposed-3': 2,
  'recovered': 3,
  'symptoms-1': 4,
  'symptoms-2': 5,
  'symptoms-3': 6},
 7,
 1.0)

In [46]:
R2 = np.zeros((3, ns), dtype=float)
P2 = np.zeros((3, ns, ns), dtype=float)

In [47]:
R2

array([[0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0.]])

In [48]:
P2

array([[[0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.]]])

Save the optimal actions for each state to a file "minimum-discomfort-actions.tsv" with columns state and action.

In [49]:
r_all = (
    rewards
      .groupby(["action", "state"], as_index=False)["reward"]
      .mean()
)
r_all

Unnamed: 0,action,state,reward
0,do-nothing,exposed-1,0.0
1,do-nothing,exposed-2,0.0
2,do-nothing,exposed-3,0.0
3,do-nothing,recovered,0.0
4,do-nothing,symptoms-1,-0.5
5,do-nothing,symptoms-2,-1.0
6,do-nothing,symptoms-3,-0.5
7,drink-oj,exposed-1,0.0
8,drink-oj,exposed-2,0.0
9,drink-oj,exposed-3,0.0


In [51]:
# Fill R2 for all actions (one reward per state; average duplicates if any)
for _, row in r_all.iterrows():
    a_name = row["action"]
    s_name = row["state"]
    r = row["reward"]

    if a_name == "do-nothing":
        a_idx = 0
    elif a_name == "drink-oj":
        a_idx = 1
    elif a_name == "sleep-8":
        a_idx = 2
    else:
        continue

    if s_name in S:
        R2[a_idx, S[s_name]] = float(r)
R2

array([[ 0.   ,  0.   ,  0.   ,  0.   , -0.5  , -1.   , -0.5  ],
       [ 0.   ,  0.   ,  0.   ,  0.   , -0.375, -0.75 , -0.375],
       [ 0.   ,  0.   ,  0.   ,  0.   , -1.   , -2.   , -1.   ]])

In [57]:
# Fill P2 for all actions
    # For each action, select only relevant transitions
subP2 = transitions
subP2

Unnamed: 0,action,state,next_state,probability
0,do-nothing,exposed-1,exposed-2,0.8
1,do-nothing,exposed-1,recovered,0.2
2,do-nothing,exposed-2,exposed-3,0.8
3,do-nothing,exposed-2,recovered,0.2
4,do-nothing,exposed-3,symptoms-1,0.8
5,do-nothing,exposed-3,recovered,0.2
6,do-nothing,symptoms-1,symptoms-1,0.7
7,do-nothing,symptoms-1,symptoms-2,0.3
8,do-nothing,symptoms-2,symptoms-2,0.7
9,do-nothing,symptoms-2,symptoms-3,0.3


In [58]:
# Sum probabilities for duplicate pairs
subP2_agg = (
    subP2
      .groupby(["action", "state", "next_state"], as_index=False)["probability"]
      .sum()
)
subP2_agg

Unnamed: 0,action,state,next_state,probability
0,do-nothing,exposed-1,exposed-2,0.8
1,do-nothing,exposed-1,recovered,0.2
2,do-nothing,exposed-2,exposed-3,0.8
3,do-nothing,exposed-2,recovered,0.2
4,do-nothing,exposed-3,recovered,0.2
5,do-nothing,exposed-3,symptoms-1,0.8
6,do-nothing,recovered,recovered,1.0
7,do-nothing,symptoms-1,symptoms-1,0.7
8,do-nothing,symptoms-1,symptoms-2,0.3
9,do-nothing,symptoms-2,symptoms-2,0.7


In [59]:
# For each state, normalize outgoing probabilities
    # If none, default to self-loop
for a_name in ["do-nothing", "drink-oj", "sleep-8"]:
    if a_name == "do-nothing":
        a_idx = 0
    elif a_name == "drink-oj":
        a_idx = 1
    elif a_name == "sleep-8":
        a_idx = 2
    else:
        continue

    for s_name in states:
        # Get state index
        s_idx = S[s_name]
        # Get outgoing transition rows
        rows = subP2_agg[(subP2_agg["action"] == a_name) & (subP2_agg["state"] == s_name)]

        # Normalize probabilities
        if len(rows) == 0:
            # No outgoing transitions specified: stay in place
            P2[a_idx, s_idx, s_idx] = 1.0
            continue

        # Normalize probabilities by dividing by total
        total = rows["probability"].sum()
        if total <= 0:
            # Degenerate row: also make it a self-loop
            P2[a_idx, s_idx, s_idx] = 1.0
            continue

        # Fill in normalized probabilities
        for _, row in rows.iterrows():
            sp_name = row["next_state"]
            if sp_name in S:
                P2[a_idx, s_idx, S[sp_name]] = float(row["probability"]) / float(total)

In [60]:
P2

array([[[0.  , 0.8 , 0.  , 0.2 , 0.  , 0.  , 0.  ],
        [0.  , 0.  , 0.8 , 0.2 , 0.  , 0.  , 0.  ],
        [0.  , 0.  , 0.  , 0.2 , 0.8 , 0.  , 0.  ],
        [0.  , 0.  , 0.  , 1.  , 0.  , 0.  , 0.  ],
        [0.  , 0.  , 0.  , 0.  , 0.7 , 0.3 , 0.  ],
        [0.  , 0.  , 0.  , 0.  , 0.  , 0.7 , 0.3 ],
        [0.  , 0.  , 0.  , 0.3 , 0.  , 0.  , 0.7 ]],

       [[0.  , 0.8 , 0.  , 0.2 , 0.  , 0.  , 0.  ],
        [0.  , 0.  , 0.8 , 0.2 , 0.  , 0.  , 0.  ],
        [0.  , 0.  , 0.  , 0.2 , 0.8 , 0.  , 0.  ],
        [0.  , 0.  , 0.  , 1.  , 0.  , 0.  , 0.  ],
        [0.  , 0.  , 0.  , 0.  , 0.75, 0.25, 0.  ],
        [0.  , 0.  , 0.  , 0.  , 0.  , 0.75, 0.25],
        [0.  , 0.  , 0.  , 0.25, 0.  , 0.  , 0.75]],

       [[0.  , 0.5 , 0.  , 0.5 , 0.  , 0.  , 0.  ],
        [0.  , 0.  , 0.5 , 0.5 , 0.  , 0.  , 0.  ],
        [0.  , 0.  , 0.  , 0.5 , 0.5 , 0.  , 0.  ],
        [0.  , 0.  , 0.  , 1.  , 0.  , 0.  , 0.  ],
        [0.  , 0.  , 0.  , 0.  , 0.8 , 0.2 , 0.  ],
        

In [74]:
v_optimal_reward = value_iteration(R2, P2, gamma)
print("Optimal value function (expected reward) for gamma=1.0:")
print(v_optimal_reward)
print()

Optimal value function (expected reward) for gamma=1.0:
[-3.41097781 -4.26449121 -5.33132703  0.         -6.66481885 -4.99977911
 -1.66665378]



In [71]:


# Pack as dictionary mapping action and state to value
state_values_reward = {(a_name, states[i]): v for i, v in enumerate(v_optimal_reward)}

# Unpack state into the action and state columns
state_values_reward = pd.DataFrame(list(state_values_reward.items()), columns=['action_state', 'expected_reward'])
state_values_reward[['action', 'state']] = state_values_reward['action_state'].apply(pd.Series)
state_values_reward = state_values_reward.drop(columns=['action_state'])

print("State values (expected reward) for gamma=1.0:")
print(state_values_reward)

# Reorder into action, state, and expected_reward columns
state_values_reward = state_values_reward[['action', 'state', 'expected_reward']]

# Convert directly to a dataframe
minimum_discomfort_actions = state_values_reward
minimum_discomfort_actions

State values (expected reward) for gamma=1.0:
   expected_reward   action       state
0        -0.749165  sleep-8   exposed-1
1        -1.498689  sleep-8   exposed-2
2        -2.997944  sleep-8   exposed-3
3         0.000000  sleep-8   recovered
4        -5.996779  sleep-8  symptoms-1
5        -4.499580  sleep-8  symptoms-2
6        -1.499973  sleep-8  symptoms-3


Unnamed: 0,action,state,expected_reward
0,sleep-8,exposed-1,-0.749165
1,sleep-8,exposed-2,-1.498689
2,sleep-8,exposed-3,-2.997944
3,sleep-8,recovered,0.0
4,sleep-8,symptoms-1,-5.996779
5,sleep-8,symptoms-2,-4.49958
6,sleep-8,symptoms-3,-1.499973


In [62]:
# YOUR CHANGES HERE

minimum_discomfort_actions.to_csv('minimum-discomfort-actions.tsv', sep='\t', index=False)

Submit "minimum-discomfort-actions.tsv" in Gradescope.

## Part 3: Expected Discomfort

Using your previous optimal policy, compute the expected discomfort for each state.

In [None]:
# YOUR CHANGES HERE

# Similar to Part 2

Save your results in a file "minimum-discomfort-values.tsv" with columns state and expected_discomfort.

In [None]:
# YOUR CHANGES HERE

...

Submit "minimum-discomfort-values.tsv" in Gradescope.

## Part 4: Minimizing Twizzleflu Duration

Modifiy the Markov decision process to minimize the days until the Twizzle flu is over.
To do so, change the reward function to always be -1 if the current state corresponds to being sick and 0 if the current state corresponds to being better.
To be clear, the action does not matter for this reward function.


In [None]:
# YOUR CHANGES HERE

# Modify for the situation
def compute_qT_once(R, P, gamma, v):
    if :
        return -1
    elif :
        return 0
    


Save your new reward function in a file "duration-rewards.tsv" in the same format as "twizzleflu-rewards.tsv".

In [None]:
# YOUR CHANGES HERE

...

Submit "duration-rewards.tsv" in Gradescope.

## Part 5: Optimize for Shorter Twizzleflu

Compute an optimal policy to minimize the duration of Twizzleflu.

In [None]:
# YOUR CHANGES HERE

...

Save the optimal actions for each state to a file "minimum-duration-actions.tsv" with columns state and action.

In [None]:
# YOUR CHANGES HERE

...

Submit "minimum-duration-actions.tsv" in Gradescope.

## Part 6: Shorter Twizzleflu?

Compute the expected number of days sick for each state to a file.

In [None]:
# YOUR CHANGES HERE

...

Save the expected sick days for each state to a file "minimum-duration-days.tsv" with columns state and expected_sick_days.

In [None]:
# YOUR CHANGES HERE

...

Submit "minimum-duration-days.tsv" in Gradescope.

## Part 7: Speed vs Pampering

Compute the expected discomfort using the policy to minimize days sick, and compare the results to the expected discomfort when optimizing to minimize discomfort.

In [None]:
# YOUR CHANGES HERE

...

Save the results to a file "policy-comparison.tsv" with columns state, speed_discomfort, and minimize_discomfort.

In [None]:
# YOUR CHANGES HERE

...

Submit "policy-comparison.tsv" in Gradescope.

## Part 8: Code

Please submit a Jupyter notebook that can reproduce all your calculations and recreate the previously submitted files.

## Part 9: Acknowledgements

If you discussed this assignment with anyone, please acknowledge them here.
If you did this assignment completely on your own, simply write none below.

If you used any libraries not mentioned in this module's content, please list them with a brief explanation what you used them for. If you did not use any other libraries, simply write none below.

If you used any generative AI tools, please add links to your transcripts below, and any other information that you feel is necessary to comply with the generative AI policy. If you did not use any generative AI tools, simply write none below.