# Learning the Best Diabetes Medication

## Narrative

When people find they have high blood sugar, they are evaluated using the **A1C** metric.

**Four Major Groups of Drugs**:

+ Sensitizers: These target liver, muscle, and fat cells, but may cause fluid retention and therefore should not be used for patients with a history of kidney failure.

+ Secretagogues: These drugs increase insulin sensitivity, but often cause hypoglycemia and weight gain.

+ Alpha-glucosidase inhibitors: These slow the rate of starch metabolism, but can cause digestive problems.

+ Peptide analogs: These mimic natural hormones in the body that stimulate insulin production.

<img src="photos/potential_each_drug_reduction.png" width=800 height=600 />

After testing a drug on a patient for a period of time, we observe the reduction in the A1C level, and then use this observation to update our estimate of how well the drug works on the patient.

From observing the performance of each drug over many (that is, millions of) patients, it is possible to construct a probability distribution of the reduction in A1C levels across all patients. The results are shown in Table 4.1.

<img src="photos/table_sugar_reduction.png" width=800 height=600 />

## Basic Model

For our basic model, we are going to assume that we have five choices of medication:
**metformin**, or a drug drawn from one of the **four major drug groups**.

$$
\chi = \{ x_1, x_2, x_3, x_4, x_5 \}
$$

To create a model, let: 

$\overline{\mu}^{0}_{x}$ = The mean reduction in the A1C for drug choice $x$ across the population, <br>
$\overline{\sigma}^{0}_{x}$ = The standard deviation in the reduction in A1C for drug $x$

We do not know the reduction we can expect from each drug, so we represent it as a random variable $\mu_x$, where we assume $\mu_x$ is normally distributed, which we write as:

$$
\mu_x \sim N(\overline{\mu}_x^0, \overline{\sigma}_x^0)
$$

We refer to the normal distribution $N(\overline{\mu}_x^0, \overline{\sigma}_x^0)$ as the *prior distribution of belief* about $\mu_x$. If we try a drug $x$ on a patient, we make a noisy observation of the true value $\mu_x$. Assume we choose a drug $x^n$ using what we know after $n$ trials, after which we observe the outcome of the $n+1$st trial, which we denote $W^{n+1}$ (this is the reduction in the A1C level). This can be written:

$$
W^{n+1} = \mu_x + \epsilon^{n+1}
$$

Here $\epsilon^{n+1}$ is the random noise in the observation. Remember that we do not know $\mu_x$; it is a random variable, and $\overline{\mu}_x^n$ is our current estimate of the mean $\mu_x$ after $n$ trials.
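The observation model above can be sketched in a few lines of Python. This is only an illustration: the function names `sample_truth` and `observe`, the prior parameters, and the assumption that the noise $\epsilon^{n+1}$ is zero-mean normal with a known standard deviation are all assumptions made for the sketch, not values from the text.

```python
import random

def sample_truth(prior_mean, prior_std, rng):
    """Draw one hypothetical 'true' effect mu_x from the prior belief."""
    return rng.gauss(prior_mean, prior_std)

def observe(mu_x, noise_std, rng):
    """Return W^{n+1} = mu_x + epsilon^{n+1}, assuming zero-mean normal noise."""
    return mu_x + rng.gauss(0.0, noise_std)

rng = random.Random(0)

# Illustrative prior parameters (hypothetical, not from the A1C table).
mu_x = sample_truth(prior_mean=1.0, prior_std=0.5, rng=rng)

# In practice mu_x is never seen directly; we only see noisy observations of it.
observations = [observe(mu_x, noise_std=1.0, rng=rng) for _ in range(10_000)]
sample_mean = sum(observations) / len(observations)
```

Averaging many observations of the same drug recovers $\mu_x$ closely, which is why repeated trials let us sharpen our belief.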

### State Variable

Our state variable is our belief, after $n$ trials, about the random variable $\mu_x$, which is the true effect of each drug on a particular patient. $S^0$ is the initial state, which we write as:

$$
S^0 = (\overline{\mu}_x^0, \overline{\sigma}_x^0)_{x \in \chi}
$$

After $n$ experiments, the state is: 

$$
S^n = (\overline{\mu}_x^n, \overline{\sigma}_x^n)_{x \in \chi}
$$

Later, we are going to find it useful to work with the *precision* of our belief, which is given by: 

$$
\beta^n_x = \frac{1}{(\overline{\sigma}_x^n)^2} 
$$

We can then write our state variable as:

$$
S^n = (\overline{\mu}_x^n, \beta_x^n)_{x \in \chi}
$$
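Switching between the two forms of the state is a one-line conversion. The sketch below assumes the belief state is stored as a dictionary from drug name to a pair; the helper names `to_precision` and `to_std` and the prior values are hypothetical.

```python
def to_precision(std):
    """beta = 1 / sigma^2."""
    return 1.0 / std ** 2

def to_std(precision):
    """sigma = beta^(-1/2), the inverse of to_precision."""
    return precision ** -0.5

# Hypothetical prior beliefs, stored as drug -> (mean, standard deviation).
prior = {"x1": (1.0, 0.5), "x2": (1.5, 1.0)}

# The same beliefs in (mean, precision) form.
state = {x: (mean, to_precision(std)) for x, (mean, std) in prior.items()}
```

A small standard deviation corresponds to a large precision, so a drug we are confident about carries more weight when new observations arrive.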

### Decision Variable

The decision is the choice of medication to try for a month, which we write as:

$$
x^n = \text{the choice of medication}, \quad x^n \in \chi = \{x_1, \ldots, x_M\}
$$

Here $M = 5$ in our basic model. We are going to determine $x^n$ using a policy $X^{\pi}(S^n)$ that depends only on the state variable $S^n$ (along with the assumption of the normal distribution in $S^0$).
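To make the idea of a policy concrete, here is a minimal sketch of one possible rule, an interval-estimation style policy that picks the drug with the largest value of mean plus `theta` standard deviations. The text has not committed to any particular policy, so the rule itself, the function name, the tunable parameter `theta`, and the example beliefs are assumptions for illustration only.

```python
import math

def interval_estimation_policy(state, theta=2.0):
    """Return the drug maximizing mean + theta * std of our belief.

    state maps drug -> (mean, precision); theta = 0 gives pure exploitation.
    """
    def score(x):
        mean, precision = state[x]
        return mean + theta / math.sqrt(precision)
    return max(state, key=score)

# Hypothetical beliefs: x1 looks slightly better, x2 is far more uncertain.
state = {"x1": (1.0, 4.0), "x2": (0.9, 1.0)}

greedy_choice = interval_estimation_policy(state, theta=0.0)
exploring_choice = interval_estimation_policy(state, theta=2.0)
```

With `theta=0.0` the rule picks the drug with the best current estimate, while a larger `theta` favors drugs whose effect is still uncertain. In both cases the decision depends only on $S^n$.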

### Exogenous Information

After making the decision $x^n$, we observe: 

$$
W^{n+1}_x = \text{The reduction in the A1C level resulting from the drug $x=x^n$ }\\
\text{we prescribed for the $n+1$st trial}
$$

### Transition Function

The transition function captures how the observed reduction in A1C, $W_x^{n+1}$, affects our belief state $S^n$. If we try drug $x=x^n$ and observe $W_x^{n+1}$, we can update our estimate of the mean and precision using:

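$$
\overline{\mu}_x^{n+1} = \frac{\beta^n_x\overline{\mu}_x^n + \beta^W W_x^{n+1}}{\beta_x^n + \beta^W} \\
\beta_x^{n+1} = \beta_x^n + \beta^W
$$

Here $\beta^W$ is the precision of a single observation $W_x^{n+1}$ (one over the variance of the noise $\epsilon^{n+1}$), which we take to be known and the same for every drug. The beliefs about the drugs we did not try stay unchanged.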
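As a quick sanity check of the updating equations, consider a worked example with purely illustrative numbers (they are not taken from the table of A1C reductions). Suppose our belief about drug $x$ has mean $\overline{\mu}_x^n = 1.0$ and precision $\beta_x^n = 4$ (that is, $\overline{\sigma}_x^n = 0.5$), the precision of an observation is $1$, and we observe $W_x^{n+1} = 2.0$. Then:

$$
\overline{\mu}_x^{n+1} = \frac{4 \times 1.0 + 1 \times 2.0}{4 + 1} = 1.2 \\
\beta_x^{n+1} = 4 + 1 = 5
$$

The estimate moves only one fifth of the way toward the observation, because the prior belief carries four times the precision of a single observation. The new standard deviation is $1/\sqrt{5} \approx 0.447$, so each trial also makes us more confident.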

These updating equations define the transition function, which we earlier wrote as a generic function:

$$
S^{n+1} = S^M(S^n, x^n, W^{n+1})
$$
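A minimal Python sketch of this transition function follows. It assumes the state is a dictionary from drug name to (mean, precision), that the observation precision `beta_w` is a known constant, and that beliefs about different drugs are independent, so only the drug we tried is updated. The function name and the numbers are hypothetical.

```python
def transition(state, x, w, beta_w):
    """Return S^{n+1} given S^n, the drug x = x^n, and the observation w = W^{n+1}.

    state maps drug -> (mean, precision); beta_w is the observation precision.
    """
    mean, precision = state[x]
    new_precision = precision + beta_w
    new_mean = (precision * mean + beta_w * w) / new_precision
    new_state = dict(state)  # copy, so S^n itself is left untouched
    new_state[x] = (new_mean, new_precision)
    return new_state

# Hypothetical beliefs and observation.
state = {"x1": (1.0, 4.0), "x2": (1.5, 1.0)}
next_state = transition(state, "x2", w=0.5, beta_w=1.0)
```

Returning a new dictionary instead of mutating the old one keeps $S^n$ available, which is convenient when comparing policies on the same history.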

### Objective Function

Each time we prescribe a drug $x=x^n$, we observe the reduction in the A1C represented by $W_{x^n}^{n+1}$. We want to find a policy that chooses a drug $x^n = X^{\pi}(S^n)$ that maximizes the expected total reduction in A1C.

**Performance Metric**:

$$
C(S^n, x^n, W^{n+1}) = W^{n+1}_{x^n}
$$

**Finding the Best Policy**

$$
\max_{\pi} \ \mathbb{E} \left\{ \sum_{n=0}^{N-1} W_{x^n}^{n+1} \mid S^0 \right\}, \quad x^n = X^{\pi}(S^n)
$$
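The expectation in this objective can be approximated by simulation. The sketch below is one way to do that, under several assumptions that are not from the text: the truth $\mu_x$ for each replication is sampled from the prior, the noise is zero-mean normal with a known standard deviation, the prior values are made up, and the `greedy` policy is just a simple example of a policy to evaluate.

```python
import random

def simulate_policy(policy, prior, noise_std, n_trials, n_reps, seed=0):
    """Estimate E{ sum_{n=0}^{N-1} W^{n+1}_{x^n} | S^0 } by averaging sample paths.

    prior maps drug -> (mean, std); policy maps a state to a drug.
    """
    rng = random.Random(seed)
    beta_w = 1.0 / noise_std ** 2
    total = 0.0
    for _ in range(n_reps):
        # Sample a hypothetical truth mu_x for every drug from the prior.
        truth = {x: rng.gauss(m, s) for x, (m, s) in prior.items()}
        # Initial state S^0 in (mean, precision) form.
        state = {x: (m, 1.0 / s ** 2) for x, (m, s) in prior.items()}
        for _ in range(n_trials):
            x = policy(state)                          # decision x^n
            w = truth[x] + rng.gauss(0.0, noise_std)   # observation W^{n+1}
            total += w                                 # contribution to the sum
            mean, beta = state[x]                      # Bayesian update for drug x
            state[x] = ((beta * mean + beta_w * w) / (beta + beta_w), beta + beta_w)
    return total / n_reps

def greedy(state):
    """Pure exploitation: pick the drug with the highest current mean."""
    return max(state, key=lambda x: state[x][0])

# Hypothetical priors: drug -> (mean reduction, standard deviation).
prior = {"x1": (1.0, 0.5), "x2": (1.2, 0.8), "x3": (0.8, 1.0)}
estimate = simulate_policy(greedy, prior, noise_std=1.0, n_trials=12, n_reps=2000)
```

Running the same function with different policies gives comparable estimates of the objective, which is one practical way to search for a good policy.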