# Stochastic vs. Deterministic Dynamic Programming for reservoir operations

## Deterministic Dynamic Programming (DDP)

Consider the three-season reservoir operation problem from [DPexample.ipynb](https://colab.research.google.com/github/EnvSystemsUVA/CodingExamples/blob/main/07_DPexample.ipynb), but now assume the log-space inflows are normally distributed: $Y_1=\ln(Q_1)\sim N(2.3, 0.2)$, $Y_2=\ln(Q_2)\sim N(3.5, 0.3)$, and $Y_3=\ln(Q_3)\sim N(3.1, 0.3)$ in seasons 1, 2, and 3, respectively, where the second parameter is the log-space standard deviation (as used in the code below) and the correlation between consecutive seasons is $\rho(Y_t,Y_{t+1})=0.5$. Your goal is to minimize the sum across the three seasons of expected squared deviations above and below a constant storage target of 20 and above and below a constant release target of 20.

Let's look at the statistics of these flow distributions.

In [None]:
import numpy as np
import pandas as pd
import scipy.stats as ss
import matplotlib.pyplot as plt

# inflow parameters
muY = np.array([2.3, 3.5, 3.1]) # log-space mean flows
sigmaY = np.array([0.2, 0.3, 0.3]) # log-space standard deviations
meanQ = np.exp(muY + 0.5*sigmaY**2) # real-space mean flows
rho = 0.5 # correlation between consecutive log-space flows

nSeasons = len(muY) # number of seasons
for i in range(nSeasons):
  print("Mean flow, season %d: %0.1f" % ((i+1), meanQ[i]))
  print("1st percentile flow, season %d: %0.1f" % ((i+1), np.exp(ss.norm.ppf(0.01,loc=muY[i],scale=sigmaY[i]))))
  print("Median flow, season %d: %0.1f" % ((i+1), np.exp(ss.norm.ppf(0.5,loc=muY[i],scale=sigmaY[i]))))
  print("99th percentile flow, season %d: %0.1f" % ((i+1), np.exp(ss.norm.ppf(0.99,loc=muY[i],scale=sigmaY[i]))))
  print("\n")
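
As a quick sanity check on the log-space to real-space conversion used above, the sketch below compares the analytic lognormal mean $\exp(\mu_Y + \sigma_Y^2/2)$ with a Monte Carlo estimate. The helper name `lognormal_mean`, the sample size, and the seed are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def lognormal_mean(mu_log, sigma_log):
    """Real-space mean of Q when ln(Q) ~ Normal(mu_log, sigma_log**2)."""
    return np.exp(np.asarray(mu_log) + 0.5 * np.asarray(sigma_log) ** 2)

# season 1 parameters from the text, treating 0.2 as a standard deviation
mu_log, sigma_log = 2.3, 0.2

# draw log-space flows, transform to real space, and average (arbitrary seed)
rng = np.random.default_rng(42)
sample = np.exp(rng.normal(mu_log, sigma_log, size=200_000))

analytic = lognormal_mean(mu_log, sigma_log)
print(f"analytic mean: {analytic:.2f}, sample mean: {sample.mean():.2f}")
```

The two numbers should agree closely, which confirms that the `0.5*sigmaY**2` correction is what separates the mean flow from the median flow `np.exp(muY)`.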

Using deterministic discrete dynamic programming, find the operating policy assuming you receive the mean inflow every season. Consider 7 discrete storage values: 0, 5, 10, 15, 20, 25, and 30, so storage is bounded by [0,30]. Assume releases are bounded by [10,40]. Report your results as a table showing the optimal release in each of the three seasons for each of the seven storage levels.

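This code is the same as in [DPexample.ipynb](https://colab.research.google.com/github/EnvSystemsUVA/CodingExamples/blob/main/07_DPexample.ipynb); it just uses the mean flows of the above log-normal distributions instead of (10,50,30). With 0.2, 0.3, and 0.3 treated as log-space standard deviations, as in the code cell above, `meanQ` is approximately (10.2, 34.6, 23.2); the values (11.0, 38.5, 25.8) result only if those parameters are treated as variances.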

In [None]:
# targets
Stargets = np.ones([nSeasons])*20 # target of 20 every period
Rtargets = np.ones([nSeasons])*20 # target of 20 every period

# reservoir parameters
K = 30 # reservoir capacity
Rmax = 40 # maximum possible release (matches the [10,40] release bounds stated above)
Rmin = 10 # minimum possible release

######################## DDP Optimization ########################

def calcCostDDP(S, Q, Starget, Rtarget, bounds, FutureCost):
  '''
  Function to calculate the optimal release (Rbest) from each storage state (S)
  and associated present and future cost (Cbest), defined as the total squared
  deviation between storage and release targets for that stage.

  Inputs:
    S: 1-D array of discrete storage values representing the states
    Q: inflow received at this stage (scalar)
    Starget: target storage for next stage (scalar)
    Rtarget: release target for this stage (scalar)
    bounds: bounds on possible releases (array of length 2)
    FutureCost: 1-D array of future costs at each state that will be added to the
      cost of the optimal state transition at this stage to compute present + future cost

  Outputs:
    Rbest: 1-D array of optimal releases from each state in S
    Cbest: 1-D array of present + future costs associated with each state S
  '''

  # initialize current cost at infinity and releases at 0
  # (np.full avoids adding inf to uninitialized memory, which could produce NaN)
  Cbest = np.full(len(S), np.inf)
  Rbest = np.zeros([len(S)])
  for i, s in enumerate(S): # storage at stage t
    # find optimal storage to move to at stage t+1
    for j, sNext in enumerate(S): # storage at stage t+1
      R = s + Q - sNext # release to get to sNext
      # find cost of this release if it's feasible
      if R >= bounds[0] and R <= bounds[1]:
        # compute total cost C (total deviation from targets + future cost at sNext)
        S_deviation = (sNext-Starget)**2
        R_deviation = (R-Rtarget)**2
        C = S_deviation + R_deviation + FutureCost[j]
        # update optimal value (Cbest) and decision (Rbest) if better than current best
        if C < Cbest[i]:
          Cbest[i] = C
          Rbest[i] = R

  return Rbest, Cbest

# get indices of stages
nStages = nSeasons
forward_indices = np.arange(nStages)
backward_indices = forward_indices[::-1] # reverse order of stages for backward-moving DP
backward_indices = np.insert(backward_indices,0,0) # put 0 at beginning to make it cyclic

# discretize states
states = np.arange(0,31,5)
nStates = len(states)

# bounds on decision variables (releases)
bounds = np.array([Rmin, Rmax])

# initialize matrices with costs of each state at each stage
# and optimal releases to make from each state at each stage
DDP_costs = np.empty([nStates,nStages])
DDP_release_policy = np.empty([nStates,nStages])

# initialize FutureCost at 0 for all states; will update as we move backwards
FutureCost = np.zeros([nStates])

# begin backward-moving DDP
loop = True
while loop:
  count = 0
  for index in backward_indices[0:-1]:
    # find optimal release and value of each state in this stage
    # states (storage targets) are for next period (index) while inflows and releases are for this period (index-1)
    R, FutureCost = calcCostDDP(states, meanQ[index-1], Stargets[index], Rtargets[index-1], bounds, FutureCost)

    # count iterations with no change in optimal release
    if np.all(R == DDP_release_policy[:,index-1]):
      count += 1

    # update best releases and value of each state
    DDP_costs[:,index] = FutureCost
    DDP_release_policy[:,index-1] = R

  # stop loop if no change in optimal decisions across all iterations
  if count == len(backward_indices[0:-1]):
    break

DDP_release_policy_df = pd.DataFrame(DDP_release_policy, columns=["Season 1","Season 2","Season 3"],index=states)
DDP_release_policy_df.index.rename("Storage",inplace=True)
DDP_release_policy_df
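
The double loop inside `calcCostDDP` can also be written with array broadcasting. The sketch below is one possible vectorized version of a single backward step, not the notebook's implementation; the function name `ddp_stage_vectorized` and the demo values (an inflow of 20 with both targets at 20) are assumptions chosen so the answer is easy to verify by hand.

```python
import numpy as np

def ddp_stage_vectorized(S, Q, Starget, Rtarget, bounds, FutureCost):
    """One backward DDP step over all (storage, next storage) pairs at once."""
    S = np.asarray(S, dtype=float)
    # R[i, j] is the release that moves storage S[i] to S[j] under inflow Q
    R = S[:, None] + Q - S[None, :]
    # present cost (storage + release deviations) plus future cost at S[j]
    cost = (S[None, :] - Starget) ** 2 + (R - Rtarget) ** 2 \
        + np.asarray(FutureCost, dtype=float)[None, :]
    # infeasible releases get infinite cost so they are never selected
    feasible = (R >= bounds[0]) & (R <= bounds[1])
    cost = np.where(feasible, cost, np.inf)
    # best next state for each current state (first minimum wins on ties)
    jbest = np.argmin(cost, axis=1)
    rows = np.arange(len(S))
    # note: if a row has no feasible release, its cost is inf and R is meaningless
    return R[rows, jbest], cost[rows, jbest]

states_demo = np.arange(0, 31, 5)
R_demo, C_demo = ddp_stage_vectorized(states_demo, 20.0, 20.0, 20.0,
                                      (10, 40), np.zeros(len(states_demo)))
print(R_demo)
print(C_demo)
```

Starting from the target storage of 20 with an inflow of 20, releasing 20 hits both targets at zero cost, which is a convenient check on the indexing.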

## Stochastic Dynamic Programming (SDP)

Now explicitly consider uncertainty in the optimization using stochastic dynamic programming.

First, we'll compute the transition probabilities from 10 discrete log-space flow levels $Y_{t-1}$ in season $t-1$ to 10 discrete log-space flow levels $Y_t$ in season $t$. The log-space flows of consecutive seasons follow a bivariate normal distribution:  

$f(Y_{t-1}, Y_t) = \frac{1}{2\pi \sigma_{y_{t-1}} \sigma_{y_t} \sqrt{1-\rho^2}} \exp \Bigg(- \frac{1}{2(1-\rho^2)} \bigg[ \bigg(\frac{y_{t-1}-\mu_{y_{t-1}}}{\sigma_{y_{t-1}}}\bigg)^2 - 2\rho\bigg(\frac{y_{t-1}-\mu_{y_{t-1}}}{\sigma_{y_{t-1}}}\bigg)\bigg(\frac{y_t-\mu_{y_t}}{\sigma_{y_t}}\bigg) + \bigg(\frac{y_t-\mu_{y_t}}{\sigma_{y_t}}\bigg)^2  \bigg] \Bigg)$  

From this, the conditional distribution of $Y_t$ given an observed value of $Y_{t-1}$ is normally distributed with conditional mean $\mu_{y_t|y_{t-1}}$ and conditional standard deviation $\sigma_{y_t|y_{t-1}}$:  

\begin{align}
f(Y_t | Y_{t-1}=y_{t-1}) &= \frac{1}{\sigma_{y_t|y_{t-1}}\sqrt{2\pi}} \exp \bigg( -\frac{(y_t-\mu_{y_t|y_{t-1}})^2}{2\sigma_{y_t|y_{t-1}}^2} \bigg)\\
\mu_{y_t|y_{t-1}} &= \mu_{y_t} + \rho\frac{\sigma_{y_t}}{\sigma_{y_{t-1}}}\Big(y_{t-1}-\mu_{y_{t-1}}\Big)\\
\sigma_{y_t|y_{t-1}} &= \sigma_{y_t}\sqrt{(1-\rho^2)}
\end{align}
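
To make the conditional moments concrete, here is a small sketch that evaluates them for the season 1 to season 2 transition. The function name `conditional_normal` and the observed value $y_{t-1}=2.5$ are illustrative assumptions.

```python
import numpy as np

def conditional_normal(y_prev, mu_prev, sigma_prev, mu_now, sigma_now, rho):
    """Mean and standard deviation of Y_t given Y_{t-1} = y_prev (bivariate normal)."""
    mu_cond = mu_now + rho * (sigma_now / sigma_prev) * (y_prev - mu_prev)
    sigma_cond = sigma_now * np.sqrt(1.0 - rho**2)
    return mu_cond, sigma_cond

# season 1 -> season 2 with an above-average season 1 log-flow of 2.5
mu_c, sigma_c = conditional_normal(2.5, 2.3, 0.2, 3.5, 0.3, 0.5)
print(f"conditional mean: {mu_c:.3f}, conditional std: {sigma_c:.3f}")
```

A wetter-than-average season 1 shifts the expected season 2 log-flow above its unconditional mean of 3.5, while the conditional standard deviation shrinks by the factor $\sqrt{1-\rho^2}$ regardless of the observed value.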

In [None]:
forward_indices = np.append(forward_indices,0)
print(forward_indices)
for i, index in enumerate(forward_indices[0:-1]):
  print("First season: ", index+1)
  print("Next season: ", forward_indices[i+1]+1)

In [None]:
# find transition probabilities from discrete flow values in one season to the next
def calcTransProb(mu, sigma, rho, nLevels):
  transprob = np.empty([nLevels,nLevels])
  # discrete levels of log-space flows
  Ylevels1 = np.linspace(ss.norm.ppf(0.01,mu[0],sigma[0]),ss.norm.ppf(0.99,mu[0],sigma[0]),nLevels)
  Ylevels2 = np.linspace(ss.norm.ppf(0.01,mu[1],sigma[1]),ss.norm.ppf(0.99,mu[1],sigma[1]),nLevels)
  print("Log-space flow levels, t-1: ", Ylevels1)
  print("Log-space flow levels, t: ", Ylevels2)
  for i in range(nLevels):
    # find conditional distribution of Ylevels2[j] given the log-space flow was Ylevels1[i] in the previous season
    mu_cond = mu[1] + rho*(sigma[1]/sigma[0])*(Ylevels1[i] - mu[0])
    sigma_cond = sigma[1] * np.sqrt(1-rho**2)
    for j in range(nLevels):
      transprob[i,j] = ss.norm.pdf(Ylevels2[j], mu_cond, sigma_cond)

    #normalize probabilities to sum to 1
    transprob[i,:] = transprob[i,:] / np.sum(transprob[i,:])

  return transprob

# discretize inflows into nLevels and find transition probabilities between them
nLevels=10
transprob = []
for i, index in enumerate(forward_indices[0:-1]):
  print("Season " + str(index+1) + " to Season " + str(forward_indices[i+1]+1))
  mu = np.array([muY[index],muY[forward_indices[i+1]]]) # log-space mean
  sigma = np.array([sigmaY[index],sigmaY[forward_indices[i+1]]]) # log-space standard deviation
  transprob.append(calcTransProb(mu, sigma, rho, nLevels))
  print("Transition probabilities: ", transprob[i])
  print("\n")
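
`calcTransProb` evaluates the conditional density at each level and renormalizes. An alternative, sketched below under the assumption that each level represents the bin halfway to its neighbors (with the outer bins extending to infinity), integrates the conditional CDF over those bins so each row sums to 1 without renormalizing. The function name `trans_prob_by_bins` is hypothetical, and this is a different discretization from the one used in the notebook, so the numbers will not match exactly.

```python
import numpy as np
import scipy.stats as ss

def trans_prob_by_bins(mu, sigma, rho, n_levels):
    """Transition probabilities from conditional-CDF differences over bins."""
    # same discrete log-space levels as in calcTransProb
    y1 = np.linspace(ss.norm.ppf(0.01, mu[0], sigma[0]),
                     ss.norm.ppf(0.99, mu[0], sigma[0]), n_levels)
    y2 = np.linspace(ss.norm.ppf(0.01, mu[1], sigma[1]),
                     ss.norm.ppf(0.99, mu[1], sigma[1]), n_levels)
    # bin edges halfway between levels; outer bins extend to +/- infinity
    edges = np.concatenate(([-np.inf], 0.5 * (y2[:-1] + y2[1:]), [np.inf]))
    # conditional mean for each previous-season level; constant conditional std
    mu_cond = mu[1] + rho * (sigma[1] / sigma[0]) * (y1 - mu[0])
    sigma_cond = sigma[1] * np.sqrt(1 - rho**2)
    # CDF at every edge for every conditioning level, then difference across edges
    cdf = ss.norm.cdf(edges[None, :], loc=mu_cond[:, None], scale=sigma_cond)
    return np.diff(cdf, axis=1)

P_demo = trans_prob_by_bins(np.array([2.3, 3.5]), np.array([0.2, 0.3]), 0.5, 10)
print(P_demo.sum(axis=1))
```

Because the outer bins absorb the tails, this version assigns more probability to the lowest and highest levels than the renormalized-density approach does.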

Let's plot these transition probabilities to improve the visualization.

In [None]:
import matplotlib as mpl

max_transprob = 0
for i in range(nSeasons):
  if np.max(transprob[i]) > max_transprob:
    max_transprob = np.max(transprob[i])

norm = mpl.colors.Normalize(0,max_transprob)
contour_cmap = mpl.colormaps['viridis'] # mpl.cm.get_cmap was removed in recent Matplotlib releases

fig, ax = plt.subplots(1,3, layout="constrained")

# make a heatmap of the transition probabilities of each stage
for i, index in enumerate(forward_indices[0:-1]):
  mu = np.array([muY[index],muY[forward_indices[i+1]]]) # log-space mean
  sigma = np.array([sigmaY[index],sigmaY[forward_indices[i+1]]]) # log-space standard deviation
  Ylevels1 = np.linspace(ss.norm.ppf(0.01,mu[0],sigma[0]),ss.norm.ppf(0.99,mu[0],sigma[0]),nLevels)
  Ylevels2 = np.linspace(ss.norm.ppf(0.01,mu[1],sigma[1]),ss.norm.ppf(0.99,mu[1],sigma[1]),nLevels)
  x, y = np.meshgrid(Ylevels1, Ylevels2)
  cf = ax.flat[i].contourf(x, y, np.transpose(transprob[i]), norm=norm)
  ax.flat[i].set_xlabel(r"$Y_{t-1}$")
  ax.flat[i].set_title("Season " + str(i+1) + "-" + str(forward_indices[i+1]+1))
  if i == 0:
    ax.flat[i].set_ylabel(r"$Y_t$")

# pass ax explicitly; a standalone ScalarMappable is not attached to any Axes
cbar = fig.colorbar(mpl.cm.ScalarMappable(norm=norm, cmap=contour_cmap), ax=ax)
cbar.ax.set_ylabel("Transition Probability")

Next, we'll modify the DDP function to minimize the sum of the present cost and the *expected* future cost, where the expectation is taken over the possible inflows $Y_t$ with probabilities given by the conditional distribution above. Since those inflow probabilities depend on the inflow received in the previous period, we'll now optimize a release policy at each stage as a function of both the current storage $S_t$ and the previous inflow $Y_{t-1}$.
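
Written out, the recursion being solved takes the standard stochastic form below. The notation is introduced here for illustration and is not from the original notebook: $T^R_t$ and $T^S_{t+1}$ denote the release and storage targets, $y^{(j)}_{t-1}$ and $y^{(m)}_t$ the discrete log-space inflow levels, and $p_{jm}$ the transition probability from level $j$ in season $t-1$ to level $m$ in season $t$.

\begin{align}
f_t\big(S_t, y^{(j)}_{t-1}\big) &= \min_{R_t} \bigg\{ \big(R_t - T^R_t\big)^2 + \sum_{m} p_{jm} \Big[ \big(S^{(m)}_{t+1} - T^S_{t+1}\big)^2 + f_{t+1}\big(S^{(m)}_{t+1}, y^{(m)}_t\big) \Big] \bigg\}\\
S^{(m)}_{t+1} &= S_t + e^{y^{(m)}_t} - R_t
\end{align}

Note that the future cost is evaluated at the inflow level realized in the current season, since that inflow becomes the "previous inflow" state of the next stage.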

In [None]:
######################## SDP Optimization ########################
from scipy.optimize import minimize

def findBestR(R, s, y1, j, S, Ylevels2, transprob, Starget, Rtarget, FutureExpCost):
  # compute present cost: squared release deviation
  C = (R-Rtarget)**2

  # add expected future cost over possible inflows
  ExpCost = 0
  for m, y2 in enumerate(Ylevels2): # inflow at this stage, t
    prob = transprob[j,m] # probability of log-space inflow y2 at stage t given y1 at stage t-1
    sNext = s + np.exp(y2) - R # storage we would move to with this inflow exp(y2) and release R

    # interpolate the future cost at sNext between the nearest storage states;
    # this stage's inflow level m becomes the "previous inflow" state of the next stage
    futureCost = np.interp(sNext,S,FutureExpCost[:,m])

    # add storage deviation and future cost, weighted by the probability of this flow transition
    S_deviation = (sNext-Starget)**2
    ExpCost += (S_deviation + futureCost) * prob

  # add present and expected future cost
  C += ExpCost

  return C

def calcCostSDP(S, Ylevels1, Ylevels2, Qguess, transprob, Starget, Rtarget, bounds, FutureExpCost):
  '''
  Function to calculate the optimal release (Rbest) from each storage state (S)
  and associated present and future cost (Cbest), defined as the total squared
  deviation between storage and release targets for that stage.

  Inputs:
    S: 1-D array of discrete storage values representing the states
    Ylevels1: 1-D array of discrete log-space inflow levels in the previous stage
    Ylevels2: 1-D array of discrete log-space inflow levels in this stage
    Qguess: initial guess for the optimal release (scalar), e.g. this stage's mean inflow
    transprob: 2-D array of transition probabilities from Ylevels1 to Ylevels2
    Starget: target storage for next stage (scalar)
    Rtarget: release target for this stage (scalar)
    bounds: bounds on possible releases (array of length 2)
    FutureExpCost: 2-D array of future expected costs at each combination of storage and inflow level
      that will be added to the cost of the optimal state transition at this stage
      to compute present + expected future cost

  Outputs:
    Rbest: 2-D array of optimal releases as a function of S and Ylevels1
    Cbest: 2-D array of present + expected future costs associated with each
      combination of S and Ylevels1
  '''

  # initialize current cost at infinity and releases at 0
  # (np.full avoids adding inf to uninitialized memory, which could produce NaN)
  Cbest = np.full([len(S),len(Ylevels1)], np.inf)
  Rbest = np.zeros([len(S),len(Ylevels1)])
  for i, s in enumerate(S): # storage at stage t
    for j, y1 in enumerate(Ylevels1): # inflow at previous stage (t-1)
      # find optimal release at this stage
      # use mean inflow as an initial guess
      # constrain it to not exceed reservoir capacity at mean inflow
      constraints = ({'type': 'ineq', 'fun': lambda x: K - (s + Qguess - x[0])}, # capacity - Snext >= 0
        {'type': 'ineq', 'fun': lambda x: s + Qguess - x[0]}) # Snext >= 0
      result = minimize(findBestR, x0=Qguess, bounds=[[bounds[0],bounds[1]]], constraints=constraints,
                        args=(s, y1, j, S, Ylevels2, transprob, Starget, Rtarget, FutureExpCost))
      R = result.x[0]
      C = result.fun

      # update optimal value (Cbest) and decision (Rbest) if better than current best
      if C < Cbest[i,j]:
        Cbest[i,j] = C
        Rbest[i,j] = R

  return Rbest, Cbest
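
Since the release is a single decision variable, a bounded scalar minimizer is another way to solve the inner problem. The sketch below is a toy one-stage version with no future cost term and made-up inflows and probabilities; it uses `scipy.optimize.minimize_scalar` instead of the constrained `minimize` call above, and the function name `expected_cost` is hypothetical. For this quadratic cost the optimum has the closed form $R^* = (s + E[Q] + T^R - T^S)/2$, which gives 23 for the values used here.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def expected_cost(R, s, inflows, probs, Starget, Rtarget):
    """Release deviation plus probability-weighted storage deviation (no future cost)."""
    s_next = s + inflows - R  # next storage under each possible inflow
    return (R - Rtarget) ** 2 + np.sum(probs * (s_next - Starget) ** 2)

# illustrative real-space inflows and their probabilities (assumed values)
inflows_demo = np.array([11.0, 21.0, 31.0])
probs_demo = np.array([0.25, 0.5, 0.25])

# minimize over releases in [10, 40] starting from storage 25, targets of 20
res = minimize_scalar(expected_cost, bounds=(10, 40), method="bounded",
                      args=(25.0, inflows_demo, probs_demo, 20.0, 20.0))
print(f"optimal release: {res.x:.3f}, expected cost: {res.fun:.3f}")
```

Unlike the `minimize` call above, this sketch does not enforce the storage constraints at the mean inflow, so it is a simplification rather than a drop-in replacement.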

In [None]:
# use same stages, states, indices and bounds as for DDP
# initialize matrices with costs of each state at each stage
# and optimal releases to make from each state at each stage
SDP_costs = np.empty([nStates, nLevels, nStages])
SDP_release_policy = np.empty([nStates, nLevels, nStages])

# initialize FutureExpCost at 0 for all states; will update as we move backwards
FutureExpCost = np.zeros([nStates, nLevels])

# begin backward-moving SDP
tolerance = 0.1 # average % difference in optimal releases from one cycle to the next at which to stop looping
avgPctDiff = np.inf
cycle = 0 # number of times cycling through all seasons
while True: # exits via break once the policy has converged
  count = 0
  cycle +=1
  print(cycle)
  for index in backward_indices[0:-1]:
    mu = np.array([muY[index-2],muY[index-1]])
    sigma = np.array([sigmaY[index-2],sigmaY[index-1]])
    Ylevels1 = np.linspace(ss.norm.ppf(0.01,mu[0],sigma[0]),ss.norm.ppf(0.99,mu[0],sigma[0]),nLevels)
    Ylevels2 = np.linspace(ss.norm.ppf(0.01,mu[1],sigma[1]),ss.norm.ppf(0.99,mu[1],sigma[1]),nLevels)

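    # find optimal release and value of each state in this stage
    # states (storage targets) are for next period (index) while inflows and releases are for this period (index-1)
    # transprob[k] holds the transition from season k+1 to season k+2 (cyclic), so the
    # transition into this period, which matches Ylevels1 and Ylevels2, is transprob[index-2]
    R, FutureExpCost = calcCostSDP(states, Ylevels1, Ylevels2, meanQ[index-1], transprob[index-2], Stargets[index], Rtargets[index-1], bounds, FutureExpCost)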

    # find average % difference in optimal releases at this stage compared to the last loop
    if cycle > 1:
      avgPctDiff = np.mean(np.abs(R - SDP_release_policy[:,:,index-1])*100/SDP_release_policy[:,:,index-1])
      if avgPctDiff < tolerance:
        count += 1

    # update best releases and value of each state
    SDP_costs[:,:,index] = FutureExpCost
    SDP_release_policy[:,:,index-1] = R

  # stop loop if average % change in optimal decisions across all iterations < tolerance
  if count == len(backward_indices[0:-1]):
    break

# print the release policy of each stage
for i in range(nSeasons):
  cols = []
  Ylevels1 = np.linspace(ss.norm.ppf(0.01,muY[i-1],sigmaY[i-1]),ss.norm.ppf(0.99,muY[i-1],sigmaY[i-1]),nLevels)
  for j in range(nLevels):
    cols.append("Q = %0.2f" % np.exp(Ylevels1[j]))
  SDP_release_policy_df = pd.DataFrame(SDP_release_policy[:,:,i], columns=cols,index=states)
  SDP_release_policy_df.index.rename("Storage",inplace=True)
  # columns are the previous season's real-space inflow levels
  print("Season " + str(i+1) + " release policy\n", SDP_release_policy_df, "\n")

Again, let's see the release policy visually with a heat map.

In [None]:
norm = mpl.colors.Normalize(Rmin, Rmax) # color scale spans the release bounds
fig, ax = plt.subplots(1,3, layout="constrained")

# make a heatmap of the release policy each stage
for i, index in enumerate(forward_indices[0:-1]):
  # the policy for this season is indexed by the previous season's log-space flow levels
  Ylevels1 = np.linspace(ss.norm.ppf(0.01,muY[index-1],sigmaY[index-1]),ss.norm.ppf(0.99,muY[index-1],sigmaY[index-1]),nLevels)
  x, y = np.meshgrid(states, Ylevels1)
  cf = ax.flat[i].contourf(x, y, np.transpose(SDP_release_policy[:,:,i]), norm=norm)
  ax.flat[i].set_xlabel(r"$S_t$")
  ax.flat[i].set_title("Season " + str(i+1))
  if i == 0:
    ax.flat[i].set_ylabel(r"$Y_{t-1}$")

# pass ax explicitly; a standalone ScalarMappable is not attached to any Axes
cbar = fig.colorbar(mpl.cm.ScalarMappable(norm=norm, cmap=contour_cmap), ax=ax)
cbar.ax.set_ylabel("Release")
fig.suptitle("SDP Release Policy")

## Simulation

Simulate 50 years of operations in which the release each season is determined by the DDP operating policy found above vs. the SDP operating policy found above. In both cases, if there is insufficient water to meet the release prescribed by the policy, only release as much water as is available. Likewise, if the prescribed release would result in exceeding the reservoir capacity, release as much as needed to prevent that. This may result in the release falling below 10 or exceeding 40, but that's okay for the purpose of this simulation.

This code is the same as in [DPexample.ipynb](https://colab.research.google.com/github/EnvSystemsUVA/CodingExamples/blob/main/07_DPexample.ipynb).
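
The two physical adjustments described above reduce to a single nested `max`/`min` expression. The standalone sketch below mirrors the rule applied in `getSimStates` in the next cell; the function name `feasible_release` and the example numbers are illustrative assumptions.

```python
def feasible_release(prescribed, storage, inflow, capacity):
    """Adjust a prescribed release so it is physically possible."""
    available = storage + inflow
    # cannot release more water than is available
    release = min(prescribed, available)
    # must release at least enough to keep storage at or below capacity
    return max(available - capacity, release)

# not enough water: only 15 units are available, so 15 are released
print(feasible_release(prescribed=20, storage=5, inflow=10, capacity=30))
# too much water: at least 35 units must be released to avoid exceeding capacity
print(feasible_release(prescribed=10, storage=25, inflow=40, capacity=30))
# prescribed release is feasible and is used unchanged
print(feasible_release(prescribed=20, storage=20, inflow=20, capacity=30))
```

In the second case the adjusted release of 35 is not subject to the release bounds used in the optimization, which is why violations are counted separately in the simulation.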

In [None]:
# initialize storages and releases for simulation of 50 years of 3 seasons with DDP and SDP policies
nYears = 50

class Solution():
  def __init__(self):
    '''initialize Solution class with attributes shared by the DDP and SDP solutions'''
    self.simS = np.zeros([nYears,nSeasons])
    self.simR = np.zeros([nYears,nSeasons])
    self.S_costs = np.zeros([nYears])
    self.R_costs = np.zeros([nYears])
    self.Total_costs = np.zeros([nYears])
    self.prescribedR = None
    self.Rmin_violations = 0
    self.Rmax_violations = 0

  def getSimStates(self, Q, year, season):
    '''method of Solution class to calculate simulated R and S'''
    # adjust prescribed release if not physically possible
    # R = min(prescribedR, simS + Q) prevents it from releasing more water than is available
    # max(simS + Q - K, R) prevents storage capacity from being exceeded
    self.simR[year,season] = max(self.simS[year,season] + Q - K,
                           min(self.prescribedR, self.simS[year,season] + Q))

    # count the number of violations of Rmin or Rmax
    if self.simR[year,season] > Rmax:
      self.Rmax_violations += 1
    elif self.simR[year, season] < Rmin:
      self.Rmin_violations += 1

    # calculate new storage
    if season != (nSeasons-1): # storage in next season of same year
      self.simS[year,season+1] = self.simS[year,season] + Q - self.simR[year,season]
    elif year != (nYears-1): # storage in season 1 of next year
      self.simS[year+1,0] = self.simS[year,season] + Q - self.simR[year,season]

  def getSimCosts(self, year):
    '''method of Solution class to calculate cost (total deviation from targets) over simulation'''
    self.S_costs[year] = np.sum((self.simS[year,:] - Stargets)**2)
    self.R_costs[year] = np.sum((self.simR[year,:] - Rtargets)**2)
    self.Total_costs[year] = self.S_costs[year] + self.R_costs[year]

In [None]:
from scipy.interpolate import RegularGridInterpolator as interp2d

# create objects of Solution class for DDP and SDP solutions
DDP = Solution()
SDP = Solution()

# start at target storage
DDP.simS[0,0] = Stargets[0]
SDP.simS[0,0] = Stargets[0]

# vector of standard normal random variables for flow simulation
Z = np.zeros([nYears*nSeasons+1])

# generate prior season's random normal inflow
seed = 0
np.random.seed(seed)
Z[seed] = ss.norm.rvs(0,1,1)[0]
Qpast = np.exp(Z[seed]*sigmaY[-1] + muY[-1])

# simulate operations over 50 years of 3 seasons
for year in range(nYears):
  for season in range(nSeasons):
    # generate this season's inflow (set a seed to make it reproducible)
    seed += 1
    np.random.seed(seed)
    # generate flow conditional on previous season's flow and transform to real-space
    Z[seed] = rho*(Z[seed-1]) + ss.norm.rvs(0,1,1)[0]*np.sqrt(1-rho**2)
    Qnow = np.exp(Z[seed]*sigmaY[season] + muY[season])

    # find DDP releases from its policy: interpolate release between nearest storages
    DDP.prescribedR = np.interp(DDP.simS[year,season],states,DDP_release_policy[:,season])

    # find SDP releases from its policy: interpolate release between nearest storages and flows
    Ylevels1 = np.linspace(ss.norm.ppf(0.01,muY[season-1],sigmaY[season-1]),ss.norm.ppf(0.99,muY[season-1],sigmaY[season-1]),nLevels)
    f = interp2d((states, Ylevels1), SDP_release_policy[:,:,season])
    if np.log(Qpast) <= Ylevels1[0]: # interpolate between storages at lowest flow level
      SDP.prescribedR = np.interp(SDP.simS[year,season],states,SDP_release_policy[:,0,season])
    elif np.log(Qpast) >= Ylevels1[-1]: # interpolate between storages at highest flow level
      SDP.prescribedR = np.interp(SDP.simS[year,season],states,SDP_release_policy[:,-1,season])
    else: # interpolate over 2-D grid
      SDP.prescribedR = f(np.array([SDP.simS[year,season],np.log(Qpast)]))[0]

    # find actual release (what is physically possible) and calculate storage from mass balance
    DDP.getSimStates(Qnow, year, season)
    SDP.getSimStates(Qnow, year, season)

    # update past flow for next season's calculation
    Qpast = Qnow

  # calculate total cost (squared deviations from targets) in simulated year
  DDP.getSimCosts(year)
  SDP.getSimCosts(year)
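
The simulation above reseeds NumPy's global random state before every draw. A possible alternative, sketched here as an assumption rather than as the notebook's method, is to draw all innovations from one `numpy.random.Generator` and build the lag-1 correlated standard normal series in a single pass. The function name `simulate_log_flows` and the long 20,000-year demo run are illustrative; the long run is only there to check that the sample lag-1 correlation approaches $\rho$.

```python
import numpy as np

def simulate_log_flows(mu_log, sigma_log, rho, n_years, seed=0):
    """Lag-1 correlated log-space seasonal flows from a single random generator."""
    rng = np.random.default_rng(seed)
    n_seasons = len(mu_log)
    z = np.empty(n_years * n_seasons + 1)
    z[0] = rng.standard_normal()  # prior season's standardized log-flow
    for k in range(1, len(z)):
        # AR(1) in standardized log space keeps unit variance and lag-1 correlation rho
        z[k] = rho * z[k - 1] + np.sqrt(1 - rho**2) * rng.standard_normal()
    # drop the prior-season value, arrange as (year, season), and rescale by season
    z_seasonal = z[1:].reshape(n_years, n_seasons)
    y = np.asarray(mu_log) + np.asarray(sigma_log) * z_seasonal
    return z, y

z_demo, y_demo = simulate_log_flows([2.3, 3.5, 3.1], [0.2, 0.3, 0.3], 0.5, n_years=20000)
lag1 = np.corrcoef(z_demo[:-1], z_demo[1:])[0, 1]
print(f"sample lag-1 correlation: {lag1:.3f}")
```

Real-space flows follow from `np.exp(y_demo)`. This sketch will not reproduce the exact inflow sequence generated in the cell above, because the random draws come from a different stream.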

## DDP vs. SDP Comparison over Simulation

Based on the simulation above, make a three-panel figure of the empirical cumulative distribution function (ECDF) of annual squared deviations: from the storage target in one panel, from the release target in another, and the total across both in the third. Plot the DDP and SDP policies in each panel using a different color for each. Discuss the differences you see in performance between the operating policies found using DDP vs. SDP and why.

In [None]:
# cumulative probabilities
p = np.arange(1,nYears+1,1) / (nYears+1)

# plot ECDF of R, S and total deviations for each policy
fig = plt.figure(figsize=[12,4])

# make list of things to loop through for each plot
# (named *_sim_costs so the SDP_costs array from the optimization is not overwritten)
DDP_sim_costs = [DDP.S_costs, DDP.R_costs, DDP.Total_costs]
SDP_sim_costs = [SDP.S_costs, SDP.R_costs, SDP.Total_costs]
xlabels = ["Squared Storage Deviations", "Squared Release Deviations", "Total Squared Deviations"]

for i in range(len(DDP_sim_costs)):
  ax = fig.add_subplot(1,3,i+1)
  # step-function of sorted deviations from target for each algorithm
  l1, = ax.step(np.sort(DDP_sim_costs[i]),p,color="tab:blue",linewidth=2)
  l2, = ax.step(np.sort(SDP_sim_costs[i]),p,color="tab:green",linewidth=2)
  ax.set_xlabel(xlabels[i], fontsize=16)
  ax.set_ylabel("Cumulative Probability", fontsize=16)
  ax.tick_params(axis="both",labelsize=14)

ax.legend([l1, l2],["DDP","SDP"],fontsize=16,loc="lower right")
fig.tight_layout()
plt.show() # fig.show() warns and draws nothing with the inline notebook backend

SDP far outperforms DDP on squared storage deviations, but does slightly worse on squared release deviations. Overall, though, it has lower total squared deviations.
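
The cumulative probabilities in the plot above are plotting positions $i/(n+1)$ assigned to the sorted annual costs. A minimal sketch of that calculation as a reusable helper is shown below; the function name `ecdf` and the three-value example are assumptions for illustration.

```python
import numpy as np

def ecdf(values):
    """Sorted values and their i/(n+1) plotting positions."""
    x = np.sort(np.asarray(values, dtype=float))
    n = len(x)
    p = np.arange(1, n + 1) / (n + 1)
    return x, p

x_demo, p_demo = ecdf([3.0, 1.0, 2.0])
print(x_demo, p_demo)
```

Dividing by $n+1$ rather than $n$ keeps the largest observation below a cumulative probability of 1, which avoids implying that the simulated maximum can never be exceeded.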

Plot the number of violations of Rmin and Rmax from DDP and SDP over the course of the simulation.

In [None]:
violations = ("Rmin", "Rmax")
solutions = {
    'DDP': (DDP.Rmin_violations, DDP.Rmax_violations),
    'SDP': (SDP.Rmin_violations, SDP.Rmax_violations),
}

x = np.arange(len(violations))  # the label locations
width = 0.25  # the width of the bars
multiplier = 0

fig, ax = plt.subplots(layout='constrained')

for attribute, measurement in solutions.items():
    offset = width * multiplier
    rects = ax.bar(x + offset, measurement, width, label=attribute)
    multiplier += 1

# Add some text for labels, title and custom x-axis tick labels, etc.
ax.set_ylabel('Number of Violations')
ax.set_title('Release Violations')
ax.set_xticks(x + width, violations)
ax.legend(loc='upper left')

DDP has one violation each of Rmin and Rmax, while SDP has no violations of Rmin and one of Rmax.