Dynamic Programming
=======

# Theory
**Dynamic Programming (DP)** 

## Definition


### Algorithm


## Examples
### Production Planning Problem
Find the optimal production schedule that fills an order by the specified delivery date at minimum cost:
$$
\begin{aligned}
&  \max_{x, \theta_r, \theta_h} E \int_0^{\infty} \left( e^{(\xi - \delta)t} \frac{x^{1 - \gamma}}{1 - \gamma} \right) \, dt \\[5pt]
&\text{subject to} \qquad \begin{aligned}
    dW &= \left( a_r \theta_r + a_h \theta_h + \frac{\mu_m}{S_m} + \mu_m \lambda_m \frac{c_m}{p} \right) \\
    &\times W \, dt - x(t) \, dt + \left( \frac{\sigma_r}{S_r} + \sigma_r \lambda_r \frac{c_r}{p} \right) \\
    &\times \theta_r W \, dz_r + \left( \frac{\sigma_h}{S_h} + \sigma_h \lambda_h \frac{c_h}{p} \right) \\
    &\times \theta_h W \, dz_h
\end{aligned}
, \quad W(0)=W_0
\end{aligned}
$$





where $u(t)$ is the production rate and $x(t)$ is the current inventory on hand.
This is an optimal control problem with $u$ as the control variable and $x$ as the state variable.
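
Read in terms of $u$ and $x$, the order-filling problem can be sketched as follows. The running cost $c_1 u^2 + c_2 x$ and the dynamics $x' = u$ mirror the `13*u(t)^2 + 19*x(t)` cost and `ẋ(t) == u(t)` used in the implementation cells of this notebook. The horizon $T$, the order size $B$, and the terminal condition $x(T) = B$ are assumed symbols that the text does not fix.

$$
\begin{aligned}
& \min_{u} \int_0^{T} \left( c_1 u(t)^2 + c_2 x(t) \right) dt \\[5pt]
&\text{subject to} \qquad x'(t) = u(t), \quad x(0) = 0, \quad x(T) = B
\end{aligned}
$$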
$$\begin{aligned}\underline{\text{Solution}}&\end{aligned}$$
Apply the Hamilton-Jacobi-Bellman equation, which follows from the principle of optimality:
$$
-J_{t}(t,x)=\min_{u}\; [\;\overbrace{f(t,x,u)}^{c_1 u^2 + c_2x} + J_{x}(t,x)\, \overbrace{g(t,x,u)}^{x^\prime = u}\;]
$$
$$
\Rightarrow\quad -J_{t} = \min_{u}\; [\; (c_1 u^2 + c_2x) + J_{x}u\;]
$$
where $J(t,x)$ is the value function.

Minimize the bracketed term with respect to $u$:
$$
\frac{\partial}{\partial u} [\; (c_1 u^2 + c_2x) + J_{x}u\;] = 2c_1 u + J_x = 0 \qquad \Rightarrow \qquad u = \frac{-J_x}{2c_1}
$$

Substitute $u = -J_x/(2c_1)$ back into the Hamilton-Jacobi-Bellman equation:
$$
\begin{aligned}
-J_{t} &= \dfrac{1}{4c_1}J_{x}^{2}+c_{2}x  -\dfrac{J_x^{2}}{2c_{1}}\\[10pt]
\therefore\qquad 0 &= J_{t}+c_2x-\dfrac{J_x^{2}}{4c_1}
\end{aligned}
$$
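
The minimization and substitution steps can be checked numerically. The following is a minimal sketch, assuming $c_1 = 13$ and $c_2 = 19$ (the coefficients of the cost `13*u(t)^2 + 19*x(t)` used later in this notebook) and arbitrary illustrative values for $x$ and $J_x$; the names `h`, `u_star` and `h_min` are hypothetical.

```julia
# Assumed illustrative values: c1, c2 taken from the cost 13u^2 + 19x;
# x and Jx are arbitrary test points, not results of any computation.
c1, c2 = 13.0, 19.0
x, Jx = 2.0, -5.0

# Bracketed term of the HJB equation as a function of the control u
h(u) = c1 * u^2 + c2 * x + Jx * u

u_star = -Jx / (2c1)             # first-order condition 2*c1*u + Jx = 0
h_min = c2 * x - Jx^2 / (4c1)    # closed form obtained after substitution

println("u* = ", u_star)
println("h(u*) = ", h(u_star), "  closed form = ", h_min)
```

Because the bracketed term is a convex quadratic in $u$ whenever $c_1 > 0$, the stationary point is the global minimizer, so `h(u_star)` and `h_min` should agree up to rounding.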
The differential equation of optimal control for $\lambda(t)$
```
Placeholder
```
The PDE $0 = J_t + c_2x - J_x^2/(4c_1)$ is solved analytically using the ansatz $J(t,x) = a + bxt + hx^2/t + kt^3$, whose derivatives are
$$
J_{t} = bx - hx^2/t^2 + 3kt^2,  \qquad\qquad\qquad J_{x} = bt + 2hx/t
$$
Substituting $J_t$ and $J_x$ back gives
$$
0 = bx - hx^2/t^2 + 3kt^2 + c_2x - \dfrac{(bt + 2hx/t)^{2}}{4c_1}
$$
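
As a sketch of the next step, expand the square in $0 = J_t + c_2x - J_x^2/(4c_1)$ under the ansatz $J = a + bxt + hx^2/t + kt^3$ and match the coefficients of $x$, $x^2/t^2$ and $t^2$. The constant $a$ drops out, and which root of $h$ applies depends on boundary conditions that are not stated here.

$$
\begin{aligned}
x:&\quad b + c_2 - \frac{bh}{c_1} = 0 \\[5pt]
x^2/t^2:&\quad -h - \frac{h^2}{c_1} = 0 \;\Rightarrow\; h = 0 \ \text{ or } \ h = -c_1 \\[5pt]
t^2:&\quad 3k - \frac{b^2}{4c_1} = 0 \;\Rightarrow\; k = \frac{b^2}{12c_1}
\end{aligned}
$$

For the nontrivial root $h = -c_1$, the first condition gives $b = -c_2/2$ and the third gives $k = c_2^2/(48c_1)$.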

# Implementation


## Imports

In [1]:
using OptimalControl
using NLPModelsIpopt
using Plots

ocp = @def begin

    t ∈ [0, 1], time
    x ∈ R, state
    u ∈ R, control

    x(0) == 5

    ẋ(t) == -x(t)^3 + u(t)

    ∫( -0.5(x(t)^2 + u(t)^2) ) → max

end;

sol = solve(ocp)

plot(sol)


In [None]:
using OptimalControl
using NLPModelsIpopt
using Plots

ocp = @def begin

    t ∈ [0, 10], time
    x ∈ R, state
    u ∈ R, control

    x(0) == 0

    ẋ(t) == u(t)

    ∫( 13*u(t)^2 + 19*x(t) ) → min

end;

sol = solve(ocp)

plot(sol)

## Parameters

## Algorithm


## Results Visualization & Behavior Analysis


In [None]:
using DifferentialEquations, Plots
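μₕ = 0.26          # drift: mean stormwater inflow
σₕ = 0.14          # diffusion: standard deviation of the inflow
S₀ = 0.0           # initial stock, a Float64 so the state can take non-integer values
f(S, p, t) = μₕ    # drift term of the SDE
g(S, p, t) = σₕ    # diffusion term of the SDE
dt = 1 / 12
tspan = (0.0, 12.0)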
prob = SDEProblem(f, g, S₀, tspan)

sol = solve(prob, EM(), dt = dt)

plot(sol)
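
To make explicit what the `EM()` solver computes for this constant-coefficient SDE, here is a minimal hand-rolled Euler-Maruyama sketch using only the standard library. The function name `euler_maruyama` and the fixed seed are assumptions for illustration; the parameter values are those of the cell above.

```julia
using Random

# Euler-Maruyama for dS = μ dt + σ dW with constant drift μ and diffusion σ.
function euler_maruyama(μ, σ, S0, tspan, dt; rng = Random.default_rng())
    n = round(Int, (tspan[2] - tspan[1]) / dt)   # number of time steps
    S = Vector{Float64}(undef, n + 1)
    S[1] = S0
    for k in 1:n
        dW = sqrt(dt) * randn(rng)               # Brownian increment ~ N(0, dt)
        S[k + 1] = S[k] + μ * dt + σ * dW
    end
    return S
end

# Same parameters as the DifferentialEquations cell; seed fixed for repeatability.
path = euler_maruyama(0.26, 0.14, 0.0, (0.0, 12.0), 1 / 12; rng = MersenneTwister(1))
println("steps: ", length(path) - 1, ", final stock: ", path[end])
```

With `σ = 0` the scheme reduces to the deterministic path `S(t) = μ t`, which gives a quick sanity check on the step count and the drift.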

In [None]:
# First, ensure you have MarkovBounds.jl installed
using MarkovBounds
using Distributions, Random

# Parameters
γ = 3.5  # risk aversion parameter
δ = 0.03  # discount rate
ξ = 0.02  # population growth rate
p = 2.47  # water price ($/kL)

# Water sources: reservoirs (r), stormwater (h), manufactured (m)
mean_inflows = Dict(:r => 584.0, :h => 0.26, :m => 0.0)  # in GL
stddev_inflows = Dict(:r => 197.0, :h => 0.14, :m => 0.0)
costs = Dict(:r => 0.0, :h => 0.29, :m => 1.08)  # operating costs per kL
fixed_costs = Dict(:r => 1166.0, :h => 0.39, :m => 656.0)  # annual fixed capital cost in $M

# State and Control Variables
# Define initial stocks and capacities
initial_stock = Dict(:r => 387.0, :h => 0.09, :m => 150.0)  # in GL
capacity = Dict(:r => 1290.0, :h => 0.22, :m => 150.0)

# Utility Function: Iso-elastic (power) utility
function utility(x; γ=γ)
    return x^(1 - γ) / (1 - γ)
end

# Transition Dynamics (using a simple Euler-Maruyama scheme)
function water_dynamics(stock, inflow, σ, dt=1.0)
    return stock + inflow * dt + σ * sqrt(dt) * randn()
end

# Discounted reward for each period (expected utility of water consumption)
function reward(x)
    return utility(x)
end

# Define State Transition
function transition(state, action, noise)
    new_state = Dict()
    for src in [:r, :h, :m]
        inflow = mean_inflows[src]
        σ = stddev_inflows[src]
        new_state[src] = water_dynamics(state[src], inflow, σ) - action[src]
        new_state[src] = max(0, min(capacity[src], new_state[src]))  # enforce capacity limits
    end
    return new_state
end

# Set up the Markov Decision Process (MDP)
states = [
    Dict(:r => s_r, :h => s_h, :m => s_m)
    for s_r in 0:100:capacity[:r]
    for s_h in 0:0.05:capacity[:h]
    for s_m in 0:100:capacity[:m]
]

actions = [
    Dict(:r => a_r, :h => a_h, :m => a_m)
    for a_r in 0:10:100
    for a_h in 0:0.01:0.1
    for a_m in 0:10:50
]

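# Transition probabilities (based on stochastic inflows)
# NOTE: `MDP` and `add_transition!` (and `policy_iteration` below) are assumed helpers;
# they are not defined anywhere in this notebook.
mdp = MDP(states, actions, discount=exp(-δ))
for (s, a) in Iterators.product(states, actions)  # Base iterator, no IterTools import needed
    next_s = transition(s, a, randn())
    # Reward: utility of total consumption across the three sources (assumed interpretation)
    add_transition!(mdp, s, a, next_s, reward(sum(values(a))))
end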

# Solve using policy iteration
policy, value = policy_iteration(mdp)

# Display the optimal policy and value function
println("Optimal Policy:")
for s in states
    println("State $s => Action $(policy[s])")
end

println("\nValue Function:")
for s in states
    println("State $s => Value $(value[s])")
end
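
As a self-contained illustration of the same dynamic-programming idea, here is a minimal sketch that swaps in value iteration (not policy iteration) on a deliberately tiny, deterministic single-reservoir model. The stock grid (levels 1 to 5), the one-unit inflow per period, and the function name `value_iteration` are assumptions made for illustration; only the iso-elastic utility with γ = 3.5 and the discount factor `exp(-0.03)` come from the cell above.

```julia
# Iso-elastic utility with the notebook's risk-aversion parameter γ = 3.5
utility(x; γ = 3.5) = x^(1 - γ) / (1 - γ)

# Value iteration on stock levels 1:cap. Each period the planner releases
# a ∈ 1:s units, then `inflow` units arrive (assumed inflow ≥ 1 so the stock stays ≥ 1).
function value_iteration(; cap = 5, inflow = 1, β = exp(-0.03), tol = 1e-10, maxiter = 10_000)
    V = zeros(cap)                 # value function, indexed by stock level
    policy = ones(Int, cap)        # greedy release for each stock level
    for _ in 1:maxiter
        Vnew = similar(V)
        for s in 1:cap
            best, best_a = -Inf, 1
            for a in 1:s
                s_next = min(cap, s - a + inflow)      # capacity-limited next stock
                q = utility(a) + β * V[s_next]         # Bellman right-hand side
                if q > best
                    best, best_a = q, a
                end
            end
            Vnew[s], policy[s] = best, best_a
        end
        Δ = maximum(abs.(Vnew .- V))
        V = Vnew
        Δ < tol && break           # stop once successive iterates agree
    end
    return V, policy
end

V, policy = value_iteration()
println("Value function: ", V)
println("Policy (release per stock level): ", policy)
```

At stock level 1 the only feasible release is 1 and the stock returns to 1, so `V[1]` should equal `utility(1) / (1 - β)`; this gives a closed-form check on the iteration.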

In [None]:
using MarkovBounds
using DynamicPolynomials
using SemialgebraicSets

@polyvar(x[1:2]) # state variables
@polyvar(u) # control variables
@polyvar(t) # time variable
X = @set(x[1] >= 0 && x[2] >= 0) # state space

γ = [1, 2, 1, 2, 0.25*0.1] # model parameters

f = [γ[1] * x[1] - γ[2] * x[1] * x[2] ;
     γ[4] * x[1] * x[2] - γ[3] * x[2] - x[2]*u] # drift coefficient

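g = [γ[5]*x[1]; 0*x[1]] # diffusion coefficient (0*x[1] keeps the zero entry polynomial-typed)
σ = polynomial.(g*g') # diffusion matrix

lv = DiffusionProcess(x, f, σ, X, iv = t, controls = [u]) # the installed MarkovBounds takes the time variable as `iv`, not `time`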


U = @set(u >= 0 && u <= 1) # set of admissible controls
stagecost = (x[1]-0.75)^2 + (x[2] - 0.5)^2/10 + (u - 0.5)^2/10
obj = Lagrange(stagecost) # Lagrange type objective
T = 10.0 # control horizon
lv_control = ControlProcess(lv, T, U, obj)

