<a href="https://colab.research.google.com/github/arashfahim/Stochastic_Control_FSU/blob/main/Deep_BSDE.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
!git clone https://github.com/arashfahim/Stochastic_Control_FSU

Cloning into 'Stochastic_Control_FSU'...
remote: Enumerating objects: 111, done.[K
remote: Counting objects: 100% (14/14), done.[K
remote: Compressing objects: 100% (12/12), done.[K
remote: Total 111 (delta 3), reused 2 (delta 2), pack-reused 97[K
Receiving objects: 100% (111/111), 2.84 MiB | 6.56 MiB/s, done.
Resolving deltas: 100% (52/52), done.


In [3]:
! ls -l

total 8
drwxr-xr-x 1 root root 4096 Dec  4 14:27 sample_data
drwxr-xr-x 4 root root 4096 Dec  7 14:40 Stochastic_Control_FSU


In [4]:
path = r'/content/Stochastic_Control_FSU/arashfahim/Stochastic_Control_FSU/'

In [5]:
# mount google drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


# Sotchastic control and BSDEs

The goal of this notebook is to implement a BSDE dynamic programming principle, DPP, for a stochastic control problem.

The stochastic control problem is given by

$\inf_{u}\mathbb{E}\bigg[\int_{0}^{T}e^{-rt}C(t,X^u_t,u_t)dt+e^{-rT}g(X^u_T)\bigg]$

with

$dX_t=\mu(t,X^u_t,u_t)dt+\sigma(t,X^u_t,u_t)dB_t$



We assume that $\sigma$ does not depend on $u$. When $\sigma$ depends on $u$, the BSDE is second order and more involve.

$dX^u_t=\mu(t,X^u_t,u_t)dt+\sigma(t,X^u_t)dB_t$

The HJB is given by

$\begin{cases}
\partial_t V(t,x) + \frac{\sigma^2(t,x)}{2}\partial^2_x V(t,x) -rV(t,x) + F(t,x,\partial_x V(t,x))=0\\
V(T,x)=g(x)
\end{cases}$

where

$F(t,x,p)=\inf_u\Big\{C(t,x,u)+p\mu(t,x,u)\Big\}$


The BSDE is given by


$Y_t=Y_T+\int_t^T e^{-r(s-t)}\Big(F(s,X_s,Z_s)dt-Z_s\sigma(s,X_s)dB_s\Big)$

Or

$\begin{cases}
dY_t=-F(t,X_t,Z_t)dt+Z_t\sigma(t,X_t)dB_t\\
Y_T=g(X_T)
\end{cases}$

where $dX_t=\sigma(t,X_t)dB_t$ is free of control.


$Y_t$ and $Z_t$ are related to the solution to the HJB by $Y_t=V(t,X_t)$ and $Z_t=\partial_x V(t,X_t)$.


Note that here both $Y$ and $Z$ are unknowns. Particularly, $Y_0=V(0,X_0)$ is not known.

# Numerical approximation of PDEs with BSDEs

We discretize the BSDE:


$\begin{cases}
Y_{t_{i+1}}= Y_{t_i}-F({t_i},X_{t_i},Z_{t_i})dt+Z_{t_i}\sigma({t_i},X_{t_i})\Delta B_{t_{t+1}}\\
Y_T=g(X_T)
\end{cases}$

where $X_{t_{i+1}}=X_{t_i}+\sigma(t_i,X_{t_i})\Delta B_{t_{t+1}}$

If $\hat{Y}$ is an approximate solution to the BSDE through the above method, $\hat{Y}_{t_i}$ is an approximation of $V(t_i,X_{t_i})$. Specifically, if we start from  $\hat{Y}_{0}$ and run the discretize BSDE forward, instead of backward, we must obtain $g(X_T)$ at time $T$.

This is a basis for the deep numerical solution for the BSDE. Let's introduce two neural networks ${\hat Y}_{t}=\Phi(t,x;\alpha)$ and ${\hat Z}_{t}=\Psi(t,x;\beta)$.


Then, simulate $X_0$ by some distribution and increment of Brownina motion to find sample paths through ${\hat X}_{t_{i+1}}={\hat X}_{t_i}+\sigma(t_i,{\hat X}_{t_i})\Delta B_{t_{t+1}}$.

Then, run the BSDE discretization forward in time:


$\Phi(t_{i+1},{\hat X}_{t_{i+1}};\alpha)= \Phi(t_i,{\hat X}_{t_{i}};\alpha)-F\big({t_i},{\hat X}_{t_i},\Psi({t_i},{\hat X}_{t_i};\beta)\big)\Delta t+\Psi(t_{i},{\hat X}_{t_i};\beta)\sigma({t_i},{\hat X}_{t_i})\Delta B_{t_{t+1}}$


Finally, find optimal $\alpha$ and $\beta$ by

$\inf_{\alpha,\beta}\mathbb{E}\Big[\Big(g({\hat X}_T)-\Phi_T({\hat X}_{T};\alpha)\Big)\Big]$

In [6]:
import torch
import torch.nn as nn
import pandas as pd
import numpy as np
import time
from time import strftime, localtime
import torch.optim as optim #import optimizer
# import torch.optim.lr_scheduler as lr_scheduler
import matplotlib
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D

M =10000# number of samples
X = torch.normal(0., 1., size=(M,1))# samples for X~N(0,1)
Y = torch.exp(-X)+torch.exp(X)*torch.normal(0., 1., size=(M,1))# samples for Y=e^{-x}+e^{x}N(0,1)
# X.shape
T=1# terminal horizon
N = 10 # of time steps
Dt= torch.Tensor([T/N])# time step size

In [7]:
# create time stamp to save the result
stamp = strftime('%Y-%m-%d %H:%M:%S', localtime())
print(str(stamp))

2023-12-07 14:41:47


## All at once

In [8]:
class Z_NN(nn.Module):#multi_step, optimal_control
    def __init__(self):
        super(Z_NN, self).__init__()
        self.layer = torch.nn.Sequential()
        self.layer.add_module("L1",torch.nn.Linear(2, 16))
        self.layer.add_module("Tanh", torch.nn.Tanh())
        self.layer.add_module("L2",torch.nn.Linear(16,1))
    def forward(self, tx):
        val = self.layer(tx)
        return val

In [9]:
class Y_NN(nn.Module):#multi_step, optimal_control
    def __init__(self):
        super(Y_NN, self).__init__()
        self.layer = torch.nn.Sequential()
        self.layer.add_module("L1",torch.nn.Linear(1, 16))
        self.layer.add_module("Tanh", torch.nn.Tanh())
        self.layer.add_module("L2",torch.nn.Linear(16,1))
    def forward(self, tx):
        val = self.layer(tx)
        return val

In [148]:
class BSDE(classmethod):#multi_step, optimal_control
    def __init__(self,terminal):
        self.Y_NN = Y_NN()
        self.Z_NN = Z_NN()
        self.M = 1000# number of samples
        self.Te=1.0 # terminal time
        self.K = 20 # total number of steps
        self.Dt= torch.Tensor([self.Te/self.K]) #time step
        self.num_steps = 5 #number of steps to evaluate <= self.K
        # self.N = torch.normal(0., 1., size=[self.M,self.num_steps,1])
        self.terminal = terminal # terminal value function
        self.X0 = torch.normal(0., 1., size=(self.M,1))
        for i in range(self.K):# create time data points
            if i == 0:
                T = torch.Tensor([0.0]).repeat([self.M,1])
            else:
                D = i*self.Dt# time step i
                T = torch.cat((T,D.repeat([self.M,1])),axis=1)
        self.T = T.unsqueeze(2)
        self.B = torch.normal(0., 1., size=(self.M,self.K,1))
        for i in range(self.K-self.num_steps,self.K):
                if i == self.K-self.num_steps:
                    update = torch.cat((self.T[:,i],self.X0),axis=1)# cat time and space
                    self.TX = update.unsqueeze(-1) # add dimension fot bmm
                else:
                    X = update[:,1].unsqueeze(-1)# slice space for update
                    X = X +  torch.sqrt(self.Dt)*self.B[:,i-1] #update, += is prohibited here because it fundamentally change
                    update = torch.cat((self.T[:,i],X),axis=1) # cat  time and space
                    self.TX = torch.cat((self.TX,update.unsqueeze(-1)),axis=-1) # cat to previous data
        X = update[:,1].unsqueeze(-1) #slice last space
        X = X +  torch.sqrt(self.Dt)*self.B[:,-1]# last update, += is prohibited here
        update = torch.cat((torch.Tensor([self.Te]).repeat([self.M,1]),X),axis=1) # last cat
        self.TX = torch.cat((self.TX,update.unsqueeze(-1)),axis=-1) #last cat to previous data

    def F(self,tx):#take a [N,1] tensor
        # tx = torch.cat((t,x),axis=1)
        return torch.Tensor([0.5])*torch.pow(self.Z_NN.forward(tx),2)-self.Z_NN.forward(tx)*(tx[:,1].unsqueeze(-1))

    def loss(self):
        for i in range(self.K-self.num_steps,self.K):
                if i == self.K-self.num_steps:
                    # print(self.TX[:,1,0].shape)
                    Y = self.Y_NN(self.TX[:,1,0].unsqueeze(-1))
                else:
                    # print(self.TX[:,:,i-self.K+self.num_steps].shape,self.Z_NN(self.TX[:,:,i-self.K+self.num_steps]).shape,self.B[:,i-1].shape)
                    Y = Y + self.F(self.TX[:,:,i-self.K+self.num_steps])*self.Dt + self.Z_NN(self.TX[:,:,i-self.K+self.num_steps])*self.B[:,i-1]
        Y = Y + self.F(self.TX[:,:,-1])*self.Dt + self.Z_NN(self.TX[:,:,-1])*self.B[:,-1]
        return torch.mean(torch.pow(self.terminal(self.TX[:,1,-1]).unsqueeze(-1)-Y,2)) #mean value of running and terminal

In [149]:
class terminal(nn.Module):
    def __init__(self,flag,*args):
        super(terminal, self).__init__()
        self.layer = torch.nn.Sequential()
        self.layer.add_module("L1",torch.nn.Linear(1, 16))
        self.layer.add_module("Tanh", torch.nn.Tanh())
        self.layer.add_module("L2",torch.nn.Linear(16,1))
        self.flag = flag
    def forward(self, x):
        if self.flag == 'T':
            return torch.Tensor([0.5])*torch.pow(x,2)-x
        else:
            val = self.layer(x)
            return val
        return x

In [150]:
g = terminal('T')
v2 = BSDE(g)
# v2.value()

In [151]:
torch.autograd.set_detect_anomaly(True);

In [154]:
loss_epoch = []
num_epochs = 1000
lr = 1e-2
parameters = list(v2.Y_NN.parameters()) + list(v2.Z_NN.parameters())
optimizer2 = optim.Adam(parameters, lr)
L_ = torch.Tensor([-2.0])
loss = torch.Tensor([2.0])
epoch=0
while (torch.abs(L_-loss)>1e-6) & (epoch <= num_epochs):# epoch in range(num_epochs):
  optimizer2.zero_grad()
  loss= v2.loss()#self.cost(self.X,self.modelu(X))+ torch.mean(self.terminal(update(self.X,self.modelu(X))))#
  loss.backward()
  optimizer2.step()
  loss_epoch.append(loss)
  if epoch>0:
    L_ = loss_epoch[epoch-1]
  if (epoch % 10==0):
    print("At epoch {} the mean error is {}.".format(epoch,loss.detach()))
  epoch += 1

At epoch 0 the mean error is 0.0067556435242295265.
At epoch 10 the mean error is 0.016793539747595787.
At epoch 20 the mean error is 0.009750490076839924.
At epoch 30 the mean error is 0.00776706775650382.
At epoch 40 the mean error is 0.006860933266580105.
At epoch 50 the mean error is 0.0067389607429504395.
At epoch 60 the mean error is 0.0067915283143520355.
At epoch 70 the mean error is 0.006707805674523115.


## Value function

In [None]:
fig = plt.figure()
plt.scatter(a.numpy(),b,s=5,label='vval2d T-5Delta t',marker='o');
plt.scatter(x.clone().detach().numpy(),vval1(x).clone().detach().numpy(),s=5,label='vval1 T-5Delta t',marker='o');

plt.legend();
plt.show();