<h1> <center> Lab Session : Deep Learning for PDE </center> </h1>



<h2> 📌 Objectives: </h2>

This lab session aims to provide a hands-on implementation of the methods presented during the course, focusing on how Deep Learning techniques can be applied to solve Partial Differential Equations (PDEs) that commonly arise in financial mathematics. You will explore and implement algorithms introduced in lectures using Python and its scientific libraries.


<h2>📚 Goal of the Lab: </h2>

By the end of this lab, you will be able to:

- Understand and implement the **Deep Galerkin Method (DGM)** and the **Deep BSDE method**.

- Apply these methods to solve financial PDEs such as those arising in option pricing and risk management.

- Analyze numerical results and compare them to analytical or benchmark solutions.
    
<h2> 🗂️ Lab Structure and assignments: </h2>

This notebook is organized into the following sections:

**1. [On the DGM](#galerkin-Applications)**  
&nbsp;&nbsp;&nbsp;&nbsp;1.1 [Methodology and Implementation](#galerkin-reminder)  
&nbsp;&nbsp;&nbsp;&nbsp;1.2 [Numerical results on various PDE](#galerkin-results)  

**2. [On the Deep BSDE Solver](#deepBSDE-Applications)**  
&nbsp;&nbsp;&nbsp;&nbsp;2.1 [Methodology and Implementation](#deepBSDE-reminder)  
&nbsp;&nbsp;&nbsp;&nbsp;2.2 [Numerical results on various PDE](#deepBSDE-results)  

**3. [References](#references)**  

 Each subsection of the lab includes **mathematics** and/or **coding** questions, indicated by ❓. **Your answers**, indicated by ✏️, will count toward your final grade for the course, with a weight relative to the project to be determined later. Note that the project will have a significantly higher weight in the final grade.

**Mathematics Questions**

- You can answer directly in the **Jupyter notebook** using LaTeX (compatible with Markdown).



**Coding Questions**

-  Complete the corresponding code sections **directly in the notebook**.
-  **Code readability**, **quality**, and **clarity of comments** will be taken into account in the **grading**.


If you choose this lab, you will have to send your work by e-mail to [samy.mekkaoui@polytechnique.edu](mailto:samy.mekkaoui@polytechnique.edu). The submission deadline will be announced later during the course.

<h2>ℹ️ Other information: </h2>




- **Key References**: If you want to go deeper into the use of Deep Learning methods for PDEs, see the [References](#references) section. <br> <br>



- **Contact**: If you find any mistakes in this notebook, or have any other feedback or questions, please feel free to e-mail me at [samy.mekkaoui@polytechnique.edu](mailto:samy.mekkaoui@polytechnique.edu).

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.distributions.normal import Normal

import math
from scipy import stats
import numpy as np
import matplotlib.pyplot as plt
from tqdm.auto import tqdm  # notebook progress bar when available, plain text otherwise


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

<a id=galerkin-Applications></a>

<center> <h1> On  the DGM </h1> </center>

<a id=galerkin-reminder></a>

<h2> 1.1 : Methodology and Implementation: </h2>

In this subsection, you are going to implement the Deep Galerkin Method. You will first implement a neural network representing a function $(t,s) \in \mathbb{R}^2 \mapsto v(t,s) \in \mathbb{R}$, with 2 hidden layers of $100$ neurons each and the `tanh` activation function. The map from the last hidden layer to the output is simply linear (no activation).


 ❓ **Question 1.1.1**: Fill in the definition of the network, in particular the `nn.Sequential` part, using `nn.Linear` and the `tanh` activation function. 
 

`Hint`: To implement the neural network, you are going to use PyTorch. The `nn.Sequential` container makes the code structure very convenient, so we encourage you to use it. If you are new to PyTorch, you can watch this video [here](https://www.youtube.com/watch?time_continue=1220&v=IC0_FRiX-sw&embeds_referring_euri=https%3A%2F%2Fdocs.pytorch.org%2F&source_ve_path=MTM5MTE3LDEzOTExNywyODY2Ng).


In [None]:
# Defining the Neural Network architecture
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc = nn.Sequential(
            # Write your code here ...
        )

    def forward(self, x):
        return self.fc(x)

v_net = Net().to(device)
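
As an illustration of the `nn.Sequential` pattern described above, here is a minimal sketch of a fully connected network with two inputs $(t,s)$, two hidden layers of 100 `tanh` units and a linear output. The class name `SketchNet` and the `hidden` argument are hypothetical; this is one possible layout under those assumptions, not the reference solution.

```python
import torch
import torch.nn as nn

class SketchNet(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, hidden: int = 100):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(2, hidden),       # input (t, s) -> first hidden layer
            nn.Tanh(),
            nn.Linear(hidden, hidden),  # second hidden layer
            nn.Tanh(),
            nn.Linear(hidden, 1),       # linear output, no activation
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(x)

sketch_net = SketchNet()
sample = torch.rand(5, 2)        # 5 points (t, s)
sample_out = sketch_net(sample)  # one scalar value per point
```

The input tensor stacks $t$ and $s$ as two columns, so a batch of `n` points has shape `(n, 2)` and the output has shape `(n, 1)`.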


Now, for your previous neural network, denoted by $v$, which takes as input $(t,s) \in [0,T] \times \mathbb{R}^+$, you are going to compute its derivatives using PyTorch's automatic differentiation through the `torch.autograd` module.


 ❓ **Question 1.1.2**: Complete the function `compute_derivatives` so that it returns the derivatives of your neural network. You will use the `torch.autograd` module.
 


In [None]:
# Function to compute first and second derivatives of the network output
def compute_derivatives(
    net: torch.nn.Module,         # Neural network taking (T, S) as input
    S: torch.Tensor,              # Underlying asset(s), shape: (batch_size, input_dim)
    T: torch.Tensor               # Time, shape: (batch_size, 1)
) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:
    """
    Returns:
        output:          net(T, S)
        d_output_dT:     ∂output / ∂T
        d_output_dS:     ∂output / ∂S
        d2_output_dS2:   ∂²output / ∂S²
    """
    # Ensure T and S require gradients
    # Concatenate inputs and forward pass through net
    # Compute first-order gradients w.r.t. T and S
    # Compute second-order gradient w.r.t. S

    return   # output, d_output_dT, d_output_dS, d2_output_dS2
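
To illustrate how `torch.autograd.grad` can produce first and second derivatives, here is a small sketch applied to a known function $u(t,s) = t\,s^3$ instead of a neural network, so that the results can be checked by hand. The names `derivatives_sketch` and `u` are hypothetical, and the column ordering $(t,s)$ of the concatenated input is an assumption.

```python
import torch

def derivatives_sketch(func, S, T):
    # Work on fresh leaf tensors that track gradients
    S = S.clone().detach().requires_grad_(True)
    T = T.clone().detach().requires_grad_(True)
    # Assumed input layout: first column is t, second column is s
    out = func(torch.cat([T, S], dim=1))
    ones = torch.ones_like(out)
    # create_graph=True keeps the graph so we can differentiate a second time
    d_dT, d_dS = torch.autograd.grad(out, (T, S), grad_outputs=ones, create_graph=True)
    (d2_dS2,) = torch.autograd.grad(d_dS, S, grad_outputs=torch.ones_like(d_dS), create_graph=True)
    return out, d_dT, d_dS, d2_dS2

# Check on u(t, s) = t * s^3: u_t = s^3, u_s = 3 t s^2, u_ss = 6 t s
u = lambda x: x[:, :1] * x[:, 1:] ** 3
S_demo = torch.tensor([[1.0], [2.0]])
T_demo = torch.tensor([[0.5], [1.0]])
out, u_t, u_s, u_ss = derivatives_sketch(u, S_demo, T_demo)
```

Passing `grad_outputs` of ones is equivalent to differentiating the sum of the outputs, which gives per-sample derivatives because each output only depends on its own row of inputs.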


<a id=galerkin-results></a>


<h2> 1.2 : Numerical results on various PDE: </h2>

<h3> The Black-Scholes PDE: </h3>

We give you below the classes `Call` and `Forward` which for a given strike, maturity and quantity can compute the price and the payoff. They will be useful in the following.

In [None]:
r = 0.02
sigma = 0.2

class Call():
    def __init__(self, K, T,q):
        self.T = T # Maturity
        self.K = K # Strike
        self.q = q # Quantity

    def d_plus(self, St, t):
        return (torch.log(St/self.K) + (r + 0.5 * sigma**2) * torch.tensor(self.T-t)) / (sigma * torch.sqrt(torch.tensor(self.T-t)))

    def d_minus(self, St, t):
        return self.d_plus(St, t) - sigma * torch.sqrt(torch.tensor(self.T-t))

    def delta(self, St, t):
        delta = self.q * Normal(0, 1).cdf(self.d_plus(St, t))
        return delta

    def price(self, St, t):
        return self.q * (St * Normal(0, 1).cdf(self.d_plus(St, t)) - self.K * Normal(0, 1).cdf(self.d_minus(St, t)) * torch.exp(torch.tensor(-r*(self.T-t))))
    
    def g(self,St,K):
        return torch.max(St-self.K,torch.tensor(0.0))

class Forward():
    def __init__(self, K, T,q):
        self.T = T # Maturity
        self.K = K # Strike
        self.q = q # Quantity

    def price(self,St,t):
        return self.q* (St - self.K*torch.exp(torch.tensor(-r*(self.T-t))))

    def g(self,St,K):
        return St - self.K

In [24]:
CallOption = Call(110, 1,1)
ForwardOption=Forward(100,1,1)

 ❓ **Question 1.2.1**: Recall the Black-Scholes PDE for a European option with payoff $g(S_T)$, for a given function $g$.

✏️ **Your answer**: 

 ❓ **Question 1.2.2**: You are going to implement the loss function for the Black-Scholes PDE. To do so, compute the loss due to the PDE residual using the `compute_derivatives` function, then compute the terminal loss using the terminal function $g$. Fill in the missing parts of the code below.

In [22]:
# Loss function for training the value network to solve the Black-Scholes PDE
def loss_fn(
    S: torch.Tensor,                  # Underlying asset price(s), shape: (batch_size, 1)
    T: torch.Tensor,                  # Time(s), shape: (batch_size, 1)
    value_net: torch.nn.Module,       # Neural network approximating V(T, S)
    Option,               # Portfolio object ( can be either a Call or a Forward)
) -> torch.Tensor:
    """
    Returns:
        total_loss: scalar loss combining PDE residual and terminal condition error
    """

    # --- Step 1: Compute value and derivatives ---
    # Compute V(T, S), ∂V/∂T, ∂V/∂S, ∂²V/∂S² using autograd
    v, dv_dt, dv_dS, d2v_dS2 = compute_derivatives(value_net, S, T)

    # --- Step 2: Compute Black-Scholes operator applied to V ---
    # A_v = r * S * ∂V/∂S + 0.5 * σ² * S² * ∂²V/∂S²
    # Full residual: ∂V/∂T + A_v - r * V

    loss_pde = ...  # torch.mean((dv_dt + A_v - r * v) ** 2)

    # --- Step 3: Compute terminal condition loss ---
    # Extract the strike price from the Portfolio contract
    strike = Option.K # 0

    # Compute the true terminal value using the payoff function
    terminal_target = ...

    # Build input with time set to maturity T_max
    T_terminal = ...

    # Evaluate network prediction at T = T_max
    v_terminal_pred = ...

    # Compare prediction to target at maturity
    loss_terminal = ...  # torch.mean((v_terminal_pred - terminal_target) ** 2)

    # --- Step 4: Combine both loss terms ---
    total_loss = loss_pde + loss_terminal

    return total_loss
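
As a sanity check of the residual written in the comments of `loss_fn`, the sketch below evaluates it on the closed-form Black-Scholes call price with automatic differentiation; the residual should vanish up to floating-point error. The names `norm_cdf`, `bs_call_torch` and the `*_demo` constants are hypothetical, with values copied from the cells above (strike 110, maturity 1).

```python
import math
import torch

r_demo, sigma_demo, K_demo, T_demo = 0.02, 0.2, 110.0, 1.0  # same values as the cells above

def norm_cdf(x):
    # Standard normal CDF written with erf so it is differentiable by autograd
    return 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))

def bs_call_torch(t, s):
    tau = T_demo - t
    d_plus = (torch.log(s / K_demo) + (r_demo + 0.5 * sigma_demo**2) * tau) / (sigma_demo * torch.sqrt(tau))
    d_minus = d_plus - sigma_demo * torch.sqrt(tau)
    return s * norm_cdf(d_plus) - K_demo * torch.exp(-r_demo * tau) * norm_cdf(d_minus)

t = torch.tensor([[0.2], [0.5]], dtype=torch.float64, requires_grad=True)
s = torch.tensor([[90.0], [120.0]], dtype=torch.float64, requires_grad=True)
v = bs_call_torch(t, s)
ones = torch.ones_like(v)
v_t, v_s = torch.autograd.grad(v, (t, s), grad_outputs=ones, create_graph=True)
(v_ss,) = torch.autograd.grad(v_s, s, grad_outputs=ones)

# Residual: dV/dt + r S dV/dS + 0.5 sigma^2 S^2 d2V/dS2 - r V
residual = v_t + r_demo * s * v_s + 0.5 * sigma_demo**2 * s**2 * v_ss - r_demo * v
```

A trained network will only satisfy this approximately, so the same quantity evaluated on the network gives a direct measure of how well the PDE is solved.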



 ❓ **Question 1.2.3**: You are now going to implement the training procedure. First, recall how the sampling of training points works in the DGM method.
 
 ✏️ **Your answer**: 
 
 
 Then, since we are working in a 1D setting, you are going to sample from a grid of points $(t,s)$. For this, discretize with the following parameter settings:
 
- $T_{max}$ = 1.0
- $S_{max}$ = 200.0
- $S_{min}$ = 20.0
- `t_points` = 100
- `s_points` = 100
- `batch_size` = 1000
- `epochs` = 2000
- `learning_rate` = 1e-2

Of course, you can change these parameters after checking that your code runs well.
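
The grid construction and the 80/20 split can be sketched as follows. This is one possible way to do it, assuming the parameter values listed above; the variable names (`t_axis`, `s_axis`, `perm`, `n_train`) are hypothetical.

```python
import torch

T_max, S_min, S_max = 1.0, 20.0, 200.0
t_points, s_points = 100, 100

# One-dimensional axes for time and asset price
t_axis = torch.linspace(0.01, T_max, t_points)
s_axis = torch.linspace(S_min, S_max, s_points)

# All (t, s) combinations; indexing="ij" keeps t along the first dimension
T_grid, S_grid = torch.meshgrid(t_axis, s_axis, indexing="ij")
T_flat = T_grid.reshape(-1, 1)  # shape (t_points * s_points, 1)
S_flat = S_grid.reshape(-1, 1)

# Random 80% / 20% split of the grid points
perm = torch.randperm(T_flat.shape[0])
n_train = int(0.8 * perm.numel())
train_idx, val_idx = perm[:n_train], perm[n_train:]
T_train, S_train = T_flat[train_idx], S_flat[train_idx]
T_val, S_val = T_flat[val_idx], S_flat[val_idx]
```

Keeping the inputs as column tensors of shape `(n, 1)` matches the shapes expected by `loss_fn` above.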

In [None]:
# --- Step 1: Create the input grid of (T, S) points for training and validation ---

# Create a grid of time points from 0.01 to T_max using torch.linspace
T = ...  

# Create a grid of asset prices from S_min to S_max
S = ... 

# Create a full meshgrid of (T, S) combinations using torch.meshgrid()
T_grid, S_grid = ...  

# Flatten the grid to get a list of (T, S) pairs
T_flat = ...  
S_flat = ...  

# --- Step 2: Split data into training and validation sets ---

# Randomly shuffle all indices using torch.randperm
indices = ...  

# Use 80% for training and 20% for validation
train_indices = ...
val_indices = ...

# Select the training and validation inputs accordingly
T_train, S_train = ...
T_val, S_val = ...

# --- Step 3: Initialize the optimizer ---

# Use Adam optimizer on the parameters of the value network
optimizer = ...  

# --- Step 4: Initialize lists to store loss values ---

train_losses = []
val_losses = []

# --- Step 5: Training loop ---

# Loop over the number of epochs with a progress bar
for epoch in tqdm(range(epochs), desc="Training Progress"):

    # Reset gradients from previous step
    optimizer.zero_grad()

    # Compute training loss using current network
    loss_train = ...  

    # Backpropagation to compute gradients
    loss_train.backward()

    # Update network parameters
    optimizer.step()

    # Compute validation loss (no backward pass here)
    loss_val = ...  

    # Store loss values for later plotting
    train_losses.append(loss_train.item())
    val_losses.append(loss_val.item())

    # Print progress every 100 epochs
    if epoch % 100 == 0 or epoch == epochs - 1:
        tqdm.write(f"Epoch {epoch}, Train Loss: {loss_train.item():.4f}, Val Loss: {loss_val.item():.4f}")



Now, you can plot your **training** and **validation** losses with the following code:

In [None]:
# --- Step 6: Plot the loss curves ---

plt.figure(figsize=(6,4))

# Plot training and validation loss across epochs
plt.plot(train_losses, label='Train Loss')
plt.plot(val_losses, label='Validation Loss')

plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title("Evolution of the loss during the Learning Process")
plt.legend()
plt.grid()
plt.show()


You are now going to evaluate your neural network on a grid of points $(t,s)$ and plot the surface $(t,s,v(t,s))$ in 3D.

 ❓ **Question 1.2.4**: Plot the 3D surfaces for a call option and for a forward option. For a fixed $t$, what are the expected shapes of the functions $s \mapsto C(t,s)$ and $s \mapsto F(t,s)$, where $C$ denotes the price of a call option and $F$ the price of a forward option? Do you indeed observe these behaviours?

In [None]:
# --- Step 1: Set the trained network to evaluation mode ---
v_net.eval()  # Disable dropout, etc.

# --- Step 2: Create a grid of time and asset values for plotting ---

# Generate time values from 0 to T_max
T_values = ...  

# Generate asset values from 0 to S_max
S_values = ...  

# Create a meshgrid of all (T, S) pairs
T_grid, S_grid = ... 

# Stack and convert the grid into a PyTorch tensor of shape (N, 2)
points = ...  

# --- Step 3: Evaluate the network over the grid of inputs ---

with torch.no_grad():
    # Predict V(t, S) at all grid points
    v_pred = ...  

# --- Step 4: Plot the 3D surface of the value function ---

# Create a matplotlib figure
fig = plt.figure(figsize=(8,6))

# Add a 3D subplot
ax = fig.add_subplot(1, 1, 1, projection='3d')

# Plot the surface V(t, S)
surf = ...  # ax.plot_surface(..., cmap='viridis')

# Label the axes
ax.set_xlabel('Time t')
ax.set


<h3> A PDE for the Credit Valuation Adjustment (CVA): </h3>

The Credit Valuation Adjustment (CVA) is a quantity computed by banks to represent their expected losses in case of default of one of their counterparties.

The CVA is usually parametrized by the following quantities:


- $\tau^C$, the default time of the counterparty
- $R^C$, the recovery rate in case of default of the counterparty
- $V_t$, the value of the portfolio at time $t$, so that $(V_t)^+$ corresponds to the exposure of the bank on the portfolio
- $T$, the maturity of the portfolio.



In a default intensity model (i.e. where we assume that $\tau^C \sim \mathcal{E}(\lambda^C)$, meaning that $\mathbb{Q}(\tau^C \geq t) = e^{-\lambda^C t}$), the process $(CVA_t)_{0 \leq t \leq T}$ can be represented as:

\begin{align}\tag{1}
CVA_t = \mathbb{1}_{\tau^C > t}  (1-R^C) \, \mathbb{E}^{\mathbb{Q}} \Big[ \int_{t}^{T} e^{- (r+\lambda^C)(s-t)}(V_s)^+ \lambda^C \, d s \,\Big|\, \mathcal{F}_t \Big] 
\end{align}
where $(\mathcal{F}_t)_{0 \leq t \leq T}$ is the filtration generated by the asset price process $(S_t)_{0 \leq t \leq T}$.

 ❓ **Question 1.2.5**: Show that, in the Black-Scholes model and assuming that $\tau^C \sim \mathcal{E}(\lambda^C)$ (meaning that $\mathbb{Q}(\tau^C \geq t) = e^{-\lambda^C t}$) is independent of $(V_t)_{0 \leq t \leq T}$, the process $CVA_t$ can be represented through a suitable function $\phi$ such that 
 
 $$CVA_t = \mathbb{1}_{\tau^C \geq t} \, \phi(t,S_t) \quad dt \otimes  d \mathbb{P} \text{ a.e.},$$ 
 
 where the function $[0,T] \times \mathbb{R}_{*}^+ \ni (t,x) \mapsto \phi(t,x)$ is a solution to the following PDE:
 
 
\begin{align}\label{eq : PDE CVA}
    \partial_t \phi(t,x) + \mathcal{L}\phi(t,x) - (r+ \lambda^C)\phi(t,x) + (1-R^C)(V(t,x))^+\lambda^C &= 0, \quad (t,x) \in [0,T) \times \mathbb{R}_*^+ \\
    \phi(T,x) &= 0, \quad x \in \mathbb{R}_{*}^+ \notag 
\end{align}

where we used that $V_t = V(t,S_t)$ for some function $V$.


‚úèÔ∏è **Your answer**: 




In our numerical experiments, we will take the portfolio to be a standard call option with the same characteristics as the `CallOption` object defined before.



Moreover, we will assume $R^C=0$ and show the results for 2 different values of $\lambda^C$.

In [10]:
lambdaC = 0.1 # Default Intensity for the counterparty
R = 0 # Recovery Rate

❓ **Question 1.2.6**: Make as few changes as possible to the code above to adapt it to the current setting. You should only need to update one function. Give the name of the function and explain how to update it.


✏️ **Your answer**: 


❓ **Question 1.2.7**: Give another formula for $CVA_t$ which doesn't involve any conditional expectation in the case of a European call.


✏️ **Your answer**: 

❓ **Question 1.2.8**: With the help of the 2 previous questions, plot the functions $(t,s) \mapsto CVA(t,s)$ for a European call option with $\lambda^C = 0.1$ and $\lambda^C = 0.4$. How does the value of $\lambda^C$ impact the value of the $CVA$, and why? Moreover, explain why the behaviour of the mapping $s \mapsto CVA(t,s)$ for a fixed $t$ is expected.


✏️ **Your answer**: 

<h3> A system of coupled PDE arising from risk management metrics: </h3>


The Funding Valuation Adjustment (FVA) and the Capital Valuation Adjustment (KVA) are quantities computed by banks which represent, respectively, the cost of funding and the cost of remunerating shareholders.


In a toy model, we can show that $FVA$ and $KVA$ are given by the solutions $v$ and $w$, respectively, of the following coupled system of PDEs:


$$
\begin{align}
\frac{\partial v}{\partial t}+\mathcal{L}v +  \lambda\Big(\max\Big(\alpha f  \sigma  S  \Big|\frac{\partial v}{\partial S}- \Delta_{bs} \Big|,w\Big) + v - u_{bs}\Big)^{-}-rv &= 0, \quad (t,x) \in ]0,T[\times  \mathbb{R}_{*}^+ \tag{10} \\
\frac{\partial w}{\partial t}+\mathcal{L}w+ h \max\Big(\alpha f \sigma S \Big| \frac{\partial v}{\partial S}- \Delta_{bs}\Big|,w\Big)-(r+h)w &= 0, \quad  (t,x) \in ]0,T[ \times  \mathbb{R}_{*}^+ \tag{11} \\
v(T,x)=w(T,x) &= 0,  \quad x \in \mathbb{R}_{*}^+ \notag 
\end{align}
$$

where :

- $h$ represents a dividend rate
- $\alpha$ represents a mishedge parameter
- $\lambda$ is a funding rate 
- $f$ is a quantile level
- $u_{bs}$ and $\Delta_{bs}$ are the Black-Scholes price and delta of a single call option with the same characteristics as before.

 
 For the numerical experiments, we will take $\alpha = 0.3$,  $\lambda=0.02$, $f=1.2$ and $h=0.1$.


In [None]:
alpha = 0.3
hparam = 0.1
f = 1.2
lambd = 0.02

❓ **Question 1.2.9**: Make as few changes as possible to the code above to adapt it to the current setting. Again, you should only need to update one function. Give the name of the function and explain how you can handle the fact that we are dealing with a coupled system of PDEs.


✏️ **Your answer**: 


❓ **Question 1.2.10**: With the help of the previous question, plot the functions $(t,s) \mapsto FVA(t,s)$ and $(t,s) \mapsto KVA(t,s)$ for a European call option. Explain how the funding rate $\lambda$ and the dividend rate $h$ each impact the values of the $FVA$ and the $KVA$, and give an explanation for this behaviour.


✏️ **Your answer**: 

<a id=deepBSDE-Applications></a>

<center> <h1> On  the Deep BSDE Solver </h1> </center>

 ❓ **Question 2.1.1**: What is the main difference between the DGM and the Deep BSDE Solver from a mathematical perspective?

✏️ **Your answer**: 

 ❓ **Question 2.1.2**: Discuss the name $\textit{Deep BSDE Solver}$ given to the algorithm in light of its implementation procedure. Do you know other algorithms similar to the $\textit{Deep BSDE Solver}$? If so, give the main difference between these algorithms and the $\textit{Deep BSDE Solver}$.
 
`Hint`: Think about the word `backward`.
 
✏️ **Your answer**: 



<a id=deepBSDE-reminder></a>

<h2> 2.1 : Methodology and Implementation: </h2>

In this subsection, you are going to implement the Deep BSDE Solver through a class-based implementation.

You are going to implement the class `Model` using the class `fbsde`, which gathers all the objects defining an FBSDE. First, you will need to define a neural network which will approximate $(Z_{t_i})_{i =0 , \ldots, n }$. This network starts from an input layer of size $dim_x +1$ and ends with an output of size $dim_y \times dim_d$; the mapping from the process $X$ to $Z$ will be done by the function `phi`.
Moreover, you will add $y_0$ as a trainable parameter of the neural network, which will be learnt during training.
 
 
 `Hint`: To set $y_0$ as a trainable parameter, use `nn.Parameter` from PyTorch.
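
To show what registering a trainable scalar with `nn.Parameter` looks like, here is a toy sketch in which a single parameter is fitted to a constant target. The class `TinyModel`, the target value 2.0 and the optimizer settings are hypothetical choices made for the illustration only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class TinyModel(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, dim_y: int = 1):
        super().__init__()
        # Wrapping a tensor in nn.Parameter registers it in model.parameters()
        self.y_0 = nn.Parameter(torch.rand(dim_y))

    def forward(self, batch_size: int) -> torch.Tensor:
        # Same initial value for every sample of the batch
        return self.y_0.repeat(batch_size, 1)

model = TinyModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-1)
criterion = nn.MSELoss()
target = torch.full((8, 1), 2.0)

for _ in range(500):
    optimizer.zero_grad()
    loss = criterion(model(8), target)
    loss.backward()
    optimizer.step()
```

Because `y_0` is a registered parameter, the optimizer updates it together with the network weights, and `model.y_0.item()` can be logged at each iteration.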
 
 
Once this is done, you will implement the function `forward`, which returns the terminal values $X_T$ and $Y_T$ of the forward and backward processes using the Euler-Maruyama scheme, which requires sampling Brownian motion increments.
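
The time-stepping described above can be sketched as a standalone function. This is a minimal sketch under the assumption that the backward equation is written as $dY_t = -f(t,X_t,Y_t,Z_t)\,dt + Z_t\,dW_t$ (consistent with a PDE of the form $\partial_t v + \mathcal{L}v + f = 0$); the function name `euler_maruyama_sketch` and the argument `z_fn`, which stands in for the neural network approximating $Z$, are hypothetical.

```python
import torch

def euler_maruyama_sketch(x0, y0, b, sigma, f, z_fn, T, N, batch_size):
    dt = T / N
    x = x0.repeat(batch_size, 1)   # (batch_size, dim_x)
    y = y0.repeat(batch_size, 1)   # (batch_size, dim_y)
    for i in range(N):
        t = i * dt
        s = sigma(t, x)            # (batch_size, dim_x, dim_d)
        z = z_fn(t, x)             # (batch_size, dim_y, dim_d)
        # Brownian increments with variance dt, shape (batch_size, dim_d, 1)
        dW = torch.randn(batch_size, s.shape[2], 1) * dt ** 0.5
        # Backward component first, so that it uses X at time t_i
        y = y - f(t, x, y, z) * dt + torch.bmm(z, dW).squeeze(-1)
        # Forward component
        x = x + b(t, x) * dt + torch.bmm(s, dW).squeeze(-1)
    return x, y

# Deterministic sanity check: no noise and f = -r*y, so Y_T = y_0 * (1 + r*dt)^N
r_demo = 0.05
x_T, y_T = euler_maruyama_sketch(
    x0=torch.ones(3), y0=torch.ones(1),
    b=lambda t, x: torch.zeros_like(x),
    sigma=lambda t, x: torch.zeros(x.shape[0], 3, 3),
    f=lambda t, x, y, z: -r_demo * y,
    z_fn=lambda t, x: torch.zeros(x.shape[0], 1, 3),
    T=1.0, N=20, batch_size=4,
)
```

In the lab's `Model` class, `z_fn` would be replaced by the `phi` network and `y0` by the trainable parameter, so that gradients flow through the whole simulated path.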

Then, you will implement the class `BSDEsolver`, where you will complete the function `train` using the loss function of the Deep BSDE Solver, built from the function $g$ and the terminal values $(X_T,Y_T)$ returned by the `forward` function of the class `Model`. You should return 2 lists: one for the evolution of the loss during training and the other for the evolution of the trainable parameter $y_0$.

 `Hint`: Use `nn.MSELoss` to get the quadratic loss for training.


In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
from tqdm import tqdm

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# FBSDE problem definition
class fbsde():
    def __init__(
        self,
        x_0: torch.Tensor,               # (dim_x,) initial state
        b: callable,                     # b(t: float, x: Tensor) -> Tensor(batch_size, dim_x)
        sigma: callable,                 # sigma(t: float, x: Tensor) -> Tensor(batch_size, dim_x, dim_d)
        f: callable,                     # f(t: float, x: Tensor, y: Tensor, z: Tensor) -> Tensor(batch_size, dim_y)
        g: callable,                     # g(x: Tensor) -> Tensor(batch_size, dim_y)
        T: float,                        # final time
        dim_x: int, dim_y: int, dim_d: int
    ):
        self.x_0 = x_0.to(device)
        self.b = b
        self.sigma = sigma
        self.f = f
        self.g = g
        self.T = T
        self.dim_x = dim_x
        self.dim_y = dim_y
        self.dim_d = dim_d


# Neural network model for approximating Y and Z

class Model(nn.Module):
    def __init__(
        self,
        equation: fbsde,                # FBSDE problem
        dim_h: int                      # Hidden layer size
    ):
        super(Model, self).__init__()
        
        # Write your code here for the definition of the neural network and add y_0 as a trainable parameter

    def forward(
        self,
        batch_size: int,               # Number of samples
        N: int                         # Time discretization steps
    ) -> tuple[torch.Tensor, torch.Tensor]:
        """
        Returns:
            x: Tensor(batch_size, dim_x), terminal state
            y: Tensor(batch_size, dim_y), terminal Y value
        """
        def phi(x):  # This function will approximate Z through the neural network.
            ...  # Write your code here

        # Simulate forward and backward paths using Euler-Maruyama and phi network
        return x,y

    


# Solver that trains the model to estimate the BSDE solution
class BSDEsolver():
    def __init__(
        self,
        equation: fbsde,               # FBSDE problem
        dim_h: int                     # Hidden layer size
    ):
        self.model = Model(equation,dim_h).to(device)
        self.equation = equation

    def train(
        self,
        batch_size: int,              # Batch size for training
        N: int,                       # Time discretization steps
        itr: int                      # Number of training iterations
    ) -> tuple[list[float], list[float]]:
        
        """
        Returns:
            loss_data: List of training losses per iteration
            y0_data: List of Y_0 estimates during training
        """
        criterion = ...  # Define the MSELoss here
        
        optimizer = torch.optim.Adam(self.model.parameters(),lr=1e-2)

        loss_data, y0_data = [], []

        # Training loop: simulate, compute loss, backprop, optimize
        
        
        return loss_data,y0_data  # loss_data, y0_data


<a id=deepBSDE-results></a>


<h2> 2.2 : Numerical results on various PDE: </h2>

In this subsection, you are going to test your previous implementation on different types of PDEs:

- **Black-Scholes (B-S) PDE** in low and high dimensions.
- **Allen-Cahn PDE**.
- A PDE from optimal stochastic control.


<h3> The Black-Scholes PDE: </h3>

We now assume Black-Scholes ($B-S$) dynamics for the underlying assets $S=(S^1,\ldots,S^d)$, driven by a multidimensional Brownian motion $W=(W^1,\ldots,W^d)$:
$$
\begin{align}
dS_t^i = S_t^i ( r dt + \sigma^i dW_t^i), \quad S_0^i \in  (\mathbb{R}^{+}_{*}), \quad i =1,\ldots,d
\end{align}
$$

By option pricing theory in the $B-S$ model, the price at time $t$ of a European option is given by $v(t,S_t)$, where the function $v$, defined on $[0,T] \times (\mathbb{R}^{+}_{*})^d$, satisfies the following PDE:
$$
\begin{align}
\partial_t v + \mathcal{L}v- rv &= 0 \quad (t,x) \in [0,T) \times (\mathbb{R}^{+}_{*})^d \\
v(T,x) &= g(x) \quad x \in (\mathbb{R}^{+}_{*})^d
\end{align}
$$

where the infinitesimal generator is given by:


\begin{align}
\mathcal{L}v(t,x) = r x^{\top} D_x v (t,x) + \frac{1}{2} \sum_{i=1}^{d} (\sigma^i)^2 x_i^2 \, \partial^2_{x_i} v(t,x)
\end{align}

 
 For the numerical experiments, we will set the following quantities :
  
- $ x_0 = (1,\ldots,1) \in \mathbb{R}^{dim_x}$
- $ K= 1$
- $ r=0.05$
- $ \sigma = 0.2 * I_{dim_x}$ where $I_{dim_x}$ is the identity matrix of size $dim_x$.

 ❓ **Question 2.1.3**: In the context of the B-S PDE, what is the associated function $f$?


✏️ **Your answer**: 


We are now going to test your previous Deep BSDE Solver with various payoff functions $g$.

<h4> A numerical result on a  Basket Call Option under $B-S$ model  in low and high dimension $d$ :  </h4>

- $g(x) = (\sum_{i=1}^{d} x_i- d K)^+$



 ❓ **Question 2.1.4**: Fill in the drift function `b`, the volatility function `sigma`, the running cost function `f` and the terminal function `g` in the case of the Basket Call option.


In [None]:
# Parameters lists

r, sigma_value = 0.05, 0.2 

dim_x, dim_y, dim_d = 100, 1, 100
dim_h = dim_x + 10  # defined on its own line: it depends on dim_x
N, itr, batch_size, K = 20, 2000, 1000, 1

x_0, T = torch.ones(dim_x), 1

In [None]:
# Drift function
def b(t, x):
    """
    Args:
        t: (float) Current time
        x: (Tensor) Shape: (batch_size, dim_x)
    Returns:
        Tensor: Shape (batch_size, dim_x)
    """
    
    return   

# Diffusion (volatility) function
def sigma(t, x):
    """
    Args:
        t: (float) Current time
        x: (Tensor) Shape: (batch_size, dim_x)
    Returns:
        Tensor: Shape (batch_size, dim_x, dim_d)
    """
    return  

# BSDE driver function (running cost)
def f(t, x, y, z):
    """
    Args:
        t: (float) Time
        x: (Tensor) Shape: (batch_size, dim_x)
        y: (Tensor) Shape: (batch_size, dim_y)
        z: (Tensor) Shape: (batch_size, dim_y, dim_d)
    Returns:
        Tensor: Shape: (batch_size, dim_y)
    """
    return   

# Terminal condition (payoff)
def g(x):
    """
    Args:
        x: (Tensor) Shape: (batch_size, dim_x)
    Returns:
        Tensor: Shape: (batch_size, dim_y)
    """
    return  




In [None]:
# Create FBSDE equation
equation = fbsde(
    x_0=x_0,                # (Tensor) Initial value, shape: (dim_x,)
    b=b,                   # (function) Drift
    sigma=sigma,           # (function) Diffusion
    f=f,                   # (function) Driver
    g=g,                   # (function) Terminal condition
    T=T,                   # (float) Terminal time
    dim_x=dim_x,
    dim_y=dim_y,
    dim_d=dim_d
)

# Instantiate BSDE solver


 ❓ **Question 2.1.5**: Plot the evolution of the error during the learning process and the evolution of the initial price of the option $y_0$. 
 
 
 `Hint`: Use the function `train` of the `BSDEsolver` class.
 

❓ **Question 2.1.6**: Compare the true value of a call option with the same parameters, given by the B-S price formula, with the price given by the Deep BSDE Solver. 

You can implement below the true value of a call option, for instance in dimension $d=1$, where it is given by the famous B-S formula.
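
A minimal sketch of the closed-form call price in dimension $d=1$, using only the standard library, is given below. The function names `norm_cdf` and `bs_call_price` are hypothetical; the numerical values are the ones set above ($x_0 = K = 1$, $r = 0.05$, $\sigma = 0.2$, $T = 1$).

```python
import math

def norm_cdf(x: float) -> float:
    # Standard normal cumulative distribution function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call_price(S0: float, K: float, r: float, sigma: float, T: float) -> float:
    d_plus = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d_minus = d_plus - sigma * math.sqrt(T)
    return S0 * norm_cdf(d_plus) - K * math.exp(-r * T) * norm_cdf(d_minus)

call_ref = bs_call_price(S0=1.0, K=1.0, r=0.05, sigma=0.2, T=1.0)
```

The value `call_ref` can then be compared with the last entries of the list of $y_0$ estimates returned by `train` when the solver is run with $d=1$.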
 

✏️ **Your answer**: 


<h4> A numerical result on a Basket Put Option under $B-S$ model in low and high dimension $d$:  </h4>


- $g(x) = (d K- \sum_{i=1}^{d} x_i)^+$

 ❓ **Question 2.1.7**: Fill in the drift function `b`, the volatility function `sigma`, the running cost function `f` and the terminal function `g` in the case of the Basket Put option.

 ❓ **Question 2.1.8**: Plot the evolution of the error during the learning process and the evolution of the initial price of the option $y_0$.

❓ **Question 2.1.9**: Compare the true value of a put option with the same parameters, given by the B-S price formula, with the price given by the Deep BSDE Solver. 

You can implement below the true value of a put option, for instance in dimension $d=1$, where it is given by the famous B-S formula.
 

✏️ **Your answer**: 


<h4> A numerical result for a Binary Call under $B-S$ model in dimension 1 : </h4> 


- $g(x) = \mathbb{1}_{x \geq K}$

 ❓ **Question 2.1.10**: Fill in the drift function `b`, the volatility function `sigma`, the running cost function `f` and the terminal function `g` in the case of the Binary Call option.

 ❓ **Question 2.1.11**: Plot the evolution of the error during the learning process and the evolution of the initial price of the option $y_0$.

❓ **Question 2.1.12**: What do you observe during training and for the value of the option price, compared to the previous cases? How can you explain this behaviour?

✏️ **Your answer**: 


❓ **Question 2.1.13**: Compare the true value of a binary option with the same parameters, given by the B-S price formula, with the price given by the Deep BSDE Solver. 

You can implement below the true value of a binary option in dimension $d=1$, given by the famous B-S formula.
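
For the binary call with payoff $\mathbb{1}_{x \geq K}$, the closed-form price in dimension $d=1$ is the discounted risk-neutral probability of finishing above the strike. The sketch below assumes the same parameter values as above; the function names `norm_cdf` and `bs_binary_call` are hypothetical.

```python
import math

def norm_cdf(x: float) -> float:
    # Standard normal cumulative distribution function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_binary_call(S0: float, K: float, r: float, sigma: float, T: float) -> float:
    # Price of a payoff 1_{S_T >= K}: exp(-r T) * N(d_minus)
    d_minus = (math.log(S0 / K) + (r - 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    return math.exp(-r * T) * norm_cdf(d_minus)

binary_ref = bs_binary_call(S0=1.0, K=1.0, r=0.05, sigma=0.2, T=1.0)
```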

✏️ **Your answer**: 


<h3> The Allen-Cahn PDE: </h3>

The Allen-Cahn PDE is given by the following :


$$
\begin{align}
\partial_t v + \Delta_{x} v + v - v^3 &= 0 \quad (t,x) \in [0,T)  \times \mathbb{R}^d \notag \\
v(T,x) &= \frac{1}{ 2+ \frac{2}{5} \lVert x \rVert^2} \quad x \in  \mathbb{R}^d
\end{align}
$$

where $\Delta_x v = \sum_{i=1}^{d} \partial^2_{x_i} v$, with the notation $x:=(x_1,\ldots,x_d)$.
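
The terminal condition of the Allen-Cahn PDE above can be written with `torch` operations on a batch of points, as in the following sketch. The function name `allen_cahn_terminal` is hypothetical, and the output shape `(batch_size, 1)` is chosen to match the `(batch_size, dim_y)` convention used for `g` in this lab.

```python
import torch

def allen_cahn_terminal(x: torch.Tensor) -> torch.Tensor:
    # x has shape (batch_size, dim_x); squared Euclidean norm per sample
    sq_norm = (x ** 2).sum(dim=1, keepdim=True)
    # v(T, x) = 1 / (2 + (2/5) * ||x||^2)
    return 1.0 / (2.0 + 0.4 * sq_norm)

terminal_at_zero = allen_cahn_terminal(torch.zeros(4, 100))  # value 1/2 at the origin
```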



❓ **Question 2.1.14**: In the context of the Allen-Cahn PDE, what are the associated functions $b$, $\sigma$, $f$ and $g$?


✏️ **Your answer**:


In the numerical experiments, we will set $T=\frac{3}{10}$, $x_0=(0,\ldots,0) \in \mathbb{R}^d$ and $d=100$, and try to recover the reference value of the solution $v(0,x_0)$, which can be shown to be $\approx$ 0.052802.

In [None]:
dim_x, dim_y, dim_d = 100, 1, 100
dim_h = dim_x + 10  # defined on its own line: it depends on dim_x
N, itr, batch_size = 20, 3000, 1000

x_0, T = torch.zeros(dim_x), 3/10

 ❓ **Question 2.1.15**: Fill in the drift function `b`, the volatility function `sigma`, the running cost function `f` and the terminal function `g` in the case of the Allen-Cahn PDE.
 
  `Hint`: Use the functions from the `torch` module.


In [None]:
def b(t,x):
    return ...

def sigma(t, x):
    return ...
    #return torch.sqrt(torch.abs(x)).reshape(batch_size, dim_x, dim_d)


def f(t, x, y, z):
    return ...


def g(x):
    return ...

In [None]:
# Create FBSDE equation
equation = fbsde(
    x_0=x_0,                # (Tensor) Initial value, shape: (dim_x,)
    b=b,                   # (function) Drift
    sigma=sigma,           # (function) Diffusion
    f=f,                   # (function) Driver
    g=g,                   # (function) Terminal condition
    T=T,                   # (float) Terminal time
    dim_x=dim_x,
    dim_y=dim_y,
    dim_d=dim_d
)


 ❓ **Question 2.1.16**: Plot the evolution of the error during the learning process and the evolution of the initial value $v(0,x_0)$ in the case of the Allen-Cahn PDE. Compare it with the true value given above.

<h3> A PDE from an optimal control problem: </h3>

We consider the following PDE, which can be shown to be the HJB equation of an optimal control problem:


$$
\begin{align}
    \partial_t v +  \Delta_x v - \frac{1}{2} | \nabla_x v|^2 &= 0, \quad (t,x) \in [0,T) \times \mathbb{R}^d \notag \\
    v(T,x) &= g(x) \notag 
\end{align}
$$


❓ **Question 2.1.17**: In the context of this optimal control PDE, what are the associated functions $b$, $\sigma$ and $f$?


✏️ **Your answer**:

For the numerical experiments, we choose $x_0=0$, $d=100$ and $g(x) = \ln\big(\frac{1}{2}( 1+ \lVert x \rVert^2)\big)$, for which the Hopf-Cole transformation gives the $\textit{semi-explicit form}$:

$$
\begin{align}
    v(0,x_0) = - \text{ln}\bigg(\mathbb{E}\Big[\text{exp}\big(-g(x_0 + \sigma W_T) \big)\Big]\bigg) \notag 
\end{align}
$$
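
The expectation in the formula above can be estimated by plain Monte Carlo, since $W_T \sim \mathcal{N}(0, T I_d)$ can be sampled directly. The sketch below evaluates the formula exactly as displayed, with $\sigma = \sqrt{2}$, $T = 1$, $d = 100$ and $x_0 = 0$; the function names `g_terminal` and `hopf_cole_mc`, the sample size and the seed are hypothetical choices.

```python
import math
import torch

def g_terminal(x: torch.Tensor) -> torch.Tensor:
    # g(x) = ln( (1 + ||x||^2) / 2 ), one value per sample
    return torch.log(0.5 * (1.0 + (x ** 2).sum(dim=1, keepdim=True)))

def hopf_cole_mc(x0: torch.Tensor, T: float, sigma_value: float,
                 n_samples: int = 100_000, seed: int = 0) -> torch.Tensor:
    gen = torch.Generator().manual_seed(seed)
    # W_T has independent N(0, T) components
    W_T = math.sqrt(T) * torch.randn(n_samples, x0.shape[0], generator=gen)
    # -ln E[ exp(-g(x0 + sigma * W_T)) ], expectation replaced by a sample mean
    return -torch.log(torch.exp(-g_terminal(x0 + sigma_value * W_T)).mean())

estimate = hopf_cole_mc(torch.zeros(100), T=1.0, sigma_value=math.sqrt(2.0))
```

Increasing `n_samples` reduces the Monte Carlo error of the reference value against which the $y_0$ learnt by the Deep BSDE Solver is compared.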


In [None]:
sigma_value = np.sqrt(2) 


dim_x, dim_y, dim_d = 100, 1, 100
dim_h = dim_x + 10  # defined on its own line: it depends on dim_x
N, itr, batch_size = 20, 5000, 1000

x_0, T = torch.zeros(dim_x), 1


 ❓ **Question 2.1.18**: Fill in the drift function `b`, the volatility function `sigma` and the running cost function `f` in the case of this PDE from an optimal control problem.
 
  `Hint`: Use the functions from the `torch` module.


In [None]:
def b(t,x):
    return ...

def sigma(t, x):
    return ...
    #return torch.sqrt(torch.abs(x)).reshape(batch_size, dim_x, dim_d)


def f(t, x, y, z):
    return ...


def g(x):
    return ...

In [None]:
# Create FBSDE equation
equation = fbsde(
    x_0=x_0,                # (Tensor) Initial value, shape: (dim_x,)
    b=b,                   # (function) Drift
    sigma=sigma,           # (function) Diffusion
    f=f,                   # (function) Driver
    g=g,                   # (function) Terminal condition
    T=T,                   # (float) Terminal time
    dim_x=dim_x,
    dim_y=dim_y,
    dim_d=dim_d
)


 ❓ **Question 2.1.19**: Plot the evolution of the error during the learning process and the evolution of the initial value $v(0,x_0)$ in the case of this PDE. Compare it with the true value given by the Hopf-Cole transformation above.

<a id=references></a>
<h2>  References: </h2>

$\bullet$ $\textit{Germain, Pham, Warin: "Neural networks-based algorithms for stochastic control and PDEs"}$, 2023 available [here](https://arxiv.org/pdf/2101.08068).

$\bullet$ $\textit{Bachouch, Huré, Langrené, Pham : "Deep neural networks algorithms for stochastic control problems on finite horizon: numerical applications"}$,  2019 available [here](https://arxiv.org/pdf/1812.05916).

$\bullet$ $\textit{Huré, Pham, Bachouch, Langrené : "Deep neural networks algorithms for stochastic control problems on finite horizon: convergence analysis"}$,  2019 available [here](https://arxiv.org/pdf/1812.04300).

$\bullet$ $\textit{E, Han, Jentzen : "Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations"}$, 2017 available [here](https://arxiv.org/pdf/1706.04702).

$\bullet$ $\textit{Sirignano, Spiliopoulos : "DGM: A deep learning algorithm for solving partial differential equations"}$, 2017 available [here](https://arxiv.org/pdf/1708.07469).