<h1>RELOCOR: REinforcement Learning based Optimal CORrelation for variance reduction</h1>

In this notebook we give a quick-start guide to the package <tt>relocor</tt>.

In [None]:
import relocor
import numpy as np
import torch

<h2>Quick start</h2>

Run a simulation with <tt>main.py</tt>; the parameters that can be modified are:

<ul>
    <li><tt>sde_name</tt>: string for the choice of the SDE to simulate. The already implemented SDE models are:
        <ul>
            <li><tt>'bs'</tt>: one-dimensional Black-Scholes model.</li>
            <li><tt>'multi_bs'</tt>: multi-dimensional Black-Scholes model.</li>
            <li><tt>'heston'</tt>: Heston model.</li>
            <li><tt>'multi_heston'</tt>: Multi-asset Heston model.</li>
            <li><tt>'fishing'</tt>: SDE inspired by the evolution of a fish biomass under fishing, <a href="https://arxiv.org/abs/2109.06856">arXiv</a>.</li>
        </ul>
        The corresponding test function (payoff) is defined in the same SDE file.
    </li>
    <li><tt>action_name</tt>: string for the choice of the parametrization of the action. The already implemented actions are:
        <ul>
            <li><tt>'diag'</tt>: parametrization with a diagonal matrix.</li>
            <li><tt>'ortho'</tt>: parametrization with a diagonal matrix and an orthogonal change of basis with blocks of 2x2 rotations.</li>
            <li><tt>'ortho2d'</tt>: same as <tt>'ortho'</tt> but restricted to a two-dimensional Brownian noise; useful for testing and slightly faster in dimension 2.</li>
        </ul>
    </li>
    <li><tt>AgentClass</tt>: class of the RL algorithm used for training; it can be any algorithm from <a href="https://github.com/DLR-RM/stable-baselines3/tree/master"><tt>stable-baselines3</tt></a> that supports continuous (Box) action spaces, such as <tt>A2C, PPO, DDPG, SAC, TD3</tt>, or our policy gradient agent <tt>relocor.agents.PG</tt>.</li>
    <li><tt>batch_size</tt>, <tt>batch_eval</tt>: batch sizes for training (PG only) and for evaluation during training, respectively.</li>
    <li><tt>state_idxs_plot</tt> and <tt>action_idxs_plot</tt>: indices of the trajectory dimensions and of the action dimensions, respectively, to plot.</li>
</ul>
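The <tt>'diag'</tt> and <tt>'ortho'</tt> parametrizations can be pictured with a small NumPy sketch. It assumes that the <tt>'ortho'</tt> correlation matrix has the form O diag(d) O^T, where O is block-diagonal with 2x2 rotations and every |d_i| <= 1 (with O = I this reduces to the diagonal case). The layout of relocor's actual action vector is not specified here, and the helper names <tt>rotation_blocks</tt> and <tt>ortho_correlation</tt> are hypothetical.

```python
import numpy as np

def rotation_blocks(angles, dim):
    """Block-diagonal orthogonal matrix built from 2x2 rotations.

    Hypothetical helper: one angle per 2x2 block; if dim is odd or there
    are fewer than dim // 2 angles, the remaining coordinates are unchanged.
    """
    if 2 * len(angles) > dim:
        raise ValueError("at most dim // 2 rotation angles are allowed")
    O = np.eye(dim)
    for k, theta in enumerate(angles):
        i = 2 * k
        c, s = np.cos(theta), np.sin(theta)
        O[i:i + 2, i:i + 2] = [[c, -s], [s, c]]
    return O

def ortho_correlation(diag, angles):
    """Correlation matrix O diag(d) O^T, assuming every |d_i| <= 1."""
    O = rotation_blocks(angles, len(diag))
    return O @ np.diag(diag) @ O.T

# 2-dimensional noise: two diagonal entries and one rotation angle
rho = ortho_correlation(np.array([0.5, -0.8]), [np.pi / 6])
print(np.linalg.eigvalsh(rho))  # the eigenvalues are the diagonal entries d
```

Because O is orthogonal, the spectrum of the correlation matrix is exactly d, so the constraint |d_i| <= 1 is easy to enforce on the raw action.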

In [None]:
from stable_baselines3 import PPO, A2C, TD3, DDPG, SAC

# Parameters
sde_name = 'multi_bs'
action_name = 'ortho'
AgentClass = relocor.agents.PG
N_euler = 50
T = 1.
EPOCHS = 20
batch_size = 512
batch_eval = 512*16
epoch_eval_freq = 1
state_idxs_plot = [0]
action_idxs_plot = [0]

<h2>Environments, experiment and training</h2>

Our environment <tt>env</tt> is an SDE environment in which the state is the pair of correlated trajectories together with the time, (X1, X2, t), the action is the correlation matrix between the two driving Brownian motions, and the reward is given by the telescoping increments of the variance. The environment follows the <a href="https://www.gymlibrary.dev/index.html">OpenAI Gym</a> / <a href="https://gymnasium.farama.org/index.html">Gymnasium</a> interface.
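To make the roles of state and action concrete, here is a minimal NumPy sketch of one Euler step of the pair (X1, X2, t). It assumes that the action rho couples the two Brownian motions as dW2 = rho dW1 + sqrt(I - rho rho^T) dB, with B independent of W1, which keeps W2 a standard Brownian motion while giving E[dW2 dW1^T] = rho dt. This construction and the helper names are illustrative assumptions, not relocor's implementation.

```python
import numpy as np

def psd_sqrt(M):
    """Symmetric square root of a positive semi-definite matrix."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

def correlated_increments(rho, dt, rng):
    """Brownian increments (dW1, dW2) with E[dW2 dW1^T] = rho * dt.

    Assumes the singular values of rho are at most 1, so that
    I - rho rho^T is positive semi-definite.
    """
    d = rho.shape[0]
    dW1 = rng.normal(scale=np.sqrt(dt), size=d)
    dB = rng.normal(scale=np.sqrt(dt), size=d)
    dW2 = rho @ dW1 + psd_sqrt(np.eye(d) - rho @ rho.T) @ dB
    return dW1, dW2

def euler_pair_step(X1, X2, t, rho, dt, drift, sigma, rng):
    """One Euler step of the state (X1, X2, t) under the action rho."""
    dW1, dW2 = correlated_increments(rho, dt, rng)
    X1_next = X1 + drift(X1) * dt + sigma(X1) @ dW1
    X2_next = X2 + drift(X2) * dt + sigma(X2) @ dW2
    return X1_next, X2_next, t + dt

# Black-Scholes pair with rho = -I, i.e. fully antithetic noises
rng = np.random.default_rng(0)
X1, X2, t = euler_pair_step(np.ones(2), np.ones(2), 0.0, -np.eye(2), 0.02,
                            lambda X: 0.06 * X, lambda X: np.diag(0.3 * X), rng)
```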

We also need a <tt>batch_env</tt> written in PyTorch for parallelized evaluation; it describes the same environment as <tt>env</tt>, but the zeroth dimension is the batch dimension. It requires a <tt>batch_test_function</tt> (or "payoff") and a <tt>batch_action_param</tt>, which play the same roles as <tt>test_function</tt> and <tt>action_param</tt> but are written in PyTorch for batch operations.
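The batch convention (axis 0 indexes the samples) can be illustrated in NumPy. This sketch mirrors what a PyTorch batch environment would do with <tt>torch.diag_embed</tt> and batched matrix products; the function names are hypothetical and it is not relocor's code.

```python
import numpy as np

R, SIG = 0.06, 0.3  # Black-Scholes parameters, same values as in the custom SDE example

def batch_drift(X):
    """Drift applied row-wise to a batch X of shape (batch, dim)."""
    return R * X

def batch_sigma(X):
    """Diagonal diffusion matrices of shape (batch, dim, dim).

    NumPy analogue of torch.diag_embed(SIG * X).
    """
    return np.eye(X.shape[1]) * (SIG * X)[:, None, :]

def batch_euler_step(X, dW, dt):
    """One Euler step for all samples at once; axis 0 indexes the samples."""
    return X + batch_drift(X) * dt + np.einsum('bij,bj->bi', batch_sigma(X), dW)

rng = np.random.default_rng(0)
dt = 1.0 / 50
X = np.ones((512, 2))                      # 512 samples of a 2-dimensional state
dW = rng.normal(scale=np.sqrt(dt), size=X.shape)
X_next = batch_euler_step(X, dW, dt)       # shape (512, 2)
```

The einsum contracts each sample's (dim, dim) diffusion matrix with that sample's noise vector, which is what a per-sample loop would compute, without the Python loop.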

In [None]:
sde = relocor.sdes.multi_bs_sde
payoff = relocor.sdes.multi_bs_payoff
batch_payoff = relocor.sdes.multi_bs_batch_payoff
action_param = relocor.actions.ActionOrtho(sde.dim_noise)
batch_action_param = relocor.actions.BatchActionOrtho(sde.dim_noise)

env = relocor.SDEEnvironment(
    sde = sde,
    T = T,
    N_euler = N_euler,
    test_function = payoff,
    action_param = action_param
)

batch_env = relocor.BatchSDEEnvironment(
    sde = sde,
    T = T,
    N_euler = N_euler,
    batch_test_function = batch_payoff,
    batch_action_param = batch_action_param
)


experiment = relocor.Experiment(
    env = env, batch_env = batch_env, AgentClass = AgentClass
)


A constant policy can be evaluated with <tt>experiment.evaluate</tt> by passing it as <tt>policy_action</tt>. Make sure that the constant action satisfies the constraints of the action parametrization <tt>action_param</tt>.

In [None]:
variance, total_reward, mean = experiment.evaluate(
    nb_episodes=20,
    batch_size=512*16,
    policy_action=np.array([0., 0., 1.]))

print(variance)
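What counts as an admissible constant action depends on <tt>action_param</tt>. As a generic check, which is a hypothetical helper and not part of relocor: a matrix rho can correlate two standard Brownian motions only if I - rho rho^T is positive semi-definite, i.e. if its largest singular value is at most 1.

```python
import numpy as np

def is_admissible_correlation(rho, tol=1e-12):
    """True if rho can correlate two standard Brownian motions.

    Requires I - rho rho^T to be positive semi-definite, which is
    equivalent to the largest singular value of rho being at most 1.
    """
    return np.linalg.norm(rho, ord=2) <= 1.0 + tol

print(is_admissible_correlation(np.diag([0.5, -1.0])))  # diagonal entries in [-1, 1]
print(is_admissible_correlation(np.diag([1.2, 0.0])))   # |1.2| > 1
```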

Note that the baseline (no correlation) and antithetic policies are already implemented for each action parametrization:

In [None]:
variance, total_reward, mean = experiment.evaluate(
    nb_episodes=20,
    batch_size=512*16,
    policy_action=experiment.env.action_param.baseline_action)

print(variance)

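# the antithetic action is also available and is evaluated in the same way
variance, total_reward, mean = experiment.evaluate(
    nb_episodes=20,
    batch_size=512*16,
    policy_action=experiment.env.action_param.antithetic_action)

print(variance)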
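Why the antithetic policy is a natural benchmark can be seen on a one-dimensional Black-Scholes call. The sketch below compares the variance of the paired estimator (f(X1_T) + f(X2_T))/2 for rho = 0 (baseline), rho = -1 (antithetic) and rho = 0.5. It simulates the terminal values exactly (log-normal) instead of through the Euler scheme, and uses the same parameter values as the Black-Scholes example in this notebook; it is an illustration, not relocor's evaluation routine.

```python
import numpy as np

def pair_estimator_variance(rho, n=100_000, r=0.06, sig=0.3, K=1.0, X0=1.0, T=1.0, seed=0):
    """Variance of (f(X1_T) + f(X2_T)) / 2 in a one-dimensional Black-Scholes model.

    X1 and X2 are driven by W1 and W2 = rho * W1 + sqrt(1 - rho**2) * B.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=np.sqrt(T), size=n)
    B = rng.normal(scale=np.sqrt(T), size=n)
    W2 = rho * W1 + np.sqrt(1.0 - rho**2) * B

    def payoff(W):
        X_T = X0 * np.exp((r - 0.5 * sig**2) * T + sig * W)  # exact GBM terminal value
        return np.maximum(X_T - K, 0.0)

    return np.var(0.5 * (payoff(W1) + payoff(W2)))

for rho in (0.0, -1.0, 0.5):
    print(rho, pair_estimator_variance(rho))
```

For a call payoff, which is increasing in W, the antithetic pair is negatively correlated, so the variance drops below the baseline, while a positive rho increases it.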

Train the chosen RL algorithm and evaluate.

If no <tt>policy_action</tt> is passed to <tt>experiment.evaluate</tt>, the trained agent is evaluated, so the agent must be trained first.

Setting <tt>batch_eval=0</tt> disables the evaluation callback during training, which saves time.

In [None]:
experiment.train(
    total_timesteps=N_euler*EPOCHS,
    batch_size = batch_size,
    batch_eval = batch_eval,
    epoch_eval_freq = epoch_eval_freq
    )

variance, total_reward, mean = experiment.evaluate(
    nb_episodes=10,
    batch_size=512*16)

print(variance)

<h2>Plots and save</h2>

Plot the evolution of the variance during training.

Plot an example of two correlated trajectories and of the action chosen by the trained agent; specify which dimension indices to plot for the trajectories and for the action.

Finally, save the results in the directory <tt>results</tt>, in a subdirectory named <tt>dir_name</tt>.

In [None]:
experiment.plot_train_variance()

experiment.run_trajectory()
experiment.display_trajectory(state_idxs=state_idxs_plot, action_idxs=action_idxs_plot)

dir_name = 'multi_bs'
experiment.save_experiment(path='./results/{}'.format(dir_name))
experiment.save_trajectory(path='./results/{}'.format(dir_name))

<h2>Custom SDE and test function</h2>

In this section we explain in detail how to use the package with a custom SDE.

A custom SDE is defined with the base class <tt>relocor.sdes.SDE</tt> by specifying the <tt>drift</tt> and <tt>sigma</tt> (diffusion matrix) functions in NumPy, as well as <tt>batch_drift</tt> and <tt>batch_sigma</tt> in PyTorch for batch operations. If you do not plan to use batch environments for parallelized evaluation, you do not need to define these last two functions. Likewise, write the test function in NumPy and, for batch evaluation, in PyTorch.

In [None]:
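# multi-dimensional Black-Scholes model
name = 'multi Black Scholes'

# dimension of the trajectory vector
dim = 2

r = 0.06    # drift coefficient (interest rate)
sig = 0.3   # volatility
K = 1.      # strike of the payoff
X0 = 1.     # initial value of each coordinate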

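# drift b(X) = r X; r*X works on NumPy arrays and on torch tensors of
# shape (batch, dim), so the same function is reused as batch_drift
def drift(X):
    return r*X

# diffusion matrix sigma(X) = diag(sig X), shape (dim, dim_noise)
def sigma(X):
    return np.diag(sig*X)

# batched diffusion matrices, shape (batch, dim, dim_noise)
def batch_sigma(X):
    return torch.diag_embed(sig*X)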

multi_bs_sde = relocor.sdes.SDE(
    dim=dim,
    dim_noise=dim, # dimension of the Brownian noise
    drift=drift,
    sigma=sigma,
    X0=np.array(dim*[X0]),
    batch_drift=drift,
    batch_sigma=batch_sigma,
    name=name
)

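# payoff: average over the coordinates of the call payoffs max(X_i - K, 0)
def multi_bs_payoff(X):
    return np.mean(np.maximum(X-K, 0))

# batched payoff: X has shape (batch, dim), the output has shape (batch, 1)
def multi_bs_batch_payoff(X):
    return torch.mean(torch.relu(X-K), dim=1, keepdim=True)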
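To check a custom drift/sigma pair before plugging it into <tt>relocor.sdes.SDE</tt>, one can simulate a single path with a plain Euler-Maruyama scheme. The helper <tt>euler_maruyama</tt> below is hypothetical and uses NumPy only; it assumes that <tt>sigma(X)</tt> returns a (dim, dim_noise) matrix, as in the example above, and it is not relocor's simulator.

```python
import numpy as np

def euler_maruyama(drift, sigma, X0, T, N_euler, rng):
    """Simulate one path X_0, ..., X_N with N_euler Euler-Maruyama steps.

    drift(X) returns a vector of shape (dim,) and sigma(X) a matrix of
    shape (dim, dim_noise), mirroring the NumPy functions given to SDE.
    Returns an array of shape (N_euler + 1, dim).
    """
    dt = T / N_euler
    X = np.asarray(X0, dtype=float)
    dim_noise = sigma(X).shape[1]
    path = [X]
    for _ in range(N_euler):
        dW = rng.normal(scale=np.sqrt(dt), size=dim_noise)
        X = X + drift(X) * dt + sigma(X) @ dW
        path.append(X)
    return np.stack(path)

# Black-Scholes drift and diffusion, as in the custom SDE cell
r, sig, K = 0.06, 0.3, 1.0
path = euler_maruyama(lambda X: r * X, lambda X: np.diag(sig * X),
                      np.ones(2), 1.0, 50, np.random.default_rng(0))
print(np.mean(np.maximum(path[-1] - K, 0)))  # multi_bs_payoff of the terminal value
```

Setting sigma to zero is a quick sanity check: the path then follows the deterministic recursion X_{n+1} = (1 + r dt) X_n.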
