In [None]:
%%R


In [None]:
from preamble import *
plt.close('all')

In [None]:
from Bachelier import *
from BTC_predictor import *

# Application to optimal transport

## Bachelier problem

### Introduction

The purpose of this test is to benchmark the Pi-function (see Section \ref{the-conditional-expectation-algorithm}) on the Bachelier problem, which is described in the next section.

It has been known for about a decade that deep learning methods can be described as kernel methods, see for instance \cite{HJW}. We illustrate this fact with a kernel method designed in a spirit quite comparable to the neural network approach. Indeed, this method gives results quite comparable to the neural network one: neither method is convergent, and we do not advise their use for critical applications.

### Problem description

* Consider a martingale process $t \mapsto X(t) \in \RR^D$, given by the Brownian motion $dX=\sigma dW_t$, where the matrix $\sigma \in \RR^{D \times D}$ is randomly generated. The initial condition is $X(0)=(1,\cdots,1)$, w.l.o.g.

* Consider two times $1=t^1<t^2=2$, $t^2$ being the maturity of an option, that is a function denoted $P(x) = \max(b(x)-K,0)$, where $K=1.1$, $b(x) := x \cdot a$ with random weights $a \in \RR^D$. It is straightforward to verify that $b(x)$ follows a Brownian motion $db = \theta dW_t$. To get a fixed value for $\theta$ (fixed to 0.2 in our tests), we normalize the diffusion matrix $\sigma$ above.

* The goal of this test is to benchmark numerical methods aiming to compute the following conditional expectation
$$
  f(z) := \EE^{X(t^2)}\Big( P(\cdot) | X(t^1) = z \Big).
$$
For the Bachelier problem, this last quantity is given by a closed formula: the reference value is computed as
\begin{equation}\label{BACH}
  f(z) = \theta \sqrt{t^2 - t^1} \, pdf(d) + (b(z)-K) \, cdf(d),\quad d := \frac{b(z)-K}{\theta \sqrt{t^2 - t^1}},
\end{equation}
where pdf (resp. cdf) denotes the probability density function (resp. cumulative distribution function) of the standard normal law.
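
The closed formula \eqref{BACH} can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the Monte Carlo sample size are illustrative assumptions, and the basket value $b(z)$ is passed directly as a scalar.

```python
import math
import random


def normal_pdf(d):
    """Density of the standard normal law."""
    return math.exp(-0.5 * d * d) / math.sqrt(2.0 * math.pi)


def normal_cdf(d):
    """Cumulative distribution function of the standard normal law."""
    return 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))


def bachelier_value(basket, strike=1.1, theta=0.2, t1=1.0, t2=2.0):
    """Closed-form value of E[max(b(X(t2)) - K, 0) | b(X(t1)) = basket]."""
    vol = theta * math.sqrt(t2 - t1)
    d = (basket - strike) / vol
    return vol * normal_pdf(d) + (basket - strike) * normal_cdf(d)


def monte_carlo_value(basket, strike=1.1, theta=0.2, t1=1.0, t2=2.0,
                      n_samples=200_000, seed=0):
    """Brute-force estimate of the same conditional expectation."""
    rng = random.Random(seed)
    vol = theta * math.sqrt(t2 - t1)
    total = 0.0
    for _ in range(n_samples):
        # b(X(t2)) given b(X(t1)) = basket is Gaussian with standard deviation vol
        total += max(basket + vol * rng.gauss(0.0, 1.0) - strike, 0.0)
    return total / n_samples


exact = bachelier_value(1.0)
estimate = monte_carlo_value(1.0)
print(exact, estimate)
```

The two printed values should agree up to the Monte Carlo sampling error, which decreases like the inverse square root of the number of samples.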

### Methodology, Notations, input and output data

For our tests, we use the following notations, and specify their meaning for this report and, more generally, for finance applications:

- $x \in \RR^{ N_x \times D}$ denotes the training set of variables. For our test, this set is given by iid samples of the Brownian motion $X(t)$ at time $t^1 = 1$.
  + For quantitative finance applications, this set typically consists of $N_x$ iid samples of a stochastic process $t \mapsto X(t) \in \RR^D$ at a time $t$. Such samples might be generated by discretizing stochastic processes, using Euler methods for instance.
- $f(x) \in \RR^{ N_x \times D_f}$ denotes the training set of values. It is generated as $P(X(t^2) | x)$, $P$ being the payoff of the option described in the previous section and $x$ the training set.
  + For finance applications, $f \in \mathcal{C}^1(\RR^D)^M$ is usually a differentiable, vector-valued function whose components correspond to payoffs, or investment strategies, of portfolios.
- $z \in \RR^{ N_z \times D}$ denotes the test set of variables. For our test, it consists of another iid realization of the Brownian motion $X(t)$ at time $t^1 = 1$.
  + This set usually represents user-defined samples of the underlying risks, chosen according to the user's needs.
- $f(z) \in \RR^{ N_z \times D_f}$ is the set of reference values, computed using \eqref{BACH} ($D_f = 1$ in this experiment).
  + This set consists of reference (ground truth) values, approximating $P(z):= \EE\big(f(x_{t^2}) | z\big)$.
  
To these sets we add another one, used for internal computations:

- $y \in \RR^{ N_y \times D}$, with $N_y \ll N_x$.
  + This set corresponds to the weight set for neural networks methods.
  + This set corresponds to what we call a "projection set" for kernel methods (see \cite{MM1} for a definition).

Output data are

- $f_z \in \RR^{ N_z \times D_f}$ the set of predicted values. These are the values that are benchmarked against $f(z)$ in our experiments.

For each numerical experiment, we output a table summarizing the values of $N_x,N_y,N_z,D$.
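
To make the shapes of these sets concrete, here is a minimal sketch of how a training pair $(x, f(x))$ could be generated with NumPy. It is not the `data_generator_Bachelier` implementation: the function names, the Gaussian draw of $\sigma$, and the weights $a$ normalized to sum to one are assumptions made for illustration.

```python
import numpy as np


def normalized_diffusion(D, a, theta, rng):
    """Random D x D diffusion matrix, rescaled so that b = a . X has volatility theta."""
    sigma = rng.normal(size=(D, D))
    # db = a^T sigma dW, so the scalar volatility of the basket is ||sigma^T a||
    return sigma * (theta / np.linalg.norm(sigma.T @ a))


def sample_training_set(Nx, D, theta=0.2, K=1.1, t1=1.0, t2=2.0, seed=0):
    """Return x = X(t1), f(x) = P(X(t2) | x), the weights a and the matrix sigma."""
    rng = np.random.default_rng(seed)
    a = rng.uniform(size=D)
    a /= a.sum()  # assumption: weights sum to one, so that b(X(0)) = 1
    sigma = normalized_diffusion(D, a, theta, rng)
    x0 = np.ones(D)  # initial condition X(0) = (1, ..., 1)
    # X(t1) = X(0) + sigma W(t1), one sample per row
    x = x0 + np.sqrt(t1) * rng.normal(size=(Nx, D)) @ sigma.T
    # one conditional sample of X(t2) per row of x
    x_t2 = x + np.sqrt(t2 - t1) * rng.normal(size=(Nx, D)) @ sigma.T
    fx = np.maximum(x_t2 @ a - K, 0.0).reshape(Nx, 1)  # payoff, D_f = 1
    return x, fx, a, sigma


x, fx, a, sigma = sample_training_set(Nx=100, D=3)
print(x.shape, fx.shape)
```

Note that each training value is a single noisy payoff sample, not the conditional expectation itself; recovering the latter from such samples is precisely what the benchmarked methods attempt.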

### Four methods to tackle the Bachelier problem

In this technical report, we compare four methods for computing conditional expectations for the Bachelier problem:

1. The first is a neural network method, using TensorFlow to retrieve results. Code can be downloaded at \cite{AS1}; the method itself is described in \cite{NS}.

2. The second one is a standard kernel method, quite comparable to the previous approach, using the codpy implementation of kernel methods, that is, the \textit{projection} function

\begin{equation}\label{proj}
  P^k(x,y,z,f(x)=[]) = k(z,y) k(x,y)^{-1}f(x)
\end{equation}
where $y$ is a random subset of size $N_y$ of the initial set $x$, and $k(x,y)$ is a Gram matrix, see \cite{MM} for a more detailed description.

3. The third one uses the Pi function above, where $x$ (resp. $z$) are iid sequences of $X(t^1)$ (resp. $X(t^2)$), as presented in the above section.

4. The fourth one also uses the Pi function above, but chooses $x$ (resp. $z$) as sharp discrepancy sequences of $X(t^1)$ (resp. $X(t^2)$), see \cite{LM3} - \cite{LM1}.
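
The projection operator \eqref{proj} used by the second method can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the Gaussian kernel, its bandwidth, and the least-squares solve standing in for $k(x,y)^{-1}$ are choices made here, not a description of the codpy implementation.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gram matrix k(a, b) of a Gaussian kernel, of shape (len(a), len(b))."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))


def projection(x, y, z, fx, bandwidth=1.0):
    """Evaluate k(z, y) k(x, y)^{-1} f(x), the inverse being taken in the least-squares sense."""
    kxy = gaussian_kernel(x, y, bandwidth)  # shape (N_x, N_y)
    kzy = gaussian_kernel(z, y, bandwidth)  # shape (N_z, N_y)
    coeffs = np.linalg.lstsq(kxy, fx, rcond=None)[0]  # shape (N_y, D_f)
    return kzy @ coeffs


x = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]])
fx = (x[:, :1] + x[:, 1:]) ** 2
y = x  # here N_y = N_x; in the tests N_y is much smaller than N_x
z = np.array([[0.5, 0.5], [1.0, 0.0]])
fz = projection(x, y, z, fx, bandwidth=0.5)
print(fz)
```

When $y = x$ and the Gram matrix is invertible, the operator interpolates the training values; with $N_y \ll N_x$ it becomes a least-squares fit on the smaller set $y$.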

### Test specification

A single test relies on 8 parameters, which we list below. We run several scenarios to benchmark our results.

In [None]:
%%R
knitr::kable(matrix(c('# xs','# ys','# zs','Dimension','Generator x','Generator z | x','Generator z'),nrow=1,ncol=7),caption = 'A test specification', col.names = c('Nx','Ny','Nz', 'D', 's1','s2','s3'))

Hence this numerical experiment uses $s_1,s_2,s_3$, three seeds for random generators:

- $s_1$ is used to generate iid samples of $X(t^1)$ for the training set of variables $x$.

- $s_2$ is used to generate iid samples of the conditional sampling $X(t^2) | X(t^1) = x$ for the training set of variables.

- $s_3$ is used to generate iid samples of $X(t^1)$ for the test set of variables $z$.

For instance, if $s_1 = s_3$, and $N_z < N_x$, then the test set is a subset of the training set $z \subset x$.
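
The role of the seeds can be illustrated with the standard library generator. In this minimal sketch, the sampler `sample_x_t1` is a hypothetical stand-in for the actual data generator: it only shows that equal seeds $s_1 = s_3$ with $N_z < N_x$ make the test set coincide with the first $N_z$ training points.

```python
import random


def sample_x_t1(n, dim, seed):
    """Hypothetical sampler: n Gaussian points of dimension dim centered at (1, ..., 1)."""
    rng = random.Random(seed)
    return [[1.0 + rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


x = sample_x_t1(8, 2, seed=42)        # training set, seed s1
z = sample_x_t1(4, 2, seed=42)        # test set, seed s3 = s1
z_other = sample_x_t1(4, 2, seed=7)   # test set, seed s3 != s1

print(z == x[:4])        # the test set is the head of the training set
print(z_other == x[:4])  # a different seed gives fresh samples
```

Using $s_1 \neq s_3$ is therefore what makes a scenario a genuine out-of-sample test.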


To summarize the methodology, for each scenario given by an 8-tuple $N_x,N_y,N_z,D,D_f, s_1,s_2,s_3$, each of our four methods outputs a prediction $f_z$, which is benchmarked against the ground-truth value $f(z)$.

To measure errors, we use the percentage RMSE error, expressed as a real number between 0 and 1, called \textbf{basis point error}, as follows:

$$
  RMSE\%(f_z,f(z)) = \frac{\|f_z-f(z)\|_{\ell^2}}{\|f_z\|_{\ell^2}+\|f(z)\|_{\ell^2}} (\#eq:RMSEper)
$$
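
The error indicator above is straightforward to compute. The following is a minimal sketch in plain Python for scalar-valued predictions ($D_f = 1$); the function name and the convention of returning zero when both vectors vanish are assumptions made here.

```python
import math


def rmse_percent(predicted, reference):
    """Relative l2 error ||f_z - f(z)|| / (||f_z|| + ||f(z)||), a number in [0, 1]."""
    diff = math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_r = math.sqrt(sum(r * r for r in reference))
    denom = norm_p + norm_r
    # assumption: two identically zero vectors are considered to match exactly
    return 0.0 if denom == 0.0 else diff / denom


perfect = rmse_percent([1.0, 2.0], [1.0, 2.0])
worst = rmse_percent([1.0, 2.0], [-1.0, -2.0])
print(perfect, worst)
```

By the triangle inequality the numerator never exceeds the denominator, which is why the indicator stays between 0 and 1; the upper bound is reached when the prediction points in the direction opposite to the reference.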

We present three tests. Two of them are two-dimensional ($D=2$), allowing a graphical representation of input data and output errors that best illustrates the methods. The third one is concerned with the higher-dimensional case.

#### Input data settings

In [None]:
codpy_param = {'rescale:xmax': 1000,
'rescale:seed':42,
'sharp_discrepancy:xmax':1000,
'sharp_discrepancy:seed':30,
'sharp_discrepancy:itermax':10,
'discrepancy:xmax':500,
'discrepancy:ymax':500,
'discrepancy:zmax':500,
'discrepancy:nmax':2000 
}

We generate our data set for the test, according to the description given in the previous paragraph.

In [None]:
from Bachelier import *
D, Nx,Ny,Nz = 2, 300,300,300
data_ = data_generator_Bachelier(seed1 = 42, seed2 = 35, seed3 = 42)
data_.set_data(D, Nx,Ny,Nz)
x,z,fx,fz = data_.x,data_.z,data_.fx,data_.fz

##### Training set for different times

We plot the training set $x=X(t^1)$, generated at time $t^1=1$, together with the conditional samples $X(t^2) | X(t^1)$ generated at time $t^2=2$. A line joins each $x$ to its corresponding sample $z | x$.

In [None]:
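# Samples of X(t^1), drawn with seed s1
x1=data_.variables(data_.T1,Nx,[],seed=data_.seed1)
# Conditional samples X(t^2) | X(t^1) = x1, drawn with seed s2
x2=data_.variables(data_.T2-data_.T1,Nx,x1,seed=data_.seed2)
plt.scatter(x1[:,0],x1[:,1],color="red")
plt.scatter(x2[:,0],x2[:,1],color="green")
# One segment per sample, joining x1[i] to x2[i]
plt.plot([x1[:,0],x2[:,0]],[x1[:,1],x2[:,1]],color="black")
plt.show()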

##### Training values and ground test values distributions

We plot the generated training and test sets in the following picture, comparing the training values $f(x)$ with the exact values to predict $f(z)$, and taking as x-axis the corresponding basket values $b(x), b(z)$.

In [None]:
basketx = data_.basket(x = x)
basketz = data_.basket(x = z)
multi_plot([(basketx,fx),(basketz,fz)],plot1D,subtitles = ('input data','ground truth values'))

### Running the tests

#### Standard Neural Network

This test uses part of the code available at \cite{AS1}. Our chosen scenarios are listed in Table \@ref(tab:583)

In [None]:
scenarios_list = [ (2, 2**i, 80, 512)  for i in np.arange(5,16,1)]
pd_scenarios_list = pd.DataFrame(scenarios_list,columns = ["D","Nx","Ny","Nz"])

In [None]:
%%R
knitr::kable(py$pd_scenarios_list, caption = "scenario list")%>%
  kable_styling(full_width = T,font_size = 6)

Table \@ref(tab:984) reports the values of this test.

In [None]:
data_generator_Bachelier_ = data_generator_Bachelier(seed1 = 42, seed2 = 35, seed3 = 42)
scenario_generator_ = scenario_generator()
scenario_generator_.run_scenarios(scenarios_list,data_generator_Bachelier_,NN_predictor_standard(set_kernel = set_sampler_kernel),data_accumulator(), **codpy_param)
results = scenario_generator_.accumulator.get_output_datas().dropna(axis=1).T

In [None]:
%%R
knitr::kable(py$results, caption = "tensorflow indicators")%>%
  kable_styling(full_width = T,font_size = 6)

We output the predicted values $f_z$ against the exact ones $f(z)$, as  functions of the basket values $b(z)$ in Figure \@ref(fig:585)

In [None]:
basketzs = data_.basket(x = scenario_generator_.accumulator.get_zs())
scenario_generator_.accumulator.plot_predicted_values(basketzs,labelx='Basket values',labely='')

#### Standard codpy kernel

We apply the same approach with the kernel projection operator. The list of scenarios for this test is given in Table \@ref(tab:586).

In [None]:
scenarios_list = [ (2, 2**i, 80, 512)  for i in np.arange(5,16,1)]
pd_scenarios_list = pd.DataFrame(scenarios_list,columns = ["D","Nx","Ny","Nz"])

In [None]:
%%R
knitr::kable(py$pd_scenarios_list, caption = "scenario list")%>%
  kable_styling(full_width = T,font_size = 6)

In [None]:
scenario_generator_.run_scenarios(scenarios_list,data_generator_Bachelier_,codpyprRegressor(set_kernel = set_sampler_kernel),data_accumulator(), **codpy_param)
results = scenario_generator_.accumulator.get_output_datas().dropna(axis=1).T

Table \@ref(tab:587) reports the values of this test.

In [None]:
%%R
knitr::kable(py$results, caption = "codpy predictor indicators")%>%
  kable_styling(full_width = T,font_size = 6)

We output the predicted values $f_z$ against the exact ones $f(z)$, as  functions of the basket values $b(z)$ in Figure \@ref(fig:588)

In [None]:
basketzs = data_.basket(x = scenario_generator_.accumulator.get_zs())
scenario_generator_.accumulator.plot_predicted_values(basketzs,labelx='Basket values',labely='')

#### Pi function

We apply the same approach with the Pi function. The list of scenarios for this test is given in Table \@ref(tab:989).

In [None]:
scenarios_list = [ (2, 2**(i), 2**(i), 2**(i))  for i in np.arange(5,10,1)]
pd_scenarios_list = pd.DataFrame(scenarios_list,columns = ["D","Nx","Ny","Nz"])
data_generator_Bachelier_iid_ = data_generator_Bachelier_iid(seed1 = 42, seed2 = 35, seed3 = 42)

In [None]:
%%R
knitr::kable(py$pd_scenarios_list, caption = "scenario list")%>%
  kable_styling(latex_options = "HOLD_position")

Table \@ref(tab:590) reports the values of the tests.


In [None]:
scenario_generator_.run_scenarios(scenarios_list,data_generator_Bachelier_iid_,Pi_predictor(set_kernel = set_sampler_kernel),data_accumulator(), **codpy_param)
results = scenario_generator_.accumulator.get_output_datas().dropna(axis=1).T

In [None]:
%%R
knitr::kable(py$results, caption = "Pi indicators")%>%
  kable_styling(full_width = T,font_size = 6)

We output the predicted values $f_z$ against the exact ones $f(z)$, as  functions of the basket values $b(z)$ in Figure \@ref(fig:591)

In [None]:
basketzs = data_.basket(x = scenario_generator_.accumulator.get_zs())
scenario_generator_.accumulator.plot_predicted_values(basketzs,labelx='Basket values',labely='')

#### Pi function - discrepancy sequences

We apply the same approach with the Pi function, using sharp discrepancy sequences. The list of scenarios for this test is given in Table \@ref(tab:592).

In [None]:
scenarios_list = [ (2, 2**(i), 2**(i), 2**(i))  for i in np.arange(5,10,1)]
pd_scenarios_list = pd.DataFrame(scenarios_list,columns = ["D","Nx","Ny","Nz"])
data_generator_Bachelier_iid_ = data_generator_Bachelier_sharp(seed1 = 42, seed2 = 35, seed3 = 42)

In [None]:
%%R
knitr::kable(py$pd_scenarios_list, caption = "scenario list")%>%
  kable_styling(full_width = T,font_size = 6)

Table \@ref(tab:593) reports the values of the tests.

In [None]:
scenario_generator_.run_scenarios(scenarios_list,data_generator_Bachelier_iid_,Pi_predictor(set_kernel = set_sampler_kernel),data_accumulator(), **codpy_param)
results = scenario_generator_.accumulator.get_output_datas().dropna(axis=1).T

In [None]:
%%R
knitr::kable(py$results, caption = "Pi-sharp indicators")%>%
  kable_styling(full_width = T,font_size = 6)

We output the predicted values $f_z$ against the exact ones $f(z)$, as  functions of the basket values $b(z)$ in Figure \@ref(fig:994)

In [None]:
basketzs = data_.basket(x = scenario_generator_.accumulator.get_zs())
scenario_generator_.accumulator.plot_predicted_values(basketzs,labelx='Basket values',labely='')

### Comparing methods

Figure \@ref(fig:995) presents a benchmark of the scores, computed according to \@ref(eq:RMSEper). The x-axis shows the size $N_x$ of the training set in log-scale.

In [None]:
scenario_generator_.compare_plots(axis_field_labels = [("Nx","scores")],labelx='log2(Nx)',labely='scores',xscale ="log"
)

Figure \@ref(fig:996) presents a benchmark of the execution times in seconds. Both axes are in log-scale, the x-axis showing the size $N_x$ of the training set.


In [None]:
scenario_generator_.compare_plots(
axis_field_labels = [("Nx","execution_time")],labelx='log2(Nx)',labely='execution time (s)',xscale ="log",yscale ="log")

## Time series

This section remains to be written properly.

### Recurrent kernels

The implemented method is defined using two integer values, H and P: H is called the historical depth and P the prediction depth. This setting defines a sliding window of size H+P over the dataset, used to define the training set. If the dataset contains N vectors, then the training set can be of size N-H-P. We can iterate the procedure, producing P new predicted values at each step. In theory, this makes it possible to produce predicted values of the time series at any future time.
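
The sliding-window construction and its iteration can be sketched as follows. This is an illustration only: the function names are hypothetical, and the `naive_predict` placeholder (repeating the last observed value) stands in for whatever trained predictor is actually used. With the indexing chosen here the number of windows is N-H-P+1; the exact count depends on the convention adopted at the boundaries.

```python
def sliding_windows(series, H, P):
    """Split a series into (history, future) pairs of lengths H and P."""
    inputs, targets = [], []
    for start in range(len(series) - H - P + 1):
        inputs.append(series[start:start + H])
        targets.append(series[start + H:start + H + P])
    return inputs, targets


def iterate_forecast(history, predict, H, P, steps):
    """Append P predicted values per step, each step using the last H values."""
    path = list(history)
    for _ in range(steps):
        path.extend(predict(path[-H:], P))
    return path[len(history):]


def naive_predict(window, P):
    """Placeholder predictor: repeat the last observed value P times."""
    return [window[-1]] * P


series = list(range(10))
inputs, targets = sliding_windows(series, H=3, P=2)
forecast = iterate_forecast(series, naive_predict, H=3, P=2, steps=4)
print(len(inputs), inputs[0], targets[0], len(forecast))
```

Each iteration feeds previously predicted values back into the history window, which is how the procedure reaches arbitrary future times, and also why errors can accumulate along the generated trajectory.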

This method draws one trajectory, which can be considered as an iid realization of the time series, based on the knowledge of its history. In the following example, H and P are set to 360 days, and the separation date is 23/11/2020.

In [None]:
%%R
knitr::include_graphics(here::here("CodPyFigs", "1640295971717.png"))

This method has many forecasting applications, and we do use it for professional purposes. However, in the context of time series forecasting, such a method raises a number of questions. For instance:

- It is not clear how to generate other realizations of the studied time series.
- As a consequence, it is not clear either how to build a relevant mean estimator from this construction.

Even if long short-term memory or recurrent networks can produce credible generated samples, beware of instability issues. We have no theoretical references to support these methods.

### Optimal transport methods for time series

Kernel methods connect naturally with optimal transport theory. Using the polar factorization of maps, we can also compute explicitly the quantile of the original distribution, extrapolate it to any random trajectory set, and draw "equi-probable" trajectories (i.e. iid realizations of the underlying process).

We also have a quite clear interpretation of a mean estimator, and the method performs quite well.

In [None]:
%%R
knitr::include_graphics(here::here("CodPyFigs", "1640305683779.png"))

## Stress and reverse stress tests