# Quantum Machine Learning for Risk Assessment


One of the most important tasks in financial institutions is risk assessment, particularly the computation of the *Value at Risk* (**VaR**), which measures the potential loss that is exceeded only with a small, prescribed probability (for example, 1%).

This risk metric is simple to calculate once the shape of the financial distribution is determined, as it only requires a quantile calculation of the distribution. 
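As a minimal sketch, assuming losses are recorded as positive numbers and that **VaR** at confidence level $\alpha$ is the $\alpha$-quantile of the loss distribution, the quantile step can be computed from samples with NumPy. The function name `value_at_risk` and the toy normal loss distribution are illustrative assumptions only:

```python
import numpy as np

def value_at_risk(losses, alpha=0.99):
    """Empirical VaR: the alpha-quantile of a sample of losses (losses positive)."""
    return float(np.quantile(np.asarray(losses, dtype=float), alpha))

# Toy loss distribution: standard normal samples (assumption for illustration).
rng = np.random.default_rng(42)
losses = rng.normal(loc=0.0, scale=1.0, size=200_000)
var_99 = value_at_risk(losses, 0.99)  # should be close to the exact N(0, 1) quantile, about 2.326
```

The expensive part is therefore not the quantile itself but obtaining enough reliable samples (or a reliable CDF) of the financial distribution.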

In general, the computation of these financial distributions is complex and highly time-consuming, making the calculation of **VaR** using them impractical in production environments. 

One approach involves building a surrogate model trained on samples from the complex, well-calibrated financial model. The surrogate model can be evaluated quickly and used more efficiently for **VaR** calculations. The main issue with this approach is that standard surrogate models often fail to represent the tails of the financial distribution accurately, and the tails are precisely the regions that matter for **VaR** computations.

Utilizing Differential Machine Learning (**DML**) can help create a more reliable surrogate model that is suitable for **VaR** computations.


The **QQuantLib.qml4var** package from the *FinancialApplications* software library enables users to construct Parametric Quantum Circuits (**PQC**) and train them using techniques from **DML** to develop surrogate models for **VaR** computations.

The theoretical basis of this work can be found in:

* Manzano, A. (2024). Contributions to the pricing of financial derivatives contracts in commodity markets and the use of quantum computing in finance [Doctoral dissertation, Universidade da Coruña].


## 1. Outline of the Problem

Let $F(\textbf{x})$ be a Cumulative Distribution Function (**CDF**) representing a complex and time-consuming financial distribution, where $\textbf{x}=(x_0, x_1, \cdots, x_{m-1})$ is the input feature vector. 

Let $\tilde{\textbf{x}}^j$ with $j=0, 1, \cdots, n-1$, represent $n$ samples obtained from $F(\textbf{x})$, i.e. $\tilde{\textbf{x}}^j \sim F(\textbf{x})$. 

The primary objective is to construct a Parametric Quantum Circuit (**PQC**) $F^*(\textbf{x}, \theta)$ that serves as the surrogate model. This model, trained on the $\tilde{\textbf{x}}^j$ samples, should provide an accurate approximation of $F(\textbf{x})$, enabling efficient **VaR** computations.


The training procedure follows the standard approach used in Machine Learning (**ML**): define an appropriate **loss function** and then determine the set of parameters (i.e., weights) $\theta$ that minimize this **loss function**.
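In its simplest form, this procedure can be sketched classically. The snippet below is only an illustration under explicit assumptions: the surrogate is a hypothetical two-parameter logistic CDF rather than a **PQC**, the loss is a plain mean squared error against empirical-CDF labels (the labelling construction of Section 2.1), and the optimizer is plain gradient descent with finite-difference gradients. None of the names (`surrogate_cdf`, `mse_loss`, `train`, ...) belong to **QQuantLib.qml4var**:

```python
import numpy as np

def surrogate_cdf(x, theta):
    # Hypothetical classical stand-in for F*(x, theta): logistic CDF with
    # location theta[0] and log-scale theta[1] (NOT a PQC).
    mu, log_s = theta
    return 1.0 / (1.0 + np.exp(-(x - mu) / np.exp(log_s)))

def mse_loss(theta, x, y):
    # Mean squared error between labels y and the surrogate CDF.
    return np.mean((y - surrogate_cdf(x, theta)) ** 2)

def numerical_grad(f, theta, eps=1e-6):
    # Central finite-difference gradient of a scalar function f at theta.
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (f(theta + step) - f(theta - step)) / (2.0 * eps)
    return grad

def train(x, y, theta0, lr=0.5, n_iter=500):
    # Plain gradient descent: theta <- theta - lr * grad(loss).
    theta = np.asarray(theta0, dtype=float).copy()
    loss_fn = lambda t: mse_loss(t, x, y)
    for _ in range(n_iter):
        theta -= lr * numerical_grad(loss_fn, theta)
    return theta

# Toy data: 1-D normal samples labelled with their empirical CDF values.
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = np.mean(x[None, :] <= x[:, None], axis=1)  # fraction of samples <= each sample
theta = train(x, y, theta0=[1.0, 0.0])
```

Replacing `surrogate_cdf` by a **PQC** and `mse_loss` by the loss of Section 2.3 gives the structure of the workflow described here, although the actual package may use different optimizers and gradient estimates.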




## 2. Main ingredients for the training

### 2.1 The Empirical Distribution Function

One major caveat in the outline above is that a standard supervised training workflow requires both the sample input features, i.e. $\tilde{\textbf{x}}^j \sim F(\textbf{x})$, and their corresponding labels. However, the labels (the values of $F(\tilde{\textbf{x}}^j)$) are generally not available for the original financial distribution. To address this issue, the empirical distribution function is used to build the labels:

$$F^*_{\text{emp}}(\textbf{x}) = \dfrac{1}{K}\sum_{k = 0}^{K - 1}\textbf{1}_{\tilde{\textbf{x}}^k\leq \textbf{x}},$$ 

where $K$ represents the number of available samples and the inequality $\tilde{\textbf{x}}^k\leq \textbf{x}$ is understood component-wise.

Thus, the dataset for **ML** training will consist of $n$ pairs:

$$\left( \tilde{\textbf{x}}^j, F^*_{\text{emp}}(\tilde{\textbf{x}}^j) \right)$$
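A minimal NumPy sketch of this labelling step, assuming the samples are stored as a `(K, m)` array and that the same samples are used both to build $F^*_{\text{emp}}$ and to be labelled by it. `empirical_cdf` and `build_dataset` are hypothetical helper names, not part of **QQuantLib.qml4var**:

```python
import numpy as np

def empirical_cdf(points, samples):
    """Multivariate empirical CDF built from `samples`, evaluated at `points`.

    points:  (p, m) array of evaluation points.
    samples: (K, m) array of samples defining the ECDF.
    Returns a (p,) array: the fraction of samples that are component-wise
    less than or equal to each evaluation point.
    """
    points = np.atleast_2d(np.asarray(points, dtype=float))
    samples = np.atleast_2d(np.asarray(samples, dtype=float))
    # Broadcast to (p, K, m), then require every feature to satisfy <=.
    below = np.all(samples[None, :, :] <= points[:, None, :], axis=2)
    return below.mean(axis=1)

def build_dataset(samples):
    """Pair each sample with its ECDF label: (x_j, F_emp(x_j))."""
    samples = np.atleast_2d(np.asarray(samples, dtype=float))
    return samples, empirical_cdf(samples, samples)

samples = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
X, y = build_dataset(samples)  # y is [1/3, 1, 2/3]
```

One-dimensional data must be passed with shape `(K, 1)`. The broadcast builds a `(p, K, m)` boolean array, so for large $K$ a chunked evaluation would be preferable.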


### 2.2 The Probability Density Function

In addition to the **PQC**, $F^*(\textbf{x}, \theta)$, it is essential to compute the corresponding probability density function, **PDF**, of the surrogate model: 

$$f^*(\textbf{x}, \theta) = \frac{\partial^m F^*(\textbf{x}, \theta)}{\partial x_{m-1} \cdots \partial x_1 \partial x_0}$$

The **PDF** captures the finer local details of the financial distribution, such as the shape of its tails, which are particularly relevant for **VaR** computations.
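To make the definition concrete, the sketch below checks it numerically on a case where the answer is known: for independent standard normal features, $F(\textbf{x}) = \prod_i \Phi(x_i)$ and the mixed partial derivative is $\prod_i \varphi(x_i)$. It uses central finite differences, which need $2^m$ CDF evaluations and lose precision as $m$ grows. It is not a description of how **QQuantLib.qml4var** differentiates the **PQC**, and all helper names are hypothetical:

```python
import itertools
import math
import numpy as np

def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def indep_normal_cdf(x):
    # Toy CDF with independent N(0, 1) features: F(x) = prod_i Phi(x_i).
    return math.prod(std_normal_cdf(v) for v in x)

def mixed_partial(cdf, x, h=1e-3):
    """Central finite-difference estimate of d^m F / (dx_{m-1} ... dx_0) at x.

    Sums sign(s) * F(x + h * s) over all sign vectors s in {-1, +1}^m,
    where sign(s) is the product of the entries, and divides by (2h)^m.
    """
    x = np.asarray(x, dtype=float)
    total = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=x.size):
        s = np.array(signs)
        total += np.prod(s) * cdf(x + h * s)
    return total / (2.0 * h) ** x.size

x = [0.3, -0.5]
estimate = mixed_partial(indep_normal_cdf, x)
exact = std_normal_pdf(0.3) * std_normal_pdf(-0.5)  # estimate should be very close to this
```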


### 2.3 The Loss Function

The **loss function** used for training will be:

$$\text{Loss} = \alpha_0 \, \frac{1}{K} \sum_{k=0}^{K-1}\left(F^*_{\text{emp}}(\tilde{\textbf{x}}^k) - F^*(\tilde{\textbf{x}}^k, \theta)\right)^2 + \alpha_1 \left( -\frac{2}{K} \sum_{k=0}^{K-1} f^*(\tilde{\textbf{x}}^k, \theta) + Q\left({f^*}^2(\textbf{x}, \theta) \right) \right)$$ 

Here:
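* $Q\left({f^*}^2(\textbf{x}, \theta)\right)$ represents the integral of ${f^*}^2(\textbf{x}, \theta)$ over the domain of the distribution, which can be approximated using numerical methods (for example, quadrature on a grid). Together with the $-\frac{2}{K}\sum_{k} f^*(\tilde{\textbf{x}}^k, \theta)$ term it is, up to a constant that does not depend on $\theta$, a sample estimate of the integrated squared error $\int \left(f^*(\textbf{x}, \theta) - f(\textbf{x})\right)^2 d\textbf{x}$ between the surrogate **PDF** and the true **PDF** $f(\textbf{x})$ of $F(\textbf{x})$, because $\int f^*(\textbf{x}, \theta) f(\textbf{x})\, d\textbf{x}$ is the expectation of $f^*(\textbf{x}, \theta)$ under $F$.

* $K$ represents the number of available training samples.

* $\alpha_0$ and $\alpha_1$ are hyperparameters that balance the contributions of the two terms in the loss function.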
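The loss can be evaluated in a few lines for a one-dimensional toy case. This is an illustration under stated assumptions, not the package implementation: the surrogate is a hypothetical logistic CDF with closed-form **PDF** $f^* = F^*(1-F^*)/s$, the labels are the empirical CDF of the samples, the $\alpha_0$ term is a mean squared error, and $Q$ is approximated with the trapezoidal rule on a fixed grid. All function names are hypothetical:

```python
import numpy as np

def logistic_cdf(x, theta):
    # Hypothetical 1-D surrogate F*(x, theta): logistic CDF, theta = (mu, log_s).
    mu, log_s = theta
    return 1.0 / (1.0 + np.exp(-(x - mu) / np.exp(log_s)))

def logistic_pdf(x, theta):
    # f*(x, theta) = dF*/dx = F*(1 - F*) / s for the logistic CDF.
    F = logistic_cdf(x, theta)
    return F * (1.0 - F) / np.exp(theta[1])

def trapezoid(values, grid):
    # Trapezoidal rule, used here as the quadrature Q(.).
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid)))

def surrogate_loss(theta, samples, labels, grid, alpha0=1.0, alpha1=1.0):
    # alpha0 term: mean squared error between ECDF labels and F*.
    cdf_term = np.mean((labels - logistic_cdf(samples, theta)) ** 2)
    # alpha1 term: -2/K * sum_k f*(x_k) + Q(f*^2).
    pdf_term = -2.0 * np.mean(logistic_pdf(samples, theta)) + trapezoid(
        logistic_pdf(grid, theta) ** 2, grid
    )
    return alpha0 * cdf_term + alpha1 * pdf_term

rng = np.random.default_rng(1)
samples = rng.normal(size=1000)
labels = np.mean(samples[None, :] <= samples[:, None], axis=1)  # ECDF labels
grid = np.linspace(-20.0, 20.0, 4001)
good = surrogate_loss(np.array([0.0, np.log(0.55)]), samples, labels, grid)
bad = surrogate_loss(np.array([3.0, 0.0]), samples, labels, grid)
```

With normally distributed samples, a logistic surrogate roughly matched to them (location 0, scale about 0.55) should give a lower loss than a badly shifted one; in a real workflow $\theta$ would be obtained by minimizing this function, and for $m > 1$ the quadrature $Q$ becomes a multi-dimensional integral.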

## 3. Tutorial Notebooks

The **QQuantLib.qml4var** package enables users to build the necessary functions in *EVIDEN myQLM* for implementing the described workflow. The following notebooks explain how to use the different parts of the package.