# Curvature Trading Strategy: Theory

This notebook gives a brief walkthrough of the theory and rationale behind the trading strategy. The code is built directly on the theory presented here. Because this is a highly compressed summary of the project, some of the ideas may read as somewhat disjointed; the abstract in the README file gives a fuller picture of the project.

To do:
- Improve readability

# Graphs
Given a weighted, undirected (bidirectional) graph:

<img src="imgs-for-theory/weighted.jpg" style="width:400px;height:228.1px" align="center"/>

The weighted adjacency matrix $\mathbf{W}$ is:

\begin{equation}
\mathbf{W}=\left[\begin{array}{cccccccc}
{0} & {0.54} & {0.14} & {0} & {0} & {0} & {0} & {0.47} \\
{0.54} & {0} & {0.63} & {0.35} & {0.30} & {0} & {0} & {0.31} \\
{0.14} & {0.63} & {0} & {0.31} & {0} & {0} & {0} & {0} \\
{0} & {0.35} & {0.31} & {0} & {0.54} & {0.43} & {0} & {0.13} \\
{0} & {0.30} & {0} & {0.54} & {0} & {0.54} & {0.62} & {0.54} \\
{0} & {0} & {0} & {0.43} & {0.54} & {0} & {0.37} & {0} \\
{0} & {0} & {0} & {0} & {0.62} & {0.37} & {0} & {0} \\
{0.47} & {0.31} & {0} & {0.13} & {0.54} & {0} & {0} & {0}
\end{array}\right]
\end{equation}

The Degree matrix $\mathbf{D}$ is:

\begin{equation}
D_{m m}=\sum_{n} W_{m n}
\end{equation}

\begin{equation}
\mathbf{D}=\left[\begin{array}{cccccccc}
{1.15} & {0} & {0} & {0} & {0} & {0} & {0} & {0} \\
{0} & {2.13} & {0} & {0} & {0} & {0} & {0} & {0} \\
{0} & {0} & {1.08} & {0} & {0} & {0} & {0} & {0} \\
{0} & {0} & {0} & {1.76} & {0} & {0} & {0} & {0} \\
{0} & {0} & {0} & {0} & {2.54} & {0} & {0} & {0} \\
{0} & {0} & {0} & {0} & {0} & {1.34} & {0} & {0} \\
{0} & {0} & {0} & {0} & {0} & {0} & {0.99} & {0} \\
{0} & {0} & {0} & {0} & {0} & {0} & {0} & {1.45}
\end{array}\right]
\end{equation}

# Laplacian Matrix
The Laplacian Matrix is simply a matrix representation of a graph, defined as:

$$\mathbf{L} = \mathbf{D} - \mathbf{W}$$

\begin{equation}
\mathbf{L}=\left[\begin{array}{cccccccc}
{1.15} & {-0.54} & {-0.14} & {0} & {0} & {0} & {0} & {-0.47} \\
{-0.54} & {2.13} & {-0.63} & {-0.35} & {-0.30} & {0} & {0} & {-0.31} \\
{-0.14} & {-0.63} & {1.08} & {-0.31} & {0} & {0} & {0} & {0} \\
{0} & {-0.35} & {-0.31} & {1.76} & {-0.54} & {-0.43} & {0} & {-0.13} \\
{0} & {-0.30} & {0} & {-0.54} & {2.54} & {-0.54} & {-0.62} & {-0.54} \\
{0} & {0} & {0} & {-0.43} & {-0.54} & {1.34} & {-0.37} & {0} \\
{0} & {0} & {0} & {0} & {-0.62} & {-0.37} & {0.99} & {0} \\
{-0.47} & {-0.31} & {0} & {-0.13} & {-0.54} & {0} & {0} & {1.45}
\end{array}\right]
\end{equation}

Some properties of the Laplacian Matrix:

- $\mathbf{L}$ is a symmetric, positive semidefinite matrix.
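- The off-diagonal entries of $\mathbf{L}$ are nonpositive. A symmetric positive definite matrix with nonpositive off-diagonal entries is called a Stieltjes matrix; $\mathbf{L}$ has the same sign pattern but is only positive semidefinite, since it is singular.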
- The diagonal entries of $\mathbf{L}$ are the vertex degrees. The row sums and column sums are all zero.
- The rank of $\mathbf{L}$ is $n-k$, where $n$ is the number of vertices and $k$ is the number of connected components of the graph.
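
The properties above can be checked numerically. The following is a minimal NumPy sketch using the example matrix $\mathbf{W}$ from this section; the variable names are illustrative and are not taken from the project code.

```python
import numpy as np

# Weighted adjacency matrix W from the example above.
W = np.array([
    [0.00, 0.54, 0.14, 0.00, 0.00, 0.00, 0.00, 0.47],
    [0.54, 0.00, 0.63, 0.35, 0.30, 0.00, 0.00, 0.31],
    [0.14, 0.63, 0.00, 0.31, 0.00, 0.00, 0.00, 0.00],
    [0.00, 0.35, 0.31, 0.00, 0.54, 0.43, 0.00, 0.13],
    [0.00, 0.30, 0.00, 0.54, 0.00, 0.54, 0.62, 0.54],
    [0.00, 0.00, 0.00, 0.43, 0.54, 0.00, 0.37, 0.00],
    [0.00, 0.00, 0.00, 0.00, 0.62, 0.37, 0.00, 0.00],
    [0.47, 0.31, 0.00, 0.13, 0.54, 0.00, 0.00, 0.00],
])

D = np.diag(W.sum(axis=1))   # degree matrix: D_mm = sum_n W_mn
L = D - W                    # graph Laplacian

eigenvalues = np.linalg.eigvalsh(L)   # eigvalsh is intended for symmetric matrices

is_symmetric = np.allclose(L, L.T)
rows_sum_to_zero = np.allclose(L.sum(axis=1), 0.0)
is_psd = eigenvalues.min() > -1e-10   # allow for floating-point round-off
rank = np.linalg.matrix_rank(L)       # n - k; this graph is connected, so k = 1
```

Since the example graph is connected, the rank should come out as $8 - 1 = 7$.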


# Diffusion Processes

The 1-D Heat Equation is defined as: 
\begin{equation}
    \frac{\partial \phi(x,t)}{\partial t} = \alpha\frac{\partial^{2} \phi(x,t)}{\partial x^{2}} = \alpha\nabla^{2}\phi \quad 
\label{heatequation}
\end{equation}

where $\nabla^{2} = \frac{\partial^{2}}{\partial x^{2}}$ is the continuous Laplacian operator, which generalizes to $n$ dimensions, and $\alpha$ is the diffusivity constant.

The intuition behind the equation is that the rate of change of the heat distribution in time is directly proportional to its second derivative in space, i.e. to the curvature of the heat distribution. In particular, points with high curvature relative to their neighbours (e.g. sharp peaks) flatten out faster than points with weaker curvature, and the rate of flattening slows as the region becomes smoother.

# Discrete Laplace Operator
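The Laplacian matrix $\mathbf{L}$ can be interpreted as a discrete analogue of the Laplacian operator $\nabla^{2}$. It serves a similar purpose by measuring how much a function defined on the graph differs at one vertex from its values at neighbouring vertices.

Specifically, for a quantity $\phi_{i}$ on vertex $i$ of a graph with adjacency matrix $\mathbf{A}$ (or $\mathbf{W}$ in the weighted case):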
\begin{equation}
\begin{aligned}
\frac{d \phi_{i}}{d t} &=-\alpha \sum_{j} A_{i j}\left(\phi_{i}-\phi_{j}\right) \\
&=-\alpha\left(\phi_{i} \sum_{j} A_{i j}-\sum_{j} A_{i j} \phi_{j}\right) \\
&=-\alpha\left(\phi_{i} D_{i i}-\sum_{j} A_{i j} \phi_{j}\right) \\
&=-\alpha \sum_{j}\left(D_{i j}-A_{i j}\right) \phi_{j} \\
&=-\alpha \sum_{j}\left(L_{i j}\right) \phi_{j}
\end{aligned}
\end{equation}

In matrix-vector notation, this is:

\begin{equation}
\begin{aligned}
\frac{d \mathbf{\phi}}{d t} &=-\alpha(\mathbf{D}-\mathbf{A}) \mathbf{\phi} \\
&=-\alpha \mathbf{L} \mathbf{\phi}
\end{aligned}
\end{equation}

where the matrix $-\mathbf{L}$ replaces the Laplacian operator $\nabla^{2}$.
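
As an illustration of this equation, the sketch below integrates $\frac{d\mathbf{\phi}}{dt} = -\alpha\mathbf{L}\mathbf{\phi}$ with explicit Euler steps. The 4-node graph, the value of `alpha` and the step size `dt` are all assumptions chosen for the example.

```python
import numpy as np

# Hypothetical 4-node weighted graph (not the 8-node example above).
A = np.array([
    [0.0, 1.0, 0.5, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.5, 1.0, 0.0, 2.0],
    [0.0, 0.0, 2.0, 0.0],
])
L = np.diag(A.sum(axis=1)) - A

alpha = 0.1   # diffusion constant (assumed)
dt = 0.1      # Euler step size (assumed, small enough to be stable here)

phi = np.array([1.0, 0.0, 0.0, 0.0])   # all heat starts at node 0
total_heat_start = phi.sum()

for _ in range(5000):
    phi = phi - dt * alpha * (L @ phi)   # d(phi)/dt = -alpha * L * phi

total_heat_end = phi.sum()
```

Because the columns of $\mathbf{L}$ sum to zero, the total heat is conserved, and on a connected graph the values level out to the average (0.25 on every node in this example).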

# Finite Difference Approximation of Derivatives
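Using a forward difference in time and a central difference in space, with unit step sizes ($\Delta t = \Delta x = 1$), superscript $i$ indexing time and subscript $j$ indexing space, the derivatives are approximated as: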

\begin{equation}
    \left.\frac{\partial \phi(x,t)}{\partial t}\right|_{t_{i}}\approx  {\phi^{i+1}_{j} - \phi^{i}_{j}}
\end{equation}

\begin{equation}
    \left.\frac{\partial^{2} \phi(x,t)}{\partial x^{2}}\right|_{t_{i}}\approx  {\phi^{i}_{j+1}-2 \phi^{i}_{j}+\phi^{i}_{j-1}}
\end{equation}

Substituting these into the heat equation gives the Forward-Time Central-Space (FTCS) approximation of the diffusion equation:

\begin{equation}
    \phi^{i+1}_{j} - \phi^{i}_{j} = \alpha (\phi^{i}_{j+1}-2 \phi^{i}_{j}+\phi^{i}_{j-1}) 
\end{equation}

# Laplacian relation

By considering a simple path graph, the relation between the FTCS equation and the Laplacian matrix can be observed:

<img src="imgs-for-theory/pathgraph.png" style="width:377.777px;height:50px" align="center"/>

The adjacency and degree matrices are given by:

\begin{equation}
\mathbf{A}=\left[\begin{array}{ccccc}
{0} & {1} & {0} & {0} & {0} \\
{1} & {0} & {1} & {0} & {0} \\
{0} & {1} & {0} & {1} & {0} \\
{0} & {0} & {1} & {0} & {1} \\
{0} & {0} & {0} & {1} & {0}
\end{array}\right] \quad \quad \quad \mathbf{D}=\left[\begin{array}{ccccc}
{1} & {0} & {0} & {0} & {0} \\
{0} & {2} & {0} & {0} & {0} \\
{0} & {0} & {2} & {0} & {0} \\
{0} & {0} & {0} & {2} & {0} \\
{0} & {0} & {0} & {0} & {1}
\end{array}\right]
\end{equation}

The Laplacian matrix is therefore:

\begin{equation}
\mathbf{L}=\left[\begin{array}{ccccc}
{1} & {-1} & {0} & {0} & {0} \\
{-1} & {2} & {-1} & {0} & {0} \\
{0} & {-1} & {2} & {-1} & {0} \\
{0} & {0} & {-1} & {2} & {-1} \\
{0} & {0} & {0} & {-1} & {1}
\end{array}\right]
\label{stencil}
\end{equation}

It can be seen that the interior rows of the Laplacian matrix, $(-1, 2, -1)$, are the negative of the finite difference stencil $(1, -2, 1)$ for the second derivative, applied at each node of the path graph. Applying $-\mathbf{L}$ to the values on the path graph at a particular time instant is therefore equivalent to evaluating the second difference at every interior node.

Thus, combining the FTCS approximation with the path graph Laplacian yields:

$\textbf{Note: $\phi$ should be in bold. Jupyter notebook does not have the \bm package.}$

\begin{equation}
    \mathbf{\phi^{i+1}_{j}} - \mathbf{\phi^{i}_{j}} = -\alpha \mathbf{L}\mathbf{\phi^{i}_{j}}
\end{equation}

where $\mathbf{\phi^{i}_{j}}$ is a $(n_{x} \times 1)$ vector, with $n_{x}$ representing the number of data points along the $x$ dimension.

In essence, this equation says that the value of $\phi$ at the next time step is determined, in the discrete case of graphs, by the graph Laplacian $\mathbf{L}$ applied to the current values, or in other words by the curvature of $\phi$ over the graph at a particular time instant $t_{i}$. 
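
The correspondence between the path graph Laplacian and the finite difference stencil can be verified with a short NumPy sketch. The test profile `phi` and the value of `alpha` are arbitrary choices for illustration.

```python
import numpy as np

n = 5
# Laplacian of the 5-node path graph shown above.
A = np.diag(np.ones(n - 1), k=1) + np.diag(np.ones(n - 1), k=-1)
L = np.diag(A.sum(axis=1)) - A

phi = np.array([0.0, 1.0, 4.0, 9.0, 16.0])   # phi_j = j**2, an arbitrary test profile

# Central second difference at the interior nodes j = 1..n-2 (unit spacing).
second_difference = phi[2:] - 2.0 * phi[1:-1] + phi[:-2]

# -L reproduces the same stencil on the interior rows.
minus_L_phi = -(L @ phi)

alpha = 0.1                           # assumed diffusion constant
phi_next = phi - alpha * (L @ phi)    # one FTCS step in matrix form
```

The first and last rows of $\mathbf{L}$ differ from the interior stencil because the end nodes have only one neighbour, so only the interior entries are compared.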

# A yield prediction methodology
Denote a swap curve at time $t$ as $\mathbf{\lambda}(t)$. The terms yield curve and swap curve are used interchangeably. 

$\textbf{Note: All $\lambda$ should be in bold. Jupyter notebook does not have the \bm package.}$

Substituting $\lambda$ for $\phi$ in the equation above gives:
\begin{equation}
    \mathbf{\lambda^{i+1}_{j}} - \mathbf{\lambda^{i}_{j}} = -\alpha\mathbf{L}\mathbf{\lambda^{i}_{j}}
\end{equation}

where $\mathbf{\lambda}^{i}$ represents a yield curve made up of 15 maturities (indexed by the subscript $j$) at a particular time step $t_{i}$. These 15 maturities range from 1Y to 30Y. All yield curves have the same term structure, so $j$ can be absorbed into the $\lambda$ vector itself and the equation rewritten as:

\begin{equation}
    \mathbf{\lambda}(t+1)-\mathbf{\lambda}(t) = -\alpha\mathbf{L}\mathbf{\lambda}(t)
\end{equation}

However, treating the entire yield curve as a diffusive process (i.e. the equation above) is not ideal as a basis for a trading strategy. Repeatedly applying the Laplacian matrix to generate the yield curve for future time steps will eventually flatten the curve, until it lacks sufficient curvature for the Laplacian to act upon. In general, yield curves do not converge to 'flatness', and trading on a strategy that implies they do would not be profitable.
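
The flattening effect can be seen in a small experiment. This is a sketch only: the path graph Laplacian stands in for a fitted $\mathbf{L}$, and the starting curve and `alpha` are made-up values rather than market data.

```python
import numpy as np

n = 15   # number of maturities, as in the text
A = np.diag(np.ones(n - 1), k=1) + np.diag(np.ones(n - 1), k=-1)
L = np.diag(A.sum(axis=1)) - A     # path graph Laplacian as a stand-in for a fitted L

alpha = 0.2                        # assumed
# Hypothetical upward-sloping, concave curve (in percent); not real data.
curve = 1.0 + 2.0 * np.sqrt(np.linspace(0.0, 1.0, n))
initial_spread = curve.max() - curve.min()
initial_mean = curve.mean()

for _ in range(5000):
    curve = (np.eye(n) - alpha * L) @ curve   # lambda(t+1) = (I - alpha L) lambda(t)

final_spread = curve.max() - curve.min()
```

After enough iterations the spread between the highest and lowest rate vanishes while the average level is preserved, i.e. the curve has collapsed to a flat line.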

Therefore, an estimate of the 'average yield curve' is introduced, denoted by $\mathbf{m}$. The yield curve $\mathbf{\lambda}(t)$ is then made up of two components, $\mathbf{m}$ and a diffusive process $\mathbf{S}(t)$, such that $\mathbf{\lambda}(t)$ itself is not governed by diffusion. Note that the average yield curve $\mathbf{m}$ is not to be confused with a maturity $m$.

The yield curve at any time instant can then be rewritten as, 

\begin{equation}
    \mathbf{\lambda}(t) = \mathbf{m} + \mathbf{S}(t)
\end{equation}

and instead of $\mathbf{\lambda}(t)$ behaving in a diffusive manner, we have the diffusive component $\mathbf{S}(t)$ doing so: 

\begin{equation}
    \mathbf{S}(t+1) - \mathbf{S}(t) = -\alpha\mathbf{L}\mathbf{S}(t) 
\end{equation}

\begin{equation}
    \mathbf{S}(t+1) = (\mathbf{I} - \alpha \mathbf{L})\mathbf{S}(t)
\end{equation}

Therefore, the yield curve in the next time step is written as, 

\begin{equation}
\begin{aligned}
    \mathbf{\lambda}(t+1) &= \underbrace{(\mathbf{I}-\alpha \mathbf{L})\overbrace{(\mathbf{\lambda}(t) - \mathbf{m})}^{\mathbf{S}(t)}}_{\mathbf{S}(t+1)} + \mathbf{m} 
    \\[10pt]
    &= \mathbf{\lambda}(t) - \alpha \mathbf{L} \mathbf{\lambda}(t) + \alpha \mathbf{L}\mathbf{m}
    \\[10pt]
    &= (\mathbf{I}-\alpha \mathbf{L}) \mathbf{\lambda}(t) + \alpha \mathbf{L}\mathbf{m}
\end{aligned}
\label{ypred}
\end{equation}

where $\mathbf{I}$ is the identity matrix.
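
The prediction step can be written as a one-line function. In the sketch below, `predict_next_curve` is a hypothetical helper name (it is not the project's `swapCurvePrediction()` method), and the Laplacian, `m` and the starting curve are made-up inputs.

```python
import numpy as np

def predict_next_curve(curve, L, m, alpha=1.0):
    """One-step prediction: lambda(t+1) = (I - alpha L) lambda(t) + alpha L m."""
    identity = np.eye(len(curve))
    return (identity - alpha * L) @ curve + alpha * (L @ m)

n = 5
A = np.diag(np.ones(n - 1), k=1) + np.diag(np.ones(n - 1), k=-1)
L = np.diag(A.sum(axis=1)) - A                       # stand-in Laplacian (path graph)

m = np.array([1.0, 1.6, 2.0, 2.3, 2.5])              # hypothetical average curve
curve = m + np.array([0.3, -0.2, 0.1, -0.4, 0.7])    # hypothetical current curve
offset = (curve - m).mean()                          # mean of S(t); diffusion preserves it

for _ in range(2000):
    curve = predict_next_curve(curve, L, m, alpha=0.2)
```

Iterating the prediction no longer flattens the curve: it converges to the shape of $\mathbf{m}$, shifted by the constant `offset`, because only the deviation $\mathbf{S}(t)$ diffuses.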


# Proposed Optimization Problem
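Taking the equation above and absorbing the constant $\alpha$ into $\mathbf{L}$, we can fit $\mathbf{L}$ and $\mathbf{m}$ to historical yield curves by coordinate descent. The objective is not jointly convex because of the product $\mathbf{L}\mathbf{m}$, but it is a convex least-squares problem in either variable when the other is held fixed, and each sub-problem is easily solved with cvxpy. 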

### Normal Coordinate Descent
\begin{equation}
\begin{aligned}
    \min_{\mathbf{L,m}} \quad & \sum_{t} \left\|\mathbf{\lambda}(t+1) - (\mathbf{I} - \mathbf{L}) \mathbf{\lambda}(t) -\mathbf{L} \mathbf{m}\right\|^{2}
    \\[10pt]
    \textrm{s.t.}  \quad & \mathbf{L}\mathbf{1} = \mathbf{0}
    \\
    \quad & \mathbf{L} = \mathbf{L}^{T}
\end{aligned}
\label{optproblem}
\end{equation}

### Exponentially-weighted variant
\begin{equation}
\begin{aligned}
    \min_{\mathbf{L,m}} \quad & \sum_{t} \gamma^{t}\left\|\mathbf{\lambda}(t+1) - (\mathbf{I} - \mathbf{L}) \mathbf{\lambda}(t) -\mathbf{L} \mathbf{m}\right\|^{2}
    \\[10pt]
    \textrm{s.t.}  \quad & \mathbf{L}\mathbf{1} = \mathbf{0}
    \\
    \quad & \mathbf{L} = \mathbf{L}^{T}
\end{aligned}
\label{optproblem2}
\end{equation}

where $\gamma$ is the exponential decay factor, with $0<\gamma<1$. $\gamma$ controls the strength of the decay: the smaller $\gamma$ is, the quicker the decay.
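
To make the two objectives concrete, the sketch below evaluates them for a given $\mathbf{L}$ and $\mathbf{m}$ in plain NumPy. The function name `weighted_objective` is hypothetical, and the convention that $t$ starts at 0 for the first curve in the array is an assumption; the text does not specify where the weighting starts.

```python
import numpy as np

def weighted_objective(L, m, curves, gamma=1.0):
    """Sum over t of gamma**t * ||lambda(t+1) - (I - L) lambda(t) - L m||^2.

    curves has shape (T, n): one yield curve per row, ordered in time.
    gamma = 1.0 recovers the unweighted objective.
    Assumption: t = 0 is the first row of curves.
    """
    current, following = curves[:-1], curves[1:]
    # Residual rewritten as lambda(t+1) - lambda(t) + L (lambda(t) - m), row by row.
    residuals = following - current + (current - m) @ L.T
    weights = gamma ** np.arange(len(current))
    return float(np.sum(weights * np.sum(residuals ** 2, axis=1)))

# Synthetic curves generated exactly by the model, so the true parameters give zero loss.
n = 4
A = np.diag(np.ones(n - 1), k=1) + np.diag(np.ones(n - 1), k=-1)
L_true = 0.1 * (np.diag(A.sum(axis=1)) - A)
m_true = np.array([1.0, 1.5, 1.8, 2.0])

curves = [m_true + np.array([0.2, -0.1, 0.3, -0.2])]
for _ in range(9):
    curves.append(curves[-1] - L_true @ (curves[-1] - m_true))
curves = np.array(curves)

loss_at_truth = weighted_objective(L_true, m_true, curves, gamma=0.9)
loss_at_zero = weighted_objective(np.zeros((n, n)), m_true, curves, gamma=0.9)
```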

# Coordinate Descent Basics
Coordinate descent is an iterative method in which each iterate is obtained by fixing most components of the variable vector $\mathbf{x}$ at their values from the current iteration, and approximately minimizing the objective with respect to the remaining components. Each such sub-problem is a lower-dimensional minimization problem, and thus can typically be solved more easily than the full problem. 

We use a simple cyclic coordinate descent algorithm, which iterates through the coordinate directions one at a time and minimizes the objective function with respect to each in turn. For a general case with $n$ variables $(x_{1},x_{2},\ldots,x_{n})$, starting from an initial estimate $\mathbf{x}^{0} = (x_{1}^{(0)}, x_{2}^{(0)},\ldots,x_{n}^{(0)})$, the sequence of single-variable minimizations in iteration $k$ is as follows:

\begin{equation}
\begin{array}{l}
x_{1}^{(k)} \in \underset{x_{1}}{\operatorname{argmin}} f\left(x_{1}, x_{2}^{(k-1)}, x_{3}^{(k-1)}, \ldots x_{n}^{(k-1)}\right) \\
x_{2}^{(k)} \in \underset{x_{2}}{\operatorname{argmin}} f\left(x_{1}^{(k)}, x_{2}, x_{3}^{(k-1)}, \ldots x_{n}^{(k-1)}\right) \\
x_{3}^{(k)} \in \underset{x_{3}}{\operatorname{argmin}} f\left(x_{1}^{(k)}, x_{2}^{(k)}, x_{3}, \ldots x_{n}^{(k-1)}\right) \\
\quad \ldots \\
\quad x_{n}^{(k)} \in \underset{x_{n}}{\operatorname{argmin}} f\left(x_{1}^{(k)}, x_{2}^{(k)}, x_{3}^{(k)}, \ldots x_{n}\right)
\end{array}
\end{equation}

Note that after we solve for $x_{i}^{(k)}$, we use its new value from then on. The iteration therefore starts from an initial estimate $\mathbf{x}^{0}$ of the minimizer of $f$ and generates a sequence of iterates $\mathbf{x}^{1},\mathbf{x}^{2},\mathbf{x}^{3},\ldots$, where each $\mathbf{x}^{k} = (x_{1}^{(k)},x_{2}^{(k)},\ldots,x_{n}^{(k)})$ and the objective value never increases from one iterate to the next. 
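
As a generic illustration (unrelated to yield curves), the sketch below applies cyclic coordinate descent to a small quadratic $f(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^{\top}\mathbf{Q}\mathbf{x} - \mathbf{b}^{\top}\mathbf{x}$, for which each single-variable minimization has a closed form. The matrix `Q` and vector `b` are made up for the example.

```python
import numpy as np

def cyclic_coordinate_descent(Q, b, x0, sweeps=100):
    """Minimize f(x) = 0.5 x^T Q x - b^T x one coordinate at a time.

    For this quadratic, the exact minimizer over x_i with the other
    coordinates fixed is x_i = (b_i - sum_{j != i} Q_ij x_j) / Q_ii.
    """
    x = np.array(x0, dtype=float)
    for _ in range(sweeps):
        for i in range(len(x)):
            off_diagonal = Q[i] @ x - Q[i, i] * x[i]
            x[i] = (b[i] - off_diagonal) / Q[i, i]   # the new value is used immediately
    return x

Q = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])      # hypothetical symmetric positive definite matrix
b = np.array([1.0, 2.0, 3.0])

x_cd = cyclic_coordinate_descent(Q, b, x0=np.zeros(3))
x_exact = np.linalg.solve(Q, b)      # the minimizer satisfies Q x = b
```

For a quadratic objective this scheme coincides with the Gauss-Seidel method, and the iterates approach the solution of $\mathbf{Q}\mathbf{x} = \mathbf{b}$.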

For both of our optimization problems, the objective function $f$ is a function of $\mathbf{L}$ and $\mathbf{m}$. $\mathbf{L}^{(0)}$ is initialized as a matrix of zeros, and $\mathbf{m}^{(0)}$ as the initial estimate of the average yield curve $\mathbf{m}$. Then, for each iteration $k$, 

\begin{equation}
\begin{array}{l}
\mathbf{L}^{(k)} \in \underset{\mathbf{L}}{\operatorname{argmin}} f\left(\mathbf{L},\mathbf{m}^{(k-1)}\right) \\
\mathbf{m}^{(k)} \in \underset{\mathbf{m}}{\operatorname{argmin}} f\left(\mathbf{L}^{(k)},\mathbf{m}\right) \\
\end{array}
\end{equation}
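
The two alternating updates can be sketched without cvxpy, since each sub-problem is an ordinary least-squares problem. The code below is an illustrative NumPy substitute, not the project's cvxpy implementation: the helper names, the synthetic data and the choice of the time-averaged curve as the starting value for $\mathbf{m}$ are all assumptions. The constraints $\mathbf{L} = \mathbf{L}^{T}$ and $\mathbf{L}\mathbf{1} = \mathbf{0}$ are imposed by parametrizing $\mathbf{L}$ through its off-diagonal entries.

```python
import numpy as np
from itertools import combinations

def fit_L(curves, m):
    """L-step: least squares over symmetric L with L 1 = 0, with m held fixed."""
    n = curves.shape[1]
    pairs = list(combinations(range(n), 2))
    S = curves[:-1] - m                   # rows: lambda(t) - m
    target = curves[:-1] - curves[1:]     # rows: lambda(t) - lambda(t+1), modelled as L S(t)
    # Any symmetric L with zero row sums is sum_{i<j} w_ij (e_i - e_j)(e_i - e_j)^T,
    # so L S(t) is linear in the weights w_ij.
    design = np.zeros((S.shape[0], n, len(pairs)))
    for k, (i, j) in enumerate(pairs):
        diff = S[:, i] - S[:, j]
        design[:, i, k] = diff
        design[:, j, k] = -diff
    w, *_ = np.linalg.lstsq(design.reshape(-1, len(pairs)), target.reshape(-1), rcond=None)
    L = np.zeros((n, n))
    for k, (i, j) in enumerate(pairs):
        L[i, j] = L[j, i] = -w[k]
    L[np.diag_indices(n)] = -L.sum(axis=1)   # diagonal chosen so that every row sums to zero
    return L

def fit_m(curves, L):
    """m-step: least squares in m with L held fixed (minimum-norm solution)."""
    n = curves.shape[1]
    r = curves[1:] - curves[:-1] @ (np.eye(n) - L).T   # rows: lambda(t+1) - (I - L) lambda(t)
    return np.linalg.pinv(L) @ r.mean(axis=0)

def objective(curves, L, m):
    n = curves.shape[1]
    residuals = curves[1:] - curves[:-1] @ (np.eye(n) - L).T - L @ m
    return float(np.sum(residuals ** 2))

def coordinate_descent(curves, m0, iterations=20):
    m, history = m0, []
    for _ in range(iterations):
        L = fit_L(curves, m)      # L^(k) = argmin_L f(L, m^(k-1))
        m = fit_m(curves, L)      # m^(k) = argmin_m f(L^(k), m)
        history.append(objective(curves, L, m))
    return L, m, history

# Synthetic data (not market data): noisy diffusion around a made-up average curve.
rng = np.random.default_rng(0)
n, T = 4, 60
A = np.diag(np.ones(n - 1), k=1) + np.diag(np.ones(n - 1), k=-1)
L_true = 0.2 * (np.diag(A.sum(axis=1)) - A)
m_true = np.array([1.0, 1.5, 1.8, 2.0])
curves = [m_true + rng.normal(0.0, 0.3, n)]
for _ in range(T - 1):
    shock = rng.normal(0.0, 0.02, n)
    curves.append(curves[-1] - L_true @ (curves[-1] - m_true) + shock)
curves = np.array(curves)

L_fit, m_fit, history = coordinate_descent(curves, m0=curves.mean(axis=0))
```

Because each step is an exact minimization over one block of variables, the recorded objective values in `history` should never increase. The minimum-norm solution returned by the m-step sums to zero by construction.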

# Mean Adjustment
The vector $\mathbf{m}$ can be further decomposed as:

\begin{equation}
\mathbf{m}=\mu \mathbf{1}+\tilde{\mathbf{m}}
\end{equation}

where $\mu \in \mathbb{R}$ is a scalar representing the mean value of the vector $\mathbf{m}$:

\begin{equation}
\mu=\frac{1}{N} \mathbf{m}^{\top} \mathbf{1}
\end{equation}

while $\tilde{\mathbf{m}} \in \mathbb{R}^{N}$ is the 'residual' component, $\tilde{\mathbf{m}}=\mathbf{m}-\mu \mathbf{1}$, with the condition that the residual sums to zero:

\begin{equation}
\tilde{\mathbf{m}}^{\top} \mathbf{1}=0
\end{equation}

Next, applying the Laplacian $\mathbf{L} \in \mathbb{R}^{N \times N}$ (with the usual conditions that $\mathbf{L}=\mathbf{L}^{\top}$ and $\mathbf{L} \mathbf{1}=\mathbf{0}$) to the vector $\mathbf{m}$ yields:

\begin{equation}
\mathbf{L m}=\mathbf{L}(\mu \mathbf{1}+\tilde{\mathbf{m}})=\underbrace{\mu \mathbf{L} \mathbf{1}}_{\mathbf{0}}+\mathbf{L} \tilde{\mathbf{m}}=\mathbf{L} \tilde{\mathbf{m}}
\end{equation}

However, our optimization problem does not involve $\mathbf{m}$ in isolation, but only through $\mathbf{L}\mathbf{m}$, which by this decomposition equals $\mathbf{L}\tilde{\mathbf{m}}$. The objective is therefore unaffected by $\mu$, so $\mathbf{m}$ is determined only up to an additive constant. This creates an ambiguity, as we were previously solving for $\mathbf{m}$ without the decomposition. To resolve this ambiguity, we can solve for $\tilde{\mathbf{m}}$ instead in the coordinate descent scheme, by adding the constraint $\tilde{\mathbf{m}}^{T}\mathbf{1} = 0$, which enforces stationarity.

In this configuration, the diffusive component is now $\tilde{\mathbf{m}}$ and is all that matters, as this contains information pertaining to curvature, which ultimately is what generates the long/short trading signals for the strategy.
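
The decomposition can be checked numerically. In this sketch the path graph Laplacian and the vector `m` are made-up inputs.

```python
import numpy as np

n = 5
A = np.diag(np.ones(n - 1), k=1) + np.diag(np.ones(n - 1), k=-1)
L = np.diag(A.sum(axis=1)) - A             # stand-in Laplacian: symmetric, L 1 = 0

m = np.array([1.0, 1.6, 2.0, 2.3, 2.6])    # hypothetical average curve
mu = m.mean()                              # mu = (1/N) m^T 1
m_tilde = m - mu                           # residual component, sums to zero

shifted = m + 5.0                          # same shape, different level

# Recovering m from L m alone returns the zero-sum representative m_tilde.
recovered = np.linalg.pinv(L) @ (L @ m)
```

Here `L @ m`, `L @ m_tilde` and `L @ shifted` are all equal, which is exactly the ambiguity described above: the level $\mu$ cannot be recovered from $\mathbf{L}\mathbf{m}$, only the zero-sum component $\tilde{\mathbf{m}}$ can.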

# Flowcharts

### Flowchart for the getLaplacian() and swapCurvePrediction() methods

These methods generate $\mathbf{L}$ and $\mathbf{\tilde{m}}$ for each day (stride), which are then used to generate the yield curve prediction for the next day; the process then repeats.

<img src="imgs-for-theory/backtester_flowchart.png" style="width:350px;height:466.66px" align="center"/>

### Flowchart for the backtest() method

See section below for the strategy rationale.

<img src="imgs-for-theory/profits_flowchart.png" style="width:500px;height:502px" align="center"/>

# Backtest method rationale

Recall that the Laplacian $\mathbf{L}$ is a $(15\times 15)$ square matrix, with each of the 15 rows corresponding to one maturity, ranging from 1Y to 30Y, on any given yield curve. The $i^{th}$ row of the Laplacian is a measure of the curvature at node $i$, which is an individual maturity, relative to the other maturities. 

It was observed that each row of the Laplacian matrix had a spike pattern, closely replicating the stencil of the 1-D Laplacian operator. Furthermore, in each row the location of the spike is equal to the row number. This is expected because, as mentioned above, each row is associated with the curvature of the corresponding maturity relative to the other maturities for a particular trading day (e.g. the values in row 14 correspond to the curvature of the 30Y swap rate relative to the other maturities).

Hence, the Laplacian matrix can be thought of as a set of portfolio weights that sum to zero, used to trade the yield curves. Each row is a portfolio representative of that row’s node maturity, and hence the product $\mathbf{L}(t)\boldsymbol{\cdot}\mathbf{\lambda}(t)$ generates a $15 \times 1$ column vector whose $i^{th}$ value is the value of the portfolio corresponding to the $i^{th}$ maturity at time $t$. As a result, in this context 'trading the $i^{th}$ maturity' refers to trading the particular portfolio (dictated by the $i^{th}$ row of $\mathbf{L}$) associated with that maturity. For instance, trading the 1Y maturity would mean trading the portfolio associated with row 0 of the Laplacian, as it is the row that corresponds to the 1Y node.

The trading signal comes from $\tilde{\mathbf{m}}$, the diffusive parameter of the yield curve, and therefore the curvature at time $t$ is calculated as $\mathbf{L}(t)\boldsymbol{\cdot}\tilde{\mathbf{m}}(t)$. The $i^{th}$ entry of this product is the curvature of the portfolio associated with the $i^{th}$ maturity. The trading signal we have opted for is to short the portfolio (i.e. take negative exposure) if the curvature is greater than a threshold $\tau$, and to go long the portfolio if the curvature is less than $-\tau$. If on any particular day a portfolio has a curvature between $-\tau$ and $\tau$, none of the assets in that portfolio are traded, resulting in zero profit for that day. The rationale is that if the curvature on any given day is deemed 'too high', we initiate a short position, betting that the yield curve will diffuse back towards the mean (i.e. that the yield curve will mean revert), and vice versa. 
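
The threshold rule can be sketched as follows. The function name `trading_signals`, the 5-node stand-in Laplacian and the values of `m_tilde` and `tau` are all hypothetical; the project's `backtest()` method may be organised differently.

```python
import numpy as np

def trading_signals(L, m_tilde, tau):
    """Return +1 (long), -1 (short) or 0 (no trade) for each maturity portfolio."""
    curvature = L @ m_tilde
    signals = np.zeros(len(curvature), dtype=int)
    signals[curvature > tau] = -1    # curvature above tau: short the portfolio
    signals[curvature < -tau] = 1    # curvature below -tau: long the portfolio
    return signals

n = 5
A = np.diag(np.ones(n - 1), k=1) + np.diag(np.ones(n - 1), k=-1)
L = np.diag(A.sum(axis=1)) - A                      # stand-in for the fitted Laplacian

m_tilde = np.array([-0.5, 0.4, 0.0, -0.1, 0.2])     # hypothetical zero-sum residual curve
signals = trading_signals(L, m_tilde, tau=0.5)      # -> [1, -1, 0, 0, 0]
```

In this example the curvature vector is $(-0.9, 1.3, -0.3, -0.4, 0.3)$, so only the first two portfolios breach the threshold of 0.5.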

Based on our swap rate profits assumption (see the section 'Why use Swaps?' below), a long portfolio trade earns $\mathbf{L}(t)\boldsymbol{\cdot}[\mathbf{\lambda}(t+1) - \mathbf{\lambda}(t)]$, and a short portfolio trade earns $\mathbf{L}(t)\boldsymbol{\cdot}[\mathbf{\lambda}(t) - \mathbf{\lambda}(t+1)]$, with the choice between them governed by the curvature condition from the paragraph above. The constant of proportionality between the swap rate change and its profit is unknown, and for the purposes of this strategy it is assumed to be 1 for simplicity.

The profits function therefore takes in the Laplacian matrices generated for each time period $t$, and uses them to calculate the value of the aforementioned maturity portfolios. 
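
A one-day profit calculation under these assumptions might look like the sketch below. The function name `daily_profits`, the 3-node Laplacian, the two curves and the signals are all made up for illustration.

```python
import numpy as np

def daily_profits(L, curve_today, curve_tomorrow, signals):
    """Profit of each maturity portfolio, assuming a proportionality constant of 1."""
    long_profit = L @ (curve_tomorrow - curve_today)
    # Short trades (signal -1) earn the negative; untraded portfolios (signal 0) earn 0.
    return signals * long_profit

L = np.array([[ 1.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  1.0]])            # stand-in Laplacian; each row sums to zero

curve_today = np.array([1.0, 1.5, 1.7])       # hypothetical swap rates (percent)
curve_tomorrow = np.array([1.1, 1.4, 1.8])
signals = np.array([1, -1, 0])                # long, short, no trade

profits = daily_profits(L, curve_today, curve_tomorrow, signals)
```

With these inputs the rate changes are $(0.1, -0.1, 0.1)$, so the long portfolio earns 0.2, the short portfolio earns 0.4, and the untraded portfolio earns 0.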

As the weights of each portfolio sum to zero, we have created what is known as a 'Zero-Investment Portfolio', which is a collection of investments that has a net value of zero when the portfolio is assembled, and effectively requires the investor to take no stake in the portfolio \cite{zeroinvportfolio}. Because there is a new $\mathbf{L}$ every day, we are essentially using the $\mathbf{L}$ for that day to enter a position on that day and then exiting the day after to capture a potential profit. Note that the daily investment amount for each of these portfolios does not matter as it effectively costs nothing to trade the portfolio; the short positions finance the long positions. The profit is therefore proportional to the daily investment amount for each portfolio. 

In the domain of yield curve trading, a cash-neutral butterfly trade is a portfolio in which the weights of the maturity components are adjusted so that the combination has zero dollar duration and is self-financed (a zero-investment portfolio) \cite{butterflystrategy}; it is a common yield curve arbitrage strategy amongst practitioners. Usually the butterfly strategy is structured to present a positive gain for parallel yield curve shifts, but there are different types of butterflies that are structured to generate a positive pay-off in the event of a particular yield curve movement. Our Laplacian portfolios have real-life relevance in the sense that they are somewhat similar to the butterfly strategy; however, each portfolio is unlikely to have net zero duration. In theory this makes the portfolios less immune to interest rate risk, but we shall see in the next section how profitable our 'butterfly' strategy is.

# Why use Swaps?

It was originally proposed that US Treasury yield data be used as the data source. However, swap data is more suitable: the maturity of a 5Y bond today will not be 5Y one day later, so another variable would have to be introduced to model this shift in maturity. In contrast, for any particular swap rate maturity (e.g. 10Y), the corresponding 10Y swap contract is priced from scratch daily, so a time series of the 10Y swap rate always refers to the stated maturity. The treasury yield curve consists of bonds, which are not issued daily, and hence as each day goes by the treasuries roll down the curve and their maturity decreases.

Furthermore, a loose valuation of a swap is the present value of the cash flows associated with the fixed leg minus the present value of the cash flows associated with the floating leg \cite{whartonswaps}:

\begin{equation}
    P_{IRS} = P_{fixed} - P_{float}
\end{equation}

where:  
\begin{equation}
P_{\text {fixed }}=N R \sum_{i=1}^{n_{1}} d_{i}
\end{equation}
\begin{equation}
P_{\text {float }}=N \sum_{j=1}^{n_{2}} r_{j} d_{j}
\end{equation}

Here, $N$ represents the notional amount, $R$ is the fixed rate (swap rate), $r_{j}$ are the forecast LIBOR rates, $d_{i}$ and $d_{j}$ are the discount factors associated with the payment dates of the $i^{th}$ fixed and $j^{th}$ floating periods, and $n_{1}$ and $n_{2}$ are the number of fixed and floating payments respectively. Although this is a very rough valuation of an interest rate swap, we can see that there is a degree of proportionality between the swap price $P_{IRS}$ and the swap rate $R$, i.e. when the swap rate goes up so will the swap value. Hence, a swap's profitability is proportional to the change in swap rates.
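
The rough valuation above can be written out directly. In this sketch the function name `swap_value` and all of the numbers (notional, discount factors, forecast rates) are hypothetical, and the discount factors are held fixed when the swap rate changes.

```python
import numpy as np

def swap_value(notional, fixed_rate, fixed_discounts, float_rates, float_discounts):
    """P_IRS = P_fixed - P_float, using the rough formulas above."""
    p_fixed = notional * fixed_rate * np.sum(fixed_discounts)
    p_float = notional * np.sum(np.asarray(float_rates) * np.asarray(float_discounts))
    return p_fixed - p_float

notional = 1_000_000
discounts = [0.98, 0.95, 0.92]          # hypothetical discount factors
forecast_rates = [0.020, 0.025, 0.030]  # hypothetical forecast floating rates

value_base = swap_value(notional, 0.025, discounts, forecast_rates, discounts)
value_up = swap_value(notional, 0.026, discounts, forecast_rates, discounts)
change = value_up - value_base          # equals notional * 0.001 * sum(discounts)
```

Under these simplifications the change in value is linear in the change in $R$, which is the proportionality that the strategy's profit assumption relies on.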

# Note
The strategy performed best across different swap curves with a larger backtesting rolling window size and a curvature threshold of 0, which allows the model to place more trades in total and hence increases its potential to generate winning trades. However, this style of trading is purely theoretical and not possible in the real world, because true zero-investment portfolios do not exist. 

Firstly, when an investor borrows an asset from a broker to short it, there is an interest rate associated with this loan, which varies depending on the asset's availability. Secondly, trading commissions will be extremely costly for this strategy because we are essentially trading 15 portfolios per day, with 15 rates in each portfolio. This means that there is an upper bound of 225 sets of commissions to be paid per day if all portfolios are traded on that day. It is also extremely impractical to trade so many assets per day.