# Index replication


## Overview 

In finance and investment, index replication is a popular strategy for constructing a portfolio of financial assets that replicates the performance of a specific market index, such as the S&P 500 or the CAC 40. The goal is to provide investors with returns that closely match those of the target index. There are two main methods of index replication: full replication and sampling.

* __Full replication__ involves buying all the stocks in the same proportion as they are found in the target index. This method ensures that the portfolio closely tracks the index, minimizing tracking errors, which are discrepancies between the fund's performance and the index's performance. Full replication is straightforward and transparent, but it can be costly and complex due to the need to frequently adjust the portfolio to match changes in the index's composition. 

* __Sampling__, or partial replication, involves selecting a representative sample of securities from the index rather than holding all of them. This method aims to mimic the index's performance while reducing trading costs and complexity. Sampling can lead to higher tracking errors compared to full replication, but it is often more cost-effective and practical.
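
The trade-off between the two methods can be illustrated on synthetic data. The sketch below is not part of the project code: the returns are random, the index weights are made up, and the tracking error is assumed to be the sample standard deviation of the return difference between portfolio and index. The "sampled" portfolio simply keeps the three largest index constituents and rescales their weights.

```python
import numpy as np

rng = np.random.default_rng(7)
T, n = 250, 5

# Synthetic daily returns for n assets (illustrative data only)
asset_returns = rng.normal(0.0, 0.01, size=(T, n))
# Hypothetical index weights, summing to 1
index_weights = np.array([0.35, 0.25, 0.20, 0.15, 0.05])
index_returns = asset_returns @ index_weights


def tracking_error(weights):
    """Sample standard deviation of portfolio-minus-index returns."""
    return float(np.std(asset_returns @ weights - index_returns, ddof=1))


# Full replication: hold every constituent at its index weight
full_weights = index_weights.copy()

# Sampling: keep the k largest constituents and renormalize their weights
k = 3
top = np.argsort(index_weights)[-k:]
sampled_weights = np.zeros(n)
sampled_weights[top] = index_weights[top] / index_weights[top].sum()

te_full = tracking_error(full_weights)
te_sampled = tracking_error(sampled_weights)
print(f"full: {te_full:.6f}, sampled: {te_sampled:.6f}")
```

In this toy setting full replication tracks the index exactly, while the sampled portfolio holds fewer names at the price of a nonzero tracking error.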

## Optimization problem 

Formally speaking, portfolio construction is the process of allocating a given amount of capital among a set of assets. The expected return of a portfolio is calculated as the weighted sum of the individual expected asset returns:
$$E(X) = \sum_{i=1}^{n} w_i E(X_i) $$

where 

* $E(X)$ is the expected return of the portfolio
* $w_i$ is the weight of asset $i$ in the portfolio
* $E(X_i)$ is the expected return of asset $i$
* $n$ is the number of assets in the portfolio
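
As a small worked example of the weighted sum above, the following sketch computes the expected portfolio return for three assets. The expected returns and weights are hypothetical values chosen only for illustration.

```python
import numpy as np

# Hypothetical expected returns E(X_i) for three assets
expected_returns = np.array([0.05, 0.08, 0.02])
# Hypothetical portfolio weights w_i, summing to 1
weights = np.array([0.5, 0.3, 0.2])

# E(X) = sum_i w_i * E(X_i), written as a dot product
portfolio_expected_return = float(weights @ expected_returns)
print(portfolio_expected_return)
```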

One major challenge of portfolio management is to capture and mitigate market fluctuations. The variability of an asset's return $X_i$ is measured by its variance. Consequently, the variability of the portfolio return, also called risk, is taken as
$$ Var(X) = Var \left( \sum_{i=1}^{n} w_i X_i \right) = w^T \Sigma w$$

where $w$ is the column vector of portfolio weights and $\Sigma$ is the $n \times n$ covariance matrix of the asset returns.
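
The quadratic form for the portfolio variance can be checked numerically. The sketch below uses synthetic returns and hypothetical weights, estimates the covariance matrix with `np.cov`, and compares the quadratic form with the sample variance of the portfolio return series computed directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic T x n matrix of asset returns (illustrative data only)
returns = rng.normal(0.0, 0.01, size=(250, 3))
# Hypothetical portfolio weights
weights = np.array([0.5, 0.3, 0.2])

# Sample covariance matrix of the assets (columns are variables)
cov = np.cov(returns, rowvar=False)

# Portfolio variance as a quadratic form in the weights
portfolio_variance = float(weights @ cov @ weights)

# Cross-check: sample variance of the portfolio return series itself
direct_variance = float(np.var(returns @ weights, ddof=1))

print(portfolio_variance, direct_variance)
```

Both numbers agree up to floating-point error because `np.cov` and `np.var(..., ddof=1)` use the same normalization.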

The minimization of tracking error may be expressed as a least-squares problem:

$$w^* = \underset{w \in \mathcal{W}}{\mathrm{argmin}} \quad ||Xw - y||_2^2$$

where 

* $X \in \mathbb{R}^{T \times n}$ denotes the multivariate time series of asset returns 
* $y \in \mathbb{R}^T$ is the univariate index return series
* $\mathcal{W}$ is the set of admissible weight vectors
* $||\cdot||_2^2$ is the squared 2-norm
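
To make the least-squares formulation concrete, here is a minimal sketch on synthetic data. It ignores the constraint set and solves the unconstrained problem with `np.linalg.lstsq`, which is a simplification and not the solver used in this notebook. The names `w_true` and `squared_tracking_error` are introduced only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
T, n = 500, 4

# Synthetic asset returns and a synthetic index built from known weights
X = rng.normal(0.0, 0.01, size=(T, n))
w_true = np.array([0.4, 0.3, 0.2, 0.1])
y = X @ w_true + rng.normal(0.0, 1e-4, size=T)  # small noise on the index

# Unconstrained least-squares solution of min_w ||Xw - y||_2^2
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)


def squared_tracking_error(w):
    """Objective value ||Xw - y||_2^2 for a given weight vector."""
    return float(np.sum((X @ w - y) ** 2))


print(np.round(w_hat, 3))
print(squared_tracking_error(w_hat))
```

With constraints such as a budget or long-only restriction, the minimizer generally differs from this unconstrained solution, which is why a quadratic programming solver is needed.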

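We tackle this minimization problem using quadratic programming. In standard form, the objective reads

$$ \min_{x} \; \frac{1}{2} x^T P x + q^T x$$ 

subject to constraints on $x$ (for example the budget and box constraints configured in the code below). Here $x$ plays the role of the weight vector $w$, and $P$ and $q$ are built from the log returns:

* $P = 2 \, \log(1 + X)^T \cdot \log(1 + X)$
* $q = -2 \, \log(1 + X)^T \cdot \log(1 + y)$

The logarithm is applied element-wise. Expanding the squared norm of the least-squares objective gives exactly this quadratic form plus a constant term that does not depend on the weights.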
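
The link between the least-squares objective and the quadratic programming form can be verified numerically. The sketch below uses synthetic simple returns and an arbitrary weight vector; the names `A` and `b` for the log-return matrix and vector are introduced only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n = 200, 3

# Synthetic simple returns for the assets and the index (illustrative only)
X = rng.normal(0.0, 0.01, size=(T, n))
y = rng.normal(0.0, 0.01, size=T)

# Element-wise conversion to log returns: log(1 + r)
A = np.log1p(X)
b = np.log1p(y)

# Quadratic programming data as defined in the text
P = 2.0 * A.T @ A
q = -2.0 * A.T @ b

# Evaluate both objectives at an arbitrary weight vector
w = np.array([0.2, 0.5, 0.3])
qp_value = 0.5 * w @ P @ w + q @ w
ls_value = np.sum((A @ w - b) ** 2)

# The two differ only by the constant b^T b, which does not depend on w
gap = ls_value - (qp_value + b @ b)
print(gap)
```

Because the constant term does not affect the minimizer, both formulations yield the same optimal weights.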

In [None]:
import sys
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import gurobipy as gp

sys.path.insert(1, '../src')
from backtest import Backtest
from data_loader import *
from optimization import *
from constraints import Constraints
from backtest_mutator import BacktestConfig, BacktestMutator

import quantstats as qs
qs.extend_pandas()

In [None]:
path = '../data/'  # Change this to your path

# Prepare data
raw_data = load_data_msci(path)
data = {'return_series': raw_data['X'],
        'return_series_index': raw_data['y']}

# Setup a mutator
start_date = '2023-02-01'
mutator = BacktestMutator(data, start_date = start_date, quiet = True)

In [None]:
universe = data['return_series'].columns
constraints = Constraints(selection = universe)
constraints.add_budget()
constraints.add_box(box_type = 'LongOnly')
# Add a deliberately infeasible constraint: sum of all weights <= -1
# (this cannot hold together with the long-only box constraint above)
constraints.add_linear(None, pd.Series(np.ones(universe.size, dtype=float), index=universe), '<=', -1)

optimization = LeastSquares(solver_name = 'highs', sparse = True)
optimization.constraints = constraints
optimization.set_objective(optimization_data = raw_data)

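# Build the solver model and check whether the constraint set admits a solution
optimization.model_qpsolvers()
model = optimization.model
model.is_feasible()  # expected to report infeasibility because of the constraint added above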