

### Problem specification



Let $y_{i,t} \sim D(\mu_{i,t}, \sigma_{i,t})$ be the return of asset $i$ at time $t$, and let $w_{i,t}$ be the weight of that asset in a portfolio. We can describe the portfolio by the maximum likelihood estimates of its mean and variance, i.e.,

$$
\mu_{p,t} = \sum_{i=1}^{N}w_{i,t}\mu_{i,t} = \mathbf{w}_{t}^{T}\mathbf{\mu}_{t} \\
\sigma_{p,t}^{2} = \sum_{i=1}^{N}\sum_{j=1}^{N}w_{i,t}w_{j,t}\sigma_{i,j,t} = \mathbf{w}_{t}^{T}\mathbf{\Sigma}_{t}\mathbf{w}_{t}
$$
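As a minimal numerical sketch of these two portfolio moments, the snippet below evaluates both the vector form and the explicit double sum. The weights, means, and covariance matrix are made-up illustrative numbers, not data from this notebook.

```python
import numpy as np

# Hypothetical inputs for N = 3 assets (illustrative values only).
w = np.array([0.5, 0.3, 0.2])             # portfolio weights, summing to 1
mu = np.array([0.01, 0.02, 0.015])        # expected asset returns
Sigma = np.array([
    [0.0400, 0.0060, 0.0040],
    [0.0060, 0.0900, 0.0100],
    [0.0040, 0.0100, 0.0625],
])                                         # asset covariance matrix

mu_p = w @ mu                              # w^T mu
var_p = w @ Sigma @ w                      # w^T Sigma w

# The explicit double sum over i and j gives the same number as the quadratic form.
N = len(w)
var_p_sum = sum(w[i] * w[j] * Sigma[i, j] for i in range(N) for j in range(N))
```

The quadratic form is the one to use in practice; the double sum is shown only to make the link to the scalar notation explicit.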

Assume also that the asset returns are governed by some linear factors $x_{t,j}$ and a random noise term $\epsilon_{i,t+1}$:

$$
y_{i,t+1} = \sum_{j=1}^{M+1}\beta_{i,j}x_{t,j} + \epsilon_{i,t+1}
$$

Written more compactly:

$$
\mathbf{Y} = \mathbf{X}\mathbf{\beta} + \mathbf{\Epsilon}
$$

where,

- $\mathbf{Y}$ is a $(T,N)$ matrix of $T$ monthly returns for $N$ assets
- $\mathbf{X}$ is a $(T,M)$ matrix of $T$ monthly observations of $M$ factors
- $\beta$ is an $(M,N)$ matrix of time-invariant parameters describing the effect of the $M$ factors on the $N$ assets. We use the terms 'factor loadings' and 'betas' interchangeably.
- $\mathbf{\Epsilon}$ is a $(T,N)$ matrix of $T$ monthly error terms for $N$ assets, with $\mathbf{\Epsilon} \sim N(0,\Sigma_{\epsilon})$

Using such a factor model, we can estimate the mean and the variance as:

$$
\mathbb{E}_{t}[\mathbf{Y}] = \mathbf{\mu}_{t} = \mathbf{X}\mathbf{\beta} \\
\mathbb{V}_{t}[\mathbf{Y}] = \mathbf{\Sigma}_{t} = \mathbf{X}\Sigma_{\beta}\mathbf{X}^{T} + \Sigma_{\epsilon}
$$

This is a time-series factor model, i.e., estimating the factor loadings is akin to estimating correlations through time (in a univariate setting, $\hat{\beta} = \frac{\text{Cov}(y_{i,t}, x_{t})}{\text{Var}(x_{t})}$, while in the multivariate setting each partial effect is adjusted for the interdependence of the predictors).
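To illustrate the univariate case, here is a small sketch on simulated data showing that the covariance-over-variance ratio matches the least-squares slope when an intercept is included. The factor, the true slope of 1.5, and the noise level are arbitrary assumptions chosen for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500
x = rng.normal(size=T)                                # one simulated factor
y = 0.2 + 1.5 * x + rng.normal(scale=0.5, size=T)     # simulated asset return

# Univariate loading: Cov(y, x) / Var(x), using the same ddof in both terms.
beta_cov = np.cov(y, x, ddof=1)[0, 1] / np.var(x, ddof=1)

# Same slope from ordinary least squares with an intercept column.
X = np.column_stack([np.ones(T), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_ols = coef[1]
```

With several correlated factors the two no longer coincide factor by factor, which is the adjustment for interdependence mentioned above.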

Our goal is to optimize an arbitrary metric that depends on the distributional moments of $\mathbf{Y}$. This could be anything, but for simplicity, let's start with the Sharpe ratio.

$$
SR_{t} = \frac{\mathbb{E}[\mathbf{y}_{t+1}|\mathbf{x}_{t}] - rf_{t}}{\sqrt{\mathbb{V}[\mathbf{y}_{t+1}|\mathbf{x}_{t}]}}
$$
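A minimal sketch of the Sharpe ratio for a portfolio, using the usual convention of excess mean return over standard deviation. The function name `sharpe_ratio` and all input values are hypothetical and only serve the illustration.

```python
import numpy as np

def sharpe_ratio(w, mu, Sigma, rf=0.0):
    """Portfolio Sharpe ratio: (w^T mu - rf) / sqrt(w^T Sigma w)."""
    excess = w @ mu - rf
    vol = np.sqrt(w @ Sigma @ w)
    return excess / vol

# Illustrative inputs for two uncorrelated assets.
w = np.array([0.5, 0.5])
mu = np.array([0.02, 0.01])
Sigma = np.array([[0.04, 0.00],
                  [0.00, 0.01]])
sr = sharpe_ratio(w, mu, Sigma, rf=0.005)
```

In the setting of this notebook, `mu` and `Sigma` would come from the factor model rather than being fixed by hand.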


### Factors

##### Principal Components

First, we assume that the common factors are principal components derived from the $\mathbf{Y}$ matrix. There are a few ways of doing this:

- Perform an SVD on the $\mathbf{Y}$ matrix
- Perform a PCA on the empirical covariance matrix, $\text{cov}(\mathbf{Y}) = (\mathbf{Y}-\bar{\mathbf{Y}})^{T}(\mathbf{Y}-\bar{\mathbf{Y}})/(T-1)$
- Perform an SVD on $\mathbf{Y}-\mathbf{\bar{Y}}$, which is equivalent to the previous option

All three are supported by the library. Note that in the last case, the empirical covariance can be retrieved:

$$
\mathbf{\tilde{Y}} = \mathbf{Y}-\mathbf{\bar{Y}} \\
\mathbf{\tilde{Y}} = \mathbf{U}\mathbf{S}\mathbf{V}^{T} \\
\mathbf{L} = \mathbf{S}^{T}\mathbf{S} \\
\text{cov}(\mathbf{Y}) = \mathbf{\tilde{Y}}^{T}\mathbf{\tilde{Y}} / (T-1) = \mathbf{V}\mathbf{L}\mathbf{V}^{T} / (T-1)
$$

where $\mathbf{L}$ is the $(N,N)$ diagonal matrix of squared singular values, padded with zeros where needed so that the dimensions match (recall that $\mathbf{S}$ itself is rectangular).
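The link between the SVD of the demeaned returns and the empirical covariance can be checked numerically. The sketch below uses plain NumPy on simulated data rather than the library's `PcaHandler`, whose internals are not shown here. It rebuilds the covariance from the right singular vectors and the squared singular values, then compares it with `np.cov`.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 50, 4
Y = rng.normal(size=(T, N))                  # simulated returns, T periods by N assets

Y_tilde = Y - Y.mean(axis=0)                 # demean each asset (column)
U, s, Vt = np.linalg.svd(Y_tilde, full_matrices=False)

# Covariance rebuilt from the SVD: V diag(s^2) V^T / (T - 1).
cov_svd = Vt.T @ np.diag(s**2) @ Vt / (T - 1)
cov_np = np.cov(Y, rowvar=False)             # direct empirical covariance

# Principal-component scores: demeaned data projected on the first k right singular vectors.
k = 2
scores = Y_tilde @ Vt[:k].T                  # shape (T, k)
```

The scores equal `U[:, :k] * s[:k]`, so the factors can be read directly off the SVD without forming the covariance matrix.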

#### Stability of factors

One of the assumptions

#### Testing choices:

1. stability of factors
2. stability of factor loadings
3. choice of time-period for PCA (based on what? error term?)


In [5]:
assets = ['CHRW.OQ', 'MCD.N', 'BAX.N', 'AMAT.OQ', 'UPS.N']

In [14]:
import os
import numpy as np
import datetime as dt
from examples.example_executor_mvp_v3 import Executor
import matplotlib.pyplot as plt
import pandas as pd
from models.unsupervised.pca import PcaHandler

In [7]:
START_DATE = '2005-12-31'

PATH_DATA = r'C:\Users\serge\OneDrive\portfolio_management\data\csv'
PATH_API_KEYS = r'C:\Users\serge\OneDrive\reuters\apikeys.csv'
PATH_SAVE_PLOTS = os.path.join(r'results\figures')
PATH_CONFIG = r'C:\Users\serge\IdeaProjects\portfolio_management\notebooks\examples\config_example.yaml'

In [24]:
END_DATE = '2024-04-21'
executor = Executor(START_DATE, END_DATE, PATH_DATA, PATH_API_KEYS, PATH_CONFIG)

executor._load_data()
executor._preprocess()

[Errno 2] No such file or directory: 'C:\\Users\\serge\\OneDrive\\portfolio_management\\data\\csv\\prices\\WM.N.csv'
[Errno 2] No such file or directory: 'C:\\Users\\serge\\OneDrive\\portfolio_management\\data\\csv\\prices\\REG.OQ.csv'
[Errno 2] No such file or directory: 'C:\\Users\\serge\\OneDrive\\portfolio_management\\data\\csv\\prices\\BIO.N.csv'
[Errno 2] No such file or directory: 'C:\\Users\\serge\\OneDrive\\portfolio_management\\data\\csv\\prices\\IVZ.N.csv'
[Errno 2] No such file or directory: 'C:\\Users\\serge\\OneDrive\\portfolio_management\\data\\csv\\prices\\EXPE.OQ.csv'
[Errno 2] No such file or directory: 'C:\\Users\\serge\\OneDrive\\portfolio_management\\data\\csv\\prices\\GEN.OQ.csv'
[Errno 2] No such file or directory: 'C:\\Users\\serge\\OneDrive\\portfolio_management\\data\\csv\\prices\\DAY.N.csv'
[Errno 2] No such file or directory: 'C:\\Users\\serge\\OneDrive\\portfolio_management\\data\\csv\\prices\\GRMN.N.csv'
[Errno 2] No such file or directory: 'C:\\Users\\ser

In [21]:
Y = executor.returns
Y.shape

(324, 276)

In [22]:
pca = PcaHandler(Y, demean=True, method='svd')
X = pca.components(n=10)

In [23]:
X.shape

(324, 10)

We will experiment with two choices:

1. Perform PCA on the matrix $\mathbf{Y} = [\mathbf{y}_{t}, \mathbf{y}_{t+1}, ... \mathbf{y}_{t+n}]$ using a moving window which shifts $t$ by 1 at each iteration
2. Perform PCA on the matrix
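The moving-window variant in item 1 can be sketched as follows. This is an illustration in plain NumPy on simulated data, not the library's implementation: the helper name `rolling_pca_components`, the window length of 36, and the number of components are all assumptions.

```python
import numpy as np

def rolling_pca_components(Y, window, n_components):
    """Yield (start_index, components) for each window of `window` consecutive rows."""
    T = Y.shape[0]
    for start in range(T - window + 1):
        block = Y[start:start + window]
        block = block - block.mean(axis=0)           # demean inside the window
        _, _, Vt = np.linalg.svd(block, full_matrices=False)
        yield start, Vt[:n_components]               # (n_components, N) directions

rng = np.random.default_rng(0)
Y_sim = rng.normal(size=(60, 5))                     # simulated returns
results = list(rolling_pca_components(Y_sim, window=36, n_components=3))
```

Note that the sign of each principal component is arbitrary, so components from consecutive windows may flip sign even when the underlying factor is stable. Any stability comparison across windows has to account for this.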

In [25]:
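# Factor loadings of the selected assets, keyed by estimation end date.
betas = {}

# NOTE: END_DATE must be reset to a date before '2024-04-21' for this loop to run;
# the cell above leaves it at '2024-04-21'.
while END_DATE < '2024-04-21':  # ISO-formatted date strings compare chronologically

    print(END_DATE)

    executor = Executor(START_DATE, END_DATE, PATH_DATA, PATH_API_KEYS, PATH_CONFIG)

    executor._load_data()
    executor._preprocess()
    executor._fit_factor_model()

    # Column positions of the selected assets among the estimated assets
    # (in the order of assets_estimated, not of `assets`).
    idx = np.flatnonzero(np.isin(executor.linear_model.assets_estimated, assets))

    betas[END_DATE] = executor.linear_model.factor_loadings[:, idx]

    # Move the end date forward by the portfolio horizon, treated as days.
    h = executor.portfolio.horizon
    END_DATE = str((dt.datetime.strptime(END_DATE, '%Y-%m-%d') + dt.timedelta(days=h)).date())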


In [None]:
# One table per asset: rows are estimation dates, columns are factor loadings.
res = {}
times = list(betas.keys())

for a, asset in enumerate(assets):
    _table = pd.DataFrame(
        data=np.array([betas[t][:, a] for t in times]),
        columns=['intercept', 'pc1', 'pc2', 'pc3'],
    )
    _table['date'] = times
    res[asset] = _table

In [None]:
# Loadings of the first asset through time, one panel per coefficient.
table = res[assets[0]]
fig, axs = plt.subplots(2, 2, figsize=(11,8))
plt.suptitle(assets[0])
for k, ax in enumerate(axs.flat):
    ax.plot(table['date'], table.iloc[:, k], marker='o', color='red')


In [None]:
# Same panels in absolute value (the sign of a principal component is arbitrary).
table = res[assets[0]]
fig, axs = plt.subplots(2, 2, figsize=(11,8))
plt.suptitle(assets[0])
for k, ax in enumerate(axs.flat):
    ax.plot(table['date'], np.abs(table.iloc[:, k]), marker='o', color='red')

res[assets[0]]