# Factor Models

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.api as sm
import yfinance as yf 

%matplotlib inline

## Financial data

We are going to consider three "classes" of companies:

* Clothing companies
* Finance companies
* Tech companies

In [None]:
# Clothing
clothing_tickers = ["NKE", "RL"]

# Finance
finance_tickers = ["CME", "JPM"]

# Tech
tech_tickers = ["AAPL", "GOOGL"]

all_tickers = (
    clothing_tickers + finance_tickers + tech_tickers
)
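data = yf.download(
    tickers=all_tickers,
    start="2005-01-01",
    end="2021-05-01",
    interval="1wk",
    # Recent yfinance releases adjust prices by default and omit the
    # "Adj Close" column; request unadjusted output so it is present
    auto_adjust=False,
)
data_ac = data["Adj Close"].resample("1W").first()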

### Plot prices

In [None]:
# Normalize everyone to the same scale by dividing
# by their initial price
data_ac_norm = data_ac.divide(data_ac.iloc[0, :], axis=1)

# Figure
fig, ax = plt.subplots(3, figsize=(12, 8))

sectors = [clothing_tickers, finance_tickers, tech_tickers]
titles = ["Clothing", "Finance", "Technology"]
for (i, sector) in enumerate(sectors):
    ax[i].set_title(titles[i])
    for company in sector:
        ax[i].plot(data_ac_norm.index, data_ac_norm.loc[:, company])
        ax[i].spines["right"].set_visible(False)
        ax[i].spines["top"].set_visible(False)

fig.tight_layout()

### Correlations

In [None]:
np.log(data_ac).diff().loc[:, all_tickers].corr()

## Factor model

We are going to propose a particular state space model that tries to explain the financial data we observe using a small number of unobserved factors.

Define our state space model as:

\begin{align*}
  \lambda_{t} &= A \lambda_{t-1} + \eta_t \\
  p_{i, t} &= G_i \lambda_t + \varepsilon_{i, t} \\
  \eta_t &\sim N(0, I) \\
  \varepsilon_t &\sim N(0, \Sigma)
\end{align*}


Note: If we stack our $p_{i, t}$ values, we wind up with

$$P_t = G \lambda_t + \varepsilon_t$$

where $G \equiv \begin{bmatrix} G_1 \\ G_2 \\ \vdots \\ G_I \end{bmatrix}$ and $\varepsilon_t \equiv \begin{bmatrix} \varepsilon_{1, t} \\ \varepsilon_{2, t} \\ \vdots \\ \varepsilon_{I, t} \end{bmatrix}$.
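To make the stacked form concrete, here is a minimal simulation sketch of the model. The dimensions and the values of `A`, `G`, and `Sigma` are hypothetical, chosen only so that the factor process is stationary; they are not estimates from the stock data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumed): 2 factors driving 6 observed series
n_periods, n_factors, n_series = 200, 2, 6

# Hypothetical parameters; eigenvalues of A lie inside the unit circle
A = np.array([[0.8, 0.0],
              [0.1, 0.5]])
G = rng.normal(size=(n_series, n_factors))   # row i plays the role of G_i
Sigma = 0.25 * np.eye(n_series)              # idiosyncratic covariance

lam = np.zeros((n_periods, n_factors))       # latent factors lambda_t
P = np.zeros((n_periods, n_series))          # stacked observations P_t

lam_prev = np.zeros(n_factors)
for t in range(n_periods):
    # State equation: lambda_t = A lambda_{t-1} + eta_t, eta_t ~ N(0, I)
    lam[t] = A @ lam_prev + rng.standard_normal(n_factors)
    # Observation equation: P_t = G lambda_t + eps_t, eps_t ~ N(0, Sigma)
    eps = rng.multivariate_normal(np.zeros(n_series), Sigma)
    P[t] = G @ lam[t] + eps
    lam_prev = lam[t]

print(P.shape, lam.shape)
```

Each column of `P` is one observed series, and all six are driven by the same two columns of `lam`, which is what generates the cross-series correlation.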

### Estimating with `statsmodels`

The `statsmodels` package has a `DynamicFactorMQ` class that allows us to estimate such a model using a version of the EM algorithm.
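For intuition about what such an estimator does: in EM for a linear Gaussian state space model, the expectation step infers the unobserved factors given the current parameter guess, typically with a Kalman filter and smoother. The sketch below shows only the filtering pass, on simulated data from a hypothetical one-factor model with known parameters. It is an illustration of the idea, not the `statsmodels` implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical one-factor model with six observed series
T, n = 300, 6
A = np.array([[0.9]])
G = np.array([[1.0], [0.8], [1.2], [0.9], [1.1], [0.7]])
Sigma = 0.5 * np.eye(n)
Q = np.eye(1)                      # Var(eta_t) = I, as in the model above

# Simulate a latent factor path and the observations it generates
lam_true = np.zeros((T, 1))
Y = np.zeros((T, n))
lam = np.zeros(1)
for t in range(T):
    lam = A @ lam + rng.standard_normal(1)
    lam_true[t] = lam
    Y[t] = G @ lam + rng.multivariate_normal(np.zeros(n), Sigma)

# Kalman filter: alternate a prediction step and a measurement update
m = np.zeros(1)                    # prior mean for the factor
V = np.eye(1)                      # prior variance (arbitrary starting guess)
lam_filt = np.zeros((T, 1))
for t in range(T):
    m_pred = A @ m                               # predicted mean
    V_pred = A @ V @ A.T + Q                     # predicted variance
    S = G @ V_pred @ G.T + Sigma                 # innovation covariance
    K = V_pred @ G.T @ np.linalg.inv(S)          # Kalman gain
    m = m_pred + K @ (Y[t] - G @ m_pred)         # updated mean
    V = (np.eye(1) - K @ G) @ V_pred             # updated variance
    lam_filt[t] = m

corr = np.corrcoef(lam_true[:, 0], lam_filt[:, 0])[0, 1]
print(f"Correlation between true and filtered factor: {corr:.3f}")
```

In a full EM routine the parameters `A`, `G`, and `Sigma` would not be known; they would be re-estimated in the maximization step from the inferred factors, and the two steps repeated until convergence.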

Another useful feature of the `DynamicFactorMQ` class is that it supports data observed at different frequencies. For example, the Federal Reserve obtains labor data at a monthly frequency but national accounts data at a quarterly frequency. Combining the two frequencies lets us update our beliefs more often than once per quarter.
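One way to picture mixed-frequency data is as a monthly panel in which the quarterly series is observed only in the last month of each quarter and missing otherwise. The sketch below builds such a panel with `pandas` from made-up numbers; the series names are hypothetical. It only illustrates the data layout: with `DynamicFactorMQ` itself, quarterly data is supplied through the `endog_quarterly` argument rather than merged by hand.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Hypothetical monthly indicator (e.g. a labor market measure)
months = pd.period_range("2020-01", "2020-12", freq="M")
monthly = pd.Series(
    rng.normal(size=len(months)), index=months, name="monthly_indicator"
)

# Hypothetical quarterly indicator (e.g. a national accounts measure)
quarters = pd.period_range("2020Q1", "2020Q4", freq="Q")
quarterly = pd.Series(
    rng.normal(size=len(quarters)), index=quarters, name="quarterly_indicator"
)

# Re-index each quarterly value to the last month of its quarter
quarterly_on_months = quarterly.copy()
quarterly_on_months.index = quarterly.index.asfreq("M", how="E")

# Monthly panel: the quarterly column is NaN in the other months
panel = pd.concat([monthly, quarterly_on_months], axis=1)
print(panel)
```

A state space model can treat those `NaN` entries as missing observations, which is why the monthly series can still move the factor estimates between quarterly releases.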

In [None]:
# It is often recommended to standardize each series to mean 0 and
# standard deviation 1
df = 100*np.log(data_ac).diff().dropna()
df_std = df.apply(
    lambda x: (x - x.mean()) / x.std()
)

In [None]:
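# Build and fit a dynamic factor model using an EM algorithm.
# The unstandardized returns `df` are passed here because
# DynamicFactorMQ standardizes its input by default (standardize=True)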
mod = sm.tsa.DynamicFactorMQ(
    df, factors=3, factor_orders=1
)

res = mod.fit(disp=10)

In [None]:
res.summary()

In [None]:
fig, ax = plt.subplots(4, figsize=(12, 8))

company = all_tickers[4]
ax[0].set_title(company)
ax[0].plot(df.index, df.loc[:, company])
ax[0].spines["right"].set_visible(False)
ax[0].spines["top"].set_visible(False)

# Plot each smoothed factor in its own panel
for factor_i in range(1, 4):
    ax[factor_i].set_title(f"Factor {factor_i}")
    ax[factor_i].plot(
        df.index,
        res.factors["smoothed"].loc[:, str(factor_i-1)]
    )
    ax[factor_i].spines["right"].set_visible(False)
    ax[factor_i].spines["top"].set_visible(False)

fig.tight_layout()

### Appendix: NY Fed Nowcasting

* [Nowcasting report](https://www.newyorkfed.org/research/policy/nowcast)
* [Nowcasting methodology](https://www.newyorkfed.org/research/policy/nowcast/methodology.html)
* [Nowcasting paper](https://www.newyorkfed.org/research/staff_reports/sr830)