# 4-Factor Arbitrage

Aung Si<br>
November 20<sup>th</sup>, 2023

---

## Contents
...
## 1. Overview
## 2. Statistical Arbitrage

Technologies like [deep neural networks (DNNs)](https://arxiv.org/pdf/2005.13665.pdf) have tilted algorithmic trading and portfolio optimization towards increasingly advanced methodologies. Against that backdrop, statistical arbitrage stands out as a foundational approach worth revisiting. Unlike DNNs, whose inner workings are opaque (the [black box](https://umdearborn.edu/news/ais-mysterious-black-box-problem-explained#:~:text=This%20inability%20for%20us%20to,when%20they%20produce%20unwanted%20outcomes.) problem), statistical arbitrage relies on straightforward but elegant math. Neural networks, while effective at approximating complex functions, obscure the explicit mathematical relationships between inputs and outputs due to their multi-layered structure and depth of processing. In a field like finance, this lack of transparency paves the way for ethical problems with respect to both client welfare and the health of the market overall. Statistical arbitrage, though mathematically rigorous, is much less arcane and thus more manageable.

Statistical arbitrage rests on the concept of *relative* value. It involves modeling the spread between pairs or groups of securities that have historically exhibited similar price trajectories. The stability of this spread is a key assumption: it is believed to remain consistent over time. When the spread deviates from its expected range, it signals to investors which securities might be overvalued or undervalued. Long positions are entered for the securities that are underpriced, and short positions for those that are overpriced. If the spread then reverts to its historical norm, a process referred to as mean reversion, the trade generates a profit: the investor buys back the overpriced (shorted) stocks at lower prices and sells the underpriced stocks at higher prices.
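The entry logic described above can be sketched for a single pair. This is only an illustration under stated assumptions: the spread is taken as a simple price difference scaled by a hypothetical `hedge_ratio`, deviation is measured as a z-score against the sample mean and standard deviation, and the entry threshold of 2 is an arbitrary choice, not a value from this write-up.

```python
import numpy as np

def spread_zscore(price_a, price_b, hedge_ratio=1.0):
    """Z-score of the most recent spread between two price series.

    The spread definition (A - hedge_ratio * B) is an assumption
    made for illustration.
    """
    a = np.asarray(price_a, dtype=float)
    b = np.asarray(price_b, dtype=float)
    spread = a - hedge_ratio * b
    # How far the latest spread sits from its historical mean,
    # measured in sample standard deviations.
    return (spread[-1] - spread.mean()) / spread.std(ddof=1)

def signal(z, entry=2.0):
    """Map a z-score to a relative-value position (entry level is hypothetical)."""
    if z > entry:
        return "short A / long B"   # spread unusually wide: A looks rich relative to B
    if z < -entry:
        return "long A / short B"   # spread unusually narrow: A looks cheap relative to B
    return "no trade"
```

A position opened this way only pays off if the spread does in fact revert, which is the stability assumption stated above.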

The efficacy of statistical arbitrage strategies also tends to ebb and flow. As more traders flock to arbitrage strategies, the opportunities they exploit may diminish, or be *arbitraged away*, since pricing inefficiencies get corrected more frequently. The crowd then shifts towards more intricate and opaque strategies in hopes of finding opportunities elsewhere. This exodus often creates fertile ground for statistical arbitrage to regain its potency: its abandonment by some can paradoxically rejuvenate its potential for others. Statistical arbitrage is therefore cyclical in its efficacy yet evergreen in its relevance, making it a powerful methodology worthy of periodic rescrutiny.

### Factor Models

Factor models disaggregate the returns of securities into components attributable to various market-wide or macroeconomic factors. This decomposition is usually done via regression. Factor models help investors understand how different factors impact asset prices. Common factors include market risk, size, value, and momentum, among others (see the Fama-French 3-Factor Model or the Capital Asset Pricing Model (CAPM)).

The simplest form of a factor model includes a single factor $F$:

$$R_i = \beta_{iF} \cdot R_F + \alpha + \epsilon,$$

where $\beta_{iF}$ represents how much the returns of security $i$ covary with those of factor $F$, $\alpha$ represents the excess return of the stock (in excess of what is predictable by the factor), and $\epsilon$, the residual of the model, represents the portion of the stock's return that is not explained by the model. In essence, $\epsilon$ is the idiosyncratic portion of the stock's return, specific to it irrespective of any market-wide factors.
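As a minimal sketch of how the single-factor quantities might be estimated, the snippet below fits the model by ordinary least squares with NumPy. The function name and inputs are hypothetical, and OLS is an assumed estimator (consistent with the regression approach mentioned earlier), not necessarily the one used elsewhere in this project.

```python
import numpy as np

def fit_single_factor(r_stock, r_factor):
    """OLS estimates of beta, alpha and residuals for R_i = beta * R_F + alpha + eps."""
    r_stock = np.asarray(r_stock, dtype=float)
    r_factor = np.asarray(r_factor, dtype=float)
    # beta = Cov(R_i, R_F) / Var(R_F)
    beta = np.cov(r_stock, r_factor, ddof=1)[0, 1] / np.var(r_factor, ddof=1)
    # alpha is the intercept: the mean return not accounted for by the factor
    alpha = r_stock.mean() - beta * r_factor.mean()
    # eps: the idiosyncratic part of each observation
    resid = r_stock - (alpha + beta * r_factor)
    return beta, alpha, resid
```

The residual series returned here is the idiosyncratic component $\epsilon$ discussed above.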

It is possible to extend the single-factor model to $N$ factors:

$$R_i = \sum^N_{F=1}\beta_{iF}R_F + \alpha + \epsilon$$

Doing so allows the investor to more comprehensively isolate a security's idiosyncratic returns. An $N$-factor model seeks to answer the question: What is the *intrinsic value* of a security's return that is unaffected by any market or economic factor?
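The same idea extends to several factors by regressing the stock's returns on all factor return series at once. The sketch below assumes a `(T, N)` array of factor returns and uses `numpy.linalg.lstsq`. The function name and array layout are illustrative choices, not part of the original model description.

```python
import numpy as np

def fit_factor_model(r_stock, factor_returns):
    """Least-squares fit of R_i = sum_F beta_iF * R_F + alpha + eps."""
    y = np.asarray(r_stock, dtype=float)           # shape (T,)
    X = np.asarray(factor_returns, dtype=float)    # shape (T, N), one column per factor
    # Prepend a column of ones so the first coefficient is the intercept alpha.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    alpha, betas = coef[0], coef[1:]
    resid = y - design @ coef                      # idiosyncratic returns
    return betas, alpha, resid
```

The residuals are what remains of the stock's return after the exposures to all $N$ factors have been removed.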

In practice, models do not include an infinite number of factors, as there are diminishing returns and overfitting issues that place limits on the meaningful number of factors. Adding too many factors risks creating an overly refined model based on random fluctuations rather than genuine trends from the data. This impairs forecast accuracy. Moreover, a model’s complexity grows with more factors, potentially obscuring arbitrage opportunities and complicating real-world application. Each additional factor must have empirical validation; lacking this, there is danger of including components that minimally improve, or even detract from, the analysis. A model must capture risk exposure effectively while avoiding overcomplexity, to precisely identify temporary price inefficiencies within securities unique to them and distinct from market-wide influences. Failing that, the arbitrageur risks sizeable losses via false signals and market noise.

### Beta Neutrality

For beta neutrality, we have to determine the specific amount to invest in factor $F$ such that the beta of the portfolio $P$ with respect to $F$ is 0:

$$\beta_{PF} = 0\tag{1}$$

We know that the beta of factor $F$ with respect to itself is 1.

$$\beta_{FF} = 1\tag{2}$$

Our portfolio $P$ is constructed such that we invest a certain amount in each asset *and* also invest a certain amount in factor $F$ to offset the assets' beta contribution. It follows that the portfolio's beta with respect to $F$ is given by:

$$\beta_{PF} = \sum^N_{i=1}w_{i}\beta_{iF} + w_{F}\beta_{FF}\text{, }\tag{3}$$

where $\sum^N_{i=1}w_{i}\beta_{iF}$ is the portion of the portfolio's beta contributed by the stock positions, and $w_{F}\beta_{FF}$ is the portion of the portfolio's beta directly originating from the investment in $F$. Note that in this section $N$ counts the stocks in the portfolio, while the factors are indexed $F = 1, 2, \ldots, M$. Substituting $\left(1\right)$ and $\left(2\right)$ into $\left(3\right)$, we have:

$$
\begin{align*}
& \left[0\right] = \sum^N_{i=1}w_{i}\beta_{iF} + w_F\left[1\right] \\
& \rightarrow w_F = -\sum^N_{i=1}w_{i}\beta_{iF};\quad F = 1, 2, \ldots, M
\end{align*}
$$

Thus, the investment allocation toward each factor $F$ must be the negative of the total portfolio beta with respect to $F$ that stems from the individual stock investments.
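The hedge weights implied by this result can be computed in one line once the stock weights and betas are known. The sketch below assumes the betas are stored as an `(N, M)` array (stocks by factors). The function name and the numbers in the usage example are made up for illustration.

```python
import numpy as np

def factor_hedge_weights(stock_weights, betas):
    """Return w_F = -sum_i w_i * beta_iF for each of the M factors."""
    w = np.asarray(stock_weights, dtype=float)   # shape (N,): weight per stock
    B = np.asarray(betas, dtype=float)           # shape (N, M): beta of stock i to factor F
    return -(w @ B)                              # shape (M,): offsetting factor weights

# Hypothetical example: long one stock, short another, hedged against two factors.
w = np.array([0.5, -0.5])
B = np.array([[1.2, 0.3],
              [0.8, -0.1]])
w_F = factor_hedge_weights(w, B)
portfolio_beta = w @ B + w_F   # equation (3) with beta_FF = 1; zero by construction
```

In this example the stock positions carry a beta of roughly 0.2 to each factor, so each factor is held at a weight of roughly -0.2 to bring the portfolio beta to zero.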