# Introduction to Asset Pricing

#### Author: Gustavo Soares

We will try to follow [Chapter 20 of Cochrane's Asset Pricing book](https://press.princeton.edu/titles/7836.html) but his famous [AFA 2010 presidential address on Discount Rates](https://faculty.chicagobooth.edu/john.cochrane/research/papers/discount_rates_jf.pdf) is also a great reading for this lecture and future lectures to come. Cochrane's lecture notes on [predictability](https://faculty.chicagobooth.edu/john.cochrane/teaching/35150_advanced_investments/week_1_notes.pdf) may also be helpful in discussing the topic.

## Introduction

Back in the seventies, [Fama (1970)](https://www.jstor.org/stable/2325486) started out talking about [efficiency](#efficiency) and arguing that in **informationally efficient markets** prices should incorporate available information about future values. Back then, we thought informationally efficient markets would be a consequence of competition, free entry, and low information costs. In that competitive context, if there were a way to predict an asset’s future positive/negative returns, then market participants would be willing to buy/sell the asset today and that would bid/pull prices up/down, until prices reflected all the new information. So, back then, we thought that in a competitive asset market, with informational efficiency, price changes should not be predictable, a result with which finance practitioners never agreed.

So, the profession went out there and tried to find empirical evidence for the efficient markets hypothesis. For example, efficiency implies that trading rules such as “buy/sell when the price crosses some moving average threshold” should not work. The surprising result, particularly for finance practitioners, was that, after accounting for selection bias and some other technical issues, most trading rules had no predictive power. So, lack of predictability was no longer just a theory. It was also backed by empirical facts that were true in the seventies and remain true today.

Back in the seventies, asset returns seemed to display patterns surprisingly close to the Efficient Markets Hypothesis (EMH) predictions, and surprisingly far from the assertions of most practitioners that you could predict the market. However, even from the very beginning there were some deviations from the norm, the so-called "anomalies". These "anomalies" were often viewed as unresolved deviations from the EMH paradigm, but they also laid the foundation for the rise of Behavioral Finance (BF) as an alternative paradigm.

Eventually, the [Stochastic Discount Factor](https://faculty.chicagobooth.edu/john.cochrane/research/papers/discount_rates_jf.pdf) (SDF) framework was able to unify at least some of these opposing views and serve as the central organizing framework for current asset-pricing academic research. However, from the very beginning of empirical finance, it was clear that at least some aspects of financial markets had to be predictable. In a competitive market, in equilibrium, investments that carry more risk must offer better returns on average. Otherwise, why would anyone hold those assets? Therefore, if one finds a high/low expected return (i.e., an under/overpriced security), that fact does not necessarily imply inefficiency. It could simply mean that the asset embeds more risk and therefore needs to pay a [risk premium](#risk_premium) for market participants to hold it in equilibrium.

Under the [SDF framework](https://faculty.chicagobooth.edu/john.cochrane/research/papers/discount_rates_jf.pdf) there is always some “model of market equilibrium” that will “explain” any price. So, in a way, it is a unifying framework. However, we need to be careful because the framework is so generic that one could state completely unreasonable models that generate risk premia explanations for any proposed anomaly. This problem is called the **joint-hypothesis problem** in Finance. To test if a market is efficient, you often (though not always; event studies are a notable exception) need to define a model of market equilibrium under which the market may be efficient or not. Abnormal returns in excess of the prediction of the market equilibrium model may represent a way of adding return without taking on more risk, or they may represent a return for risk that we do not understand or cannot easily quantify. However, under an expanded, enhanced or enriched market equilibrium model that anomaly may not be an anomaly at all.

So, the history of finance is more or less the history of finding anomalies relative to existing models and then enriching those models in order to transform the anomaly into a risk premium explanation. In fact, Behavioral Finance started out that way, producing facts that challenged the prevailing equilibrium market model. Between 1970 and 1990, evidence that returns are predictable mounted, giving rise to many factor models that attempted to explain the predictability of returns in terms of risk premia. This literature led to a revolution in finance: the understanding that expected returns and risk premia vary over time but are predictable, correlated across markets, and strongly correlated with business conditions, which is where one would expect risk premia to affect the prices at which market participants are willing to hold assets in equilibrium. Quite often the facts would be compatible with both a “risk premium” explanation and a “behavioral” explanation, but the end result was a vastly expanded set of empirical facts. In that way, the profession went through hundreds of anomalies, generating a [factor zoo](https://docs.google.com/spreadsheets/d/1mws1bU56ZAc8aK7Dvz696LknM0Vp4Rojc3n61q2-keY/edit#gid=0).

Predictability became a pervasive phenomenon across financial markets: stocks, bonds, credit spreads, foreign exchange, sovereign debt, and houses. Very often a yield or valuation ratio, often called a **carry** or **value** metric, could be shown to translate one-for-one to expected excess returns. So, high carry and cheap securities were often associated with high expected returns in the future. This phenomenon was also deeply linked to underlying economic factors, as low prices and high expected returns hold more often in “bad times,” when consumption, output, and investment are low, unemployment is high, and businesses are failing, and vice versa. That is, prices are cheap and good investment opportunities arise at times when most people need their money to muddle through tough times and are in fact selling their assets and de-saving. Beyond the academic articles we will cover in this course, [Ilmanen's book](https://www.amazon.com.br/Expected-Returns-Investors-Harvesting-Rewards-ebook/dp/B004YK0JLW) is a great reference on predictability across many asset classes.


## Efficiency <a class="anchor" id="efficiency"></a>

According to [Fama (1970)](https://www.jstor.org/stable/2325486), a market in which prices always “fully reflect” available information is called efficient, and there are slightly different definitions of efficiency. Up until about the 1980s, there was a lot of work on efficiency and the profession used to think of efficiency in terms of "weak", "semi-strong", and "strong" forms. These categories of efficiency refer to the information set used in the statement "prices reflect all available information." Weak-form tests study the information contained in historical prices. Semi-strong form tests study information (beyond historical prices) which is publicly available. Strong-form tests regard private information. However, today, these concepts have been abandoned. Focus on these categories of efficiency has been replaced by a focus on the risk model assumed, i.e., on the joint-hypothesis problem mentioned above.

For our purposes, the important part here is that efficiency is a statement about the use of available information. Let's say we have the set of all information available up to time $t$, $\mathscr{I}_{t}$ (for the technically oriented reader, $\mathscr{I}_{t}$ is a [$\sigma$-algebra](https://en.wikipedia.org/wiki/%CE%A3-algebra) containing the possible past outcomes for prices and other historical data). Efficiency is really a statement of the type:

$$
P_{t}^{i} = E[V_{t+1}^{i} | \mathscr{I}_{t}] \equiv E_{t}[V_{t+1}^{i}]
$$

where $V_{t+1}^{i}$ is a measure of the fundamental or intrinsic value of the asset. This is different from a statement about predictability. Predictability means simply that we can predict prices, i.e., the market is neither a [random walk](https://en.wikipedia.org/wiki/Random_walk) nor a [martingale](https://en.wikipedia.org/wiki/Martingale_(probability_theory)). The two concepts do not seem to be directly related.

#### Efficiency $\neq$ lack of predictability

Why do people mix the two concepts and say that in efficient markets prices should follow a random walk and not be predictable? Because in some cases one implies the other. For example, assume that risk premium is constant over time, i.e., $E_{t}[r^{e}_{i,t+1}] = R$, then the efficient market hypothesis states that the current price would be the value of the portfolio tomorrow discounted by its expected rate of return, $R$:
$$
P_{t}^{i} = E_{t}[V_{t+1}^{i}] = E_{t}\Big[\frac{P_{t+1}^{i}+D_{t+1}^{i}}{1+R}\Big]
$$
where $D_{t+1}$ here is any benefit and/or cost involved in holding the asset from period $t$ to $t+1$. We will generically call the value $D_{t+1}$ the **dividend**, but keep in mind that $D_{t+1}$ in fact embeds all sorts of benefits and costs of owning an asset, such as dividends, coupons, rents, storage, depreciation, etc.

Today's ($t=0$) value of the portfolio at time $t$ is equal to:

$$
M_{t} = \frac{h_{t} \times P_{t}}{(1+R)^t}
$$

where $h_{t}$ is the number of units of the asset held at time $t$. Remember that to construct total returns we need to increase the number of holdings, $h_{t}$, by reinvesting the dividends paid by the asset. So, on a non-dividend paying date we have $h_{t} = h_{t-1}$, with $h_{0} = 1$, and on a dividend paying date we use the dividends received, $h_{t-1} D_{t}$, to purchase more units of the asset at the prevailing price, i.e., we buy $h_{t-1} D_{t} / P_{t}$ units of the asset and now hold $h_{t-1}(1 + D_{t}/P_{t})$ units. So, generically, we have:

$$
h_{t+1} = \frac{P_{t+1}+D_{t+1}}{P_{t+1}} \times h_{t}
$$

units of the asset (on a non-dividend payment date we have $D_{t+1}=0$, so $h_{t+1}=h_{t}$). Hence,

$$
E_{t}[M_{t+1}] = E_{t}\Big[\frac{P_{t+1}+D_{t+1}}{P_{t+1}} \times h_{t} \times \frac{P_{t+1}}{(1+R)^{t+1}} \Big] = E_{t}\Big[\frac{P_{t+1}+D_{t+1}}{(1+R)} \times h_{t} \times \frac{1}{(1+R)^{t}} \Big] = P_{t} \times h_{t} \times \frac{1}{(1+R)^{t}} = M_{t}.
$$

Hence, **under the assumption of constant expected returns**, efficiency implies lack of predictability.
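The martingale property of $M_{t}$ can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the text: the total return is $(1+R)$ times a mean-zero uniform shock, the dividend is a fixed fraction of the next price, and the function name and parameter values are hypothetical.

```python
import random

def simulate_discounted_value(n_periods, R=0.05, div_yield=0.02, vol=0.05, rng=None):
    """Return M_T = h_T * P_T / (1 + R)**T for one simulated path, with M_0 = 1."""
    rng = rng or random.Random()
    price, holdings = 1.0, 1.0
    for _ in range(n_periods):
        shock = rng.uniform(-vol, vol)            # mean-zero return surprise (assumption)
        gross_total = (1.0 + R) * (1.0 + shock)   # (P_{t+1} + D_{t+1}) / P_t
        # Assumption: the dividend is a fixed fraction of next period's price.
        price_next = price * gross_total / (1.0 + div_yield)
        dividend = div_yield * price_next
        # Reinvest the dividend: h_{t+1} = h_t * (P_{t+1} + D_{t+1}) / P_{t+1}
        holdings *= (price_next + dividend) / price_next
        price = price_next
    return holdings * price / (1.0 + R) ** n_periods

rng = random.Random(0)
paths = [simulate_discounted_value(10, rng=rng) for _ in range(20000)]
mean_M_T = sum(paths) / len(paths)
print(f"M_0 = 1.0, average simulated M_T = {mean_M_T:.4f}")
```

With a constant expected return, the average of the simulated $M_{T}$ should stay close to $M_{0}=1$, which is the sense in which the discounted portfolio value is unpredictable.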

Now, let's break that assumption.

## Markets have to be predictable <a class="anchor" id="carry"></a>

#### Campbell and Shiller

[Campbell and Shiller (1988)](https://www.nber.org/papers/w2511) derived what is called the **Campbell and Shiller present value identity**:


$$
p_{t} - d_{t} = E_{t}\Big[\sum_{j=1}^{\infty} \rho^{j-1} (\Delta d_{t+j} - r_{t+j})\Big]
$$

where $p_{t}$ and $d_{t}$ are the logs of the asset price, $p_{t} = \log P_{t}$, and dividend, $d_{t} = \log D_{t}$, so $p_{t} - d_{t} = \log(P_{t}/D_{t})$, and when the asset has high/low *carry* or high/low *dividend yield* then $p_{t} - d_{t}$ is low/high. In addition, $r_{t} = \log(R_{t}) \equiv \log[(P_{t}+D_{t})/P_{t-1}]$, $\Delta d_{t} \equiv d_{t}-d_{t-1}$, and $\rho$ is a linearization constant slightly below one. See a nice derivation of the Campbell and Shiller present value identity [here](https://faculty.chicagobooth.edu/john.cochrane/teaching/Empirical_Asset_Pricing/lecture_notes.pdf).
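The identity is obtained by iterating forward a log-linear approximation of the one-period return. As a rough numerical check, the sketch below assumes the usual textbook form of that approximation, $r_{t+1} \approx k + \rho(p_{t+1}-d_{t+1}) - (p_{t}-d_{t}) + \Delta d_{t+1}$ with $\rho = 1/(1+e^{\overline{d-p}})$, which is not spelled out in the text above; the prices, dividends, and function names are made up for illustration.

```python
import math

def exact_log_return(p_now, p_next, d_next):
    """Exact log total return, log[(P_{t+1} + D_{t+1}) / P_t]."""
    return math.log((p_next + d_next) / p_now)

def approx_log_return(p_now, d_now, p_next, d_next, mean_dp):
    """Log-linear approximation around an average log dividend yield mean_dp."""
    rho = 1.0 / (1.0 + math.exp(mean_dp))        # linearization constant, slightly below 1
    k = -math.log(rho) - (1.0 - rho) * mean_dp   # constant of the approximation
    pd_now = math.log(p_now / d_now)             # p_t - d_t
    pd_next = math.log(p_next / d_next)          # p_{t+1} - d_{t+1}
    delta_d = math.log(d_next / d_now)           # dividend growth
    return k + rho * pd_next - pd_now + delta_d

# Hypothetical numbers: a 4% dividend yield as the linearization point.
P0, D0, P1, D1 = 100.0, 4.0, 105.0, 4.3
mean_dp = math.log(0.04)
exact = exact_log_return(P0, P1, D1)
approx = approx_log_return(P0, D0, P1, D1, mean_dp)
print(f"exact = {exact:.6f}, approx = {approx:.6f}, error = {approx - exact:.2e}")
```

The approximation error is small when the dividend yield stays near the linearization point, which is why the identity is treated as holding almost exactly in practice.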

Why is this present value identity so important for understanding predictability? There are a few lessons we can derive from it. You will see a short summary below, but see [here](https://faculty.chicagobooth.edu/john.cochrane/research/papers/cochrane%20dog%20that%20did%20not%20bark.pdf) for a nice discussion of the topic.

###### Lesson 1: dividends and/or returns need to be predictable

The Campbell and Shiller present value identity shows that if future dividend growth, $\Delta d_{t}$, and returns, $r_{t}$, are **both** unforecastable, then the asset's *carry* or *dividend yield*, $d_{t} - p_{t}$, has to be constant. Since it is easy to see empirically that $d_{t} - p_{t}$ is not constant, either dividend growth or returns (or both) need to be predictable. What is a bit puzzling in practice is that it is very hard to predict $\Delta d_{t}$ and returns $r_{t}$, while it is very easy to show that $d_{t} - p_{t}$ is not constant.

###### Lesson 2: dividend yield (or carry) predicts returns

As you can see [here](https://faculty.chicagobooth.edu/john.cochrane/teaching/Empirical_Asset_Pricing/lecture_notes.pdf), one can easily use Campbell and Shiller present value identity to show that

$$
r_{t} - E_{t-1}[r_{t}]= (E_{t}-E_{t-1})\Big[\Delta d_{t} + \sum_{j=1}^{\infty} \rho^{j-1} (\Delta d_{t+j} - r_{t+j})\Big] = (E_{t}-E_{t-1})\Big[\Delta d_{t} + (p_{t} - d_{t})\Big]
$$

It turns out that, in practice, $\Delta d_{t+j}$ is very hard to predict beyond its unconditional mean $E[\Delta d_{t}]$. However, we can often find times when $(p_{t} - d_{t})$ is too high or too low. Hence, if there is some variation in investors' expectations of future returns, it is likely to be revealed by the dividend yield or carry signal. That is not a guarantee, of course, but now you know why carry or dividend yields are so important in predicting returns.

* If dividend yield or carry is high, $(p_{t} - d_{t})$ is low, often in **bad times** because the market just collapsed and $p_{t}$ is low. We then have $(E_{t+1}-E_{t})(p_{t+1} - d_{t+1}) > 0$ and we are likely to see positive return surprises in the future.
* If dividend yield or carry is low, $(p_{t} - d_{t})$ is high, often in **good times** because the market has rallied and $p_{t}$ is high. We then have $(E_{t+1}-E_{t})(p_{t+1} - d_{t+1}) < 0$ and we are likely to see negative return surprises in the future.

Empirically, high prices relative to dividends have reliably preceded many years of poor returns, while low prices have preceded high returns.
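A predictive regression of next-period returns on the current dividend yield is the usual way to look for this pattern. The sketch below runs one on simulated data only: the AR(1) dynamics for the (demeaned) log dividend yield, the true slope of 0.10, and all names are assumptions chosen for illustration, not empirical estimates.

```python
import random

def ols_slope_intercept(x, y):
    """Ordinary least squares of y on a constant and a single regressor x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

rng = random.Random(1)
TRUE_SLOPE = 0.10      # assumed sensitivity of expected returns to d_t - p_t
n_obs = 20000
dp = [0.0]             # demeaned log dividend yield d_t - p_t (simulated)
returns = []           # returns[t] is r_{t+1}, realised after dp[t] is observed
for _ in range(n_obs):
    returns.append(TRUE_SLOPE * dp[-1] + rng.gauss(0.0, 0.15))
    dp.append(0.9 * dp[-1] + rng.gauss(0.0, 0.10))   # persistent AR(1) signal

signal = dp[:-1]       # align d_t - p_t with r_{t+1}
slope, intercept = ols_slope_intercept(signal, returns)
print(f"estimated slope = {slope:.3f}, intercept = {intercept:.4f}")
```

A positive slope means that a high dividend yield today is followed, on average, by high returns. With real data the sample is much shorter and the signal is very persistent, so the estimate is far noisier than in this simulation.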

###### Lesson 3: returns are predictable if risk premium varies over time

Now, if we combine lesson 1 and 2, we have lesson 3. That is, if the risk premium $E_{t}[r_{t+1}]$ varies over time, then $E_{t}[r_{t+1}] \neq E_{t-1}[r_{t}]$ and $(E_{t}-E_{t-1})(p_{t+1} - d_{t+1}) \neq 0$ by lesson 2 and this is only possible if $d_{t} - p_{t}$ is not constant over time. By lesson 1, this implies that dividends and/or returns need to be predictable. In practice, we know that $\Delta d_{t+j}$ is very hard to predict, so returns $r_{t}$ are predictable.

#### Carry as a pervasive phenomenon

What we have discussed here, and what [Cochrane argues in his presidential address](https://faculty.chicagobooth.edu/john.cochrane/research/papers/discount_rates_jf.pdf), is that dividend yields can be used as a forecasting signal for future returns. This is actually a **pervasive phenomenon** found in multiple markets. For stocks, bonds, credit spreads, foreign exchange, sovereign debt, and houses, a yield or valuation ratio translates one-for-one to expected excess returns, and does not forecast the cashflow or price change we may have expected:

* **Stocks**: dividend yields forecast returns, not dividend growth;
* **Rates**: a rising yield curve or carry signals future returns, not changes in yields or short-term rates;
* **Currencies**: international interest rate spreads signal returns, not exchange rate depreciation;
* **Real Estate**: high price/rent ratios signal low returns, not rising rents;
* **Credit**: variation in credit spreads over time and across firms or categories predicts returns, not default probabilities;
* **Sovereign debt**: high levels of sovereign or foreign debt signal low returns, not higher government or trade surpluses.


## Risk Premium <a class="anchor" id="risk_premium"></a>

Using the modern Stochastic Discount Factor (SDF) framework, absent arbitrage opportunities, we can write the price of asset $i$ at time $t$, $P_{t}^{i}$, as:

$$
P_{t}^{i} = E_{t}[M_{t+1} \times X_{t+1}^{i}]
$$

where $X_{t+1}^{i}$ is the payoff given by asset $i$ at time $t+1$ and $M_{t+1}$ is the stochastic discount factor at time $t+1$. Note that both $X_{t+1}^{i}$ and $M_{t+1}$ are random variables with respect to the information available up to time $t$, and that is why the expectation operator $E_{t}[\cdot]$ is taken with respect to the information available up to time $t$.

The payoff $X_{t+1}^{i}$ of asset $i$ at time $t+1$ is typically the price of that asset on that date, $P_{t+1}^{i}$, plus some benefit and/or cost, $D_{t+1}$, involved in holding the asset from period $t$ to $t+1$. We will generically call the value $D_{t+1}$ the **dividend**, but keep in mind that $D_{t+1}$ in fact embeds all sorts of benefits and costs of owning an asset, such as dividends, coupons, rents, storage, depreciation, etc. To avoid confusion and to be more generic, it is common to rewrite the equation above as simply:

$$
E_{t}[M_{t+1} \times (1+R_{t+1}^{i})] = 1
$$

where

$$
R_{t+1}^{i} \equiv \frac{X_{t+1}^{i}}{P_{t}^{i}}-1
$$

is the *total return* of asset $i$ from period $t$ to $t+1$. Note that $R_{t+1}^{i}$ is actually the return on the total return index we carefully learned how to construct in our previous lecture.

Suppose now that we have an asset $f$ for which $X_{t+1}^{f}$ is not a random variable with respect to the information available up to time $t$ and therefore

$$
R_{t+1}^{f} \equiv \frac{X_{t+1}^{f}}{P_{t}^{f}}-1
$$

is a known non-random constant at time $t$ and
$$
E_{t}[M_{t+1} \times (1+R_{t+1}^{f})] = 1 \implies R_{t+1}^{f} = \frac{1}{E_{t}[M_{t+1}]}-1.
$$

We will call this asset $f$ the **risk free asset** and $R_{t+1}^{f}$ the **risk free rate**. Note that $X_{t+1}^{f} = P_{t}^{f}(1+R_{t+1}^{f})$.

Now, we come to the equation that tells us how prices should reflect the asset risk premium. Since both $X_{t+1}^{i}$ and $M_{t+1}$ are random variables with respect to the information available up to time $t$ we can compute their covariance:


$$
Cov_{t}[M_{t+1},R_{t+1}^{i}] = E_{t}[M_{t+1}(1+R_{t+1}^{i})] - E_{t}[M_{t+1}](1+E_{t}[R_{t+1}^{i}])
$$

but since $E_{t}[M_{t+1}(1+R_{t+1}^{i})] = 1$ and $E_{t}[M_{t+1}] =1/(1+R_{t+1}^{f})$ we have that:

$$
E_{t}\Big[\frac{1+R_{t+1}^{i}}{1+R_{t+1}^{f}}-1\Big] = - Cov_{t}[M_{t+1},R_{t+1}^{i}].
$$

Note that the term on the left-hand side is in fact the expected *excess return* of asset $i$ from period $t$ to $t+1$, $E_{t}[R^{e}_{i,t+1}]$, where $R^{e}_{i,t+1}$ is the excess return we discussed in our previous lecture.

This equation gives us the fundamental intuition about using the risk premium as a way to identify assets or strategies that can generate positive excess returns over long periods of time.

1. Assets that generate high/low returns in good/bad times, i.e., when $M_{t+1}$ is low/high, have negative $Cov_{t}[M_{t+1},R_{t+1}^{i}]$ and therefore positive expected returns $E_{t}[R^{e}_{i,t+1}]$.
2. Assets that generate low/high returns in good/bad times, i.e., when $M_{t+1}$ is low/high, have positive $Cov_{t}[M_{t+1},R_{t+1}^{i}]$ and therefore negative expected returns $E_{t}[R^{e}_{i,t+1}]$.

In other words, assets that generate positive excess returns are the assets that do badly when you most need your money, i.e., when the economy is doing badly and $M_{t+1}$ is high, $R_{t+1}^{i}$ is low. Why? Because unless this asset gives the holder some extra risk premium, nobody will want to hold that asset in equilibrium. So, it makes sense that in equilibrium these types of assets, the ones that are negatively correlated with $M_{t+1}$, generate positive excess returns over time.

Conversely, assets that do well when the economy is doing badly, i.e. $R_{t+1}^{i}$ is high when $M_{t+1}$ is high, work more as insurance policies than assets. You should pay a premium for that insurance policy protection that will give you money in times of distress. So, it makes sense that in equilibrium these types of assets, the ones that are positively correlated with $M_{t+1}$, generate negative excess returns over time.
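The sign logic above can be checked in a toy economy with two equally likely states. Everything in this sketch is hypothetical (the state probabilities, the SDF values, and the payoffs); it only verifies that pricing by $P_{t}=E_{t}[M_{t+1}X_{t+1}]$ implies the covariance equation for expected excess returns.

```python
PROBS = [0.5, 0.5]     # good state, bad state (hypothetical)
SDF = [0.90, 1.05]     # M is low in good times and high in bad times

def expectation(values):
    """Probability-weighted average across the two states."""
    return sum(p * v for p, v in zip(PROBS, values))

def risk_premium_check(payoffs):
    """Price a payoff with the SDF; return (expected excess return, Cov[M, R])."""
    price = expectation([m * x for m, x in zip(SDF, payoffs)])
    returns = [x / price - 1.0 for x in payoffs]
    risk_free = 1.0 / expectation(SDF) - 1.0
    expected_excess = expectation([(1.0 + r) / (1.0 + risk_free) - 1.0 for r in returns])
    cov = (expectation([m * r for m, r in zip(SDF, returns)])
           - expectation(SDF) * expectation(returns))
    return expected_excess, cov

procyclical = risk_premium_check([1.20, 0.90])   # pays off in good times
insurance = risk_premium_check([0.90, 1.20])     # pays off in bad times
print("pro-cyclical asset:", procyclical)
print("insurance-like asset:", insurance)
```

In both cases the expected excess return equals minus the covariance with the SDF: the pro-cyclical asset earns a positive premium and the insurance-like asset a negative one.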

## Our first (factor) model <a class="anchor" id="regression"></a>

The SDF framework gave us the equation

$$
E_{t}[R^{e}_{i,t+1}] = - Cov_{t}[M_{t+1},R_{t+1}^{i}].
$$

but we do not really know the shape or form of $M_{t+1}$.

In order to model returns, the profession started developing a series of linear factor models of the form $M_{t+1} = a + b´f_{t+1}$. Here, $a$ is a scalar constant and $b$ is a $k \times 1$ non-random constant vector that multiplies a $k \times 1$ random vector of time-varying factors $f_{t+1}$, which vary along the business cycle, going up and down with the shocks to consumption, investment, etc.

We do not really know the shape or form of $f_{t+1}$ but we know they are the drivers of $M_{t+1}$. We also know that under the assumption $M_{t+1} = a + b´f_{t+1}$ we have:

$$
E_{t}[R^{e}_{i,t+1}] = - (1+R_{t+1}^{f})Cov_{t}\Big[M_{t+1},\frac{1+R_{t+1}^{i}}{1+R_{t+1}^{f}}\Big] = - (1+R_{t+1}^{f})Cov_{t}[M_{t+1},R^{e}_{i,t+1}] = - (1+R_{t+1}^{f})b´Cov_{t}[f_{t+1},R^{e}_{i,t+1}]
$$

and

$$
(1+R_{t+1}^{f}) = \frac{1}{E_{t}[M_{t+1}]} \implies (1+R_{t+1}^{f})b´E_{t}[f_{t+1}] = 1 - (1+R_{t+1}^{f})a
$$

under the same assumption.

Therefore

$$
E_{t}[R^{e}_{i,t+1}] = - (1+R_{t+1}^{f})b´E_{t}[f_{t+1}R^{e}_{i,t+1}]+ (1 - (1+R_{t+1}^{f})a)E_{t}[R^{e}_{i,t+1}]
$$

which implies:

$$
E_{t}[R^{e}_{i,t+1}] = - a^{-1}b´E_{t}[f_{t+1}R^{e}_{i,t+1}]
$$

#### Beta representation

Now, define the **beta** of asset $i$ to the factors $f_{t+1}$ as the $k \times 1$ non-random vector:

$$
\beta_{i,t} \equiv E_{t}[f_{t+1}f_{t+1}´]^{-1}E_{t}[f_{t+1}R^{e}_{i,t+1}]
$$

and the $k \times 1$ non-random vector of factor risk prices:

$$
\lambda_{t} \equiv - E_{t}[f_{t+1}f_{t+1}´]ba^{-1}
$$

then we have the so-called **beta representation** of our model given by:

$$
E_{t}[R^{e}_{i,t+1}] = - a^{-1}b´E_{t}[f_{t+1}R^{e}_{i,t+1}] = - a^{-1}b´E_{t}[f_{t+1}f_{t+1}´]\beta_{i,t} = \beta_{i,t}´\lambda_{t}
$$

The so-called **beta representation** is in fact a model that can be written as:

$$
E_{t}[R^{e}_{i,t+1}] = \alpha_{i,t} + \beta_{i,t}´\lambda_{t}
$$

If markets are efficient under that particular asset pricing model, then we should have $\alpha_{i,t} = 0$. We can therefore use data to test whether $\alpha_{i,t} = 0$ or not. If we find that $\alpha_{i,t} \neq 0$, then we have found an **anomaly**. Now, remember our discussion in the introduction that we need to be careful here with the **joint-hypothesis problem**. The **anomaly**, $\alpha_{i,t} \neq 0$, is only evidence that the market is inefficient if our model is correct. However, under an expanded, enhanced or enriched market equilibrium model that anomaly may not be an anomaly at all and we may go back to $\alpha_{i,t} = 0$ if we use a different set of factors $f_{t}$.
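The beta representation can be verified numerically in a small discrete-state example with a single factor. The sketch starts from the pricing condition $E_{t}[M_{t+1}R^{e}_{i,t+1}]=0$, which follows from the SDF equations above, and checks that the expected excess return equals $\beta\lambda$ when $\lambda = -E[f^{2}]\,b/a$. The states, the coefficients `A` and `B`, and the payoffs are all hypothetical.

```python
PROBS = [0.3, 0.4, 0.3]       # three states (hypothetical)
FACTOR = [-1.0, 0.0, 1.0]     # realisation of the single factor f in each state
A, B = 1.0, -0.2              # assumed SDF coefficients: M = A + B * f

def expectation(values):
    """Probability-weighted average across states."""
    return sum(p * v for p, v in zip(PROBS, values))

sdf = [A + B * f for f in FACTOR]                      # high when the factor is low
payoffs = [0.8, 1.0, 1.3]                              # pro-cyclical payoff
price = expectation([m * x for m, x in zip(sdf, payoffs)])
risk_free = 1.0 / expectation(sdf) - 1.0
excess = [(x / price) / (1.0 + risk_free) - 1.0 for x in payoffs]

second_moment = expectation([f * f for f in FACTOR])   # E[f f'] in the scalar case
beta = expectation([f * r for f, r in zip(FACTOR, excess)]) / second_moment
lam = -second_moment * B / A                           # factor risk price
expected_excess = expectation(excess)

print(f"E[M R^e] = {expectation([m * r for m, r in zip(sdf, excess)]):.2e}")
print(f"E[R^e] = {expected_excess:.4f}, beta * lambda = {beta * lam:.4f}")
```

Because the asset is priced exactly by this SDF, its alpha relative to the one-factor model is zero by construction. A non-zero alpha in data would point either to mispricing or to a misspecified SDF.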

## Factor models

The SDF framework gave us the so-called **beta representation**:

$$
E_{t}[R^{e}_{i,t+1}] = \alpha_{i,t} + \beta_{i,t}´\lambda_{t}
$$

If markets are efficient under that particular asset pricing model, then we should have $\alpha_{i,t} = 0$. We can therefore use data to test whether $\alpha_{i,t} = 0$ or not. If we find that $\alpha_{i,t} \neq 0$, then we have found an **anomaly**. However, we also discussed how we need to be careful with the interpretation of the term **anomaly** because of the **joint-hypothesis problem**. The **anomaly**, $\alpha_{i,t} \neq 0$, is only evidence that the market is inefficient if our model is correct. However, under an expanded, enhanced or enriched market equilibrium model that anomaly may not be an anomaly at all and we may go back to $\alpha_{i,t} = 0$ if we use a different model.

Broadly speaking, the rationale behind factor models is that the financial performance of assets depends on a relatively small number of factors. These factors may be latent, unobservable, and ultimately unknown to us. However, they may be related to intrinsic asset characteristics like accounting ratios for stocks, carry for currencies and rates, or the size of non-commercial speculative positions in commodities.

### Arbitrage Pricing Theory (APT)

The equation above is really the equation behind the Arbitrage Pricing Theory (APT) of capital asset pricing first stipulated by [Ross (1976)](https://www.sciencedirect.com/science/article/abs/pii/0022053176900466). The idea is that there are a small number $k$ of underlying time-varying unobservable factors $f_{t+1}$ that drive asset returns. These factors cannot be directly observed but they can be estimated by Principal Component Analysis (PCA) using the $T \times N$ matrix of asset returns:

$$
\mathbf{X}
=
\begin{bmatrix}
R^{e}_{1,1} & \cdots &
R^{e}_{N,1} \\
\vdots & \ddots & \vdots \\
R^{e}_{1,T} & \cdots &
R^{e}_{N,T}
\end{bmatrix}
$$

We will talk about PCA and unsupervised learning some other time, but let's assume for now that we can estimate, using the $T \times N$ matrix of asset returns $\mathbf{X}$, a sequence of $k$-dimensional time-varying unobservable factors $\hat{f}_{t}$ for $t=1,\dots,T$, so that we can write a time-series regression of the excess returns $R^{e}_{i,t}$ of asset $i$, or the returns of a portfolio of assets, onto the estimated factors $\hat{f}_{t}$ over $t=1,\dots,T$:

$$
R^{e}_{i,t} = \alpha_{i} + \beta_{i}´\hat{f}_{t} + \epsilon_{i,t}
$$

where $\epsilon_{i,t}$ is an error term such that $E_{t}[\epsilon_{i,t}]=0$, $E_{t}[\epsilon_{i,t}|\hat{f}_{t}]=0$, and $E_{t}[\epsilon_{i,t}\epsilon_{j,t}]=0$ for $i\neq j$.

Over long periods of time, we would expect that asset $i$ with high $\beta_{i}$ outperforms assets $j$ with low $\beta_{j}$. Similarly, if we allow a more flexible model such as 

$$
R^{e}_{i,t} = \alpha_{i} + \beta_{i,t}´\hat{f}_{t} + \epsilon_{i,t}
$$

and estimate that model using a rolling regression, we would expect that asset $i$ will perform better in times when $\beta_{i,t}$ is high than in times when $\beta_{i,t}$ is low.
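A rolling regression simply re-estimates the beta on a moving window of observations. The sketch below uses a single simulated factor whose true beta jumps from 0.5 to 1.5 halfway through the sample; the window length, the data-generating process, and the function names are assumptions made for illustration.

```python
import random

def ols_beta(x, y):
    """Slope of an OLS regression of y on a constant and x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

def rolling_betas(factor, returns, window):
    """Beta estimated on each window of `window` consecutive observations."""
    last_start = len(factor) - window
    return [ols_beta(factor[s:s + window], returns[s:s + window])
            for s in range(last_start + 1)]

rng = random.Random(2)
n_obs, window = 1000, 250
factor = [rng.gauss(0.0, 0.02) for _ in range(n_obs)]
# Assumed true beta: 0.5 in the first half of the sample, 1.5 in the second half.
true_beta = [0.5 if t < n_obs // 2 else 1.5 for t in range(n_obs)]
returns = [b * f + rng.gauss(0.0, 0.005) for b, f in zip(true_beta, factor)]

betas = rolling_betas(factor, returns, window)
print(f"first window beta = {betas[0]:.2f}, last window beta = {betas[-1]:.2f}")
```

Windows that straddle the break produce betas between the two regimes, which is the usual trade-off of rolling estimates: shorter windows react faster but are noisier.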


### Capital Asset Pricing Model (CAPM)

The Capital Asset Pricing Model (CAPM) from [Sharpe (1964)](https://onlinelibrary.wiley.com/doi/full/10.1111/j.1540-6261.1964.tb02865.x) is the first and most famous asset pricing factor model. In the CAPM, the $k \times 1$ random vector of time-varying factors $f_{t}$ is actually a single scalar, equal to the excess returns of the so-called **market portfolio**, $R^{M}_{t}$. In the CAPM, we also assume that the beta to the market portfolio is constant over time, i.e., $\beta_{i,t}=\beta_{i}$, and can be estimated by running a time-series regression of the excess returns $R^{e}_{i,t}$ of asset $i$, or the returns of a portfolio of assets, onto the market portfolio factor $R^{M}_{t}$ over $t=1,\dots,T$:

$$
R^{e}_{i,t} = \alpha_{i} + \beta_{i}R^{M}_{t} + \epsilon_{i,t}
$$

Over long periods of time, we would expect that asset $i$ with high $\beta_{i}$ outperforms assets $j$ with low $\beta_{j}$.
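The CAPM time-series regression can be sketched with plain least squares. The data below are simulated with a true beta of 1.2 and a true alpha of zero, so this is an illustration of the mechanics rather than an empirical result; `capm_regression` and the simulation parameters are hypothetical.

```python
import math
import random

def capm_regression(market, asset):
    """OLS of asset excess returns on market excess returns.

    Returns (alpha, beta, t-statistic of alpha) under classical OLS assumptions.
    """
    n = len(market)
    mean_m, mean_a = sum(market) / n, sum(asset) / n
    sxx = sum((m - mean_m) ** 2 for m in market)
    beta = sum((m - mean_m) * (a - mean_a) for m, a in zip(market, asset)) / sxx
    alpha = mean_a - beta * mean_m
    residuals = [a - alpha - beta * m for m, a in zip(market, asset)]
    sigma2 = sum(e * e for e in residuals) / (n - 2)       # residual variance
    se_alpha = math.sqrt(sigma2 * (1.0 / n + mean_m ** 2 / sxx))
    return alpha, beta, alpha / se_alpha

rng = random.Random(3)
n_obs = 600                                                # e.g. 50 years of monthly data
market = [rng.gauss(0.005, 0.04) for _ in range(n_obs)]    # simulated market excess returns
asset = [1.2 * m + rng.gauss(0.0, 0.02) for m in market]   # true alpha = 0, true beta = 1.2

alpha, beta, t_alpha = capm_regression(market, asset)
print(f"alpha = {alpha:.5f}, beta = {beta:.3f}, t(alpha) = {t_alpha:.2f}")
```

Testing whether the estimated alpha is statistically different from zero is the single-asset version of the anomaly test discussed above.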


### Fama-French style factor models

The Fama–French [three-factor model](https://www.jstor.org/stable/2329112?seq=1) and its updated [five-factor model](https://www.sciencedirect.com/science/article/abs/pii/S0304405X14002323) version were developed by Nobel Prize winner Eugene Fama and Kenneth French, both from the University of Chicago, to describe stock returns. The idea was to extend or enrich the CAPM to take into account other characteristics beyond the beta to the market portfolio, $R^{M}_{t+1}$. Specifically, the three-factor model enriches the CAPM with long-short portfolios constructed based on certain stock characteristics: (i) market capitalization (size) and (ii) the book-to-market ratio (B/M). Stocks with high B/M are called **value/cheap** stocks, in contrast with **growth/expensive** stocks with low B/M. So, the three-factor model becomes:

$$
R^{e}_{i,t} = \alpha_{i} + \beta_{i}´ \begin{bmatrix}R^{M}_{t} \\ SMB_{t} \\ HML_{t}\end{bmatrix} + \eta_{i,t}
$$

where $SMB_{t}$ is the return of a long-short portfolio composed of stocks with small market capitalization on the long side and stocks with large market capitalization on the short side (called the "Small Minus Big" portfolio) and $HML_{t}$ is the return of a long-short portfolio composed of stocks with high B/M on the long side and stocks with low B/M on the short side (called "High Minus Low"). These two portfolios measure the historic excess returns of small caps over big caps and of value stocks over growth stocks. Historical values for these two portfolios can be found on [Kenneth French's web page](http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html) and the vector $\beta_{i,t}$ can be estimated by time-series regressions (assuming $\beta_{i,t}=\beta_{i}$ or using rolling regressions) as well.

Mark Carhart's [four-factor model](https://onlinelibrary.wiley.com/doi/full/10.1111/j.1540-6261.1997.tb03808.x) is an extension of the Fama–French three-factor model that includes a momentum factor, the MOM factor. Momentum is the tendency for the stock price to continue rising if it is going up and to continue declining if it is going down. The $MOM_{t}$ factor is the return of a long-short portfolio composed of stocks with the highest past returns over a certain lookback period on the long side and stocks with the lowest past returns over the same period on the short side.

In 2015, Fama and French extended the model to a [five-factor model](https://www.sciencedirect.com/science/article/abs/pii/S0304405X14002323), adding two further factors: profitability and investment. Defined analogously to the HML factor, the profitability factor (RMW) is the difference between the returns of firms with robust (high) and weak (low) operating profitability, and the investment factor (CMA) is the difference between the returns of firms that invest conservatively and firms that invest aggressively.

AQR's [six-factor model](https://www.aqr.com/Insights/Perspectives/Our-Model-Goes-to-Six-and-Saves-Value-From-Redundancy-Along-the-Way) is perhaps the most used Fama-French style factor model today. Historical values for these portfolios can be found on [AQR's web page](https://www.aqr.com/Insights/Datasets).


### Fama-French portfolio sorts

The [Fama-French factors](http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html#Research), or portfolios, are constructed by sorting assets by a particular characteristic or metric (e.g., size and book-to-market) and then going long the top quintile (top 20%) and going short the bottom quintile (bottom 20%). Those portfolios are called univariate sort portfolios.

However, Fama-French factors can also be constructed using bivariate and three-way sorts. The bivariate portfolios are the intersections of two portfolios formed based on two signals. For example, we can construct [6 Portfolios based on Size and Book-to-Market](http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/Data_Library/six_portfolios.html) if we split the Size signal in two groups using the median (big and small), and the Book-to-Market signal in three groups using the 30th and 70th percentiles (value, neutral, and growth). These are the so-called 2 x 3 Fama-French double-sorted portfolios. Analogously, we can construct [32 three-way portfolios](http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html#Research) by sorting stocks into 2x4x4 groups.
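The sorting mechanics can be sketched on a made-up universe of 600 stocks. The breakpoint rule, the dictionary layout, and the randomly generated characteristics are assumptions for illustration; in particular, this sketch computes breakpoints on the full universe and ignores portfolio weighting.

```python
import random

def breakpoint_value(sorted_values, q):
    """Value at fraction q of a sorted list (simple positional breakpoint)."""
    idx = min(int(q * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[idx]

def quintile_long_short(stocks, key):
    """Univariate sort: long the top 20% and short the bottom 20% by `key`."""
    ranked = sorted(stocks, key=lambda s: s[key])
    n_leg = len(ranked) // 5
    return ranked[-n_leg:], ranked[:n_leg]          # (long leg, short leg)

def double_sort(stocks):
    """2 x 3 sort: size split at the median, book-to-market at the 30th/70th percentiles."""
    size_median = breakpoint_value(sorted(s["size"] for s in stocks), 0.5)
    bms = sorted(s["bm"] for s in stocks)
    bm_low, bm_high = breakpoint_value(bms, 0.3), breakpoint_value(bms, 0.7)
    portfolios = {}
    for s in stocks:
        size_bucket = "small" if s["size"] < size_median else "big"
        if s["bm"] < bm_low:
            bm_bucket = "growth"
        elif s["bm"] < bm_high:
            bm_bucket = "neutral"
        else:
            bm_bucket = "value"
        portfolios.setdefault((size_bucket, bm_bucket), []).append(s["ticker"])
    return portfolios

rng = random.Random(4)
universe = [{"ticker": f"S{i:03d}",
             "size": rng.lognormvariate(0.0, 1.0),  # hypothetical market cap
             "bm": rng.lognormvariate(0.0, 0.5)}    # hypothetical book-to-market
            for i in range(600)]

long_leg, short_leg = quintile_long_short(universe, "bm")
portfolios = double_sort(universe)
for name, members in sorted(portfolios.items()):
    print(name, len(members))
```

Each stock falls into exactly one of the six size and book-to-market buckets, so the buckets can be combined into long-short factors such as value minus growth within each size group.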

Fama-French portfolios are typically constructed with single name stocks and there are thousands of them to sort through. If you have thousands of assets to choose from, each of these double sort or three-way portfolios will still contain a large set of assets and will still be fairly diversified.

Because Fama-French portfolios of single name stocks are so diversified, equal weighting or market-cap weighting schemes are not too damaging, despite the fact that they do not take into consideration that different stocks have different volatilities, betas to the market, and variable correlations. In fact, the simplicity of the Fama-French portfolio construction is one of its appeals for use in the academic literature, but these portfolios are hardly ever used in practice.

Moreover, Fama-French portfolios are cash neutral. This property is important when constructing single-name stock long-short portfolios because the stocks in the long leg are purchased with the proceeds from the short sale of the stocks in the short leg. However, if you are trading swaps or futures on the underlyings, this restriction does not make any sense. We will come back to this point later.

Still, Fama–French's [original idea](http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html#Research) was to check whether the returns of these portfolios were statistically significant, as in this table extracted from their [2015 paper](https://www.sciencedirect.com/science/article/abs/pii/S0304405X14002323):

![Capture.PNG](attachment:Capture.PNG)

### Asset pricing as a function of characteristics

What is unique about the CAPM and Fama-French style factor models is that the elements of the $k \times 1$ random vector of time-varying factors $f_{t}$ are actually portfolios constructed based on one or more asset characteristics. So, unlike in the more general APT model, the factors are observable and we can estimate the vector $\beta_{i,t}$ directly by time-series regressions, either by assuming $\beta_{i,t}=\beta_{i}$ or by using rolling regressions.

The standard methodology for constructing Fama-French style factors is to sort assets into portfolios based on a characteristic or set of characteristics, look at the portfolio means (especially the 1–10 portfolio alpha, information ratio, and t-statistic), and then see if the spread in means corresponds to a spread of portfolio betas against some factor. However, portfolio sorts are really the same thing as nonparametric cross-sectional regressions, using nonoverlapping histogram weights. The picture below, taken from [Cochrane's presidential address](https://faculty.chicagobooth.edu/john.cochrane/research/papers/discount_rates_jf.pdf), illustrates this point for the one-characteristic (B/M) case:

![image.png](attachment:image.png)
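The equivalence between a portfolio sort and a nonparametric regression with nonoverlapping histogram weights can be checked numerically. The sketch below uses simulated data (the variable names and the data-generating process are assumptions made for illustration): the average return within each characteristic quintile coincides with the OLS coefficients from regressing returns on quintile dummies without an intercept.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
log_bm = rng.normal(size=n)                           # simulated characteristic
ret = 0.02 * log_bm + rng.normal(scale=0.1, size=n)   # simulated excess returns

# (a) Portfolio sort: average return within each characteristic quintile
edges = np.quantile(log_bm, [0.2, 0.4, 0.6, 0.8])
bucket = np.digitize(log_bm, edges)                   # bucket labels 0..4
sort_means = np.array([ret[bucket == k].mean() for k in range(5)])

# (b) Cross-sectional regression of returns on quintile dummies, no intercept
dummies = (bucket[:, None] == np.arange(5)).astype(float)
coef = np.linalg.lstsq(dummies, ret, rcond=None)[0]

print(sort_means)
print(coef)
```

The two printed vectors agree up to floating-point error, since OLS on a full set of mutually exclusive dummies returns the group means.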

Since it is very difficult to do portfolio sorts based on a large number of characteristics, it is important that we realize that portfolio sorts are really the same thing as running regressions of excess returns on a very large number of characteristics. Let's see why.

Suppose that expected returns, $R^{e}_{i,t+1}$, rise with a characteristic $C_{i,t}$, e.g., the log book-to-market ratio, $\log(B_{i,t}/M_{i,t})$, or with a set of characteristics $C_{i,t} = (\log(B_{i,t}/M_{i,t}), M_{i,t}, \log(P_{i,t}/P_{i,t-1Y}), \dots, \log(D_{i,t}/P_{i,t}))$. Why would this happen? Within the factor model framework, this may happen because assets with that set of characteristics $C_{i,t}$ may have a high $\beta_{i,t,k}$ with respect to the $k$-th element, $\hat{f}_{k,t}$, of the factor vector $\hat{f}_{t}$. Or, more generally, the beta vector may be a function of those characteristics, $\beta_{i,t} = \delta_{0} + \delta_{1}'C_{i,t}$.

Now, under this framework, compare the excess returns of two assets, $i$ and $j$, that have high and low characteristics, $C_{i,t}$ and $C_{j,t}$, respectively. Their return difference is:

$$
R^{e}_{i,t+1} - R^{e}_{j,t+1} = (\alpha_{i} - \alpha_{j}) + (C_{i,t} - C_{j,t})'\delta_{1}\hat{f}_{t} + (\eta_{i,t+1} - \eta_{j,t+1})
$$

In the no-anomaly case, where $\alpha_{i} = \alpha_{j} = 0$, and with uncorrelated $\eta$'s that share a common variance $\sigma^{2}_{\eta}$, the Sharpe ratio of the spread portfolio $R^{e}_{i,t+1} - R^{e}_{j,t+1}$ is given by:


$$
Sharpe^{C} = \frac{E[R^{e}_{i,t+1} - R^{e}_{j,t+1}]}{\sigma[R^{e}_{i,t+1} - R^{e}_{j,t+1}]} = \frac{(C_{i,t} - C_{j,t})'\delta_{1}E[\hat{f}_{t}]}{\sqrt{(C_{i,t} - C_{j,t})'\delta_{1}Var[\hat{f}_{t}]\delta_{1}'(C_{i,t} - C_{j,t})+2\sigma^{2}_{\eta}}}
$$

The Sharpe ratio of this spread portfolio, $Sharpe^{C}$, sorted based on this characteristic, is zero if $C_{i,t} = C_{j,t}$ and it increases as we look at further-separated portfolios, i.e., as $\delta_{1}'(C_{i,t} - C_{j,t})$ increases. As the separation grows, the factor variance term dominates the idiosyncratic term $2\sigma^{2}_{\eta}$, so $Sharpe^{C}$ increases until it approaches the pure Sharpe ratio of the factor. If this factor indeed carries a risk premium, its Sharpe ratio is positive and long-short portfolios based on this characteristic will generate positive returns over time. So, it makes sense to check if the Sharpe ratios of these spread portfolios are high or, equivalently, if their returns are statistically significant.
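To see how the spread-portfolio Sharpe ratio behaves as the characteristic gap widens, the formula can be evaluated directly. The sketch below assumes the no-anomaly case with one characteristic and one factor; the function name `spread_sharpe` and all numerical values (factor mean, factor variance, $\sigma_{\eta}$, $\delta_{1}$) are hypothetical choices for illustration.

```python
import numpy as np

def spread_sharpe(c_i, c_j, delta1, f_mean, f_cov, sigma_eta):
    """Sharpe ratio of the spread portfolio when alpha_i = alpha_j = 0.

    c_i, c_j : (m,) characteristics of the two assets
    delta1   : (m, k) loading of betas on characteristics
    f_mean   : (k,) factor means; f_cov : (k, k) factor covariance
    sigma_eta: idiosyncratic volatility of each asset
    """
    beta_spread = delta1.T @ (np.asarray(c_i) - np.asarray(c_j))   # (k,)
    mean = beta_spread @ f_mean
    var = beta_spread @ f_cov @ beta_spread + 2.0 * sigma_eta ** 2
    return mean / np.sqrt(var)

# Hypothetical single-factor example: factor Sharpe = 0.05 / 0.10 = 0.5
f_mean = np.array([0.05])
f_cov = np.array([[0.01]])
delta1 = np.array([[1.0]])
sigma_eta = 0.20

spreads = [0.0, 0.5, 1.0, 5.0, 50.0]
sharpes = [spread_sharpe([s], [0.0], delta1, f_mean, f_cov, sigma_eta) for s in spreads]
for s, sr in zip(spreads, sharpes):
    print(s, round(float(sr), 4))
```

In this example the Sharpe ratio starts at zero when the two characteristics coincide and rises toward, but never exceeds, the factor's own Sharpe ratio of 0.5.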

Analogously, we can compare the returns of a single asset, $R^{e}_{i,t+1}$, across periods in which its characteristic $C_{i,t}$ is high or low relative to its own mean, $\bar{C_{i}}$, conditional on the factor being at its average level:

$$
E[R^{e}_{i,t+1} - E[R^{e}_{i,t+1}]|\hat{f}_{t}=E[\hat{f}_{t}]] = (\alpha_{i}  - \bar{\alpha}) + (C_{i,t} - \bar{C_{i}})'\delta_{1}E[\hat{f}_{t}],
$$

so the returns of asset $i$ are expected to be higher when that characteristic is higher than average, $C_{i,t} > \bar{C_{i}}$, as long as that factor indeed carries a risk premium, that is, as long as its expected return $E[\hat{f}_{t}]$ is positive.


### Fama-MacBeth Regressions

One of the problems with looking at the Sharpe ratio of spread portfolios is that, in finite samples, as we sort on more and more characteristics and split the set of assets into finer and finer groups, the portfolio within each group will contain fewer and fewer assets. The idiosyncratic volatility of each leg of the spread portfolio scales with the inverse of the square root of the number of assets in that leg, so the effective $\sigma_{\eta}$ increases as the groups get smaller, decreasing the Sharpe ratio $Sharpe^{C}$. What if we had something like 20 or 30 characteristics? After all, investment analysts look at thousands of variables to make valuation assessments.
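A quick back-of-the-envelope calculation shows how fast multi-way sorts run out of assets. The sketch below assumes an equal-weighted leg with independent residuals of common volatility, so that the leg's idiosyncratic volatility is the single-stock volatility divided by the square root of the number of stocks in the leg. The universe size, the single-stock volatility, and the function name `leg_idio_vol` are hypothetical.

```python
import numpy as np

def leg_idio_vol(sigma_single, n_assets):
    """Residual volatility of an equal-weighted leg of n_assets stocks,
    assuming independent residuals with common volatility sigma_single."""
    return sigma_single / np.sqrt(n_assets)

n_universe = 3000      # hypothetical number of stocks
sigma_single = 0.30    # hypothetical single-stock residual volatility

for n_chars in (1, 2, 3, 4):
    n_cells = 5 ** n_chars               # quintile sort on each characteristic
    per_cell = n_universe / n_cells      # average number of stocks per portfolio
    vol = leg_idio_vol(sigma_single, per_cell)
    print(n_chars, n_cells, round(per_cell, 1), round(float(vol), 4))
```

Already with four characteristics the average quintile cell holds fewer than five stocks, and with 20 or 30 characteristics the number of cells vastly exceeds any realistic stock universe.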

Let's elaborate on the idea we just discussed a bit further. Suppose that assets with a high $\beta_{i,t,k}$ with respect to the $k$-th element, $\hat{f}_{k,t}$, of the factor vector are also assets with a high $k$-th element of the vector $\delta_{0} + \delta_{1}'C_{i,t}$. Or, simply, we have:

$$
\beta_{i,t} = \delta_{0} + \delta_{1}'C_{i,t}
$$

then what we have in fact is a model that can be written as:

$$
R^{e}_{i,t+1} = \alpha_{i,t} + (\delta_{0} + \delta_{1}'C_{i,t})'\hat{f}_{t} + \eta_{i,t+1} = \gamma_{i,t} + \xi_{t}'C_{i,t} + \eta_{i,t+1}
$$

where $\gamma_{i,t} \equiv \alpha_{i,t} + \delta_{0}'\hat{f}_{t}$ and $\xi_{t} \equiv \delta_{1}\hat{f}_{t}$. So, if we estimate a regression directly on characteristics, the estimate of $\xi_{t}$ is really an estimate of $\delta_{1}\hat{f}_{t}$.

Now, we can estimate a sequence of cross-sectional regressions: for each time period $t=1,\dots,T$, we regress returns on characteristics across assets, which yields the estimates $\hat{\xi}_{1},\dots,\hat{\xi}_{T}$. The same applies to the model's **alpha**: under the assumption $\gamma_{i,t}=\gamma_{t}$, the intercepts of these regressions give $\hat{\gamma}_{1},\dots,\hat{\gamma}_{T}$. We can then test whether this model has **alpha** by computing a t-stat on the estimated $\hat{\gamma}_{1},\dots,\hat{\gamma}_{T}$.

Similarly, a statistically significant element of $\xi_{t}$ suggests that the corresponding factor, which is formed as a linear function of a set of characteristics, explains the outperformance of one asset over another. This is exactly because the projection coefficient of the returns onto the factor vector, $\beta_{i,t} \equiv E_{t}[f_{t+1}f_{t+1}']^{-1}E_{t}[f_{t+1}R^{e}_{i,t+1}]$, is a linear function $\delta_{0} + \delta_{1}'C_{i,t}$ of an observable set of characteristics.
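The sequence of cross-sectional regressions can be sketched as follows. This is a minimal illustration on a simulated balanced panel, assuming $\gamma_{i,t}=\gamma_{t}$; the function name `cross_sectional_estimates`, the array layout, and the simulated data-generating process are all assumptions, not part of the original model description.

```python
import numpy as np

def cross_sectional_estimates(returns, chars):
    """One OLS regression per period of returns on [1, characteristics].

    returns : (T, N) excess returns realized at t+1
    chars   : (T, N, m) characteristics known at t
    Output  : (T, 1 + m) array whose row t is [gamma_hat_t, xi_hat_t]
    """
    T, N, m = chars.shape
    coefs = np.empty((T, 1 + m))
    for t in range(T):
        X = np.column_stack([np.ones(N), chars[t]])
        coefs[t] = np.linalg.lstsq(X, returns[t], rcond=None)[0]
    return coefs

# Simulated panel (hypothetical numbers): gamma_t = 0 and xi_t = delta_1 * f_t
rng = np.random.default_rng(2)
T, N, m = 240, 500, 2
chars = rng.normal(size=(T, N, m))
xi_mean = np.array([0.01, 0.0])                   # only the first characteristic is priced
xi_t = xi_mean + 0.02 * rng.normal(size=(T, m))   # period-by-period realizations
eta = 0.10 * rng.normal(size=(T, N))
returns = np.einsum("tnm,tm->tn", chars, xi_t) + eta

coefs = cross_sectional_estimates(returns, chars)
gamma_hat_t, xi_hat_t = coefs[:, 0], coefs[:, 1:]

# t-stat on the time series of intercepts: a test of whether the model has alpha
gamma_tstat = gamma_hat_t.mean() / (gamma_hat_t.std(ddof=1) / np.sqrt(T))
print(xi_hat_t.mean(axis=0), gamma_tstat)
```

Each row of `xi_hat_t` tracks the realized $\delta_{1}\hat{f}_{t}$ of that period, which is why the time series of slope estimates can itself be read as a factor return series.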


Thinking about it more generally, we can write the model as
$$
R^{e}_{i,t+1} = \gamma_{i,t} + b_{i,t}'X_{i,t} + \nu_{i,t+1}
$$

where $X_{i,t}$ is a vector of asset-level or market-level variables known at time $t$.


If we want to develop a time-series strategy, it makes sense for us to estimate $\gamma_{i,t}$ and $b_{i,t}$ separately for each asset $i$. This can be done by assuming $\gamma_{i,t}=\gamma_{i}$ and $b_{i,t}=b_{i}$ and estimating a time-series regression for each asset. We would then have the model's **alpha** and **beta** estimates as:

$$
\hat{\gamma} \equiv N^{-1}\sum_{i}{\hat{\gamma}_{i}} \\
\hat{b} \equiv N^{-1}\sum_{i}{\hat{b}_{i}}
$$

and their variances as:

$$
Var[\hat{\gamma}] = N^{-2}\sum_{i}(\hat{\gamma}_{i} - \hat{\gamma})^{2} \\
Var[\hat{b}] = N^{-2}\sum_{i}(\hat{b}_{i} - \hat{b})(\hat{b}_{i} - \hat{b})'
$$
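The per-asset version can be sketched in the same spirit. The example below assumes a single predictor $X_{i,t}$ per asset, constant coefficients $\gamma_{i}$ and $b_{i}$, and independence of the estimates across assets when computing the standard error of the cross-asset average (an assumption that correlated returns would violate). The function name and the simulated numbers are hypothetical.

```python
import numpy as np

def per_asset_time_series_ols(returns, signal):
    """Fit R_{i,t+1} = gamma_i + b_i * X_{i,t} + nu_{i,t+1} separately per asset.

    returns : (T, N) excess returns realized at t+1
    signal  : (T, N) predictor known at t
    Output  : gamma_hat (N,), b_hat (N,)
    """
    T, N = returns.shape
    gamma_hat = np.empty(N)
    b_hat = np.empty(N)
    for i in range(N):
        X = np.column_stack([np.ones(T), signal[:, i]])
        gamma_hat[i], b_hat[i] = np.linalg.lstsq(X, returns[:, i], rcond=None)[0]
    return gamma_hat, b_hat

# Simulated panel (hypothetical numbers)
rng = np.random.default_rng(3)
T, N = 600, 50
signal = rng.normal(size=(T, N))
b_true = 0.01 + 0.002 * rng.normal(size=N)
returns = signal * b_true + 0.05 * rng.normal(size=(T, N))

gamma_hat, b_hat = per_asset_time_series_ols(returns, signal)

# Cross-asset averages and t-stats, treating the per-asset estimates as independent
b_bar = b_hat.mean()
b_tstat = b_bar / (b_hat.std(ddof=1) / np.sqrt(N))
gamma_bar = gamma_hat.mean()
gamma_tstat = gamma_bar / (gamma_hat.std(ddof=1) / np.sqrt(N))
print(b_bar, b_tstat, gamma_bar, gamma_tstat)
```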

If we want to develop a relative value strategy, then it makes sense for us to estimate $\gamma_{i,t}$ and $b_{i,t}$ separately for each time period $t$, assuming $\gamma_{i,t}=\gamma_{t}$ and $b_{i,t}=b_{t}$ and running a cross-sectional regression across assets on a single period or on a rolling window of periods. We would then have the model's **alpha** and **beta** estimates as:

$$
\hat{\gamma} \equiv T^{-1}\sum_{t}{\hat{\gamma}_{t}} \\
\hat{b} \equiv T^{-1}\sum_{t}{\hat{b}_{t}}
$$

and their variances as:

$$
Var[\hat{\gamma}] = T^{-2}\sum_{t}(\hat{\gamma}_{t} - \hat{\gamma})^{2} \\
Var[\hat{b}] = T^{-2}\sum_{t}(\hat{b}_{t} - \hat{b})(\hat{b}_{t} - \hat{b})'
$$
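Given a time series of per-period estimates, the aggregation step is just a mean and a scaled covariance. The sketch below assumes the per-period estimates are serially uncorrelated, so that the variance of the time-series mean is the sum of squared deviations divided by $T^{2}$ (equivalently, the variance of the per-period estimates divided by $T$). The function name `fama_macbeth_summary` and the toy input are hypothetical.

```python
import numpy as np

def fama_macbeth_summary(estimates):
    """Aggregate per-period estimates into a mean, its covariance, and t-stats.

    estimates : (T, p) array, row t holding the period-t estimates
    Output    : mean (p,), covariance of the mean (p, p), t-stats (p,)
    """
    estimates = np.asarray(estimates, dtype=float)
    T = estimates.shape[0]
    mean = estimates.mean(axis=0)
    dev = estimates - mean
    cov_of_mean = dev.T @ dev / T ** 2       # variance of the time-series mean
    t_stats = mean / np.sqrt(np.diag(cov_of_mean))
    return mean, cov_of_mean, t_stats

# Toy input: three periods, two coefficients
toy = np.array([[0.0, 1.0],
                [2.0, 2.0],
                [4.0, 6.0]])
mean, cov_of_mean, t_stats = fama_macbeth_summary(toy)
print(mean)
print(cov_of_mean)
print(t_stats)
```

If the per-period estimates are autocorrelated, this simple standard error is no longer appropriate and an autocorrelation-robust correction would be needed.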