# Returns and Stylized Facts

Before modeling anything, we need to understand what financial return data actually looks like. This chapter introduces **log-returns**, then walks through the canonical **stylized facts** — empirical regularities observed consistently across assets, markets, and time periods.

## Log-Returns

Let $P_t$ be the price of an asset at time $t$. The **simple return** and **log-return** are defined as:

$$R_t = \frac{P_t - P_{t-1}}{P_{t-1}}, \qquad r_t = \ln\left(\frac{P_t}{P_{t-1}}\right) = \ln P_t - \ln P_{t-1}$$

For small returns the two nearly coincide, since $r_t = \ln(1 + R_t) \approx R_t$. We prefer log-returns because:
- They are **time-additive**: $r_{t,t+k} = \sum_{i=1}^{k} r_{t+i}$
- They take values on the whole real line $(-\infty, +\infty)$, whereas simple returns are bounded below by $-1$, which is convenient for modeling
- They relate directly to the continuously compounded rate of return
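
The first property can be checked numerically. The following is a minimal sketch on a hypothetical five-point price path (the numbers are made up for illustration); it compares simple and log-returns and verifies time-additivity.

```python
import numpy as np

# Hypothetical price path, chosen only for illustration
prices = np.array([100.0, 101.0, 99.5, 102.0, 103.5])

simple = prices[1:] / prices[:-1] - 1   # R_t
log_ret = np.diff(np.log(prices))       # r_t = ln P_t - ln P_{t-1}

# For small moves the two are close; r_t = ln(1 + R_t) never exceeds R_t
print(np.column_stack([simple, log_ret]).round(5))

# Time-additivity: one-period log-returns sum to the log-return over the window
total_log = np.log(prices[-1] / prices[0])
print(np.isclose(log_ret.sum(), total_log))

# Simple returns compound multiplicatively instead of adding up
total_simple = prices[-1] / prices[0] - 1
print(np.isclose(np.prod(1 + simple) - 1, total_simple))
```

Both checks rely only on the definitions above, so they hold for any strictly positive price path, not just this one.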

```{warning}
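Log-returns are **not** cross-sectionally additive. For a portfolio with weights $w_i$, simple returns aggregate exactly across assets, $R_{p,t} = \sum_i w_i R_{i,t}$; the corresponding weighted sum of log-returns is only an approximation to the portfolio log-return.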
```
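
To make the warning concrete, here is a small sketch with a hypothetical two-asset portfolio. The weights and returns are assumptions picked for illustration, with a deliberately large move so the discrepancy is visible.

```python
import numpy as np

# Hypothetical two-asset portfolio: weights and one-period simple returns
w = np.array([0.6, 0.4])
R = np.array([0.10, -0.05])

# Portfolio simple return: weighted average of the asset simple returns
R_p = w @ R

# Exact portfolio log-return, obtained from the portfolio simple return
r_p = np.log1p(R_p)

# Weighted average of asset log-returns: this is NOT the portfolio log-return
r_naive = w @ np.log1p(R)

print(f"exact portfolio log-return: {r_p:.6f}")
print(f"weighted log-returns:       {r_naive:.6f}")
```

Because the logarithm is concave, the weighted average of log-returns can never exceed the exact portfolio log-return; the gap shrinks as the individual returns get smaller.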

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
from scipy import stats
import warnings
warnings.filterwarnings('ignore')

# Reproducibility
np.random.seed(42)

# Simulate a GBM price series for illustration
n = 1000
mu, sigma = 0.0005, 0.015
eps = np.random.normal(0, 1, n)
log_returns = mu + sigma * eps
prices = 100 * np.exp(np.cumsum(log_returns))

# Also simulate a series with GARCH-like volatility clustering
h = np.zeros(n)
r_garch = np.zeros(n)
omega, alpha, beta = 0.00001, 0.10, 0.85
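h[0] = omega / (1 - alpha - beta)  # start at the unconditional variance
r_garch[0] = np.sqrt(h[0]) * np.random.normal()  # draw the first return instead of leaving it at zero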
for t in range(1, n):
    h[t] = omega + alpha * r_garch[t-1]**2 + beta * h[t-1]
    r_garch[t] = np.sqrt(h[t]) * np.random.normal()

dates = pd.date_range('2018-01-01', periods=n, freq='B')
df = pd.DataFrame({'price': prices, 'log_return': log_returns, 'r_garch': r_garch}, index=dates)

print("Dataset shape:", df.shape)
print(df['log_return'].describe().round(6))

## The Stylized Facts

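Cont (2001) systematically documented statistical properties common to financial return series. We examine two of the most important ones in detail, fat tails and volatility clustering; the summary table at the end of the chapter lists further ones.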

### 1. Fat Tails (Excess Kurtosis)

Financial returns have **heavier tails than a normal distribution**. The kurtosis of a normal distribution is 3 (excess kurtosis = 0). Empirical return series consistently show excess kurtosis well above zero, implying large moves occur far more often than Gaussian models predict.
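
For reference, the definition behind these numbers can be written out as follows. The symbols $\mu_r$, $\sigma_r$, and $\kappa$ (mean, standard deviation, and kurtosis of $r_t$) are notation assumed here rather than taken from the text above.

$$\kappa = \frac{\mathbb{E}\left[(r_t - \mu_r)^4\right]}{\sigma_r^4}, \qquad \text{excess kurtosis} = \kappa - 3$$

As a worked case, a Student-$t$ distribution with $\nu > 4$ degrees of freedom has excess kurtosis $6/(\nu - 4)$, so $\nu = 5$ gives an excess kurtosis of $6$, compared with $0$ for the normal.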

This has critical implications: Value-at-Risk calculated under normality **systematically underestimates** risk far in the tails, for example at the 99% confidence level and beyond.
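
A rough way to see the size of the effect is to compare quantiles of a normal distribution with those of a fat-tailed alternative of the same variance. The sketch below assumes a Student-$t$ with 5 degrees of freedom as the fat-tailed model; that choice, and the names `nu` and `var_table`, are illustrative assumptions rather than estimates from data.

```python
import numpy as np
from scipy import stats

nu = 5                             # assumed degrees of freedom
scale = np.sqrt((nu - 2) / nu)     # rescales the t distribution to unit variance

# VaR expressed in standard deviations of the return distribution
var_table = {}
for level in (0.95, 0.99, 0.999):
    q_norm = -stats.norm.ppf(1 - level)
    q_t = -stats.t.ppf(1 - level, df=nu) * scale
    var_table[level] = (q_norm, q_t)
    print(f"{level:.1%} VaR: normal = {q_norm:.3f}, unit-variance t = {q_t:.3f}")
```

Under these assumptions the gap opens up at the 99% level and widens further out. At 95% the two quantiles are close, and the normal one can even be slightly larger, which is why the underestimation is a statement about the far tail.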

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Normal vs empirical histogram
ax = axes[0]
r = df['r_garch']
r_std = (r - r.mean()) / r.std()
x = np.linspace(-5, 5, 300)

ax.hist(r_std, bins=60, density=True, alpha=0.6, color='steelblue', label='Simulated returns')
ax.plot(x, stats.norm.pdf(x), 'r-', lw=2, label='Normal distribution')
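# Rescale the t density to unit variance (Var = df/(df-2)) so it is comparable to the standardized returns
ax.plot(x, stats.t.pdf(x, df=5, scale=np.sqrt(3 / 5)), 'g--', lw=2, label='t-distribution (df=5, unit variance)')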
ax.set_xlim(-5, 5)
ax.set_xlabel('Standardized return')
ax.set_ylabel('Density')
ax.set_title('Fat Tails: Return Distribution vs Normal')
ax.legend()

# QQ plot
ax = axes[1]
stats.probplot(r_std, dist='norm', plot=ax)
ax.set_title('Normal Q-Q Plot (deviations in tails = fat tails)')
ax.get_lines()[1].set_color('red')

plt.tight_layout()
plt.show()

print(f"Excess kurtosis (GARCH series): {stats.kurtosis(r_garch):.3f}")
print(f"Excess kurtosis (normal benchmark): {stats.kurtosis(np.random.normal(size=10000)):.3f}")

### 2. Volatility Clustering

Returns themselves are nearly unpredictable, but their **absolute values or squares are autocorrelated**. Large moves tend to be followed by large moves (of either sign), and calm periods cluster together. This was first noted by Mandelbrot (1963):

> *"Large changes tend to be followed by large changes — of either sign — and small changes tend to be followed by small changes."*

Formally, $\text{Corr}(|r_t|, |r_{t-k}|) > 0$ for many lags $k$, even though $\text{Corr}(r_t, r_{t-k}) \approx 0$.
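
The two correlations can also be estimated directly. The following sketch simulates a GARCH(1,1) series and compares sample autocorrelations of $r_t$ and $|r_t|$; the helper `sample_autocorr`, the seed, and the parameter values are assumptions made for this illustration.

```python
import numpy as np

def sample_autocorr(x, lag):
    """Sample autocorrelation of x at a positive integer lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return np.dot(x[lag:], x[:-lag]) / np.dot(x, x)

# Simulate a GARCH(1,1) series; parameter values are illustrative assumptions
rng = np.random.default_rng(0)
n = 10_000
omega, alpha, beta = 1e-5, 0.10, 0.85
h = np.empty(n)
r = np.empty(n)
h[0] = omega / (1 - alpha - beta)          # unconditional variance
r[0] = np.sqrt(h[0]) * rng.standard_normal()
for t in range(1, n):
    h[t] = omega + alpha * r[t - 1] ** 2 + beta * h[t - 1]
    r[t] = np.sqrt(h[t]) * rng.standard_normal()

for k in (1, 5, 10):
    print(f"lag {k:2d}: Corr(r) = {sample_autocorr(r, k):+.3f}, "
          f"Corr(|r|) = {sample_autocorr(np.abs(r), k):+.3f}")
```

In a simulation like this the autocorrelations of $r_t$ should hover around zero, while those of $|r_t|$ should be positive and decay slowly with the lag.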

In [None]:
from statsmodels.graphics.tsaplots import plot_acf

fig, axes = plt.subplots(2, 2, figsize=(13, 7))

# Returns time series
axes[0, 0].plot(df.index, df['r_garch'], linewidth=0.6, color='steelblue')
axes[0, 0].set_title('Return Series')
axes[0, 0].set_ylabel('$r_t$')

# Squared returns time series
axes[0, 1].plot(df.index, df['r_garch']**2, linewidth=0.6, color='coral')
axes[0, 1].set_title('Squared Returns (Proxy for Volatility)')
axes[0, 1].set_ylabel('$r_t^2$')

# ACF of returns
plot_acf(df['r_garch'], lags=40, ax=axes[1, 0], title='ACF of Returns')

# ACF of squared returns
plot_acf(df['r_garch']**2, lags=40, ax=axes[1, 1], title='ACF of Squared Returns')

plt.tight_layout()
plt.show()

## Summary Table

| Stylized Fact | Description | Implication |
|---|---|---|
| Fat tails | Excess kurtosis > 0 | Normal VaR underestimates risk |
| Volatility clustering | $\text{Corr}(r_t^2, r_{t-k}^2) > 0$ | GARCH-type models needed |
| Absence of autocorrelation | $\text{Corr}(r_t, r_{t-k}) \approx 0$ | Consistent with weak-form efficiency |
| Leverage effect | Negative shock $\Rightarrow$ higher volatility | Asymmetric GARCH (GJR, EGARCH) |
| Long memory in volatility | Slow ACF decay in $\lvert r_t \rvert$ | FIGARCH, HAR models |

## References

- Cont, R. (2001). *Empirical properties of asset returns: stylized facts and statistical issues*. Quantitative Finance, 1(2), 223–236.
- Mandelbrot, B. (1963). *The variation of certain speculative prices*. Journal of Business, 36(4), 394–419.
- Campbell, J. Y., Lo, A. W., & MacKinlay, A. C. (1997). *The Econometrics of Financial Markets*. Princeton University Press.