---
title: "PB1 Stylised Facts"
author: "Roald Versteeg"
date: "2024-02-08"
output:
  html_document:
    df_print: paged
---



In [None]:
knitr::opts_chunk$set(echo = TRUE)



# Stylised Facts of financial asset returns
## Financial Econometrics - Roald Versteeg

In this worksheet, we are trying to replicate some of the most prevalent stylised empirical facts of financial returns.


Cont (2001) discusses the following eleven stylised facts of asset returns:

1. **Absence of autocorrelations**: linear autocorrelations of returns are usually insignificant, except at very short intraday horizons.

2. **Heavy tails**: returns are leptokurtic, with kurtosis above 3 (positive excess kurtosis) and Pareto-like tails.

3. **Gain/loss asymmetry**: negative skew (<0), with more extreme losses than extreme gains.

4. **Aggregational Gaussianity**: the distribution of annual returns is closer to a normal than daily returns.

5. **Intermittency**: return volatility is not constant over time but displays heteroskedasticity.

6. **Volatility clustering**: measures of volatility, like absolute returns, have positive autocorrelation.

7. **Conditional heavy tails**: after modelling conditional heteroskedasticity (e.g. with GARCH), conditional returns are closer to normality but still show some excess kurtosis.

8. **Slow decay of autocorrelation in absolute returns**: this is sometimes interpreted as a sign of long-range dependence.

9. **Leverage effect**: negative returns have a bigger impact on volatility than positive returns.

10. **Volume/volatility correlation**: trading volume is correlated with all measures of volatility.

11. **Asymmetry in time scales**: coarse-grained measures of volatility predict fine-scale volatility better than the other way round.

Cont, R. (2001), Empirical properties of asset returns: stylized facts and statistical issues, Quantitative Finance, **1** (2), pp. 223-236.
http://rama.cont.perso.math.cnrs.fr/pdf/empirical.pdf

In this worksheet, we'll check for the presence of facts 1, 2, 3, 5, 6, and 8.




In [None]:
library(tidyverse)
library(quantmod)




## Loading the data

Let's begin by loading the data with the quantmod package, which can download data from sources such as Yahoo Finance or FRED.
Alternatively, you can load the data directly from a local .csv file.

For this exercise I've chosen daily data for the NYSE Composite index (ticker `^NYA`), running from January 1985 to January 2024.



In [None]:
ticker <- "^NYA" 
startDate <- "1985-01-01" 
endDate <- "2024-01-01" # You can use to=Sys.Date() to get data up to the current date

rawData <- getSymbols(ticker, 
                      auto.assign = FALSE,
                      from = startDate, 
                      to = endDate, 
                      src="yahoo")



There are a couple of neat things we can do while the series is stored as an xts (eXtensible time series) object, which is what quantmod returns, such as converting the frequency of the data or calculating sums per month. The code below shows how to calculate log returns and how to convert daily returns into monthly and yearly returns.



In [None]:
# Daily log returns; stats::lag is called explicitly because dplyr masks lag()
ret.daily <- log( rawData$NYA.Close) - stats::lag( log( rawData$NYA.Close) )
ret.daily <- na.omit(ret.daily) # drop the leading NA so the sums below are defined

# Log returns are additive over time; calculate period returns as the sum of the daily returns
ret.monthly <- apply.monthly(ret.daily, sum)
ret.yearly <- apply.yearly(ret.daily, sum)

# Note the difference in the date notation between to.monthly() and to.period().
price.monthly <- to.monthly(rawData)
price.monthly2 <- to.period(rawData, period = "months")
price.yearly <- to.yearly(rawData)

# Alternatively, calculate period returns as the difference in period-end log prices
ret.monthly2 <- log( price.monthly$rawData.Close) - stats::lag( log( price.monthly$rawData.Close) )
ret.yearly2 <- log( price.yearly$rawData.Close) - stats::lag( log( price.yearly$rawData.Close) )
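
The additivity of log returns used above can be checked on a small made-up price path. The sketch below uses simulated prices (the seed, drift, and volatility values are arbitrary assumptions, not NYSE data) and base R only:

```r
# Simulated daily closing prices (hypothetical values, for illustration only)
set.seed(1)
n_days <- 250
price  <- 100 * exp(cumsum(rnorm(n_days, mean = 0.0003, sd = 0.01)))

# Daily log returns: r_t = log(P_t) - log(P_{t-1})
ret_sim <- diff(log(price))

# The sum of the daily log returns equals the log return over the whole period
total_from_sum    <- sum(ret_sim)
total_from_prices <- log(price[n_days] / price[1])

all.equal(total_from_sum, total_from_prices)
```

Simple (arithmetic) returns do not have this property because they compound multiplicatively, which is why the aggregation is done on log returns.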



From here on I'm going to set the xts objects above aside and convert the daily data to a data frame instead.
We'll have to calculate the log returns again. We'll drop the other series, as we don't need them going forward.



In [None]:
data <- data.frame(date=index(rawData), coredata(rawData)) 

data <- data %>%
  mutate( Price = log(NYA.Close) ) %>% #log closing price
  mutate( Ret = Price - lag(Price) ) %>% #daily log returns (lag() here is dplyr::lag)
  select( date, Price, Ret) %>%
  drop_na() #drop missing observations (mainly the first observation)






Let's plot the data. We can see that the NYSE log price itself exhibits the characteristic random walk behaviour, whilst the returns are mean reverting.



In [None]:
ggplot(data=data, aes(x=date, y=Price)) +
  geom_line() + 
  ylab("Log Prices") +
  xlab("Date") +
  ggtitle("NYSE Log Price")

ggplot(data=data, aes(x=date, y=Ret)) +
  geom_line() + 
  ylab("Returns") +
  xlab("Date") +
  ggtitle("NYSE Log Returns")



## Autocorrelations

Let's check for autocorrelations, using the acf function.



In [None]:
acf_p<-acf( data$Price, lag.max=10 )

acf_r<-acf( data$Ret, lag.max=10,  
            ylim=c( -0.1 , 0.1) #I'm limiting the Y axis to make the values more visible
            ) 



For the price data, the autocorrelation stays close to 1.0 even after ten lags. We will ignore this for now, but it will prove important later.

For the returns, the first autocorrelation seems to be significantly negative (the dashed blue lines mark the approximate 95% confidence band around zero).
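
A visual check of the ACF can be complemented with a formal test. One option, not used in the worksheet itself, is the Ljung-Box test from base R's `stats` package, which tests the joint null hypothesis that the first `lag` autocorrelations are all zero. The sketch below runs on simulated white noise (an assumption made to keep it self-contained); with the worksheet data one would pass `data$Ret` instead.

```r
# Simulated i.i.d. returns as a stand-in for data$Ret (assumption for illustration)
set.seed(42)
ret_sim <- rnorm(2000, mean = 0, sd = 0.01)

# Ljung-Box test of the joint null that autocorrelations at lags 1 to 10 are zero
lb <- Box.test(ret_sim, lag = 10, type = "Ljung-Box")

lb$statistic  # Q statistic, approximately chi-squared with 10 df under the null
lb$p.value    # a small p-value would point to autocorrelation
```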

## Asymmetry: Skewness




In [None]:
library(moments)
skewness(data$Ret)



We can see that the daily returns indeed exhibit negative skew. We can use the hist() and density() functions to plot the empirical distribution.



In [None]:
hist(data$Ret, breaks=100, main="Daily Returns Histogram")
plot(density(data$Ret), main="Daily Returns Density")



## Heavy Tails: Kurtosis

To check for heavy tails, we can check the kurtosis. For a normal distribution, the kurtosis is 3. If the number is higher, we are dealing with excess kurtosis, also known as heavy or fat tails.



In [None]:
kurtosis(data$Ret) # Note: this is not excess kurtosis
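
To make the definitions concrete, skewness and kurtosis can be computed by hand from the central sample moments. The sketch below is a minimal base-R version applied to simulated data; the helper names `central_moment`, `skew_manual`, and `kurt_manual` are made up for this example. As far as I know the `moments` package uses the same moment-based definitions, but treat that as an assumption and check the package documentation.

```r
# k-th central sample moment (divides by n, not n - 1)
central_moment <- function(x, k) mean((x - mean(x))^k)

# Skewness: third central moment scaled by sd^3; 0 for a symmetric distribution
skew_manual <- function(x) central_moment(x, 3) / central_moment(x, 2)^1.5

# Kurtosis: fourth central moment scaled by variance^2; 3 for a normal
kurt_manual <- function(x) central_moment(x, 4) / central_moment(x, 2)^2

set.seed(123)
z  <- rnorm(1e5)        # normal sample: kurtosis close to 3
t5 <- rt(1e5, df = 5)   # Student-t with 5 df: heavier tails, theoretical kurtosis 9

kurt_manual(z)
kurt_manual(t5)
kurt_manual(z) - 3      # excess kurtosis of the normal sample, close to 0
```

The sample kurtosis of heavy-tailed data is itself a noisy estimate, so the value for the Student-t sample can differ noticeably from its theoretical value.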


Alternatively, we can plot the distribution on a quantile-quantile plot (Q-Q plot), which compares the empirical quantiles (circles) against the quantiles of a normal distribution. The reference line drawn by `qqline()` passes through the first and third quartiles by default.



In [None]:
qqnorm(data$Ret)
qqline(data$Ret)

# The car library contains another function that can also plot the QQ against a normal.
# library(car)
# qqPlot(data$Ret)



The graph shows that the empirical distribution of returns has far more extreme observations in both the left and the right tail than a normal distribution would imply, visualising the fat tails.

### Testing for Normality

The Jarque-Bera test checks whether the sample skewness and kurtosis jointly match those of a normal distribution (skewness = 0 and kurtosis = 3).



In [None]:
jarque.test(data$Ret)



The JB test indicates that the distribution deviates significantly from normality.
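
For reference, the textbook form of the Jarque-Bera statistic combines the sample skewness $S$ and the sample kurtosis $K$ of $n$ observations:

$$
JB = \frac{n}{6}\left( S^2 + \frac{(K-3)^2}{4} \right)
$$

Under the null hypothesis of normality, $JB$ is asymptotically $\chi^2$-distributed with 2 degrees of freedom, so large values lead to a rejection of normality.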

## Volatility Clustering

A common proxy for the variance is the squared return, while the absolute return can be thought of as a proxy for the standard deviation. Let's plot the absolute and squared returns, together with their autocorrelations, to see if there is any evidence of volatility clustering.



In [None]:
data <- data %>% 
  mutate( Ret_sq = Ret^2 ) %>% 
  mutate( Ret_abs = abs(Ret) )

ggplot(data=data, aes(x=date, y=Ret_abs)) +
  geom_line() + 
  ylab("Absolute Returns") +
  xlab("Date") +
  ggtitle("NYSE Absolute Returns")

acf_r_abs <- acf( data$Ret_abs, lag.max = 10)

ggplot(data=data, aes(x=date, y=Ret_sq)) +
  geom_line() + 
  ylab("Squared Returns") +
  xlab("Date") +
  ggtitle("NYSE Squared Returns")

acf_r_sq <- acf( data$Ret_sq, lag.max = 10)



Looking at the line graphs, we can see periods where the absolute returns spike repeatedly, which indicates high volatility, and periods where the absolute returns are consistently low, which indicates low volatility. This is consistent with the stylised fact of intermittency.

In the ACF of absolute returns, the autocorrelation coefficients stay fairly stable between 0.2 and 0.4, even after 10 lags. This positive autocorrelation is the volatility clustering of fact 6, and the lack of any visible decay within 10 lags is consistent with the slow decay described in fact 8.
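
The same pattern can be reproduced in a controlled setting. The sketch below simulates a GARCH(1,1) process in base R and compares the lag-1 autocorrelation of the returns with that of the absolute returns. The parameter values `omega`, `alpha`, and `beta` are arbitrary assumptions chosen for illustration, not estimates from the NYSE data.

```r
set.seed(2024)
n     <- 5000
omega <- 1e-6   # hypothetical GARCH(1,1) parameters
alpha <- 0.10
beta  <- 0.85

ret_sim <- numeric(n)
sigma2  <- numeric(n)
sigma2[1]  <- omega / (1 - alpha - beta)   # start at the unconditional variance
ret_sim[1] <- sqrt(sigma2[1]) * rnorm(1)

for (t in 2:n) {
  # Today's variance depends on yesterday's squared return and variance
  sigma2[t]  <- omega + alpha * ret_sim[t - 1]^2 + beta * sigma2[t - 1]
  ret_sim[t] <- sqrt(sigma2[t]) * rnorm(1)
}

# acf() stores lag 0 first, so element 2 is the lag-1 autocorrelation
acf_ret <- acf(ret_sim,      lag.max = 10, plot = FALSE)$acf[2]
acf_abs <- acf(abs(ret_sim), lag.max = 10, plot = FALSE)$acf[2]

c(returns = acf_ret, absolute_returns = acf_abs)
```

In such a simulation the returns themselves are close to uncorrelated while their absolute values are positively autocorrelated, which is the combination of facts 1 and 6.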

## Bonus: Extreme Value Theory and Tail Behaviour
One way to capture tail behaviour is Extreme Value Theory (EVT). The Hill (1975) estimator is a common estimator of the tail index, which measures how quickly the tail decays. The tail index can also be interpreted as the number of finite moments of the distribution.



In [None]:
# install.packages("evir")
library(evir)
hillplot <- evir::hill(data$Ret, start = 0.04*length(data$Ret), end = 0.05*length(data$Ret), option = "alpha")

print(last(hillplot$y))



The tail estimate depends heavily on where the tail is assumed to start. Here we consider between 4% and 5% of the sample. The estimate of the tail exponent is roughly 2.8, which is very low. Because the tail exponent can be read as the number of finite moments of the distribution, this implies that the second moment (variance) is finite, but the third and fourth moments, on which skewness and kurtosis are based, are quite likely not well defined. This again underlines how far stock returns deviate from normality.
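
To show what the Hill estimator measures, here is a minimal hand-rolled version in base R. It is a sketch of the textbook formula, not a reimplementation of `evir::hill()`, whose indexing conventions may differ slightly. The function name `hill_alpha`, the simulated Pareto sample, and the choice of `k` are all assumptions made for this example.

```r
# Hill estimate of the tail index alpha from the k largest observations
hill_alpha <- function(x, k) {
  x_desc <- sort(x[x > 0], decreasing = TRUE)  # right tail: keep positive values only
  stopifnot(k >= 1, k < length(x_desc))
  # Average log-excess of the top k order statistics over the (k + 1)-th largest
  xi_hat <- mean(log(x_desc[1:k]) - log(x_desc[k + 1]))
  1 / xi_hat                                   # alpha = 1 / xi
}

# Simulate an exact Pareto sample with tail index 3 via inverse-transform sampling
set.seed(7)
true_alpha <- 3
x_sim <- runif(1e5)^(-1 / true_alpha)

hill_alpha(x_sim, k = 2000)  # should land near true_alpha
```

To study the left tail of returns (the losses), one would apply the same function to `-data$Ret`.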


