# International College of Economics and Finance 

# Financial Econometrics. Class 03

# Forecasting

- Inspired by [Forecasting: Principles and Practice](https://otexts.com/fpp2/)

## Class outline
- Let's find the model with the best predictive ability for the S&P 500 index

In [None]:
# Libraries
# For those who get an error like: package ‘package_name’ is not available (for R version x.x.x)
# install.packages('package_name', dependencies=TRUE, repos='http://cran.rstudio.com/')

library(quantmod)
library(moments)
library(forecast) # I highly recommend visiting the link above: it explains the basics of forecasting and time-series modeling in R

# For those whose operating system is in Russian but who want dates in English
Sys.setlocale("LC_TIME", "C")
format(Sys.Date(), format = "%Y-%b-%d")

## Download Data

In [None]:
sp <- getSymbols("^GSPC", src = "yahoo", auto.assign = FALSE)

In [None]:
# choose adjusted prices


In [None]:
# make log returns
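
A minimal sketch of the log-return step, run on a made-up price vector so that it does not depend on the download. The vector `prices` and its numbers are hypothetical; with the real data the adjusted prices would presumably be taken with quantmod's `Ad()` first.

```r
# Toy adjusted prices (hypothetical numbers, not real S&P 500 data)
prices <- c(100, 101, 99.5, 102, 103.2)

# Log return: r_t = log(P_t) - log(P_{t-1}); the first observation is lost
ret <- diff(log(prices))

# Log returns add up over time: their sum is the log return over the whole period
total <- sum(ret)
```

For an xts series, `diff(log(x))` keeps a leading `NA`, which has to be dropped (for example with `na.omit()`) before fitting anything.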


## Plots and Descriptive Statistics

In [None]:
# Plot your returns


In [None]:
# Let's look at a summary of the returns


In [None]:
# Let's plot the histogram of returns together with the normal density


In [None]:
# What about skewness and kurtosis?


In [None]:
# let's plot autocorrelations


## Tests for normality and unit root

In [None]:
# test if returns are normally distributed


In [None]:
# what about unit roots


## Models and forecasts

- During the lecture you discussed in-sample fit and out-of-sample fit
- You have seen in-sample fit many times in your previous econometrics courses
- Let's talk about out-of-sample forecasting
- The simplest model is the mean, so we will start with it

In [None]:
# For those of you who have suddenly forgotten what in-sample fit is
summary(lm(sp.ret ~ 1))

- Let's look closely at the rolling window scheme
- For any window we should start with the window size (the sample size)
- Basically, you fix the initial data that will be used to fit your model for the first time

In [None]:
N <- length(sp.ret) # number of observations in our data
# Now we can explicitly say what our window size is
window_size <- 250
# Or we can say how long our prediction period should be
N_OOS <- round(x = 0.3 * N, digits = 0) # Usually, 25%-30% of the data is used for the prediction period
N_sample <- N - N_OOS

- The idea behind 250 observations is that it is roughly one year of trading (about 252 trading days in most cases)
- On the other hand, you might want to use more data to fit your model. For example, for GARCH models I would definitely recommend more than 250 observations
- Anyway, I am more used to setting the size of the prediction vector rather than the size of the estimation window

In [None]:
y_train <- # initialize vector with length of N_sample
y_pred <- # initialize vector with length of N_OOS
y_true <- # get your actual returns

for (i in 1:N_OOS){
    y_train <- #put in your training data
    y_pred[i] <- #make a prediction from your model
}

y_pred <- as.xts(x = y_pred, order.by = index(y_true)) #change the type of variable to xts (time-series)
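
One possible way to fill in the loop above, shown as a sketch on simulated data rather than on `sp.ret`. The series `y` and its parameters are made up for illustration. The window keeps a fixed length of `N_sample` and slides forward by one observation per step.

```r
set.seed(1)
y <- rnorm(300, mean = 0.0005, sd = 0.01)  # simulated stand-in for returns

N <- length(y)
N_OOS <- round(0.3 * N)        # size of the out-of-sample period
N_sample <- N - N_OOS          # size of each estimation window

y_true <- y[(N_sample + 1):N]  # realised values that we try to predict
y_pred <- numeric(N_OOS)       # pre-allocated vector of forecasts

for (i in 1:N_OOS) {
    # window of fixed length N_sample that ends just before the target
    y_train <- y[i:(N_sample + i - 1)]
    # mean model: the forecast is the average of the window
    y_pred[i] <- mean(y_train)
}
```

Replacing `i:(N_sample + i - 1)` with `1:(N_sample + i - 1)` would turn this into an expanding (recursive) window instead of a rolling one.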

In [None]:
# plot actual returns and your forecasts


- Wow, those are pretty bad predictions
- Let's try to quantify how bad they are
- In the lecture this metric is called `MSPE`. I will use the more popular notation: `MSE`

In [None]:
mse <- # calculate mean squared prediction error
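
As a sketch, the metric is just the average of the squared forecast errors. The two short vectors below are hypothetical and only illustrate the arithmetic.

```r
# Hypothetical realised values and forecasts
y_true <- c(0.010, -0.020, 0.005, 0.000)
y_pred <- c(0.001, 0.001, 0.001, 0.001)

# Mean squared prediction error: average of squared forecast errors
mse <- mean((y_true - y_pred)^2)
```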

- OK, what's next?
- Well, this model is definitely bad in terms of ... anything
- We need an alternative model
- The first model that comes to mind is the AR(1) model
- Let's do it

In [None]:
y_pred_ar <- # initialize vector with length of N_OOS

for (i in 1:N_OOS){
    y_train <- #put in your training data
    ar <- #fit a model
    y_pred_ar[i] <- #make a prediction
}

y_pred_ar <- as.xts(x = y_pred_ar, order.by = index(y_true)) #change the type of variable to xts (time-series)
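
A sketch of the same rolling scheme with an AR(1) model, again on simulated data. Using base R's `arima()` with `method = "ML"` is an assumption of this sketch and not necessarily the function intended for the class. The series `y` is made up as well.

```r
set.seed(2)
# Simulated AR(1) series as a stand-in for returns
y <- as.numeric(arima.sim(model = list(ar = 0.5), n = 200))

N <- length(y)
N_OOS <- 40
N_sample <- N - N_OOS

y_true <- y[(N_sample + 1):N]
y_pred_ar <- numeric(N_OOS)

for (i in 1:N_OOS) {
    # rolling window of fixed length N_sample
    y_train <- y[i:(N_sample + i - 1)]
    # AR(1) with a mean term, re-estimated on every window
    fit <- arima(y_train, order = c(1, 0, 0), method = "ML")
    # one-step-ahead point forecast
    y_pred_ar[i] <- predict(fit, n.ahead = 1)$pred[1]
}

mse_ar <- mean((y_true - y_pred_ar)^2)
```

Refitting on every window is slow for long samples. It is the price of keeping the forecast strictly out-of-sample.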

In [None]:
# plot actual returns and your forecasts

In [None]:
mse_ar <- # calculate mean squared prediction error

In [None]:
mse; mse_ar

- Well, we see that ... **which model is better?**
- But we want to see whether its forecasts are statistically better than those of the mean model

## DM test

- So, I hope that you remember the procedure
    - Make two sequences of predictions $\hat{y}_1 \text{ and } \hat{y}_2$
    - Calculate the loss function (in our case it is a squared loss): $L(e_1) = (y - \hat{y}_1)^2; L(e_2) = (y - \hat{y}_2)^2$
    - Calculate the difference between them: $d = L(e_1) - L(e_2)$
    - If the forecasts are equally accurate: $H_0: E(d) = 0$
    - If not: $H_1: E(d) \ne 0$
    - Calculate good old t-stat: $t = \frac{\frac{1}{T}\sum_{t=1}^T d_t}{\sqrt{\hat{\sigma}_d^2/T}}=DM$
    - To estimate the (long-run) variance $\hat{\sigma}_d^2$ use a HAC estimator

In [None]:
# Let's go
L.mean_model <- # calculate the squared forecasting errors of the mean model
L.ar <- # calculate the squared forecasting errors of the AR(1) model

d <- # calculate the difference

d.mean <- #calculate the mean of difference

- To calculate the HAC estimator I will use Newey and West (1987)
- Also, check this [chapter](https://www.econometrics-with-r.org/15-4-hac-standard-errors.html)
- Newey and West (1987) HAC estimator
    - So, if our $d$ is actually not serially correlated, then: $V(\bar{d}) = V(\frac{1}{T}\sum_{t=1}^{T}d_t) = \frac{1}{T^2}\sum_{t=1}^{T}V(d_t) = \frac{1}{T}V(d_t)$. In other words, all we need is the usual unbiased variance estimator: $\hat{V}(d_t) = \frac{1}{T-1}\sum_{t=1}^{T}(d_t - \bar{d})^2$
    - Of course, we cannot assume that. Hence: $V(\bar{d}) = V(\frac{1}{T}\sum_{t=1}^{T}d_t) = \frac{1}{T^2}\sum_{t=1}^{T}V(d_t) + \frac{2}{T^2}\sum_{t=1}^{T-1} \sum_{k=t+1}^{T}cov(d_t, d_k) \stackrel{why?}= \frac{1}{T^2}\sum_{t=1}^{T}V(d_t) + \frac{2}{T^2}\sum_{j=1}^{T-1} (T - j) cov(d_t, d_{t+j})$
    - It is also good practice to truncate the sum of autocovariances. In most cases it is suggested to use $m = T^{\frac{1}{3}}$
    - Hence, the Newey-West estimator is: $\frac{1}{T}\hat{V}(d_t) + \frac{2}{T}\sum_{j=1}^{m} (1 - \frac{j}{m+1}) \hat{cov}(d_t, d_{t+j})$
    - But we have an ace up our sleeve: Diebold and Mariano (1995). The authors say that the truncation lag should be $(h-1)$, where $h$ is the forecast horizon ($h$-step-ahead forecast). In our case $h = 1$, so for a 1-step-ahead forecast the Newey-West estimator reduces to $\frac{1}{T}\hat{V}(d_t) = \frac{1}{T} \cdot \frac{1}{T-1}\sum_{t=1}^{T}(d_t - \bar{d})^2$
    - That is, we need the unbiased variance estimator divided by the number of observations
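
For reference, a sketch of the general Newey-West estimator of $V(\bar{d})$ with Bartlett weights and truncation lag $m$. The function name `nw_var_mean` is hypothetical. As an assumption of this sketch, every autocovariance (including lag 0) uses the divisor $T$, so with $m = 0$ it returns the biased variance divided by $T$ rather than the unbiased version above.

```r
# Newey-West estimate of Var(d_bar) with Bartlett weights and truncation lag m
nw_var_mean <- function(d, m) {
    n <- length(d)
    d_c <- d - mean(d)
    # sample autocovariance at lag j (divisor n for every lag)
    acov <- function(j) sum(d_c[(1 + j):n] * d_c[1:(n - j)]) / n
    v <- acov(0)
    if (m > 0) {
        for (j in 1:m) {
            # Bartlett weight shrinks linearly with the lag
            v <- v + 2 * (1 - j / (m + 1)) * acov(j)
        }
    }
    v / n
}

set.seed(4)
d <- rnorm(50)             # made-up loss differential
v0 <- nw_var_mean(d, 0)    # no autocovariance terms (the h = 1 case)
v2 <- nw_var_mean(d, 2)    # two autocovariance terms
```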

In [None]:
d.var <- #calculate the unbiased estimator of variance of difference series

In [None]:
dm <- # calculate the DM statistic

In [None]:
# let's look at it


In [None]:
# calculate the p-value of the test, assuming that DM follows the standard normal distribution
# in R you use pnorm() function in order to get cdf
# see more here http://seankross.com/notes/dpqr/
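
Putting the steps together for $h = 1$, the computation might look like the sketch below. The forecast errors `e1` and `e2` are simulated and purely illustrative.

```r
set.seed(3)
# Hypothetical forecast errors of two competing models
e1 <- rnorm(100, sd = 1.2)
e2 <- rnorm(100, sd = 1.0)

d <- e1^2 - e2^2    # loss differential under squared loss
d.mean <- mean(d)
d.var <- var(d)     # unbiased variance; h = 1, so no autocovariance terms
dm <- d.mean / sqrt(d.var / length(d))

# Two-sided p-value under the standard normal approximation
p_value <- 2 * pnorm(-abs(dm))
```

With $h = 1$ and the unbiased variance, `dm` coincides with the ordinary one-sample t-statistic of `d`, which is a handy sanity check.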


- Harvey, Leybourne, and Newbold (1997) (HLN) suggest that improved small-sample properties can be obtained by:
    - making a bias correction to the DM test statistic, and
    - comparing the corrected statistic with a Student-t distribution with (T-1) degrees of freedom, rather than the standard normal.
- The corrected statistic is obtained as:  
$$\sqrt{\frac{T + 1 - 2h + h(h-1)/T}{T}}\cdot DM = k \cdot DM \sim t_{T-1}$$

In [None]:
T <- # put in the length of the difference series (careful: this masks T, R's shorthand for TRUE)
h <- # how many steps ahead are we forecasting
k <- # write the formula for k

In [None]:
hln <- # calculate the corrected statistic

In [None]:
# calculate p-value for the test
# for Student's t distribution cdf use pt() function
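
A sketch of the HLN correction, using the factor with the $h(h-1)/T$ term from Harvey, Leybourne, and Newbold (1997). The helper `hln_factor` and the DM value below are hypothetical and only illustrate the mechanics.

```r
# HLN small-sample correction factor for sample size n and horizon h
hln_factor <- function(n, h) sqrt((n + 1 - 2 * h + h * (h - 1) / n) / n)

n <- 100        # length of the loss differential series
h <- 1          # one-step-ahead forecasts
dm <- -1.8      # hypothetical DM statistic

k <- hln_factor(n, h)
hln <- k * dm

# Two-sided p-value from Student's t distribution with n - 1 degrees of freedom
p_value <- 2 * pt(-abs(hln), df = n - 1)
```

For $h = 1$ the factor is $\sqrt{(T-1)/T}$, so the correction shrinks the statistic slightly and matters only in small samples.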

In [None]:
#checking ourselves
dm.test(e1 = L.mean_model, e2 = L.ar, power = 1) #We have already squared our errors, that's why power = 1

In [None]:
# The small difference arises only because we used the unbiased estimator of variance, while dm.test() uses the biased one
biased_var <- function(x){
    m <- mean(x)
    result <- sum((x - m)^2)/length(x)
    result
}

# same corrected statistic, now with the biased variance, and its p-value
hln_biased <- d.mean/sqrt(biased_var(d)/T)*k
hln_biased; 2*pt(q = -abs(hln_biased), df = T - 1)