# __Chapter 9: Univariate Time Series Analysis__

<br>

__Finance 5330: Financial Econometrics__ <br>
Tyler J. Brough <br>
Created:      January 27, 2020 <br>
Last Updated: January 27, 2020 <br>
<br>

## Nota Bene

* These notes are based on Chapter 9 of the book _Analysis of Financial Data_ by Gary Koop.

* We will cover the same material at a deeper mathematical and statistical level in Chapter 10 of the book _Market Models_ by Carol Alexander. 

## Introduction to Univariate Time Series Analysis

The point of departure for this chapter is the linear regression model with lagged variables:

$$
Y_{t} = \alpha + \beta_{0} X_{t} + \beta_{1} X_{t-1} + \beta_{2} X_{t-2} + \cdots + \beta_{p} X_{t-p} + \varepsilon_{t}
$$

Such models are a useful first step in understanding important concepts in time series analysis. These _distributed lag models_ can often be used without problems, but they can be misleading when: 

* the dependent variable $Y_{t}$ depends on lags of itself

* the variables are nonstationary (to be defined)

In this chapter we ignore the $X$ variable and focus entirely on $Y$. These are called __univariate time series__ methods. 

It is important to understand the properties of each individual series before proceeding to regression modelling involving several series. 

### Example: Stock Prices on the NYSE



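```julia
using CSV
using DataFrames

# Read the NYSE data into a DataFrame; current CSV.jl versions expect the sink type as the second argument.
stocks = CSV.read("./Data/nyse.csv", DataFrame)
first(stocks, 10)
```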

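First ten rows of `stocks` (`Date` is an `Int64` in YYMMDD form, e.g. 520131 is January 31, 1952; `Return` is a `Float64`):

| Row | Date | Return |
|----:|-----:|-------:|
| 1 | 520131 | 0.01546 |
| 2 | 520229 | -0.032049 |
| 3 | 520331 | 0.041241 |
| 4 | 520430 | -0.050401 |
| 5 | 520529 | 0.025041 |
| 6 | 520630 | 0.035414 |
| 7 | 520731 | 0.010408 |
| 8 | 520829 | -0.014087 |
| 9 | 520930 | -0.023122 |
| 10 | 521031 | -0.0069 |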


### __Aside on Logs__

* It is common to take the natural logarithm of time series that grow over time (i.e. to work with $\ln{(Y)}$ instead of $Y$). Why? 


* A time series graph of $\ln{(Y)}$ will often approximate a straight line. 


* In regressions with logged variables, coefficients can be interpreted as elasticities.


* $\ln{(Y_{t})} - \ln{(Y_{t-1})}$ is approximately the proportional change in $Y$ between periods $t-1$ and $t$ (multiply by 100 to express it as a percentage change).
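A quick numerical check of the last point, on made-up values (not from the NYSE data): for a small change the log difference is almost identical to the exact proportional change, but the approximation deteriorates as the change gets larger. The helper names `log_growth` and `prop_growth` are hypothetical.

```julia
# Log difference ln(Y_t) - ln(Y_{t-1}): the approximate growth rate.
log_growth(y_now, y_prev) = log(y_now) - log(y_prev)

# Exact proportional change (Y_t - Y_{t-1}) / Y_{t-1}.
prop_growth(y_now, y_prev) = (y_now - y_prev) / y_prev

# Illustrative moves from 100 to 101 (1%), 105 (5%) and 150 (50%).
for y_now in (101.0, 105.0, 150.0)
    println("100 -> ", y_now,
            ": log difference = ", round(log_growth(y_now, 100.0); digits = 5),
            ", proportional change = ", round(prop_growth(y_now, 100.0); digits = 5))
end
```

Multiplying either number by 100 gives the corresponding percentage.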


### __Differencing__

$$
\Delta Y_{t} = Y_{t} - Y_{t-1}
$$


* $\Delta Y_{t}$ measures the change (or growth) in $Y$ between periods $t-1$ and $t$.


* If $Y_{t}$ is the log of a variable, then $\Delta Y_{t}$ is approximately the proportional change in that variable (times 100, the percentage change).


* $\Delta Y_{t}$ is called the difference (or first difference) of $Y$.


* $\Delta Y_{t}$ is often called "delta $Y$".
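In Julia, Base's `diff` computes first differences. A minimal sketch on a made-up price series (illustrative values only) also shows a detail that matters later: $\Delta Y$ has one fewer observation than $Y$, so $Y_{t-1}$ has to be aligned with it before the two appear in the same regression.

```julia
price = [100.0, 102.0, 101.0, 104.0, 107.0]   # made-up prices
y = log.(price)                               # Y_t = ln(P_t)

dy = diff(y)          # ΔY_t = Y_t - Y_{t-1} for t = 2, ..., T (length T - 1)
y_lag = y[1:end-1]    # Y_{t-1}, aligned element by element with dy

for (i, (d, yl)) in enumerate(zip(dy, y_lag))
    println("t = ", i + 1, ": ΔY_t = ", round(d; digits = 4),
            " (about ", round(100 * d; digits = 2), "%), Y_{t-1} = ", round(yl; digits = 4))
end
```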

<br>

## The Autocorrelation Function 

* Correlations between $Y$ and lags of itself shed important light on the properties of $Y$.

* They relate to the idea of a trend (discussed above) and to nonstationarity (not discussed yet, but we will!).

<br>

__Example:__ $Y =$ NYSE stock price


* Correlation between $Y_{t}$ and $Y_{t-1}$ is $.999$!


* Correlation between $\Delta Y_{t}$ and $\Delta Y_{t-1}$ is $.0438$.


* These are _autocorrelations_ (i.e. correlations between a variable and lags of itself).

### The Autocorrelation Function: Notation

* $r_{1} =$ correlation between $Y$ and $Y$ lagged one period. 


* $r_{p} =$ correlation between $Y$ and $Y$ lagged $p$ periods. 


* The _autocorrelation function_ treats $r_{p}$ as a function of $p$. 
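The sample autocorrelation function can be sketched in a few lines. This assumes simulated data rather than the NYSE file: a random walk $Y_t = Y_{t-1} + \epsilon_t$ (the $\phi = 1$ case discussed below) and its first difference. The helper names `acf_r` and `acf` are hypothetical.

```julia
using Random, Statistics

# r_p: correlation between Y_t and Y_{t-p}, computed on the overlapping observations.
acf_r(y, p) = cor(y[p+1:end], y[1:end-p])

# Autocorrelation function: the vector (r_1, ..., r_maxlag).
acf(y, maxlag) = [acf_r(y, p) for p in 1:maxlag]

# Simulated data (an assumption, not the NYSE series).
rng = MersenneTwister(2020)
y = cumsum(randn(rng, 1000))   # random walk: Y_t = Y_{t-1} + ε_t
dy = diff(y)                   # its first difference: ΔY_t = ε_t

println("ACF of Y,  lags 1-5: ", round.(acf(y, 5); digits = 3))
println("ACF of ΔY, lags 1-5: ", round.(acf(dy, 5); digits = 3))
```

`acf_r` follows the definition above literally. Packaged versions such as `autocor` in StatsBase.jl use the full-sample mean and variance instead, so their numbers can differ slightly in small samples.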



#### Example: NYSE Data (cont.)

__Table Goes Here__


* $Y$ is highly correlated with lags of itself, but the change in $Y$ is not. 


* Information could also be presented on bar charts. See figures 9.3 and 9.4.

### Autocorrelation: Intuition

* $Y$ is highly correlated over time. $\Delta Y$ does not exhibit this property. 


* If you knew past values of the stock price, you could make a very good estimate of what the stock price was this month. However, knowing past values of the change in stock price will not help you predict the change in stock price this month (note: change in stock price is return, exclusive of dividends)

* $Y$ "remembers the past". $\Delta Y$ does not. 


* $Y$ is a nonstationary series while $\Delta Y$ is stationary. (Note: these terms have not been formally defined yet.)

## The Autoregressive Model for Univariate Time Series

* The previous discussion focused on graphs and correlations; now we turn to regression.

* Autoregressive model of order $1$ is written as $AR(1)$ and given by: 

$$
Y_{t} = \alpha + \phi Y_{t-1} + \epsilon_{t}
$$


* Figures 9.5, 9.6 and 9.7 indicate the types of behavior that this model can generate. 


* $\phi = 1$ generates trending behavior typical of financial time series. (Called the _random walk_ model - a special case of the $AR(1)$ model).


* $\phi = 0$ generates behavior that looks more like the changes in financial time series (e.g. returns). 

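A minimal sketch (not the author's code) that simulates the $AR(1)$ model for a given vector of shocks. Feeding the same draws to $\phi = 1$, $\phi = 0.8$ and $\phi = 0$ isolates the effect of persistence, in the spirit of Figures 9.5 to 9.7. The function name, seed and parameter values are assumptions.

```julia
using Random

# Simulate Y_t = α + φ Y_{t-1} + ε_t for t = 1, ..., T, given the shocks ε
# and a starting value y0.
function simulate_ar1(α, φ, ε::AbstractVector; y0 = 0.0)
    y = zeros(length(ε))
    y_prev = y0
    for t in 1:length(ε)
        y[t] = α + φ * y_prev + ε[t]
        y_prev = y[t]
    end
    return y
end

# Same shocks for each φ, so only the persistence differs (illustrative values).
rng = MersenneTwister(930)
ε = randn(rng, 200)
for φ in (1.0, 0.8, 0.0)
    y = simulate_ar1(0.0, φ, ε)
    println("φ = ", φ, ": Y_1..Y_5 = ", round.(y[1:5]; digits = 2),
            ", Y_200 = ", round(y[end]; digits = 2))
end
```

Plotting the three series with any Julia plotting package makes the contrast visible: the $\phi = 1$ path wanders, while the $\phi = 0$ path simply reproduces the shocks.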

## Nonstationary versus Stationary Time Series

* Formal definitions require difficult statistical theory. Some intuition will have to suffice. 

* _Nonstationary_ means _anything which is not stationary_.

* Focus on a case of great empirical relevance: ___unit root nonstationarity___. 

### Ways of Thinking about Whether $Y$ is Stationary or has a Unit Root

1. If $\phi = 1$, then $Y$ has a unit root. If $|\phi| < 1$ then $Y$ is stationary. 


2. If $Y$ has a unit root, then its autocorrelations will be near one and will not drop much as the lag length increases. 


3. If $Y$ has a unit root, then it will have a long memory. Stationary time series do not have long memory.  


4. If $Y$ has a unit root then the series will exhibit trend behavior. 


5. If $Y$ has a unit root, then $\Delta Y$ will be stationary. Hence, series with unit roots are often referred to as _difference stationary_.
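The "memory" in points 2 and 3 follows from repeated substitution. As a sketch under simplifying assumptions ($\alpha = 0$, a single unit shock at time 0, all later shocks zero), the $AR(1)$ model gives $Y_k = \phi^k$ after $k$ periods. For $|\phi| < 1$ the shock fades away; for $\phi = 1$ it persists forever. The function name is hypothetical.

```julia
# Effect on Y_k of a unit shock at time 0 in Y_t = φ Y_{t-1} + ε_t,
# with all later shocks set to zero: Y_k = φ^k for k = 0, ..., horizon.
shock_path(φ, horizon) = [φ^k for k in 0:horizon]

for φ in (0.5, 0.9, 1.0)
    println("φ = ", φ, ": ", round.(shock_path(φ, 10); digits = 3))
end
```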

### More on the $AR(1)$ Model

$$
Y_{t} = \alpha + \phi Y_{t-1} + \epsilon_{t}
$$

* Subtracting $Y_{t-1}$ from both sides, we can rewrite this as: 

$$
\Delta Y_{t} = \alpha + \rho Y_{t-1} + \epsilon_{t}
$$

$$
\mbox{where} \quad \rho = \phi - 1
$$

* If $\phi = 1$ (unit root) then $\rho = 0$ and:

$$
\Delta Y_{t} = \alpha + \epsilon_{t}
$$

* __Intuition:__ if $Y$ has a unit root, we can work with the differenced data, since the differences are stationary. 
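Because the rewrite is pure algebra, it carries over exactly to OLS: regressing $\Delta Y_t$ on a constant and $Y_{t-1}$ gives the same intercept as regressing $Y_t$ on a constant and $Y_{t-1}$, and a slope of exactly $\hat{\phi} - 1$. A sketch on simulated data (the values $\alpha = 0.5$, $\phi = 0.9$ and the seed are assumptions for illustration):

```julia
using Random

# Simulated AR(1) data with illustrative parameters α = 0.5, φ = 0.9.
rng = MersenneTwister(1)
T = 500
y = zeros(T)
for t in 2:T
    y[t] = 0.5 + 0.9 * y[t-1] + randn(rng)
end

X = hcat(ones(T - 1), y[1:end-1])   # regressors: constant and Y_{t-1}

b_levels = X \ y[2:end]             # OLS of Y_t on (1, Y_{t-1})
b_diffs  = X \ diff(y)              # OLS of ΔY_t on (1, Y_{t-1})

φ_hat = b_levels[2]
ρ_hat = b_diffs[2]
println("φ_hat = ", round(φ_hat; digits = 4), ", ρ_hat = ", round(ρ_hat; digits = 4),
        ", φ_hat - 1 = ", round(φ_hat - 1; digits = 4))
```

The equality holds because $\Delta Y_t = Y_t - Y_{t-1}$ and $Y_{t-1}$ is itself one of the regressors, so subtracting it shifts only its own coefficient by one.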

### More on the $AR(1)$ Model (cont.)

* Test if $\rho = 0$ to see if a unit root is present. 

* $-1 < \phi < +1$ is equivalent to $-2 < \rho < 0$. This is called the _stationarity condition_.

<br>
<br>


__Aside: The Random Walk with Drift Model:__

$$
Y_{t} = \alpha + Y_{t-1} + \epsilon_{t}
$$

* This model is thought to hold for many financial variables, such as stock prices and exchange rates. 


* __Intuition:__ Changes in $Y$ are unpredictable, so there are no arbitrage opportunities for investors. 
    - See Hayek's _The Use of Knowledge in Society_
    - See Samuelson's _Proof that Properly Anticipated Prices Fluctuate Randomly_
    - This is known in finance as the _random walk hypothesis_
   
   

## Extensions of the $AR(1)$ Model

* $AR(p)$ model: 

$$
Y_{t} = \alpha + \phi_{1} Y_{t-1} + \cdots + \phi_{p} Y_{t-p} + \epsilon_{t}
$$


* Properties similar to the $AR(1)$ model.


* Alternative way of writing $AR(p)$ model: 

$$
\Delta Y_{t} = \alpha + \rho Y_{t-1} + \gamma_{1} \Delta Y_{t-1} + \cdots + \gamma_{p-1} \Delta Y_{t-p+1} + \epsilon_{t}
$$

* Coefficients in this alternative regression ($\rho$, $\gamma_{1}, \ldots, \gamma_{p-1}$) are simple functions of $\phi_{1}, \ldots, \phi_{p}$.
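For reference, the mapping is $\rho = \phi_1 + \cdots + \phi_p - 1$ and $\gamma_j = -(\phi_{j+1} + \cdots + \phi_p)$ for $j = 1, \ldots, p-1$ (for $p = 1$ this reduces to $\rho = \phi - 1$). The sketch below codes the mapping (the function name is hypothetical) and checks, for made-up coefficients and a made-up history, that both forms of the model give the same $Y_t$.

```julia
# Map AR(p) coefficients φ = (φ_1, ..., φ_p) to (ρ, γ_1, ..., γ_{p-1}) in
#   ΔY_t = α + ρ Y_{t-1} + γ_1 ΔY_{t-1} + ... + γ_{p-1} ΔY_{t-p+1} + ε_t
function ar_to_diff_form(φ::AbstractVector)
    p = length(φ)
    ρ = sum(φ) - 1
    γ = Float64[-sum(φ[j+1:p]) for j in 1:p-1]
    return ρ, γ
end

φ = [0.5, 0.3, 0.1]     # illustrative AR(3) coefficients
α = 0.2
ρ, γ = ar_to_diff_form(φ)

ylags = [3.0, 4.0, 2.0] # made-up history: Y_{t-1}, Y_{t-2}, Y_{t-3}
y_ar   = α + sum(φ .* ylags)                                        # levels form
y_diff = ylags[1] + α + ρ * ylags[1] +
         sum(γ[j] * (ylags[j] - ylags[j+1]) for j in eachindex(γ))  # Δ form, plus Y_{t-1}
println("ρ = ", ρ, ", γ = ", γ, ", Y_t (levels) = ", y_ar, ", Y_t (Δ form) = ", y_diff)
```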

### The $AR(p)$ Model

* $AR(p)$ is in the form of a regression model.


* $\rho = 0$ implies that the time series $Y$ contains a unit root (and $-2 < \rho < 0$ indicates stationarity).


* If a time series contains a unit root then a regression model involving only $\Delta Y$ is appropriate (i.e. if $\rho = 0$ then the term $Y_{t-1}$ will drop out of the equation). 


* _"If a unit root is present, then you can difference the data to induce stationarity"_.

### More Extensions: Adding a Deterministic Trend

* Consider the following model:

$$
Y_{t} = \alpha + \delta t + \epsilon_{t}.
$$


* The term $\delta t$ is a _deterministic trend_ since it is an exact (i.e. deterministic) function of time. 


* Unit root series contain a so-called _stochastic trend_.


* Combine with the $AR(1)$ model to obtain:

$$
Y_{t} = \alpha + \phi Y_{t-1} + \delta t + \epsilon_{t}
$$


* This model can generate behavior that looks similar to unit root behavior ___even if $|\phi| < 1$___ (i.e. even if the series is stationary).


* See Figure 9.8
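To see why a stationary model with a deterministic trend can look like a trending series, switch off the shocks. If $Y_t$ grows along a line $a + bt$, substituting into the model gives $b = \phi b + \delta$, i.e. a slope of $b = \delta / (1 - \phi)$. The sketch below (hypothetical function name; $\alpha = 1$, $\phi = 0.5$, $\delta = 1$ and the starting value are illustrative) iterates the noise-free model and shows the period-to-period increments settling at that slope.

```julia
# Noise-free path of Y_t = α + φ Y_{t-1} + δ t (ε_t set to zero by assumption).
function trend_ar1_path(α, φ, δ, T; y0 = 0.0)
    y = zeros(T)
    y_prev = y0
    for t in 1:T
        y[t] = α + φ * y_prev + δ * t
        y_prev = y[t]
    end
    return y
end

y = trend_ar1_path(1.0, 0.5, 1.0, 100; y0 = 10.0)
println("first increments: ", round.(diff(y)[1:3]; digits = 3))
println("last increment:   ", round(diff(y)[end]; digits = 6),
        ", δ / (1 - φ) = ", 1.0 / (1 - 0.5))
```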

### Summary

* The __nonstationary__ time series variables on which we focus are those containing a __unit root__. These series contain a __stochastic trend__. If we difference these time series, the resulting time series will be stationary. For this reason, they are also called ___difference stationary___.


* The __stationary__ time series on which we focus have $-2 < \rho < 0$ (equivalently, $|\phi| < 1$), but these series may exhibit trend behavior through the incorporation of a __deterministic trend__. If this occurs, they are also called ___trend stationary___. 

## $AR(p)$ with Deterministic Trend Model

* The most general model we use is: 

$$
\Delta Y_{t} = \alpha + \rho Y_{t-1} + \gamma_{1} \Delta Y_{t-1} + \cdots + \gamma_{p-1} \Delta Y_{t-p+1} + \delta t + \epsilon_{t}
$$

* Why work with this form of the model? 
    - __1.__ A unit root is present if $\rho = 0$. Easy to test. 
    - __2.__ The specification is less likely to run into multicollinearity problems. Remember: in finance we often find $Y$ is highly correlated with lags of itself but $\Delta Y$ is not. 

### Estimation of the $AR(p)$ with Deterministic Trend Model

* OLS estimation can be done in the usual way. 

<br>
<br>

__Example:__ $Y = $ NYSE stock price

* $\Delta Y$ is the dependent variable in the regression below. 

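A sketch of how this regression could be run in Julia with plain least squares. The function name `ar_trend_regression` is hypothetical, and the usage example relies on a simulated random walk rather than the NYSE file. It builds the regressors (constant, $Y_{t-1}$, $\Delta Y_{t-1}, \ldots, \Delta Y_{t-p+1}$ and, optionally, $t$) and returns the OLS coefficients with their conventional $t$-statistics.

```julia
using LinearAlgebra, Random

# OLS for ΔY_t = α + ρ Y_{t-1} + γ_1 ΔY_{t-1} + ... + γ_{p-1} ΔY_{t-p+1} [+ δ t] + ε_t.
# Returns (α, ρ, γ_1, ..., γ_{p-1}[, δ]) and conventional OLS t-statistics, same order.
function ar_trend_regression(y::AbstractVector, p::Integer; trend::Bool = true)
    dy = diff(y)                              # dy[s] = ΔY_{s+1} = Y_{s+1} - Y_s
    ts = (p + 1):length(y)                    # periods t with all p - 1 lagged differences
    X = hcat(ones(length(ts)), y[ts .- 1])    # constant and Y_{t-1}
    for j in 1:(p - 1)
        X = hcat(X, dy[ts .- 1 .- j])         # ΔY_{t-j}
    end
    if trend
        X = hcat(X, collect(ts))              # deterministic trend t
    end
    z = dy[ts .- 1]                           # dependent variable ΔY_t
    b = X \ z                                 # least squares
    resid = z - X * b
    s2 = sum(abs2, resid) / (size(X, 1) - size(X, 2))   # residual variance
    se = sqrt.(s2 .* diag(inv(X' * X)))       # conventional standard errors
    return b, b ./ se
end

# Usage on a simulated random walk (an assumption; not the NYSE data).
rng = MersenneTwister(42)
y = cumsum(randn(rng, 400))
b, tstats = ar_trend_regression(y, 3; trend = true)
println("coefficients: ", round.(b; digits = 4))
println("t-statistics: ", round.(tstats; digits = 2))
```

The $t$-statistics for the $\gamma_j$ and $\delta$ can be read in the usual way; the one for $\rho$ (second element) cannot, for the reasons given under Testing for a Unit Root.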

### Testing in the $AR(p)$ with Deterministic Trend Model

* For everything except $\rho$, testing can be done in usual way using $t$-statistics and $p$-values.

* Hence, we can use standard tests to decide whether or not to include a deterministic trend. 

<br>
<br>

__Lag Length Selection__

* A common practice: begin with an $AR(p)$ model and check whether the last coefficient, $\gamma_{p-1}$, is significant. If not, estimate an $AR(p-1)$ model and check whether $\gamma_{p-2}$ is significant. If not, estimate an $AR(p-2)$ model, and so on. 

__Example:__ $Y =$ NYSE Stock Price Data

* Sequential testing strategy leads us to drop the deterministic trend and go all the way back to a model with one lag, an $AR(1)$.

* $\Delta Y$ is the dependent variable in the regression.


### Testing for a Unit Root

* You might think you can test $\rho = 0$ in the usual way: look at the $p$-value and, if it is less than, say, $.05$, reject the unit root hypothesis; otherwise, accept the unit root. 

* __THIS IS INCORRECT!__


* The justification requires difficult statistics. 


* Essentially, the $t$-statistic is correct, but the $p$-value (and standard error) is wrong: under the unit root hypothesis, the $t$-statistic does not follow the usual $t$ distribution.


* A correct test is the ___Dickey-Fuller test___, which compares the $t$-statistic to a critical value. (Dickey and Fuller used simulation to approximate the test's sampling distribution.)
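The simulation idea can be reproduced in a few lines. The sketch below (function name hypothetical; $T = 100$, 5000 replications and the seed are assumptions) generates random walks, so the unit root hypothesis is true by construction, runs the no-trend regression $\Delta Y_t = \alpha + \rho Y_{t-1} + \epsilon_t$ on each, and takes the 5% quantile of the resulting $t$-statistics. It should land well below the $-1.645$ a naive one-sided normal test would use, and close to the approximate $-2.89$ quoted in these notes for the no-trend case.

```julia
using LinearAlgebra, Random, Statistics

# t-statistic on ρ in the regression ΔY_t = α + ρ Y_{t-1} + ε_t.
function df_tstat(y::AbstractVector)
    dy = diff(y)
    X = hcat(ones(length(dy)), y[1:end-1])    # constant and Y_{t-1}
    b = X \ dy                                # OLS estimates (α, ρ)
    resid = dy - X * b
    s2 = sum(abs2, resid) / (length(dy) - 2)  # residual variance
    se_ρ = sqrt(s2 * inv(X' * X)[2, 2])       # conventional standard error of ρ
    return b[2] / se_ρ
end

# Monte Carlo under the unit root hypothesis: Y is a random walk.
rng = MersenneTwister(2020)
T, reps = 100, 5000
tstats = [df_tstat(cumsum(randn(rng, T))) for _ in 1:reps]

println("Simulated 5% critical value: ", round(quantile(tstats, 0.05); digits = 2))
println("Naive one-sided normal 5% critical value: -1.645")
```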


### Practical Advice on Unit Root Testing

* Most computer packages will do the unit root test for you and provide a critical value or a $p$-value for the Dickey-Fuller test. 


* If the $t$-statistic is less negative than the Dickey-Fuller critical value, accept the unit root hypothesis. 


* Otherwise, reject the unit root hypothesis and conclude that the variable is stationary (or trend stationary if your regression includes a deterministic trend).


* Alternatively, if you are using software which does not do the Dickey-Fuller test (e.g. Excel), use the following rough rule of thumb which should be okay if your sample size is moderately large (e.g. $T > 50$).

### Testing for a Unit Root: An Approximation Strategy 

* Use the sequential testing strategy outlined above to estimate the $AR(p)$ with deterministic trend model. Record the $t$-stat corresponding to $\rho$ (i.e. the coefficient on $Y_{t-1}$).


* If the final version of your model includes a deterministic trend, the Dickey-Fuller critical value is approximately $-3.45$. If the $t$-stat on $\rho$ is more negative than $-3.45$, reject the unit root hypothesis and conclude that the series is stationary. Otherwise, conclude that the series has a unit root. 


* If the final version of your model does not include a deterministic trend, the Dickey-Fuller critical value is approximately $-2.89$. If the $t$-stat on $\rho$ is more negative than this, reject the unit root hypothesis and conclude that the series is stationary. Otherwise, conclude that the series has a unit root.  
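The rule of thumb above is easy to encode. A minimal sketch (the function name is hypothetical; the critical values are the approximate ones given above):

```julia
# Approximate Dickey-Fuller decision: compare the t-statistic on ρ with about
# -3.45 (deterministic trend in the final model) or -2.89 (no trend).
function df_decision(tstat::Real; trend::Bool)
    crit = trend ? -3.45 : -2.89
    if tstat < crit
        return trend ? :trend_stationary : :stationary
    else
        return :unit_root
    end
end

println(df_decision(-0.063; trend = false))
```

Applied to the NYSE $t$-statistic in the example that follows, it returns `:unit_root`.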

### __Example:__ $Y =$ NYSE Stock Price Data (Continued)

* The final version of the $AR(p)$ model did not include a deterministic trend. 


* The $t$-stat on $\rho$ is $-.063$, which is __not__ more negative than $-2.89$.


* Hence we can __accept__ the hypothesis that NYSE stock prices contain a unit root and are, in fact, a random walk.

## Chapter Summary

1. Many financial time series exhibit trend behavior, while their differences do not exhibit such behavior. 

2. The autocorrelation function is a common tool for summarizing the relationship between a variable and lags of itself. 

3. Autoregressive models are regression models used for working with time series variables. Such models can be written in two ways: 
    - one with $Y_{t}$ as the dependent variable
    - one with $\Delta Y_{t}$ as the dependent variable


4. The distinction between stationary and nonstationary models is a crucial one. 

5. Series with unit roots are the most common type of nonstationary series considered in financial research. 

6. If $Y_{t}$ has a unit root, then the $AR(p)$ model with $\Delta Y_{t}$ as the dependent variable can be estimated using OLS. Standard statistical results hold for all coefficients except the coefficient on $Y_{t-1}$.

7. The Dickey-Fuller test is a test for the presence of a unit root. It involves testing whether the coefficient on $Y_{t-1}$ is equal to zero (in the $AR(p)$ model with $\Delta Y_{t}$ being the dependent variable). 

## Appendix 9.1: Mathematical Intuition for the $AR(1)$ Model