# Basic Regression Analysis with Time Series Data
## The Nature of Time Series Data

One key feature that distinguishes time series data from cross-sectional data is temporal ordering. Another is how we view randomness: the OLS estimators are still random variables ($r.v.$s), while the time series data set itself is considered one ***realization*** of a **stochastic process**. Since time is irreversible, we can only ever obtain one **realization**. The **sample size** for a time series data set is the *number of time periods* over which we observe the variables of interest.

## Examples of Time Series Regression Models
### Static Models

Suppose that we have time series data available on two variables, $y$ and $z$, where $y_t$ and $z_t$ are dated contemporaneously (marked with the same date). Then we have a typical ***static model*** relating $y$ to $z$:

$$\DeclareMathOperator*{\argmin}{argmin}
\DeclareMathOperator*{\argmax}{argmax}
\DeclareMathOperator*{\plim}{plim}
\newcommand{\ffrac}{\displaystyle \frac}
\newcommand{\d}[1]{\displaystyle{#1}}
\newcommand{\space}{\text{ }}
\newcommand{\bspace}{\;\;\;\;}
\newcommand{\bbspace}{\;\;\;\;\;\;\;\;}
\newcommand{\QQQ}{\boxed{?\:}}
\newcommand{\void}{\left.\right.}
\newcommand{\Tran}[1]{{#1}^{\mathrm{T}}}
\newcommand{\CB}[1]{\left\{ #1 \right\}}
\newcommand{\SB}[1]{\left[ #1 \right]}
\newcommand{\P}[1]{\left( #1 \right)}
\newcommand{\abs}[1]{\left| #1 \right|}
\newcommand{\norm}[1]{\left\| #1 \right\|}
\newcommand{\given}[1]{\left. #1 \right|}
\newcommand{\using}[1]{\stackrel{\mathrm{#1}}{=}}
\newcommand{\asim}{\overset{\text{a}}{\sim}}
\newcommand{\RR}{\mathbb{R}}
\newcommand{\EE}{\mathbb{E}}
\newcommand{\II}{\mathbb{I}}
\newcommand{\NN}{\mathbb{N}}
\newcommand{\ZZ}{\mathbb{Z}}
\newcommand{\QQ}{\mathbb{Q}}
\newcommand{\PP}{\mathbb{P}}
\newcommand{\AcA}{\mathcal{A}}
\newcommand{\FcF}{\mathcal{F}}
\newcommand{\AsA}{\mathscr{A}}
\newcommand{\FsF}{\mathscr{F}}
\newcommand{\dd}{\mathrm{d}}
\newcommand{\I}[1]{\mathrm{I}\left( #1 \right)}
\newcommand{\N}[1]{\mathcal{N}\left( #1 \right)}
\newcommand{\Exp}[1]{\mathrm{E}\left[ #1 \right]}
\newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]}
\newcommand{\Avar}[1]{\mathrm{Avar}\left[ #1 \right]}
\newcommand{\Cov}[1]{\mathrm{Cov}\left( #1 \right)}
\newcommand{\Corr}[1]{\mathrm{Corr}\left( #1 \right)}
\newcommand{\ExpH}{\mathrm{E}}
\newcommand{\VarH}{\mathrm{Var}}
\newcommand{\AVarH}{\mathrm{Avar}}
\newcommand{\CovH}{\mathrm{Cov}}
\newcommand{\CorrH}{\mathrm{Corr}}
\newcommand{\ow}{\text{otherwise}}y_t = \beta_0 + \beta_1 z_t + u_t,\bspace t = 1,2,\dots, n$$

Usually, the static model is postulated when a change in $z$ at time $t$ is believed to have an immediate effect on $y$: $\Delta y_t = \beta_1 \Delta z_t$ when $\Delta u_t = 0$. Another possible reason is that we want to inspect the tradeoff between $y$ and $z$.

$Remark$

>This is similar to what we've learned before.

### Finite Distributed Lag Models
In a finite distributed lag (***FDL***) model, we allow one or more variables to affect $y$ with a lag. An **FDL** of order $k$ can be written as

$$y_t = \alpha_0 + \delta_0 z_t + \delta_1 z_{t-1} + \cdots + \delta_k z_{t-k} + u_t$$

Here, $\delta_0$ is called the ***impact propensity*** or ***impact multiplier***. The ***lag distribution*** is obtained by graphing the $\delta_j$ as a function of $j$; in practice we plot the estimated values $\hat\delta_j$.

Given a permanent increase in $z$, $y$ also increases permanently, after at most $k$ periods, by $\sum_{i=0}^k \delta_i$ times that increase in $z$. For a one-unit increase in $z$, we call this permanent change in $y$ the ***long-run propensity*** (***LRP***), or ***long-run multiplier***. The **LRP** is often of interest in distributed lag models.

We can reduce the **FDL** model to the **static** model by setting the coefficients on the lagged variables, $\delta_1, \dots, \delta_k$, to $0$. Sometimes the primary purpose of this model is to test whether $z$ has a lagged effect on $y$. Occasionally we also omit $z_t$, and with it the **impact propensity**.

For any horizon $h \leq k$, we can define the ***cumulative effect*** as $\delta_0 + \delta_1 + \cdots + \delta_h$, the change in the expected outcome $h$ periods after a permanent, one-unit increase in $z$. The estimated **cumulative effect** can be plotted as a function of $h$. Taking $h = k$, the **LRP** can be written as

$$\text{LRP} = \delta_0 + \delta_1 + \cdots + \delta_k$$

Multicollinearity is common in an **FDL** of order $k$, because $z$ is often highly correlated across adjacent periods. This makes it hard to estimate the individual $\delta_j$ precisely. The $\text{LRP}$, however, can often be estimated well.
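The contrast between imprecise individual lag coefficients and a well-estimated $\text{LRP}$ can be illustrated with a small simulation. This is only a sketch under assumed values: the AR(1) process for $z_t$, the true coefficients in `delta`, the sample size, and the seed are all made up for illustration and are not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed so the sketch is reproducible
n, rho = 2000, 0.95                 # assumed sample size and persistence of z
delta = np.array([0.5, 0.3, 0.2])   # assumed true lag coefficients, so LRP = 1.0

# Simulate a slowly moving z_t (AR(1)); two extra values serve as pre-sample lags.
z = np.zeros(n + 2)
for t in range(1, n + 2):
    z[t] = rho * z[t - 1] + rng.normal()

# Columns: z_t, z_{t-1}, z_{t-2} for the n usable periods.
Z = np.column_stack([z[2:], z[1:-1], z[:-2]])
y = 1.0 + Z @ delta + rng.normal(size=n)

# OLS of y on a constant and the three lags.
X = np.column_stack([np.ones(n), Z])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - X.shape[1])
V = sigma2_hat * np.linalg.inv(X.T @ X)   # usual OLS covariance estimate

delta_hat = beta_hat[1:]
se_delta = np.sqrt(np.diag(V)[1:])

# LRP = delta_0 + delta_1 + delta_2; its variance is w' V w with w selecting the lags.
w = np.array([0.0, 1.0, 1.0, 1.0])
lrp_hat = delta_hat.sum()
se_lrp = np.sqrt(w @ V @ w)

print("delta_hat:", delta_hat.round(3), "se:", se_delta.round(3))
print("LRP_hat:", round(float(lrp_hat), 3), "se:", round(float(se_lrp), 3))
```

Because the lags of a persistent $z_t$ are strongly positively correlated, the estimation errors in the $\hat\delta_j$ tend to offset each other in the sum, which is why the standard error of the estimated $\text{LRP}$ comes out smaller than those of the individual coefficients in this setup.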

**e.f.1**

Suppose that $\text{int}_t = 1.6 + 0.48 \text{inf}_t - 0.15\text{inf}_{t-1} + 0.32\text{inf}_{t-2} + u_t$ where $\text{int}$ is an interest rate and $\text{inf}$ is the inflation rate. What are the impact and long-run propensities?

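>The impact propensity is the coefficient on $\text{inf}_t$, which is $0.48$. The long-run propensity is the sum of all the lag coefficients: $0.48-0.15+0.32 = 0.65$.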
***
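The arithmetic in **e.f.1** can be reproduced in a few lines. The snippet below is a minimal sketch that takes the three coefficients on $\text{inf}_t$, $\text{inf}_{t-1}$ and $\text{inf}_{t-2}$ from the equation above and computes the impact propensity, the cumulative effects, and the LRP; the variable names are chosen here for illustration.

```python
import numpy as np

# Lag coefficients on inf_t, inf_{t-1}, inf_{t-2} from the e.f.1 equation.
delta = np.array([0.48, -0.15, 0.32])

impact = delta[0]              # impact propensity: coefficient on the current value
cumulative = np.cumsum(delta)  # cumulative effect after h = 0, 1, 2 periods
lrp = cumulative[-1]           # long-run propensity: sum of all lag coefficients

print("impact propensity:", impact)
print("cumulative effects:", cumulative.round(2))
print("LRP:", round(float(lrp), 2))
```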
### A Convention about the Time Index
We can always start the time index at $t=1$. For example, in an **FDL** model of order two, the first observation on the dependent variable is $y_1$, which is related to $z_{1}$, $z_0$ and $z_{-1}$. These three are the initial values in our sample.

## Finite Sample Properties of OLS under Classical Assumptions
### Unbiasedness of OLS

$Assumption$ $\text{TS}.1$

$\bspace$Linear in Parameters

>The stochastic process $\CB{\P{x_{t1},x_{t2}, \dots, x_{tk} ,y_t}: t = 1,2,\dots,n}$ follows the linear model:
>
>$$y_t = \beta_0 + \beta_1 x_{t1} + \cdots + \beta_k x_{tk} + u_t$$
>
>where $\CB{u_t:t=1,2,\dots,n}$ is the sequence of errors or disturbances. Here, $n$ is the number of observations (time periods).

Here, $x_{tj}$ denotes the observation at time period $t$ of the $j^{\text{th}}$ **explanatory variable** (also called a **regressor** or **independent variable**). Just as in cross-sectional regression, we call $y_t$ the **dependent variable**, **explained variable**, or **regressand**.

We should think of $\text{TS}.1$ as essentially the same as $\text{MLR}.1$. Let $\mathbf{x}_t = \P{x_{t1}, x_{t2}, \dots,x_{tk}}$ denote the set of all independent variables in the equation at time $t$. Further, let $\mathbf{X}$ denote the collection of all independent variables for all time periods. $\mathbf{X}$ can be thought of as a matrix with $n$ rows and $k$ columns.

The $t^{\text{th}}$ row of $\mathbf{X}$ is $\mathbf{x}_t$, consisting of all independent variables for time period $t$.

$Assumption$ $\text{TS}.2$ 

$\bspace$No Perfect Collinearity

>In the sample (and therefore in the underlying time series process), no independent variable is constant nor a *perfect linear* combination of the others.

The issues are essentially the same as with cross-sectional data.

$Assumption$ $\text{TS}.3$

$\bspace$Zero Conditional Mean

>For each $t$, the expected value of the error $u_t$, given the explanatory variables for *all* time periods, is $0$. Mathematically,
>
>$$\Exp{u_t\mid \mathbf{X}} = 0,\bspace t = 1,2,\dots,n$$

If $u_t$ is independent of $\mathbf{X}$ and $\Exp{u_t} = 0$, then this assumption holds automatically. A weaker condition, implied by this assumption, is that the explanatory variables are ***contemporaneously exogenous***. In conditional mean terms, we write

$$\Exp{u_t\mid x_{t1},x_{t2},\dots x_{tk}} = \Exp{u_t\mid \mathbf{x}_t} = 0 \Rightarrow \Corr{x_{tj},u_t} = 0,\forall\; j$$

The complete assumption $\text{TS}.3$ is called ***strict exogeneity***: $u_t$ must be uncorrelated with $x_{sj}$ even when $s\neq t$.

$Remark$

>**Contemporaneous exogeneity** can guarantee consistency (under additional large-sample assumptions), while unbiasedness requires **strict exogeneity**.
>
>Another way to interpret $\text{TS}.3$ is that the average value of $u_t$ is unrelated to the independent variables in all time periods. $\text{TS}.3$ puts no restriction on correlation in the independent variables or in the $u_t$ across time.
***

How could $\text{TS}.3$ fail? Two leading candidates for failure are *omitted variables* and *measurement error* in some of the regressors. To maintain the **strict exogeneity** assumption in a **static** regression model, we need $u_t$ to be uncorrelated not just with $z_t$, but also with past and future values of $z$. This implies that

- $z$ has no lagged effect on $y$; otherwise the static model is misspecified.
- Changes in the error term $u_t$ today cannot cause future changes in $z$. That is, there is no feedback from $y$ to future values of $z$.

In the **finite distributed lag** model, the second implication still applies: feedback from $u$ to future $z$ must be ruled out.

**e.f.2**

In the **FDL** model $y_t = \alpha_0 + \beta_0 z_t + \beta_1 z_{t-1} +u_t$, what do we need to assume about the sequence $\CB{z_0,z_1,\dots,z_n}$ in order for $\text{TS}.2$ (no perfect collinearity) to hold?

>The explanatory variables are $x_{t1} = z_t$ and $x_{t2} = z_{t-1}$. The absence of perfect collinearity means that these CANNOT be constant, and there CANNOT be an exact linear relationship between them in the sample. 
***
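The collinearity condition in **e.f.2** can be checked numerically by looking at the column rank of the design matrix with columns $1$, $z_t$ and $z_{t-1}$. The sketch below uses three made-up sequences (a constant, an exact linear trend, and random draws); the helper `fdl_design` is a name introduced here for illustration.

```python
import numpy as np

def fdl_design(z):
    """Design matrix with columns (1, z_t, z_{t-1}) for t = 1..n, given z_0..z_n."""
    z = np.asarray(z, dtype=float)
    return np.column_stack([np.ones(len(z) - 1), z[1:], z[:-1]])

n = 20
z_const = np.full(n + 1, 3.0)                        # z never changes
z_trend = np.arange(n + 1, dtype=float)              # z_t = t, so z_{t-1} = z_t - 1
z_rand = np.random.default_rng(1).normal(size=n + 1) # generic sequence

rank_const = np.linalg.matrix_rank(fdl_design(z_const))
rank_trend = np.linalg.matrix_rank(fdl_design(z_trend))
rank_rand = np.linalg.matrix_rank(fdl_design(z_rand))

# Full column rank (3) is needed for OLS to be computable.
print("constant z:", rank_const)
print("linear-trend z:", rank_trend)
print("random z:", rank_rand)
```

The linear-trend case shows that the condition rules out more than constant sequences: when $z_t = t$, the lag $z_{t-1}$ equals $z_t$ minus the constant, which is an exact linear relationship.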
$Theorem.1$ Unbiasedness of OLS

>Under assumptions $\text{TS}.1$ through $\text{TS}.3$, the OLS estimators are unbiased, conditional on $\mathbf X$, and therefore unconditionally as well: $\Exp{\hat\beta_j} = \beta_j$, $j = 0,1,\dots,k$

### The Variances of the OLS Estimators and the Gauss-Markov Theorem

$Assumption$ $\text{TS}.4$

$\bspace$Homoskedasticity

>Conditional on $\mathbf X$, the variance of $u_t$ is the same for all $t$: $\Var{u_t\mid \mathbf X} = \Var{u_t} = \sigma^2$, for $t=1,2,\dots,n$

If not, we say that the errors are ***heteroskedastic***. Homoskedasticity says two things

- $\Var{u_t\mid \mathbf X}$ does not depend on $\mathbf{X}$ (it is sufficient that $u_t$ and $\mathbf{X}$ are independent)
- $\Var{u_t}$ is constant over time

$Assumption$ $\text{TS}.5$

$\bspace$No serial correlation

>Conditional on $\mathbf{X}$, the errors in two different time periods are uncorrelated: $\Corr{u_t,u_s\mid \mathbf{X}} = 0$, $\forall t\neq s$

For now we'll read this assumption simply as $\Corr{u_t,u_s} = 0$, $\forall t\neq s$. When it fails, we say the errors suffer from ***serial correlation***, or ***autocorrelation***. Also note that this assumption rules out temporal correlation only in the error term, not in the independent variables.

You may ask why this is not assumed for cross-sectional observations. That's because of the random sampling assumption, which guarantees that $u_i$ and $u_h$ are independent for any two observations $i$ and $h$. If random sampling is violated in a cross-sectional setting, an assumption like $\text{TS}.5$ can take its place.
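To make serial correlation concrete, the sketch below compares the sample first-order autocorrelation of an $i.i.d.$ error sequence with that of a serially correlated one. The AR(1) form $u_t = \rho u_{t-1} + e_t$ with $\rho = 0.7$ is an assumption chosen purely for illustration; the notes do not specify any particular form of serial correlation.

```python
import numpy as np

def first_order_autocorr(u):
    """Sample correlation between u_t and u_{t-1}."""
    u = np.asarray(u, dtype=float)
    return np.corrcoef(u[1:], u[:-1])[0, 1]

rng = np.random.default_rng(5)
n, rho = 5000, 0.7             # assumed sample size and AR(1) coefficient
e = rng.normal(size=n)

u_iid = e.copy()               # errors with no serial correlation
u_ar1 = np.zeros(n)            # errors with u_t = rho * u_{t-1} + e_t
for t in range(1, n):
    u_ar1[t] = rho * u_ar1[t - 1] + e[t]

r_iid = first_order_autocorr(u_iid)
r_ar1 = first_order_autocorr(u_ar1)

print("i.i.d. errors, Corr(u_t, u_{t-1}):", round(float(r_iid), 3))
print("AR(1) errors,  Corr(u_t, u_{t-1}):", round(float(r_ar1), 3))
```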

$\text{TS}.1$ through $\text{TS}.5$ are the **Gauss-Markov assumptions** for time series applications.

$Theorem.2$ OLS Sampling Variances

> Under the time series Gauss-Markov Assumptions $\text{TS}.1$ through $\text{TS}.5$, the variance of $\hat\beta_j$, conditional on $\mathbf{X}$, is
>
>$$\Var{\hat\beta_j\mid \mathbf X} = \ffrac{\sigma^2}{\text{SST}_j \P{1-R_j^2}},\bspace j=1,2,\dots,k$$
>
>where $\text{SST}_j$ is the total sum of squares of $x_{tj}$ and $R_j^2$ is the $R$-squared from the regression of $x_j$ on the other independent variables.

This theorem and its proof are just like those in the cross-sectional case.
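As a numerical check on the variance formula in $Theorem.2$, the following sketch compares $\sigma^2 / \P{\text{SST}_j \P{1-R_j^2}}$ with the $j^{\text{th}}$ diagonal element of $\sigma^2 \P{\mathbf X'\mathbf X}^{-1}$, where the design matrix includes a constant. The data are simulated and $\sigma^2$ is treated as known; both are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma2 = 100, 1.5                      # sigma2 treated as known for the check
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)        # correlated regressors
X = np.column_stack([np.ones(n), x1, x2]) # constant in column 0

# Matrix form of the conditional variance of the OLS estimator.
V = sigma2 * np.linalg.inv(X.T @ X)

def var_formula(X, j, sigma2):
    """sigma^2 / (SST_j * (1 - R_j^2)) for the regressor in column j."""
    xj = X[:, j]
    others = np.delete(X, j, axis=1)      # remaining regressors, incl. the constant
    coef, *_ = np.linalg.lstsq(others, xj, rcond=None)
    ssr_j = np.sum((xj - others @ coef) ** 2)
    sst_j = np.sum((xj - xj.mean()) ** 2)
    r2_j = 1.0 - ssr_j / sst_j
    return sigma2 / (sst_j * (1.0 - r2_j))

var_1 = var_formula(X, 1, sigma2)
var_2 = var_formula(X, 2, sigma2)
print(var_1, V[1, 1])
print(var_2, V[2, 2])
```

The two expressions agree because $\text{SST}_j\P{1-R_j^2}$ is just the sum of squared residuals from regressing $x_j$ on the other regressors.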

$Theorem.3$ Unbiased estimation of $\sigma^2$

> Under $\text{TS}.1$ through $\text{TS}.5$, the estimator $\hat\sigma^2 = \ffrac{\text{SSR}}{df}$ is an unbiased estimator of $\sigma^2$, where $df = n-k-1$

$Theorem.4$ Gauss-Markov Theorem

>Under $\text{TS}.1$ through $\text{TS}.5$, the OLS estimators are the best linear unbiased estimators conditional on $\mathbf X$.

**e.f.3**

In the **FDL** model $y_t = \alpha_0 + \delta_0 z_t + \delta_1 z_{t-1} + u_t$, explain the multicollinearity in the explanatory variables.

>Suppose that $\CB{z_t}$ moves slowly over time, then $z_t$ and $z_{t-1}$ can be highly correlated, as can be seen in logs of many economic time series.

### Inference under the Classical Linear Model Assumptions

$Assumption$ $\text{TS}.6$

$\bspace$Normality

> The errors $u_t$ are independent of $\mathbf X$ and are $i.i.d.$ $\N{0,\sigma^2}$

Note that this assumption implies $\text{TS}.3$ through $\text{TS}.5$. And taking all assumptions together, we have

$Theorem.5$ Normal sampling distributions

>Under $\text{TS}.1$ through $\text{TS}.6$, the $\text{CLM}$ assumptions for time series, the OLS estimators are **normally distributed**, *conditional* on $\mathbf X$. Further, under the null hypothesis, each $t$ statistic has a $t$ distribution, and each $F$ statistic has an $F$ distribution. The usual construction of confidence intervals is also valid.

## Functional Form, Dummy Variables, and Index Numbers
We often use the natural logarithm, which gives time series regressions with constant percentage effects, in both **static** and **lag** models.

In an **FDL** model in logarithms, the **impact multiplier** $\delta_0$ is also called the ***short-run elasticity***, which is the immediate percentage change in $y_t$ given a $1\%$ increase in $z_t$. Similarly, the **long-run multiplier** becomes the ***long-run elasticity***.

Dummy variables are also very useful, since they can indicate whether a certain event happened at time $t$. They are key to an ***event study***. One more thing to note is the ***index number***, usually just called an index. Like the S&P index, it aggregates a vast amount of information into a single quantity, measured relative to a predetermined ***base period*** and the ***base value*** assigned to that period.

## Trends and Seasonality
### Characterizing Trending Time Series

One popular formulation with trending behavior is to write the series $\CB{y_t}$ as 

$$y_t = \alpha_0 + \alpha_1 t + e_t,\bspace t=1,2,\dots$$

where, in the simplest case, $\CB{e_t}$ is an $i.i.d.$ sequence with $\Exp{e_t} = 0$ and $\Var{e_t} = \sigma_e^2$. We call this a ***linear time trend***. We interpret $\alpha_1$ as, *holding all other factors fixed, the change in $y_t$ from one period to the next due to the passage of time*.

This **linear time trend** implies that $\Delta y_t = y_t - y_{t-1} = \alpha_1 + \Delta e_t$, which equals $\alpha_1$ if $\Delta e_t = 0$. Another way to think about this is that *the average value of this sequence is a linear function of time*: $\Exp{y_t} = \alpha_0 + \alpha_1 t$; as for the variance, we have $\Var{y_t} = \Var{e_t} = \sigma_e^2$.

**e.f.4**

Can a **linear trend** with $\alpha_1<0$ be realistic for all future time periods?

>It depends. The trend term $\alpha_1 t$ becomes more and more negative as $t$ gets larger. For a $y_t$ that can never be negative, a linear time trend with a negative trend coefficient cannot represent it in all future time periods.
***
Another typical trend is the ***exponential trend*** where we would write

$$\log\P{y_t} = \beta_0 + \beta_1 t + e_t,\bspace t =1,2,\dots$$

To interpret $\beta_1$, first notice the approximation for small changes

$$\Delta \log\P{y_t} = \log\P{y_t} - \log\P{y_{t-1}} \approx\ffrac{y_t - y_{t-1}}{y_{t-1}}$$

The right-hand side is called the ***growth rate*** in $y$ from period $t-1$ to period $t$. If we further assume that $\Delta e_t = 0$, we have $\forall\,t$, $\Delta \log\P{y_t} = \beta_1$, so $\beta_1$ is approximately the per-period growth rate in $y_t$.
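The quality of the log-difference approximation can be seen in a small numerical sketch. It builds an exponential trend with $e_t = 0$ and compares $\Delta\log\P{y_t}$ with the exact growth rate; the values $\beta_0 = 1$ and $\beta_1 = 0.02$ are assumptions chosen for illustration.

```python
import numpy as np

beta0, beta1 = 1.0, 0.02           # assumed trend parameters (2% growth per period)
t = np.arange(1, 11)
y = np.exp(beta0 + beta1 * t)      # exponential trend with e_t = 0

dlog = np.diff(np.log(y))          # log(y_t) - log(y_{t-1}), equals beta1 here
growth = np.diff(y) / y[:-1]       # exact growth rate (y_t - y_{t-1}) / y_{t-1}

print("delta log:", dlog.round(5))
print("growth rate:", growth.round(5))
print("max gap:", float(np.abs(growth - dlog).max()))
```

With $e_t = 0$ the exact growth rate is $e^{\beta_1} - 1$ in every period, which is close to $\beta_1$ when $\beta_1$ is small; the gap widens as $\beta_1$ grows.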

### Using Trending Variables in Regression Analysis

The ***spurious regression problem*** arises when we find a relationship between two or more trending variables simply because each is growing over time. To eliminate this problem we can simply add a time trend to the regression. Commonly, adding the trends $t$ and $t^2$ is enough.
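A small simulation can illustrate the spurious regression problem and how a time trend addresses it. In this sketch, $x_t$ and $y_t$ are generated independently of each other, each with its own linear trend; the trend slopes, noise, sample size, and seed are all assumptions made for illustration.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(3)
n = 200
t = np.arange(1, n + 1, dtype=float)
x = 2.0 + 0.3 * t + rng.normal(size=n)   # trending regressor
y = 1.0 + 0.5 * t + rng.normal(size=n)   # trending outcome, unrelated to x

const = np.ones(n)
b_no_trend = ols(np.column_stack([const, x]), y)[1]       # slope on x, no trend
b_with_trend = ols(np.column_stack([const, x, t]), y)[1]  # slope on x, trend added

print("slope on x without trend:", round(float(b_no_trend), 3))
print("slope on x with trend:   ", round(float(b_with_trend), 3))
```

Without the trend, the slope on $x_t$ mostly picks up the ratio of the two trend slopes; once $t$ is included, the estimated effect of $x_t$ shrinks toward zero, as it should in this setup.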

$Remark$

>We'll include a trend when
>
>- the dependent variable displays obvious trending behaviour
>- both the dependent and some independent variables have trends
>- only some of the independent variables have trends, and their effect on the dependent variable becomes visible *only when a trend is added*

### A Detrending Interpretation of Regressions with a Time Trend
Including a time trend is like detrending the original data series before running the regression. For instance, we first obtain the fitted equation

$$\hat y_t = \hat \beta_0 + \hat \beta_1 x_{t1}+ \hat\beta_2 x_{t2} + \hat\beta_3 t$$

Similar to the partialling-out argument in Chap_03, the same $\hat\beta_1$ and $\hat\beta_2$ can be obtained in two steps.

$\P{1}$ Regress each of $y_t$, $x_{t1}$ and $x_{t2}$ on a constant and the time trend $t$ and save the residuals as $\ddot y_t$, $\ddot x_{t1}$ and $\ddot x_{t2}$.

$$
\begin{align}
y_t &= \hat\alpha_0 + \hat\alpha_1 t + \ddot y_t \\
x_{t1} &= \hat\eta_0 + \hat\eta_1 t +\ddot x_{t1}\\
x_{t2} &= \hat\xi_0 + \hat\xi_1 t + \ddot x_{t2}
\end{align}$$

Then $\ddot y_t$, $\ddot x_{t1}$ and $\ddot x_{t2}$ can be regarded as the detrended variables.

$\P 2$ Run the regression of $\ddot y_t$ on $\ddot x_{t1}$ and $\ddot x_{t2}$. This regression yields exactly $\hat\beta_1$ and $\hat\beta_2$. Here's a proof for the one-variable case.

$Proof$ case: $k=1$ 

>Write the model as $y_t = \beta_0 + \beta_1 x_{t1} + \beta_2 t + u_t$. To detrend, first we have $y_t = \hat\alpha_0 + \hat\alpha_1 t + \ddot y_t$ and $x_{t1} = \hat\eta_0 + \hat\eta_1 t + \ddot x_{t1}$. Then regress the equation
>
>$$\ddot y_t = \hat\theta_1 \ddot x_{t1} + v_t \Rightarrow \hat\theta_1 = \ffrac{\sum\limits_{t=1}^{n} \ddot x_{t1} \ddot y_t}{\sum\limits_{t=1}^{n} \ddot x_{t1}^2}$$
>
>Then using the two-step OLS estimation, we have $x_{t1} = \hat\delta_0 + \hat\delta_1 t + \hat r_{t1}$ and $y_t = \hat\beta_0 + \hat\beta_1 \hat r_{t1} + w_t$. Then, 
>
>$$\hat\beta_1 = \ffrac{\sum\limits_{t=1}^n \hat r_{t1} y_t}{\sum\limits_{t=1}^n \hat r_{t1}^2}$$
>
>Remember that $\sum\limits_{t=1}^n \hat r_{t1} = 0$ and $\sum\limits_{t=1}^n t\cdot \hat r_{t1} = 0$, since OLS residuals are orthogonal to the regressors (here the constant and $t$), and notice that $\hat r_{t1} = \ddot x_{t1}$. Then
>
>$$\hat\beta_1 = \ffrac{\sum\limits_{t=1}^n \hat r_{t1} \P{\hat\alpha_0 + \hat\alpha_1 t + \ddot y_t}}{\sum\limits_{t=1}^n \hat r_{t1}^2} = \hat\alpha_0 \ffrac{\sum\limits_{t=1}^n \hat r_{t1}}{\sum\limits_{t=1}^n \hat r_{t1}^2} + \hat\alpha_1 \ffrac{\sum\limits_{t=1}^n \hat r_{t1}  \cdot t }{\sum\limits_{t=1}^n \hat r_{t1}^2} + \ffrac{\sum\limits_{t=1}^n \ddot x_{t1}\ddot y_t}{\sum\limits_{t=1}^n \ddot x_{t1}^2} = \hat\theta_1$$

So we may draw the conclusion that *the OLS coefficients in a regression **including a trend** are the **same** as the coefficients in a regression **without a trend** but where **all** the variables have been **detrended before the regression***.
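The detrending result can also be verified numerically. The sketch below simulates two trending regressors, runs the regression with a time trend, and then repeats it on detrended variables without a trend; the data-generating values, the seed, and the helper names `ols` and `detrend` are assumptions introduced for illustration.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def detrend(v, t):
    """Residuals from regressing v on a constant and the time trend t."""
    T = np.column_stack([np.ones(len(t)), t])
    return v - T @ ols(T, v)

rng = np.random.default_rng(4)
n = 120
t = np.arange(1, n + 1, dtype=float)
x1 = 0.05 * t + rng.normal(size=n)
x2 = -0.02 * t + rng.normal(size=n)
y = 1.0 + 0.8 * x1 - 0.5 * x2 + 0.1 * t + rng.normal(size=n)

# Regression including the trend: columns are 1, x1, x2, t.
full = ols(np.column_stack([np.ones(n), x1, x2, t]), y)

# Step (1): detrend every variable. Step (2): regress without constant or trend.
y_dd, x1_dd, x2_dd = detrend(y, t), detrend(x1, t), detrend(x2, t)
partial = ols(np.column_stack([x1_dd, x2_dd]), y_dd)

print("with trend:     ", full[1:3].round(6))
print("detrended first:", partial.round(6))
```

No intercept is needed in the second step because the detrended variables all have sample mean zero.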

### Computing $R$-Squared when the Dependent Variable is Trending
The $R$-squared in time series regressions is usually higher than typical $R$-squareds for cross-sectional data, especially when the dependent variable is trending.

The adjusted $R$-squared is written as $\bar R^2 = 1 - \ffrac{\hat\sigma_u^2}{\hat\sigma_y^2}$, where $\hat \sigma_u^2$ is the **unbiased estimator of the error variance** and $\hat\sigma_y^2 = \ffrac{\text{SST}}{n-1} = \ffrac{\sum_{t=1}^n \P{y_t - \bar y}^2}{n-1}$. We can estimate the **error variance** without bias *provided that a time trend is included in the regression.* However, when $y_t$ is trending, $\ffrac{\text{SST}}{n-1}$ may overestimate the variance in $y_t$, because it does NOT account for the trend in $y_t$.

When the regression includes a time trend, we can compute an $R$-squared for $y_t = \beta_0 + \beta_1 x_{t1} + \beta_2 x_{t2} + \beta_3 t + u_t$ that nets out the trend by

1. detrend the dependent variable: regress $y_t$ on $t$ to obtain the residuals $\ddot y_t$
2. regress the residual on the independent variables: $\ddot y_t$ on $x_{t1}$, $x_{t2}$ and $t$
3. $R^2 = 1 - \ffrac{\text{SSR}}{\sum_{t=1}^n \ddot y_t^2}$ where $\text{SSR}$ is identical to the sum of squared residuals from the original model

We prefer this $R$-squared because it nets out the effect of the time trend; it is never larger than the usual one, since $\sum_{t=1}^n \ddot y_t^2 \leq \sum_{t=1}^n \P{y_t - \bar y}^2$. The corresponding adjusted $R$-squared is $\bar R^2 = 1 - \ffrac{\frac{1}{n-4}\text{SSR}}{\frac{1}{n-2}\sum_{t=1}^n \ddot y_t^2}$, where $n-4$ is the $df$ of the original model and $n-2$ is the $df$ of the detrending regression of $y_t$ on a constant and $t$.
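The three steps above can be sketched in code. The example below uses simulated data (all values and the seed are assumptions for illustration) to compute the usual $R$-squared, the detrended $R$-squared, and its adjusted version, and it checks that regressing $\ddot y_t$ on $x_{t1}$, $x_{t2}$ and $t$ gives the same $\text{SSR}$ as the original model.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(6)
n = 100
t = np.arange(1, n + 1, dtype=float)
x1 = rng.normal(size=n)
x2 = 0.03 * t + rng.normal(size=n)
y = 2.0 + 0.5 * x1 + 0.4 * x2 + 0.2 * t + rng.normal(size=n)

# Original model: y on 1, x1, x2, t.
X = np.column_stack([np.ones(n), x1, x2, t])
ssr = np.sum((y - X @ ols(X, y)) ** 2)
sst = np.sum((y - y.mean()) ** 2)

# Step 1: detrend y. Step 2: regress the detrended y on the same regressors.
T = np.column_stack([np.ones(n), t])
y_dd = y - T @ ols(T, y)
ssr_check = np.sum((y_dd - X @ ols(X, y_dd)) ** 2)
sst_detr = np.sum(y_dd ** 2)

# Step 3: R-squared net of the trend, plus its adjusted version.
r2_usual = 1.0 - ssr / sst
r2_detr = 1.0 - ssr / sst_detr
adj_r2_detr = 1.0 - (ssr / (n - 4)) / (sst_detr / (n - 2))

print("usual R2:", round(float(r2_usual), 4))
print("detrended R2:", round(float(r2_detr), 4))
print("adjusted detrended R2:", round(float(adj_r2_detr), 4))
```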

***

One more thing: in computing the $R$-squared form of an $F$ statistic for testing multiple hypotheses, we simply use the usual $R$-squareds without any detrending.

### Seasonality

Some data series display seasonal patterns and are often ***seasonally adjusted*** before they are reported for public use, meaning the seasonal factors have been removed. Sometimes, however, we work with seasonally unadjusted data, and then we can add a set of ***seasonal dummy variables*** to account for seasonality in the dependent variable, the independent variables, or both. We call this ***deseasonalizing*** the data and we'll write

$$y_t = \beta_0 + \beta_1 x_{t1} + \beta_2 x_{t2} + \cdots + \beta_k x_{tk} + u_t\\
\Downarrow\\
y_t = \beta_0 + \delta_1 \text{Jan}_t + \delta_2 \text{Feb}_t + \cdots + \delta_{11} \text{Nov}_t + \beta_1 x_{t1} + \beta_2 x_{t2} + \cdots + \beta_k x_{tk} + u_t$$

to remove the monthly effects from the model. Note that we include only $11$ monthly dummies rather than $12$ to avoid the dummy variable trap (perfect collinearity with the intercept). With $\text{Dec}$ as the base month, $\beta_0$ is the intercept for December. Besides, through an $F$ test of $\delta_1 = \cdots = \delta_{11} = 0$, we can determine whether there's seasonality in $y_t$ or not.
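The reason for using $11$ monthly dummies instead of $12$ can be seen from the rank of the design matrix. The sketch below builds monthly dummies for three hypothetical years of data (the sample length and variable names are assumptions for illustration) and compares the design with all $12$ dummies plus an intercept to the one that drops December.

```python
import numpy as np

n_years = 3                                   # assumed sample: three full years
month = np.tile(np.arange(1, 13), n_years)    # 1 = Jan, ..., 12 = Dec
n = month.size

# One indicator column per month: D_all[t, m-1] = 1 if period t falls in month m.
D_all = (month[:, None] == np.arange(1, 13)).astype(float)
D_11 = D_all[:, :11]                          # drop Dec, making it the base month
const = np.ones((n, 1))

X_trap = np.hstack([const, D_all])            # intercept + 12 dummies (13 columns)
X_ok = np.hstack([const, D_11])               # intercept + 11 dummies (12 columns)

rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)

print("columns / rank with 12 dummies:", X_trap.shape[1], rank_trap)
print("columns / rank with 11 dummies:", X_ok.shape[1], rank_ok)
```

With all $12$ dummies, the dummy columns sum to the constant column in every period, so the matrix loses a rank and the no perfect collinearity assumption fails; dropping one month restores full column rank.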

Further discussions are skipped.
***

**e.f.5**

What's the intercept for March? And why do the seasonal dummy variables satisfy the strict exogeneity assumption?

>For March, the intercept is $\beta_0 + \delta_3$. Seasonal dummy variables are strictly exogenous because they follow a deterministic pattern. For example, the months do not change based upon whether either the explanatory variables or the dependent variable change.
***