## Dependency

Suppose we observe a time series $\{x_t\}$ at times $t_1, \ldots, t_n$. The distribution of the observed data $x_{t_1}, \ldots, x_{t_n}$ is characterized by the joint distribution function:

$
F_{t_1, \ldots, t_n}(c_1, \ldots, c_n) = P(x_{t_1} \leq c_1, \ldots, x_{t_n} \leq c_n)
$

In practice, the full joint distribution is usually difficult to work with, so we may instead consider the marginal distribution at time $t$:

$
F_t(x) = P(x_t \leq x)
$

or the marginal density:

$
f_t(x) = \frac{\partial F_t(x)}{\partial x}
$

assuming this quantity exists.
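One way to build intuition for $F_t$ is to estimate it by simulation across many independent realizations of a process. The Python sketch below assumes a Gaussian random walk $x_t = \sum_{j=1}^{t} w_j$ with $\sigma_w = 1$ (the model studied in Example 1.19 below); the helper names, seed, and number of replications are illustrative choices, not part of the original notes. Because $x_t \sim N(0, t\sigma_w^2)$ under these assumptions, the estimate of $F_t(1) = P(x_t \leq 1)$ can be compared with the exact normal CDF.

```python
import math
import random


def simulate_random_walk(n, sigma_w, rng):
    """One realization x_1, ..., x_n of x_t = w_1 + ... + w_t,
    with Gaussian white noise w_t ~ N(0, sigma_w^2) (assumed for this sketch)."""
    x, path = 0.0, []
    for _ in range(n):
        x += rng.gauss(0.0, sigma_w)
        path.append(x)
    return path


def empirical_marginal_cdf(t, c, n_reps=20000, sigma_w=1.0, seed=1):
    """Estimate F_t(c) = P(x_t <= c) as the fraction of independent
    realizations whose value at time t is at most c."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        path = simulate_random_walk(t, sigma_w, rng)
        if path[t - 1] <= c:  # path[t - 1] holds x_t because times start at 1
            hits += 1
    return hits / n_reps


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    for t in (1, 25):
        est = empirical_marginal_cdf(t, c=1.0)
        exact = normal_cdf(1.0 / math.sqrt(t))  # x_t ~ N(0, t) when sigma_w = 1
        print(f"t={t:2d}  F_t(1) estimate {est:.3f}  exact {exact:.3f}")
```

Under these assumptions the estimate of $F_1(1)$ should be near $0.84$ and that of $F_{25}(1)$ near $0.58$, so the marginal distribution genuinely depends on $t$. Note that this ensemble approach requires many realizations of the whole series.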

## The Mean Function

The **mean function** of a time series is:

$
\mu_{x_t} = E(x_t)
$

assuming this expectation exists. Note that the mean is a function of $t$. When there is no ambiguity about which time series we are referring to, we may instead write $\mu_t$.

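### Example 1.14: Mean Function of a Moving Average Series

Let $w_t$ denote a white noise series, so that $\mu_{wt} = E(w_t) = 0$ for all $t$. For the three-point moving average $v_t = \frac{1}{3}(w_{t-1} + w_t + w_{t+1})$, linearity of expectation gives

$\mu_{vt}=E(v_t)=\frac{1}{3}[E(w_{t-1})+E(w_{t})+E(w_{t+1})]=0$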


### Example 1.15: Mean Function of a Random Walk with Drift

$x_{t} = \delta\,t + \sum_{j=1}^{t}w_{j}, \qquad t=1,2,\ldots$
- $E(w_t)=0$ for all $t$, and $\delta$ is a constant.

$\mu_{xt} = \mathrm{E}(x_{t}) = \delta\,t + \sum_{j=1}^{t}\mathrm{E}(w_{j}) = \delta\,t$

which is a straight line with slope $\delta$.
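The mean function is an average across realizations at a fixed time, not an average along one series. The Monte Carlo sketch below assumes Gaussian white noise with $\sigma_w = 1$ and a hypothetical drift $\delta = 0.2$; it averages $x_t$ over independent simulated paths and compares the result with $\delta t$. The function name, seed, and replication count are illustrative.

```python
import random


def ensemble_mean(t, delta, n_reps=5000, sigma_w=1.0, seed=2):
    """Monte Carlo estimate of mu_t = E(x_t) for x_t = delta*t + w_1 + ... + w_t,
    averaging x_t over independent realizations (Gaussian noise assumed)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_reps):
        noise_sum = sum(rng.gauss(0.0, sigma_w) for _ in range(t))
        total += delta * t + noise_sum
    return total / n_reps


if __name__ == "__main__":
    delta = 0.2
    for t in (10, 50, 100):
        print(f"t={t:3d}  estimate {ensemble_mean(t, delta):6.2f}  vs  delta*t = {delta * t:.2f}")
```

The estimates should track the straight line $\delta t$, while their Monte Carlo spread grows with $t$ because $\operatorname{var}(x_t) = t\sigma_w^2$.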


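### Example 1.16: Mean Function of Signal Plus Noise

For the signal-plus-noise model $x_t = 2\cos\left(2\pi\frac{t+15}{50}\right) + w_t$, where $w_t$ is white noise, the mean function is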

$\begin{aligned}
\mu_{xt} = \mathrm{E}(x_{t}) 
&= \mathrm{E}\left[2\cos\left(2\pi\frac{t+15}{50}\right) + w_{t}\right] \\
&= 2\cos\left(2\pi\frac{t+15}{50}\right) + \mathrm{E}(w_{t}) \\
&= 2\cos\left(2\pi\frac{t+15}{50}\right)
\end{aligned}$
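Since the noise contributes nothing to the mean, $\mu_{xt}$ here is a deterministic cosine with amplitude 2 and period 50 that can be evaluated directly. A small sketch (the function name `mu` is our own):

```python
import math


def mu(t):
    """Mean function of x_t = 2*cos(2*pi*(t + 15)/50) + w_t; E(w_t) = 0 drops out."""
    return 2.0 * math.cos(2.0 * math.pi * (t + 15) / 50.0)


if __name__ == "__main__":
    # Peaks where (t + 15)/50 is an integer, troughs halfway between.
    for t in (10, 35, 85):
        print(t, round(mu(t), 6))
```

For instance, `mu(35)` and `mu(85)` both evaluate to the peak value 2, one period apart, and `mu(10)` evaluates to the trough value $-2$.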


## Autocovariance Function

**Definition 1.2**: The autocovariance function is defined as the product moment

$
\gamma_{x}(s,t) = \operatorname{cov}(x_{s}, x_{t}) = \operatorname{E}[(x_{s} - \mu_{s})(x_{t} - \mu_{t})], \quad (1.10)
$

for all $s$ and $t$. When no possible confusion exists about which time series we are referring to, we will drop the subscript and write $\gamma_{x}(s,t)$ as $\gamma(s,t)$. Note that $\gamma_{x}(s,t) = \gamma_{x}(t,s)$ for all time points $s$ and $t$.

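- The autocovariance function measures the linear dependence between two points on the same series observed at different times.
- If $x_s$ and $x_t$ are jointly normally distributed, $\gamma(s,t)=0$ implies that they are independent. In general, $\gamma(s,t) = 0$ only rules out linear dependence.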

### Example 1.17 Autocovariance of White Noise

By definition, the white noise series $w_t$ has $\mathrm{E}(w_t)=0$ and

$
\gamma_w(s,t) = \operatorname{cov}(w_s, w_t) = 
\begin{cases}
\sigma_w^{2} & s = t, \\
0 & s \neq t.
\end{cases} \quad (1.12)
$
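Definition 1.2 can also be read as a recipe for a Monte Carlo estimate: simulate many independent realizations, then average the product of deviations from the means at times $s$ and $t$. The sketch below applies this to Gaussian white noise with an assumed $\sigma_w = 2$, so the estimate should be close to $\sigma_w^2 = 4$ when $s = t$ and close to $0$ otherwise. The function names and the replication count are illustrative.

```python
import random


def ensemble_autocov(simulate, s, t, n_reps=20000, seed=3):
    """Estimate gamma(s, t) = E[(x_s - mu_s)(x_t - mu_t)] by averaging over
    independent realizations; simulate(rng) returns one realization as a list
    indexed from time 0."""
    rng = random.Random(seed)
    xs, xt = [], []
    for _ in range(n_reps):
        path = simulate(rng)
        xs.append(path[s])
        xt.append(path[t])
    mean_s = sum(xs) / n_reps
    mean_t = sum(xt) / n_reps
    return sum((a - mean_s) * (b - mean_t) for a, b in zip(xs, xt)) / n_reps


def white_noise(rng, n=10, sigma_w=2.0):
    """One realization of Gaussian white noise with variance sigma_w^2."""
    return [rng.gauss(0.0, sigma_w) for _ in range(n)]


if __name__ == "__main__":
    print(ensemble_autocov(white_noise, 4, 4))  # should be close to sigma_w^2 = 4
    print(ensemble_autocov(white_noise, 4, 7))  # should be close to 0
```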


## Covariance of Linear Combinations

**Property 1.1**: **Covariance of Linear Combinations**

If the random variables

$
U = \sum_{j=1}^{m} a_j X_j \quad \text{and} \quad V = \sum_{k=1}^{r} b_k Y_k
$

are linear combinations of (finite variance) random variables $\{X_j\}$ and $\{Y_k\}$, respectively, then

$
\operatorname{cov}(U, V) = \sum_{j=1}^{m} \sum_{k=1}^{r} a_j b_k \operatorname{cov}(X_j, Y_k). \quad (1.13)
$

Furthermore, $\operatorname{var}(U) = \operatorname{cov}(U, U)$.
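Formula (1.13) is a double sum over the coefficient pairs, which is straightforward to code. In the sketch below the matrix `cov_xy[j][k]` holding $\operatorname{cov}(X_j, Y_k)$ is supplied by the caller, and the numeric covariance values are hypothetical; passing the same coefficients for $U$ and $V$ recovers $\operatorname{var}(U) = \operatorname{cov}(U, U)$.

```python
def cov_linear_combinations(a, b, cov_xy):
    """cov(U, V) for U = sum_j a[j] X_j and V = sum_k b[k] Y_k, given
    cov_xy[j][k] = cov(X_j, Y_k), following (1.13)."""
    return sum(
        a[j] * b[k] * cov_xy[j][k]
        for j in range(len(a))
        for k in range(len(b))
    )


if __name__ == "__main__":
    # Hypothetical X_1, X_2 with variances 1 and 2 and covariance 0.5.
    sigma = [[1.0, 0.5], [0.5, 2.0]]
    # var(X_1 - X_2) = 1 + 2 - 2 * 0.5 = 2
    print(cov_linear_combinations([1.0, -1.0], [1.0, -1.0], sigma))
```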

### Example 1.18: Autocovariance of a Moving Average

$\gamma_{v}(s,t) = \operatorname{cov}(v_{s}, v_{t}) = \operatorname{cov}\left\{ \frac{1}{3}\left(w_{s-1} + w_{s} + w_{s+1}\right), \frac{1}{3}\left(w_{t-1} + w_{t} + w_{t+1}\right)\right\}.$

Because $\operatorname{cov}(w_{s}, w_{t}) = 0$ for $s \neq t$, expanding this with Property 1.1 leaves only the terms in which the two noise indices coincide.

When $s = t$ we have:

$\begin{aligned}
\gamma_{v}(t,t) 
&= \frac{1}{9}\operatorname{cov}\{(w_{t-1} + w_{t} + w_{t+1}), (w_{t-1} + w_{t} + w_{t+1})\} \\
&= \frac{1}{9}[\operatorname{cov}(w_{t-1}, w_{t-1}) + \operatorname{cov}(w_{t}, w_{t}) + \operatorname{cov}(w_{t+1}, w_{t+1}) \\
&+ \operatorname{cov}(w_{t-1}, w_{t}) + \operatorname{cov}(w_{t-1}, w_{t+1}) + \operatorname{cov}(w_{t+1}, w_{t}) \\
&+ \operatorname{cov}(w_{t+1}, w_{t-1}) + \operatorname{cov}(w_{t}, w_{t-1}) + \operatorname{cov}(w_{t}, w_{t+1})] \\
&= \frac{1}{9}[\operatorname{cov}(w_{t-1}, w_{t-1}) + \operatorname{cov}(w_{t}, w_{t}) + \operatorname{cov}(w_{t+1}, w_{t+1})] \\
&= \frac{3}{9}\sigma_{w}^{2}=\frac{1}{3}\sigma_{w}^{2}
\end{aligned}$

When $s=t+1$ we have:

$\begin{aligned}
\gamma_v(t+1,t) 
&= \frac{1}{9}\text{cov}\{(w_t + w_{t+1} + w_{t+2}), (w_{t-1} + w_t + w_{t+1})\}\\
&= \frac{1}{9}[\text{cov}(w_t, w_t) + \text{cov}(w_{t+1}, w_{t+1})]\\
&= \frac{2}{9}\sigma_w^2
\end{aligned}$

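Similarly, $\gamma_v(t+2,t) = \frac{1}{9}\sigma_w^2$, because $v_{t+2}$ and $v_t$ share only the term $w_{t+1}$, and $\gamma_v(s,t) = 0$ whenever $|s-t| \geq 3$, because the two averages then share no terms. Example 1.18 thus shows clearly that the smoothing operation introduces a covariance function that decreases as the separation between the two time points increases and disappears completely when the time points are separated by three or more time points.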
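The same bookkeeping can be automated. For white noise, Property 1.1 reduces to summing $a_j b_j \sigma_w^2$ over the noise indices that the two linear combinations share. The sketch below represents each $v_t$ as a map from noise index to weight (a representation chosen for illustration), uses exact fractions, and takes $\sigma_w^2 = 1$, so it should reproduce $\frac{1}{3}$, $\frac{2}{9}$, $\frac{1}{9}$, and then $0$ as the lag grows.

```python
from fractions import Fraction


def ma_weights(t):
    """Weights of v_t = (w_{t-1} + w_t + w_{t+1}) / 3, keyed by noise index."""
    return {t - 1: Fraction(1, 3), t: Fraction(1, 3), t + 1: Fraction(1, 3)}


def white_noise_cov(a, b, sigma2=1):
    """Property 1.1 for two linear combinations of one white noise series:
    cov(w_j, w_k) = 0 for j != k, so only shared indices j contribute
    a_j * b_j * sigma_w^2."""
    return sum(a[j] * b[j] for j in a.keys() & b.keys()) * sigma2


if __name__ == "__main__":
    t = 10
    for h in range(5):
        # gamma_v(t + h, t); h = 0..4 gives 1/3, 2/9, 1/9, 0, 0
        print(h, white_noise_cov(ma_weights(t + h), ma_weights(t)))
```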

### Example 1.19: Autocovariance of a Random Walk

For the random walk model, $x_t = \sum_{j=1}^t w_j$, we have:

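$\gamma_x(s,t) = \text{cov}(x_s, x_t) = \text{cov}\left(\sum_{j=1}^s w_j, \sum_{k=1}^t w_k\right) = \min\{s,t\}\,\sigma_w^2,$

because the two sums share exactly the terms $w_1, \ldots, w_{\min\{s,t\}}$, and $\text{cov}(w_j, w_k) = 0$ for $j \neq k$.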

For example, with $s = 2$ and $t = 4$:

$\text{cov}(x_2, x_4) = \text{cov}(w_1 + w_2, w_1 + w_2 + w_3 + w_4) = 2\sigma_w^2$

The variance of the random walk, $\text{var}(x_t) = \gamma_x(t,t) = t \sigma_w^2$, increases without bound as time $t$ increases.
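The formula $\min\{s,t\}\,\sigma_w^2$ is a counting argument: each noise term shared by $x_s$ and $x_t$ contributes $\sigma_w^2$, and all other pairs are uncorrelated. A minimal sketch that makes the count explicit (the function name is hypothetical, and $\sigma_w^2 = 1$ by default):

```python
def rw_autocov(s, t, sigma2=1.0):
    """gamma_x(s, t) for the random walk x_t = w_1 + ... + w_t."""
    terms_s = set(range(1, s + 1))   # indices of the noise terms in x_s
    terms_t = set(range(1, t + 1))   # indices of the noise terms in x_t
    shared = terms_s & terms_t       # exactly {1, ..., min(s, t)}
    return len(shared) * sigma2      # each shared w_j contributes var(w_j)


if __name__ == "__main__":
    print(rw_autocov(2, 4))          # cov(x_2, x_4) = 2 * sigma_w^2
    for t in (1, 10, 100):
        print(t, rw_autocov(t, t))   # var(x_t) = t * sigma_w^2 grows with t
```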

## Autocorrelation Function

We can normalize the autocovariance function in the usual way to obtain the **autocorrelation function**:

$
\rho(s, t) = \frac{\gamma(s, t)}{\sqrt{\gamma(s, s) \gamma(t, t)}}
$

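- $-1 \leq \rho(s, t) \leq 1$
- The size and sign of $\rho(s,t)$ measure the strength and direction of the linear relationship between $x_s$ and $x_t$
- It measures how well $x_t$ can be forecast from $x_s$ with a linear predictor: $|\rho(s,t)| = 1$ exactly when $x_t = \beta_0 + \beta_1 x_s$ (with probability one) for some constants $\beta_0$ and $\beta_1 \neq 0$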
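For the random walk of Example 1.19, $\sigma_w^2$ cancels and $\rho(s,t) = \min\{s,t\}/\sqrt{st}$. The sketch below (function name ours) shows that the correlation between values one step apart depends on where in time we look: about $0.71$ at $(1,2)$ but about $0.995$ at $(100,101)$.

```python
import math


def rw_acf(s, t):
    """rho(s, t) = gamma(s, t) / sqrt(gamma(s, s) * gamma(t, t)) for the random walk,
    using gamma(s, t) = min(s, t) * sigma_w^2; sigma_w^2 cancels."""
    return min(s, t) / math.sqrt(s * t)


if __name__ == "__main__":
    print(round(rw_acf(1, 2), 4))      # early times, one step apart
    print(round(rw_acf(100, 101), 4))  # late times, one step apart
```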

## Bivariate Time Series

Suppose we have two time series $\{x_t\}$ and $\{y_t\}$ and we consider the problem of predicting $\{y_t\}$ using $\{x_t\}$. We can consider the **cross-covariance function**:

$
\gamma_{xy}(s, t) = \text{Cov}(x_s, y_t) = E[(x_s - \mu_{x_s})(y_t - \mu_{y_t})]
$

and the **cross-correlation function**:

$
\rho_{xy}(s, t) = \frac{\gamma_{xy}(s, t)}{\sqrt{\gamma_x(s, s) \gamma_y(t, t)}}
$
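To see how the cross-correlation can flag a leading indicator, consider a hypothetical pair (not a model from these notes) $x_t = w_t$ and $y_t = w_{t-2} + u_t$, where $w_t$ and $u_t$ are independent white noise series. Then $x_s$ and $y_t$ are correlated only when $s = t - 2$, that is, when $x$ leads $y$ by two time units. A sketch of the resulting $\rho_{xy}(s,t)$, with both noise variances assumed equal to 1 by default:

```python
import math


def cross_corr(s, t, lead=2, sigma_w2=1.0, sigma_u2=1.0):
    """rho_xy(s, t) for the hypothetical pair x_t = w_t and y_t = w_{t-lead} + u_t,
    where w_t and u_t are independent white noise series with variances
    sigma_w2 and sigma_u2."""
    gamma_xy = sigma_w2 if s == t - lead else 0.0  # cov(w_s, w_{t-lead} + u_t)
    gamma_x = sigma_w2                             # var(x_s)
    gamma_y = sigma_w2 + sigma_u2                  # var(y_t)
    return gamma_xy / math.sqrt(gamma_x * gamma_y)


if __name__ == "__main__":
    for s in range(6, 11):
        print(s, round(cross_corr(s, 10), 4))
```

With equal variances, the single nonzero value is $1/\sqrt{2} \approx 0.71$, attained at $s = t - 2$; the extra noise $u_t$ is what keeps it below 1.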

## Stationarity

In estimation problems, we often consider data consisting of multiple observations from a model, e.g.:

$
X_1, \ldots, X_n \sim \text{iid } F
$

However, in time series, our data $x_{t_1}, \ldots, x_{t_n}$ usually consist of a single realization of the process: one observation at each time point rather than many independent replicates.

To perform estimation and inference, we might ask:

- Does $\{x_t\}$ exhibit some behavior independent of the absolute point in time?  
- Does $\{x_t\}$ have some nice long-run time behavior?  

If so, then we can draw stronger conclusions about the model from observing many time points, rather than needing many realizations.

A time series $x_t$ is **strictly stationary** if for all $k = 1, 2, \ldots$, all choices of time points $t_1, \ldots, t_k$, and all time shifts $h = 0, \pm 1, \pm 2, \ldots$,

$
(x_{t_1}, \ldots, x_{t_k}) \stackrel{d}{=} (x_{t_1 + h}, \ldots, x_{t_k + h})
$

where $\stackrel{d}{=}$ denotes equality in distribution.

If a time series is strictly stationary, then the joint distribution of the observed values depends only on their spacing in time and not on their absolute points in time.

**Definition 1.6**: A strictly stationary time series is one for which the probabilistic behavior of every collection of values is identical to that of the time-shifted set; i.e.,

$
\{x_{t_{1}},x_{t_{2}},\ldots,x_{t_{k}}\} \stackrel{d}{=} \{x_{t_{1}+h},x_{t_{2}+h},\ldots,x_{t_{k}+h}\}\quad (1.19)
$  


for all:
- $k = 1, 2, \ldots$
- Time points $t_{1},t_{2},\ldots,t_{k}$
- Time shifts $h = 0,\pm 1,\pm 2,\ldots$

where $\stackrel{d}{=}$ denotes equality in distribution.

#### Implications of Strict Stationarity:
1. **Distributional Consistency**: All multivariate distributions must match their shifted counterparts for any shift $h$.
2. **Single Point Case** ($k=1$):
   $
   x_{s} \stackrel{d}{=} x_{t} \quad \text{(1.20)}
   $
   - Implies identical probability distributions at all time points
   - Example: Probability of negative value at 1am equals that at 10am
   - If the mean function $\mu_t$ exists, it must be constant ($\mu_s = \mu_t$ for all $s$ and $t$)
   - Counterexample: Random walk with drift violates this (time-dependent mean)

3. **Two Point Case** ($k=2$):
   $
   \{x_{s}, x_{t}\} \stackrel{d}{=} \{x_{s+h}, x_{t+h}\} \quad \text{(1.21)}
   $
   - Autocovariance function satisfies:
     $
     \gamma(s,t) = \gamma(s+h,t+h)
     $
   - Implies autocovariance depends only on time differences $|s-t|$, not absolute times

#### Practical Considerations:
- Strict stationarity is too strong for most applications
- Difficult to verify from a single dataset
- Leads to a weaker definition focusing only on first two moments (mean and covariance)

## Weakly Stationary Time Series

**Definition 1.7**: A weakly stationary time series, $x_t$, is a finite variance process satisfying:

1. **Constant Mean**:
   $
   \mu_t = E(x_t) \text{ is constant for all } t
   $
   (The mean does not depend on time)

2. **Autocovariance Depends Only on Time Differences**:
   $
   \gamma(s,t) = \text{cov}(x_s, x_t) \text{ depends only on } |s - t|
   $
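
Condition 2 can be probed numerically for a candidate autocovariance function by checking whether $\gamma(s,t) = \gamma(s+k,t+k)$ over a grid of times and shifts. The sketch below does this for the moving average of Example 1.18 and the random walk of Example 1.19. It only examines a finite grid, so a `True` result is evidence rather than proof, and it does not check the constant-mean condition; the function names, grid, and tolerance are illustrative assumptions.

```python
def depends_only_on_lag(gamma, times=range(1, 30), max_lag=5, tol=1e-12):
    """Return True if gamma(s, t) equals gamma(s + k, t + k) on a finite grid,
    i.e. if gamma appears to depend on s and t only through s - t."""
    for s in times:
        for t in times:
            for k in range(1, max_lag + 1):
                if abs(gamma(s, t) - gamma(s + k, t + k)) > tol:
                    return False
    return True


def ma_gamma(s, t, sigma2=1.0):
    """Autocovariance of v_t = (w_{t-1} + w_t + w_{t+1}) / 3 (Example 1.18)."""
    return max(0, 3 - abs(s - t)) / 9 * sigma2


def rw_gamma(s, t, sigma2=1.0):
    """Autocovariance of the random walk x_t = w_1 + ... + w_t (Example 1.19)."""
    return min(s, t) * sigma2


if __name__ == "__main__":
    print("moving average:", depends_only_on_lag(ma_gamma))  # consistent with condition 2
    print("random walk:   ", depends_only_on_lag(rw_gamma))  # gamma(t, t) = t * sigma_w^2 grows
```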

Henceforth, we will use the term *stationary* to mean *weakly stationary*; if a process is stationary in the strict sense, we will use the term *strictly stationary*.

Because the mean function, $\mathrm{E}(x_{t}) = \mu_{t}$, of a stationary time series is independent of time $t$, we will write:

$
\mu_{t} = \mu \quad \text{(1.22)}
$

Also, because the autocovariance function, $\gamma(s,t)$, of a stationary time series, $x_{t}$, depends on $s$ and $t$ only through their difference $|s-t|$, we may simplify the notation. Let $s = t + h$, where $h$ represents the time shift or *lag*. Then:

$
\gamma(t+h,t) = \operatorname{cov}(x_{t+h},x_{t}) = \operatorname{cov}(x_{h},x_{0}) = \gamma(h,0)
$

because the time difference between times $t + h$ and $t$ is the same as the time difference between times $h$ and $0$. Thus, the autocovariance function of a stationary time series does not depend on the time argument $t$. Henceforth, for convenience, we will drop the second argument and write $\gamma(h)$ for $\gamma(h,0)$.


**Definition 1.8**: The autocovariance function of a stationary time series is defined as:

$
\gamma(h) = \operatorname{cov}(x_{t+h}, x_t) = \operatorname{E}[(x_{t+h} - \mu)(x_t - \mu)] \quad \text{(1.23)}
$

**Definition 1.9**: The **autocorrelation function** (ACF) of a stationary time series is defined as:

$
\rho(h) = \frac{\gamma(t+h,t)}{\sqrt{\gamma(t+h,t+h)\gamma(t,t)}} = \frac{\gamma(h)}{\gamma(0)} \quad \text{(1.24)}
$
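As a worked case of (1.24), the three-point moving average of Example 1.18 has $\gamma(h) = \frac{3-|h|}{9}\sigma_w^2$ for $|h| \leq 2$ and $\gamma(h) = 0$ otherwise, so $\rho(h)$ is $1$, $\frac{2}{3}$, $\frac{1}{3}$, and $0$ for $|h| = 0$, $1$, $2$, and $|h| \geq 3$. A short sketch with exact fractions (function names ours; $\sigma_w^2$ cancels, so it is set to 1):

```python
from fractions import Fraction


def ma_acvf(h):
    """gamma(h) / sigma_w^2 for v_t = (w_{t-1} + w_t + w_{t+1}) / 3."""
    return Fraction(max(0, 3 - abs(h)), 9)


def acf(acvf, h):
    """rho(h) = gamma(h) / gamma(0), as in (1.24)."""
    return acvf(h) / acvf(0)


if __name__ == "__main__":
    # Lags -3..3 give ['0', '1/3', '2/3', '1', '2/3', '1/3', '0']; note rho(h) = rho(-h).
    print([str(acf(ma_acvf, h)) for h in range(-3, 4)])
```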

**Properties**:
- By the Cauchy-Schwarz inequality:  
  $
  -1 \leq \rho(h) \leq 1 \quad \text{for all } h
  $
- This enables one to assess the relative importance of a given autocorrelation value by comparing it with the extreme values $-1$ and $1$