# Time Series Econometrics with Python
## Section 2: Stationary and Non-Stationary Series

### Carlos Góes (UnB and Instituto Mercado Popular)

# Stationarity

Data are stationary when their expected value and variance do not change over time. This excludes data with seasonal patterns (think of temperatures over the year) and data with trends (think of GDP). The expected temperature in Los Angeles in July differs from the expected temperature in the same city in December (one should expect December to be colder). Likewise, the expected GDP of the United Kingdom in 2013 will differ from the expected GDP of the UK in 2113 (one should expect it to be higher in 2113 due to technological advancement and productivity gains).
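As a rough illustration of this definition, the sketch below generates two artificial series and compares the sample mean of the first and second halves of each. The series, the seed, the trend slope, and the helper name `half_means` are all assumptions made for this example; they are not part of the tutorial's data.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (arbitrary choice) for reproducibility
n = 1000
t = np.arange(n)

# Hypothetical series: noise around a constant mean vs. noise around a linear trend
stationary = rng.normal(loc=0.0, scale=1.0, size=n)
trending = 0.05 * t + rng.normal(loc=0.0, scale=1.0, size=n)

def half_means(series):
    """Return the sample means of the first and second halves of a series."""
    mid = len(series) // 2
    return series[:mid].mean(), series[mid:].mean()

stat_first, stat_second = half_means(stationary)
trend_first, trend_second = half_means(trending)

print("stationary:", stat_first, stat_second)
print("trending:  ", trend_first, trend_second)
```

The two halves of the series without a trend should have similar means, while the mean of the trending series shifts upward from one half to the next. This is only an informal check, not a statistical test of stationarity.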

We usually try to work with stationary data before designing an econometric model. If such data are not available, we adjust and transform the data to bring non-stationary series closer to stationarity and make inference more reliable. We will deal with seasonality and trends later. For now, we will learn about some kinds of stationary and non-stationary data and why we try to work with stationary data.

## Okay, so what does stationary data look like?

We will compare two kinds of randomly generated processes to give you an idea: white noise and random walks.

An example of stationary data is white noise, formally defined as a stochastic (i.e., random) process with zero mean, constant variance, and zero autocorrelation. A series $y$ is a white noise process if:

\begin{eqnarray}
    y_t &=& \epsilon_t, \qquad \text{where} \\
    E[\epsilon_t] &=& 0, \qquad
    E[\epsilon_t^2] = \sigma^2, \qquad
    E[\epsilon_t \epsilon_s] = 0 \quad (t \neq s), \qquad
    E[y_t] = 0 \nonumber
\end{eqnarray}

In a white noise process, the value of $y$ at period $t$ is uncorrelated with its value in any other period, such as $t-1$ \footnote{This is what the expression $E[\epsilon_t \epsilon_s] = 0$ for $t \neq s$ means: the expected value of the product of $\epsilon$ at period $t$ and $\epsilon$ at a different period $s$ is zero, so the two terms are uncorrelated whatever values they actually take}. All values fluctuate around a fixed mean, the expected value around which the distribution of the random terms is centered (in this case, zero).
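The properties above can be checked informally on simulated data. The following is a minimal sketch, assuming Gaussian shocks with $\sigma = 1$ (white noise does not have to be normally distributed; the distribution, sample size, and seed are choices made for this example only).

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, assumed for reproducibility
sigma = 1.0                      # assumed standard deviation of the shocks
n = 10_000                       # assumed sample size

# White noise: y_t = epsilon_t, drawn independently each period
eps = rng.normal(loc=0.0, scale=sigma, size=n)
y = eps

sample_mean = y.mean()   # should be close to E[y_t] = 0
sample_var = y.var()     # should be close to sigma^2
# Sample correlation between y_t and y_{t-1}; should be close to zero
lag1_corr = np.corrcoef(y[:-1], y[1:])[0, 1]

print("mean:", sample_mean)
print("variance:", sample_var)
print("lag-1 autocorrelation:", lag1_corr)
```

With a sample this large, the three statistics should land near 0, $\sigma^2$, and 0 respectively, although any single draw will deviate slightly from those values.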

In [16]:
# Import all packages

import numpy as np
import scipy as sp
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
import statsmodels.api as sm
import seaborn as sns
import csv

# Set the file where your data is located.
# A raw string (r"...") keeps the backslashes literal. In a regular string,
# "\2" is read as the escape sequence for the character \x02, which corrupts
# the path and causes pd.read_csv to raise a FileNotFoundError.

file = r"C:\Users\Carlos\Documents\GitHub\cgoes\tutorial\python\Time Series\2. Stationary and Non-Stationary Series\totalrealearnings.csv"

data = pd.read_csv(file)

print(data)


#data = pd.read_table(file, header=0, index_col='Date')