# Introduction to Python and Statsmodels

Welcome, and thank you for your interest in the Clear Future and Haver Analytics offerings in Applied Time Series with Python. This Jupyter notebook is a short example that we hope will fuel your curiosity. We encourage you to work through the notebook. As a quick start, you can execute any code cell with Shift-Enter. We look forward to seeing you at the Haver office. Good luck and have fun! Daniel and Abdel

Statsmodels is the library we will use for most of our time series work. It offers a broad set of models we can estimate, including OLS linear models, ARMA models, and VAR/VECM models. I would consider this a workhorse library in Python. It also ships with some very convenient example datasets that we can load directly from the library.

Let's suppose we are tasked with performing a business-cycle analysis of U.S. real GDP. We can use some pretty cool statistical tools to help us better understand the long-term behavior of the U.S. economy in both cyclical and trend terms.

Let's start by importing the needed libraries. By convention, many commonly used libraries are imported under a shorter alias, for example numpy as np.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import statsmodels.api as sm

Let's take a look at some example data that comes with the statsmodels library.

In [2]:
df = sm.datasets.macrodata.load_pandas().data

Printing the dataset's NOTE attribute gives you a description of the macrodata data set.

In [3]:
print(sm.datasets.macrodata.NOTE)

::
    Number of Observations - 203

    Number of Variables - 14

    Variable name definitions::

        year      - 1959q1 - 2009q3
        quarter   - 1-4
        realgdp   - Real gross domestic product (Bil. of chained 2005 US$,
                    seasonally adjusted annual rate)
        realcons  - Real personal consumption expenditures (Bil. of chained
                    2005 US$, seasonally adjusted annual rate)
        realinv   - Real gross private domestic investment (Bil. of chained
                    2005 US$, seasonally adjusted annual rate)
        realgovt  - Real federal consumption expenditures & gross investment
                    (Bil. of chained 2005 US$, seasonally adjusted annual rate)
        realdpi   - Real private disposable income (Bil. of chained 2005
                    US$, seasonally adjusted annual rate)
        cpi       - End of the quarter consumer price index for all urban
                    consumers: all items (1982-84 = 100, seasonally adju

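The head and tail commands give you a quick look at the first and last 5 observations of a dataframe. A notebook cell displays only the value of its last expression, so the output below is the tail. Try running the head command on your own.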

In [6]:
df.head()
df.tail()

Unnamed: 0,year,quarter,realgdp,realcons,realinv,realgovt,realdpi,cpi,m1,tbilrate,unemp,pop,infl,realint
198,2008.0,3.0,13324.6,9267.7,1990.693,991.551,9838.3,216.889,1474.7,1.17,6.0,305.27,-3.16,4.33
199,2008.0,4.0,13141.92,9195.3,1857.661,1007.273,9920.4,212.174,1576.5,0.12,6.9,305.952,-8.79,8.91
200,2009.0,1.0,12925.41,9209.2,1558.494,996.287,9926.4,212.671,1592.8,0.22,8.1,306.547,0.94,-0.71
201,2009.0,2.0,12901.504,9189.0,1456.678,1023.528,10077.5,214.469,1653.6,0.18,9.2,307.226,3.37,-3.19
202,2009.0,3.0,12990.341,9256.0,1486.398,1044.088,10040.6,216.385,1673.9,0.12,9.6,308.013,3.56,-3.44
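
Both methods also accept an optional row count. The following is a minimal sketch on a small made-up frame; the name `toy` and its values are illustrative and are not part of the macrodata set.

```python
import pandas as pd

# Hypothetical toy frame; the values are made up for illustration only.
toy = pd.DataFrame({"quarter": [1, 2, 3, 4, 1, 2, 3, 4],
                    "value": [10, 11, 12, 13, 14, 15, 16, 17]})

first_rows = toy.head()    # default n=5: the first five rows
last_rows = toy.tail(3)    # pass n to change the count: the last three rows

# print() shows both results, even when they share one notebook cell.
print(first_rows)
print(last_rows)
```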


We need to index the dataframe by time rather than by an ordinal row number. Look again at the output above. The "year" is currently a column (a series) in the dataframe, but the dataframe is not time indexed. The next few lines build a quarterly date index and assign it to the dataframe.

In [7]:
index = pd.Index(sm.tsa.datetools.dates_from_range('1959Q1', '2009Q3'))

In [None]:
df.index = index

In [None]:
df.head()
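
An equivalent quarterly index can also be built with pandas alone. The sketch below is an alternative to `dates_from_range`, not what this notebook uses; `quarters` and `quarter_ends` are illustrative names.

```python
import pandas as pd

# Quarterly periods from 1959Q1 through 2009Q3, inclusive.
quarters = pd.period_range("1959Q1", "2009Q3", freq="Q")

# Convert each period to the calendar date of its last day (time set to midnight).
quarter_ends = quarters.to_timestamp(how="end").normalize()

print(len(quarter_ends))                  # one entry per quarter
print(quarter_ends[0], quarter_ends[-1])  # first and last quarter-end dates
```

The length should match the number of observations reported in the dataset note, which is a quick sanity check before assigning an index to a dataframe.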

Let's plot U.S. real GDP and add a label. What do you observe when looking at the level series of U.S. real GDP?

In [None]:
df['realgdp'].plot()
plt.ylabel("REAL GDP")

## Using Statsmodels to get the trend

The Hodrick-Prescott filter separates a time series $y_t$ into a trend $\tau_t$ and a cyclical component $\zeta_t$.
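
In symbols, the decomposition and the usual textbook objective that determines the trend can be written as below. The smoothing parameter $\lambda$ is not named in the text above; 1600 is the conventional choice for quarterly data. A larger $\lambda$ penalizes changes in the slope of the trend more heavily and so yields a smoother trend.

```latex
y_t = \tau_t + \zeta_t

\min_{\tau_1,\dots,\tau_T} \;
  \sum_{t=1}^{T} \left( y_t - \tau_t \right)^2
  + \lambda \sum_{t=2}^{T-1}
    \left[ \left( \tau_{t+1} - \tau_t \right) - \left( \tau_t - \tau_{t-1} \right) \right]^2
```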

We will use a routine in the statsmodels library to apply the Hodrick-Prescott filter to the data.

We will create both a cyclical series (gdp_cycle) and a trend series (gdp_trend).

In [None]:
gdp_cycle, gdp_trend = sm.tsa.filters.hpfilter(df.realgdp)
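
For intuition about what the filter computes: the trend minimizes the squared deviations from the data plus a smoothing weight times the squared second differences of the trend, which reduces to a single linear system. The sketch below solves that system with dense numpy matrices on a made-up series. `hp_filter_dense` is a hypothetical helper written for this illustration. It is a teaching sketch, not the statsmodels implementation, and the dense solve is only sensible for short series.

```python
import numpy as np

def hp_filter_dense(y, lamb=1600.0):
    """Return (cycle, trend) from a dense-matrix Hodrick-Prescott filter.

    Hypothetical helper for illustration: solves (I + lamb * K'K) trend = y,
    where K takes second differences of the trend.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    # K has shape (n - 2, n); row i holds [1, -2, 1] starting at column i.
    K = np.zeros((n - 2, n))
    for i in range(n - 2):
        K[i, i:i + 3] = [1.0, -2.0, 1.0]
    trend = np.linalg.solve(np.eye(n) + lamb * K.T @ K, y)
    return y - trend, trend

# Made-up series: a straight line plus a small wiggle.
t = np.arange(40)
y = 100.0 + 2.0 * t + np.sin(t)
cycle, trend = hp_filter_dense(y)

# By construction the two components add back up to the original series.
print(np.allclose(cycle + trend, y))
```

The return order (cycle first, then trend) mirrors the order used in the call above. A perfectly linear input has zero second differences, so the sketch returns it unchanged as the trend.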

In [None]:
gdp_cycle

In [None]:
plt.plot(gdp_cycle)

If you ever need to check the data type of an object, use the type() function.

In [None]:
type(gdp_cycle)

In [None]:
df["trend"] = gdp_trend

Let's plot the HP-filtered long-run trend together with the actual data series. The trend series is sometimes thought of as potential GDP.

In [None]:
df[['trend','realgdp']].plot()

If you wanted to subset the data using the time index you created earlier, you can easily do that.

In [None]:
df[['trend','realgdp']]["2000-03-31":].plot(figsize=(12,8))
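
The same kind of date-based selection can be written with `.loc`, which takes the rows and the columns in one step. The following is a minimal sketch on a made-up frame; `toy` and its values are illustrative and are not the macrodata set.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame on quarter-end dates; the values are made up.
dates = pd.to_datetime(["1999-09-30", "1999-12-31", "2000-03-31",
                        "2000-06-30", "2000-09-30"])
toy = pd.DataFrame({"trend": np.arange(5.0),
                    "realgdp": np.arange(5.0) + 0.5}, index=dates)

# Label-based slicing on a DatetimeIndex includes both endpoints.
from_2000 = toy.loc["2000-03-31":, ["trend", "realgdp"]]
window = toy.loc["1999-12-31":"2000-06-30", "realgdp"]
year_2000 = toy.loc["2000"]   # partial date string: every row in the year 2000

print(from_2000)
```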

What observations do you draw from filtering U.S. real GDP into cyclical and trend components? Do they agree with your historical understanding of the U.S. economy?