Creating a PeriodIndex from Arrays

Fixed-frequency data sets are sometimes stored with timespan information spread across multiple columns. For example, in this macroeconomic data set, the year and quarter are in separate columns:

In [1]:
import pandas as pd
import numpy as np
from pandas import DataFrame, Series

In [3]:
data = pd.read_csv('../../CSV Files/O_Reilly/ch08/macrodata.csv')

In [7]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 203 entries, 0 to 202
Data columns (total 14 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   year      203 non-null    float64
 1   quarter   203 non-null    float64
 2   realgdp   203 non-null    float64
 3   realcons  203 non-null    float64
 4   realinv   203 non-null    float64
 5   realgovt  203 non-null    float64
 6   realdpi   203 non-null    float64
 7   cpi       203 non-null    float64
 8   m1        203 non-null    float64
 9   tbilrate  203 non-null    float64
 10  unemp     203 non-null    float64
 11  pop       203 non-null    float64
 12  infl      203 non-null    float64
 13  realint   203 non-null    float64
dtypes: float64(14)
memory usage: 22.3 KB


In [11]:
data.year.head(), data.quarter.head()

(0    1959.0
 1    1959.0
 2    1959.0
 3    1959.0
 4    1960.0
 Name: year, dtype: float64,
 0    1.0
 1    2.0
 2    3.0
 3    4.0
 4    1.0
 Name: quarter, dtype: float64)

By passing these arrays to PeriodIndex along with a frequency, you can combine them to form an index for the DataFrame:

In [12]:
index = pd.PeriodIndex(year=data.year, quarter=data.quarter, freq='Q-DEC')

index

PeriodIndex(['1959Q1', '1959Q2', '1959Q3', '1959Q4', '1960Q1', '1960Q2',
             '1960Q3', '1960Q4', '1961Q1', '1961Q2',
             ...
             '2007Q2', '2007Q3', '2007Q4', '2008Q1', '2008Q2', '2008Q3',
             '2008Q4', '2009Q1', '2009Q2', '2009Q3'],
            dtype='period[Q-DEC]', length=203)
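
The keyword form of the constructor used above is deprecated as of pandas 2.2 in favor of PeriodIndex.from_fields. The following is a minimal sketch of a version-tolerant wrapper, not part of the original notebook. The helper name quarterly_index and the five-row sample arrays are hypothetical stand-ins for data.year and data.quarter, and the cast to int is an assumption made because the columns are stored as float64.

```python
import pandas as pd

# Hypothetical stand-ins for data.year and data.quarter (float64, as in the CSV)
years = pd.Series([1959.0, 1959.0, 1959.0, 1959.0, 1960.0])
quarters = pd.Series([1.0, 2.0, 3.0, 4.0, 1.0])


def quarterly_index(year, quarter, freq="Q-DEC"):
    """Combine year and quarter arrays into a quarterly PeriodIndex."""
    # Cast to plain integer arrays so the field values are unambiguous
    year = year.astype(int).to_numpy()
    quarter = quarter.astype(int).to_numpy()
    if hasattr(pd.PeriodIndex, "from_fields"):
        # Newer pandas: dedicated classmethod
        return pd.PeriodIndex.from_fields(year=year, quarter=quarter, freq=freq)
    # Older pandas: keyword arguments on the constructor
    return pd.PeriodIndex(year=year, quarter=quarter, freq=freq)


idx = quarterly_index(years, quarters)
print(idx)
```

Checking for from_fields at run time keeps the same call site working on both older and newer pandas installations.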

In [13]:
data.index = index
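
Once the PeriodIndex is attached, rows can be selected by period. The sketch below illustrates this on a small made-up frame; the column name value and its numbers are hypothetical and are not taken from macrodata.csv. It builds the index from individual Period objects, which is an alternative construction assumed here only to keep the example self-contained.

```python
import pandas as pd

# Made-up values for illustration only; not taken from macrodata.csv
frame = pd.DataFrame({
    "year": [1959.0, 1959.0, 1959.0, 1959.0, 1960.0],
    "quarter": [1.0, 2.0, 3.0, 4.0, 1.0],
    "value": [10.0, 11.0, 12.0, 13.0, 14.0],
})

# Build one quarterly Period per row and use the result as the index
frame.index = pd.PeriodIndex(
    [pd.Period(year=int(y), quarter=int(q), freq="Q-DEC")
     for y, q in zip(frame["year"], frame["quarter"])],
    freq="Q-DEC",
)

# Partial string indexing: every quarter that falls in 1959
rows_1959 = frame.loc["1959"]

# Exact lookup of a single quarter with a Period key
q3_value = frame.loc[pd.Period("1959Q3", freq="Q-DEC"), "value"]

# Convert the periods to timestamps at the start of each quarter
starts = frame.index.to_timestamp()

print(rows_1959)
print(q3_value)
print(starts)
```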