# 1.3 - Gathering Greenbook Forecast Data

This script gathers Greenbook forecast data in order to purge the series constructed in 1.2 of the Fed information effect, following Miranda-Agrippino (2016). The variables gathered are Greenbook forecast revisions for Real GDP, GDP Price Inflation, and the Unemployment Rate.

The Greenbook data comes from the Philadelphia Fed in *.xlsx* format, and is found at **philadelphiafed.org/research-and-data/real-time-center/greenbook-data/philadelphia-data-set**.

The eventual goal is to obtain forecast revisions as many quarters ahead as are available for *all* meeting dates. To that end, the smallest number of quarters-ahead revisions available across all dates and all variables is used for every date and every variable.

### Preamble

This script makes use of...

- `pandas`
- Regular Expressions (`re`)
- `datetime`
- `time`
- `itertools`
- NumPy

In [11]:
import pandas as pd
import re
from datetime import datetime
import time
from itertools import compress
import numpy as np

### Import Meeting Dates

This code imports the meeting dates scraped in script 1.1.

In [12]:
FOMC_dates = []

with open('dates.csv','r') as FOMC_dates_file:
    
    for line in FOMC_dates_file: # The "for" loop is somewhat redundant - the file has only one line.
        
        raw_FOMC_dates = line.split(',')

for date in raw_FOMC_dates:
    
    unix_date = int(re.search("[0-9]+", date).group(0)) # isolates integer from string

    FOMC_dates.append(unix_date)
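
As a compact illustration of the parsing step above, the sketch below pulls every run of digits out of a single line. The sample line is hypothetical, since the exact layout of `dates.csv` is not shown here, and `re.findall` is only equivalent to the per-field `re.search` if each field contains exactly one run of digits.

```python
import re

# Hypothetical one-line file content; the real dates.csv layout may differ
sample_line = "[757900800, 760320000, 764640000]\n"

# Collect every run of digits on the line and convert each to an integer
parsed_dates = [int(digits) for digits in re.findall(r"[0-9]+", sample_line)]

print(parsed_dates)  # [757900800, 760320000, 764640000]
```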

### Initialising Dataframes

In [13]:
real_gdp_df = pd.read_excel('gRGDP_1985_Last.xlsx')

real_gdp_df.set_index('Date', inplace = True, drop = True) # Set dataframe index to date

inflation_df = pd.read_excel('gPGDP_1985_Last.xlsx') 

inflation_df.set_index('Date', inplace = True, drop = True)

unemployment_df = pd.read_excel('UNEMP_1985_Last.xlsx')

unemployment_df.set_index('Date', inplace = True, drop = True)

### Changing Dataframe Dates - Columns

The columns of each dataframe are labelled by the corresponding Greenbook date. Each Greenbook bears a date that is roughly a week prior to the FOMC meeting to which it corresponds. The block below replaces these dates with the meeting dates established in script 1.1 (in Unix time). It also marks dates prior to 1994 to be dropped.

In [14]:
gdp_dates = []

inflation_dates = []

unemployment_dates = [] # In theory, these three lists should be identical

### GDP ###

for raw_date in real_gdp_df.columns:
    
    eight_fig_date = re.search(r"\d{8}", raw_date).group(0) # Isolates the yyyymmdd date in the column title (raw string avoids an invalid-escape warning)
    
    datetime_date = datetime.strptime(eight_fig_date, '%Y%m%d')
    
    greenbook_unix = time.mktime(datetime_date.timetuple())
    
    if greenbook_unix < 757382400: # 757382400 is the start of 1994 in unix.
        
        gdp_dates.append(0) # 0 indicates date is to be dropped
        
        continue
    
    date_corresponds_booleans = [(x > greenbook_unix and x < greenbook_unix + 1814400) 
                                 for x in FOMC_dates] # Returns booleans for whether FOMC date occurs within 3 weeks 
                                                      # (1,814,400 seconds) of Greenbook date
    
    corresponding_dates = list(compress(FOMC_dates,date_corresponds_booleans))
    
    if len(corresponding_dates) != 1:
        
        print("Error with " + str(datetime_date))
    
    gdp_dates.append(corresponding_dates[0])

real_gdp_df.columns = gdp_dates

### Inflation ###

for raw_date in inflation_df.columns:
    
    eight_fig_date = re.search(r"\d{8}", raw_date).group(0) # Isolates the yyyymmdd date in the column title (raw string avoids an invalid-escape warning)
    
    datetime_date = datetime.strptime(eight_fig_date, '%Y%m%d')
    
    greenbook_unix = time.mktime(datetime_date.timetuple())
    
    if greenbook_unix < 757382400: # 757382400 is the start of 1994 in unix.
        
        inflation_dates.append(0) # 0 indicates date is to be dropped
        
        continue
    
    date_corresponds_booleans = [(x > greenbook_unix and x < greenbook_unix + 1814400) 
                                 for x in FOMC_dates] # Returns booleans for whether FOMC date occurs within 3 weeks 
                                                      # (1,814,400 seconds) of Greenbook date
    
    corresponding_dates = list(compress(FOMC_dates,date_corresponds_booleans))
    
    if len(corresponding_dates) != 1:
        
        print("Error with " + str(datetime_date))
    
    inflation_dates.append(corresponding_dates[0])

inflation_df.columns = inflation_dates

### Unemployment ###

for raw_date in unemployment_df.columns:
    
    eight_fig_date = re.search(r"\d{8}", raw_date).group(0) # Isolates the yyyymmdd date in the column title (raw string avoids an invalid-escape warning)
    
    datetime_date = datetime.strptime(eight_fig_date, '%Y%m%d')
    
    greenbook_unix = time.mktime(datetime_date.timetuple())
    
    if greenbook_unix < 757382400: # 757382400 is the start of 1994 in unix.
        
        unemployment_dates.append(0) # 0 indicates date is to be dropped
        
        continue
    
    date_corresponds_booleans = [(x > greenbook_unix and x < greenbook_unix + 1814400) 
                                 for x in FOMC_dates] # Returns booleans for whether FOMC date occurs within 3 weeks 
                                                      # (1,814,400 seconds) of Greenbook date
    
    corresponding_dates = list(compress(FOMC_dates,date_corresponds_booleans))
    
    if len(corresponding_dates) != 1:
        
        print("Error with " + str(datetime_date))
    
    unemployment_dates.append(corresponding_dates[0])

unemployment_df.columns = unemployment_dates

if (unemployment_dates != inflation_dates or inflation_dates != gdp_dates):
    
    print('Dates do not align for all variables.')
    
else:
    
    print('Dates align for all variables')

Dates align for all variables
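
The three loops above apply the same matching rule, so it could be factored into one helper. The following is a minimal sketch of that rule only: the function name `match_meeting`, the constant `THREE_WEEKS`, and the sample timestamps are hypothetical. Unlike the loops above, it raises an error instead of printing when the match is not unique.

```python
THREE_WEEKS = 21 * 24 * 60 * 60  # 1,814,400 seconds

def match_meeting(greenbook_unix, fomc_dates, window=THREE_WEEKS):
    """Return the one FOMC date strictly within `window` seconds after the Greenbook date."""
    matches = [d for d in fomc_dates if greenbook_unix < d < greenbook_unix + window]
    if len(matches) != 1:
        raise ValueError(f"Expected exactly one matching meeting, found {len(matches)}")
    return matches[0]

# Hypothetical timestamps, chosen only to exercise the rule
fomc_sample = [1_000_000, 3_000_000, 5_000_000]

print(match_meeting(2_500_000, fomc_sample))  # 3000000
```

Raising on an ambiguous or missing match stops the run at the offending Greenbook date rather than continuing with a misaligned column.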


### Changing Dataframe Dates - Rows

Each dataframe row contains the forecasts for the quarter with which it is labelled. Here, we change these quarter labels to Unix time.

In [15]:
gdp_quarters = []

inflation_quarters = []

unemployment_quarters = []

### GDP ###

for q in real_gdp_df.index:
    
    year, quarter = str(q).split('.') # q comes in format yyyy.Q
    
    if int(year) < 1994:
        
        gdp_quarters.append(0) # 0 indicates date is to be dropped
        
        continue
    
    month_int = int(quarter)*3 - 2 # Maps Quarter x to month 3x - 2 (i.e. start of quarter)
    
    if month_int < 10:
        
        month = '0' + str(month_int) # Maps 1 to 01, 2 to 02, etc.
    
    else:
        
        month = str(month_int)
    
    year_month = year + month
    
    datetime_date = datetime.strptime(year_month, '%Y%m')
    
    quarter_unix = time.mktime(datetime_date.timetuple())
    
    gdp_quarters.append(int(quarter_unix))

real_gdp_df.index = gdp_quarters
    
### Inflation ###

for q in inflation_df.index:
    
    year, quarter = str(q).split('.') # q comes in format yyyy.Q
    
    if int(year) < 1994:
        
        inflation_quarters.append(0) # 0 indicates date is to be dropped
        
        continue
    
    month_int = int(quarter)*3 - 2 # Maps Quarter x to month 3x - 2 (i.e. start of quarter)
    
    if month_int < 10:
        
        month = '0' + str(month_int) # Maps 1 to 01, 2 to 02, etc.
    
    else:
        
        month = str(month_int)
    
    year_month = year + month
    
    datetime_date = datetime.strptime(year_month, '%Y%m')
    
    quarter_unix = time.mktime(datetime_date.timetuple())
    
    inflation_quarters.append(int(quarter_unix))
    
inflation_df.index = inflation_quarters
    
### Unemployment ###

for q in unemployment_df.index:
    
    year, quarter = str(q).split('.') # q comes in format yyyy.Q
    
    if int(year) < 1994:
        
        unemployment_quarters.append(0) # 0 indicates date is to be dropped
        
        continue
    
    month_int = int(quarter)*3 - 2 # Maps Quarter x to month 3x - 2 (i.e. start of quarter)
    
    if month_int < 10:
        
        month = '0' + str(month_int) # Maps 1 to 01, 2 to 02, etc.
    
    else:
        
        month = str(month_int)
    
    year_month = year + month
    
    datetime_date = datetime.strptime(year_month, '%Y%m')
    
    quarter_unix = time.mktime(datetime_date.timetuple())
    
    unemployment_quarters.append(int(quarter_unix))

unemployment_df.index = unemployment_quarters

if (unemployment_quarters != inflation_quarters or unemployment_quarters != gdp_quarters):
    
    print('Dates do not align for all variables.')
    
else:
    
    print('Dates align for all variables')

Dates align for all variables
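
The quarter-to-month mapping used above (quarter x starts in month 3x - 2) can be checked in isolation. This is a sketch with a hypothetical helper name, `quarter_start`. It converts to Unix time with `calendar.timegm`, which reads the date as UTC, whereas the cells above use `time.mktime`, which reads it in the machine's local time zone; the two agree only on a machine set to UTC.

```python
import calendar
from datetime import datetime

def quarter_start(label):
    """Return the first day of the quarter for a label in yyyy.Q form, e.g. 1994.3."""
    year, quarter = str(label).split(".")
    return datetime(int(year), int(quarter) * 3 - 2, 1)

start = quarter_start(1994.3)
print(start)  # 1994-07-01 00:00:00

# calendar.timegm interprets the time tuple as UTC, so this is machine-independent
unix_1994 = calendar.timegm(quarter_start(1994.1).timetuple())
print(unix_1994)  # 757382400
```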


### Producing Forecast *Revision* Dataframes

The Miranda-Agrippino (2016) method makes primary use of forecast revisions from one Greenbook to the next rather than the Greenbook forecasts themselves. These revisions indicate new information that the Fed is incorporating into its forecasts. Below, each of the three dataframes established above is manipulated to produce the revision of each projection from the Greenbook for one meeting to the Greenbook for the next.

In [16]:
real_gdp_revisions_df = real_gdp_df - real_gdp_df.shift(axis = 1) # .shift() moves columns one to the right 

inflation_revisions_df = inflation_df - inflation_df.shift(axis = 1)

unemployment_revisions_df = unemployment_df - unemployment_df.shift(axis = 1)
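
To make the effect of `.shift(axis = 1)` concrete, here is a small sketch on made-up numbers. The column labels `gb_1` to `gb_3` and the row labels are hypothetical stand-ins for successive Greenbooks and target quarters.

```python
import pandas as pd

# Hypothetical forecasts: rows are target quarters, columns are successive Greenbooks
forecasts = pd.DataFrame(
    {"gb_1": [2.0, 2.5], "gb_2": [2.5, 2.25], "gb_3": [2.25, 3.0]},
    index=["quarter_a", "quarter_b"],
)

# Each cell minus the cell to its left: the same quarter's forecast in the previous Greenbook
revisions = forecasts - forecasts.shift(axis=1)

print(revisions)
```

The first column is entirely NaN because there is no earlier Greenbook to compare against; every other cell holds the change in the forecast for that quarter since the previous Greenbook.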

### Dropping Pre-'94 Greenbook Data

All pre-'94 data were labelled `0` in the **Changing Dataframe Dates - Columns** and **Changing Dataframe Dates - Rows** code blocks.

In [17]:
real_gdp_revisions_df = real_gdp_revisions_df.drop([0], axis = 1)

real_gdp_revisions_df = real_gdp_revisions_df.drop([0], axis = 0)

inflation_revisions_df = inflation_revisions_df.drop([0], axis = 1)

inflation_revisions_df = inflation_revisions_df.drop([0], axis = 0)

unemployment_revisions_df = unemployment_revisions_df.drop([0], axis = 1)

unemployment_revisions_df = unemployment_revisions_df.drop([0], axis = 0)
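
Because many rows and columns share the label `0`, it is worth confirming that `.drop` removes every one of them rather than only the first. A minimal sketch on a hypothetical frame:

```python
import pandas as pd

# Hypothetical frame in which the label 0 marks entries to discard, twice on the rows
toy_df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=[0, 0, 100],
    columns=[0, 200, 300],
)

# Drop the flagged column, then every flagged row
trimmed = toy_df.drop([0], axis=1).drop([0], axis=0)

print(trimmed)
```

Only the row labelled `100` and the columns labelled `200` and `300` remain.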

### Removing Entries for Quarters Elapsed when Meeting Occurred

Several of the "forecasts" given in each Greenbook refer to quarters that had already elapsed when the meeting took place, although the figures for those quarters may still be revised. I follow Miranda-Agrippino (2016) in not considering these quarters.

In [18]:
quarter_ends = {}

quarter_starts = (real_gdp_revisions_df.index)

for i in range(len(quarter_starts)):
    
    if i < len(quarter_starts) - 1:
        
        quarter_ends[quarter_starts[i+1]] = quarter_starts[i]
        
    else: # date for end of final quarter must be found artificially
        
        datetime_format = datetime.fromtimestamp(quarter_starts[i])
        
        month = datetime_format.month
        
        year = datetime_format.year
        
        if month <= 9: #Handles January, April, July starts
            
            q_end = datetime(year, month + 3, 1)
            
            unix_format = int(time.mktime(q_end.timetuple())) # cast to int so the key type matches the October branch
            
            quarter_ends[unix_format] = quarter_starts[i]
        
        else: # October handled differently (no 13th month)
            
            q_end = datetime(year + 1, (month + 3) % 12, 1)
            
            unix_format = int(time.mktime(q_end.timetuple()))
            
            quarter_ends[unix_format] = quarter_starts[i]
    
for meeting in (real_gdp_revisions_df.columns):
    
    for q_end in quarter_ends:
        
        if q_end < meeting:
            
            real_gdp_revisions_df.loc[quarter_ends[q_end],meeting] = np.nan # np.NaN alias was removed in NumPy 2.0
            
            inflation_revisions_df.loc[quarter_ends[q_end],meeting] = np.nan
            
            unemployment_revisions_df.loc[quarter_ends[q_end],meeting] = np.nan
            
        else:
            
            break
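
The masking rule above (blank out any quarter that ended before the meeting) can be seen on a toy frame. This sketch uses small made-up integers in place of Unix timestamps, and it assumes the final quarter has the same length as the others rather than deriving its end from the calendar as the cell above does.

```python
import numpy as np
import pandas as pd

# Hypothetical quarter starts (rows) and meeting times (columns), in arbitrary units
quarter_starts = [0, 10, 20, 30]
toy_revisions = pd.DataFrame(1.0, index=quarter_starts, columns=[12, 25])

# A quarter ends when the next one starts; the last end (40) is assumed
quarter_end_times = quarter_starts[1:] + [40]

for meeting in toy_revisions.columns:
    for start, end in zip(quarter_starts, quarter_end_times):
        if end < meeting:
            # This quarter had fully elapsed by the time of the meeting
            toy_revisions.loc[start, meeting] = np.nan

print(toy_revisions.count().tolist())  # [3, 2]
```

The meeting at time 12 loses only the first quarter, while the meeting at time 25 loses the first two, so later meetings within the same span retain fewer entries.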

### Establishing the Minimum, Mean and Maximum Number of Forecast Revisions in the Greenbook for each FOMC Meeting

In [19]:
leads_min = min([real_gdp_revisions_df.count().min(),
         inflation_revisions_df.count().min(),
         unemployment_revisions_df.count().min()])

leads_mean = np.mean([real_gdp_revisions_df.count().mean(),
         inflation_revisions_df.count().mean(),
         unemployment_revisions_df.count().mean()])

leads_max = max([real_gdp_revisions_df.count().max(),
         inflation_revisions_df.count().max(),
         unemployment_revisions_df.count().max()])

print(leads_min,leads_mean,leads_max)

5 7.40625 9
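
The statistics above rest on `DataFrame.count()`, which returns the number of non-NaN entries in each column, that is, the number of usable revisions per meeting. A small sketch with hypothetical column names and values:

```python
import numpy as np
import pandas as pd

# Hypothetical revisions: NaN marks elapsed quarters or missing forecasts
toy_counts_df = pd.DataFrame(
    {"meeting_1": [np.nan, 0.5, 0.25, 0.75], "meeting_2": [np.nan, np.nan, 1.0, 1.5]}
)

# Non-NaN entries per column: 3 for meeting_1, 2 for meeting_2
per_meeting = toy_counts_df.count()

print(per_meeting.min(), per_meeting.mean(), per_meeting.max())  # 2 2.5 3
```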


### Building the Dataframe of Regressors

This block collects the forecast revisions for the current quarter and the next four quarters, for each variable, into a single dataframe for export.

In [20]:
regressors_index = [ # _q0 denotes current quarter, _qi denotes i quarters ahead.
    'rgdp_q0',
    'rgdp_q1',
    'rgdp_q2',
    'rgdp_q3',
    'rgdp_q4',
    'infl_q0',
    'infl_q1',
    'infl_q2',
    'infl_q3',
    'infl_q4',
    'unmp_q0',
    'unmp_q1',
    'unmp_q2',
    'unmp_q3',
    'unmp_q4'
]

regressors_df = pd.DataFrame(index = regressors_index)

for meeting in real_gdp_revisions_df.columns:
    
    rgdp = real_gdp_revisions_df[meeting].dropna().tolist()[:leads_min] # Gets first 5 non-NaN values from each column
    
    infl = inflation_revisions_df[meeting].dropna().tolist()[:leads_min]
    
    unmp = unemployment_revisions_df[meeting].dropna().tolist()[:leads_min]
    
    full_spec = rgdp + infl + unmp
    
    regressors_df[meeting] = full_spec
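
The key step in the loop above is taking the first few non-NaN entries of each column, so that row `_q0` always refers to the current quarter as of that meeting. A minimal sketch on one hypothetical variable, with `n_leads` standing in for `leads_min`:

```python
import numpy as np
import pandas as pd

# Hypothetical revisions for one variable; leading NaNs are elapsed quarters
toy_var_revisions = pd.DataFrame(
    {"meeting_1": [np.nan, 0.5, 0.25, 0.75], "meeting_2": [np.nan, np.nan, 1.0, 1.5]}
)

n_leads = 2  # stand-in for leads_min
row_labels = [f"x_q{i}" for i in range(n_leads)]

# For each meeting, keep the first n_leads non-NaN revisions, in order
toy_regressors = pd.DataFrame(
    {m: toy_var_revisions[m].dropna().tolist()[:n_leads] for m in toy_var_revisions.columns},
    index=row_labels,
)

print(toy_regressors)
```

Here `meeting_1` contributes 0.5 and 0.25, while `meeting_2` contributes 1.0 and 1.5, even though those values sit in different rows of the original frame.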

### Export to *.csv*

In [22]:
regressors_df.to_csv("greenbook.csv")