### Combining GDP data with the jobs report

Brian Dew

Updated: September 20, 2020

----

Notes:

The basic idea here is to combine the BEA population estimates with CPS employment rate and hours worked trends to estimate GDP per hour of work. BLS does this for its productivity and costs report using much more comprehensive data, but this approximation has proven decent over time.
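
As a rough check of the arithmetic, the sketch below reproduces the 1989 Q1 value from the notebook's own output in plain Python. The function name `gdp_per_hour` is illustrative and not part of the notebook; the units (population in millions, GDP in millions of dollars at an annual rate, 52 weeks per year) are inferred from the calculation and the printed data.

```python
def gdp_per_hour(gdp_millions, pop_millions, epop_pct, weekly_hours, weeks=52):
    """Approximate real GDP per hour worked, in dollars per hour."""
    # Employed persons (millions) = population x employed share
    employed_millions = pop_millions * epop_pct / 100
    # Aggregate annual hours (millions) = workers x weekly hours x weeks per year
    annual_hours_millions = employed_millions * weekly_hours * weeks
    return gdp_millions / annual_hours_millions

# Inputs copied from the 1989 Q1 row of the notebook's output
value_1989q1 = gdp_per_hour(1.069319e7, 246.460, 48.433067, 37.432008)
print(round(value_1989q1, 2))  # 46.02
```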

The hard part of this measure is getting hours worked right. All of the published measures are either too low-frequency or don't define workers broadly enough. I want to capture all workers, regardless of full-time or part-time status or of whether they work for the private sector. I also want to capture second and third jobs. As a result, I selected total actual hours worked from the CPS microdata and extract its trend using x13as with default settings. Hours worked from the CPS microdata have issues around holidays falling in the reference week, and also do not capture hours worked for some important categories of labor such as self-employed persons.
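
To make the hours measure concrete, here is a minimal sketch of a weighted mean of actual weekly hours among employed respondents. The records and the helper `weighted_mean_hours` are made up for illustration; the only things taken from the notebook are that each person carries a survey weight and that an hours code of -1 is counted as zero.

```python
# Toy records: (labor force status, actual hours across all jobs, survey weight)
records = [
    ("Employed", 40, 1500.0),
    ("Employed", 25, 2000.0),
    ("Employed", -1, 500.0),     # -1 code, counted as zero hours
    ("Unemployed", -1, 1000.0),  # excluded: not employed
]

def weighted_mean_hours(rows):
    """Survey-weighted average hours among employed persons."""
    employed = [(0 if hours == -1 else hours, weight)
                for status, hours, weight in rows if status == "Employed"]
    total_weight = sum(weight for _, weight in employed)
    return sum(hours * weight for hours, weight in employed) / total_weight

avg_hours = weighted_mean_hours(records)
print(avg_hours)  # 27.5
```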

In [1]:
%matplotlib inline
import sys
sys.path.append('../src')

import uschartbook.config

from uschartbook.config import *
from uschartbook.utils import *

# Map calendar month (1-12) to quarter (1-4)
qtrs = {month: (month - 1) // 3 + 1 for month in range(1, 13)}

In [2]:
aah = {}   # average actual weekly hours among the employed, by month
epop = {}  # employed share of the population, by month
cols = ['HRSACTT', 'LFS', 'MONTH']
for year in range(1989, 2022):
    # Use PWSSWGT from 1998 onward and BASICWGT before that
    wgt = 'PWSSWGT' if year >= 1998 else 'BASICWGT'
    df = (pd.read_feather(cps_dir / f'cps{year}.ft', columns=cols + [wgt])
            .rename({wgt: 'WGT'}, axis=1))
    employed = df.query('LFS == "Employed"')

    # Weighted mean of total actual hours; -1 codes are treated as zero hours
    ah = employed.groupby('MONTH').apply(
        lambda x: np.average(x.HRSACTT.replace(-1, 0), weights=x.WGT))
    aah.update({pd.to_datetime(f'{year}-{month}-01'): value
                for month, value in ah.items()})

    # Weighted employed count divided by the weighted total of all records
    ep = employed.groupby('MONTH').WGT.sum() / df.groupby('MONTH').WGT.sum()
    epop.update({pd.to_datetime(f'{year}-{month}-01'): value
                 for month, value in ep.items()})
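
The employed share computed at the end of the loop is a ratio of summed survey weights. A minimal sketch with made-up records follows; `employment_rate` is an illustrative helper, not a notebook function.

```python
# Toy records: (labor force status, survey weight)
people = [
    ("Employed", 1500.0),
    ("Employed", 2500.0),
    ("Unemployed", 400.0),
    ("Not in labor force", 600.0),
]

def employment_rate(rows):
    """Weighted employed count divided by the weighted total of all records."""
    employed_weight = sum(weight for status, weight in rows if status == "Employed")
    total_weight = sum(weight for _, weight in rows)
    return employed_weight / total_weight

share = employment_rate(people)
print(share)  # 0.8
```

The notebook multiplies this share by 100 later, so it is stored as a percentage in `df['epop']`.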

In [3]:
sm = x13_arima_analysis(pd.Series(epop))
epop_sa = sm.seasadj.resample('QS').mean()
sm = x13_arima_analysis(pd.Series(aah))
aah_sa = sm.trend.resample('QS').mean()

       The effect of this outlier is already accounted for by other regressors 
       (usually user-specified or previously identified outliers).

 NOTE: Unable to test AO2020.May due to regression matrix singularity.
       The effect of this outlier is already accounted for by other regressors 
       (usually user-specified or previously identified outliers).

 NOTE: Unable to test LS2020.Mar due to regression matrix singularity.
       The effect of this outlier is already accounted for by other regressors 
       (usually user-specified or previously identified outliers).

 NOTE: Unable to test AO2020.Apr due to regression matrix singularity.
       The effect of this outlier is already accounted for by other regressors 
       (usually user-specified or previously identified outliers).

 NOTE: Unable to test LS2020.May due to regression matrix singularity.
       The effect of this outlier is already accounted for by other regressors 
       (usually user-specified or previously

In [4]:
gdp_code = ('T10106', ['A191RX'])
pop_code = ('T70100', ['B230RC'])
# Latest nominal GDP level, used to rebase real GDP so the latest quarter matches it
cd = nipa_df(retrieve_table('T10105')['Data'], ['A191RC'])['A191RC'].iloc[-1]
rgdp = nipa_df(retrieve_table(gdp_code[0])['Data'], gdp_code[1])
gdp = rgdp / rgdp.iloc[-1] * cd
pop = nipa_df(retrieve_table(pop_code[0])['Data'], pop_code[1]).sort_index()

df = pd.DataFrame()
df['epop'] = epop_sa * 100
df['pop'] = pop['B230RC'] / 1000
#df['hours'] = fred_df('PRS84006023')
df['hours'] = aah_sa
df['gdp'] = gdp['A191RX']
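# Aggregate annual hours (millions): people (millions) x employed share x weekly hours x 52 weeks
df['input'] = (df['pop'] * (df['epop'] / 100)) * (df['hours'] * 52)
# Real GDP (millions of dollars) per hour worked
df['gdpinp'] = df['gdp'] / df['input']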
datetxt = dtxt(df['gdpinp'].dropna().index[-1])['qtr1']
write_txt(text_dir / 'gdpjobs_ltdt.txt', datetxt)
df.to_csv(data_dir / 'gdpjobslvl.csv', index_label='date')

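# Counterfactual GDP series: each holds the named factor at its sample mean
# (constant scale factors are dropped because only growth rates are used)
df['epop_c'] = ((df['pop'] * df['epop'].mean()) * df['hours']) * df['gdpinp']
df['pop_c'] = ((df['pop'].mean() * df['epop']) * df['hours']) * df['gdpinp']
df['hours_c'] = ((df['pop'] * df['epop']) * df['hours'].mean()) * df['gdpinp']

df_g = growth_rate(df.dropna())

# Contribution of a factor = actual GDP growth minus growth with that factor held constant
df_g['pop_contr'] = df_g['gdp'] - df_g['pop_c']
df_g['epop_contr'] = df_g['gdp'] - df_g['epop_c']
df_g['hours_contr'] = df_g['gdp'] - df_g['hours_c']
# Productivity is the residual after the three labor-input contributions
df_g['prod'] = df_g['gdp'] - df_g['pop_contr'] - df_g['epop_contr'] - df_g['hours_contr']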

result = df_g[['pop_contr', 'epop_contr', 'hours_contr', 'prod']].round(2)

result.to_csv(data_dir / 'gdpjobs.csv', index_label='date')
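
Because each counterfactual series holds one factor at its sample mean, that constant drops out of any growth rate, so each counterfactual grows at the same rate as GDP divided by that factor. The sketch below applies that shortcut to the 2021 Q1 and 2021 Q2 rows of `df`. It assumes `growth_rate` returns an annualized quarterly percent change, which is consistent with the printed results but is not shown in the notebook; `annualized_growth` and `contributions` are illustrative names.

```python
def annualized_growth(prev, curr):
    """Annualized quarterly percent change (assumed definition of growth_rate)."""
    return ((curr / prev) ** 4 - 1) * 100

# Values copied from the 2021 Q1 and 2021 Q2 rows of df
q1 = {"pop": 331.011, "epop": 46.482143, "hours": 37.153992, "gdp": 2.237386e7}
q2 = {"pop": 331.209, "epop": 46.805721, "hours": 37.226676, "gdp": 2.274096e7}

def contributions(prev, curr):
    """Split GDP growth into population, employment, hours and a residual."""
    g_gdp = annualized_growth(prev["gdp"], curr["gdp"])
    out = {}
    for factor in ("pop", "epop", "hours"):
        # GDP with this factor held constant is proportional to GDP / factor
        counterfactual = annualized_growth(prev["gdp"] / prev[factor],
                                           curr["gdp"] / curr[factor])
        out[factor] = g_gdp - counterfactual
    # Productivity is whatever growth the three factors leave unexplained
    out["prod"] = g_gdp - sum(out.values())
    return g_gdp, out

gdp_growth, contrib = contributions(q1, q2)
print(round(gdp_growth, 2), {k: round(v, 2) for k, v in contrib.items()})
```

The four pieces sum to GDP growth by construction, and the values should land close to the figures the notebook prints for 2021 Q2, with small differences from rounding in the displayed table.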

In [5]:
data = df.dropna()

# Dollar signs are escaped for LaTeX; the doubled backslash avoids an invalid-escape warning
ltval = f"\\${data['gdpinp'].iloc[-1]:.2f}"
prval = f"\\${data['gdpinp'].iloc[-2]:.2f}"
prdate = dtxt(data['gdpinp'].index[-2])['qtr1']
prdt = '2019-10-01'
prval2 = f"\\${data.loc[prdt, 'gdpinp']:.2f}"
prval3 = f"\\${data.loc['2015-10-01', 'gdpinp']:.2f}"
val89 = f"\\${data.loc['1989-01-01', 'gdpinp']:.2f}"
gdpval = f"\\${data['gdp'].iloc[-1] / 1_000:,.0f}"
gdppr = f"\\${data.loc[prdt, 'gdp'] / 1_000:,.0f}"
agghrs = f"{data['input'].iloc[-1] / 1_000:,.0f}"
hrspr = f"{data.loc[prdt, 'input'] / 1_000:,.0f}"

text = (f'In {datetxt}, real GDP was equivalent to roughly {ltval} per hour of '+
        f'work, compared to {prval} in {prdate}, {prval2} in 2019 Q4, {prval3} '+
        f'in 2015 Q4, and {val89} in the first quarter of 1989. Comparing the latest '+
        'data to the pre-COVID data covering 2019 Q4, annualized real GDP '+
        f'is {gdpval} billion in the latest data and '+
        f'{gdppr} billion in 2019 Q4. Aggregate hours worked total '+
        f'{agghrs} billion in the latest quarter and {hrspr} billion '+
        'in 2019 Q4.')
write_txt(text_dir / 'gdp_per_hour.txt', text)
print(text, '\n\n')

poplt = value_text(result['pop_contr'].iloc[-1], style='contribution_to', ptype='pp', digits=2)
poppr = value_text(result.loc[prdt, 'pop_contr'], style='contribution', ptype='pp', digits=2, casual=True)
emplt = value_text(result['epop_contr'].iloc[-1], style='contribution', ptype='pp', digits=2)
emppr = value_text(result.loc[prdt, 'epop_contr'], style='contribution', ptype='pp', digits=2, casual=True)
hrslt = value_text(result['hours_contr'].iloc[-1], style='contribution_to', ptype='pp', digits=2, casual=True)
hrspr = value_text(result.loc[prdt, 'hours_contr'], style='contribution', ptype='pp', digits=2, casual=True)
prodlt = value_text(result['prod'].iloc[-1], style='contribution', ptype='pp', digits=2)
prodpr = value_text(result.loc[prdt, 'prod'], style='contribution_of', ptype='pp', digits=2, casual=True)

text = (f'In {datetxt}, population growth {poplt} annualized GDP growth, and, for comparison, '+
        f'{poppr} in 2019 Q4. '+
        f'Changes in the employed share of the population {emplt} in the latest quarter, and '+
        f'{emppr} in the fourth quarter of 2019. Changes in average hours worked {hrslt} GDP '+
        f'growth in the latest quarter and {hrspr} in 2019 Q4. Lastly, productivity {prodlt} '+
        f'to GDP growth in {datetxt}, compared to {prodpr} in 2019 Q4.')
write_txt(text_dir / 'gdpjobsch.txt', text)
print(text)

In 2021 Q2, real GDP was equivalent to roughly \$75.78 per hour of work, compared to \$75.27 in 2021 Q1, \$71.20 in 2019 Q4, \$68.98 in 2015 Q4, and \$46.02 in the first quarter of 1989. Comparing the latest data to the pre-COVID data covering 2019 Q4, annualized real GDP is \$22,741 billion in the latest data and \$22,546 billion in 2019 Q4. Aggregate hours worked total 300 billion in the latest quarter and 317 billion in 2019 Q4. 


In 2021 Q2, population growth contributed 0.25 percentage point to annualized GDP growth, and, for comparison, added 0.56 percentage point in 2019 Q4. Changes in the employed share of the population contributed 2.92 percentage points in the latest quarter, and added 1.64 percentage points in the fourth quarter of 2019. Changes in average hours worked added 0.83 percentage point to GDP growth in the latest quarter and added 0.36 percentage point in 2019 Q4. Lastly, productivity contributed 2.72 percentage points to GDP growth in 2021 Q2, compared to a redu

In [6]:
df

date,epop,pop,hours,gdp,input,gdpinp,epop_c,pop_c,hours_c
1989-01-01,48.433067,246.460,37.432008,1.069319e+07,232345.833232,46.022740,2.033649e+07,2.448099e+07,2.055443e+07
1989-04-01,48.397997,247.017,37.458785,1.077481e+07,232868.772368,46.269876,2.050656e+07,2.461222e+07,2.069651e+07
1989-07-01,48.459407,247.698,37.481855,1.085464e+07,233951.055598,46.397044,2.063231e+07,2.472640e+07,2.083702e+07
1989-10-01,48.424753,248.374,37.485228,1.087603e+07,234442.877342,46.390945,2.068776e+07,2.470769e+07,2.087620e+07
1990-01-01,48.347388,248.936,37.479783,1.099488e+07,234563.878749,46.873710,2.094730e+07,2.492131e+07,2.110740e+07
...,...,...,...,...,...,...,...,...,...
2020-07-01,45.112721,330.368,36.842513,2.179280e+07,285528.578325,76.324426,4.449639e+07,3.722059e+07,4.256035e+07
2020-10-01,46.298417,330.815,37.002814,2.203585e+07,294706.304551,74.772255,4.384039e+07,3.758485e+07,4.284858e+07
2021-01-01,46.482143,331.011,37.153992,2.237386e+07,297260.629390,75.266815,4.433691e+07,3.813876e+07,4.332881e+07
2021-04-01,46.805721,331.209,37.226676,2.274096e+07,300094.938123,75.779216,4.475283e+07,3.874135e+07,4.395374e+07
