In [None]:
# Set the working directory to the project root, identified by the presence of the .Rproj file
import os
while not os.path.exists('workshop-pythonr.Rproj'):
    current_dir = os.getcwd()
    parent_dir = os.path.abspath(os.path.join(current_dir, '..'))
    # Stop at the filesystem root, where the parent directory equals the current one
    if current_dir == parent_dir:
        raise FileNotFoundError('Cannot find the project root directory.')
    os.chdir(parent_dir)
print('Working directory set to:', os.getcwd())
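The loop above changes the working directory one level at a time until it finds the marker file. A minimal alternative sketch uses `pathlib` to search the current directory and its ancestors first, so the directory is only changed once the root is known. The helper name `find_project_root` is hypothetical, and the default marker simply reuses the `.Rproj` file name from the cell above.

```python
from pathlib import Path
from typing import Optional


def find_project_root(marker: str = 'workshop-pythonr.Rproj',
                      start: Optional[Path] = None) -> Path:
    """Return the nearest directory at or above `start` that contains `marker`."""
    current = (start or Path.cwd()).resolve()
    # Path.parents lists every ancestor of `current`, ending at the filesystem root
    for candidate in (current, *current.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f'Cannot find {marker!r} in {current} or any parent directory.')
```

In a notebook this could be used as `os.chdir(find_project_root())`, which leaves the working directory untouched when the marker is missing.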

Load the data preparation functions from the `data` module of the `scripts` package.

In [None]:
from scripts import data

# Business Dynamics Statistics (BDS)

[BDS](https://www.census.gov/programs-surveys/bds.html) provides annual measures of business dynamics (such as job creation and destruction, establishment births and deaths, and firm startups and shutdowns) for the economy overall and aggregated by establishment and firm characteristics.
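To make the job creation and destruction measures concrete: job creation adds up employment gains at expanding and newly opened establishments, job destruction adds up employment losses at contracting and closing establishments, and their difference is the net change in employment. The sketch below computes these as rates from made-up establishment-level employment. The function `job_flows` is hypothetical, the output keys only mirror the notebook's `net_job_creation_rate` column, and the denominator (average employment in t-1 and t) is an assumption about the BDS convention.

```python
from typing import Dict, Mapping


def job_flows(emp_prev: Mapping[str, int], emp_curr: Mapping[str, int]) -> Dict[str, float]:
    """Job creation, destruction, and net job creation rates (%) from establishment employment.

    emp_prev and emp_curr map a hypothetical establishment ID to its employment
    in years t-1 and t. An ID missing from emp_prev is treated as an opening
    (employment 0 in t-1); an ID missing from emp_curr as a closing.
    """
    creation = destruction = 0
    for estab in emp_prev.keys() | emp_curr.keys():
        change = emp_curr.get(estab, 0) - emp_prev.get(estab, 0)
        if change > 0:
            creation += change  # expanding or newly opened establishments
        elif change < 0:
            destruction -= change  # contracting or closing establishments
    # Assumed denominator: average employment over t-1 and t
    denom = (sum(emp_prev.values()) + sum(emp_curr.values())) / 2
    return {
        'job_creation_rate': 100 * creation / denom,
        'job_destruction_rate': 100 * destruction / denom,
        'net_job_creation_rate': 100 * (creation - destruction) / denom,
    }


# Made-up example: 'c' closes and 'd' opens between t-1 and t
rates = job_flows({'a': 20, 'b': 10, 'c': 10}, {'a': 30, 'b': 5, 'd': 25})
print(rates)  # → {'job_creation_rate': 70.0, 'job_destruction_rate': 30.0, 'net_job_creation_rate': 40.0}
```

Here employment grows from 40 to 60, so the net job creation rate equals that change (20) relative to the average employment of 50.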

In [None]:
data.get_bds_df?

In [None]:
data.get_bds_df('county').head()

In [None]:
df = data.get_bds_df('nation').set_index('year')
df['estabs_growth_rate'] = df['estabs_entry_rate'] - df['estabs_exit_rate']
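The `estabs_growth_rate` column is computed as the entry rate minus the exit rate. Assuming both rates divide by the average establishment count in t-1 and t (our understanding of the BDS convention), this difference equals the change in the establishment count over that same average, because the count changes only through entries and exits. A worked check with made-up numbers and a hypothetical helper `dhs_rate`:

```python
import math


def dhs_rate(flow: float, stock_prev: float, stock_curr: float) -> float:
    """Express a flow in % of the average stock over t-1 and t."""
    return 100 * flow / ((stock_prev + stock_curr) / 2)


# Made-up national totals
estabs_prev, estabs_curr = 1000, 1040
entries, exits = 90, 50  # 1000 + 90 - 50 = 1040

entry_rate = dhs_rate(entries, estabs_prev, estabs_curr)
exit_rate = dhs_rate(exits, estabs_prev, estabs_curr)
growth_rate = dhs_rate(estabs_curr - estabs_prev, estabs_prev, estabs_curr)

# Entry rate minus exit rate reproduces the growth rate (up to float rounding)
print(round(growth_rate, 2), math.isclose(entry_rate - exit_rate, growth_rate))  # → 3.92 True
```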

In [None]:
df[['net_job_creation_rate', 'estabs_growth_rate']].plot(title='Establishment and employment growth rates, %', grid=True);

In [None]:
df[['estabs_entry_rate', 'estabs_exit_rate']].plot(title='Establishment entry and exit rates, %', grid=True);

# County shapes

[Cartographic Boundary Files](https://www.census.gov/geographies/mapping-files/time-series/geo/cartographic-boundary.html) are simplified representations of selected geographic areas from the Census Bureau’s Master Address File/Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) System. They are designed specifically for small-scale thematic mapping.

In [None]:
data.get_county_shape_df?

In [None]:
data.get_county_shape_df().head()
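The cells below filter on `statefp == "55"` and join on a `fips` column. Assuming these follow the Census convention of zero-padded strings (a 2-digit state code followed by a 3-digit county code), a small sketch of that structure, with hypothetical helpers `county_fips` and `state_fips`:

```python
def county_fips(state_code: int, county_code: int) -> str:
    """Combine a state code and a county code into a 5-character county FIPS string."""
    # 2 digits for the state, 3 for the county, both zero-padded
    return f'{state_code:02d}{county_code:03d}'


def state_fips(fips: str) -> str:
    """Return the 2-character state part of a county FIPS string."""
    return fips[:2]


print(county_fips(55, 25))   # → 55025
print(county_fips(1, 1))     # → 01001
print(state_fips('55025'))   # → 55
```

Keeping the codes as strings matters for joins: stored as integers, `01001` would become `1001` and no longer match a string key.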

Employment growth rates vary considerably across counties. The map below shows the 2019 net job creation rate for Wisconsin counties (state FIPS code 55).

In [None]:
# Wisconsin counties (state FIPS code 55), joined with their 2019 county-level BDS records
df = data.get_county_shape_df().query('statefp == "55"')
d = data.get_bds_df('county').query('year == 2019')
df = df.merge(d, how='left', on='fips')

df.plot(column='net_job_creation_rate', legend=True).set_title('Employment growth rate in 2019, %');
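Because the merge uses `how='left'`, every county shape is kept, and a county with no matching BDS row for 2019 ends up with a missing value rather than disappearing from the table. A minimal standard-library sketch of those left-join semantics, using a hypothetical helper `left_join` and made-up rows, and assuming at most one right-hand row per key (pandas would duplicate left rows on multiple matches):

```python
from typing import Any, Dict, List


def left_join(left: List[Dict[str, Any]], right: List[Dict[str, Any]],
              key: str) -> List[Dict[str, Any]]:
    """Keep every row of `left`; attach matching `right` columns, or None if unmatched."""
    right_cols = {col for row in right for col in row if col != key}
    lookup = {row[key]: row for row in right}  # assumes unique keys in `right`
    joined = []
    for row in left:
        match = lookup.get(row[key], {})
        joined.append({**row, **{col: match.get(col) for col in right_cols}})
    return joined


# Made-up rows: the second county has no BDS record for the year
shapes = [{'fips': '55001', 'name': 'County A'}, {'fips': '55003', 'name': 'County B'}]
bds = [{'fips': '55001', 'net_job_creation_rate': 1.5}]
for row in left_join(shapes, bds, key='fips'):
    print(row)
```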

# Rural classification

[Urban Influence Codes](https://www.ers.usda.gov/data-products/urban-influence-codes.aspx)

The 2013 Urban Influence Codes form a classification scheme that distinguishes metropolitan counties by the population size of their metro area, and nonmetropolitan counties by the size of their largest city or town and their proximity to metro and micropolitan areas. The standard Office of Management and Budget (OMB) metro and nonmetro categories are subdivided into two metro and 10 nonmetro categories, resulting in a 12-part county classification. Originally developed in 1993, the scheme lets researchers break county data into residential groups finer than metro and nonmetro, particularly for analyzing trends in nonmetro areas that are related to population density and metro influence.
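Since the 12 codes subdivide the OMB categories, they can always be collapsed back into the two-way metro/nonmetro split. A minimal sketch, assuming (consistent with the description above and the `rurality` cell later in this notebook) that codes 1 and 2 are the metro categories and 3 through 12 the nonmetro ones; `omb_category` is a hypothetical helper:

```python
from collections import Counter

# Assumed from the description above: codes 1-2 are metro, 3-12 nonmetro
METRO_CODES = {1, 2}
ALL_CODES = range(1, 13)


def omb_category(uic: int) -> str:
    """Collapse a 2013 Urban Influence Code into the OMB metro/nonmetro split."""
    if uic not in ALL_CODES:
        raise ValueError(f'Unknown Urban Influence Code: {uic}')
    return 'metro' if uic in METRO_CODES else 'nonmetro'


print(Counter(omb_category(code) for code in ALL_CODES))  # → Counter({'nonmetro': 10, 'metro': 2})
```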

In [None]:
data.get_ui_df?

In [None]:
data.get_ui_df().head()

In [None]:
# Number of counties and total population per Urban Influence Code, with shares of the total
d = (
    data.get_ui_df()
    .groupby(['uic', 'uic_desc'])
    .aggregate({'fips': 'count', 'population': 'sum'})
    .rename(columns={'fips': 'counties'})
)
d['county share'] = d['counties'] / d['counties'].sum()
d['population share'] = d['population'] / d['population'].sum()
d.style.format({'counties': '{:,d}', 'population': '{:,d}', 'county share': '{:2.1%}', 'population share': '{:2.1%}'})
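The table above counts counties and sums population within each code, then divides each group total by the overall total. The same computation in plain Python, as a sketch with made-up rows and a hypothetical helper `uic_summary` (it counts rows, whereas pandas `count` skips missing `fips` values):

```python
from collections import defaultdict
from typing import Any, Dict, List


def uic_summary(rows: List[Dict[str, Any]]) -> Dict[int, Dict[str, float]]:
    """Count counties and sum population per Urban Influence Code, with shares of the total."""
    counties: Dict[int, int] = defaultdict(int)
    population: Dict[int, int] = defaultdict(int)
    for row in rows:
        counties[row['uic']] += 1
        population[row['uic']] += row['population']
    total_counties = sum(counties.values())
    total_population = sum(population.values())
    return {
        uic: {
            'counties': counties[uic],
            'population': population[uic],
            'county share': counties[uic] / total_counties,
            'population share': population[uic] / total_population,
        }
        for uic in sorted(counties)
    }


# Made-up counties
rows = [
    {'uic': 1, 'population': 600},
    {'uic': 1, 'population': 200},
    {'uic': 6, 'population': 150},
    {'uic': 12, 'population': 50},
]
for uic, s in uic_summary(rows).items():
    print(f"{uic:>2} {s['counties']:,d} {s['population']:,d} "
          f"{s['county share']:.1%} {s['population share']:.1%}")
```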

Is the effect of establishment entry and exit on employment growth different between rural and urban areas?

In [None]:
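# Wisconsin counties (state FIPS code 55), joined with their Urban Influence Codes
df = data.get_county_shape_df().query('statefp == "55"')
d = data.get_ui_df()
df = df.merge(d, how='left', on='fips')
# Collapse the 12 codes into three levels: 1-2 metropolitan, 3/5/8 micropolitan, all others noncore
df['rurality'] = df['uic'].map(lambda x: 'metro' if x in [1, 2] else ('micro' if x in [3, 5, 8] else 'noncore'))
df.plot(column='rurality', legend=True).set_title('Rurality level');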
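One way to start on the question posed above is to compare, within each rurality level, how strongly the county net job creation rate moves with the establishment growth rate, for example via a separate least-squares slope per group. The sketch below illustrates only the mechanics with made-up numbers: `ols_slope` and `slopes_by_group` are hypothetical helpers, the column names mirror those used earlier in this notebook, and a per-group slope describes an association rather than a causal effect.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Tuple


def ols_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Least-squares slope of ys on xs (simple regression with an intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError('x has no variation within this group')
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def slopes_by_group(rows: List[Dict[str, Any]], x: str, y: str, group: str) -> Dict[str, float]:
    """Fit a separate slope of `y` on `x` within each level of `group`."""
    points: Dict[str, Tuple[List[float], List[float]]] = defaultdict(lambda: ([], []))
    for row in rows:
        xs, ys = points[row[group]]
        xs.append(row[x])
        ys.append(row[y])
    return {level: ols_slope(xs, ys) for level, (xs, ys) in points.items()}


# Made-up county rows
rows = [
    {'rurality': 'metro', 'estabs_growth_rate': 1.0, 'net_job_creation_rate': 1.5},
    {'rurality': 'metro', 'estabs_growth_rate': 2.0, 'net_job_creation_rate': 2.0},
    {'rurality': 'metro', 'estabs_growth_rate': 3.0, 'net_job_creation_rate': 2.5},
    {'rurality': 'noncore', 'estabs_growth_rate': 0.0, 'net_job_creation_rate': 0.0},
    {'rurality': 'noncore', 'estabs_growth_rate': 2.0, 'net_job_creation_rate': 2.0},
    {'rurality': 'noncore', 'estabs_growth_rate': 4.0, 'net_job_creation_rate': 4.0},
]
print(slopes_by_group(rows, x='estabs_growth_rate', y='net_job_creation_rate', group='rurality'))
```

With real data, differing slopes across groups would only be a starting point; sampling noise and other county characteristics would need to be considered before drawing conclusions.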