This notebook checks the LEHD and ACS data sources used to compute the **UI coverage rate** in the UCLA paper. The LEHD website hosts multiple datasets; the one used for the calculation is probably the **RAC** tables in the LODES7 dataset. The download links and details about the datasets are given in the Data section below.

## Method
\begin{align*}
&\text{UI coverage rate} = \frac{\text{private-sector workers in the UI program}}{\text{number of workers in the private for-profit and nonprofit sectors}}\\
\\
&\text{non-UI coverage rate} = 1-\text{UI coverage rate}
\end{align*}

where the number of private-sector workers in the UI program comes from the Longitudinal Employer–Household Dynamics (LEHD) data for 2013‒17 (the five most recent years available), and the estimated labor force comes from the corresponding 2013‒17 ACS.
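As a quick check of the formula, the sketch below computes both rates for a single tract. The inputs are the counts that appear later in this notebook for Census Tract 2.02 in New York County (2200.65 UI-covered jobs and 2365 private wage and salary workers). The function name `ui_coverage_rates` is illustrative and does not come from the UCLA paper.

```python
def ui_coverage_rates(ui_workers, private_workers):
    """Return (UI coverage rate, non-UI coverage rate) for one area."""
    if private_workers <= 0:
        # The ratio is undefined when no private-sector workers are reported.
        return float("nan"), float("nan")
    ui_rate = ui_workers / private_workers
    return ui_rate, 1 - ui_rate


ui_rate, non_ui_rate = ui_coverage_rates(2200.65, 2365)
print(round(ui_rate, 6), round(non_ui_rate, 6))
```

Returning NaN for a zero denominator is one possible convention; it keeps undefined rates out of summary statistics.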


## Data
### ACS
- Download link: Estimated labor force: [Table DP03 2013‒17](https://data.census.gov/cedsci/table?q=unemployment&g=0500000US36005.140000,36047.140000,36061.140000,36081.140000,36085.140000&tid=ACSDP5Y2017.DP03&hidePreview=true) 
    - New York County, Kings County, Bronx County, Richmond County, and Queens County
- [Definition document](https://www2.census.gov/programs-surveys/acs/tech_docs/subject_definitions/2017_ACSSubjectDefinitions.pdf)
    - Labor force = civilian labor force + the U.S. Armed Forces
    - **Related field: Private wage and salary workers (DP03_0047E)** – Includes people who worked for wages, salary, commission, tips, pay-in-kind, or piece rates for a private, for-profit employer or a private not-for-profit, tax-exempt or charitable organization. (page 59, section: Class of Worker)
- [Estimation document](https://www2.census.gov/programs-surveys/acs/tech_docs/statistical_testing/2017StatisticalTesting5year.pdf)
    -  Multiyear estimates are **averages** of the characteristics over several years (page 1)


### LEHD
- Website: [LEHD](https://lehd.ces.census.gov/data/)
- LODES provides counts of wage and salary jobs covered by unemployment insurance, while the ACS estimates the labor force in the population. The ACS estimates come from samples of housing unit (HU) addresses and of persons in group quarters (GQ) facilities (page 5, section: Coverage, [paper](https://ideas.repec.org/p/cen/wpaper/14-38.html)).
- Download link: [LODES7](https://lehd.ces.census.gov/data/lodes/LODES7/ny/) (the 7 stands for Version 7, which is enumerated by 2010 census blocks)
    - download 2013-2017 data (5 csv in total)
    - (**download**) Census Block Code to Census Tract Code lookup table: ny_xwalk.csv.gz
    - (**download**) WAC: Workplace Area Characteristics
        - download the files named "[NY]\_wac\_[S000]\_[JT03]\_[YEAR]". "S000" means "Total number of jobs"; "JT03" means the "Private Primary Jobs" job type
        - each row: number of jobs located in the census block w_geocode
    - (**download**) RAC: Residence Area Characteristics
        - download the files named "[NY]\_rac\_[S000]\_[JT03]\_[YEAR]". "S000" means "Total number of jobs"; "JT03" means the "Private Primary Jobs" job type
        - each row: number of jobs held by workers who live in the census block h_geocode
    - (**not useful**) OD: Origin-Destination (home-work) table 
        - each row: number of jobs held by workers who live in h_geocode and work in w_geocode
- [data dictionary](https://lehd.ces.census.gov/data/lodes/LODES7/LODESTechDoc7.4.pdf)
    - **Related field: Total number of jobs (C000)**
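The naming pattern described above can be sketched as a small helper that builds the file names for each year. The function `lodes_filename` and its defaults are hypothetical conveniences, not part of LODES; the `.csv` extension assumes the downloaded `.csv.gz` files have already been unzipped.

```python
def lodes_filename(state, table, year, segment="S000", job_type="JT03", ext=".csv"):
    """Build a LODES7 WAC/RAC file name such as 'ny_rac_S000_JT03_2013.csv'."""
    if table not in ("wac", "rac"):
        raise ValueError("table must be 'wac' or 'rac'")
    return f"{state.lower()}_{table}_{segment}_{job_type}_{year}{ext}"


# One RAC file per year for 2013 through 2017.
rac_files = [lodes_filename("ny", "rac", year) for year in range(2013, 2018)]
print(rac_files)
```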
    
### Other information    
- Quarterly Workforce Indicators (QWI) 
    - QWI are reported by detailed firm characteristics. Each row gives the number of employees in a company. There is no county or census tract information
- Other websites
    - [NYC Local Employment Dynamics (LED)](https://www.labor.ny.gov/stats/lsled.shtm)

### Logs
- LODES7 covers every county in New York State. It is right-joined with the ACS dataset so that only the 5 counties concerned are kept
- To be consistent with the ACS data, the RAC table should be used instead of WAC, because the ACS counts people where they live rather than where they work
- About half of the UI coverage rates exceed 100% (1016 out of 2117). The related fields may have been chosen incorrectly; this needs to be discussed with the UCLA team. The calculation currently uses Private wage and salary workers (DP03_0047E) for ACS and Total number of jobs (C000) for RAC (All Private Jobs)

## Load packages

In [1]:
import pandas as pd
import numpy as np

## Clean LEHD data

In [3]:
path = '../data/UI_coverage/'
wac = {}   # year -> Workplace Area Characteristics (WAC) table
rac = {}   # year -> Residence Area Characteristics (RAC) table
for i in range(2013, 2018):   # 2013 through 2017
    wac[i] = pd.read_csv(path + 'ny_wac_S000_JT03_' + str(i) + '.csv', low_memory=False)
    rac[i] = pd.read_csv(path + 'ny_rac_S000_JT03_' + str(i) + '.csv', low_memory=False)

# Census Block Code to Census Tract Code lookup table
code = pd.read_csv(path + 'ny_xwalk.csv', low_memory=False)

In [49]:
code.head()

Unnamed: 0,tabblk2010,st,stusps,stname,cty,ctyname,trct,trctname,bgrp,bgrpname,...,stanrcname,necta,nectaname,mil,milname,stwib,stwibname,blklatdd,blklondd,createdate
0,360319614002351,36,NY,New York,36031,"Essex County, NY",36031961400,"9614 (Essex, NY)",360319614002,"2 (Tract 9614, Essex, NY)",...,,99999,,,,36361501,Clinton/Essex/Franklin/Hamilton LWIA,43.85237,-74.250919,20190826
1,360419504001095,36,NY,New York,36041,"Hamilton County, NY",36041950400,"9504 (Hamilton, NY)",360419504001,"1 (Tract 9504, Hamilton, NY)",...,,99999,,,,36361501,Clinton/Essex/Franklin/Hamilton LWIA,43.360124,-74.288976,20190826
2,360419501001004,36,NY,New York,36041,"Hamilton County, NY",36041950100,"9501 (Hamilton, NY)",360419501001,"1 (Tract 9501, Hamilton, NY)",...,,99999,,,,36361501,Clinton/Essex/Franklin/Hamilton LWIA,44.066622,-74.388661,20190826
3,360419501001267,36,NY,New York,36041,"Hamilton County, NY",36041950100,"9501 (Hamilton, NY)",360419501001,"1 (Tract 9501, Hamilton, NY)",...,,99999,,,,36361501,Clinton/Essex/Franklin/Hamilton LWIA,44.008256,-74.522291,20190826
4,360319605982039,36,NY,New York,36031,"Essex County, NY",36031960598,"9605.98 (Essex, NY)",360319605982,"2 (Tract 9605.98, Essex, NY)",...,,99999,,,,36361501,Clinton/Essex/Franklin/Hamilton LWIA,44.265729,-73.977385,20190826


In [50]:
# each row: number of jobs located in w_geocode (workplace census block)
wac[2013].head()

Unnamed: 0,w_geocode,C000,CA01,CA02,CA03,CE01,CE02,CE03,CNS01,CNS02,...,CFA02,CFA03,CFA04,CFA05,CFS01,CFS02,CFS03,CFS04,CFS05,createdate
0,360010001001004,11,0,10,1,0,0,11,0,0,...,0,0,0,0,0,0,0,0,0,20160219
1,360010001001005,57,12,37,8,1,20,36,0,0,...,0,0,0,0,0,0,0,0,0,20160219
2,360010001001008,373,34,248,91,15,52,306,0,0,...,0,0,0,0,0,0,0,0,0,20160219
3,360010001001009,2,0,2,0,0,2,0,0,0,...,0,0,0,0,0,0,0,0,0,20160219
4,360010001001010,8,3,3,2,3,5,0,0,0,...,0,0,0,0,0,0,0,0,0,20160219


### Get 5-year average numbers of workers in UI program

In [51]:
# select useful columns
# C000: total jobs  
for i in range(2013,2018):
    wac[i] = wac[i][['w_geocode', 'C000']].rename(columns={'C000':'total_ui_jobs'})
    rac[i] = rac[i][['h_geocode', 'C000']].rename(columns={'C000':'total_ui_jobs'})
    
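# calculate 5-year average numbers of workers in UI program:
# stack the yearly tables, drop the year level of the index, then average each block over the years in which it appears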
wac_avg = pd.concat(wac,axis=0).droplevel(-2).groupby(['w_geocode']).agg({'total_ui_jobs':'mean'}).reset_index()
rac_avg = pd.concat(rac,axis=0).droplevel(-2).groupby(['h_geocode']).agg({'total_ui_jobs':'mean'}).reset_index()
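Note that `groupby(...).agg('mean')` averages each block only over the years in which it has a row. If a block that is missing from a year's file had zero jobs that year (an assumption about how LODES is published that should be checked against the technical documentation), dividing the block total by the number of years gives a lower average. The toy tables below are made up to show the difference; they are not LODES data.

```python
import pandas as pd

# Toy stand-in for the per-year RAC tables (hypothetical block codes and counts).
toy_rac = {
    2013: pd.DataFrame({"h_geocode": [1, 2], "total_ui_jobs": [10, 4]}),
    2014: pd.DataFrame({"h_geocode": [1], "total_ui_jobs": [20]}),
    2015: pd.DataFrame({"h_geocode": [1, 2], "total_ui_jobs": [30, 8]}),
}
n_years = len(toy_rac)

stacked = pd.concat(list(toy_rac.values()), ignore_index=True)
by_block = stacked.groupby("h_geocode")["total_ui_jobs"]

# Mean over the years in which a block appears (what groupby(...).mean() does).
mean_present = by_block.mean()
# Mean over all years, counting absent years as zero jobs.
mean_zero_filled = by_block.sum() / n_years

print(pd.DataFrame({"mean_present": mean_present,
                    "mean_zero_filled": mean_zero_filled}))
```

Block 2 is absent in 2014, so the two averages differ for it, while block 1 is unaffected.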

### Group by census tract

In [52]:
code = code[['tabblk2010','trct','ctyname']]
# tabblk2010: 2010 Census Tabulation Block Code 
# trct: Census Tract Code

In [53]:
# map Census Block Code to Census Tract Code
print('wac shape: ', wac_avg.shape)
print('rac shape: ', rac_avg.shape)
wac_avg = wac_avg.merge(code, left_on='w_geocode', right_on='tabblk2010').rename(columns={'trct':'GEO_ID'})
rac_avg = rac_avg.merge(code, left_on='h_geocode', right_on='tabblk2010').rename(columns={'trct':'GEO_ID'})
# check missing after merging
print('wac shape after merge: ', wac_avg.shape)
print('rac shape after merge: ', rac_avg.shape)

wac shape:  (128941, 2)
rac shape:  (241838, 2)
wac shape after merge:  (128941, 5)
rac shape after merge:  (241838, 5)
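Comparing shapes before and after the merge shows that no rows were lost, but it would not reveal duplicated crosswalk keys that happen to offset dropped rows. A slightly stricter check, sketched below on made-up block codes, uses the `validate` and `indicator` options of `DataFrame.merge`. It also cross-checks the tract code against the first 11 digits of the 15-digit block code, which is the pattern visible in the crosswalk rows printed earlier.

```python
import pandas as pd

# Toy block-level averages and crosswalk (hypothetical codes and counts).
blocks = pd.DataFrame({
    "h_geocode": [360610001001000, 360610001001001, 360610002011000],
    "total_ui_jobs": [5.0, 7.0, 3.0],
})
xwalk = pd.DataFrame({
    "tabblk2010": [360610001001000, 360610001001001],
    "trct": [36061000100, 36061000100],
})

merged = blocks.merge(
    xwalk, left_on="h_geocode", right_on="tabblk2010",
    how="left",              # keep every block, matched or not
    validate="many_to_one",  # raise if a block code repeats in the crosswalk
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)
unmatched = merged[merged["_merge"] == "left_only"]
print("blocks without a tract:", len(unmatched))

# Cross-check: the tract code is the block code without its last 4 digits.
merged["trct_from_block"] = merged["h_geocode"] // 10_000
matched = merged[merged["_merge"] == "both"]
print("tract codes agree:", (matched["trct"] == matched["trct_from_block"]).all())
```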


In [54]:
# aggregate the block-level UI job averages up to census tracts
wac_avg = wac_avg.groupby(['GEO_ID','ctyname']).agg({'total_ui_jobs':'sum'}).reset_index()
wac_avg['GEO_ID'] = '1400000US' + wac_avg['GEO_ID'].astype(str)

rac_avg = rac_avg.groupby(['GEO_ID','ctyname']).agg({'total_ui_jobs':'sum'}).reset_index()
rac_avg['GEO_ID'] = '1400000US' + rac_avg['GEO_ID'].astype(str)

wac_avg.head()

Unnamed: 0,GEO_ID,ctyname,total_ui_jobs
0,1400000US36001000100,"Albany County, NY",1626.833333
1,1400000US36001000200,"Albany County, NY",2522.2
2,1400000US36001000300,"Albany County, NY",10137.1
3,1400000US36001000401,"Albany County, NY",8347.533333
4,1400000US36001000403,"Albany County, NY",1155.583333


### Save cleaned WAC and RAC

In [55]:
wac_avg.to_csv(path+'ny_wac_cleaned.csv', index=False)
rac_avg.to_csv(path+'ny_rac_cleaned.csv', index=False)

## Clean ACS data

In [56]:
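# load acs data
# DP03_0047E: Private wage and salary workers
# DP03_0003E: Civilian labor force (not used below)
# NOTE: the file name points to ACSDP5Y2018, whereas the download link in the Data section is for ACSDP5Y2017 (2013-17); confirm which file is intended
acs = pd.read_csv(path+'ACSDP5Y2018.DP03_data_with_overlays.csv', low_memory=False)
# row 0 appears to hold the column descriptions of the data_with_overlays export, so keep rows from 1 onward
acs = acs.loc[1:,['GEO_ID','NAME','DP03_0047E']].rename(columns={'DP03_0047E':'total_jobs'})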
acs.head()

Unnamed: 0,GEO_ID,NAME,total_jobs
1,1400000US36061000100,"Census Tract 1, New York County, New York",0
2,1400000US36061000201,"Census Tract 2.01, New York County, New York",892
3,1400000US36061000202,"Census Tract 2.02, New York County, New York",2365
4,1400000US36061000500,"Census Tract 5, New York County, New York",0
5,1400000US36061000600,"Census Tract 6, New York County, New York",2481
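Because the first data row is dropped with `.loc[1:]` after loading, the count column is still read as strings. An alternative sketch is to skip that row while reading and convert with `pd.to_numeric`, which turns non-numeric entries into NaN instead of raising. The in-memory file below is a toy: the label row text and the third record (with a `-` placeholder) are made up for illustration.

```python
import io
import pandas as pd

# Toy file: a header row of codes, a label row, then data rows.
raw = io.StringIO(
    "GEO_ID,NAME,DP03_0047E\n"
    "id,Geographic Area Name,Private wage and salary workers\n"
    '1400000US36061000100,"Census Tract 1, New York County, New York",0\n'
    '1400000US36061000201,"Census Tract 2.01, New York County, New York",892\n'
    '1400000US00000000000,"Made-up tract with a missing estimate",-\n'
)

# skiprows=[1] drops the second line of the file (the label row).
acs_toy = pd.read_csv(raw, skiprows=[1])
acs_toy = acs_toy.rename(columns={"DP03_0047E": "total_jobs"})
# Non-numeric placeholders become NaN rather than raising an error.
acs_toy["total_jobs"] = pd.to_numeric(acs_toy["total_jobs"], errors="coerce")
print(acs_toy[["GEO_ID", "total_jobs"]])
```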


In [57]:
# # remove prefix in GEO_ID
# import re
# # r = r'(?<=[a-zA-Z])\d+'
# r = r'\d+[a-zA-Z]{2}'
# acs['GEO_ID'] = [re.sub(r, '', x) for x in acs['GEO_ID']]

### Save cleaned ACS

In [58]:
acs.to_csv(path+'ACS_DP03_cleaned.csv', index=False)

## Calculate UI coverage rate

In [62]:
# convert total_jobs from string to float
acs['total_jobs'] = acs['total_jobs'].astype('float64')

ui_coverage = acs.merge(rac_avg, on='GEO_ID', how='left')
ui_coverage['ui_rate'] = ui_coverage['total_ui_jobs'] / ui_coverage['total_jobs']
ui_coverage = ui_coverage[['GEO_ID', 'NAME', 'total_ui_jobs', 'total_jobs', 'ui_rate']]
print('# NA:', sum(ui_coverage['ui_rate'].isna()))
print('# ui rate>1:', sum(ui_coverage['ui_rate']>1))
print('# ui rate=infinity:',ui_coverage[np.isinf(ui_coverage.ui_rate)].shape[0])
print(ui_coverage[~np.isinf(ui_coverage.ui_rate)].ui_rate.describe())
ui_coverage

# NA: 0
# ui rate>1: 607
# ui rate=infinity: 50
count    2117.000000
mean        0.977779
std         0.819949
min         0.404375
25%         0.793393
50%         0.893527
75%         1.007706
max        27.950000
Name: ui_rate, dtype: float64


Unnamed: 0,GEO_ID,NAME,total_ui_jobs,total_jobs,ui_rate
0,1400000US36061000100,"Census Tract 1, New York County, New York",158.200000,0.0,inf
1,1400000US36061000201,"Census Tract 2.01, New York County, New York",976.800000,892.0,1.095067
2,1400000US36061000202,"Census Tract 2.02, New York County, New York",2200.650000,2365.0,0.930507
3,1400000US36061000500,"Census Tract 5, New York County, New York",153.300000,0.0,inf
4,1400000US36061000600,"Census Tract 6, New York County, New York",3199.950000,2481.0,1.289782
...,...,...,...,...,...
2162,1400000US36085030302,"Census Tract 303.02, Richmond County, New York",2102.000000,2139.0,0.982702
2163,1400000US36085031901,"Census Tract 319.01, Richmond County, New York",1012.700000,601.0,1.685025
2164,1400000US36085031902,"Census Tract 319.02, Richmond County, New York",1462.666667,1581.0,0.925153
2165,1400000US36085032300,"Census Tract 323, Richmond County, New York",472.950000,421.0,1.123397
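The infinite rates come from tracts where the ACS reports zero private wage and salary workers. One way to keep them out of the summary statistics, sketched below on the first three rows of the table above, is to replace zero denominators with NaN before dividing. This is only one possible convention; whether such tracts should be dropped or handled differently is a separate decision.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "GEO_ID": ["1400000US36061000100", "1400000US36061000201", "1400000US36061000202"],
    "total_ui_jobs": [158.2, 976.8, 2200.65],
    "total_jobs": [0.0, 892.0, 2365.0],
})

# Treat a zero denominator as "rate undefined" rather than letting it become inf.
denominator = toy["total_jobs"].replace(0, np.nan)
toy["ui_rate"] = toy["total_ui_jobs"] / denominator
toy["non_ui_rate"] = 1 - toy["ui_rate"]

print("undefined rates:", toy["ui_rate"].isna().sum())
print("rates above 1:", (toy["ui_rate"] > 1).sum())
```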


In [60]:
ui_coverage[ui_coverage.ui_rate>10]

Unnamed: 0,GEO_ID,NAME,total_ui_jobs,total_jobs,ui_rate
0,1400000US36061000100,"Census Tract 1, New York County, New York",158.2,0.0,inf
3,1400000US36061000500,"Census Tract 5, New York County, New York",153.3,0.0,inf
84,1400000US36061008602,"Census Tract 86.02, New York County, New York",134.2,0.0,inf
101,1400000US36061010200,"Census Tract 102, New York County, New York",442.0,43.0,10.27907
142,1400000US36061014300,"Census Tract 143, New York County, New York",137.2,0.0,inf
223,1400000US36061021703,"Census Tract 217.03, New York County, New York",111.8,4.0,27.95
279,1400000US36061029700,"Census Tract 297, New York County, New York",142.6,0.0,inf
284,1400000US36061031100,"Census Tract 311, New York County, New York",128.0,0.0,inf
287,1400000US36061031900,"Census Tract 319, New York County, New York",162.2,0.0,inf
288,1400000US36005000100,"Census Tract 1, Bronx County, New York",1125.6,0.0,inf
