Standardizing BLS CES NAICS to IPUMS ACS NAICS codes
This repository attempts to join BLS CES data to 2014-18 IPUMS ACS data in order to estimate job losses at the PUMA level and higher geographies, along with other estimates that rely on ACS data.
Data are organized in the data directory and programs in the scripts directory. IPUMS extracts must include the INDNAICS variable. Place both the DDI file usa_xxxxx.xml and the data file in the data/ipums folder.
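Before running the scripts, it can help to confirm that the extract actually includes INDNAICS. The following is a minimal Python sketch, not part of this repository. It assumes the DDI codebook lists each variable as a var element with a name attribute; the helper ddi_variable_names and the sample XML are hypothetical and for illustration only.

```python
import xml.etree.ElementTree as ET


def ddi_variable_names(ddi_xml_text):
    """Return the set of variable names found in a DDI codebook string.

    Assumption: each variable appears as a <var name="..."> element.
    """
    root = ET.fromstring(ddi_xml_text)
    names = set()
    for element in root.iter():
        # Strip any XML namespace, e.g. "{ddi:codebook:2_5}var" -> "var".
        local_tag = element.tag.rsplit("}", 1)[-1]
        if local_tag == "var" and "name" in element.attrib:
            names.add(element.attrib["name"])
    return names


# Hypothetical, heavily trimmed stand-in for a real DDI file.
sample_ddi = """<codeBook xmlns="ddi:codebook:2_5">
  <dataDscr>
    <var ID="PUMA" name="PUMA"/>
    <var ID="INDNAICS" name="INDNAICS"/>
  </dataDscr>
</codeBook>"""

has_indnaics = "INDNAICS" in ddi_variable_names(sample_ddi)
```

For a real extract, the same function could be given the text of the DDI file in data/ipums instead of the sample string.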
Standardizing CES and IPUMS ACS NAICS codes
We went through a manual process to standardize the codes. You can find our decisions and notes on the decisions in this Excel document: https://github.com/UI-Research/ipums-acs-naics-standardization/blob/master/data/manual-files/2017-industry-code-list-ces-crosswalk-manual.xlsx. You can find the machine readable CSV used for analysis here: https://github.com/UI-Research/ipums-acs-naics-standardization/blob/master/data/processed-data/2017-ind-ces-crosswalk.csv.
Use this dataset with care. It works like an instruction manual, where each row is a unique ces_code, IND combination. Operate on values at the ces_code level, then summarize them up to the IND level. The file contains the following variables:
- IND: 4-digit ACS Industry classification.
- naics: NAICS classification per ACS provided crosswalk.
- ces_code: The 8-digit CES series code from BLS.
- led_code: The 2-digit NAICS code.
- formula_type:
  - single denotes no formula, just one ces_code for the IND
  - formula denotes a formula for using multiple ces_code to summarize at the IND level, with the formula provided by operator
- operator: + or - values denote addition or subtraction necessary to summarize at the IND level. E.g., 32311100 + and 32311200 + for IND = 1070 means that to summarize to IND = 1070, you should add ces_code 32311100 to 32311200.
- series_id: The series_id column from the CES data corresponding to ces_code that should be pulled for seasonally adjusted total employment.
- recency: The number of months this particular series_id lags behind the most recent CES data release. 1 means that, for example, when April data are released, that series_id is just releasing new March data, so it is 1 month behind.
- parent: The parent ces_code for rows where recency = 1 that provide industry categories that contain the ces_code in question, but are more broad, and have recency = 0. This column is for helping adjust ces_code with recency = 1 to the current month using imputation from the broader category.
- parent_series_id: The series_id column from the CES data corresponding to parent that should be pulled for seasonally adjusted total employment.
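The operator and parent columns can be illustrated with a minimal Python sketch. It is not the repository's code. Only IND 1070 and its two ces_code values come from the example above; every other code, all employment figures, and the function names summarize_to_ind and carry_forward are hypothetical. Two assumptions are made: rows with formula_type single have a blank operator and are treated as addition, and a lagged series is brought to the current month by scaling it with the parent's month-over-month growth. The imputation method is not specified here, so the second function shows only one possible approach.

```python
from collections import defaultdict

# Hypothetical rows shaped like the crosswalk CSV (subset of columns).
crosswalk = [
    {"IND": "1070", "ces_code": "32311100", "formula_type": "formula", "operator": "+"},
    {"IND": "1070", "ces_code": "32311200", "formula_type": "formula", "operator": "+"},
    # Made-up codes showing subtraction and the "single" case.
    {"IND": "9991", "ces_code": "00000001", "formula_type": "formula", "operator": "+"},
    {"IND": "9991", "ces_code": "00000002", "formula_type": "formula", "operator": "-"},
    {"IND": "9992", "ces_code": "00000003", "formula_type": "single", "operator": ""},
]

# Made-up employment values keyed by ces_code; not real CES data.
employment = {
    "32311100": 100.0,
    "32311200": 50.0,
    "00000001": 80.0,
    "00000002": 30.0,
    "00000003": 20.0,
}


def summarize_to_ind(crosswalk_rows, employment_by_ces):
    """Add or subtract ces_code values, per operator, to get IND totals."""
    totals = defaultdict(float)
    for row in crosswalk_rows:
        value = employment_by_ces[row["ces_code"]]
        # Assumption: a blank operator (formula_type "single") means addition.
        sign = -1.0 if row["operator"] == "-" else 1.0
        totals[row["IND"]] += sign * value
    return dict(totals)


def carry_forward(lagged_value, parent_previous, parent_current):
    """Assumed imputation: scale a recency = 1 value by its parent's growth."""
    return lagged_value * parent_current / parent_previous


ind_employment = summarize_to_ind(crosswalk, employment)
adjusted = carry_forward(100.0, parent_previous=200.0, parent_current=190.0)
```

In this sketch, the adjustment for recency = 1 series would be applied to each ces_code value before summarizing to the IND level, since the crosswalk is meant to be operated on at the ces_code level.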