# NAICS to FIPS Data Mapping

### Table Index for Census NAICS data

    Field Name      Data Type   Description
    FIPSTATE        C           FIPS State Code
    FIPSCTY         C           FIPS County Code
    NAICS           C           Industry Code - 6-digit NAICS code.
    EMP_NF          C           Total Mid-March Employees Noise Flag (See all Noise Flag definitions at the end of this record layout)
    EMP             N           Total Mid-March Employees with Noise
    QP1_NF          C           Total First Quarter Payroll Noise Flag
    QP1             N           Total First Quarter Payroll ($1,000) with Noise
    AP_NF           C           Total Annual Payroll Noise Flag
    AP              N           Total Annual Payroll ($1,000) with Noise
    EST             N           Total Number of Establishments
    N<5             N           Number of Establishments: Less than 5 Employee Size Class
    N5_9            N           Number of Establishments: 5-9 Employee Size Class
    N10_19          N           Number of Establishments: 10-19 Employee Size Class
    N20_49          N           Number of Establishments: 20-49 Employee Size Class
    N50_99          N           Number of Establishments: 50-99 Employee Size Class
    N100_249        N           Number of Establishments: 100-249 Employee Size Class
    N250_499        N           Number of Establishments: 250-499 Employee Size Class
    N500_999        N           Number of Establishments: 500-999 Employee Size Class
    N1000           N           Number of Establishments: 1,000 or More Employee Size Class
    N1000_1         N           Number of Establishments: Employment Size Class: 1,000-1,499 Employees
    N1000_2         N           Number of Establishments: Employment Size Class: 1,500-2,499 Employees
    N1000_3         N           Number of Establishments: Employment Size Class: 2,500-4,999 Employees
    N1000_4         N           Number of Establishments: Employment Size Class: 5,000 or More Employees
    CENSTATE        C           Census State Code
    CENCTY          C           Census County Code

NOTE: Noise Flag definitions (fields ending in _NF) are:

    G       0 to < 2% noise (low noise)
    H       2 to < 5% noise (medium noise)
    J       >= 5% noise (high noise)

Flag definition for Establishment by Employment Size Class fields (N<5, N5_9, etc.):

    N       Not available or not comparable
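
Because the size-class fields use `N` to mark unavailable counts, those columns will not parse as numbers directly. A minimal sketch, assuming a small made-up sample (the column values and the `noise_labels` name are illustrative, not taken from the Census file), of converting such a column with `pd.to_numeric` and attaching readable noise labels:

```python
import pandas as pd

# Hypothetical sample rows; values are illustrative only.
sample = pd.DataFrame({
    "emp_nf": ["G", "H", "J"],
    "emp": [11265, 92, 82],
    "n20_49": ["104", "N", "N"],
})

# Noise flag meanings, copied from the record layout above.
noise_labels = {
    "G": "low (0 to < 2%)",
    "H": "medium (2 to < 5%)",
    "J": "high (>= 5%)",
}
sample["emp_noise"] = sample["emp_nf"].map(noise_labels)

# 'N' (not available or not comparable) becomes NaN instead of raising an error.
sample["n20_49"] = pd.to_numeric(sample["n20_49"], errors="coerce")

print(sample)
```

Coercing to `NaN` keeps the unavailable cells distinguishable from a true count of zero.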

### Data Source

https://www.census.gov/data/datasets/2019/econ/cbp/2019-cbp.html

### Correcting for 7 counties with missing data

Seven counties with very small populations required assumptions for this exercise.

    King County, Texas (population 272) and

    Kalawao County, Hawaii (population 86)

were absent from the census data. For these, we use the state-level data for Texas and Hawaii, respectively, as a proxy for each county's dominant industry.

Similarly,

    Petroleum County, Montana

    Arthur County, Nebraska

    Banner County, Nebraska

    Esmeralda County, Nevada

    Borden County, Texas

had "Total County" employment numbers but no industry-specific figures. For these, we also use the state-level data for Montana, Nebraska, Nevada, and Texas, respectively, as a proxy for each county's dominant industry.
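
The proxy assignments described above can be summarized as a lookup table. This is only a sketch: the dictionary name `county_proxy` is hypothetical, and it assumes the state-level rows are identified by a county code of `999`, as the notebook does further down. The county codes are the standard five-digit FIPS codes for these counties.

```python
# County FIPS -> state-level ('999') FIPS code used as its proxy.
# The name county_proxy is hypothetical and used only for illustration.
county_proxy = {
    "48269": "48999",  # King County, Texas
    "15005": "15999",  # Kalawao County, Hawaii
    "30069": "30999",  # Petroleum County, Montana
    "31005": "31999",  # Arthur County, Nebraska
    "31007": "31999",  # Banner County, Nebraska
    "32009": "32999",  # Esmeralda County, Nevada
    "48033": "48999",  # Borden County, Texas
}

# Each county must share its two-digit state prefix with its proxy.
for county, proxy in county_proxy.items():
    assert county[:2] == proxy[:2]

print(len(county_proxy), "counties,", len(set(county_proxy.values())), "state proxies")
```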

## Import Libraries

In [1]:
import numpy as np
import pandas as pd

## Read in raw county NAICS data and NAICS code to description mapping

In [2]:
dat = pd.read_csv("cbp20co.txt")
map_dict = pd.read_csv("naics2017.txt", encoding='cp1252')
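
With default settings, `pd.read_csv` parses the FIPS columns as integers, which drops leading zeros; that is why they are padded again further down. As an alternative sketch, assuming the raw file stores the codes with leading zeros under `fipstate` and `fipscty` headers, the codes can be kept as strings at read time. The inline sample stands in for `cbp20co.txt`, and its values are illustrative:

```python
import io
import pandas as pd

# Small inline stand-in for the county file; values are illustrative only.
raw = io.StringIO(
    "fipstate,fipscty,naics,emp\n"
    "01,001,------,11265\n"
    "01,001,11----,92\n"
)

# dtype=str on the code columns preserves leading zeros.
sample = pd.read_csv(raw, dtype={"fipstate": str, "fipscty": str, "naics": str})

print(sample.dtypes)
print(sample["fipstate"].iloc[0] + sample["fipscty"].iloc[0])
```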

In [3]:
# View the NAICS data structure
dat.head()

Unnamed: 0,fipstate,fipscty,naics,emp_nf,emp,qp1_nf,qp1,ap_nf,ap,est,...,n20_49,n50_99,n100_249,n250_499,n500_999,n1000,n1000_1,n1000_2,n1000_3,n1000_4
0,1,1,------,G,11265,G,94865,G,385785,879,...,104,32,9,N,N,N,N,N,N,N
1,1,1,11----,H,92,G,1183,H,5232,10,...,N,N,N,N,N,N,N,N,N,N
2,1,1,113///,H,82,G,1075,G,4741,7,...,N,N,N,N,N,N,N,N,N,N
3,1,1,1133//,H,82,G,1075,G,4741,7,...,N,N,N,N,N,N,N,N,N,N
4,1,1,11331/,H,82,G,1075,G,4741,7,...,N,N,N,N,N,N,N,N,N,N


In [4]:
# View the NAICS code-to-description structure
map_dict.head()

Unnamed: 0,NAICS,DESCRIPTION
0,------,Total for all sectors
1,11----,"Agriculture, Forestry, Fishing and Hunting"
2,113///,Forestry and Logging
3,1131//,Timber Tract Operations
4,11311/,Timber Tract Operations


## Transform NAICS code table to mapping dictionary

In [5]:
map_replace = dict(zip(map_dict.NAICS, map_dict.DESCRIPTION))

In [6]:
dat['naics'] = dat['naics'].map(map_replace)
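
`Series.map` with a dictionary returns `NaN` for any key missing from the dictionary, so a code absent from the description file would silently lose its label. A small check, sketched here on made-up data (the code `ZZ----` is invented purely to show an unmatched key), can surface such codes before the column is overwritten:

```python
import pandas as pd

# Illustrative lookup and codes; 'ZZ----' is a made-up code with no description.
lookup = {"------": "Total for all sectors", "113///": "Forestry and Logging"}
codes = pd.Series(["------", "113///", "ZZ----"])

mapped = codes.map(lookup)

# Codes whose description came back as NaN were not in the lookup.
unmatched = codes[mapped.isna()].unique().tolist()
print(unmatched)
```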

In [7]:
dat.head(3)

Unnamed: 0,fipstate,fipscty,naics,emp_nf,emp,qp1_nf,qp1,ap_nf,ap,est,...,n20_49,n50_99,n100_249,n250_499,n500_999,n1000,n1000_1,n1000_2,n1000_3,n1000_4
0,1,1,Total for all sectors,G,11265,G,94865,G,385785,879,...,104,32,9,N,N,N,N,N,N,N
1,1,1,"Agriculture, Forestry, Fishing and Hunting",H,92,G,1183,H,5232,10,...,N,N,N,N,N,N,N,N,N,N
2,1,1,Forestry and Logging,H,82,G,1075,G,4741,7,...,N,N,N,N,N,N,N,N,N,N


In [8]:
# Add leading zeros: county codes to 3 digits, state codes to 2 digits
dat['fipscty']=dat['fipscty'].apply(lambda x: '{0:0>3}'.format(x))
dat['fipstate']=dat['fipstate'].apply(lambda x: '{0:0>2}'.format(x))

In [9]:
# Combine state and county FIPS codes into a 5-digit FIPS
dat['FIPS'] = dat['fipstate'].astype(str) + dat['fipscty'].astype(str)
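
The same padding can also be written with the vectorized string method `str.zfill`, which avoids a Python-level `apply`. A brief sketch on illustrative values (the `sample` frame is not the notebook's data):

```python
import pandas as pd

# Illustrative integer codes, as they appear after a default read_csv.
sample = pd.DataFrame({"fipstate": [1, 15, 48], "fipscty": [1, 5, 269]})

# Convert to string, then left-pad with zeros to the fixed FIPS widths.
state = sample["fipstate"].astype(str).str.zfill(2)
county = sample["fipscty"].astype(str).str.zfill(3)
sample["FIPS"] = state + county

print(sample["FIPS"].tolist())  # ['01001', '15005', '48269']
```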

In [10]:
# Hold out the whole-state rows (FIPS ending in '999') for Hawaii, Montana, Nebraska, Nevada, and Texas (15999, 30999, 31999, 32999, 48999); they fill in the 7 counties described above
hold_st_list = ('15999', '30999', '31999', '32999', '48999')

In [11]:
# Select the held-out whole-state rows
hold_dat = dat.loc[dat['FIPS'].isin(hold_st_list)]

In [12]:
dat.head(3)

Unnamed: 0,fipstate,fipscty,naics,emp_nf,emp,qp1_nf,qp1,ap_nf,ap,est,...,n50_99,n100_249,n250_499,n500_999,n1000,n1000_1,n1000_2,n1000_3,n1000_4,FIPS
0,1,1,Total for all sectors,G,11265,G,94865,G,385785,879,...,32,9,N,N,N,N,N,N,N,1001
1,1,1,"Agriculture, Forestry, Fishing and Hunting",H,92,G,1183,H,5232,10,...,N,N,N,N,N,N,N,N,N,1001
2,1,1,Forestry and Logging,H,82,G,1075,G,4741,7,...,N,N,N,N,N,N,N,N,N,1001


In [13]:
# Now remove all '999' whole-state rows from the county data
dat = dat[~dat['FIPS'].str.endswith('999')]

In [14]:
# Remove rows whose NAICS description is 'Total for all sectors', from both dat and hold_dat
dat = dat[~dat['naics'].str.endswith('Total for all sectors')]
hold_dat = hold_dat[~hold_dat['naics'].str.endswith('Total for all sectors')]

In [15]:
# Reduce df Columns
mycols = ['FIPS','naics','emp']
dat_slim = dat[mycols]
hold_dat_slim = hold_dat[mycols]

In [16]:
#dat_slim
dat_slim.head(3)

Unnamed: 0,FIPS,naics,emp
1,1001,"Agriculture, Forestry, Fishing and Hunting",92
2,1001,Forestry and Logging,82
3,1001,Logging,82


In [17]:
hold_dat_slim.head(3)

Unnamed: 0,FIPS,naics,emp
223112,15999,Wholesale Trade,474
223113,15999,"Merchant Wholesalers, Durable Goods",277
223114,15999,Professional and Commercial Equipment and Supp...,247


In [18]:
# One copy of the state-level rows per county to fill in (.copy() avoids SettingWithCopyWarning)
hold_hi_kalawao = hold_dat_slim.loc[hold_dat_slim['FIPS'] == '15999'].copy()
hold_mt_petroleum = hold_dat_slim.loc[hold_dat_slim['FIPS'] == '30999'].copy()
hold_ne_arthur = hold_dat_slim.loc[hold_dat_slim['FIPS'] == '31999'].copy()
hold_ne_banner = hold_dat_slim.loc[hold_dat_slim['FIPS'] == '31999'].copy()
hold_nv_esmeralda = hold_dat_slim.loc[hold_dat_slim['FIPS'] == '32999'].copy()
hold_tx_king = hold_dat_slim.loc[hold_dat_slim['FIPS'] == '48999'].copy()
hold_tx_borden = hold_dat_slim.loc[hold_dat_slim['FIPS'] == '48999'].copy()

In [19]:
# Relabel each copy with its county FIPS code (King County, TX is 48269; Borden County, TX is 48033)
hold_hi_kalawao['FIPS'] = hold_hi_kalawao['FIPS'].replace('15999','15005')
hold_mt_petroleum['FIPS'] = hold_mt_petroleum['FIPS'].replace('30999','30069')
hold_ne_arthur['FIPS'] = hold_ne_arthur['FIPS'].replace('31999','31005')
hold_ne_banner['FIPS'] = hold_ne_banner['FIPS'].replace('31999','31007')
hold_nv_esmeralda['FIPS'] = hold_nv_esmeralda['FIPS'].replace('32999','32009')
hold_tx_king['FIPS'] = hold_tx_king['FIPS'].replace('48999','48269')
hold_tx_borden['FIPS'] = hold_tx_borden['FIPS'].replace('48999','48033')

In [20]:
hold_dat_slim_new = pd.concat([hold_hi_kalawao, hold_mt_petroleum, hold_ne_arthur, hold_ne_banner, hold_nv_esmeralda, hold_tx_king, hold_tx_borden], ignore_index=True, axis=0)

In [21]:
dat_slim_new = pd.concat([dat_slim, hold_dat_slim_new], ignore_index=True, axis=0)
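
Before relying on the combined table, it may be worth confirming that the proxy rows added exactly the intended counties and did not duplicate any that were already present. A sketch with toy frames (the names `counties`, `proxies`, and `combined`, and all values, are illustrative rather than the notebook's data):

```python
import pandas as pd

# Toy stand-ins for the county table and the relabelled proxy rows.
counties = pd.DataFrame({
    "FIPS": ["01001", "01003"],
    "naics": ["Retail Trade", "Construction"],
    "emp": [50, 40],
})
proxies = pd.DataFrame({
    "FIPS": ["15005", "48269"],
    "naics": ["Wholesale Trade", "Utilities"],
    "emp": [474, 120],
})

# No proxy county should already exist in the county table.
overlap = set(counties["FIPS"]) & set(proxies["FIPS"])
assert not overlap, f"proxy counties already present: {overlap}"

combined = pd.concat([counties, proxies], ignore_index=True)

# The unique FIPS count should grow by exactly the number of proxy counties.
added = combined["FIPS"].nunique() - counties["FIPS"].nunique()
print(added)
```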

In [22]:
# Number of unique NAICS descriptions (before adding the proxy counties)
len(pd.unique(dat_slim['naics']))

1369

In [23]:
# Number of unique NAICS descriptions (after adding the proxy counties)
len(pd.unique(dat_slim_new['naics']))

1369

In [24]:
# Number of FIPS codes (should be 3135)
len(pd.unique(dat_slim['FIPS']))

3135

In [25]:
# Number of FIPS codes after adding the proxy counties (should be 3142)
len(pd.unique(dat_slim_new['FIPS']))

3142

## Return FIPS .csv if needed

In [26]:
### fips_list = pd.unique(dat_slim_new['FIPS'])

In [27]:
### fips_df = pd.DataFrame(fips_list)

In [28]:
### fips_df.to_csv('fipsDF.csv')

## Get the industry with the most employees for each FIPS code

In [29]:
dat_slim_new.head(3)

Unnamed: 0,FIPS,naics,emp
0,1001,"Agriculture, Forestry, Fishing and Hunting",92
1,1001,Forestry and Logging,82
2,1001,Logging,82


In [30]:
county_industry = dat_slim_new.loc[dat_slim_new.groupby('FIPS')['emp'].idxmax()].set_index('FIPS')

In [31]:
county_industry = county_industry.drop('emp', axis=1).sort_index()
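
The `groupby('FIPS')['emp'].idxmax()` step returns, for each FIPS code, the index label of the row with the largest `emp`; passing those labels to `.loc` pulls the full rows. When two industries tie, `idxmax` keeps the first occurrence. A toy illustration (the `toy` frame and its values are made up):

```python
import pandas as pd

toy = pd.DataFrame({
    "FIPS": ["01001", "01001", "01003", "01003"],
    "naics": ["Retail Trade", "Manufacturing", "Construction", "Utilities"],
    "emp": [50, 80, 40, 40],  # county 01003 has a tie
})

# Index label of the max-emp row within each FIPS group.
top_rows = toy.groupby("FIPS")["emp"].idxmax()

# Pull those rows, index by FIPS, and drop the employee count.
top = toy.loc[top_rows].set_index("FIPS").drop(columns="emp").sort_index()
print(top)
```

Because ties resolve by row order, the result for a tied county depends on how the input rows happen to be sorted.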

In [32]:
county_industry.to_csv('county_industry.csv')

In [33]:
len(pd.unique(county_industry['naics']))

18

## There are 18 industries that are the largest employer in at least one county

In [34]:
county_industry['naics'].value_counts()

Health Care and Social Assistance                                           1207
Manufacturing                                                                966
Retail Trade                                                                 442
Accommodation and Food Services                                              222
Mining, Quarrying, and Oil and Gas Extraction                                 68
Construction                                                                  61
Transportation and Warehousing                                                40
Professional, Scientific, and Technical Services                              39
Wholesale Trade                                                               22
Administrative and Support and Waste Management and Remediation Services      16
Educational Services                                                          14
Utilities                                                                     12
Finance and Insurance       