**Geographical of small retail businesses**

- This analysis uses County Business Pattern data from the U.S. Census Bureau for five years, from 2022 through 2018.
- Merges it with the Rural-Urban Continuum Code, which identifies counties by population on a scale of 1 (urban) to 9 (rural).
- Compare the number of retail businesses with 20 or fewer counties by:
    - County
    - County population
    - State

**Import pandas**

In [17]:
import pandas as pd 

**Read csv files for each year and interpret FIPS codes as strings**

In [18]:
county22_data = pd.read_csv('../data/cbp22co.csv', dtype={'fipstate': str, 'fipscty': str})
county21_data = pd.read_csv('../data/cbp21co.csv', dtype={'fipstate': str, 'fipscty': str})
county20_data = pd.read_csv('../data/cbp20co.csv', dtype={'fipstate': str, 'fipscty': str})
county19_data = pd.read_csv('../data/cbp19co.csv', dtype={'fipstate': str, 'fipscty': str})
county18_data = pd.read_csv('../data/cbp18co.csv', dtype={'fipstate': str, 'fipscty': str})


**Spot check that fipstate and fipscty are objects**

In [19]:
county22_data.dtypes

fipstate    object
fipscty     object
naics       object
emp_nf      object
emp          int64
qp1_nf      object
qp1          int64
ap_nf       object
ap           int64
est          int64
n<5         object
n5_9        object
n10_19      object
n20_49      object
n50_99      object
n100_249    object
n250_499    object
n500_999    object
n1000       object
n1000_1     object
n1000_2     object
n1000_3     object
n1000_4     object
dtype: object

In [20]:
county19_data.dtypes


fipstate    object
fipscty     object
naics       object
emp_nf      object
emp          int64
qp1_nf      object
qp1          int64
ap_nf       object
ap           int64
est          int64
n<5         object
n5_9        object
n10_19      object
n20_49      object
n50_99      object
n100_249    object
n250_499    object
n500_999    object
n1000       object
n1000_1     object
n1000_2     object
n1000_3     object
n1000_4     object
censtate     int64
cencty       int64
dtype: object

**Make new column of NAICS sector code (first two digits)**

In [21]:
county22_data['naics_sector'] = county22_data['naics'].apply(lambda x: x[:2])
county21_data['naics_sector'] = county21_data['naics'].apply(lambda x: x[:2])
county20_data['naics_sector'] = county20_data['naics'].apply(lambda x: x[:2])
county19_data['naics_sector'] = county19_data['naics'].apply(lambda x: x[:2])
county18_data['naics_sector'] = county18_data['naics'].apply(lambda x: x[:2])


**Check that new column "naics_sector" appears**

In [23]:
county22_data.head()

Unnamed: 0,fipstate,fipscty,naics,emp_nf,emp,qp1_nf,qp1,ap_nf,ap,est,...,n50_99,n100_249,n250_499,n500_999,n1000,n1000_1,n1000_2,n1000_3,n1000_4,naics_sector
0,1,1,------,G,12409,G,117103,G,496158,948,...,33,10,3,N,N,N,N,N,N,--
1,1,1,11----,G,52,G,793,G,3477,10,...,N,N,N,N,N,N,N,N,N,11
2,1,1,113///,H,39,G,686,G,2890,7,...,N,N,N,N,N,N,N,N,N,11
3,1,1,1133//,H,39,G,686,G,2890,7,...,N,N,N,N,N,N,N,N,N,11
4,1,1,11331/,H,39,G,686,G,2890,7,...,N,N,N,N,N,N,N,N,N,11


**Make a new column of FIPS that is fipstate + fipscty**

In [24]:
county22_data['FIPS'] = county22_data['fipstate'] + county22_data['fipscty']
county21_data['FIPS'] = county21_data['fipstate'] + county21_data['fipscty']
county20_data['FIPS'] = county20_data['fipstate'] + county20_data['fipscty']
county19_data['FIPS'] = county19_data['fipstate'] + county19_data['fipscty']
county18_data['FIPS'] = county18_data['fipstate'] + county18_data['fipscty']


**Check that FIPS column is there**

In [25]:
county22_data.head()

Unnamed: 0,fipstate,fipscty,naics,emp_nf,emp,qp1_nf,qp1,ap_nf,ap,est,...,n100_249,n250_499,n500_999,n1000,n1000_1,n1000_2,n1000_3,n1000_4,naics_sector,FIPS
0,1,1,------,G,12409,G,117103,G,496158,948,...,10,3,N,N,N,N,N,N,--,1001
1,1,1,11----,G,52,G,793,G,3477,10,...,N,N,N,N,N,N,N,N,11,1001
2,1,1,113///,H,39,G,686,G,2890,7,...,N,N,N,N,N,N,N,N,11,1001
3,1,1,1133//,H,39,G,686,G,2890,7,...,N,N,N,N,N,N,N,N,11,1001
4,1,1,11331/,H,39,G,686,G,2890,7,...,N,N,N,N,N,N,N,N,11,1001


**Make dataframe of only NAICS retail code (44, 45)**

In [26]:
retail22 = county22_data[county22_data['naics_sector'].isin(['44', '45'])]
retail21 = county21_data[county21_data['naics_sector'].isin(['44', '45'])]
retail20 = county20_data[county20_data['naics_sector'].isin(['44', '45'])]
retail19 = county19_data[county19_data['naics_sector'].isin(['44', '45'])]
retail18 = county18_data[county18_data['naics_sector'].isin(['44', '45'])]

**Make dataframe of only companies with 20 or fewer employees**

In [27]:
small_retail22 = retail22[retail22['emp'] <= 20]
small_retail21 = retail21[retail21['emp'] <= 20]
small_retail20 = retail20[retail20['emp'] <= 20]
small_retail19 = retail19[retail19['emp'] <= 20]
small_retail18 = retail18[retail18['emp'] <= 20]

**Import population data**

In [28]:
population_info = pd.read_csv('../data/County_population.csv', dtype={'FIPS': str})

**Merge population info with small retail dataframes for each year**

In [48]:
 small_retail_with_population22 = pd.merge(
    small_retail22,
    population_info,
    on='FIPS',
    how = 'left'

)
small_retail_with_population21 = pd.merge(
    small_retail21,
    population_info,
    on='FIPS',
    how = 'left'

)
small_retail_with_population20 = pd.merge(
    small_retail20,
    population_info,
    on='FIPS',
    how = 'left'

)
small_retail_with_population19 = pd.merge(
    small_retail19,
    population_info,
    on='FIPS',
    how = 'left'

)
small_retail_with_population18 = pd.merge(
    small_retail18,
    population_info,
    on='FIPS',
    how = 'left'

)

**Groupby counties (Number of small retail businesses in each county)**

In [36]:
county_counts22 = small_retail_with_population22.groupby('FIPS').agg({
    "Description": "count"
   
})
county_counts21 = small_retail_with_population21.groupby('FIPS').agg({
    "Description": "count"
   
})
county_counts20 = small_retail_with_population20.groupby('FIPS').agg({
    "Description": "count"
   
})
county_counts19 = small_retail_with_population19.groupby('FIPS').agg({
    "Description": "count"
   
})
county_counts18 = small_retail_with_population18.groupby('FIPS').agg({
    "Description": "count"
   
})

**Export county counts**

In [37]:
county_counts22.to_csv('../output/county_counts22.csv')
county_counts21.to_csv('../output/county_counts21.csv')
county_counts20.to_csv('../output/county_counts20.csv')
county_counts19.to_csv('../output/county_counts19.csv')
county_counts18.to_csv('../output/county_counts18.csv')

**Groupby rural/urban codes (number of small retail businesses in each population type)**

In [38]:
rural_sub_urban22 = small_retail_with_population22.groupby('RUCC_2023').agg({
    "Description": "count"
   
})
rural_sub_urban21 = small_retail_with_population21.groupby('RUCC_2023').agg({
    "Description": "count"
   
})
rural_sub_urban20 = small_retail_with_population20.groupby('RUCC_2023').agg({
    "Description": "count"
   
})
rural_sub_urban18 = small_retail_with_population19.groupby('RUCC_2023').agg({
    "Description": "count"
   
})
rural_sub_urban19 = small_retail_with_population18.groupby('RUCC_2023').agg({
    "Description": "count"
   
})

**See the rural/suburban/urban counts**

In [93]:
print(rural_sub_urban19)

RUCC_2023
1.0    4169
2.0    3743
3.0    3583
4.0    3597
5.0    1332
6.0    4956
7.0    3317
8.0    2860
9.0    2699
dtype: int64


**Export rural/suburban counts CSV files to output folder**

In [94]:
rural_sub_urban22.to_csv('../output/rural_sub_urban22.csv')
rural_sub_urban21.to_csv('../output/rural_sub_urban21.csv')
rural_sub_urban20.to_csv('../output/rural_sub_urban20.csv')
rural_sub_urban19.to_csv('../output/rural_sub_urban19.csv')
rural_sub_urban18.to_csv('../output/rural_sub_urban18.csv')

**Groupby state (number of small retail businesses per state)**

In [46]:
state_counts22 = small_retail_with_population22.groupby('fipstate').agg({
    "Description": "count"
})
state_counts21 = small_retail_with_population21.groupby('fipstate').agg({
    "Description": "count"
})
state_counts20 = small_retail_with_population20.groupby('fipstate').agg({
    "Description": "count"
})
state_counts19 = small_retail_with_population19.groupby('fipstate').agg({
    "Description": "count"
})
state_counts18 = small_retail_with_population18.groupby('fipstate').agg({
    "Description": "count"
})

**See the state counts**

In [47]:
print(state_counts22)

          Description
fipstate             
01                752
02                186
04                147
05                702
06                532
08                565
09                 92
10                 33
11                 13
12                624
13               1552
15                 64
16                343
17                971
18               1002
19                915
20                694
21               1006
22                547
23                247
24                261
25                128
26               1028
27                911
28                854
29               1052
30                326
31                452
32                 81
33                133
34                205
35                316
36                900
37               1356
38                222
39                959
40                794
41                423
42                762
44                 87
45                504
46                291
47                888
48        

**Export state counts csv files to output folder**

In [97]:
state_counts22.to_csv('../output/state_counts22.csv')
state_counts21.to_csv('../output/state_counts21.csv')
state_counts20.to_csv('../output/state_counts20.csv')
state_counts19.to_csv('../output/state_counts19.csv')
state_counts18.to_csv('../output/state_counts18.csv')