<a href="https://colab.research.google.com/github/ReidelVichot/LC_identification/blob/main/LC_identification_Rivera.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#1. PROBLEM DEFINITION

**Background**

“A logistics cluster (LC) is defined as the geographical concentration of firms providing logistics services, such as transportation carriers, warehousing providers, third-party logistics (3PL-s), and forwarders, as well as some other enterprises that are mainly in the logistics business, including logistics enterprises to provide services to various industries” (Rivera et al., 2014, p. 223).  

Several scholars in the field of logistics claim that clustering logistics activity improves the efficiency of economic activity, reduces costs, and increases collaboration among the firms that belong to the cluster (Rivera et al., 2014; Rivera, Gligor, et al., 2016; Rivera, Sheffi, et al., 2016; Sheffi, 2013, 2012). Although some of these authors note that some of these benefits involve trade-offs (Rivera, Gligor, et al., 2016), the trade-offs are not explored further, which leaves the socio-economic effects of the agglomeration of logistics activity only partly understood. This becomes more problematic given that governments around the world seem to be embracing logistics clusters as a sort of panacea for economic development based on supply chain management improvements (Baranowski et al., 2015; Baydar et al., 2019; Chung, 2016), even though empirical studies that assess the role of government spending in the formation of logistics clusters are lacking (Liu et al., 2022). In other words, the field still lacks methodological and theoretical development, and the mechanisms of logistics clustering and their socio-economic effects remain incompletely understood.

**Problem**

There is currently no database of logistics clusters in the US. However, Rivera et al. (2014) designed a method to test logistics agglomeration in US counties using NAICS codes and County Business Patterns ([CBP](https://www.census.gov/programs-surveys/cbp.html)) information. Before conducting analysis on the effects of logistics clusters on society and the role of governments in their formation, it is necessary to have an accurate picture of all logistics clusters in the US. For this purpose, I will extend the methodology of Rivera et al. (2014) to all the CBP years in which NAICS codes are used and use the resulting database for future analyses.

#2. DATA COLLECTION

In [109]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [110]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import geopandas as gpd

pd.options.mode.copy_on_write = True

In [125]:
dpath = "/content/drive/MyDrive/Disertation/"
fname98 = dpath + "CBP_data/cbp98co/cbp98co.txt"
fname08 = dpath + "CBP_data/cbp08co/cbp08co.txt"

In [126]:
cbp98 = pd.read_csv(fname98)
cbp08 = pd.read_csv(fname08)

In [127]:
cols = ['fipstate', 'fipscty', 'naics', 'emp', 'est']
cbp98 = cbp98[cols]
cbp08 = cbp08[cols]

In [128]:
# Keep the all-industry total rows (NAICS code made of dashes) and eliminate
# fipscty code 999, which refers to statewide employment and establishments
cbp98_total = cbp98[cbp98.naics.str.startswith("-----")]
cbp98_total = cbp98_total[cbp98_total.fipscty != 999]

cbp08_total = cbp08[cbp08.naics.str.startswith("-----")]
cbp08_total = cbp08_total[cbp08_total.fipscty != 999]

In [129]:
cbp98_total = cbp98_total.rename(columns={"emp": "emp_tot", "est": "est_tot"}).drop(columns="naics")
cbp08_total = cbp08_total.rename(columns={"emp": "emp_tot", "est": "est_tot"}).drop(columns="naics")

In [130]:
cbp98_total["GEOID"] = cbp98_total.fipstate.astype(str).str.zfill(2) + cbp98_total.fipscty.astype(str).str.zfill(3)
cbp98_total.drop(columns=["fipscty"], inplace=True)

cbp08_total["GEOID"] = cbp08_total.fipstate.astype(str).str.zfill(2) + cbp08_total.fipscty.astype(str).str.zfill(3)
cbp08_total.drop(columns=["fipscty"], inplace=True)
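
As a minimal sketch of the GEOID construction above, the toy frame below (hypothetical values, not CBP data) shows how zero-padding the state code to two characters and the county code to three yields the five-character county identifier. The helper name `make_geoid` is an assumption introduced only for illustration.

```python
import pandas as pd

def make_geoid(df: pd.DataFrame) -> pd.Series:
    """Build a 5-character county GEOID from integer FIPS codes."""
    state = df["fipstate"].astype(str).str.zfill(2)   # e.g. 1 -> "01"
    county = df["fipscty"].astype(str).str.zfill(3)   # e.g. 73 -> "073"
    return state + county

# Toy rows (hypothetical, for illustration only)
toy = pd.DataFrame({"fipstate": [1, 48], "fipscty": [73, 301]})
toy["GEOID"] = make_geoid(toy)
print(toy)
```

Keeping the identifier as a string preserves the leading zeros, which an integer column would lose.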

In [131]:
# List of logistics sector-related NAICS codes
logistics_sectors = [ "481112", "481212", "481219", "483111", "483113",
                      "483211", "484110", "484121", "484122", "484220",
                      "484230", "488119", "488190", "488210", "488310",
                      "488320", "488330", "488390", "488410", "488490",
                      "488510", "488991", "488999", "492110", "492210",
                      "493110", "493190" ]

In [132]:
cbp98_logistics = cbp98[cbp98["naics"].isin(logistics_sectors)]
cbp08_logistics = cbp08[cbp08["naics"].isin(logistics_sectors)]


In [133]:
cbp98_logistics["GEOID"] = cbp98_logistics.fipstate.astype(str).str.zfill(2) + cbp98_logistics.fipscty.astype(str).str.zfill(3)
cbp98_logistics = cbp98_logistics.drop(columns=["fipscty", "fipstate"])

cbp08_logistics["GEOID"] = cbp08_logistics.fipstate.astype(str).str.zfill(2) + cbp08_logistics.fipscty.astype(str).str.zfill(3)
cbp08_logistics = cbp08_logistics.drop(columns=["fipscty", "fipstate"])

In [157]:
cbp98 = pd.merge(cbp98_logistics.groupby("GEOID").sum(numeric_only=True).reset_index(), cbp98_total, on = ['GEOID'], how = 'outer').fillna(0)
cbp98[["fipstate", "emp", "est", "emp_tot", "est_tot"]] = cbp98[["fipstate", "emp", "est", "emp_tot", "est_tot"]].astype(int)

cbp08 = pd.merge(cbp08_logistics.groupby("GEOID").sum(numeric_only=True).reset_index(), cbp08_total, on = ['GEOID'], how = 'outer').fillna(0)
cbp08[["fipstate", "emp", "est", "emp_tot", "est_tot"]] = cbp08[["fipstate", "emp", "est", "emp_tot", "est_tot"]].astype(int)
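
The merge step above can be sketched on toy data. The values below are hypothetical and only illustrate the pattern: logistics rows are summed per county, outer-merged with the county totals so that no county is lost, and the resulting gaps are filled with zero.

```python
import pandas as pd

# Toy inputs (hypothetical values): one row per county and NAICS code
logistics = pd.DataFrame({
    "GEOID": ["01001", "01001", "01003"],
    "naics": ["484110", "493110", "484121"],
    "emp": [10, 5, 20],
    "est": [2, 1, 4],
})
# County totals; county 01005 has no logistics rows at all
totals = pd.DataFrame({
    "GEOID": ["01001", "01003", "01005"],
    "emp_tot": [100, 200, 50],
    "est_tot": [10, 20, 5],
})

# Sum logistics rows per county (the string column naics is left out),
# then outer-merge so counties missing from either side are kept
per_county = logistics.groupby("GEOID").sum(numeric_only=True).reset_index()
merged = pd.merge(per_county, totals, on="GEOID", how="outer").fillna(0)

# fillna leaves float columns behind, so cast the counts back to integers
num_cols = ["emp", "est", "emp_tot", "est_tot"]
merged[num_cols] = merged[num_cols].astype(int)
print(merged)
```

With `how="outer"`, a county that appears only in the totals keeps its row and gets zero logistics employment and establishments.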

#3. DATA PREPARATION

In [158]:
cbp98["year"] = 1998
cbp08["year"] = 2008

In [159]:
frames = [cbp98, cbp08]

In [160]:
cbp = pd.concat(frames)
# Display only: the result is not assigned back, so cbp keeps each year's original index
cbp.reset_index(drop=True)

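| | GEOID | emp | est | fipstate | emp_tot | est_tot | year |
|---|---|---|---|---|---|---|---|
| 0 | 01001 | 0 | 16 | 1 | 8100 | 754 | 1998 |
| 1 | 01003 | 229 | 103 | 1 | 39662 | 3760 | 1998 |
| 2 | 01005 | 80 | 47 | 1 | 9773 | 605 | 1998 |
| 3 | 01007 | 62 | 22 | 1 | 3636 | 340 | 1998 |
| 4 | 01009 | 129 | 35 | 1 | 7670 | 716 | 1998 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 6286 | 48263 | 0 | 0 | 48 | 41 | 9 | 2008 |
| 6287 | 48271 | 0 | 0 | 48 | 345 | 40 | 2008 |
| 6288 | 48301 | 0 | 0 | 48 | 0 | 1 | 2008 |
| 6289 | 48327 | 0 | 0 | 48 | 239 | 52 | 2008 |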


In [168]:
cbp[cbp["year"]== 1998]["est"].sum()

149584

In [171]:
# LEP: each county's share of its own year's total logistics establishments
cbp["LEP"] = cbp["est"] / cbp.groupby("year")["est"].transform("sum")
for i in cbp.year.unique():
  print(cbp[cbp["year"]== i]["est"].sum())

149584
178821
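
A county's share of logistics establishments should be taken relative to the total for its own year. One way to do this in a single step is sketched here on toy data (hypothetical values): `groupby(...).transform("sum")` returns the year total aligned to every row, so the division needs no loop.

```python
import pandas as pd

# Toy panel (hypothetical values): logistics establishments per county and year
toy = pd.DataFrame({
    "GEOID": ["01001", "01003", "01001", "01003"],
    "year": [1998, 1998, 2008, 2008],
    "est": [10, 30, 20, 60],
})

# Total establishments within each row's own year, aligned to the rows
year_total = toy.groupby("year")["est"].transform("sum")
toy["LEP"] = toy["est"] / year_total
print(toy)
```

A quick sanity check is that the shares within each year add up to 1.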


In [210]:
# Combine the two masks element-wise; `and` between two lists just returns the second list
a = (cbp["year"] == 2008) & (cbp["LEP"] > 0.001)

In [237]:
cbp[(cbp['year'] == 2008) & (cbp["LEP"] > 0.001)]

| | GEOID | emp | est | fipstate | emp_tot | est_tot | year | LEP |
|---|---|---|---|---|---|---|---|---|
| 36 | 01073 | 8326 | 326 | 1 | 345325 | 17329 | 2008 | 0.001823 |
| 48 | 01097 | 5035 | 304 | 1 | 161755 | 9238 | 2008 | 0.001700 |
| 69 | 02020 | 2809 | 248 | 2 | 140981 | 8486 | 2008 | 0.001387 |
| 102 | 04013 | 40302 | 1591 | 4 | 1602078 | 89440 | 2008 | 0.008897 |
| 105 | 04019 | 4357 | 326 | 4 | 328976 | 21265 | 2008 | 0.001823 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2952 | 53077 | 2307 | 183 | 53 | 64898 | 4799 | 2008 | 0.001023 |
| 3013 | 55009 | 3221 | 253 | 55 | 139148 | 6597 | 2008 | 0.001415 |
| 3021 | 55025 | 2813 | 256 | 55 | 261219 | 13423 | 2008 | 0.001432 |
| 3049 | 55079 | 10916 | 499 | 55 | 476144 | 20715 | 2008 | 0.002790 |


In [208]:
# Year mask on its own: True for the 2008 rows
cbp["year"] == 2008

0       False
1       False
2       False
3       False
4       False
        ...  
3140     True
3141     True
3142     True
3143     True
3144     True
Name: year, Length: 6291, dtype: bool

In [181]:
# Failed attempt: masks wrapped in lists, no parentheses (see In [237] for the working form)
cbp[[[cbp["year"]==2008] & cbp['LEP']>0.001]]

ValueError: Arrays were different lengths: 6291 vs 1
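
The `ValueError` above appears to come from how the conditions are combined: `[cbp["year"]==2008]` is a Python list of length 1, so `&` tries to align it with a 6291-row Series. A minimal sketch of the usual pattern, on toy data with hypothetical values, keeps each condition as a plain boolean Series in its own parentheses.

```python
import pandas as pd

# Toy rows (hypothetical values)
toy = pd.DataFrame({
    "year": [1998, 2008, 2008],
    "LEP": [0.002, 0.0005, 0.003],
})

# Each comparison needs its own parentheses because & binds more tightly
# than == and >; the masks are plain Series, not wrapped in lists
mask = (toy["year"] == 2008) & (toy["LEP"] > 0.001)
selected = toy[mask]
print(selected)
```

The Python keyword `and` is not a substitute here, because it does not operate element-wise on Series.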

In [None]:
#county = gpd.read_file(dpath + "/countyshp/tl_2023_us_county/tl_2023_us_county.shp")
county = gpd.read_file(dpath + "/countyshp/tl_2010_us_county10/tl_2010_us_county10.shp")

In [None]:
county.columns

In [None]:
# -- remove Alaska (02), Hawaii (15), American Samoa (60), Guam (66),
# -- Northern Marianas (69), Puerto Rico (72), Virgin Islands (78), and DC (11)
print(len(county))
county = county[~county["STATEFP10"].isin(["02", "15", "60", "66", "69", "72", "78", "11"])]
print(len(county))

In [None]:
county.plot()

In [None]:
np.sort(county.STATEFP10.unique().astype(int))

In [None]:
county.head()

In [None]:
# Keep GEOID10 so the shapes can be joined to the CBP data (assumed to be the
# 5-digit state + county FIPS field of the 2010 TIGER/Line file; check county.columns)
county = county[['GEOID10', 'COUNTYFP10', 'NAME10', 'geometry']]

In [None]:
len(cbp)

In [None]:
# -- remove Alaska (2), Hawaii (15), American Samoa (60), Guam (66),
# -- Puerto Rico (72), and Virgin Islands (78)
print(len(cbp))
cbp = cbp[~cbp["fipstate"].isin([2, 15, 60, 66, 72, 78])]
print(len(cbp))

In [None]:
# GEOID was already built during data collection, where fipscty was also dropped,
# so only fipstate is left to remove
cbp = cbp.drop(columns=["fipstate"])

In [None]:
cbp_grouped = cbp.groupby(["year","GEOID"]).sum().reset_index()

In [None]:
county = county.merge(cbp_grouped, left_on="GEOID10", right_on="GEOID")

In [None]:
county

In [None]:
cbp_2008 = county[county["year"]==2008]

In [None]:
len(cbp_2008)

In [None]:
cbp_2008["LEP"] = cbp_2008["est"]/cbp_2008["est"].sum()

In [None]:
cbp_2008[cbp_2008["LEP"] > 0.001717].plot()

In [None]:
len(cbp_2008[cbp_2008["LEP"] > 0.001717])

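In [None]:
# Assumption: LQ is the establishment-based location quotient, i.e. the county's
# logistics share of establishments divided by the logistics share across all counties
cbp_2008["LQ"] = (cbp_2008["est"] / cbp_2008["est_tot"]) / (cbp_2008["est"].sum() / cbp_2008["est_tot"].sum())

In [None]:
for year in cbp["year"].unique():
  pass  # loop body not written yet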


#4. MACHINE LEARNING

#5. PROBLEM SOLUTION