<a href="https://colab.research.google.com/github/ReidelVichot/LC_identification/blob/main/LC_identification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#1. PROBLEM DEFINITION

**Background**

“A logistics cluster (LC) is defined as the geographical concentration of firms providing logistics services, such as transportation carriers, warehousing providers, third-party logistics (3PL-s), and forwarders, as well as some other enterprises that are mainly in the logistics business, including logistics enterprises to provide services to various industries” (Rivera et al., 2014, p. 223).  

Several logistics scholars argue that clustering logistics activity improves economic efficiency, reduces costs, and increases collaboration among the firms in the cluster (Rivera et al., 2014; Rivera, Gligor, et al., 2016; Rivera, Sheffi, et al., 2016; Sheffi, 2013, 2012). Although some of these authors note that certain benefits involve trade-offs (Rivera, Gligor, et al., 2016), the trade-offs are not explored further, so the socio-economic effects of agglomerating logistics activity remain only partly understood. This gap is more problematic because governments around the world appear to be embracing logistics clusters as a panacea for economic development through supply chain management improvements (Baranowski et al., 2015; Baydar et al., 2019; Chung, 2016), even though empirical studies assessing the role of government spending in the formation of logistics clusters are lacking (Liu et al., 2022). In other words, the field still lacks the methodological and theoretical development needed to explain the mechanisms of logistics clustering and their socio-economic effects.

**Problem**

There is currently no database of logistics clusters in the US. However, Rivera et al. (2014) designed a method to test for logistics agglomeration in US counties using NAICS codes and [CBP](https://www.census.gov/programs-surveys/cbp.html) information. Before analyzing the effects of logistics clusters on society and the role of governments in their formation, it is necessary to have an accurate picture of all logistics clusters in the US. For this purpose, I will extend the methodology of Rivera et al. (2014) to all CBP years in which NAICS codes are used, and I will use the resulting database for future analyses.
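To make the idea of testing for agglomeration concrete, the sketch below computes a location quotient (LQ), a standard concentration measure: a county's logistics share of employment divided by the national logistics share. It is illustrative only. That the extended method relies on an LQ-style measure is an assumption, and the function name, the column names (`emp_logistics`, `emp_total`), and the toy numbers are hypothetical rather than taken from Rivera et al. (2014) or from CBP.

```python
import pandas as pd


def location_quotient(df, sector_col="emp_logistics", total_col="emp_total"):
    """Return one LQ per row: local sector share / national sector share."""
    local_share = df[sector_col] / df[total_col]
    national_share = df[sector_col].sum() / df[total_col].sum()
    return local_share / national_share


# Toy data (hypothetical): three counties with equal total employment.
toy = pd.DataFrame({
    "county": ["A", "B", "C"],
    "emp_logistics": [200, 50, 50],
    "emp_total": [1000, 1000, 1000],
})
toy["lq"] = location_quotient(toy)
```

An LQ above 1 indicates that logistics employment is more concentrated in that county than in the nation as a whole; any threshold used to label a county as a cluster would have to come from the chosen methodology.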

#2. DATA COLLECTION

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import geopandas as gpd

In [3]:
# -- files downloaded from the census CBP
dpath = "/content/drive/MyDrive/Disertation/CBP_data"
fname98 = dpath + "/cbp98co/cbp98co.txt"
fname99 = dpath + "/cbp99co/cbp99co.txt"
fname00 = dpath + "/cbp00co/cbp00co.txt"
fname01 = dpath + "/cbp01co/cbp01co.txt"
fname02 = dpath + "/cbp02co/cbp02co.txt"
fname03 = dpath + "/cbp03co/cbp03co.txt"
fname04 = dpath + "/cbp04co/cbp04co.txt"
fname05 = dpath + "/cbp05co/cbp05co.txt"
fname06 = dpath + "/cbp06co/cbp06co.txt"
fname07 = dpath + "/cbp07co/cbp07co.txt"
fname08 = dpath + "/cbp08co/cbp08co.txt"
fname09 = dpath + "/cbp09co/cbp09co.txt"
fname10 = dpath + "/cbp10co/cbp10co.txt"
fname11 = dpath + "/cbp11co/cbp11co.txt"
fname12 = dpath + "/cbp12co/cbp12co.txt"
fname13 = dpath + "/cbp13co/cbp13co.txt"
fname14 = dpath + "/cbp14co/cbp14co.txt"
fname15 = dpath + "/cbp15co/cbp15co.txt"
fname16 = dpath + "/cbp16co/cbp16co.txt"
fname17 = dpath + "/cbp17co/cbp17co.txt"
fname18 = dpath + "/cbp18co/cbp18co.txt"
fname19 = dpath + "/cbp19co/cbp19co.txt"
fname20 = dpath + "/cbp20co/cbp20co.txt"
fname21 = dpath + "/cbp21co/cbp21co.txt"
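Because every path follows the same `cbpYYco/cbpYYco.txt` pattern, the file names can also be generated from the year. The following is a minimal sketch of that alternative, assuming the same folder layout as above; `cbp_path` and `fnames` are hypothetical names introduced here for illustration.

```python
# Same base folder as in the cell above.
dpath = "/content/drive/MyDrive/Disertation/CBP_data"


def cbp_path(base, year):
    """Build the county-file path for a given four-digit year."""
    yy = f"{year % 100:02d}"  # 1998 -> "98", 2005 -> "05"
    return f"{base}/cbp{yy}co/cbp{yy}co.txt"


# One entry per CBP year from 1998 through 2021.
fnames = {year: cbp_path(dpath, year) for year in range(1998, 2022)}
```

Keying the dictionary by the full year avoids the ambiguity of two-digit suffixes and makes it easy to add later years.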

In [57]:
cbp98 = pd.read_csv(fname98)
cbp99 = pd.read_csv(fname99)
cbp00 = pd.read_csv(fname00)
cbp01 = pd.read_csv(fname01)
cbp02 = pd.read_csv(fname02)
cbp03 = pd.read_csv(fname03)
cbp04 = pd.read_csv(fname04)
cbp05 = pd.read_csv(fname05)
cbp06 = pd.read_csv(fname06)
cbp07 = pd.read_csv(fname07)
cbp08 = pd.read_csv(fname08)
cbp09 = pd.read_csv(fname09)
cbp10 = pd.read_csv(fname10)
cbp11 = pd.read_csv(fname11)
cbp12 = pd.read_csv(fname12)
cbp13 = pd.read_csv(fname13)
cbp14 = pd.read_csv(fname14)
cbp15 = pd.read_csv(fname15)
cbp16 = pd.read_csv(fname16)
cbp17 = pd.read_csv(fname17)
cbp18 = pd.read_csv(fname18)
cbp19 = pd.read_csv(fname19)
cbp20 = pd.read_csv(fname20)
cbp21 = pd.read_csv(fname21)
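The 2015 file uses upper-case column names (see the filter cell that follows), so it can help to normalize the header at read time. The sketch below is one way to do that, not the loader used in this notebook: `read_cbp` is a hypothetical helper, and the in-memory sample stands in for a real CBP file. Unlike the cells above, it reads every field as text, so codes keep their leading zeros, and then converts only `emp` and `est` to numbers.

```python
import io

import pandas as pd


def read_cbp(src):
    """Read a CBP county file with lower-case column names."""
    # Read all fields as strings so codes such as "48----" or "01" stay intact.
    df = pd.read_csv(src, dtype=str)
    df.columns = df.columns.str.lower()
    # Only the count columns used later are converted to numbers.
    for col in ("emp", "est"):
        df[col] = pd.to_numeric(df[col])
    return df


# Hypothetical two-row sample with an upper-case header.
sample = io.StringIO(
    "FIPSTATE,FIPSCTY,NAICS,EMP,EST\n"
    "01,001,48----,131,20\n"
    "01,001,11----,40,5\n"
)
demo = read_cbp(sample)
```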

In [58]:
cbp98 = cbp98[cbp98.naics.str.startswith(("48","49"))]
cbp99 = cbp99[cbp99.naics.str.startswith(("48","49"))]
cbp00 = cbp00[cbp00.naics.str.startswith(("48","49"))]
cbp01 = cbp01[cbp01.naics.str.startswith(("48","49"))]
cbp02 = cbp02[cbp02.naics.str.startswith(("48","49"))]
cbp03 = cbp03[cbp03.naics.str.startswith(("48","49"))]
cbp04 = cbp04[cbp04.naics.str.startswith(("48","49"))]
cbp05 = cbp05[cbp05.naics.str.startswith(("48","49"))]
cbp06 = cbp06[cbp06.naics.str.startswith(("48","49"))]
cbp07 = cbp07[cbp07.naics.str.startswith(("48","49"))]
cbp08 = cbp08[cbp08.naics.str.startswith(("48","49"))]
cbp09 = cbp09[cbp09.naics.str.startswith(("48","49"))]
cbp10 = cbp10[cbp10.naics.str.startswith(("48","49"))]
cbp11 = cbp11[cbp11.naics.str.startswith(("48","49"))]
cbp12 = cbp12[cbp12.naics.str.startswith(("48","49"))]
cbp13 = cbp13[cbp13.naics.str.startswith(("48","49"))]
cbp14 = cbp14[cbp14.naics.str.startswith(("48","49"))]
cbp15 = cbp15[cbp15.NAICS.str.startswith(("48","49"))]
cbp16 = cbp16[cbp16.naics.str.startswith(("48","49"))]
cbp17 = cbp17[cbp17.naics.str.startswith(("48","49"))]
cbp18 = cbp18[cbp18.naics.str.startswith(("48","49"))]
cbp19 = cbp19[cbp19.naics.str.startswith(("48","49"))]
cbp20 = cbp20[cbp20.naics.str.startswith(("48","49"))]
cbp21 = cbp21[cbp21.naics.str.startswith(("48","49"))]
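The filter keeps NAICS sectors 48 and 49 (Transportation and Warehousing). A compact way to express the same step once, regardless of header case, is sketched below. `keep_transport_warehousing` is a hypothetical helper and the toy rows are made up; it compares the first two characters of the code instead of calling `str.startswith` with a tuple, which selects the same rows.

```python
import pandas as pd


def keep_transport_warehousing(df):
    """Return only rows whose NAICS code starts with 48 or 49."""
    df = df.rename(columns=str.lower)  # handles files with upper-case headers
    mask = df["naics"].str[:2].isin(["48", "49"])
    # copy() makes the result independent of the source frame, so adding
    # columns later does not trigger chained-assignment warnings.
    return df[mask].copy()


# Toy rows (hypothetical) mixing logistics and non-logistics sectors.
toy = pd.DataFrame({
    "NAICS": ["48----", "493110", "11----", "722511"],
    "EMP": [131, 12, 40, 9],
})
filtered = keep_transport_warehousing(toy)
```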

#3. DATA PREPARATION

In [121]:
cols = ['fipstate', 'fipscty', 'naics', 'emp', 'est']
cbp98 = cbp98[cols]
cbp99 = cbp99[cols]
cbp00 = cbp00[cols]
cbp01 = cbp01[cols]
cbp02 = cbp02[cols]
cbp03 = cbp03[cols]
cbp04 = cbp04[cols]
cbp05 = cbp05[cols]
cbp06 = cbp06[cols]
cbp07 = cbp07[cols]
cbp08 = cbp08[cols]
cbp09 = cbp09[cols]
cbp10 = cbp10[cols]
cbp11 = cbp11[cols]
cbp12 = cbp12[cols]
cbp13 = cbp13[cols]
cbp14 = cbp14[cols]
cbp15.columns = cbp15.columns.str.lower()
cbp15 = cbp15[cols]
cbp16 = cbp16[cols]
cbp17 = cbp17[cols]
cbp18 = cbp18[cols]
cbp19 = cbp19[cols]
cbp20 = cbp20[cols]
cbp21 = cbp21[cols]

In [122]:
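# Use assign() rather than item assignment: the frames are filtered slices,
# so setting a column in place can trigger pandas' SettingWithCopyWarning.
cbp98 = cbp98.assign(year=1998)
cbp99 = cbp99.assign(year=1999)
cbp00 = cbp00.assign(year=2000)
cbp01 = cbp01.assign(year=2001)
cbp02 = cbp02.assign(year=2002)
cbp03 = cbp03.assign(year=2003)
cbp04 = cbp04.assign(year=2004)
cbp05 = cbp05.assign(year=2005)
cbp06 = cbp06.assign(year=2006)
cbp07 = cbp07.assign(year=2007)
cbp08 = cbp08.assign(year=2008)
cbp09 = cbp09.assign(year=2009)
cbp10 = cbp10.assign(year=2010)
cbp11 = cbp11.assign(year=2011)
cbp12 = cbp12.assign(year=2012)
cbp13 = cbp13.assign(year=2013)
cbp14 = cbp14.assign(year=2014)
cbp15 = cbp15.assign(year=2015)
cbp16 = cbp16.assign(year=2016)
cbp17 = cbp17.assign(year=2017)
cbp18 = cbp18.assign(year=2018)
cbp19 = cbp19.assign(year=2019)
cbp20 = cbp20.assign(year=2020)
cbp21 = cbp21.assign(year=2021)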

In [125]:
frames = [cbp98, cbp99, cbp00, cbp01, cbp02, cbp03, cbp04, cbp05, cbp06, cbp07,
       cbp08, cbp09, cbp10, cbp11, cbp12, cbp13, cbp14, cbp15, cbp16, cbp17,
       cbp18, cbp19, cbp20, cbp21]

In [126]:
cbp = pd.concat(frames)
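The preparation steps in this section (select columns, tag the year, concatenate) can be collapsed into a single loop. The sketch below shows the pattern on toy frames; `combine_years` and `toy_frames` are hypothetical names, and the 1999 values are invented for illustration.

```python
import pandas as pd

cols = ["fipstate", "fipscty", "naics", "emp", "est"]


def combine_years(frames_by_year, cols=cols):
    """Select the shared columns, add a year column, and stack all years."""
    prepared = [
        df.rename(columns=str.lower)[cols].assign(year=year)
        for year, df in sorted(frames_by_year.items())
    ]
    # ignore_index=True renumbers the rows 0..n-1 in the combined frame.
    return pd.concat(prepared, ignore_index=True)


# Toy input (hypothetical): one row per year, one frame with an upper-case header.
toy_frames = {
    1999: pd.DataFrame({"fipstate": [1], "fipscty": [1], "naics": ["48----"],
                        "emp": [140], "est": [21], "extra": ["x"]}),
    1998: pd.DataFrame({"FIPSTATE": [1], "FIPSCTY": [1], "NAICS": ["48----"],
                        "EMP": [131], "EST": [20]}),
}
combined = combine_years(toy_frames)
```

Passing `ignore_index=True` to `pd.concat` removes the need for a separate index reset afterwards.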

In [130]:
# Renumber rows 0..n-1 after concatenation and keep the result
cbp = cbp.reset_index(drop=True)
cbp

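| | fipstate | fipscty | naics | emp | est | year |
|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 48---- | 131 | 20 | 1998 |
| 1 | 1 | 1 | 481/// | 0 | 1 | 1998 |
| 2 | 1 | 1 | 4812// | 0 | 1 | 1998 |
| 3 | 1 | 1 | 48121/ | 0 | 1 | 1998 |
| 4 | 1 | 1 | 481219 | 0 | 1 | 1998 |
| ... | ... | ... | ... | ... | ... | ... |
| 2354444 | 56 | 45 | 4861// | 30 | 3 | 2021 |
| 2354445 | 56 | 45 | 48611/ | 30 | 3 | 2021 |
| 2354446 | 56 | 45 | 486110 | 30 | 3 | 2021 |
| 2354447 | 56 | 999 | 48---- | 183 | 8 | 2021 |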
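The output shows two things worth handling before any analysis: state and county codes are stored as integers, and the same establishments appear at several NAICS levels (`48----`, `481///`, `4812//`, and so on). The sketch below derives a five-digit county FIPS string and the number of significant NAICS digits. `add_keys`, `county_fips`, and `naics_digits` are hypothetical names, and treating `-` and `/` as padding characters is an assumption based on the codes displayed above.

```python
import pandas as pd


def add_keys(df):
    """Add a zero-padded county FIPS code and the NAICS aggregation level."""
    out = df.copy()
    # Two-digit state code + three-digit county code, e.g. 1 and 1 -> "01001".
    out["county_fips"] = (
        out["fipstate"].astype(int).astype(str).str.zfill(2)
        + out["fipscty"].astype(int).astype(str).str.zfill(3)
    )
    # Count the digits left after dropping the padding characters.
    out["naics_digits"] = (
        out["naics"].str.replace(r"[-/]", "", regex=True).str.len()
    )
    return out


# Three rows copied from the table above.
toy = pd.DataFrame({
    "fipstate": [1, 1, 56],
    "fipscty": [1, 1, 45],
    "naics": ["48----", "481219", "4861//"],
})
keyed = add_keys(toy)
```

Filtering on a single value of `naics_digits` before summing `emp` or `est` is one way to avoid counting the same establishments at several levels of the hierarchy.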


#4. MACHINE LEARNING

#5. PROBLEM SOLUTION