<a href="https://colab.research.google.com/github/ReidelVichot/LC_identification/blob/main/LC_identification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#1. PROBLEM DEFINITION

**Background**

“A logistics cluster (LC) is defined as the geographical concentration of firms providing logistics services, such as transportation carriers, warehousing providers, third-party logistics (3PL-s), and forwarders, as well as some other enterprises that are mainly in the logistics business, including logistics enterprises to provide services to various industries” (Rivera et al., 2014, p. 223).  

Several logistics scholars argue that clustering logistics activity improves the efficiency of economic activity, reduces costs, and increases collaboration among the firms in the cluster (Rivera et al., 2014; Rivera, Gligor, et al., 2016; Rivera, Sheffi, et al., 2016; Sheffi, 2012, 2013). Although some of these authors note that these benefits involve trade-offs (Rivera, Gligor, et al., 2016), the trade-offs are not explored further, which leaves the socio-economic effects of logistics agglomeration incompletely understood. This gap is more problematic because governments around the world appear to treat logistics clusters as a panacea for economic development through supply chain management improvements (Baranowski et al., 2015; Baydar et al., 2019; Chung, 2016), even though empirical studies that assess the role of government spending in the formation of logistics clusters are lacking (Liu et al., 2022). In other words, the field still lacks the methodological and theoretical development needed to explain the mechanisms of logistics clustering and their socio-economic effects.

**Problem**

There is currently no database of logistics clusters in the US. However, Rivera et al. (2014) designed a method to test for logistics agglomeration in US counties using NAICS codes and County Business Patterns ([CBP](https://www.census.gov/programs-surveys/cbp.html)) information. Before analyzing the effects of logistics clusters on society and the role of governments in their formation, it is necessary to have an accurate picture of all logistics clusters in the US. For this purpose, I will extend the methodology of Rivera et al. (2014) to all CBP years that use NAICS codes and use the resulting database in future analyses.
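The paragraph above does not spell out which agglomeration measure is used, so the following is only a minimal sketch of a location quotient (LQ), a common starting point for this kind of test. Whether it matches the exact measure in Rivera et al. (2014) is an assumption; the function name, column names, and numbers are all hypothetical.

```python
import pandas as pd


def location_quotient(df, sector_col="logistics_emp", total_col="total_emp"):
    """Return county LQ: county sector share divided by the national sector share.

    Column names are hypothetical placeholders, not CBP field names.
    """
    national_share = df[sector_col].sum() / df[total_col].sum()
    county_share = df[sector_col] / df[total_col]
    return county_share / national_share


# Toy data (invented for illustration only)
toy = pd.DataFrame({
    "fips": ["01001", "01003", "01005"],
    "logistics_emp": [50, 300, 50],
    "total_emp": [1000, 2000, 1000],
})
toy["lq"] = location_quotient(toy)

# An LQ above 1 means the county is more specialized in logistics than the nation
print(toy[toy["lq"] > 1])
```

In this toy example only the second county has a logistics employment share above the national share, so it is the only candidate for further cluster tests.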

#2. DATA COLLECTION

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [4]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import geopandas as gpd

In [1]:
# -- files downloaded from the census CBP
dpath = "/content/drive/MyDrive/Disertation/CBP_data"
fname98 = dpath + "/cbp98co/cbp98co.txt"
fname99 = dpath + "/cbp99co/cbp99co.txt"
fname00 = dpath + "/cbp00co/cbp00co.txt"
fname01 = dpath + "/cbp01co/cbp01co.txt"
fname02 = dpath + "/cbp02co/cbp02co.txt"
fname03 = dpath + "/cbp03co/cbp03co.txt"
fname04 = dpath + "/cbp04co/cbp04co.txt"
fname05 = dpath + "/cbp05co/cbp05co.txt"
fname06 = dpath + "/cbp06co/cbp06co.txt"
fname07 = dpath + "/cbp07co/cbp07co.txt"
fname08 = dpath + "/cbp08co/cbp08co.txt"
fname09 = dpath + "/cbp09co/cbp09co.txt"
fname10 = dpath + "/cbp10co/cbp10co.txt"
fname11 = dpath + "/cbp11co/cbp11co.txt"
fname12 = dpath + "/cbp12co/cbp12co.txt"
fname13 = dpath + "/cbp13co/cbp13co.txt"
fname14 = dpath + "/cbp14co/cbp14co.txt"
fname15 = dpath + "/cbp15co/cbp15co.txt"
fname16 = dpath + "/cbp16co/cbp16co.txt"
fname17 = dpath + "/cbp17co/cbp17co.txt"
fname18 = dpath + "/cbp18co/cbp18co.txt"
fname19 = dpath + "/cbp19co/cbp19co.txt"
fname20 = dpath + "/cbp20co/cbp20co.txt"
fname21 = dpath + "/cbp21co/cbp21co.txt"


In [2]:
cbp98 = pd.read_csv(fname98)
cbp99 = pd.read_csv(fname99)
cbp00 = pd.read_csv(fname00)
cbp01 = pd.read_csv(fname01)
cbp02 = pd.read_csv(fname02)
cbp03 = pd.read_csv(fname03)
cbp04 = pd.read_csv(fname04)
cbp05 = pd.read_csv(fname05)
cbp06 = pd.read_csv(fname06)
cbp07 = pd.read_csv(fname07)
cbp08 = pd.read_csv(fname08)
cbp09 = pd.read_csv(fname09)
cbp10 = pd.read_csv(fname10)
cbp11 = pd.read_csv(fname11)
cbp12 = pd.read_csv(fname12)
cbp13 = pd.read_csv(fname13)
cbp14 = pd.read_csv(fname14)
cbp15 = pd.read_csv(fname15)
cbp16 = pd.read_csv(fname16)
cbp17 = pd.read_csv(fname17)
cbp18 = pd.read_csv(fname18)
cbp19 = pd.read_csv(fname19)
cbp20 = pd.read_csv(fname20)
cbp21 = pd.read_csv(fname21)

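The per-year variables above all follow one naming pattern, so the same paths and tables could be built in a loop. The following is a sketch under that assumption, not the notebook's own code: `cbp_county_paths` and `load_cbp` are hypothetical helper names, and lowercasing the column names is only a precaution in case header case differs between years.

```python
from pathlib import Path

import pandas as pd


def cbp_county_paths(dpath, years):
    """Map each year to its county file, e.g. 1998 -> <dpath>/cbp98co/cbp98co.txt."""
    paths = {}
    for year in years:
        tag = f"cbp{year % 100:02d}co"  # two-digit year, as in the folder names above
        paths[year] = Path(dpath) / tag / f"{tag}.txt"
    return paths


def load_cbp(paths):
    """Read every file into a dict of DataFrames keyed by year."""
    frames = {}
    for year, path in paths.items():
        df = pd.read_csv(path)
        df.columns = df.columns.str.lower()  # precaution: normalize header case
        frames[year] = df
    return frames


# Same directory and year range as the cells above (1998 through 2021)
paths = cbp_county_paths("/content/drive/MyDrive/Disertation/CBP_data", range(1998, 2022))
# cbp = load_cbp(paths)   # then cbp[1998] plays the role of cbp98
```

Keeping the tables in a dictionary keyed by year makes it easier to apply the same preparation step to every year later on.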

In [9]:
cbp97

Unnamed: 0,fipstate,fipscty,sic,empflag,emp,qp1,ap,est,n1_4,n5_9,...,n100_249,n250_499,n500_999,n1000,n1000_1,n1000_2,n1000_3,n1000_4,censtate,cencty
0,1,1,----,,8288,39265,166744,752,439,147,...,9,2,2,0,0,0,0,0,63,1
1,1,1,07--,,89,205,1049,22,18,1,...,0,0,0,0,0,0,0,0,63,1
2,1,1,0700,,82,175,938,19,15,1,...,0,0,0,0,0,0,0,0,63,1
3,1,1,0720,A,0,0,0,2,2,0,...,0,0,0,0,0,0,0,0,63,1
4,1,1,0740,,17,48,206,3,2,0,...,0,0,0,0,0,0,0,0,63,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1045380,56,999,40--,B,0,0,0,2,0,0,...,0,0,0,0,0,0,0,0,83,999
1045381,56,999,4200,A,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,83,999
1045382,56,999,4210,A,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,83,999
1045383,56,999,4600,B,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,83,999
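In the output above, `fipstate` and `fipscty` appear as plain integers, so their leading zeros are gone. A five-character county FIPS code (two-digit state plus three-digit county) is usually needed to join with county geometries. The following is a minimal sketch: `add_county_fips` is a hypothetical helper, and treating county code 999 as a statewide record rather than a real county is an assumption that should be checked against the CBP record layout.

```python
import pandas as pd


def add_county_fips(df):
    """Add a zero-padded 5-character 'fips' column built from fipstate and fipscty."""
    out = df.copy()
    state = out["fipstate"].astype(int).astype(str).str.zfill(2)
    county = out["fipscty"].astype(int).astype(str).str.zfill(3)
    out["fips"] = state + county
    return out


# Two rows with values taken from the first and last county codes shown above
sample = pd.DataFrame({"fipstate": [1, 56], "fipscty": [1, 999], "est": [752, 2]})
sample = add_county_fips(sample)

# Assumption: county code 999 marks statewide records, so drop it for county-level work
counties_only = sample[sample["fipscty"] != 999]
print(counties_only)
```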


#3. DATA PREPARATION

#4. MACHINE LEARNING

#5. PROBLEM SOLUTION