# Transborder Freight Data Analysis

## Business Understanding

### 1. Background and Context
Transportation systems are the foundation of modern economies, playing a crucial role in commerce, tourism, and everyday living by facilitating the efficient movement of goods, services, and people. However, as these systems expand and become more intricate, they face growing challenges, such as:

- **Safety concerns** (e.g., accidents and fatalities).
- **Congestion** (leading to delays and economic inefficiencies).
- **Infrastructure stress** (aging systems unable to meet rising demand).
- **Environmental impacts** (e.g., greenhouse gas emissions).
- **Economic disruptions** (e.g., supply chain delays affecting productivity).

The Bureau of Transportation Statistics (BTS) collects and maintains comprehensive data across multiple transportation modes—road, rail, air, and water. This data includes metrics like passenger travel, freight movement, safety incidents, infrastructure capacity, and environmental impacts. These insights are critical for policymakers, transportation agencies, and businesses to design strategies that address inefficiencies, improve safety, and enhance sustainability.

---

### 2. Business Problem
The BTS faces persistent challenges in identifying inefficiencies, mitigating safety issues, and addressing sustainability concerns across transportation networks. Despite its wealth of data, there is a need to:

- Extract actionable insights from this data to inform decision-making.
- Understand underlying patterns and trends in transportation metrics.
- Provide targeted recommendations to optimize the performance of transportation systems.

<!-- ### Key Questions to Address:
1. What are the key inefficiencies in transportation systems, and how can they be mitigated?
2. How can patterns in safety incidents help improve preventive measures?
3. Which transportation modes are under the greatest environmental stress, and how can their sustainability be improved?
4. What are the critical factors contributing to congestion, and how can these be alleviated?
5. How can freight movement and passenger travel be optimized to enhance economic productivity?

--- -->

## 3. Objectives of the Analysis
The primary objective of the project is to analyze the BTS data to:

1. **Identify inefficiencies:** Pinpoint bottlenecks, delays, and underutilized resources across transportation modes.
2. **Improve safety:** Uncover trends and risk factors to develop recommendations for accident prevention.
3. **Optimize capacity:** Determine areas of infrastructure stress and suggest strategies to enhance efficiency.
4. **Enhance sustainability:** Assess the environmental impact of various modes and propose greener alternatives.
5. **Boost economic productivity:** Provide actionable insights to reduce disruptions and improve overall system performance.

By achieving these objectives, the analysis aims to empower BTS to address its challenges effectively and support policymakers, agencies, and businesses in making data-driven decisions.

---

## 4. Key Stakeholders
The project involves several stakeholders:

1. **Bureau of Transportation Statistics (BTS):** The primary client that will use the analysis to improve transportation systems.
2. **Policymakers:** Decision-makers who rely on BTS data to create regulations and allocate resources.
3. **Transportation Agencies:** Organizations managing roads, railways, airways, and waterways that need insights for operational improvements.
4. **Businesses:** Companies dependent on transportation systems for logistics and supply chain management.
5. **Public:** The ultimate beneficiaries of improved safety, efficiency, and sustainability in transportation.

---

## 5. Constraints and Challenges
1. **Data Quality:** Ensuring the BTS data is clean, accurate, and complete for reliable analysis.
2. **Data Volume:** Managing and processing large datasets efficiently.
3. **Complex Metrics:** Understanding and integrating diverse transportation metrics (e.g., safety incidents, emissions, freight movement).
4. **Resource Allocation:** Prioritizing recommendations that are feasible and impactful given budgetary and logistical constraints.
5. **Stakeholder Needs:** Balancing the diverse priorities of stakeholders, from economic productivity to environmental sustainability.

---

## 6. Success Criteria
The success of the project will be determined by:

1. **Insights Generated:** Delivering actionable insights that address the BTS’s challenges and objectives.
2. **Stakeholder Satisfaction:** Meeting or exceeding the expectations of the BTS and other stakeholders.
3. **Impact on Decision-Making:** Enabling data-driven strategies that lead to measurable improvements in transportation systems.
4. **Feasibility of Recommendations:** Providing realistic and implementable recommendations that can be executed within existing constraints.

---

## 7. Scope of Analysis
The analysis will focus on the following dimensions of BTS data:

1. **Passenger Travel:** Patterns and trends in movement across various transportation modes.
2. **Freight Movement:** Identifying inefficiencies and opportunities to optimize logistics.
3. **Safety Incidents:** Analyzing causes and locations of accidents to recommend preventive measures.
4. **Infrastructure Capacity:** Assessing utilization rates and stress points across networks.
5. **Environmental Impacts:** Evaluating greenhouse gas emissions and sustainability metrics.

---

## 8. Deliverables
The final outputs of this phase will include:

1. **Business Understanding Document:** A detailed report summarizing the problem, objectives, stakeholders, and success criteria.
2. **Key Metrics and KPIs:** A list of metrics to be analyzed (e.g., accident rates, freight delays, emissions levels).
3. **Initial Hypotheses:** Proposed areas of inefficiency or risk to be validated during the data understanding and analysis phases.

---



#### Import Necessary Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import warnings

warnings.filterwarnings("ignore")

###  Data Loading and Merging

### Loading the 2020 Data

##### January 2020

In [2]:
dot1 = pd.read_csv("../data/2020/Jan 2020/dot1_0120.csv")
dot1_ytd = pd.read_csv("../data/2020/Jan 2020/dot1_ytd_0120.csv")
dot2 =  pd.read_csv("../data/2020/Jan 2020/dot2_0120.csv") 
dot2_ytd = pd.read_csv("../data/2020/Jan 2020/dot2_ytd_0120.csv")
dot3 = pd.read_csv("../data/2020/Jan 2020/dot3_0120.csv")
dot3_ytd = pd.read_csv("../data/2020/Jan 2020/dot3_ytd_0120.csv")
jan_2020 = pd.concat([dot1,dot1_ytd,dot2,dot2_ytd,dot3,dot3_ytd],axis=0)
print(f"shape of data: {jan_2020.shape}")
jan_2020.head()


shape of data: (232500, 15)


Unnamed: 0,TRDTYPE,USASTATE,DEPE,DISAGMOT,MEXSTATE,CANPROV,COUNTRY,VALUE,SHIPWT,FREIGHT_CHARGES,DF,CONTCODE,MONTH,YEAR,COMMODITY2
0,1,AK,07XX,3,,XA,1220,3302,378,125,1.0,X,1,2020,
1,1,AK,20XX,3,,XA,1220,133362,137,1563,1.0,X,1,2020,
2,1,AK,20XX,3,,XA,1220,49960,66,2631,2.0,X,1,2020,
3,1,AK,20XX,3,,XC,1220,21184,3418,795,1.0,X,1,2020,
4,1,AK,20XX,3,,XM,1220,4253,2,75,1.0,X,1,2020,


##### February-2020

In [3]:
dot1 = pd.read_csv("../data/2020/Feb 2020/dot1_0220.csv")
dot1_ytd = pd.read_csv("../data/2020/Feb 2020/dot1_ytd_0220.csv")
dot2 =  pd.read_csv("../data/2020/Feb 2020/dot2_0220.csv") 
dot2_ytd = pd.read_csv("../data/2020/Feb 2020/dot2_ytd_0220.csv")
dot3 = pd.read_csv("../data/2020/Feb 2020/dot3_0220.csv")
dot3_ytd = pd.read_csv("../data/2020/Feb 2020/dot3_ytd_0220.csv")
feb_2020 = pd.concat([dot1,dot1_ytd,dot2,dot2_ytd,dot3,dot3_ytd],axis=0)
print(f"shape of data: {feb_2020.shape}")
feb_2020.head()


shape of data: (348540, 15)


Unnamed: 0,TRDTYPE,USASTATE,DEPE,DISAGMOT,MEXSTATE,CANPROV,COUNTRY,VALUE,SHIPWT,FREIGHT_CHARGES,DF,CONTCODE,MONTH,YEAR,COMMODITY2
0,1,AK,0901,5,,XO,1220,2701,0,7,1.0,X,2,2020,
1,1,AK,09XX,3,,XO,1220,11083,187,0,1.0,X,2,2020,
2,1,AK,19XX,1,XX,,2010,698682,993848,0,1.0,0,2,2020,
3,1,AK,20XX,3,,XA,1220,31846,138,1076,1.0,X,2,2020,
4,1,AK,20XX,3,,XA,1220,22927,34,985,2.0,X,2,2020,


##### March 2020

In [4]:
dot1 = pd.read_csv("../data/2020/Mar 2020/dot1_0320.csv")
dot1_ytd = pd.read_csv("../data/2020/Mar 2020/dot1_ytd_0320.csv")
dot2 =  pd.read_csv("../data/2020/Mar 2020/dot2_0320.csv") 
dot2_ytd = pd.read_csv("../data/2020/Mar 2020/dot2_ytd_0320.csv")
dot3 = pd.read_csv("../data/2020/Mar 2020/dot3_0320.csv")
dot3_ytd = pd.read_csv("../data/2020/Mar 2020/dot3_ytd_0320.csv")
mar_2020 = pd.concat([dot1,dot1_ytd,dot2,dot2_ytd,dot3,dot3_ytd],axis=0)
print(f"shape of data: {mar_2020.shape}")
mar_2020.head()

shape of data: (470959, 15)


Unnamed: 0,TRDTYPE,USASTATE,DEPE,DISAGMOT,MEXSTATE,CANPROV,COUNTRY,VALUE,SHIPWT,FREIGHT_CHARGES,DF,CONTCODE,MONTH,YEAR,COMMODITY2
0,1,AK,20XX,3,,XA,1220,26593,460,2133,1.0,X,3,2020,
1,1,AK,20XX,3,,XA,1220,41417,137,713,2.0,X,3,2020,
2,1,AK,20XX,3,,XC,1220,41554,35,526,1.0,X,3,2020,
3,1,AK,20XX,3,,XC,1220,7175,13,341,2.0,X,3,2020,
4,1,AK,20XX,3,,XM,1220,14283,387,513,1.0,X,3,2020,


##### April 2020

In [5]:
dot1 = pd.read_csv("../data/2020/Apr 2020/dot1_0420.csv")
dot1_ytd = pd.read_csv("../data/2020/Apr 2020/dot1_ytd_0420.csv")
dot2 =  pd.read_csv("../data/2020/Apr 2020/dot2_0420.csv") 
dot2_ytd = pd.read_csv("../data/2020/Apr 2020/dot2_ytd_0420.csv")
dot3 = pd.read_csv("../data/2020/Apr 2020/dot3_0420.csv")
dot3_ytd = pd.read_csv("../data/2020/Apr 2020/dot3_ytd_0420.csv")
apr_2020 = pd.concat([dot1,dot1_ytd,dot2,dot2_ytd,dot3,dot3_ytd],axis=0)
print(f"shape of data: {apr_2020.shape}")
apr_2020.head()

shape of data: (561045, 15)


Unnamed: 0,TRDTYPE,USASTATE,DEPE,DISAGMOT,MEXSTATE,CANPROV,COUNTRY,VALUE,SHIPWT,FREIGHT_CHARGES,DF,CONTCODE,MONTH,YEAR,COMMODITY2
0,1,AK,0115,5,,XB,1220,4660,0,67,2.0,X,4,2020,
1,1,AK,0901,5,,XO,1220,14360,0,282,1.0,X,4,2020,
2,1,AK,20XX,1,XX,,2010,4293733,24971000,0,1.0,0,4,2020,
3,1,AK,20XX,3,,XA,1220,28283,443,563,1.0,X,4,2020,
4,1,AK,20XX,3,,XA,1220,29848,69,538,2.0,X,4,2020,


##### May 2020

In [6]:
dot1 = pd.read_csv("../data/2020/May 2020/dot1_0520.csv")
dot1_ytd = pd.read_csv("../data/2020/May 2020/dot1_ytd_0520.csv")
dot2 =  pd.read_csv("../data/2020/May 2020/dot2_0520.csv") 
dot2_ytd = pd.read_csv("../data/2020/May 2020/dot2_ytd_0520.csv")
dot3 = pd.read_csv("../data/2020/May 2020/dot3_0520.csv")
dot3_ytd = pd.read_csv("../data/2020/May 2020/dot3_ytd_0520.csv")
may_2020 = pd.concat([dot1,dot1_ytd,dot2,dot2_ytd,dot3,dot3_ytd],axis=0)
print(f"shape of data: {may_2020.shape}")
may_2020.head()

shape of data: (665265, 15)


Unnamed: 0,TRDTYPE,USASTATE,DEPE,DISAGMOT,MEXSTATE,CANPROV,COUNTRY,VALUE,SHIPWT,FREIGHT_CHARGES,DF,CONTCODE,MONTH,YEAR,COMMODITY2
0,1,AK,1012,3,,XO,1220,32529,172,635,1.0,X,5,2020,
1,1,AK,2006,3,,XM,1220,20151,44,7,1.0,X,5,2020,
2,1,AK,20XX,3,,XA,1220,19925,102,229,1.0,X,5,2020,
3,1,AK,20XX,3,,XC,1220,40526,395,1096,1.0,X,5,2020,
4,1,AK,20XX,3,,XC,1220,165114,676,2379,2.0,X,5,2020,


##### June 2020

In [7]:
dot1 = pd.read_csv("../data/2020/June 2020/dot1_0620.csv")
dot1_ytd = pd.read_csv("../data/2020/June 2020/dot1_ytd_0620.csv")
dot2 =  pd.read_csv("../data/2020/June 2020/dot2_0620.csv") 
dot2_ytd = pd.read_csv("../data/2020/June 2020/dot2_ytd_0620.csv")
dot3 = pd.read_csv("../data/2020/June 2020/dot3_0620.csv")
dot3_ytd = pd.read_csv("../data/2020/June 2020/dot3_ytd_0620.csv")
june_2020 = pd.concat([dot1,dot1_ytd,dot2,dot2_ytd,dot3,dot3_ytd],axis=0)
print(f"shape of data: {june_2020.shape}")
june_2020.head()

shape of data: (782671, 15)


Unnamed: 0,TRDTYPE,USASTATE,DEPE,DISAGMOT,MEXSTATE,CANPROV,COUNTRY,VALUE,SHIPWT,FREIGHT_CHARGES,DF,CONTCODE,MONTH,YEAR,COMMODITY2
0,1,AK,0712,5,,XQ,1220,2864,0,19,1.0,X,6,2020,
1,1,AK,20XX,3,,XA,1220,2938,336,67,1.0,X,6,2020,
2,1,AK,20XX,3,,XA,1220,7957,133,138,2.0,X,6,2020,
3,1,AK,20XX,3,,XC,1220,22874,2253,591,1.0,X,6,2020,
4,1,AK,20XX,3,,XC,1220,7439,1,108,2.0,X,6,2020,


##### July 2020

In [8]:
dot1 = pd.read_csv("../data/2020/July 2020/dot1_0720.csv")
dot1_ytd = pd.read_csv("../data/2020/July 2020/dot1_ytd_0720.csv")
dot2 =  pd.read_csv("../data/2020/July 2020/dot2_0720.csv") 
dot2_ytd = pd.read_csv("../data/2020/July 2020/dot2_ytd_0720.csv")
dot3 = pd.read_csv("../data/2020/July 2020/dot3_0720.csv")
dot3_ytd = pd.read_csv("../data/2020/July 2020/dot3_ytd_0720.csv")
july_2020 = pd.concat([dot1,dot1_ytd,dot2,dot2_ytd,dot3,dot3_ytd],axis=0)
print(f"shape of data: {july_2020.shape}")
july_2020.head()

shape of data: (898774, 15)


Unnamed: 0,TRDTYPE,USASTATE,DEPE,DISAGMOT,MEXSTATE,CANPROV,COUNTRY,VALUE,SHIPWT,FREIGHT_CHARGES,DF,CONTCODE,MONTH,YEAR,COMMODITY2
0,1,AK,0712,5,,XQ,1220,12182,0,461,1.0,X,7,2020,
1,1,AK,20XX,3,,XA,1220,29921,1209,202,1.0,X,7,2020,
2,1,AK,20XX,3,,XA,1220,2590,16,74,2.0,X,7,2020,
3,1,AK,20XX,3,,XC,1220,58967,7843,857,1.0,X,7,2020,
4,1,AK,20XX,3,,XC,1220,7201,1,133,2.0,X,7,2020,


##### August 2020

In [9]:
dot1 = pd.read_csv("../data/2020/August 2020/dot1_0820.csv")
dot1_ytd = pd.read_csv("../data/2020/August 2020/dot1_ytd_0820.csv")
dot2 =  pd.read_csv("../data/2020/August 2020/dot2_0820.csv") 
dot2_ytd = pd.read_csv("../data/2020/August 2020/dot2_ytd_0820.csv")
dot3 = pd.read_csv("../data/2020/August 2020/dot3_0820.csv")
dot3_ytd = pd.read_csv("../data/2020/August 2020/dot3_ytd_0820.csv")
aug_2020 = pd.concat([dot1,dot1_ytd,dot2,dot2_ytd,dot3,dot3_ytd],axis=0)
print(f"shape of data: {aug_2020.shape}")
aug_2020.head()

shape of data: (1013556, 15)


Unnamed: 0,TRDTYPE,USASTATE,DEPE,DISAGMOT,MEXSTATE,CANPROV,COUNTRY,VALUE,SHIPWT,FREIGHT_CHARGES,DF,CONTCODE,MONTH,YEAR,COMMODITY2
0,1,AK,0106,5,,XB,1220,37700,0,113,1.0,X,8,2020,
1,1,AK,0115,5,,XB,1220,16109,0,1512,1.0,X,8,2020,
2,1,AK,0712,5,,XQ,1220,17574,0,666,1.0,X,8,2020,
3,1,AK,20XX,3,,XA,1220,8776,22,137,1.0,X,8,2020,
4,1,AK,20XX,3,,XC,1220,3999,15,188,1.0,X,8,2020,


##### September 2020

In [10]:
dot1 = pd.read_csv("../data/2020/September 2020/dot1_0920.csv")
dot1_ytd = pd.read_csv("../data/2020/September 2020/dot1_ytd_0920.csv")
dot2 =  pd.read_csv("../data/2020/September 2020/dot2_0920.csv") 
dot2_ytd = pd.read_csv("../data/2020/September 2020/dot2_ytd_0920.csv")
dot3 = pd.read_csv("../data/2020/September 2020/dot3_0920.csv")
dot3_ytd = pd.read_csv("../data/2020/September 2020/dot3_ytd_0920.csv")
sept_2020 = pd.concat([dot1,dot1_ytd,dot2,dot2_ytd,dot3,dot3_ytd],axis=0)
print(f"shape of data: {sept_2020.shape}")
sept_2020.head()

shape of data: (1131457, 15)


Unnamed: 0,TRDTYPE,USASTATE,DEPE,DISAGMOT,MEXSTATE,CANPROV,COUNTRY,VALUE,SHIPWT,FREIGHT_CHARGES,DF,CONTCODE,MONTH,YEAR,COMMODITY2
0,1,AK,0115,5,,XB,1220,3665,0,399,1.0,X,9,2020,
1,1,AK,0708,5,,XO,1220,6786,0,96,1.0,X,9,2020,
2,1,AK,0712,5,,XQ,1220,11712,0,443,1.0,X,9,2020,
3,1,AK,20XX,3,,XA,1220,6427,61,197,1.0,X,9,2020,
4,1,AK,20XX,3,,XA,1220,6986,129,7,2.0,X,9,2020,


### Loading the 2021 Data

##### January 2021

In [11]:
dot1 = pd.read_csv("../data/2021/January 2021/dot1_0121.csv")
dot1_ytd = pd.read_csv("../data/2021/January 2021/dot1_ytd_0121.csv")
dot2 =  pd.read_csv("../data/2021/January 2021/dot2_0121.csv") 
dot2_ytd = pd.read_csv("../data/2021/January 2021/dot2_ytd_0121.csv")
dot3 = pd.read_csv("../data/2021/January 2021/dot3_0121.csv")
dot3_ytd = pd.read_csv("../data/2021/January 2021/dot3_ytd_0121.csv")
jan_2021 = pd.concat([dot1,dot1_ytd,dot2,dot2_ytd,dot3,dot3_ytd],axis=0)
print(f"shape of data: {jan_2021.shape}")
jan_2021.head()

shape of data: (229232, 15)


Unnamed: 0,TRDTYPE,USASTATE,DEPE,DISAGMOT,MEXSTATE,CANPROV,COUNTRY,VALUE,SHIPWT,FREIGHT_CHARGES,DF,CONTCODE,MONTH,YEAR,COMMODITY2
0,1,AK,18XX,1,XX,,2010,5940,1136,0,1.0,1,1,2021,
1,1,AK,20XX,3,,XA,1220,7490,26,155,1.0,X,1,2021,
2,1,AK,20XX,3,,XA,1220,24885,13,78,2.0,X,1,2021,
3,1,AK,20XX,3,,XC,1220,16415,139,355,1.0,X,1,2021,
4,1,AK,20XX,3,,XC,1220,9025,5,35,2.0,X,1,2021,


##### February 2021

In [12]:
dot1 = pd.read_csv("../data/2021/February 2021/dot1_0221.csv")
dot1_ytd = pd.read_csv("../data/2021/February 2021/dot1_ytd_0221.csv")
dot2 =  pd.read_csv("../data/2021/February 2021/dot2_0221.csv") 
dot2_ytd = pd.read_csv("../data/2021/February 2021/dot2_ytd_0221.csv")
dot3 = pd.read_csv("../data/2021/February 2021/dot3_0221.csv")
dot3_ytd = pd.read_csv("../data/2021/February 2021/dot3_ytd_0221.csv")
feb_2021 = pd.concat([dot1,dot1_ytd,dot2,dot2_ytd,dot3,dot3_ytd],axis=0)
print(f"shape of data: {feb_2021.shape}")
feb_2021.head()

shape of data: (342454, 15)


Unnamed: 0,TRDTYPE,USASTATE,DEPE,DISAGMOT,MEXSTATE,CANPROV,COUNTRY,VALUE,SHIPWT,FREIGHT_CHARGES,DF,CONTCODE,MONTH,YEAR,COMMODITY2
0,1,AK,0708,5,,XO,1220,24002,0,0,1.0,X,2,2021,
1,1,AK,20XX,3,,XA,1220,16204,55,0,1.0,X,2,2021,
2,1,AK,20XX,3,,XC,1220,30261,832,0,1.0,X,2,2021,
3,1,AK,20XX,3,,XC,1220,46635,99,0,2.0,X,2,2021,
4,1,AK,20XX,3,,XM,1220,5743,389,0,1.0,X,2,2021,


##### March 2021

In [13]:
dot1 = pd.read_csv("../data/2021/March 2021/dot1_0321.csv")
dot1_ytd = pd.read_csv("../data/2021/March 2021/dot1_ytd_0321.csv")
dot2 =  pd.read_csv("../data/2021/March 2021/dot2_0321.csv") 
dot2_ytd = pd.read_csv("../data/2021/March 2021/dot2_ytd_0321.csv")
dot3 = pd.read_csv("../data/2021/March 2021/dot3_0321.csv")
dot3_ytd = pd.read_csv("../data/2021/March 2021/dot3_ytd_0321.csv")
mar_2021 = pd.concat([dot1,dot1_ytd,dot2,dot2_ytd,dot3,dot3_ytd],axis=0)
print(f"shape of data: {mar_2021.shape}")
mar_2021.head()

shape of data: (475385, 15)


Unnamed: 0,TRDTYPE,USASTATE,DEPE,DISAGMOT,MEXSTATE,CANPROV,COUNTRY,VALUE,SHIPWT,FREIGHT_CHARGES,DF,CONTCODE,MONTH,YEAR,COMMODITY2
0,1,AK,0901,5,,XO,1220,16667,0,320,1.0,X,3,2021,
1,1,AK,18XX,1,XX,,2010,37930,34473,0,1.0,1,3,2021,
2,1,AK,20XX,3,,XA,1220,45607,3847,688,1.0,X,3,2021,
3,1,AK,20XX,3,,XC,1220,56248,833,2322,1.0,X,3,2021,
4,1,AK,20XX,3,,XC,1220,36797,21,158,2.0,X,3,2021,


##### April 2021

In [14]:
dot1 = pd.read_csv("../data/2021/April 2021/dot1_0421.csv")
dot1_ytd = pd.read_csv("../data/2021/April 2021/dot1_ytd_0421.csv")
dot2 =  pd.read_csv("../data/2021/April 2021/dot2_0421.csv") 
dot2_ytd = pd.read_csv("../data/2021/April 2021/dot2_ytd_0421.csv")
dot3 = pd.read_csv("../data/2021/April 2021/dot3_0421.csv")
dot3_ytd = pd.read_csv("../data/2021/April 2021/dot3_ytd_0421.csv")
apr_2021 = pd.concat([dot1,dot1_ytd,dot2,dot2_ytd,dot3,dot3_ytd],axis=0)
print(f"shape of data: {apr_2021.shape}")
apr_2021.head()

shape of data: (593968, 15)


Unnamed: 0,TRDTYPE,USASTATE,DEPE,DISAGMOT,MEXSTATE,CANPROV,COUNTRY,VALUE,SHIPWT,FREIGHT_CHARGES,DF,CONTCODE,MONTH,YEAR,COMMODITY2
0,1,AK,07XX,3,,XO,1220,13504,47,401,1.0,X,4,2021,
1,1,AK,18XX,1,XX,,2010,6668,425,0,1.0,1,4,2021,
2,1,AK,20XX,3,,XA,1220,5108,584,80,1.0,X,4,2021,
3,1,AK,20XX,3,,XC,1220,24397,800,1002,1.0,X,4,2021,
4,1,AK,20XX,3,,XC,1220,18429,101,80,2.0,X,4,2021,


##### May 2021

In [15]:
dot1 = pd.read_csv("../data/2021/May 2021/dot1_0521.csv")
dot1_ytd = pd.read_csv("../data/2021/May 2021/dot1_ytd_0521.csv")
dot2 =  pd.read_csv("../data/2021/May 2021/dot2_0521.csv") 
dot2_ytd = pd.read_csv("../data/2021/May 2021/dot2_ytd_0521.csv")
dot3 = pd.read_csv("../data/2021/May 2021/dot3_0521.csv")
dot3_ytd = pd.read_csv("../data/2021/May 2021/dot3_ytd_0521.csv")
may_2021 = pd.concat([dot1,dot1_ytd,dot2,dot2_ytd,dot3,dot3_ytd],axis=0)
print(f"shape of data: {may_2021.shape}")
may_2021.head()

shape of data: (713828, 15)


Unnamed: 0,TRDTYPE,USASTATE,DEPE,DISAGMOT,MEXSTATE,CANPROV,COUNTRY,VALUE,SHIPWT,FREIGHT_CHARGES,DF,CONTCODE,MONTH,YEAR,COMMODITY2
0,1,AK,1704,3,,XC,1220,9711,4,0,1.0,X,5,2021,
1,1,AK,20XX,3,,XA,1220,37590,109,493,1.0,X,5,2021,
2,1,AK,20XX,3,,XA,1220,48833,273,374,2.0,X,5,2021,
3,1,AK,20XX,3,,XC,1220,29724,70,2378,1.0,X,5,2021,
4,1,AK,20XX,3,,XC,1220,18271,2,32,2.0,X,5,2021,


#### June 2021

In [16]:
dot1 = pd.read_csv("../data/2021/June 2021/dot1_0621.csv")
dot1_ytd = pd.read_csv("../data/2021/June 2021/dot1_ytd_0621.csv")
dot2 =  pd.read_csv("../data/2021/June 2021/dot2_0621.csv") 
dot2_ytd = pd.read_csv("../data/2021/June 2021/dot2_ytd_0621.csv")
dot3 = pd.read_csv("../data/2021/June 2021/dot3_0621.csv")
dot3_ytd = pd.read_csv("../data/2021/June 2021/dot3_ytd_0621.csv")
june_2021 = pd.concat([dot1,dot1_ytd,dot2,dot2_ytd,dot3,dot3_ytd],axis=0)
print(f"shape of data: {june_2021.shape}")
june_2021.head()

shape of data: (836618, 15)


Unnamed: 0,TRDTYPE,USASTATE,DEPE,DISAGMOT,MEXSTATE,CANPROV,COUNTRY,VALUE,SHIPWT,FREIGHT_CHARGES,DF,CONTCODE,MONTH,YEAR,COMMODITY2
0,1,AK,0115,5,,XB,1220,5715,0,194,1.0,X,6,2021,
1,1,AK,0704,5,,XO,1220,5993,0,130,1.0,X,6,2021,
2,1,AK,0712,5,,XQ,1220,59925,0,593,2.0,X,6,2021,
3,1,AK,07XX,3,,XO,1220,24660,18,2788,1.0,X,6,2021,
4,1,AK,19XX,1,XX,,2010,184564,206800,0,1.0,0,6,2021,


##### July 2021

In [17]:
dot1 = pd.read_csv("../data/2021/July 2021/dot1_0721.csv")
dot1_ytd = pd.read_csv("../data/2021/July 2021/dot1_ytd_0721.csv")
dot2 =  pd.read_csv("../data/2021/July 2021/dot2_0721.csv") 
dot2_ytd = pd.read_csv("../data/2021/July 2021/dot2_ytd_0721.csv")
dot3 = pd.read_csv("../data/2021/July 2021/dot3_0721.csv")
dot3_ytd = pd.read_csv("../data/2021/July 2021/dot3_ytd_0721.csv")
july_2021 = pd.concat([dot1,dot1_ytd,dot2,dot2_ytd,dot3,dot3_ytd],axis=0)
print(f"shape of data: {july_2021.shape}")
july_2021.head()

shape of data: (955955, 15)


Unnamed: 0,TRDTYPE,USASTATE,DEPE,DISAGMOT,MEXSTATE,CANPROV,COUNTRY,VALUE,SHIPWT,FREIGHT_CHARGES,DF,CONTCODE,MONTH,YEAR,COMMODITY2
0,1,AK,0115,5,,XB,1220,14719,0,501,2.0,X,7,2021,
1,1,AK,0708,5,,XO,1220,78476,0,777,1.0,X,7,2021,
2,1,AK,19XX,1,XX,,2010,20275,20310,0,1.0,0,7,2021,
3,1,AK,20XX,3,,XA,1220,4094,36,51,1.0,X,7,2021,
4,1,AK,20XX,3,,XA,1220,67827,23,1335,2.0,X,7,2021,


##### August 2021

In [18]:
dot1 = pd.read_csv("../data/2021/August 2021/dot1_0821.csv")
dot1_ytd = pd.read_csv("../data/2021/August 2021/dot1_ytd_0821.csv")
dot2 =  pd.read_csv("../data/2021/August 2021/dot2_0821.csv") 
dot2_ytd = pd.read_csv("../data/2021/August 2021/dot2_ytd_0821.csv")
dot3 = pd.read_csv("../data/2021/August 2021/dot3_0821.csv")
dot3_ytd = pd.read_csv("../data/2021/August 2021/dot3_ytd_0821.csv")
aug_2021 = pd.concat([dot1,dot1_ytd,dot2,dot2_ytd,dot3,dot3_ytd],axis=0)
print(f"shape of data: {aug_2021.shape}")
aug_2021.head()

shape of data: (1077011, 15)


Unnamed: 0,TRDTYPE,USASTATE,DEPE,DISAGMOT,MEXSTATE,CANPROV,COUNTRY,VALUE,SHIPWT,FREIGHT_CHARGES,DF,CONTCODE,MONTH,YEAR,COMMODITY2
0,1,AK,0102,5,,XY,1220,15135,0,92,1.0,X,8,2021,
1,1,AK,0106,5,,XB,1220,761533,0,16680,1.0,X,8,2021,
2,1,AK,0712,5,,XQ,1220,145561,0,1441,2.0,X,8,2021,
3,1,AK,19XX,1,,XB,1220,6616561,3337136,144961,1.0,X,8,2021,
4,1,AK,20XX,3,,XA,1220,85906,1440,4129,1.0,X,8,2021,


##### September 2021

In [19]:
dot1 = pd.read_csv("../data/2021/Sept 2021/dot1_0921.csv")
dot1_ytd = pd.read_csv("../data/2021/Sept 2021/dot1_ytd_0921.csv")
dot2 =  pd.read_csv("../data/2021/Sept 2021/dot2_0921.csv") 
dot2_ytd = pd.read_csv("../data/2021/Sept 2021/dot2_ytd_0921.csv")
dot3 = pd.read_csv("../data/2021/Sept 2021/dot3_0921.csv")
dot3_ytd = pd.read_csv("../data/2021/Sept 2021/dot3_ytd_0921.csv")
sept_2021 = pd.concat([dot1,dot1_ytd,dot2,dot2_ytd,dot3,dot3_ytd],axis=0)
print(f"shape of data: {sept_2021.shape}")
sept_2021.head()

shape of data: (1196510, 15)


Unnamed: 0,TRDTYPE,USASTATE,DEPE,DISAGMOT,MEXSTATE,CANPROV,COUNTRY,VALUE,SHIPWT,FREIGHT_CHARGES,DF,CONTCODE,MONTH,YEAR,COMMODITY2
0,1,AK,20XX,1,,XB,1220,20605818,7776129,461333,1.0,X,9,2021,
1,1,AK,20XX,3,,XA,1220,22517,484,756,1.0,X,9,2021,
2,1,AK,20XX,3,,XC,1220,32349,785,1232,1.0,X,9,2021,
3,1,AK,20XX,3,,XM,1220,16824,1645,187,1.0,X,9,2021,
4,1,AK,20XX,3,,XO,1220,40818,946,1137,1.0,X,9,2021,


##### October 2021

In [20]:
dot1 = pd.read_csv("../data/2021/Oct 2021/dot1_1021.csv")
dot1_ytd = pd.read_csv("../data/2021/Oct 2021/dot1_ytd_1021.csv")
dot2 =  pd.read_csv("../data/2021/Oct 2021/dot2_1021.csv") 
dot2_ytd = pd.read_csv("../data/2021/Oct 2021/dot2_ytd_1021.csv")
dot3 = pd.read_csv("../data/2021/Oct 2021/dot3_1021.csv")
dot3_ytd = pd.read_csv("../data/2021/Oct 2021/dot3_ytd_1021.csv")
oct_2021 = pd.concat([dot1,dot1_ytd,dot2,dot2_ytd,dot3,dot3_ytd],axis=0)
print(f"shape of data: {oct_2021.shape}")
oct_2021.head()

shape of data: (1320408, 15)


Unnamed: 0,TRDTYPE,USASTATE,DEPE,DISAGMOT,MEXSTATE,CANPROV,COUNTRY,VALUE,SHIPWT,FREIGHT_CHARGES,DF,CONTCODE,MONTH,YEAR,COMMODITY2
0,1,AK,0712,5,,XQ,1220,100552,0,995,2.0,X,10,2021,
1,1,AK,18XX,1,XX,,2010,25782,6848,0,1.0,1,10,2021,
2,1,AK,20XX,3,,XA,1220,37823,968,581,1.0,X,10,2021,
3,1,AK,20XX,3,,XA,1220,13267,17,80,2.0,X,10,2021,
4,1,AK,20XX,3,,XC,1220,10981,858,495,1.0,X,10,2021,


##### November 2021

In [21]:
dot1 = pd.read_csv("../data/2021/Nov 2021/dot1_1121.csv")
dot1_ytd = pd.read_csv("../data/2021/Nov 2021/dot1_ytd_1121.csv")
dot2 =  pd.read_csv("../data/2021/Nov 2021/dot2_1121.csv") 
dot2_ytd = pd.read_csv("../data/2021/Nov 2021/dot2_ytd_1121.csv")
dot3 = pd.read_csv("../data/2021/Nov 2021/dot3_1121.csv")
dot3_ytd = pd.read_csv("../data/2021/Nov 2021/dot3_ytd_1121.csv")
nov_2021 = pd.concat([dot1,dot1_ytd,dot2,dot2_ytd,dot3,dot3_ytd],axis=0)
print(f"shape of data: {nov_2021.shape}")
nov_2021.head()

shape of data: (1439747, 15)


Unnamed: 0,TRDTYPE,USASTATE,DEPE,DISAGMOT,MEXSTATE,CANPROV,COUNTRY,VALUE,SHIPWT,FREIGHT_CHARGES,DF,CONTCODE,MONTH,YEAR,COMMODITY2
0,1,AK,0106,5,,XY,1220,62562,0,759,1.0,X,11,2021,
1,1,AK,0106,8,,XB,1220,2140647,0,46899,1.0,X,11,2021,
2,1,AK,0115,5,,XB,1220,13027,0,1193,1.0,X,11,2021,
3,1,AK,07XX,3,,XQ,1220,9579,2197,207,1.0,X,11,2021,
4,1,AK,09XX,3,,XO,1220,2572,111,79,2.0,X,11,2021,


##### December 2021

In [28]:
dot1 = pd.read_csv("../data/2021/Dec 2021/dot1_1221.csv")
dot1_ytd = pd.read_csv("../data/2021/Dec 2021/dot1_ytd_1221.csv")
dot1_2021 = pd.read_csv("../data/2021/Dec 2021/dot1_2021.csv")
dot2 =  pd.read_csv("../data/2021/Dec 2021/dot2_1221.csv") 
dot2_2021 = pd.read_csv("../data/2021/Dec 2021/dot2_2021.csv")
dot2_ytd = pd.read_csv("../data/2021/Dec 2021/dot2_ytd_1221.csv")
dot3_1221 = pd.read_csv("../data/2021/Dec 2021/dot3_1221.csv")
dot3_2021 = pd.read_csv("../data/2021/Dec 2021/dot3_2021.csv")
dot3_ytd = pd.read_csv("../data/2021/Dec 2021/dot3_ytd_1221.csv")
dec_2021 = pd.concat([dot1,dot1_ytd,dot2,dot2_ytd,dot3,dot3_ytd],axis=0)
print(f"shape of data: {dec_2021.shape}")
dec_2021.head()

shape of data: (1557156, 15)


Unnamed: 0,TRDTYPE,USASTATE,DEPE,DISAGMOT,MEXSTATE,CANPROV,COUNTRY,VALUE,SHIPWT,FREIGHT_CHARGES,DF,CONTCODE,MONTH,YEAR,COMMODITY2
0,1,AK,01XX,1,,XB,1220,115611,24698,3515,1.0,X,12,2021,
1,1,AK,0901,5,,XO,1220,14381,0,282,1.0,X,12,2021,
2,1,AK,11XX,3,XX,,2010,2851,7,0,2.0,0,12,2021,
3,1,AK,20XX,3,,XA,1220,26159,718,925,1.0,X,12,2021,
4,1,AK,20XX,3,,XA,1220,3086,40,60,2.0,X,12,2021,


### Data Understanding

### Exploratory Data Analysis

#### Univariate Analysis

### Bivariate Analysis

#### Multivariate Analysis

### Data Cleaning

### Data Preparation

### Hypothesis Testing

### Answering Business Questions

### Dashboarding

### Conclusion