# Transborder Freight Data Analysis

## Business Understanding

### 1. Background and Context
Transportation systems are the foundation of modern economies, playing a crucial role in commerce, tourism, and everyday living by facilitating the efficient movement of goods, services, and people. However, as these systems expand and become more intricate, they face growing challenges, such as:

- **Safety concerns** (e.g., accidents and fatalities).
- **Congestion** (leading to delays and economic inefficiencies).
- **Infrastructure stress** (aging systems unable to meet rising demand).
- **Environmental impacts** (e.g., greenhouse gas emissions).
- **Economic disruptions** (e.g., supply chain delays affecting productivity).

The Bureau of Transportation Statistics (BTS) collects and maintains comprehensive data across multiple transportation modes—road, rail, air, and water. This data includes metrics like passenger travel, freight movement, safety incidents, infrastructure capacity, and environmental impacts. These insights are critical for policymakers, transportation agencies, and businesses to design strategies that address inefficiencies, improve safety, and enhance sustainability.

---

### 2. Business Problem
The BTS faces persistent challenges in identifying inefficiencies, mitigating safety issues, and addressing sustainability concerns across transportation networks. Despite its wealth of data, there is a need to:

- Extract actionable insights from this data to inform decision-making.
- Understand underlying patterns and trends in transportation metrics.
- Provide targeted recommendations to optimize the performance of transportation systems.


## 3. Objectives of the Analysis
The primary objective of the project is to analyze the BTS data to:

1. **Identify inefficiencies:** Pinpoint bottlenecks, delays, and underutilized resources across transportation modes.
2. **Improve safety:** Uncover trends and risk factors to develop recommendations for accident prevention.
3. **Optimize capacity:** Determine areas of infrastructure stress and suggest strategies to enhance efficiency.
4. **Enhance sustainability:** Assess the environmental impact of various modes and propose greener alternatives.
5. **Boost economic productivity:** Provide actionable insights to reduce disruptions and improve overall system performance.

By achieving these objectives, the analysis aims to empower BTS to address its challenges effectively and support policymakers, agencies, and businesses in making data-driven decisions.

---

## 4. Key Stakeholders
The project involves several stakeholders:

1. **Bureau of Transportation Statistics (BTS):** The primary client that will use the analysis to improve transportation systems.
2. **Policymakers:** Decision-makers who rely on BTS data to create regulations and allocate resources.
3. **Transportation Agencies:** Organizations managing roads, railways, airways, and waterways that need insights for operational improvements.
4. **Businesses:** Companies dependent on transportation systems for logistics and supply chain management.
5. **Public:** The ultimate beneficiaries of improved safety, efficiency, and sustainability in transportation.

---

## 5. Constraints and Challenges
1. **Data Quality:** Ensuring the BTS data is clean, accurate, and complete for reliable analysis.
2. **Data Volume:** Managing and processing large datasets efficiently.
3. **Complex Metrics:** Understanding and integrating diverse transportation metrics (e.g., safety incidents, emissions, freight movement).
4. **Resource Allocation:** Prioritizing recommendations that are feasible and impactful given budgetary and logistical constraints.
5. **Stakeholder Needs:** Balancing the diverse priorities of stakeholders, from economic productivity to environmental sustainability.

---

## Hypothesis Statement
- **Null Hypothesis (H_0):** There is no significant difference in freight charges across the different modes of transportation.
- **Alternative Hypothesis (H_1):** There is a significant difference in freight charges across the different modes of transportation.
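
One way to put these hypotheses into numbers is a one-way ANOVA F statistic, comparing the spread of mean freight charges between modes with the spread within each mode. The sketch below is illustrative only: `one_way_f_statistic` is a hypothetical helper, the toy values are invented, and it assumes `DISAGMOT` is the mode column and `FREIGHT_CHARGES` the response. Because freight charges in this data are heavily skewed, a rank-based test such as `scipy.stats.kruskal` may be a better fit than ANOVA.

```python
import pandas as pd


def one_way_f_statistic(df, group_col, value_col):
    """Return the one-way ANOVA F statistic and its two degrees of freedom."""
    groups = df.groupby(group_col)[value_col]
    grand_mean = df[value_col].mean()

    # Between-group sum of squares: group size times the squared distance
    # of each group mean from the grand mean.
    ss_between = (groups.size() * (groups.mean() - grand_mean) ** 2).sum()

    # Within-group sum of squares: squared distance of each row from its
    # own group mean.
    ss_within = ((df[value_col] - groups.transform("mean")) ** 2).sum()

    df_between = groups.ngroups - 1
    df_within = len(df) - groups.ngroups
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented toy data: three modes with three shipments each.
toy = pd.DataFrame({
    "DISAGMOT": [1, 1, 1, 3, 3, 3, 5, 5, 5],
    "FREIGHT_CHARGES": [10, 12, 14, 20, 22, 24, 30, 32, 34],
})

f_stat, df_between, df_within = one_way_f_statistic(toy, "DISAGMOT", "FREIGHT_CHARGES")
print(f_stat, df_between, df_within)
```

A large F value relative to the F distribution with these degrees of freedom would count as evidence against H_0; `scipy.stats.f_oneway` returns the same statistic together with a p-value.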

---

### Business Questions to Address:
1. What is the total transborder freight volume by border?
2. What is the total transborder freight volume by border and mode of transportation?
3. What is the rate of change in freight volume for the different modes of transportation over the past years?
4. What are the top 10 traded commodities in the U.S. based on freight volume or value?
5. What are the top 5 truck ports by freight volume or value?
6. Which ports have a relationship with specific modes of transportation?
7. Which trade type generates the highest freight charges?
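
Several of these questions reduce to grouped sums over the combined data. The sketch below shows that pattern on invented toy rows. It assumes `VALUE` is the volume measure (`SHIPWT` would work the same way), and the code labels in `COUNTRY_LABELS` and `MODE_LABELS` are assumptions that should be checked against the BTS data dictionary.

```python
import pandas as pd

# Assumed code labels; verify against the BTS data dictionary before use.
COUNTRY_LABELS = {1220: "Canada", 2010: "Mexico"}
MODE_LABELS = {5: "Truck", 6: "Rail"}

# Invented toy rows using the same column names as the notebook's data.
toy = pd.DataFrame({
    "COUNTRY": [1220, 1220, 1220, 2010, 2010, 2010],
    "DISAGMOT": [5, 5, 6, 5, 5, 6],
    "YEAR": [2020, 2021, 2020, 2020, 2021, 2021],
    "VALUE": [100, 150, 40, 200, 300, 60],
})
toy["BORDER"] = toy["COUNTRY"].map(COUNTRY_LABELS)
toy["MODE"] = toy["DISAGMOT"].map(MODE_LABELS)

# Questions 1 and 2: totals by border, then by border and mode.
value_by_border = toy.groupby("BORDER")["VALUE"].sum()
value_by_border_mode = toy.groupby(["BORDER", "MODE"])["VALUE"].sum()

# Question 3: year-over-year percentage change per mode.
yearly = toy.groupby(["MODE", "YEAR"])["VALUE"].sum().unstack("YEAR")
pct_change = yearly.pct_change(axis="columns") * 100

print(value_by_border)
print(value_by_border_mode)
print(pct_change)
```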

---
## 6. Success Criteria
The success of the project will be determined by:

1. **Insights Generated:** Delivering actionable insights that address the BTS’s challenges and objectives.
2. **Stakeholder Satisfaction:** Meeting or exceeding the expectations of the BTS and other stakeholders.
3. **Impact on Decision-Making:** Enabling data-driven strategies that lead to measurable improvements in transportation systems.
4. **Feasibility of Recommendations:** Providing realistic and implementable recommendations that can be executed within existing constraints.

---

## 7. Scope of Analysis
The analysis will focus on the following dimensions of BTS data:

1. **Passenger Travel:** Patterns and trends in movement across various transportation modes.
2. **Freight Movement:** Identifying inefficiencies and opportunities to optimize logistics.
3. **Safety Incidents:** Analyzing causes and locations of accidents to recommend preventive measures.
4. **Infrastructure Capacity:** Assessing utilization rates and stress points across networks.
5. **Environmental Impacts:** Evaluating greenhouse gas emissions and sustainability metrics.

---

## 8. Deliverables
The final outputs of this phase will include:

1. **Business Understanding Document:** A detailed report summarizing the problem, objectives, stakeholders, and success criteria.
2. **Key Metrics and KPIs:** A list of metrics to be analyzed (e.g., accident rates, freight delays, emissions levels).
3. **Initial Hypotheses:** Proposed areas of inefficiency or risk to be validated during the data understanding and analysis phases.

---



#### Import Necessary Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import warnings

warnings.filterwarnings("ignore")
pd.options.display.float_format = '{:.2f}'.format

### Data Loading and Merging

### Loading the 2020 Data

##### September 2020 YTD Files

In [2]:
# Read the three September 2020 year-to-date (YTD) files
dot1_ytd = pd.read_csv("../data/2020/September 2020/dot1_ytd_0920.csv")
dot2_ytd = pd.read_csv("../data/2020/September 2020/dot2_ytd_0920.csv")
dot3_ytd = pd.read_csv("../data/2020/September 2020/dot3_ytd_0920.csv")

# Concatenating all dataframes
df_2020 = pd.concat([dot1_ytd,dot2_ytd, dot3_ytd], axis=0, ignore_index=True)

print(f"Shape of data: {df_2020.shape}")
df_2020.head()

Shape of data: (1015432, 15)


Unnamed: 0,TRDTYPE,USASTATE,DEPE,DISAGMOT,MEXSTATE,CANPROV,COUNTRY,VALUE,SHIPWT,FREIGHT_CHARGES,DF,CONTCODE,MONTH,YEAR,COMMODITY2
0,1,AK,07XX,3,,XA,1220,3302,378,125,1.0,X,1,2020,
1,1,AK,20XX,3,,XA,1220,133362,137,1563,1.0,X,1,2020,
2,1,AK,20XX,3,,XA,1220,49960,66,2631,2.0,X,1,2020,
3,1,AK,20XX,3,,XC,1220,21184,3418,795,1.0,X,1,2020,
4,1,AK,20XX,3,,XM,1220,4253,2,75,1.0,X,1,2020,
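
The same read-and-stack pattern recurs for every year in this notebook, so it could be wrapped in a single helper. The sketch below is one possible shape for it: `load_dot_files` and its parameters are hypothetical names, not part of the original notebook, and it assumes the files follow the `dot{n}_{suffix}.csv` naming seen in the paths above.

```python
from pathlib import Path

import pandas as pd


def load_dot_files(folder, suffix, prefixes=("dot1", "dot2", "dot3")):
    """Read the DOT tables found in `folder` and stack them into one DataFrame.

    `suffix` is the part of the file name after the table prefix,
    for example "ytd_0920" for "dot1_ytd_0920.csv".
    """
    folder = Path(folder)
    frames = [pd.read_csv(folder / f"{prefix}_{suffix}.csv") for prefix in prefixes]
    # ignore_index=True gives the stacked rows a fresh 0..n-1 index.
    return pd.concat(frames, axis=0, ignore_index=True)


# Hypothetical usage, mirroring the cell above:
# df_2020 = load_dot_files("../data/2020/September 2020", "ytd_0920")
```

Passing `ignore_index=True` inside the helper keeps the row labels unique for every year, which the per-year cells in this notebook do not all do.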


### Loading the 2021 Data

##### December 2021

In [18]:
# Load all the year-to-date (YTD) datasets for December 2021
dot1_ytd = pd.read_csv("../data/2021/Dec 2021/dot1_ytd_1221.csv")
dot2_ytd = pd.read_csv("../data/2021/Dec 2021/dot2_ytd_1221.csv")
dot3_ytd = pd.read_csv("../data/2021/Dec 2021/dot3_ytd_1221.csv")

# Concatenate all data into a single DataFrame
df_2021 = pd.concat([dot1_ytd, dot2_ytd, dot3_ytd], axis=0)

# Display the shape and first few rows of the concatenated data
print(f"Shape of data: {df_2021.shape}")
df_2021.head()

Shape of data: (1437978, 15)


Unnamed: 0,TRDTYPE,USASTATE,DEPE,DISAGMOT,MEXSTATE,CANPROV,COUNTRY,VALUE,SHIPWT,FREIGHT_CHARGES,DF,CONTCODE,MONTH,YEAR,COMMODITY2
0,1,AK,18XX,1,XX,,2010,5940,1136,0,1.0,1,1,2021,
1,1,AK,20XX,3,,XA,1220,7490,26,155,1.0,X,1,2021,
2,1,AK,20XX,3,,XA,1220,24885,13,78,2.0,X,1,2021,
3,1,AK,20XX,3,,XC,1220,16415,139,355,1.0,X,1,2021,
4,1,AK,20XX,3,,XC,1220,9025,5,35,2.0,X,1,2021,


### Loading the 2022 Data

##### December 2022

In [19]:
# Read the December 2022 year-to-date (YTD) data files
dot1_ytd = pd.read_csv("../data/2022/December 2022/dot1_ytd_1222.csv")
dot2_ytd = pd.read_csv("../data/2022/December 2022/dot2_ytd_1222.csv")
dot3_ytd = pd.read_csv("../data/2022/December 2022/dot3_ytd_1222.csv")

# Concatenate all data into a single DataFrame
df_2022 = pd.concat([dot1_ytd, dot2_ytd, dot3_ytd], axis=0)

# Display the shape and first few rows of the concatenated data
print(f"Shape of data: {df_2022.shape}")
df_2022.head()


Shape of data: (1471797, 15)


Unnamed: 0,TRDTYPE,USASTATE,DEPE,DISAGMOT,MEXSTATE,CANPROV,COUNTRY,VALUE,SHIPWT,FREIGHT_CHARGES,DF,CONTCODE,MONTH,YEAR,COMMODITY2
0,1,AK,0901,5,,XO,1220,7042,0,137,1.0,X,1,2022,
1,1,AK,20XX,3,,XA,1220,117977,485,2181,1.0,X,1,2022,
2,1,AK,20XX,3,,XC,1220,105057,22924,8899,1.0,X,1,2022,
3,1,AK,20XX,3,,XO,1220,24751,32,871,1.0,X,1,2022,
4,1,AK,20XX,3,,XQ,1220,2773,1,130,1.0,X,1,2022,


### Loading the 2023 Datasets

##### December 2023

In [20]:
# Read the December 2023 data files
dot1 = pd.read_csv("../data/2023/December2023/dot1_1223.csv")
dot2 = pd.read_csv("../data/2023/December2023/dot2_1223.csv")
dot3 = pd.read_csv("../data/2023/December2023/dot3_1223.csv")

# Concatenate all data into a single DataFrame
df_2023 = pd.concat([dot1, dot2, dot3], axis=0)

# Display the shape and first few rows of the concatenated data
print(f"Shape of data: {df_2023.shape}")
df_2023.head()

Shape of data: (120152, 15)


Unnamed: 0,TRDTYPE,USASTATE,DEPE,DISAGMOT,MEXSTATE,CANPROV,COUNTRY,VALUE,SHIPWT,FREIGHT_CHARGES,DF,CONTCODE,MONTH,YEAR,COMMODITY2
0,1,AK,0708,5,,XO,1220,25825,0,74,2.0,X,12,2023,
1,1,AK,20XX,3,,XA,1220,57380,128,1223,1.0,X,12,2023,
2,1,AK,20XX,3,,XA,1220,9635,16,188,2.0,X,12,2023,
3,1,AK,20XX,3,,XC,1220,431674,463,8169,1.0,X,12,2023,
4,1,AK,20XX,3,,XC,1220,12598,27,281,2.0,X,12,2023,


### Loading the 2024 Datasets

##### September 2024

In [21]:
# Loading the September 2024 datasets
dot1 = pd.read_csv("../data/2024/september2024/dot1_0924.csv")
dot2 = pd.read_csv("../data/2024/september2024/dot2_0924.csv")
dot3 = pd.read_csv("../data/2024/september2024/dot3_0924.csv")

# Concatenate all data into a single DataFrame
df_2024 = pd.concat([dot1, dot2, dot3], axis=0)

# Display the shape and first few rows of the concatenated data
print(f"Shape of data: {df_2024.shape}")
df_2024.head()


Shape of data: (123568, 15)


Unnamed: 0,TRDTYPE,USASTATE,DEPE,DISAGMOT,MEXSTATE,CANPROV,COUNTRY,VALUE,SHIPWT,FREIGHT_CHARGES,DF,CONTCODE,MONTH,YEAR,COMMODITY2
0,1,AK,0115,5,,XB,1220,680793,0,14915,1.0,X,9,2024,
1,1,AK,01XX,1,,XB,1220,83377,66318,2842,1.0,X,9,2024,
2,1,AK,07XX,3,,XO,1220,93057,74,2772,1.0,X,9,2024,
3,1,AK,0901,6,,XO,1220,70218,0,695,1.0,X,9,2024,
4,1,AK,2006,3,,XM,1220,10397,33,0,1.0,X,9,2024,


### Combine the Datasets for All Years

In [22]:
# Concatenate the datasets for the five years
final_df = pd.concat([df_2020,df_2021,df_2022,df_2023,df_2024], axis=0)

# check the shape of the final dataframe
final_df.shape


(4168927, 15)

### Data Understanding

In [23]:
# Display the first five rows
final_df.head()

Unnamed: 0,TRDTYPE,USASTATE,DEPE,DISAGMOT,MEXSTATE,CANPROV,COUNTRY,VALUE,SHIPWT,FREIGHT_CHARGES,DF,CONTCODE,MONTH,YEAR,COMMODITY2
0,1,AK,07XX,3,,XA,1220,3302,378,125,1.0,X,1,2020,
1,1,AK,20XX,3,,XA,1220,133362,137,1563,1.0,X,1,2020,
2,1,AK,20XX,3,,XA,1220,49960,66,2631,2.0,X,1,2020,
3,1,AK,20XX,3,,XC,1220,21184,3418,795,1.0,X,1,2020,
4,1,AK,20XX,3,,XM,1220,4253,2,75,1.0,X,1,2020,


In [24]:
# check the last five rows of the data
final_df.tail() 

Unnamed: 0,TRDTYPE,USASTATE,DEPE,DISAGMOT,MEXSTATE,CANPROV,COUNTRY,VALUE,SHIPWT,FREIGHT_CHARGES,DF,CONTCODE,MONTH,YEAR,COMMODITY2
17140,2,,55XX,5,,,1220,4384342,7399,1883,,0,9,2024,98.0
17141,2,,55XX,8,,,1220,50211,6350,3500,,0,9,2024,98.0
17142,2,,60XX,8,,,1220,793390,80,500,,0,9,2024,89.0
17143,2,,70XX,8,,,1220,233990301,0,0,,0,9,2024,99.0
17144,2,,70XX,8,,,2010,224981722,0,0,,0,9,2024,99.0


In [25]:
# check info about data
final_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 4168927 entries, 0 to 17144
Data columns (total 15 columns):
 #   Column           Dtype  
---  ------           -----  
 0   TRDTYPE          int64  
 1   USASTATE         object 
 2   DEPE             object 
 3   DISAGMOT         int64  
 4   MEXSTATE         object 
 5   CANPROV          object 
 6   COUNTRY          int64  
 7   VALUE            int64  
 8   SHIPWT           int64  
 9   FREIGHT_CHARGES  int64  
 10  DF               float64
 11  CONTCODE         object 
 12  MONTH            int64  
 13  YEAR             int64  
 14  COMMODITY2       float64
dtypes: float64(2), int64(8), object(5)
memory usage: 508.9+ MB
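
The `Index` line above reports 4,168,927 entries but a last label of only 17144, which is what happens when frames are stacked without `ignore_index=True`: each frame keeps its own row labels, so labels repeat. The toy sketch below (invented frames `a` and `b`) shows the difference; a non-unique index can make label-based lookups with `.loc` return several rows.

```python
import pandas as pd

# Two invented frames, each with its own default 0..1 index.
a = pd.DataFrame({"YEAR": [2020, 2020]})
b = pd.DataFrame({"YEAR": [2021, 2021]})

# Without ignore_index the original labels are kept: 0, 1, 0, 1.
stacked = pd.concat([a, b], axis=0)

# With ignore_index=True the rows are relabelled 0, 1, 2, 3.
renumbered = pd.concat([a, b], axis=0, ignore_index=True)

print(stacked.index.is_unique)     # False
print(renumbered.index.is_unique)  # True
```

Calling `reset_index(drop=True)` on an already stacked frame gives the same result as `ignore_index=True`.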


In [26]:
# Perform descriptive statistics
final_df.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
TRDTYPE,4168927.0,1.34,0.47,1.0,1.0,1.0,2.0,2.0
DISAGMOT,4168927.0,4.76,1.26,1.0,5.0,5.0,5.0,9.0
COUNTRY,4168927.0,1530.4,385.83,1220.0,1220.0,1220.0,2010.0,2010.0
VALUE,4168927.0,2823843.3,35652323.41,0.0,14136.0,70433.0,421812.0,5595625173.0
SHIPWT,4168927.0,1216449.01,39497951.28,0.0,0.0,0.0,2613.0,8450848373.0
FREIGHT_CHARGES,4168927.0,37858.14,965473.12,0.0,0.0,269.0,2923.0,225690200.0
DF,2762958.0,1.33,0.47,1.0,1.0,1.0,2.0,2.0
MONTH,4168927.0,6.38,3.39,1.0,3.0,6.0,9.0,12.0
YEAR,4168927.0,2021.26,0.96,2020.0,2021.0,2021.0,2022.0,2024.0
COMMODITY2,3214613.0,56.71,27.83,1.0,33.0,60.0,84.0,99.0


In [27]:
# check for duplicates
final_df.duplicated().sum()

np.int64(0)

- There are no duplicate rows in the combined dataset.


In [28]:
# check for null values

final_df.isnull().sum()

TRDTYPE                  0
USASTATE            600643
DEPE               2613970
DISAGMOT                 0
MEXSTATE           3011308
CANPROV            1991784
COUNTRY                  0
VALUE                    0
SHIPWT                   0
FREIGHT_CHARGES          0
DF                 1405969
CONTCODE                 0
MONTH                    0
YEAR                     0
COMMODITY2          954314
dtype: int64
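
Raw null counts are easier to judge as a share of rows, and splitting them by `COUNTRY` can show whether a gap is structural. In the sample rows above, `MEXSTATE` is empty on rows with `COUNTRY` 1220 and `CANPROV` is empty on the row with `COUNTRY` 2010, so some of these nulls may simply reflect which border a record belongs to. The sketch below uses invented toy rows to illustrate both checks.

```python
import numpy as np
import pandas as pd

# Invented toy rows that mimic the pattern seen in the sample output.
toy = pd.DataFrame({
    "COUNTRY": [1220, 1220, 2010, 2010],
    "MEXSTATE": [np.nan, np.nan, "CH", "XX"],
    "CANPROV": ["XA", "XC", np.nan, np.nan],
})

# Share of missing values per column, in percent.
null_pct = toy.isnull().mean().mul(100).round(1)

# Share of missing values per column within each COUNTRY code.
null_by_country = (
    toy.drop(columns="COUNTRY").isnull().groupby(toy["COUNTRY"]).mean()
)

print(null_pct)
print(null_by_country)
```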

### Filling the Null Values

After investigating the data and reviewing the data dictionary, I chose the following approach to filling the null values:
- USASTATE: Null values are filled from the port code (DEPE column) by mapping each code to the state of its port district.

#### USASTATE Column

In [29]:
# Check for Null Values in the USASTATE Column
final_df[final_df['USASTATE'].isna()]

Unnamed: 0,TRDTYPE,USASTATE,DEPE,DISAGMOT,MEXSTATE,CANPROV,COUNTRY,VALUE,SHIPWT,FREIGHT_CHARGES,DF,CONTCODE,MONTH,YEAR,COMMODITY2
866910,1,,0101,5,,,1220,3978,0,116,1.00,X,1,2020,27.00
866911,1,,0101,5,,,1220,3614,0,71,1.00,X,1,2020,28.00
866912,1,,0101,5,,,1220,12436,0,243,1.00,X,1,2020,29.00
866913,1,,0101,5,,,1220,24764,0,485,2.00,X,1,2020,29.00
866914,1,,0101,5,,,1220,33250,0,615,1.00,X,1,2020,34.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17140,2,,55XX,5,,,1220,4384342,7399,1883,,0,9,2024,98.00
17141,2,,55XX,8,,,1220,50211,6350,3500,,0,9,2024,98.00
17142,2,,60XX,8,,,1220,793390,80,500,,0,9,2024,89.00
17143,2,,70XX,8,,,1220,233990301,0,0,,0,9,2024,99.00


In [30]:
# check for all unique values

def check_unique_values(df):
    for column in df.columns:
        print(f'{column}: {df[column].nunique()} unique values')
        print(df[column].unique())
        print('\n')

check_unique_values(final_df)

TRDTYPE: 2 unique values
[1 2]


USASTATE: 52 unique values
['AK' 'AL' 'AR' 'AZ' 'CA' 'CO' 'CT' 'DC' 'DE' 'DU' 'FL' 'GA' 'HI' 'IA'
 'ID' 'IL' 'IN' 'KS' 'KY' 'LA' 'MA' 'MD' 'ME' 'MI' 'MN' 'MO' 'MS' 'MT'
 'NC' 'ND' 'NE' 'NH' 'NJ' 'NM' 'NV' 'NY' 'OH' 'OK' 'OR' 'PA' 'RI' 'SC'
 'SD' 'TN' 'TX' 'UT' 'VA' 'VT' 'WA' 'WI' 'WV' 'WY' nan]


DEPE: 245 unique values
['07XX' '20XX' '2304' '2506' '2604' '3004' '3023' '30XX' '3101' '3103'
 '3104' '3106' '31XX' '3310' '3401' '3403' '3801' '3802' '4101' '41XX'
 '5201' '70XX' '0106' '0109' '0115' '0211' '04XX' '0701' '0708' '0712'
 '0901' '17XX' '18XX' '19XX' '2006' '2301' '2302' '2303' '2305' '2310'
 '2402' '2403' '2404' '2503' '2507' '2601' '2608' '3001' '3009' '3301'
 '3302' '3318' '3322' '34XX' '3501' '35XX' '3604' '3701' '3803' '38XX'
 '4102' '5203' '52XX' '55XX' '0101' '0704' '09XX' '3019' '3422' '01XX'
 '0212' '0417' '11XX' '2408' '2501' '2505' '2602' '2603' '2605' '2606'
 '2609' '26XX' '2720' '27XX' '2801' '28XX' '3002' '3015' '3017' '33XX'
 '37XX

In [17]:
# Map port codes (DEPE) to U.S. states; used to fill null values in USASTATE
depe_mapping = {
    "01XX": "ME",
    "0101": "ME",
    "0102": "ME",
    "0103": "ME",
    "0104": "ME",
    "0105": "ME",
    "0106": "ME",
    "0107": "ME",
    "0108": "ME",
    "0109": "ME",
    "0110": "ME",
    "0111": "ME",
    "0112": "ME",
    "0115": "ME",
    "0118": "ME",
    "0121": "ME",
    "0127": "ME",
    "0131": "NH",
    "0132":"ME",
    "0182": "NH",
    "0152":"ME",
    "0181":"ME",
    "02XX": "VT",
    "0201": "VT",
    "0203": "VT",
    "0206": "VT",
    "0207": "VT",
    "0209": "VT",
    "0211": "VT",
    "0212": "VT",
    "04XX": "MA",
    "0401": "MA",
    "0402": "MA",
    "0403": "MA",
    "0404": "MA",
    "0405": "MA",
    "0406": "MA",
    "0407": "MA",
    "0408": "MA",
    "0409": "MA",
    "0410": "CT",
    "0411": "CT",
    "0412": "CT",
    "0413": "CT",
    "0417": "MA",
    "05XX": "RI",
    "0501": "RI",
    "0502": "RI",
    "0503":"RI",
    "07XX": "NY",
    "0701": "NY",
    "0704": "NY",
    "0706": "NY",
    "0708": "NY",
    "0712": "NY",
    "0714": "NY",
    "0715": "NY",
    "09XX": "NY",
    "0901": "NY",
    "0903": "NY",
    "0904": "NY",
    "0905": "NY",
    "0906": "NY",
    "0971":"NY",
    "0972":"NY",
    "0981":"NY",
    "10XX": "NY",
    "1001":"NY",
    "1002":"NY",
    "1003": "NJ",
    "1012": "NY",
    "11XX": "PA",
    "1101": "PA",
    "1102": "PA",
    "1103": "DE",
    "1104": "PA",
    "1105": "NJ",
    "1106": "PA",
    "1107": "NJ",
    "1109":"PA",
    "1113":"NJ",
    "1119":"PA",
    "1181":"PA",
    "1182": "NJ",
    "1183":"NJ",
    "1195":"PA",
    "13XX": "MD",
    "1301":"MD",
    "1302":"MD",
    "1303": "MD",
    "1304":"MD",
    "1305":"MD",
    "14XX":"VA",
    "1401": "VA",
    "1402":"VA",
    "1404": "VA",
    "1408":"VA",
    "1409":"VA",
    "1410":"VA",
    "1412":"VA",
    "15XX":"NC",
    "1501":"NC",
    "1502": "NC",
    "1503":"NC",
    "1506":"NC",
    "1511":"NC",
    "1512":"NC",
    "16XX":"SC",
    "1601":"SC",
    "1602":"SC",
    "1603": "SC",
    "1604":"SC",
    "1681":"SC",
    "17XX":"GA",
    "1701": "GA",
    "1703": "GA",
    "1704": "GA",
    "18XX": "FL",
    "1801": "FL",
    "1803":"FL",
    "1805":"FL",
    "1807":"FL",
    "1808":"FL",
    "1809":"FL",
    "1814":"FL",
    "1816":"FL",
    "1818":"FL",
    "1819":"FL",
    "1821":"FL",
    "1822":"FL",
    "1883":"FL",
    "1884":"FL",
    "1885":"FL",
    "1886":"FL",
    "1887":"FL",    
    "19XX":"AL",
    "1901": "AL",
    "1903": "AL",
    "1904": "LA",
    "1910": "LA",

    "2006": "TN",
    "2007": "TN",
    "23XX": "TX",
    "2301": "TX",
    "2302": "TX",
    "2303": "TX",
    "2304": "TX",
    "2305": "TX",
    "2307": "TX",
    "2310": "TX",
    "24XX": "TX",
    "2402": "TX",
    "2403": "TX",
    "2404": "TX",
    "2406": "NM",
    "2407": "NM",
    "2408": "NM",
    "2481": "NM",
    "25XX": "CA",
    "2501": "CA",
    "2502": "CA",
    "2503": "CA",
    "2504": "CA",
    "2505": "CA",
    "2506": "CA",
    "2507": "CA",
    "26XX": "AZ",
    "2601": "AZ",
    "2602": "AZ",
    "2603": "AZ",
    "2604": "AZ",
    "2605": "AZ",
    "2606": "AZ",
    "2608": "AZ",
    "2609": "AZ",
    "27XX": "CA",
    "2704": "CA",
    "2720": "CA",
    "30XX": "WA",
    "31XX": "WA",
    "32XX": "OR",
    "33XX": "MT",
    "34XX": "ND",
    "35XX": "MN",
    "36XX": "WI",
    "37XX": "MI",
    "38XX": "MI",
    "39XX": "OH",
    "41XX": "IL",
    "45XX": "KY",
    "49XX": "LA",
    "51XX": "NM",
    "52XX": "TX",
    "53XX": "WA",
    "54XX": "OR",
    "55XX": "TX",
    "59XX": "CA",
    "60XX": "HI",
    "80XX": "AK",
}
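
The cell above only defines the mapping; it still has to be applied to the data. A minimal sketch of that step follows, using a hypothetical three-entry excerpt (`depe_mapping_excerpt`) and invented toy rows rather than the full `depe_mapping` and `final_df`. Since `final_df` carries repeated row labels after concatenation, resetting its index before this kind of aligned fill is likely the safer route.

```python
import numpy as np
import pandas as pd

# Hypothetical excerpt of the mapping; the notebook's dictionary is much longer.
depe_mapping_excerpt = {"0101": "ME", "55XX": "TX", "60XX": "HI"}

# Invented toy rows: one known state, two fillable nulls, one unmapped code.
toy = pd.DataFrame({
    "USASTATE": ["AK", np.nan, np.nan, np.nan],
    "DEPE": ["20XX", "0101", "55XX", "9999"],
})

# Translate port codes to states, then use them only where USASTATE is null.
toy["USASTATE"] = toy["USASTATE"].fillna(toy["DEPE"].map(depe_mapping_excerpt))

# Codes missing from the mapping leave USASTATE null; list them for follow-up.
unmapped = toy.loc[toy["USASTATE"].isna(), "DEPE"].unique()

print(toy)
print(unmapped)
```

Existing `USASTATE` values are never overwritten, and rows whose `DEPE` is itself null or absent from the mapping stay null.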


### Data Quality Issues
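- Six columns contain null values: USASTATE (600,643), DEPE (2,613,970), MEXSTATE (3,011,308), CANPROV (1,991,784), DF (1,405,969), and COMMODITY2 (954,314).
- Row labels in the combined DataFrame repeat, because most of the yearly frames were concatenated without `ignore_index=True`.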


### Exploratory Data Analysis

#### Univariate Analysis

#### Bivariate Analysis

#### Multivariate Analysis

### Data Cleaning

### Data Preparation

### Hypothesis Testing

### Answering Business Questions

### Dashboarding

### Conclusion