# TRANSPORTATION SYSTEMS ANALYSIS 

## Overview
Transportation systems are the backbone of modern economies, moving goods, services, and people across cities, states, and nations. They play a vital role in commerce, tourism, and daily life, and they directly affect economic productivity and societal well-being. As transportation networks grow more complex, however, they face challenges in safety, efficiency, and environmental sustainability.

## About the Project
This project focuses on leveraging data from the Bureau of Transportation Statistics (BTS), a key branch of the U.S. Department of Transportation. BTS provides comprehensive and reliable data on passenger travel, freight movement, safety incidents, infrastructure capacity, and environmental impacts across multiple transportation modes, including road, rail, air, and water.

As a data analyst at getINNOtized, with BTS as the organization’s largest client, you are tasked with analyzing this rich dataset to uncover patterns, identify inefficiencies, and propose actionable recommendations. The goal is to help BTS address challenges such as safety, congestion, infrastructure stress, and economic disruptions, and ultimately to make transportation systems more effective.


## Objectives




# Understand and Preprocess Data

## Identify the datasets available (e.g., passenger travel, safety incidents, freight movement).
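One way to take stock of the available files is to group them by the prefix in their names. The sketch below assumes the files follow a `dot<N>_<MMYY>.csv` pattern, as in names like `dot1_0120.csv`; the helper `inventory_files` and the regular expression are illustrative and not part of the project code, and what each prefix represents is left uninterpreted.

```python
import re
from collections import defaultdict

# Assumed naming pattern: dot<table>_<MM><YY>.csv, e.g. dot1_0120.csv
FILE_PATTERN = re.compile(r"^dot(\d+)_(\d{2})(\d{2})\.csv$", re.IGNORECASE)


def inventory_files(file_names):
    """Group file names by table prefix, returning {prefix: [(year, month), ...]}."""
    inventory = defaultdict(list)
    for name in file_names:
        match = FILE_PATTERN.match(name)
        if match is None:
            continue  # skip files that do not follow the assumed pattern
        table, month, year = match.groups()
        inventory[f"dot{table}"].append((2000 + int(year), int(month)))
    return {prefix: sorted(periods) for prefix, periods in inventory.items()}


# Hypothetical file names used only for illustration
example = ["dot1_0120.csv", "dot1_0220.csv", "dot2_0120.csv", "notes.txt"]
print(inventory_files(example))
```

Passing `os.listdir(data_folder)` for each yearly folder would show which months are present for each prefix and make gaps easy to spot.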

# Load and combine all yearly datasets

In [1]:
import os
import pandas as pd


### Year 2020 - combine all csv files and save the combined DataFrame 

In [2]:
# Specify the folder containing the data files
data_folder = "C:/Users/lenovo/Desktop/sample data files/compiled data 2020"

# List all CSV files in the data folder, sorted for a reproducible order
csv_files = sorted(os.path.join(data_folder, file) for file in os.listdir(data_folder) if file.endswith('.csv'))

print(csv_files) 
 
# Combine all CSVs into a single DataFrame
dataframes = []
for file in csv_files:
    try:
        # Read each CSV file
        df = pd.read_csv(file, low_memory=False)
        
        # Cast object (string or mixed-type) columns to str, keeping missing values as NaN
        for col in df.columns:
            if df[col].dtype == 'object':
                df[col] = df[col].where(df[col].isna(), df[col].astype(str))
        
        # Add the cleaned DataFrame to the list
        dataframes.append(df)
    except Exception as e:
        print(f"Error reading {file}: {e}")

# Concatenate all DataFrames into a single DataFrame
combined_df_2020 = pd.concat(dataframes, ignore_index=True)

# Print the shape and info of the combined DataFrame
print("Combined DataFrame Shape:", combined_df_2020.shape)
print("Combined DataFrame Info:")
print(combined_df_2020.info())

['C:/Users/lenovo/Desktop/sample data files/compiled data 2020\\dot1_0120.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2020\\dot1_0220.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2020\\dot1_0320.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2020\\dot1_0420.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2020\\dot1_0520.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2020\\dot1_0620.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2020\\dot1_0720.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2020\\dot1_0820.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2020\\dot1_0920.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2020\\dot2_0120.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2020\\dot2_0220.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2020\\dot2_0320.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled

In [3]:
# Define the desired folder path
new_folder = "C:/Users/lenovo/Desktop/Combined_Files"

# Create the folder if it doesn't exist
os.makedirs(new_folder, exist_ok=True)

# Define the file path with the new folder and file name
output_file = os.path.join(new_folder, "combined_df_2020.csv")

# Save the combined DataFrame to the specified file
combined_df_2020.to_csv(output_file, index=False)
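When object columns are cast to strings, missing values need some care: in many pandas versions a plain `astype(str)` turns `NaN` into the literal string `"nan"`, which later hides it from `isna()`. The sketch below uses made-up values and shows one way to cast only the non-missing entries; the exact result of the plain cast may differ between pandas versions, so no output is asserted for it here.

```python
import numpy as np
import pandas as pd

# Small illustrative column with mixed types and one missing value (made-up data)
codes = pd.Series(["0101", np.nan, 2304], dtype="object")

# Plain cast: depending on the pandas version, NaN may become the string "nan"
plain = codes.astype(str)

# Masked cast: entries that are missing are kept, all others are converted to str
preserved = codes.where(codes.isna(), codes.astype(str))

print("Missing after plain cast: ", plain.isna().sum())
print("Missing after masked cast:", preserved.isna().sum())
```

The masked form keeps the missing-value count intact, which matters for the missing-value checks in the exploration step.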

### Year 2021 - combine all csv files and save the combined DataFrame

In [4]:
# Specify the folder containing the data files
data_folder = "C:/Users/lenovo/Desktop/sample data files/compiled data 2021"

# List all CSV files in the data folder, sorted for a reproducible order
csv_files = sorted(os.path.join(data_folder, file) for file in os.listdir(data_folder) if file.endswith('.csv'))

print(csv_files)  # Verify the file paths

# Combine all CSVs into a single DataFrame
dataframes = []
for file in csv_files:
    try:
        # Read each CSV file
        df = pd.read_csv(file, low_memory=False)
        
        # Cast object (string or mixed-type) columns to str, keeping missing values as NaN
        for col in df.columns:
            if df[col].dtype == 'object':
                df[col] = df[col].where(df[col].isna(), df[col].astype(str))
        
        # Add the cleaned DataFrame to the list
        dataframes.append(df)
    except Exception as e:
        print(f"Error reading {file}: {e}")

# Concatenate all DataFrames into a single DataFrame
combined_df_2021 = pd.concat(dataframes, ignore_index=True)

# Print the shape and info of the combined DataFrame
print("Combined DataFrame Shape:", combined_df_2021.shape)
print("Combined DataFrame Info:")
print(combined_df_2021.info())

['C:/Users/lenovo/Desktop/sample data files/compiled data 2021\\dot1_0121.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2021\\dot1_0221.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2021\\dot1_0321.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2021\\dot1_0421.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2021\\dot1_0521.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2021\\dot1_0621.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2021\\dot1_0721.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2021\\dot1_0821.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2021\\dot1_0921.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2021\\dot1_1021.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2021\\dot1_1121.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2021\\dot1_1221.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled

In [5]:
# Define the desired folder path
new_folder = "C:/Users/lenovo/Desktop/Combined_Files"

# Define the file path with the new folder and file name
output_file = os.path.join(new_folder, "combined_df_2021.csv")

# Save the combined DataFrame to the specified file
combined_df_2021.to_csv(output_file, index=False)

### Year 2022 - combine all csv files and save the combined DataFrame

In [6]:
# Specify the folder containing the data files
data_folder = "C:/Users/lenovo/Desktop/sample data files/compiled data 2022"

# List all CSV files in the data folder, sorted for a reproducible order
csv_files = sorted(os.path.join(data_folder, file) for file in os.listdir(data_folder) if file.endswith('.csv'))

print(csv_files)  # Verify the file paths

# Combine all CSVs into a single DataFrame
dataframes = []
for file in csv_files:
    try:
        # Read each CSV file
        df = pd.read_csv(file, low_memory=False)
        
        # Cast object (string or mixed-type) columns to str, keeping missing values as NaN
        for col in df.columns:
            if df[col].dtype == 'object':
                df[col] = df[col].where(df[col].isna(), df[col].astype(str))
        
        # Add the cleaned DataFrame to the list
        dataframes.append(df)
    except Exception as e:
        print(f"Error reading {file}: {e}")

# Concatenate all DataFrames into a single DataFrame
combined_df_2022 = pd.concat(dataframes, ignore_index=True)

# Print the shape and info of the combined DataFrame
print("Combined DataFrame Shape:", combined_df_2022.shape)
print("Combined DataFrame Info:")
print(combined_df_2022.info())

['C:/Users/lenovo/Desktop/sample data files/compiled data 2022\\dot1_0122.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2022\\dot1_0222.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2022\\dot1_0322.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2022\\dot1_0422.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2022\\dot1_0522.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2022\\dot1_0622.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2022\\dot1_0722.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2022\\dot1_0822.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2022\\dot1_0922.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2022\\dot1_1022.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2022\\dot1_1122.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2022\\dot1_1222.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled

In [7]:
# Define the desired folder path
new_folder = "C:/Users/lenovo/Desktop/Combined_Files"

# Define the file path with the new folder and file name
output_file = os.path.join(new_folder, "combined_df_2022.csv")

# Save the combined DataFrame to the specified file
combined_df_2022.to_csv(output_file, index=False)

### Year 2023 - combine all csv files and save the combined DataFrame

In [9]:
# Specify the folder containing the data files
data_folder = "C:/Users/lenovo/Desktop/sample data files/compiled data 2023"

# List all CSV files in the data folder, sorted for a reproducible order
csv_files = sorted(os.path.join(data_folder, file) for file in os.listdir(data_folder) if file.endswith('.csv'))

print(csv_files)  # Verify the file paths

# Combine all CSVs into a single DataFrame
dataframes = []
for file in csv_files:
    try:
        # Read each CSV file
        df = pd.read_csv(file, low_memory=False)
        
        # Cast object (string or mixed-type) columns to str, keeping missing values as NaN
        for col in df.columns:
            if df[col].dtype == 'object':
                df[col] = df[col].where(df[col].isna(), df[col].astype(str))
        
        # Add the cleaned DataFrame to the list
        dataframes.append(df)
    except Exception as e:
        print(f"Error reading {file}: {e}")

# Concatenate all DataFrames into a single DataFrame
combined_df_2023 = pd.concat(dataframes, ignore_index=True)

# Print the shape and info of the combined DataFrame
print("Combined DataFrame Shape:", combined_df_2023.shape)
print("Combined DataFrame Info:")
print(combined_df_2023.info())


['C:/Users/lenovo/Desktop/sample data files/compiled data 2023\\dot1_0123.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2023\\dot1_0223.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2023\\dot1_0323.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2023\\dot1_0423.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2023\\dot1_0523.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2023\\dot1_0623.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2023\\dot1_0723.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2023\\dot1_0823.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2023\\dot1_0923.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2023\\dot1_1023.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2023\\dot1_1123.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2023\\dot1_1223.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled

In [10]:
# Define the desired folder path
new_folder = "C:/Users/lenovo/Desktop/Combined_Files"

# Define the file path with the new folder and file name
output_file = os.path.join(new_folder, "combined_df_2023.csv")

# Save the combined DataFrame to the specified file
combined_df_2023.to_csv(output_file, index=False)

### Year 2024 - combine all csv files and save the combined DataFrame

In [11]:
# Specify the folder containing the data files
data_folder = "C:/Users/lenovo/Desktop/sample data files/compiled data 2024"

# List all CSV files in the data folder, sorted for a reproducible order
csv_files = sorted(os.path.join(data_folder, file) for file in os.listdir(data_folder) if file.endswith('.csv'))

print(csv_files)  # Verify the file paths

# Combine all CSVs into a single DataFrame
dataframes = []
for file in csv_files:
    try:
        # Read each CSV file
        df = pd.read_csv(file, low_memory=False)
        
        # Cast object (string or mixed-type) columns to str, keeping missing values as NaN
        for col in df.columns:
            if df[col].dtype == 'object':
                df[col] = df[col].where(df[col].isna(), df[col].astype(str))
        
        # Add the cleaned DataFrame to the list
        dataframes.append(df)
    except Exception as e:
        print(f"Error reading {file}: {e}")

# Concatenate all DataFrames into a single DataFrame
combined_df_2024 = pd.concat(dataframes, ignore_index=True)

# Print the shape and info of the combined DataFrame
print("Combined DataFrame Shape:", combined_df_2024.shape)
print("Combined DataFrame Info:")
print(combined_df_2024.info())

['C:/Users/lenovo/Desktop/sample data files/compiled data 2024\\dot1_0124.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2024\\dot1_0224.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2024\\dot1_0324.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2024\\dot1_0424.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2024\\dot1_0524.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2024\\dot1_0624.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2024\\dot1_0724.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2024\\dot1_0824.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2024\\dot1_0924.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2024\\dot2_0124.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2024\\dot2_0224.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled data 2024\\dot2_0324.csv', 'C:/Users/lenovo/Desktop/sample data files/compiled

In [12]:
# Define the desired folder path
new_folder = "C:/Users/lenovo/Desktop/Combined_Files"

# Define the file path with the new folder and file name
output_file = os.path.join(new_folder, "combined_df_2024.csv")

# Save the combined DataFrame to the specified file
combined_df_2024.to_csv(output_file, index=False)
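The five yearly cells repeat the same steps, so they could be folded into one helper. The sketch below is a minimal version under stated assumptions: the function name `combine_csv_folder` is hypothetical, the string-casting step is left out for brevity, and the demonstration uses two tiny made-up files in a temporary folder rather than the project data.

```python
import os
import tempfile

import pandas as pd


def combine_csv_folder(data_folder):
    """Read every CSV in data_folder and return one concatenated DataFrame."""
    csv_files = sorted(
        os.path.join(data_folder, name)
        for name in os.listdir(data_folder)
        if name.lower().endswith(".csv")
    )
    frames = []
    for path in csv_files:
        try:
            frames.append(pd.read_csv(path, low_memory=False))
        except (pd.errors.ParserError, pd.errors.EmptyDataError, OSError) as exc:
            print(f"Error reading {path}: {exc}")
    if not frames:
        # pd.concat raises on an empty list, so fail early with a clearer message
        raise ValueError(f"No readable CSV files found in {data_folder}")
    return pd.concat(frames, ignore_index=True)


# Demonstration on two tiny made-up files in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"MONTH": [1], "VALUE": [100]}).to_csv(os.path.join(tmp, "a.csv"), index=False)
    pd.DataFrame({"MONTH": [2], "VALUE": [250]}).to_csv(os.path.join(tmp, "b.csv"), index=False)
    demo = combine_csv_folder(tmp)

print(demo.shape)
```

With a helper like this, each year reduces to one call on its folder followed by `to_csv`, which keeps the five years consistent if the loading logic changes.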

### Combine DataFrames 

In [13]:
# Folder containing the CSV files
data_folder = "C:/Users/lenovo/Desktop/Combined_Files"

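# List the per-year CSV files, skipping any combined_data.csv left by a previous run
# (the output is saved to this same folder, so a re-run would otherwise duplicate rows)
csv_files = sorted(
    os.path.join(data_folder, file)
    for file in os.listdir(data_folder)
    if file.endswith('.csv') and file != "combined_data.csv"
)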

print(f"Found {len(csv_files)} CSV files.")

# Combine all CSV files into a single DataFrame
dataframes = [pd.read_csv(file, low_memory=False) for file in csv_files]

# Concatenate all DataFrames into one
combined_df = pd.concat(dataframes, ignore_index=True)

# Display basic information about the combined DataFrame
print(f"Combined DataFrame shape: {combined_df.shape}")
print(combined_df.info())

# Define the output folder and file path
output_folder = "C:/Users/lenovo/Desktop/Combined_Files"

output_file = os.path.join(output_folder, "combined_data.csv")

# Save the combined DataFrame to a CSV file
combined_df.to_csv(output_file, index=False)

print(f"Combined data file saved successfully at: {output_file}")

Found 5 CSV files.
Combined DataFrame shape: (6517225, 15)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6517225 entries, 0 to 6517224
Data columns (total 15 columns):
 #   Column           Dtype  
---  ------           -----  
 0   TRDTYPE          int64  
 1   USASTATE         object 
 2   DEPE             object 
 3   DISAGMOT         int64  
 4   MEXSTATE         object 
 5   CANPROV          object 
 6   COUNTRY          int64  
 7   VALUE            int64  
 8   SHIPWT           int64  
 9   FREIGHT_CHARGES  int64  
 10  DF               float64
 11  CONTCODE         object 
 12  MONTH            int64  
 13  YEAR             int64  
 14  COMMODITY2       float64
dtypes: float64(2), int64(8), object(5)
memory usage: 745.8+ MB
None
Combined data file saved successfully at: C:/Users/lenovo/Desktop/Combined_Files\combined_data.csv


## Explore data characteristics such as missing values, data types, and distributions.
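A first pass at this step could tabulate data types and missing values per column and then look at summary statistics for the numeric fields. The sketch below runs on a tiny made-up sample that borrows a few column names from the combined data; the values and the helper `summarize` are illustrative only, and on the real data the same calls would be applied to `combined_df`.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample using a few column names from the combined data
sample = pd.DataFrame({
    "USASTATE": ["TX", "MI", None, "CA"],
    "VALUE": [1200, 560, 980, 4300],
    "SHIPWT": [300, 0, 150, 900],
    "COMMODITY2": [87.0, np.nan, 27.0, np.nan],
})


def summarize(df):
    """Return per-column dtype, missing count, and missing percentage."""
    missing = df.isna().sum()
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": missing,
        "missing_pct": (missing / len(df) * 100).round(2),
    })


summary = summarize(sample)
print(summary)

# Distribution overview (count, mean, quartiles, ...) for the numeric columns
print(sample.describe())
```

Columns with a high missing percentage are natural candidates for closer inspection before any aggregation.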
