![Traffic](traffic.png)

Traffic data fluctuates constantly or is affected by time. Predicting it can be challenging, but this task will help sharpen your time-series skills. With deep learning, you can use abstract patterns in data that can help boost predictability.

Your task is to build a system that can be applied to help you predict traffic volume or the number of vehicles passing at a specific point and time. Determining this can help reduce road congestion, support new designs for roads or intersections, improve safety, and more! Or, you can use to help plan your commute to avoid traffic!

The dataset provided contains the hourly traffic volume on an interstate highway in Minnesota, USA. It also includes weather features and holidays, which often impact traffic volume.

Time to predict some traffic!

### The data:

The dataset is collected and maintained by UCI Machine Learning Repository. The target variable is `traffic_volume`. The dataset contains the following and has already been normalized and saved into training and test sets:

`train_scaled.csv`, `test_scaled.csv`
| Column     | Type       | Description              |
|------------|------------|--------------------------|
|`temp`                   |Numeric            |Average temp in kelvin|
|`rain_1h`                |Numeric            |Amount in mm of rain that occurred in the hour|
|`snow_1h`                |Numeric            |Amount in mm of snow that occurred in the hour|
|`clouds_all`             |Numeric            |Percentage of cloud cover|
|`date_time`              |DateTime           |Hour of the data collected in local CST time|
|`holiday_` (11 columns)  |Categorical        |US National holidays plus regional holiday, Minnesota State Fair|
|`weather_main_` (11 columns)|Categorical     |Short textual description of the current weather|
|`weather_description_` (35 columns)|Categorical|Longer textual description of the current weather|
|`traffic_volume`         |Numeric            |Hourly I-94 ATR 301 reported westbound traffic volume|
|`hour_of_day`|Numeric|The hour of the day|
|`day_of_week`|Numeric|The day of the week (0=Monday, Sunday=6)|
|`day_of_month`|Numeric|The day of the month|
|`month`|Numeric|The number of the month|
|`traffic_volume`         |Numeric            |Hourly I-94 ATR 301 reported westbound traffic volume|

In [86]:
# Import the relevant libraries
import numpy as np
import pandas as pd

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader

import torchmetrics

In [87]:
# Read the traffic data from the CSV training and test files
train_scaled_df = pd.read_csv('train_scaled.csv') # 85%
test_scaled_df = pd.read_csv('test_scaled.csv') # 15%

# Convert the DataFrame to NumPy arrays
train_scaled = train_scaled_df.to_numpy()
test_scaled = test_scaled_df.to_numpy()
print('train_scaled shape:', train_scaled.shape, '\ntest_scaled shape:', test_scaled.shape, )

train_scaled shape: (34042, 66) 
test_scaled shape: (6533, 66)


In [88]:
train_scaled_df.describe()

Unnamed: 0,temp,rain_1h,snow_1h,clouds_all,holiday_Christmas Day,holiday_Columbus Day,holiday_Independence Day,holiday_Labor Day,holiday_Martin Luther King Jr Day,holiday_Memorial Day,holiday_New Years Day,holiday_State Fair,holiday_Thanksgiving Day,holiday_Veterans Day,holiday_Washingtons Birthday,weather_main_Clear,weather_main_Clouds,weather_main_Drizzle,weather_main_Fog,weather_main_Haze,weather_main_Mist,weather_main_Rain,weather_main_Smoke,weather_main_Snow,weather_main_Squall,weather_main_Thunderstorm,weather_description_SQUALLS,weather_description_Sky is Clear,weather_description_broken clouds,weather_description_drizzle,weather_description_few clouds,weather_description_fog,weather_description_haze,weather_description_heavy intensity drizzle,weather_description_heavy intensity rain,weather_description_heavy snow,weather_description_light intensity drizzle,weather_description_light intensity shower rain,weather_description_light rain,weather_description_light rain and snow,weather_description_light shower snow,weather_description_light snow,weather_description_mist,weather_description_moderate rain,weather_description_overcast clouds,weather_description_proximity shower rain,weather_description_proximity thunderstorm,weather_description_proximity thunderstorm with drizzle,weather_description_proximity thunderstorm with rain,weather_description_scattered clouds,weather_description_shower drizzle,weather_description_shower snow,weather_description_sky is clear,weather_description_smoke,weather_description_snow,weather_description_thunderstorm,weather_description_thunderstorm with heavy rain,weather_description_thunderstorm with light drizzle,weather_description_thunderstorm with light rain,weather_description_thunderstorm with rain,weather_description_very heavy rain,hour_of_day,day_of_week,day_of_month,month,traffic_volume
count,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0,34042.0
mean,0.911661,3.7e-05,0.000274,0.450664,0.000147,0.000147,0.000118,0.000118,5.9e-05,0.000118,0.000118,0.000118,0.000147,0.000147,0.000118,0.324335,0.385524,0.03199,0.014188,0.021767,0.087069,0.079314,2.9e-05,0.042213,0.000118,0.013454,0.000118,0.050261,0.117414,0.012308,0.050496,0.014188,0.021767,0.001087,0.003319,0.008989,0.018565,0.000147,0.054638,2.9e-05,0.000176,0.03055,0.087069,0.019623,0.132777,0.001351,0.008636,0.000235,0.000999,0.084836,2.9e-05,2.9e-05,0.274073,2.9e-05,0.002438,0.001674,0.000764,0.000147,0.000588,0.000411,0.000235,0.500579,0.50142,0.489064,0.524617,0.451136
std,0.044584,0.005421,0.01215,0.386631,0.012119,0.012119,0.010839,0.010839,0.007665,0.010839,0.010839,0.010839,0.012119,0.012119,0.010839,0.468133,0.486726,0.175976,0.118269,0.145925,0.28194,0.270232,0.00542,0.201077,0.010839,0.11521,0.010839,0.218487,0.321917,0.11026,0.21897,0.118269,0.145925,0.032951,0.05752,0.094384,0.134986,0.012119,0.227276,0.00542,0.013275,0.172099,0.28194,0.138702,0.339339,0.036735,0.092531,0.015328,0.031588,0.278642,0.00542,0.00542,0.446053,0.00542,0.049318,0.040886,0.027626,0.012119,0.024232,0.020276,0.015328,0.302428,0.333103,0.292053,0.312234,0.272896
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.881975,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.217391,0.166667,0.233333,0.272727,0.170467
50%,0.915812,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.521739,0.5,0.5,0.545455,0.468201
75%,0.946365,0.0,0.0,0.9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.782609,0.833333,0.733333,0.818182,0.68022
max,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


In [89]:
X_train, y_train = train_scaled[:, :-1] , train_scaled[:, -1]
X_test, y_test = test_scaled[:, :-1] , test_scaled[:, -1]
print("Train:", X_train.shape , y_train.shape, "\nTest:",X_test.shape , y_test.shape)

# Converting them into Torch Datasets
dataset_train = TensorDataset(torch.from_numpy(X_train).float(), torch.from_numpy(y_train).float() )
dataset_test = TensorDataset(torch.from_numpy(X_test).float(), torch.from_numpy(y_test).float() )

dataloader_train = DataLoader(dataset_train, shuffle=False, batch_size=32)
dataloader_test = DataLoader(dataset_test, shuffle=False, batch_size=32)
print(type(dataloader_train))

Train: (34042, 65) (34042,) 
Test: (6533, 65) (6533,)
<class 'torch.utils.data.dataloader.DataLoader'>


In [90]:
class Net(nn.Module):
    def __init__(self, input_size):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=1
            ,hidden_size=32
            ,num_layers=2
            ,batch_first=True
        )
        self.fc = nn.Linear(32,1)

    def forward(self, x):
        h0 = torch.zeros(2, x.size(0), 32)
        c0 = torch.zeros(2, x.size(0), 32)
        out, _ = self.lstm(x, (h0, c0))
        out = self.fc(out[:, -1, :])
        return out

In [None]:
traffic_model = Net(input_size=32)
# Set up MSE loss
criterion = nn.MSELoss()
optimizer = optim.Adam(
  traffic_model.parameters(), lr=0.0001
)

num_epochs = 1

for epoch in range(num_epochs):
    for seqs, labels in dataloader_train:
        # Reshape model inputs
        seqs = seqs.view(seqs.shape[0] , seqs.shape[1], 1)
        # Get model outputs
        outputs = traffic_model(seqs)
        # Compute loss
        loss = criterion(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    final_training_loss = loss.item()
    print(f"Epoch {epoch+1}, Loss: {final_training_loss}")

Epoch 1, Loss: 0.048480525612831116


In [92]:
mse = torchmetrics.MeanSquaredError()

traffic_model.eval()

with torch.no_grad():
    for seqs, labels in dataloader_test:
        seqs = seqs.view(seqs.shape[0] , seqs.shape[1], 1)
        outputs = traffic_model(seqs).squeeze()
        mse(outputs, labels)
test_mse = mse.compute()
test_mse = torch.tensor(test_mse, dtype=torch.float32)
print(f"Test MSE: {test_mse}")

Test MSE: 0.1662399023771286
