![Traffic](traffic.png)

Traffic data fluctuates constantly and depends heavily on time. Predicting it can be challenging, but this task will help sharpen your time-series skills. Deep learning lets you exploit abstract patterns in the data that can improve predictive accuracy.

Your task is to build a system that predicts traffic volume, that is, the number of vehicles passing a specific point at a specific time. Accurate predictions can help reduce road congestion, support new designs for roads or intersections, improve safety, and more! Or, you can use it to help plan your commute and avoid traffic!

The dataset provided contains the hourly traffic volume on an interstate highway in Minnesota, USA. It also includes weather features and holidays, which often impact traffic volume.

Time to predict some traffic!

### The data:

The dataset is maintained by the UCI Machine Learning Repository. The target variable is `traffic_volume`. The data has already been normalized and split into training and test sets, which contain the following columns:

`train_scaled.csv`, `test_scaled.csv`

| Column     | Type       | Description              |
|------------|------------|--------------------------|
|`temp`                   |Numeric            |Average temp in kelvin|
|`rain_1h`                |Numeric            |Amount in mm of rain that occurred in the hour|
|`snow_1h`                |Numeric            |Amount in mm of snow that occurred in the hour|
|`clouds_all`             |Numeric            |Percentage of cloud cover|
|`date_time`              |DateTime           |Hour at which the data was collected, in local time (CST)|
|`holiday_` (11 columns)  |Categorical        |US national holidays plus one regional holiday, the Minnesota State Fair|
|`weather_main_` (11 columns)|Categorical     |Short textual description of the current weather|
|`weather_description_` (35 columns)|Categorical|Longer textual description of the current weather|
|`traffic_volume`         |Numeric            |Hourly I-94 ATR 301 reported westbound traffic volume|
|`hour_of_day`|Numeric|The hour of the day|
|`day_of_week`|Numeric|The day of the week (0 = Monday, 6 = Sunday)|
|`day_of_month`|Numeric|The day of the month|
|`month`|Numeric|The number of the month|
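
The calendar columns (`hour_of_day`, `day_of_week`, `day_of_month`, `month`) are the kind of features that can be derived from `date_time`. How the provided files were actually built is not stated here, so the following is only a sketch, using made-up timestamps, of how such columns could be computed with pandas:

```python
import pandas as pd

# Made-up timestamps for illustration; not taken from the dataset
demo_df = pd.DataFrame({
    'date_time': pd.to_datetime(['2018-09-30 23:00:00', '2018-10-01 08:00:00'])
})

# The .dt accessor exposes calendar components of a datetime column
demo_df['hour_of_day'] = demo_df['date_time'].dt.hour
demo_df['day_of_week'] = demo_df['date_time'].dt.dayofweek  # 0 = Monday, 6 = Sunday
demo_df['day_of_month'] = demo_df['date_time'].dt.day
demo_df['month'] = demo_df['date_time'].dt.month

print(demo_df)
```

Note that `dt.dayofweek` uses the same convention as the table above, with Monday as 0 and Sunday as 6.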

In [1]:
# Import the relevant libraries
import numpy as np
import pandas as pd

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader

In [2]:
# Read the traffic data from the CSV training and test files
train_scaled_df = pd.read_csv('train_scaled.csv')
test_scaled_df = pd.read_csv('test_scaled.csv')


# Separate features and target
X_train = train_scaled_df.drop('traffic_volume', axis=1).values
y_train = train_scaled_df['traffic_volume'].values
X_test = test_scaled_df.drop('traffic_volume', axis=1).values
y_test = test_scaled_df['traffic_volume'].values

# Reshape data for LSTM (samples, time steps, features)
def create_sequences(X, y, time_steps=3):
    X_seq, y_seq = [], []
    for i in range(len(X) - time_steps):
        X_seq.append(X[i:i+time_steps])
        y_seq.append(y[i+time_steps])
    return np.array(X_seq), np.array(y_seq)

# Create sequences
X_train_seq, y_train_seq = create_sequences(X_train, y_train)
X_test_seq, y_test_seq = create_sequences(X_test, y_test)

# Convert to PyTorch tensors
X_train_tensor = torch.FloatTensor(X_train_seq)
y_train_tensor = torch.FloatTensor(y_train_seq)
X_test_tensor = torch.FloatTensor(X_test_seq)
y_test_tensor = torch.FloatTensor(y_test_seq)
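
To make the windowing concrete, here is a small sketch that runs the same `create_sequences` logic on toy data. The array values are made up. Each sample holds `time_steps` consecutive rows of features, and its target is the value in the row immediately after the window:

```python
import numpy as np

def create_sequences(X, y, time_steps=3):
    """Slide a window of `time_steps` rows over X; the target is the next row's y."""
    X_seq, y_seq = [], []
    for i in range(len(X) - time_steps):
        X_seq.append(X[i:i + time_steps])
        y_seq.append(y[i + time_steps])
    return np.array(X_seq), np.array(y_seq)

# Toy data: 6 hourly rows with 2 features each (values are made up)
X_toy = np.arange(12, dtype=float).reshape(6, 2)
y_toy = np.array([10., 11., 12., 13., 14., 15.])

X_seq, y_seq = create_sequences(X_toy, y_toy, time_steps=3)

print(X_seq.shape)  # (3, 3, 2) -> (samples, time steps, features)
print(y_seq)        # [13. 14. 15.]
```

Six rows with a window of three give only three samples, because the first `time_steps` rows have no complete history to use as input.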

In [3]:
class TrafficLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        
        # LSTM layer
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, 
                            batch_first=True, dropout=0.2)
        
        # Fully connected layer
        self.fc = nn.Linear(hidden_size, output_size)
    
    def forward(self, x):
        # Initialize hidden state
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)
        
        # LSTM forward pass
        out, _ = self.lstm(x, (h0, c0))
        
        # Take the last time step
        out = self.fc(out[:, -1, :])
        return out

# Model Hyperparameters
input_size = X_train_seq.shape[2]
hidden_size = 50
num_layers = 2
output_size = 1

# Initialize the model
traffic_model = TrafficLSTM(input_size, hidden_size, num_layers, output_size)

# Loss and Optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(traffic_model.parameters(), lr=0.001)
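
The `out[:, -1, :]` indexing in `forward` is easier to follow with the tensor shapes in view. The sketch below is separate from the model above and uses a random batch with an assumed feature count of 8. It shows that `nn.LSTM` with `batch_first=True` returns one hidden vector per time step, and that the last time step of the output matches the final hidden state of the top layer:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Assumed sizes for illustration: 4 samples, 3 time steps, 8 features
lstm = nn.LSTM(input_size=8, hidden_size=50, num_layers=2, batch_first=True)
fc = nn.Linear(50, 1)
x = torch.randn(4, 3, 8)

# When no initial state is passed, nn.LSTM defaults h0 and c0 to zeros
out, (h_n, c_n) = lstm(x)

print(out.shape)  # torch.Size([4, 3, 50]) -> (batch, time steps, hidden)
print(h_n.shape)  # torch.Size([2, 4, 50]) -> (layers, batch, hidden)

last_step = out[:, -1, :]   # hidden vector at the final time step
pred = fc(last_step)
print(pred.shape)  # torch.Size([4, 1]) -> one prediction per sample
```

Because the prediction has shape `(batch, 1)`, the target needs a matching trailing dimension, which is why the training and evaluation cells call `unsqueeze(1)` on the target tensors.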

In [4]:
num_epochs = 50
training_losses = []

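traffic_model.train()  # make training mode explicit so the LSTM dropout is active

for epoch in range(num_epochs):
    # Forward pass on the full training set (one full-batch update per epoch)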
    outputs = traffic_model(X_train_tensor)
    loss = criterion(outputs, y_train_tensor.unsqueeze(1))
    
    # Backward pass and optimize
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    training_losses.append(loss.item())
    
    # Print progress
    if (epoch + 1) % 5 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

# Save final training loss
final_training_loss = torch.tensor(training_losses[-1])

Epoch [5/50], Loss: 0.2456
Epoch [10/50], Loss: 0.1843
Epoch [15/50], Loss: 0.1276
Epoch [20/50], Loss: 0.0841
Epoch [25/50], Loss: 0.0810
Epoch [30/50], Loss: 0.0854
Epoch [35/50], Loss: 0.0761
Epoch [40/50], Loss: 0.0762
Epoch [45/50], Loss: 0.0769
Epoch [50/50], Loss: 0.0753
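
The loop above performs one full-batch update per epoch, and the `TensorDataset` and `DataLoader` imports go unused. A common alternative is mini-batch training. The following is a minimal sketch of that approach, not the notebook's method: it uses random tensors in place of `X_train_tensor` and `y_train_tensor`, and `TinyLSTM`, the batch size, and the tensor shapes are all assumptions chosen for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)

# Random stand-ins for the training tensors (shapes are assumptions)
X_demo = torch.randn(256, 3, 8)  # (samples, time steps, features)
y_demo = torch.randn(256)

class TinyLSTM(nn.Module):
    """Hypothetical small model, used only to keep the sketch self-contained."""
    def __init__(self, input_size, hidden_size=16):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.fc(out[:, -1, :])

model = TinyLSTM(input_size=X_demo.shape[2])
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Each window already carries its own history, so shuffling windows is safe
loader = DataLoader(TensorDataset(X_demo, y_demo), batch_size=64, shuffle=True)

epoch_losses = []
for epoch in range(3):
    model.train()
    running_loss = 0.0
    for X_batch, y_batch in loader:
        optimizer.zero_grad()
        loss = criterion(model(X_batch), y_batch.unsqueeze(1))
        loss.backward()
        optimizer.step()
        # Weight each batch loss by its size to get a per-sample average
        running_loss += loss.item() * X_batch.size(0)
    epoch_losses.append(running_loss / len(loader.dataset))
    print(f'Epoch [{epoch + 1}/3], Loss: {epoch_losses[-1]:.4f}')
```

Mini-batches give several parameter updates per epoch and keep memory use bounded as the dataset grows, at the cost of noisier loss values from one step to the next.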


In [5]:
traffic_model.eval()
with torch.no_grad():
    # Predict on test set
    test_predictions = traffic_model(X_test_tensor)
    
    # Calculate Mean Squared Error
    test_mse = F.mse_loss(test_predictions, y_test_tensor.unsqueeze(1))

# Print results
print(f'Final Training Loss: {final_training_loss.item():.4f}')
print(f'Test MSE: {test_mse.item():.4f}')


Final Training Loss: 0.0753
Test MSE: 0.0742
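
Because the target was normalized, the MSE above is in scaled units, and on its own it is hard to judge whether 0.0742 is good. One common sanity check, which is not part of the original notebook, is to compare against a naive persistence baseline that predicts each hour's volume as the previous hour's value. The sketch below uses made-up numbers in place of `y_test`, and `persistence_mse` is a hypothetical helper name:

```python
import numpy as np

def persistence_mse(y):
    """MSE of predicting each value as the one immediately before it."""
    y = np.asarray(y, dtype=float)
    preds, targets = y[:-1], y[1:]
    return float(np.mean((targets - preds) ** 2))

# Made-up scaled volumes standing in for y_test
y_demo = np.array([0.2, 0.4, 0.1, 0.5])
baseline_mse = persistence_mse(y_demo)

# RMSE is expressed in the same (scaled) units as the target
model_mse = 0.0742  # test MSE reported above
model_rmse = float(np.sqrt(model_mse))

print(f'Baseline MSE: {baseline_mse:.4f}')
print(f'Model RMSE: {model_rmse:.4f}')
```

If the model's test MSE is not clearly below the baseline computed on the real `y_test`, the LSTM is adding little beyond "the next hour looks like this hour".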
