# RandomForest Congestion Prediction - Inference Notebook

This notebook performs real-time congestion prediction for EV charging stations using a pre-trained RandomForest model.

## Workflow:
1. Load pre-trained RandomForest model
2. Create scoring dataset with station IDs
3. Generate temporal features (current date/time)
4. Fetch external data (holidays, events, weather, pedestrian counts)
5. Engineer features to match training schema
6. Run predictions

## Required Features:
The model expects 23 features: temporal, lag and rolling-arrival, weather, holiday and event indicators, pedestrian counts, and interaction terms.

## 1. Setup and Configuration

In [115]:
# Import libraries
import pandas as pd
import numpy as np
import joblib
import requests
import holidays
from datetime import datetime
import warnings

warnings.filterwarnings('ignore')
print("Libraries loaded successfully")

Libraries loaded successfully


## 2. Load Pre-trained Model

In [116]:
# Load the trained RandomForest model
rf_model = joblib.load('random_forest_model.pkl')
print(f"Model loaded: {type(rf_model).__name__}")
print(f"Number of estimators: {rf_model.n_estimators}")
print(f"Max depth: {rf_model.max_depth}")

Model loaded: RandomForestRegressor
Number of estimators: 300
Max depth: 15


## 3. Initialize Scoring Dataset

Create a dataframe with the station IDs we want to predict for.

In [117]:
# Station IDs to score
stations = [
    '674f97ff3dc8e5d2ac00867a',
    '674f98013dc8e5d2ac00894a',
    '674f97ff3dc8e5d2ac008456'
]

df_score = pd.DataFrame(stations, columns=['stationid'])
print(f"Scoring {len(df_score)} stations")
df_score.head()

Scoring 3 stations


Unnamed: 0,stationid
0,674f97ff3dc8e5d2ac00867a
1,674f98013dc8e5d2ac00894a
2,674f97ff3dc8e5d2ac008456


## 4. Generate Temporal Features

Extract current date/time and create temporal features matching the training schema.

In [118]:
# Get current timestamp (naive local time of the machine running the notebook)
current_time = datetime.now()
df_score['created'] = current_time

# Extract temporal features
df_score['hour'] = current_time.hour
df_score['dayofweek'] = current_time.weekday()  # Monday=0, Sunday=6
df_score['is_weekend'] = int(current_time.weekday() >= 5)

print(f"\nCurrent timestamp: {current_time}")
print(f"Hour: {df_score['hour'].iloc[0]}")
print(f"Day of week: {df_score['dayofweek'].iloc[0]} (0=Monday, 6=Sunday)")
print(f"Is weekend: {df_score['is_weekend'].iloc[0]}")

df_score.head()


Current timestamp: 2026-01-07 22:08:34.636077
Hour: 22
Day of week: 2 (0=Monday, 6=Sunday)
Is weekend: 0


Unnamed: 0,stationid,created,hour,dayofweek,is_weekend
0,674f97ff3dc8e5d2ac00867a,2026-01-07 22:08:34.636077,22,2,0
1,674f98013dc8e5d2ac00894a,2026-01-07 22:08:34.636077,22,2,0
2,674f97ff3dc8e5d2ac008456,2026-01-07 22:08:34.636077,22,2,0
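
`datetime.now()` returns the local time of whichever machine runs the notebook, so `hour` and `dayofweek` shift if it runs on a server in another time zone. Below is a minimal sketch of one way to pin the timestamp to Melbourne time. It assumes the model was trained on Melbourne local time, which this notebook does not confirm. `now_in_zone` and `temporal_features` are hypothetical helper names.

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # standard library, Python 3.9+


def now_in_zone(tz_name: str = "Australia/Melbourne") -> datetime:
    """Return the current time as a timezone-aware datetime in tz_name."""
    return datetime.now(ZoneInfo(tz_name))


def temporal_features(ts: datetime) -> dict:
    """Derive the same three temporal features the cell above builds."""
    return {
        "hour": ts.hour,
        "dayofweek": ts.weekday(),  # Monday=0, Sunday=6
        "is_weekend": int(ts.weekday() >= 5),
    }


# Fixed timestamp taken from the notebook output above, so the result is reproducible.
example = temporal_features(datetime(2026, 1, 7, 22, 8, 34))
print(example)  # {'hour': 22, 'dayofweek': 2, 'is_weekend': 0}
```

In the notebook itself, `current_time = now_in_zone()` would replace the `datetime.now()` call.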


## 5. Initialize Lag and Derived Features

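For real-time prediction without historical data, initialize the lag and derived features to zero. Note that this also zeroes `hod_sin` and `hod_cos`, which appear to be cyclical hour-of-day encodings and could be computed from `hour` rather than left as placeholders.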

In [119]:
# Lag and derived features (set to zero - no historical data available)
lag_and_derived_features = [
    'arrivals_lag1', 'arrivals_lag2', 'arrivals_lag4',
    'arrivals_ma4', 'arrivals_ma8',
    'hod_sin', 'hod_cos',
    'arrivals_pct_change', 'arrivals_diff', 'arrivals_ewma_4'
]

for col in lag_and_derived_features:
    df_score[col] = 0

print(f"Initialized {len(lag_and_derived_features)} lag/derived features to zero")
df_score.head()

Initialized 10 lag/derived features to zero


Unnamed: 0,stationid,created,hour,dayofweek,is_weekend,arrivals_lag1,arrivals_lag2,arrivals_lag4,arrivals_ma4,arrivals_ma8,hod_sin,hod_cos,arrivals_pct_change,arrivals_diff,arrivals_ewma_4
0,674f97ff3dc8e5d2ac00867a,2026-01-07 22:08:34.636077,22,2,0,0,0,0,0,0,0,0,0,0,0
1,674f98013dc8e5d2ac00894a,2026-01-07 22:08:34.636077,22,2,0,0,0,0,0,0,0,0,0,0,0
2,674f97ff3dc8e5d2ac008456,2026-01-07 22:08:34.636077,22,2,0,0,0,0,0,0,0,0,0,0,0
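
The names `hod_sin` and `hod_cos` suggest a sine/cosine encoding of the hour of day, which needs no history. Below is a minimal sketch of that encoding, assuming a 24-hour period and the formula `sin(2π·hour/24)`, `cos(2π·hour/24)`. The exact definition used in training is not shown in this notebook and should be checked before replacing the zeros. `hour_of_day_cyclical` is a hypothetical helper name.

```python
import math


def hour_of_day_cyclical(hour: int, period: int = 24) -> tuple[float, float]:
    """Map an hour onto the unit circle so that 23:00 and 00:00 end up adjacent."""
    angle = 2.0 * math.pi * (hour % period) / period
    return math.sin(angle), math.cos(angle)


# Hour 22 is the hour shown in the notebook output above.
hod_sin, hod_cos = hour_of_day_cyclical(22)
print(round(hod_sin, 3), round(hod_cos, 3))  # approximately -0.5 and 0.866
```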


## 6. Fetch Holiday Information

Check whether the current date is a Victorian public holiday.

In [120]:
# Victoria public holidays
vic_holidays = holidays.Australia(state='VIC', years=[2024, 2025, 2026])
df_score['is_holiday'] = df_score['created'].dt.date.apply(
    lambda date: 1 if date in vic_holidays else 0
)

print(f"Is holiday: {df_score['is_holiday'].iloc[0]}")

Is holiday: 0


## 7. Fetch Major Event Information

Identify whether the current date coincides with a major Melbourne event.

In [121]:
def categorize_event(date):
    """Categorize a date into a major Melbourne event, or 'No Event'."""
    if date.month == 1 and 19 <= date.day <= 31:
        return 'Australian Open'
    # The Grand Prix check must come before the AFL branch, because March
    # also falls inside the AFL season range below.
    elif date.month == 3 and 13 <= date.day <= 15:
        return 'Australian Grand Prix'
    elif date.month in [3, 4, 5, 6, 7, 8, 9]:
        if date.month == 4 and date.day == 25:
            return 'ANZAC Day AFL'
        elif date.month == 9 and 26 <= date.day <= 30:
            return 'AFL Grand Final'
        return 'AFL Season'
    # First Tuesday of November (weekday() == 1 is Tuesday)
    elif date.month == 11 and date.day <= 7 and date.weekday() == 1:
        return 'Melbourne Cup'
    elif date.month == 12 and 26 <= date.day <= 30:
        return 'Boxing Day Test'
    elif date.month == 12 and date.day == 31:
        return "New Year's Eve"
    return 'No Event'

df_score['is_major_event'] = df_score['created'].dt.date.apply(
    lambda date: 0 if categorize_event(date) == 'No Event' else 1
)

event_name = categorize_event(df_score['created'].iloc[0].date())
print(f"Event: {event_name}, Is major event: {df_score['is_major_event'].iloc[0]}")

Event: No Event, Is major event: 0
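
The Melbourne Cup branch relies on `date.day <= 7 and date.weekday() == 1` selecting the first Tuesday of November. Below is a small sketch that cross-checks that rule against a direct calculation. `first_tuesday_of_november` and `matches_notebook_rule` are hypothetical helper names, not part of the notebook.

```python
from datetime import date, timedelta

TUESDAY = 1  # date.weekday(): Monday=0 ... Sunday=6


def first_tuesday_of_november(year: int) -> date:
    """Compute the first Tuesday of November directly."""
    first = date(year, 11, 1)
    days_ahead = (TUESDAY - first.weekday()) % 7
    return first + timedelta(days=days_ahead)


def matches_notebook_rule(d: date) -> bool:
    """The same condition categorize_event uses for the Melbourne Cup."""
    return d.month == 11 and d.day <= 7 and d.weekday() == TUESDAY


for year in (2024, 2025, 2026):
    cup_day = first_tuesday_of_november(year)
    print(year, cup_day.isoformat(), matches_notebook_rule(cup_day))
```

The other event windows in `categorize_event` are fixed calendar ranges, so they will not follow year-to-year schedule changes the way this weekday rule does.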


## 8. Fetch Weather Data

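Retrieve daily weather aggregates for the scoring date from the Open-Meteo archive API (Melbourne coordinates). These are per-day maximum, minimum, and mean values rather than conditions at the current hour.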

In [122]:
# Melbourne coordinates
lat = -37.8136
lon = 144.9631

start_date = df_score['created'].min().strftime('%Y-%m-%d')
end_date = df_score['created'].max().strftime('%Y-%m-%d')

print(f"Fetching weather data for {start_date}...")

# Call Open-Meteo API
url = "https://archive-api.open-meteo.com/v1/archive"
params = {
    "latitude": lat,
    "longitude": lon,
    "start_date": start_date,
    "end_date": end_date,
    "daily": "temperature_2m_max,temperature_2m_min,temperature_2m_mean,precipitation_sum,windspeed_10m_max",
    "timezone": "Australia/Melbourne"
}

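response = requests.get(url, params=params, timeout=30)
response.raise_for_status()  # fail fast on HTTP errors instead of parsing an error body
weather_json = response.json()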

weather_df = pd.DataFrame({
    'date': pd.to_datetime(weather_json['daily']['time']),
    'temp_max_c': weather_json['daily']['temperature_2m_max'],
    'temp_min_c': weather_json['daily']['temperature_2m_min'],
    'temp_avg_c': weather_json['daily']['temperature_2m_mean'],
    'precipitation_mm': weather_json['daily']['precipitation_sum'],
    'wind_speed_kmh': weather_json['daily']['windspeed_10m_max']
})

# Merge with scoring data
df_score['date'] = pd.to_datetime(df_score['created'].dt.date)
df_score = df_score.merge(weather_df, on='date', how='left')

print(f"Weather: {df_score['temp_avg_c'].iloc[0]:.1f}°C, "
      f"{df_score['precipitation_mm'].iloc[0]:.1f}mm rain, "
      f"{df_score['wind_speed_kmh'].iloc[0]:.1f}km/h wind")

df_score.head()

Fetching weather data for 2026-01-07...
Weather: 29.7°C, 0.0mm rain, 25.0km/h wind


Unnamed: 0,stationid,created,hour,dayofweek,is_weekend,arrivals_lag1,arrivals_lag2,arrivals_lag4,arrivals_ma4,arrivals_ma8,...,arrivals_diff,arrivals_ewma_4,is_holiday,is_major_event,date,temp_max_c,temp_min_c,temp_avg_c,precipitation_mm,wind_speed_kmh
0,674f97ff3dc8e5d2ac00867a,2026-01-07 22:08:34.636077,22,2,0,0,0,0,0,0,...,0,0,0,0,2026-01-07,42.0,18.9,29.7,0.0,25.0
1,674f98013dc8e5d2ac00894a,2026-01-07 22:08:34.636077,22,2,0,0,0,0,0,0,...,0,0,0,0,2026-01-07,42.0,18.9,29.7,0.0,25.0
2,674f97ff3dc8e5d2ac008456,2026-01-07 22:08:34.636077,22,2,0,0,0,0,0,0,...,0,0,0,0,2026-01-07,42.0,18.9,29.7,0.0,25.0
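
The cell above indexes `weather_json['daily'][...]` directly, so a response without a `daily` block fails with a bare `KeyError`. Below is a minimal sketch of a validating parser that assumes only the response shape already used in that cell (`daily.time` plus one list per requested variable). `parse_daily_weather` and `DAILY_FIELDS` are hypothetical names, and the sample payload reuses the values printed above.

```python
# Output column name -> Open-Meteo daily variable name (as requested in the cell above)
DAILY_FIELDS = {
    "temp_max_c": "temperature_2m_max",
    "temp_min_c": "temperature_2m_min",
    "temp_avg_c": "temperature_2m_mean",
    "precipitation_mm": "precipitation_sum",
    "wind_speed_kmh": "windspeed_10m_max",
}


def parse_daily_weather(payload: dict) -> list[dict]:
    """Turn a daily weather payload into one dict per day, validating its shape."""
    daily = payload.get("daily")
    if not isinstance(daily, dict) or "time" not in daily:
        raise ValueError(f"Unexpected weather response, keys: {sorted(payload)}")
    n_days = len(daily["time"])
    rows = []
    for i in range(n_days):
        row = {"date": daily["time"][i]}
        for out_name, api_name in DAILY_FIELDS.items():
            values = daily.get(api_name)
            if values is None or len(values) != n_days:
                raise ValueError(f"Missing or misaligned field: {api_name}")
            row[out_name] = values[i]
        rows.append(row)
    return rows


sample = {
    "daily": {
        "time": ["2026-01-07"],
        "temperature_2m_max": [42.0],
        "temperature_2m_min": [18.9],
        "temperature_2m_mean": [29.7],
        "precipitation_sum": [0.0],
        "windspeed_10m_max": [25.0],
    }
}
rows = parse_daily_weather(sample)
print(rows[0]["temp_avg_c"], rows[0]["wind_speed_kmh"])
```

The resulting list can be passed to `pd.DataFrame(rows)` to build `weather_df`.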


## 9. Fetch Pedestrian Count Data

Retrieve pedestrian counts from Melbourne's pedestrian counting system.

In [123]:
base_url = "https://melbournetestbed.opendatasoft.com/api/explore/v2.1/catalog/datasets/pedestrian-counting-system-monthly-counts-per-hour/records?"

print("Fetching pedestrian count data...")

all_records = []
offset = 0
limit = 100

while True:
    params = {
        "select": "sensing_date,hourday,direction_1",
        "where": "sensing_date >= now(days=-3) and sensing_date <= now(days=-2)",
        "timezone": "Australia/Melbourne",
        "limit": limit,
        "offset": offset
    }
    
    response = requests.get(base_url, params=params, timeout=30)
    
    if response.status_code == 200:
        data = response.json()
        records = data.get('results', [])
        if not records:
            break
        all_records.extend(records)
        offset += limit
    else:
        print(f"Error: {response.status_code}")
        break

df_pedestrian_api = pd.DataFrame(all_records)
df_pedestrian_api['sensing_date'] = pd.to_datetime(df_pedestrian_api['sensing_date']) + pd.Timedelta(days=1)

print(f"Retrieved {len(df_pedestrian_api)} pedestrian count records")

Fetching pedestrian count data...
Retrieved 2057 pedestrian count records
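
The loop above pages through results with `limit` and `offset` until an empty page comes back. Below is a sketch of the same pattern as a reusable function with a page cap, so that a misbehaving endpoint cannot loop forever. `fetch_all`, `fetch_page`, and `fake_page` are hypothetical names. The fake page source stands in for the HTTP call so the example runs offline.

```python
from typing import Callable


def fetch_all(fetch_page: Callable[[int, int], list],
              limit: int = 100,
              max_pages: int = 1000) -> list:
    """Collect records page by page until a page is empty or shorter than limit."""
    records = []
    for page in range(max_pages):
        batch = fetch_page(page * limit, limit)
        if not batch:
            break
        records.extend(batch)
        if len(batch) < limit:
            break  # a short page means there is nothing further to fetch
    else:
        # The loop ran out of pages without ever seeing the end of the data.
        raise RuntimeError(f"Stopped after {max_pages} pages without reaching the end")
    return records


def fake_page(offset: int, limit: int) -> list:
    """Stand-in for the API call: serves slices of 250 dummy records."""
    data = list(range(250))
    return data[offset:offset + limit]


all_records = fetch_all(fake_page)
print(len(all_records))  # 250
```

An empty result is worth checking before building the dataframe, since `pd.DataFrame([])` has no `sensing_date` column to convert.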


In [124]:
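# Merge pedestrian data by hour.
# Note: joining on hour alone pairs every station with every sensor record for
# that hour, so df_score grows from 3 rows to 264 rows at this step.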
df_pedestrian_api['sensing_date'] = pd.to_datetime(df_pedestrian_api['sensing_date'])
df_score = df_score.merge(df_pedestrian_api, left_on=['hour'], right_on=['hourday'], how='inner')

print(f"Pedestrian count: {df_score['direction_1'].iloc[0]}")
df_score.head()

Pedestrian count: 62


Unnamed: 0,stationid,created,hour,dayofweek,is_weekend,arrivals_lag1,arrivals_lag2,arrivals_lag4,arrivals_ma4,arrivals_ma8,...,is_major_event,date,temp_max_c,temp_min_c,temp_avg_c,precipitation_mm,wind_speed_kmh,sensing_date,hourday,direction_1
0,674f97ff3dc8e5d2ac00867a,2026-01-07 22:08:34.636077,22,2,0,0,0,0,0,0,...,0,2026-01-07,42.0,18.9,29.7,0.0,25.0,2026-01-06,22,62
1,674f97ff3dc8e5d2ac00867a,2026-01-07 22:08:34.636077,22,2,0,0,0,0,0,0,...,0,2026-01-07,42.0,18.9,29.7,0.0,25.0,2026-01-06,22,182
2,674f97ff3dc8e5d2ac00867a,2026-01-07 22:08:34.636077,22,2,0,0,0,0,0,0,...,0,2026-01-07,42.0,18.9,29.7,0.0,25.0,2026-01-06,22,213
3,674f97ff3dc8e5d2ac00867a,2026-01-07 22:08:34.636077,22,2,0,0,0,0,0,0,...,0,2026-01-07,42.0,18.9,29.7,0.0,25.0,2026-01-06,22,3
4,674f97ff3dc8e5d2ac00867a,2026-01-07 22:08:34.636077,22,2,0,0,0,0,0,0,...,0,2026-01-07,42.0,18.9,29.7,0.0,25.0,2026-01-06,22,38
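
The inner join on `hour` alone pairs every station with every sensor record for that hour, which is why the table above repeats one station across many rows. Below is a minimal sketch of one way to keep a single row per station: collapse the sensor records to one value per hour before merging. The mean is an assumption; the aggregation used when the model was trained is not stated in this notebook. The small frames are illustrative stand-ins that reuse three counts from the output above.

```python
import pandas as pd

# Stand-in for df_score: one row per station, all at hour 22.
df_score_demo = pd.DataFrame({
    "stationid": ["674f97ff3dc8e5d2ac00867a",
                  "674f98013dc8e5d2ac00894a",
                  "674f97ff3dc8e5d2ac008456"],
    "hour": [22, 22, 22],
})

# Stand-in for df_pedestrian_api: several sensor records per hour.
df_ped_demo = pd.DataFrame({
    "hourday": [22, 22, 22, 21],
    "direction_1": [62, 182, 213, 40],
})

# Collapse to one pedestrian value per hour (mean is an assumed choice).
ped_hourly = df_ped_demo.groupby("hourday", as_index=False)["direction_1"].mean()

# validate="many_to_one" makes pandas raise if the right side still has duplicate hours.
merged = df_score_demo.merge(
    ped_hourly, left_on="hour", right_on="hourday",
    how="left", validate="many_to_one",
)

print(len(merged))  # 3: one row per station
print(merged[["stationid", "direction_1"]])
```

With `how="left"`, a station whose hour has no pedestrian records keeps its row with a missing `direction_1`, which then needs an explicit fill before prediction.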


## 10. Create Interaction Features

In [125]:
# Interaction features
df_score["weekend_x_hour"] = df_score["is_weekend"] * df_score["hour"]
df_score["temp_x_precipitation"] = df_score["temp_avg_c"] * df_score["precipitation_mm"]

print("Interaction features created")

Interaction features created


## 11. Make Predictions

In [126]:
# Required features (must match training schema)
required_features = [
    'hour', 'dayofweek', 'is_weekend',
    'arrivals_lag1', 'arrivals_lag2', 'arrivals_lag4',
    'arrivals_ma4', 'arrivals_ma8',
    'hod_sin', 'hod_cos',
    'is_holiday', 'is_major_event',
    'temp_max_c', 'temp_min_c', 'temp_avg_c',
    'precipitation_mm', 'wind_speed_kmh', 'direction_1',
    'arrivals_pct_change', 'arrivals_diff',
    'weekend_x_hour', 'temp_x_precipitation', 'arrivals_ewma_4'
]

# Validate features: stop here rather than failing later on the column selection
missing = [f for f in required_features if f not in df_score.columns]
if missing:
    raise KeyError(f"Missing required features: {missing}")
print("✓ All features present")

# Prepare and predict
X_score = df_score[required_features].values
predictions = rf_model.predict(X_score)
df_score['predicted_arrivals'] = predictions

print(f"\nPredicted arrivals for {len(predictions)} rows")

✓ All features present

Predicted arrivals for 264 rows
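
Because the cell passes `.values` to the model, the features are matched by position only, so the order and count of `required_features` matter as much as their presence. Below is a sketch of a stricter check. `validate_features` is a hypothetical helper. The `expected_count` argument could be taken from `rf_model.n_features_in_`, an attribute scikit-learn sets at fit time, assuming the installed version provides it.

```python
def validate_features(columns, required, expected_count=None):
    """Check presence, uniqueness, and count of the feature list; return it as a list."""
    missing = [name for name in required if name not in columns]
    if missing:
        raise KeyError(f"Missing required features: {missing}")
    duplicates = sorted({name for name in required if required.count(name) > 1})
    if duplicates:
        raise ValueError(f"Duplicated features: {duplicates}")
    if expected_count is not None and len(required) != expected_count:
        raise ValueError(
            f"Model expects {expected_count} features, got {len(required)}"
        )
    return list(required)


# Toy example with three columns instead of the notebook's 23.
available = ["stationid", "hour", "dayofweek", "is_weekend"]
features = validate_features(available, ["hour", "dayofweek", "is_weekend"],
                             expected_count=3)
print(features)  # ['hour', 'dayofweek', 'is_weekend']
```

This check cannot detect a wrong column order. That still has to be compared against the order used during training.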


## 12. Results Summary

In [127]:
# Display results (one entry per row of df_score, i.e. per station/pedestrian-record pair)
print("\n" + "="*70)
print("PREDICTION RESULTS")
print("="*70)

for idx, row in df_score.iterrows():
    print(f"\nStation: {row['stationid']}")
    print(f"  Predicted arrivals (3h): {row['predicted_arrivals']:.2f}")
    print(f"  Conditions: {row['temp_avg_c']:.1f}°C, {row['precipitation_mm']:.1f}mm, pedestrians: {row['direction_1']:.0f}")

print("="*70)

# Save results
results = df_score[['stationid', 'predicted_arrivals', 'hour', 'temp_avg_c', 'direction_1']].copy()
results.to_csv('prediction_results.csv', index=False)
print("\n✓ Results saved to prediction_results.csv")


PREDICTION RESULTS

Station: 674f97ff3dc8e5d2ac00867a
  Predicted arrivals (3h): 1.09
  Conditions: 29.7°C, 0.0mm, pedestrians: 62

Station: 674f97ff3dc8e5d2ac00867a
  Predicted arrivals (3h): 1.09
  Conditions: 29.7°C, 0.0mm, pedestrians: 182

Station: 674f97ff3dc8e5d2ac00867a
  Predicted arrivals (3h): 1.09
  Conditions: 29.7°C, 0.0mm, pedestrians: 213

Station: 674f97ff3dc8e5d2ac00867a
  Predicted arrivals (3h): 1.09
  Conditions: 29.7°C, 0.0mm, pedestrians: 3

Station: 674f97ff3dc8e5d2ac00867a
  Predicted arrivals (3h): 1.09
  Conditions: 29.7°C, 0.0mm, pedestrians: 38

Station: 674f97ff3dc8e5d2ac00867a
  Predicted arrivals (3h): 1.09
  Conditions: 29.7°C, 0.0mm, pedestrians: 31

Station: 674f97ff3dc8e5d2ac00867a
  Predicted arrivals (3h): 1.09
  Conditions: 29.7°C, 0.0mm, pedestrians: 137

Station: 674f97ff3dc8e5d2ac00867a
  Predicted arrivals (3h): 1.10
  Conditions: 29.7°C, 0.0mm, pedestrians: 253

Station: 674f97ff3dc8e5d2ac00867a
  Predicted arrivals (3h): 1.09
  Conditions: 