bj-mackenzie/maqs

MAQS: Model-Augmented Q-Learning Scheduler for Energy-Harvesting IoT Devices

This repository contains the simulation code accompanying the paper:

MAQS: Model-Augmented Q-Learning Scheduler for Energy-Harvesting IoT Devices
Brendan J. Mackenzie, Max Sebrechts, Koustabh Dolui, Sam Michiels, Danny Hughes
DistriNet, KU Leuven - Accepted at IEEE WoWMoM 2026

Overview

MAQS is a fully on-device Reinforcement Learning (RL) approach for dynamic task scheduling on energy-harvesting Internet of Things (IoT) devices. Unlike existing RL methods that require off-device training in simulated environments, MAQS devices continually learn and adapt during deployment. The energy management problem is formulated as a Markov Decision Process (MDP), where the IoT device acts as an agent that simultaneously performs tasks and learns an optimal scheduling policy through Q-learning.
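As a rough illustration of the Q-learning loop described above, the following is a minimal sketch of textbook tabular Q-learning with epsilon-greedy action selection. It is not taken from Qlearning.py: the table size, the learning rate `alpha`, the discount factor `gamma`, and the function names are all assumptions chosen for the example.

```python
import random

def choose_action(q_row, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, else take the best-known action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])

def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update, applied in place; returns the new Q-value."""
    best_next = max(q_table[next_state])
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]

# Hypothetical table size: 4 states x 3 actions (not the sizes used by MAQS).
N_STATES, N_ACTIONS = 4, 3
q_table = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

# One learning step from a single observed transition.
new_value = q_update(q_table, state=1, action=2, reward=1.0, next_state=2)
greedy_action = choose_action([0.0, 5.0, 1.0], epsilon=0.0)
```

On a device, the state would presumably be derived from the measured capacitor voltage and the action from the chosen task interval, but that mapping is defined by the paper and the repository code, not by this sketch.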

Key contributions:

  • On-device RL: MAQS operates entirely on-device, requiring only 324 bytes of RAM for its Q-table and negligible computation, making it viable on the most resource-constrained hardware.
  • Phantom Q-updates: A novel mechanism that updates multiple Q-values from a single experience, significantly accelerating convergence.
  • No manual tuning: Unlike AsTAR/AsTAR++, which require careful scenario-specific calibration, MAQS replaces expert-driven parameter tuning with automated learning.
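To make the idea of updating several Q-values from one experience more concrete, here is a hedged sketch of one plausible reading: after the real update, a simple model predicts what the untaken actions would have produced from the same state, and those predicted transitions are also fed to the Q-update. The `toy_model` function, the state/action encoding, and the hyperparameters are hypothetical; the actual phantom Q-update is defined in the paper and in Qlearning.py and may differ.

```python
def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update, applied in place."""
    q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])

def update_with_phantoms(q, s, a, r, s_next, model, alpha=0.1, gamma=0.9):
    """Apply the real update, then one model-predicted update per untaken action.

    Returns the list of actions whose Q-values were updated, in order.
    """
    q_update(q, s, a, r, s_next, alpha, gamma)
    updated = [a]
    for other in range(len(q[s])):
        if other == a:
            continue
        predicted = model(s, other)  # (reward, next_state), or None if unknown
        if predicted is None:
            continue
        r_hat, s_hat = predicted
        q_update(q, s, other, r_hat, s_hat, alpha, gamma)
        updated.append(other)
    return updated

def toy_model(state, action):
    """Hypothetical model: states are voltage bins, action k drains k bins."""
    next_state = max(state - action, 0)
    reward = float(action) if state - action >= 0 else -1.0
    return reward, next_state

# Hypothetical 4-state x 3-action table and a single real experience.
q = [[0.0] * 3 for _ in range(4)]
updated = update_with_phantoms(q, s=2, a=1, r=1.0, s_next=1, model=toy_model)
```

In this sketch a single experience touches all three actions of state 2, which is the sense in which model-based updates can speed up convergence compared with updating only the action actually taken.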

Four scheduling algorithms are compared in this simulation:

  • MAQS: Q-learning based scheduler with model-augmented updates
  • MAQS++: MAQS with diurnal (day/night) optimisation using separate Q-tables
  • AsTAR: State-of-the-art prediction-free baseline
  • AsTAR++: AsTAR with diurnal optimisation

Simulation Environment

The simulation framework uses a discrete-time model populated with real-world environmental data and hardware specifications. The hardware model is based on the Circuit Dojo nRF9160 Feather, powered by monocrystalline solar panels (7 cm × 5 cm, 17% efficiency) and supercapacitors (5 F or 60 F depending on the application). The model accounts for equivalent series resistance (ESR), leakage current, and DC-DC converter efficiency (Texas Instruments TPS62840). Environmental conditions use the EnHANTs long-term light irradiance dataset from Columbia University.
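The kind of discrete-time energy balance described above can be sketched as follows. This is a simplified illustration, not the model in simulation_func.py: only the panel dimensions, the panel efficiency, and the 60 F capacitance come from this README. The leakage current, converter efficiency, ESR value, and the assumption that irradiance is given in W/m² are placeholders chosen for the example.

```python
import math

PANEL_AREA_M2 = 0.07 * 0.05   # 7 cm x 5 cm panel, from the README
PANEL_EFFICIENCY = 0.17       # 17% efficiency, from the README

def step_voltage(v_cap, irradiance_w_m2, load_power_w, dt_s, capacitance_f,
                 leakage_a=10e-6, converter_eff=0.9, esr_ohm=0.1):
    """Advance the supercapacitor by one time step.

    Returns (open-circuit voltage, terminal voltage under load).
    leakage_a, converter_eff and esr_ohm are assumed placeholder values.
    """
    harvested_w = irradiance_w_m2 * PANEL_AREA_M2 * PANEL_EFFICIENCY
    drawn_w = load_power_w / converter_eff   # converter losses inflate the draw
    leak_w = v_cap * leakage_a               # leakage modelled as a constant current
    energy_j = 0.5 * capacitance_f * v_cap ** 2
    energy_j = max(energy_j + (harvested_w - drawn_w - leak_w) * dt_s, 0.0)
    v_new = math.sqrt(2.0 * energy_j / capacitance_f)
    # ESR causes an extra voltage drop at the terminals while load current flows.
    load_current_a = drawn_w / v_new if v_new > 0 else 0.0
    v_terminal = max(v_new - load_current_a * esr_ohm, 0.0)
    return v_new, v_terminal

# One 15-minute sleep step on a 60 F capacitor with some light and no load.
v_after_sleep, _ = step_voltage(3.0, 100.0, 0.0, 900.0, 60.0)
# One second of a 0.5 W load in darkness.
v_after_task, v_terminal_task = step_voltage(3.0, 0.0, 0.5, 1.0, 60.0)
```

Tracking stored energy rather than voltage directly keeps the update stable for long steps, and separating the open-circuit voltage from the terminal voltage shows why ESR matters: a heavy load can pull the terminal voltage below a threshold even when the stored energy is sufficient.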

Repository Structure

  • Qlearning.py: Q-learning agent implementation (Q-table, reward function, epsilon-greedy exploration, phantom Q-update)
  • simulation_func.py: Core simulation functions for MAQS: voltage computation with ESR model during task/sleep cycles
  • AsTAR_sim_func.py: Core simulation functions for AsTAR: continuous-time task+sleep simulation with irradiance lookup
  • MAQS_simulation.py: MAQS simulation runner
  • MAQS++_simulation.py: MAQS++ simulation runner (adds night-mode optimisation)
  • AsTAR_simulation.py: AsTAR simulation runner
  • AsTAR++_simulation.py: AsTAR++ simulation runner (adds night-mode optimisation)
  • dataexpl.py: Dataset preprocessing script (handles missing values, aggregates to 15-min intervals)
  • histogram.py: Generates task interval histograms from saved results
  • voltage_graphs.py: Generates voltage and irradiance comparison plots from saved results
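For readers unfamiliar with the preprocessing step, the 15-minute aggregation performed by dataexpl.py can be pictured roughly as below. This is a standard-library sketch under assumptions: the input is taken to be `(timestamp in seconds, irradiance)` pairs, missing values are taken to be `None` or NaN, and they are simply skipped. The real script may read a different format and may treat gaps differently (for example by interpolating).

```python
import math
from collections import defaultdict

def aggregate_15min(samples, bucket_s=900):
    """Average (timestamp_s, value) samples into fixed buckets, skipping missing values.

    Returns a dict mapping each bucket start time (seconds) to the mean value.
    Buckets containing only missing values are omitted.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, value in samples:
        if value is None or math.isnan(value):
            continue
        bucket = int(t // bucket_s)
        sums[bucket] += value
        counts[bucket] += 1
    return {b * bucket_s: sums[b] / counts[b] for b in sorted(counts)}

# Hypothetical raw samples with two missing readings.
raw = [(0, 10.0), (300, None), (600, 20.0), (900, float("nan")), (1200, 30.0)]
averaged = aggregate_15min(raw)
```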

Data & Results

  • data/: Raw irradiance measurement files (Setups A–F from the EnHANTs dataset)
  • datasets/: Processed datasets (15-min averaged irradiance values)
  • res_small/, res_lidar/, res_ltem/: Saved simulation results (voltages, intervals) for each application
  • histograms/: Generated histogram PDFs
  • voltage_graphs/: Generated voltage comparison PDFs

Application Scenarios

Three application scenarios are defined, as described in Table II of the paper:

| Specification | LiDAR | LTE-M | Small load |
| --- | --- | --- | --- |
| Sensing current | 122 mA | 218 μA | 5 mA |
| Sensing duration | 1.5 s | 0.271 s | 1.5 s |
| Transmission current | 0 | 316.99 mA | 0 |
| Transmission duration | 0 | 1 s | 0 |
| Sleep current | 2.2 μA | 61 μA | 2.2 μA |
| Capacitance | 60 F | 60 F | 5 F |
| Dataset | Setup C | Setup C | Setup A |
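To give a feel for how different the three loads are, the figures in the table can be turned into charge per task (current × duration). The sketch below uses only values from the table; the dictionary layout and function names are illustrative and do not mirror the variables in simulation_func.py. Charge is used rather than energy because the capacitor voltage varies during operation.

```python
# Currents in amperes and durations in seconds, taken from the table above.
SCENARIOS = {
    "LiDAR":      {"sense_a": 122e-3, "sense_s": 1.5,   "tx_a": 0.0,       "tx_s": 0.0, "sleep_a": 2.2e-6},
    "LTE-M":      {"sense_a": 218e-6, "sense_s": 0.271, "tx_a": 316.99e-3, "tx_s": 1.0, "sleep_a": 61e-6},
    "Small load": {"sense_a": 5e-3,   "sense_s": 1.5,   "tx_a": 0.0,       "tx_s": 0.0, "sleep_a": 2.2e-6},
}

def charge_per_task_c(p):
    """Charge in coulombs drawn by one sensing + transmission cycle."""
    return p["sense_a"] * p["sense_s"] + p["tx_a"] * p["tx_s"]

def sleep_equivalent_s(p):
    """Seconds of sleep that draw the same charge as one task."""
    return charge_per_task_c(p) / p["sleep_a"]

charges = {name: charge_per_task_c(p) for name, p in SCENARIOS.items()}
sleep_equivalents = {name: sleep_equivalent_s(p) for name, p in SCENARIOS.items()}

for name in SCENARIOS:
    print(f"{name}: {charges[name] * 1e3:.1f} mC per task, "
          f"equal to {sleep_equivalents[name]:.0f} s of sleep")
```

By this measure a LiDAR task draws 183 mC and an LTE-M task about 317 mC, while a small-load task draws only 7.5 mC.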

Selecting an Application

To change the application scenario, you must update the boolean flags at the top of two files: simulation_func.py (used by MAQS/MAQS++) and AsTAR_sim_func.py (used by AsTAR/AsTAR++). Set exactly one of the three flags to True and the others to False:

# In simulation_func.py AND AsTAR_sim_func.py:
SMALL_LOAD_APPLICATION = False
LIDAR_APPLICATION = True
LTEM_APPLICATION = False

You must also update the dataset loaded in each simulation runner script (e.g. MAQS_simulation.py, AsTAR_simulation.py, etc.) to match the application's dataset:

  • LiDAR / LTE-M: use datasets/datasetC_processed
  • Small load: use datasets/datasetA_processed
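Because the flags and the dataset path have to be kept in sync by hand, a small consistency check can help catch mismatches. The helper below is a hypothetical sketch and is not part of the repository: it takes the three flag values, verifies that exactly one is set, and returns the dataset path listed above for that application.

```python
# Flag names and dataset paths as given in this README.
DATASET_BY_APPLICATION = {
    "SMALL_LOAD_APPLICATION": "datasets/datasetA_processed",
    "LIDAR_APPLICATION": "datasets/datasetC_processed",
    "LTEM_APPLICATION": "datasets/datasetC_processed",
}

def dataset_for_flags(flags):
    """Return the dataset path for the single enabled application flag.

    Raises ValueError if a flag name is unknown or if the number of enabled
    flags is not exactly one.
    """
    unknown = set(flags) - set(DATASET_BY_APPLICATION)
    if unknown:
        raise ValueError(f"Unknown flags: {sorted(unknown)}")
    active = [name for name, enabled in flags.items() if enabled]
    if len(active) != 1:
        raise ValueError(f"Exactly one flag must be True, got {len(active)}")
    return DATASET_BY_APPLICATION[active[0]]

# Same flag values as the example above (LiDAR selected).
selected_dataset = dataset_for_flags({
    "SMALL_LOAD_APPLICATION": False,
    "LIDAR_APPLICATION": True,
    "LTEM_APPLICATION": False,
})
```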

Usage

  1. Preprocess data (only needed once):

    python dataexpl.py
  2. Select the application by editing the flags in simulation_func.py and AsTAR_sim_func.py, and the dataset path in the simulation runner script (see Selecting an Application above).

  3. Run a simulation (e.g., MAQS):

    python MAQS_simulation.py
  4. Generate plots from saved results:

    python histogram.py
    python voltage_graphs.py

    NOTE: these scripts read results from the res_*/ folders, whereas new simulation runs save their results in the top-level folder.

MAQS++ Bug Fix

A bug was discovered in the MAQS++ implementation that affected the night-time derating of the optimum voltage. Fixing it shifts the simulation results slightly, but MAQS++ retains its relative position among the four scheduling methods: it still delivers higher throughput than baseline MAQS in every scenario, while still trailing baseline MAQS on reliability in the energy-tight LTE-M and small-load profiles. The most visible effect of the fix is that MAQS++ spends substantially less time below the minimum voltage in all three applications and in every evaluation period. Time below the shutoff voltage also drops in every case except the small-load application after the first 30 days, where it rises by 0.17 pp. Average throughput falls slightly, by less than one task per hour in every case.

The tables below list the changes relative to the results in the paper:

LiDAR application — dataset C

| Period | Measure | Old | New | Δ |
| --- | --- | --- | --- | --- |
| Full | Time under V_min | 8.53% | 4.06% | −4.47 pp |
| Full | Time under V_shutoff | 1.16% | 0.68% | −0.48 pp |
| Full | Time above V_max | 0.40% | 0.20% | −0.20 pp |
| Full | Time above V_rating | 0.20% | 0.09% | −0.11 pp |
| Full | Avg. num. tasks/hour | 46.70 | 46.21 | −0.49 |
| Full | Task interval std. dev. (s) | 192.22 | 195.06 | +2.84 |
| Excl. first 10 days | Time under V_min | 6.57% | 2.42% | −4.15 pp |
| Excl. first 10 days | Time under V_shutoff | 1.08% | 0.33% | −0.75 pp |
| Excl. first 10 days | Time above V_max | 0.42% | 0.20% | −0.22 pp |
| Excl. first 10 days | Time above V_rating | 0.21% | 0.09% | −0.12 pp |
| Excl. first 10 days | Avg. num. tasks/hour | 47.65 | 47.18 | −0.47 |
| Excl. first 10 days | Task interval std. dev. (s) | 192.87 | 195.77 | +2.90 |
| Excl. first 30 days | Time under V_min | 4.03% | 1.38% | −2.65 pp |
| Excl. first 30 days | Time under V_shutoff | 0.44% | 0.04% | −0.40 pp |
| Excl. first 30 days | Time above V_max | 0.45% | 0.22% | −0.23 pp |
| Excl. first 30 days | Time above V_rating | 0.23% | 0.10% | −0.13 pp |
| Excl. first 30 days | Avg. num. tasks/hour | 50.39 | 49.94 | −0.45 |
| Excl. first 30 days | Task interval std. dev. (s) | 190.02 | 190.94 | +0.92 |

LTE-M application — dataset C

| Period | Measure | Old | New | Δ |
| --- | --- | --- | --- | --- |
| Full | Time under V_min | 14.36% | 10.56% | −3.80 pp |
| Full | Time under V_shutoff | 3.62% | 1.73% | −1.89 pp |
| Full | Time above V_max | 0.14% | 0.14% | 0.00 pp |
| Full | Time above V_rating | 0.01% | 0.04% | +0.03 pp |
| Full | Avg. num. tasks/hour | 26.39 | 26.31 | −0.08 |
| Full | Task interval std. dev. (s) | 249.15 | 270.39 | +21.24 |
| Excl. first 10 days | Time under V_min | 12.86% | 8.63% | −4.23 pp |
| Excl. first 10 days | Time under V_shutoff | 3.51% | 1.63% | −1.88 pp |
| Excl. first 10 days | Time above V_max | 0.15% | 0.14% | −0.01 pp |
| Excl. first 10 days | Time above V_rating | 0.01% | 0.04% | +0.03 pp |
| Excl. first 10 days | Avg. num. tasks/hour | 26.96 | 26.86 | −0.10 |
| Excl. first 10 days | Task interval std. dev. (s) | 249.63 | 273.11 | +23.48 |
| Excl. first 30 days | Time under V_min | 9.44% | 6.11% | −3.33 pp |
| Excl. first 30 days | Time under V_shutoff | 2.17% | 1.15% | −1.02 pp |
| Excl. first 30 days | Time above V_max | 0.16% | 0.16% | 0.00 pp |
| Excl. first 30 days | Time above V_rating | 0.01% | 0.05% | +0.04 pp |
| Excl. first 30 days | Avg. num. tasks/hour | 28.52 | 28.45 | −0.07 |
| Excl. first 30 days | Task interval std. dev. (s) | 247.70 | 270.57 | +22.87 |

Small load application — dataset A

| Period | Measure | Old | New | Δ |
| --- | --- | --- | --- | --- |
| Full | Time under V_min | 10.68% | 7.76% | −2.92 pp |
| Full | Time under V_shutoff | 1.22% | 0.78% | −0.44 pp |
| Full | Time above V_max | 0.00% | 0.00% | 0.00 pp |
| Full | Time above V_rating | 0.00% | 0.00% | 0.00 pp |
| Full | Avg. num. tasks/hour | 17.91 | 17.15 | −0.76 |
| Full | Task interval std. dev. (s) | 314.09 | 310.69 | −3.40 |
| Excl. first 10 days | Time under V_min | 9.41% | 5.77% | −3.64 pp |
| Excl. first 10 days | Time under V_shutoff | 1.01% | 0.80% | −0.21 pp |
| Excl. first 10 days | Time above V_max | 0.00% | 0.00% | 0.00 pp |
| Excl. first 10 days | Time above V_rating | 0.00% | 0.00% | 0.00 pp |
| Excl. first 10 days | Avg. num. tasks/hour | 17.56 | 16.74 | −0.82 |
| Excl. first 10 days | Task interval std. dev. (s) | 321.13 | 317.75 | −3.38 |
| Excl. first 30 days | Time under V_min | 5.99% | 4.53% | −1.46 pp |
| Excl. first 30 days | Time under V_shutoff | 0.59% | 0.76% | +0.17 pp |
| Excl. first 30 days | Time above V_max | 0.00% | 0.00% | 0.00 pp |
| Excl. first 30 days | Time above V_rating | 0.00% | 0.00% | 0.00 pp |
| Excl. first 30 days | Avg. num. tasks/hour | 16.75 | 16.03 | −0.72 |
| Excl. first 30 days | Task interval std. dev. (s) | 337.21 | 324.77 | −12.44 |
