This repository contains the simulation code accompanying the paper:
MAQS: Model-Augmented Q-Learning Scheduler for Energy-Harvesting IoT Devices

Brendan J. Mackenzie, Max Sebrechts, Koustabh Dolui, Sam Michiels, Danny Hughes

DistriNet, KU Leuven. Accepted at IEEE WoWMoM 2026.

MAQS is a fully on-device Reinforcement Learning (RL) approach for dynamic task scheduling on energy-harvesting Internet of Things (IoT) devices. Unlike existing RL methods that require off-device training in simulated environments, MAQS devices continually learn and adapt during deployment. The energy management problem is formulated as a Markov Decision Process (MDP), where the IoT device acts as an agent that simultaneously performs tasks and learns an optimal scheduling policy through Q-learning.
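As a rough illustration of the Q-learning loop described above, the sketch below shows a standard one-step tabular Q-learning update with epsilon-greedy action selection. It is not the code in Qlearning.py: the state and action counts, the reward, and the `alpha`, `gamma` and `epsilon` values are placeholders chosen for the example, and the phantom Q-update is not included.

```python
import random


def epsilon_greedy(q_row, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)


def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply the standard one-step Q-learning update in place.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    alpha and gamma are illustrative values, not the ones used by MAQS.
    """
    best_next = max(q_table[next_state])
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]


# Hypothetical toy problem: 3 states and 2 actions, all Q-values start at 0.
q_table = [[0.0, 0.0] for _ in range(3)]
updated = q_update(q_table, state=0, action=1, reward=1.0, next_state=2)
greedy_action = epsilon_greedy(q_table[0], epsilon=0.0)
```

With `epsilon=0.0` the selection is purely greedy, so after the single update above the agent prefers action 1 in state 0.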
Key contributions:
- On-device RL: MAQS operates entirely on-device, requiring only 324 bytes of RAM for its Q-table and negligible computation, making it viable on the most resource-constrained hardware.
- Phantom Q-updates: A novel mechanism that updates multiple Q-values from a single experience, significantly accelerating convergence.
- No manual tuning: Unlike AsTAR/AsTAR++, which require careful scenario-specific calibration, MAQS replaces expert-driven parameter tuning with automated learning.
Four scheduling algorithms are compared in this simulation:
- MAQS: Q-learning based scheduler with model-augmented updates
- MAQS++: MAQS with diurnal (day/night) optimisation using separate Q-tables
- AsTAR: State-of-the-art prediction-free baseline
- AsTAR++: AsTAR with diurnal optimisation
The simulation framework uses a discrete-time model populated with real-world environmental data and hardware specifications. The hardware model is based on the Circuit Dojo nRF9160 Feather, powered by monocrystalline solar panels (7 cm × 5 cm, 17% efficiency) and supercapacitors (5 F or 60 F depending on the application). The model accounts for equivalent series resistance (ESR), leakage current, and DC-DC converter efficiency (Texas Instruments TPS62840). Environmental conditions use the EnHANTs long-term light irradiance dataset from Columbia University.
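To make the discrete-time model more concrete, here is a minimal energy-balance sketch for one time step of a solar-charged supercapacitor. It is an illustration under stated assumptions, not the model in simulation_func.py: the function names, the 3.3 V output voltage, the 90% converter efficiency, the 1 µA leakage current, the 0.05 Ω ESR and the irradiance unit (W/m²) are all placeholders. Only the panel size (7 cm × 5 cm), the 17% efficiency, the 60 F capacitance and the LiDAR task figures come from this README.

```python
import math

PANEL_AREA_M2 = 0.07 * 0.05   # 7 cm x 5 cm panel
PANEL_EFFICIENCY = 0.17       # 17% efficiency


def harvest_power(irradiance_w_per_m2):
    """Electrical power (W) delivered by the panel; irradiance unit is assumed."""
    return irradiance_w_per_m2 * PANEL_AREA_M2 * PANEL_EFFICIENCY


def step_voltage(v, dt, p_harvest, i_load, capacitance,
                 v_out=3.3, eta=0.9, i_leak=1e-6):
    """Advance the capacitor voltage by dt seconds using an energy balance.

    v_out, eta and i_leak are hypothetical placeholder values.
    """
    energy = 0.5 * capacitance * v * v          # stored energy, E = C V^2 / 2
    p_load = v_out * i_load / eta               # load power seen before the converter
    p_leak = v * i_leak                         # leakage drawn directly from the capacitor
    energy += (p_harvest - p_load - p_leak) * dt
    return math.sqrt(max(2.0 * energy / capacitance, 0.0))


def terminal_voltage(v_internal, i_draw, esr):
    """Voltage at the capacitor terminals while i_draw amps flow through the ESR."""
    return v_internal - i_draw * esr


# One 1.5 s LiDAR sensing task (122 mA) in darkness on a 60 F capacitor.
v_after = step_voltage(3.0, dt=1.5, p_harvest=harvest_power(0.0),
                       i_load=0.122, capacitance=60.0)
v_sag = terminal_voltage(3.0, i_draw=0.122, esr=0.05)
```

The energy form is used because harvested and consumed power add linearly in energy, while voltage follows from the square root of the stored energy.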
- Qlearning.py: Q-learning agent implementation (Q-table, reward function, epsilon-greedy exploration, phantom Q-update)
- simulation_func.py: Core simulation functions for MAQS: voltage computation with ESR model during task/sleep cycles
- AsTAR_sim_func.py: Core simulation functions for AsTAR: continuous-time task+sleep simulation with irradiance lookup
- MAQS_simulation.py: MAQS simulation runner
- MAQS++_simulation.py: MAQS++ simulation runner (adds night-mode optimisation)
- AsTAR_simulation.py: AsTAR simulation runner
- AsTAR++_simulation.py: AsTAR++ simulation runner (adds night-mode optimisation)
- dataexpl.py: Dataset preprocessing script (handles missing values, aggregates to 15-min intervals)
- histogram.py: Generates task interval histograms from saved results
- voltage_graphs.py: Generates voltage and irradiance comparison plots from saved results
- data/: Raw irradiance measurement files (Setups A–F from the EnHANTs dataset)
- datasets/: Processed datasets (15-min averaged irradiance values)
- res_small/, res_lidar/, res_ltem/: Saved simulation results (voltages, intervals) for each application
- histograms/: Generated histogram PDFs
- voltage_graphs/: Generated voltage comparison PDFs
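The preprocessing performed by dataexpl.py (missing-value handling and aggregation to 15-minute intervals) can be pictured with the sketch below. It is not the script itself: the `(timestamp_seconds, value)` input format, the function name `aggregate_15min`, and the choice to simply skip missing samples are assumptions made for the example.

```python
from collections import defaultdict


def aggregate_15min(samples, interval_s=900):
    """Average samples into fixed windows of interval_s seconds.

    samples: iterable of (timestamp_seconds, value) pairs; a value of None
    marks a missing measurement and is skipped (an assumed policy).
    Returns a sorted list of (window_start_seconds, mean_value).
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        if value is None:
            continue
        buckets[int(ts // interval_s)].append(value)
    return [(idx * interval_s, sum(vals) / len(vals))
            for idx, vals in sorted(buckets.items())]


# Hypothetical readings: one missing sample in the first 15-minute window.
raw = [(0, 10.0), (300, None), (600, 20.0), (900, 30.0)]
averaged = aggregate_15min(raw)
```

Windows that contain only missing samples produce no output row in this sketch; a real pipeline may prefer to interpolate or fill them instead.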
Three application scenarios are defined, as described in Table II of the paper:
| Specification | LiDAR | LTE-M | Small load |
|---|---|---|---|
| Sensing current | 122 mA | 218 μA | 5 mA |
| Sensing duration | 1.5 s | 0.271 s | 1.5 s |
| Transmission current | 0 | 316.99 mA | 0 |
| Transmission duration | 0 | 1 s | 0 |
| Sleep current | 2.2 μA | 61 μA | 2.2 μA |
| Capacitance | 60 F | 60 F | 5 F |
| Dataset | Setup C | Setup C | Setup A |
To change the application scenario, you must update the boolean flags at the top of two files: simulation_func.py (used by MAQS/MAQS++) and AsTAR_sim_func.py (used by AsTAR/AsTAR++). Set exactly one of the three flags to True and the others to False:
# In simulation_func.py AND AsTAR_sim_func.py:
SMALL_LOAD_APPLICATION = False
LIDAR_APPLICATION = True
LTEM_APPLICATION = False

You must also update the dataset loaded in each simulation runner script (e.g. MAQS_simulation.py, AsTAR_simulation.py, etc.) to match the application's dataset:
- LiDAR / LTE-M: use datasets/datasetC_processed
- Small load: use datasets/datasetA_processed
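Because the flags and the dataset path live in different files, it is easy to leave them inconsistent. The sketch below shows one way to check that exactly one flag is set and to look up the matching capacitance and dataset. The `APPLICATIONS` dictionary and `select_application` helper are hypothetical and do not exist in the repository; the values are taken from the table and dataset list above.

```python
# Flags as they would appear at the top of simulation_func.py / AsTAR_sim_func.py.
SMALL_LOAD_APPLICATION = False
LIDAR_APPLICATION = True
LTEM_APPLICATION = False

# Hypothetical lookup table built from the application table above.
APPLICATIONS = {
    "small_load": {"capacitance_f": 5.0, "dataset": "datasets/datasetA_processed"},
    "lidar": {"capacitance_f": 60.0, "dataset": "datasets/datasetC_processed"},
    "ltem": {"capacitance_f": 60.0, "dataset": "datasets/datasetC_processed"},
}


def select_application(small_load, lidar, ltem):
    """Return (name, settings) for the single active flag, or raise ValueError."""
    flags = {"small_load": small_load, "lidar": lidar, "ltem": ltem}
    active = [name for name, enabled in flags.items() if enabled]
    if len(active) != 1:
        raise ValueError(f"expected exactly one active flag, got {active}")
    return active[0], APPLICATIONS[active[0]]


name, settings = select_application(
    SMALL_LOAD_APPLICATION, LIDAR_APPLICATION, LTEM_APPLICATION
)
```

A check like this fails fast when two flags are left as True, or none are, rather than silently simulating the wrong scenario.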
- Preprocess data (only needed once): `python dataexpl.py`
- Select the application by editing the flags in simulation_func.py and AsTAR_sim_func.py, and the dataset path in the simulation runner script (see Selecting an Application above).
- Run a simulation (e.g., MAQS): `python MAQS_simulation.py`
- Generate plots from saved results: `python histogram.py` and `python voltage_graphs.py`

NOTE: these scripts use the results in the res_*/ folders, whilst running new simulations will save results in the top-level folder.
A bug was discovered in the MAQS++ implementation affecting the night-time derating of the optimum voltage. With the fix applied, the simulation results shift slightly, but MAQS++ retains its relative position among the four scheduling methods: it still delivers higher throughput than standard MAQS in every scenario, and it still trails standard MAQS on reliability in the energy-tight LTE-M and small-load profiles. The most visible effect of the fix is that the time MAQS++ spends below the minimum and shutoff voltages drops substantially across all three applications, particularly post-training.
Changes to the MAQS++ results reported in the paper are listed in the tables below, one per application (pp = percentage points):
| Period | Measure | Old | New | Δ |
|---|---|---|---|---|
| Full | Time under V_min | 8.53% | 4.06% | −4.47 pp |
| Full | Time under V_shutoff | 1.16% | 0.68% | −0.48 pp |
| Full | Time above V_max | 0.40% | 0.20% | −0.20 pp |
| Full | Time above V_rating | 0.20% | 0.09% | −0.11 pp |
| Full | Avg. num. tasks/hour | 46.70 | 46.21 | −0.49 |
| Full | Task interval std. dev. (s) | 192.22 | 195.06 | +2.84 |
| Excl. first 10 days | Time under V_min | 6.57% | 2.42% | −4.15 pp |
| Excl. first 10 days | Time under V_shutoff | 1.08% | 0.33% | −0.75 pp |
| Excl. first 10 days | Time above V_max | 0.42% | 0.20% | −0.22 pp |
| Excl. first 10 days | Time above V_rating | 0.21% | 0.09% | −0.12 pp |
| Excl. first 10 days | Avg. num. tasks/hour | 47.65 | 47.18 | −0.47 |
| Excl. first 10 days | Task interval std. dev. (s) | 192.87 | 195.77 | +2.90 |
| Excl. first 30 days | Time under V_min | 4.03% | 1.38% | −2.65 pp |
| Excl. first 30 days | Time under V_shutoff | 0.44% | 0.04% | −0.40 pp |
| Excl. first 30 days | Time above V_max | 0.45% | 0.22% | −0.23 pp |
| Excl. first 30 days | Time above V_rating | 0.23% | 0.10% | −0.13 pp |
| Excl. first 30 days | Avg. num. tasks/hour | 50.39 | 49.94 | −0.45 |
| Excl. first 30 days | Task interval std. dev. (s) | 190.02 | 190.94 | +0.92 |
| Period | Measure | Old | New | Δ |
|---|---|---|---|---|
| Full | Time under V_min | 14.36% | 10.56% | −3.80 pp |
| Full | Time under V_shutoff | 3.62% | 1.73% | −1.89 pp |
| Full | Time above V_max | 0.14% | 0.14% | 0.00 pp |
| Full | Time above V_rating | 0.01% | 0.04% | +0.03 pp |
| Full | Avg. num. tasks/hour | 26.39 | 26.31 | −0.08 |
| Full | Task interval std. dev. (s) | 249.15 | 270.39 | +21.24 |
| Excl. first 10 days | Time under V_min | 12.86% | 8.63% | −4.23 pp |
| Excl. first 10 days | Time under V_shutoff | 3.51% | 1.63% | −1.88 pp |
| Excl. first 10 days | Time above V_max | 0.15% | 0.14% | −0.01 pp |
| Excl. first 10 days | Time above V_rating | 0.01% | 0.04% | +0.03 pp |
| Excl. first 10 days | Avg. num. tasks/hour | 26.96 | 26.86 | −0.10 |
| Excl. first 10 days | Task interval std. dev. (s) | 249.63 | 273.11 | +23.48 |
| Excl. first 30 days | Time under V_min | 9.44% | 6.11% | −3.33 pp |
| Excl. first 30 days | Time under V_shutoff | 2.17% | 1.15% | −1.02 pp |
| Excl. first 30 days | Time above V_max | 0.16% | 0.16% | 0.00 pp |
| Excl. first 30 days | Time above V_rating | 0.01% | 0.05% | +0.04 pp |
| Excl. first 30 days | Avg. num. tasks/hour | 28.52 | 28.45 | −0.07 |
| Excl. first 30 days | Task interval std. dev. (s) | 247.70 | 270.57 | +22.87 |
| Period | Measure | Old | New | Δ |
|---|---|---|---|---|
| Full | Time under V_min | 10.68% | 7.76% | −2.92 pp |
| Full | Time under V_shutoff | 1.22% | 0.78% | −0.44 pp |
| Full | Time above V_max | 0.00% | 0.00% | 0.00 pp |
| Full | Time above V_rating | 0.00% | 0.00% | 0.00 pp |
| Full | Avg. num. tasks/hour | 17.91 | 17.15 | −0.76 |
| Full | Task interval std. dev. (s) | 314.09 | 310.69 | −3.40 |
| Excl. first 10 days | Time under V_min | 9.41% | 5.77% | −3.64 pp |
| Excl. first 10 days | Time under V_shutoff | 1.01% | 0.80% | −0.21 pp |
| Excl. first 10 days | Time above V_max | 0.00% | 0.00% | 0.00 pp |
| Excl. first 10 days | Time above V_rating | 0.00% | 0.00% | 0.00 pp |
| Excl. first 10 days | Avg. num. tasks/hour | 17.56 | 16.74 | −0.82 |
| Excl. first 10 days | Task interval std. dev. (s) | 321.13 | 317.75 | −3.38 |
| Excl. first 30 days | Time under V_min | 5.99% | 4.53% | −1.46 pp |
| Excl. first 30 days | Time under V_shutoff | 0.59% | 0.76% | +0.17 pp |
| Excl. first 30 days | Time above V_max | 0.00% | 0.00% | 0.00 pp |
| Excl. first 30 days | Time above V_rating | 0.00% | 0.00% | 0.00 pp |
| Excl. first 30 days | Avg. num. tasks/hour | 16.75 | 16.03 | −0.72 |
| Excl. first 30 days | Task interval std. dev. (s) | 337.21 | 324.77 | −12.44 |