Representation over Routing: Overcoming Surrogate Hacking in Multi-Timescale PPO

This repository contains the official codebase, pre-trained weights, and evaluation environments for the preprint: "Representation over Routing: Overcoming Surrogate Hacking in Multi-Timescale PPO". We provide a minimal, standalone reproducible example (MRE) using standard MLPs on LunarLander-v2 to demonstrate the pathology of surrogate hacking and our proposed solution.

🚀 TL;DR

We identify and formalize two severe optimization pathologies in multi-timescale RL: Surrogate Objective Hacking (exploiting short-term shaping rewards at the expense of the true objective) and the Paradox of Temporal Uncertainty (irreversible myopic degeneration caused by gradient-free variance routing).
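To make the second pathology concrete, here is a minimal NumPy sketch of one *assumed* form of gradient-free variance routing: per-timescale estimates are mixed with inverse-variance weights that are computed outside any computational graph. The helper names (`inverse_variance_weights`, `discounted_return_std`) and the synthetic samples are hypothetical illustrations, not code from this repository. For i.i.d. per-step rewards with variance σ², the variance of a discounted return is σ²/(1 − γ²), which grows as γ → 1. Inverse-variance routing therefore assigns the long-horizon γ = 0.999 head the smallest weight, which is one way such routing can drift toward myopic behavior.

```python
import numpy as np


def inverse_variance_weights(head_samples: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Routing weights proportional to 1 / Var for each timescale head.

    head_samples: shape (num_heads, num_samples), value or return estimates per head.
    Plain NumPy, so no gradient ever flows through these weights (gradient-free routing).
    """
    variances = head_samples.var(axis=1) + eps
    inv = 1.0 / variances
    return inv / inv.sum()


def discounted_return_std(gamma: float, reward_std: float = 1.0) -> float:
    # For i.i.d. rewards with std sigma: Var[sum_t gamma^t r_t] = sigma^2 / (1 - gamma^2).
    return reward_std / np.sqrt(1.0 - gamma ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gammas = np.array([0.9, 0.99, 0.999])
    # Hypothetical per-head samples whose spread follows the i.i.d. formula above.
    stds = np.array([discounted_return_std(g) for g in gammas])
    samples = rng.normal(0.0, stds[:, None], size=(3, 10_000))
    weights = inverse_variance_weights(samples)
    for g, w in zip(gammas, weights):
        print(f"gamma={g}: routing weight {w:.3f}")
```

In this toy setting the γ = 0.9 head takes most of the weight while the γ = 0.999 head ends up with by far the smallest share.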

To overcome these fundamental vulnerabilities, we introduce Target Decoupling, a novel architectural and algorithmic intervention that disentangles representation learning from temporal routing, allowing the agent to align with the true long-term objective (γ = 0.999) without collapsing into short-term behavioral traps.
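As a minimal sketch of the difference between routing and decoupling, the NumPy code below computes discounted returns for several discount factors from a single reward sequence. It assumes an illustrative decoupling rule that is not necessarily the exact scheme in `4_target_decoupling_final.py`: every timescale may still serve as an auxiliary prediction target for a shared representation, while the policy's learning signal is taken only from the γ = 0.999 return instead of from a learned mixture. The function names and the toy reward sequence are hypothetical.

```python
import numpy as np


def discounted_returns(rewards: np.ndarray, gamma: float) -> np.ndarray:
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns = np.zeros(len(rewards), dtype=float)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def multi_timescale_targets(rewards, gammas) -> np.ndarray:
    """Stack one return sequence per discount factor: shape (len(gammas), T)."""
    rewards = np.asarray(rewards, dtype=float)
    return np.stack([discounted_returns(rewards, g) for g in gammas])


def decoupled_policy_target(targets: np.ndarray, gammas, true_gamma: float = 0.999) -> np.ndarray:
    """Hypothetical decoupling rule: auxiliary value heads may regress onto every row
    of `targets`, but the policy signal uses only the true-objective timescale."""
    idx = list(gammas).index(true_gamma)
    return targets[idx]


if __name__ == "__main__":
    # Toy episode: small shaping rewards for 50 steps, then a large terminal bonus.
    rewards = np.array([0.1] * 50 + [100.0])
    gammas = [0.9, 0.99, 0.999]
    targets = multi_timescale_targets(rewards, gammas)
    print("G_0 per gamma:", np.round(targets[:, 0], 2))
    print("policy target G_0:", round(float(decoupled_policy_target(targets, gammas)[0]), 2))
```

With γ = 0.9 the terminal bonus 50 steps away is discounted by 0.9^50 ≈ 0.005, so the short-horizon return is dominated by the per-step shaping rewards; at γ = 0.999 the terminal bonus dominates.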

🎥 Visual Proof: The Ablation Journey

The core contribution of this work is isolating and systematically resolving the pathologies of multi-timescale learning. The contrast between Stage 1 and Stage 4 is especially clear: the baseline, paralyzed by the fear of crashing, greedily hoards small centering rewards, whereas our decoupled agent acts with genuine foresight.

| Stage | Observed behavior | Description | Recording |
|---|---|---|---|
| Stage 1: Baseline | Hovering & Wasting Fuel | The agent falls into a local optimum. Out of fear of crashing, it hovers endlessly, wasting fuel and failing the main objective just to keep collecting small, short-term shaping rewards for staying centered. | `docs/baseline_hovering.gif` |
| Stage 2: Surrogate Hacking | Surrogate Objective Hacking | Routing values dynamically across timescales via Actor-driven attention invites gradient exploitation. The policy collapses because it lowers the surrogate loss by manipulating the attention weights rather than by improving physical control. | `docs/surrogate_hacking_crash.gif` |
| Stage 3: Temporal Paradox | Aimless Wandering | The agent suffers from temporal uncertainty. Unable to assign credit confidently over long horizons, it never commits to a landing strategy and wanders aimlessly above the landing pad. | `docs/temporal_paradox_wandering.gif` |
| Stage 4: Target Decoupling | Intelligent Landing | By decoupling the target, the agent optimizes the true long-term objective (γ = 0.999) and executes a fuel-efficient, safe landing, accepting a slightly off-center touchdown when that saves fuel. | `docs/target_decoupling_landing.gif` |
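The Stage 2 failure mode can be reproduced in miniature. The sketch below assumes, purely for illustration and not as the repository's exact Stage 2 code, that the actor emits attention logits over per-timescale advantage estimates and that the PPO surrogate uses the attention-weighted advantage. With the policy ratio held fixed at 1 (the agent's behavior never changes), plain gradient descent on the logits alone still drives the loss down by concentrating weight on whichever timescale reports the largest advantage. All names and numbers are hypothetical.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def routed_surrogate_loss(logits, advantages, ratio: float = 1.0) -> float:
    """Negative PPO-style surrogate using an attention-weighted advantage.

    With ratio fixed at 1 (unchanged policy), the loss depends only on the routing.
    """
    w = softmax(logits)
    return -ratio * float(w @ advantages)


def loss_grad_wrt_logits(logits, advantages, ratio: float = 1.0) -> np.ndarray:
    # d/dz of -ratio * sum_i softmax(z)_i * A_i  =  -ratio * w * (A - w @ A)
    w = softmax(logits)
    return -ratio * w * (advantages - w @ advantages)


def hack_routing(advantages: np.ndarray, steps: int = 200, lr: float = 1.0):
    """Gradient descent on the routing logits only; the policy itself never changes."""
    logits = np.zeros_like(advantages, dtype=float)
    history = [routed_surrogate_loss(logits, advantages)]
    for _ in range(steps):
        logits -= lr * loss_grad_wrt_logits(logits, advantages)
        history.append(routed_surrogate_loss(logits, advantages))
    return softmax(logits), history


if __name__ == "__main__":
    # Hypothetical per-timescale advantages for one batch (gamma = 0.9, 0.99, 0.999).
    advantages = np.array([0.8, 0.1, -0.3])
    weights, history = hack_routing(advantages)
    print("final routing weights:", np.round(weights, 3))
    print("loss: %.3f -> %.3f" % (history[0], history[-1]))
```

The loss improves from the uniform-routing value toward the largest single advantage even though no action probabilities changed, which is the signature of optimizing the surrogate instead of the control problem.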

📂 Repository Structure

The repository mirrors our 4-stage ablation study. Each stage is a standalone script that uses only standard MLPs, for clarity and ease of reproduction.

```
.
├── 1_baseline.py                      # Stage 1: Standard PPO Baseline
├── 2_surrogate_hacking_attention.py   # Stage 2: Introduction of multi-timescale collapse
├── 3_temporal_paradox_variance.py     # Stage 3: Attempted variance reduction
├── 4_target_decoupling_final.py       # Stage 4: Proposed Target Decoupling architecture
├── 5_evaluate_seeds_plot.py           # Multi-seed evaluation and plotting script
├── README.md                          # This document
├── record_1_baseline.py               # Evaluation script for Stage 1
├── record_2_surrogate.py              # Evaluation script for Stage 2
├── record_3_paradox.py                # Evaluation script for Stage 3
├── record_4_decoupling.py             # Evaluation script for Stage 4
├── requirements.txt                   # Python dependencies
├── weights_stage_1.pth                # Pre-trained weights for Baseline
├── weights_stage_2.pth                # Pre-trained weights for Surrogate Hacking
├── weights_stage_3.pth                # Pre-trained weights for Temporal Paradox
├── weights_stage_4.pth                # Pre-trained weights for Target Decoupling
└── docs/                              # Assets (GIFs, etc.)
    ├── baseline_hovering.gif
    ├── seed_comparison_plot.png
    ├── surrogate_hacking_crash.gif
    ├── temporal_paradox_wandering.gif
    └── target_decoupling_landing.gif
```

🛠️ Quick Start

Evaluating the pre-trained models is designed to be frictionless.

Install Dependencies
```
pip install -r requirements.txt
```
Evaluate the Proposed Solution (Stage 4)

See the Target Decoupling agent elegantly solve the environment:
```
python record_4_decoupling.py
```
Observe the Baseline Pathology (Stage 1)

For contrast, watch the baseline agent frantically hover and waste fuel:
```
python record_1_baseline.py
```
Multi-Seed Evaluation

Run the full comparison across 5 random seeds to reproduce the statistical significance plots:
```
python 5_evaluate_seeds_plot.py
```

Note: Each standalone training script (1_baseline.py through 4_target_decoupling_final.py) trains its stage from scratch.
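To retrain every stage in sequence, a small driver script can help. This is a hypothetical convenience helper, not part of the repository; it assumes each training script runs without command-line arguments (as the Quick Start commands suggest) and that it is launched from the repository root.

```python
import subprocess
import sys
from pathlib import Path

# Training scripts from the repository listing, in ablation order.
TRAINING_SCRIPTS = [
    "1_baseline.py",
    "2_surrogate_hacking_attention.py",
    "3_temporal_paradox_variance.py",
    "4_target_decoupling_final.py",
]


def build_commands() -> list[list[str]]:
    """One command per stage, run with the current Python interpreter."""
    return [[sys.executable, name] for name in TRAINING_SCRIPTS]


def train_all(repo_root: Path = Path("."), dry_run: bool = True) -> None:
    """Train each stage in order; with dry_run=True, only print the commands."""
    for cmd in build_commands():
        print("Running:" if not dry_run else "Would run:", " ".join(cmd))
        if not dry_run:
            # cwd=repo_root so any relative paths the scripts use (assumed) resolve there;
            # check=True stops the sequence if a stage exits with an error.
            subprocess.run(cmd, check=True, cwd=repo_root)


if __name__ == "__main__":
    # Safe default: print the plan. Pass --run to actually start training.
    train_all(dry_run="--run" not in sys.argv)
```

Saved as, for example, `train_all.py` (a hypothetical filename) in the repository root, `python train_all.py --run` trains Stages 1 to 4 in order; without `--run` it only prints the commands.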

📊 Statistical Significance

To validate our claims, we evaluate the Target Decoupling architecture against the Baseline over multiple random seeds (n = 5); the resulting comparison plot is `docs/seed_comparison_plot.png`. The Target Decoupling agent consistently solves the environment with low variance across seeds, avoiding the failure modes described above and escaping the hovering local optimum.
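For readers who want to quantify such a comparison themselves, the sketch below summarizes per-seed final returns (mean and sample standard deviation) and computes Welch's t statistic with the Welch-Satterthwaite degrees of freedom. It is a standalone illustration, not the logic of `5_evaluate_seeds_plot.py`; the function name `welch_summary` and the input format (one final evaluation return per seed, per agent) are assumptions. With only five seeds per agent, the statistic is best read alongside the raw per-seed values.

```python
import math
from statistics import mean, stdev


def welch_summary(a, b) -> dict:
    """Compare two lists of per-seed returns (e.g. Target Decoupling vs Baseline).

    Returns per-group mean and sample std, Welch's t statistic, and the
    Welch-Satterthwaite degrees of freedom. Needs at least 2 seeds per group.
    """
    ma, mb = mean(a), mean(b)
    sa, sb = stdev(a), stdev(b)  # sample standard deviation (n - 1 denominator)
    va, vb = sa ** 2 / len(a), sb ** 2 / len(b)  # squared standard errors
    if va + vb == 0:
        raise ValueError("both groups have zero variance; t is undefined")
    t = (ma - mb) / math.sqrt(va + vb)
    dof = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return {"mean_a": ma, "std_a": sa, "mean_b": mb, "std_b": sb, "t": t, "dof": dof}
```

If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same Welch statistic and also returns a p-value.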

📖 Citation

If you find this code or our insights useful in your research, please consider citing our work:

```
@misc{sunRepresentationRoutingOvercoming2026b,
  title = {Representation over {{Routing}}: {{Overcoming Surrogate Hacking}} in {{Multi-Timescale PPO}}},
  shorttitle = {Representation over {{Routing}}},
  author = {Sun, Jing},
  year = 2026,
  publisher = {arXiv},
  doi = {10.48550/ARXIV.2604.13517},
  urldate = {2026-04-16},
  copyright = {Creative Commons Attribution 4.0 International},
  keywords = {Artificial Intelligence (cs.AI),FOS: Computer and information sciences,Machine Learning (cs.LG)}
}
```
