Validate the PPO route for bounded continuous control in DroneTarget-v0.
Acceptance criteria:
- spike identifies whether PPO should be implemented in TS first, in Rust with Burn, or as a hybrid;
- rollout batching, advantage estimation, policy/value update and checkpoint shape are documented;
- minimal experiment produces useful metrics even if learning quality is rough;
- decision notes list blockers before full implementation.
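The advantage estimation and policy update named in the criteria could be prototyped on the TS route roughly as below. This is a minimal sketch, not the chosen design. It assumes one flat array per rollout field, Generalized Advantage Estimation (GAE) for advantages, and the standard PPO clipped surrogate for the policy loss. The names `Rollout`, `computeGae` and `clippedSurrogateLoss`, and the defaults `gamma = 0.99`, `lambda = 0.95` and `clipEps = 0.2`, are hypothetical and do not come from DroneTarget-v0. It also treats every `done` as a true termination; time-limit truncation would need a separate bootstrap.

```typescript
// Hypothetical rollout layout: one flat array per field, length T (time steps).
interface Rollout {
  rewards: number[];
  values: number[]; // critic estimates V(s_t)
  dones: boolean[]; // true if the episode terminated at step t (assumed: no truncation)
  lastValue: number; // V(s_T), bootstrap value for the step after the batch
}

// Generalized Advantage Estimation, accumulated backwards over the batch.
function computeGae(
  r: Rollout,
  gamma = 0.99,
  lambda = 0.95,
): { advantages: number[]; returns: number[] } {
  const T = r.rewards.length;
  const advantages = new Array<number>(T).fill(0);
  let gae = 0;
  for (let t = T - 1; t >= 0; t--) {
    const nextValue = t === T - 1 ? r.lastValue : r.values[t + 1];
    const notDone = r.dones[t] ? 0 : 1; // cut bootstrap and accumulation at terminals
    const delta = r.rewards[t] + gamma * nextValue * notDone - r.values[t];
    gae = delta + gamma * lambda * notDone * gae;
    advantages[t] = gae;
  }
  // Value targets for the critic update: A_t + V(s_t).
  const returns = advantages.map((a, t) => a + r.values[t]);
  return { advantages, returns };
}

// PPO clipped surrogate objective, negated so it can be minimized, averaged over the batch.
function clippedSurrogateLoss(
  newLogProbs: number[],
  oldLogProbs: number[],
  advantages: number[],
  clipEps = 0.2,
): number {
  let sum = 0;
  for (let i = 0; i < advantages.length; i++) {
    const ratio = Math.exp(newLogProbs[i] - oldLogProbs[i]);
    const clipped = Math.min(Math.max(ratio, 1 - clipEps), 1 + clipEps);
    sum += Math.min(ratio * advantages[i], clipped * advantages[i]);
  }
  return -sum / advantages.length;
}
```

Keeping these as plain functions over flat arrays would make it easy to compare a TS prototype against a Rust/Burn port on identical rollout data, which may help the TS vs Burn vs hybrid decision.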