ARCbound Intelligence — Proximal Policy Optimization
A PyTorch reinforcement-learning pipeline for training real-time multiplayer game AI. Built for and proven in the Arcbound arena shooter.
Standard PPO struggles in real-time multiplayer settings where:
- Scalar rewards flatten distinct skills (survival vs combat vs strategy) into one noisy signal
- Shared optimizer state lets the critic's massive value loss drown out the actor's tiny policy gradient
- Credit assignment over ~10,000-tick trajectories is too diffuse to yield a usable learning signal
ABI-PPO fixes all three:
- Decoupled optimizers — actor and critic each get an independent Adam optimizer with its own learning rate. No momentum corruption between the two.
- Reward decomposition — rewards split into three orthogonal channels (survival / combat / strategy). Each channel's advantages are normalized independently (see the sketch after this list).
- Staged curriculum — three training phases (Movement → Combat → Strategy), each mastering one skill before unlocking the next. Collapses credit assignment from ~10,000 ticks to ~200 per phase.
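A minimal sketch of the first two fixes, using illustrative module shapes, names, and learning rates rather than the abi_ppo.py internals (how the normalized channels are combined afterward is also an assumption):

```python
import torch
import torch.nn as nn

# Stand-in networks; the real architecture is described further down.
actor = nn.Sequential(nn.Linear(270, 256), nn.ReLU(), nn.Linear(256, 9))
critic = nn.Sequential(nn.Linear(270, 256), nn.ReLU(), nn.Linear(256, 1))

# Decoupled optimizers: each network keeps its own Adam moment estimates
# and learning rate, so the critic's large value-loss gradients never
# corrupt the actor's momentum.
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)    # lr values are
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)  # illustrative

def normalize_per_channel(advantages: torch.Tensor) -> torch.Tensor:
    """Normalize each reward channel's advantages independently.

    advantages: (T, 3) tensor, one column per channel
    (survival / combat / strategy). Per-column normalization keeps a
    high-variance channel from drowning out the others when the columns
    are later combined into a single PPO advantage.
    """
    mean = advantages.mean(dim=0, keepdim=True)
    std = advantages.std(dim=0, keepdim=True)
    return (advantages - mean) / (std + 1e-8)

# Example: 512 timesteps, 3 channels -> per-channel z-scores, then combine
# (a plain sum here; the real combination rule is an assumption).
adv = normalize_per_channel(torch.randn(512, 3)).sum(dim=1)
```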
| | Legacy rl_train.py | ABI-PPO |
|---|---|---|
| Epochs to convergence | 800+ (did not converge) | 15 |
| Value-function explained variance | -3.3 (useless) | 0.49 (predictive) |
| Policy gradient strength | baseline | 3–4× stronger |
| One-phase value loss | stuck at ~945 | 420 → 0.9 |
```bash
# Train all three phases from scratch
python abi_ppo.py --epochs 150 --phase all

# Resume training from a checkpoint
python abi_ppo.py --resume models/checkpoint_v6.pt --epochs 100

# Resume and train only the combat phase
python abi_ppo.py --resume models/checkpoint_v6.pt --phase combat --epochs 50

# Show model / observation info
python abi_ppo.py --info
```

Observation: 270-dim float vector (RLState.ts v6 layout)
```
[0-7]     Self: x, y, vx, vy, hp, energy, rotation, alive
[8-10]    Ammo: missile, bouncy, grenade
[11-16]   Weapon economy (v6): 4 affordability flags + laserBudget + energyRegenMult
[17-34]   Flags (18): carrying, nearest-flag, pole/carrier/escort dirs, role one-hot
[35-76]   Enemies: 7 × (dx, dy, vx, vy, hp, alive)
[77]      Nearest-enemy angle error
[78-127]  Projectiles: 10 × (dx, dy, vx, vy, threat)
[128-151] Teammates: 4 × (dx, dy, vx, vy, hp, carrying)
[152-155] Game state: scoreDiff, nearestAllyDist, timeAlive, roundProgress
[156-269] Tile awareness: viewport grid + raycasts (114)
```
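For inspecting observations offline, named slices matching this layout might look like the following (the slice names are illustrative, not abi_ppo.py identifiers):

```python
import numpy as np

# Named slices over the v6 observation vector (names are illustrative).
SELF        = slice(0, 8)      # x, y, vx, vy, hp, energy, rotation, alive
AMMO        = slice(8, 11)     # missile, bouncy, grenade
ECONOMY     = slice(11, 17)    # 4 affordability flags + laserBudget + energyRegenMult
FLAGS       = slice(17, 35)    # flag/carrier/escort features + role one-hot
ENEMIES     = slice(35, 77)    # 7 enemies x (dx, dy, vx, vy, hp, alive)
AIM_ERROR   = 77               # nearest-enemy angle error
PROJECTILES = slice(78, 128)   # 10 projectiles x (dx, dy, vx, vy, threat)
TEAMMATES   = slice(128, 152)  # 4 teammates x (dx, dy, vx, vy, hp, carrying)
GAME_STATE  = slice(152, 156)  # scoreDiff, nearestAllyDist, timeAlive, roundProgress
TILES       = slice(156, 270)  # viewport grid + raycasts (114 values)

obs = np.zeros(270, dtype=np.float32)          # one observation
enemies = obs[ENEMIES].reshape(7, 6)           # one row per enemy slot
projectiles = obs[PROJECTILES].reshape(10, 5)  # one row per projectile slot
```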
Actor: 270 → 256 → 256 (LayerNorm + ReLU) → { move(9), fire(2), aim(128) }
Critic: 270 → 256 → 256 (LayerNorm + ReLU) → value scalar
Separate backbones, separate optimizers, separate learning rates.
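In code, those shapes look roughly like this (head sizes come from the spec above; the meaning of the 9 move actions is an assumption):

```python
import torch
import torch.nn as nn

def backbone() -> nn.Sequential:
    # 270 -> 256 -> 256, LayerNorm + ReLU after each linear layer.
    return nn.Sequential(
        nn.Linear(270, 256), nn.LayerNorm(256), nn.ReLU(),
        nn.Linear(256, 256), nn.LayerNorm(256), nn.ReLU(),
    )

class Actor(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = backbone()
        self.move = nn.Linear(256, 9)    # 9 move actions (e.g. 8 dirs + idle; assumption)
        self.fire = nn.Linear(256, 2)    # fire / hold
        self.aim = nn.Linear(256, 128)   # 128 aim-offset bins

    def forward(self, obs: torch.Tensor):
        h = self.body(obs)
        return self.move(h), self.fire(h), self.aim(h)

class Critic(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = backbone()
        self.value = nn.Linear(256, 1)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.value(self.body(obs)).squeeze(-1)

# Separate backbones; each network gets its own optimizer, per the design above.
actor, critic = Actor(), Critic()
```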
v6 aim semantics: the 128-bin aim head outputs a target-relative offset (±90° around a reference angle, 1.4°/bin). Reference = nearest-visible-enemy direction if one exists, else self.rotation. The policy learns corrections to rule-based lead aim rather than absolute aim direction — rlInfluence (0–0.45) becomes a meaningful precision dial.
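A sketch of one plausible decoding of the aim head, assuming bin centers evenly spaced across the ±90° span (decode_aim and its exact rounding are illustrative, not the shipped decoder):

```python
import math
from typing import Optional

NUM_BINS = 128
SPAN = math.pi  # 180 degrees total: ±90° around the reference, ~1.4°/bin

def decode_aim(bin_idx: int, nearest_enemy_angle: Optional[float],
               self_rotation: float) -> float:
    """Map an aim-head bin to an absolute angle in radians.

    Reference angle = nearest visible enemy direction if one exists,
    else the agent's own rotation. The bin selects an offset in
    [-90°, +90°] around that reference (bin centers; an assumption).
    """
    reference = self_rotation if nearest_enemy_angle is None else nearest_enemy_angle
    offset = (bin_idx + 0.5) / NUM_BINS * SPAN - SPAN / 2.0
    return reference + offset

# Example: bin 64 is a ~0.7° offset from the reference direction.
angle = decode_aim(64, None, 0.0)
```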
- abi_ppo.py — the ABI-PPO training system (1511 LOC, v6)
- rl_train.py — legacy PPO trainer, kept for comparison
- Python 3.9+
- NumPy
- PyTorch (CUDA wheel recommended — training runs on GPU)
See requirements.txt. Pick the PyTorch build that matches your CUDA toolchain from https://pytorch.org/get-started/locally/.
Extracted from the Arcbound game where it trains the in-game AI. v6 policy deployed 2026-04-19; 103 epochs in ~23 min produced a converged three-phase policy on a single GPU.
MIT