Fix R-GSP-N: LSTM hidden state + vectorized learning + stored state (R2D2)#20

Merged
jdbloom merged 4 commits into main from feat/rddpg-lstm-fix
Apr 9, 2026
Conversation

@jdbloom
Contributor

@jdbloom jdbloom commented Apr 8, 2026

Summary

Three-part fix for the R-GSP-N recurrent pipeline:

1. LSTM Hidden State Fix

EnvironmentEncoder.forward() now returns (output, (h_n, c_n)) and accepts an optional hidden=(h_0, c_0) argument. Previously the hidden state was discarded on every call (lstm_out, _ = ...), so R-GSP-N had no temporal memory beyond the sliding observation window. Both unbatched (seq_len, input) and batched (batch, seq_len, input) inputs are supported.
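A minimal sketch of the new forward signature — the real EnvironmentEncoder has more layers, and the sizes and names below are illustrative assumptions:

```python
import torch
import torch.nn as nn

class EnvironmentEncoder(nn.Module):
    # Hypothetical dimensions; the real encoder's sizes differ.
    def __init__(self, input_size=8, hidden_size=16, num_layers=1):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)

    def forward(self, x, hidden=None):
        # Accept both unbatched (seq_len, input) and batched (batch, seq_len, input).
        unbatched = x.dim() == 2
        if unbatched:
            x = x.unsqueeze(0)
        out, (h_n, c_n) = self.lstm(x, hidden)  # hidden state no longer discarded
        if unbatched:
            out = out.squeeze(0)
        return out, (h_n, c_n)

enc = EnvironmentEncoder()
x = torch.randn(4, 10, 8)
out, hidden = enc(x)            # fresh hidden state
out2, hidden2 = enc(x, hidden)  # state carried across calls
```

Threading the returned `(h_n, c_n)` back into the next call is what restores memory beyond the observation window.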

2. SequenceReplayBuffer Stores Hidden State (R2D2-style)

New hidden_size and num_layers params. When enabled, stores (h, c) at sequence boundaries via set_sequence_hidden(). sample_buffer() returns 7 values (s, a, r, s_, d, h, c) instead of 5. Backward compatible — buffers without hidden_size work as before.
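The storage scheme can be sketched as below; this is an assumed simplification (names like `store_sequence` and the array layout are illustrative), not the buffer's actual implementation:

```python
import numpy as np

class SequenceReplayBuffer:
    def __init__(self, capacity, seq_len, obs_dim, hidden_size=None, num_layers=1):
        self.capacity, self.size, self.ptr = capacity, 0, 0
        self.s  = np.zeros((capacity, seq_len, obs_dim), dtype=np.float32)
        self.a  = np.zeros((capacity, seq_len), dtype=np.float32)
        self.r  = np.zeros((capacity, seq_len), dtype=np.float32)
        self.s_ = np.zeros((capacity, seq_len, obs_dim), dtype=np.float32)
        self.d  = np.zeros((capacity, seq_len), dtype=bool)
        self.store_hidden = hidden_size is not None
        if self.store_hidden:
            # LSTM (h, c) captured at the start of each stored sequence
            self.h = np.zeros((capacity, num_layers, hidden_size), dtype=np.float32)
            self.c = np.zeros((capacity, num_layers, hidden_size), dtype=np.float32)

    def store_sequence(self, s, a, r, s_, d):
        i = self.ptr
        self.s[i], self.a[i], self.r[i], self.s_[i], self.d[i] = s, a, r, s_, d
        self.ptr = (self.ptr + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)
        return i

    def set_sequence_hidden(self, idx, h, c):
        self.h[idx], self.c[idx] = h, c

    def sample_buffer(self, batch_size):
        idx = np.random.choice(self.size, batch_size)
        batch = (self.s[idx], self.a[idx], self.r[idx], self.s_[idx], self.d[idx])
        if self.store_hidden:
            return batch + (self.h[idx], self.c[idx])  # 7 values
        return batch                                   # 5 values, as before
```

Omitting `hidden_size` keeps the old 5-tuple path, which is what preserves backward compatibility for existing callers.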

3. Vectorized learn_RDDPG with Burn-in

Replaces per-sample loop (1,500ms) with single batched forward/backward pass (<200ms).

  • Burn-in: first half of sequence rebuilds hidden state with T.no_grad()
  • Training: second half computes losses on last timestep
  • Target computation under T.no_grad() for gradient isolation
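The burn-in/training split above can be sketched as follows — an illustrative fragment (the function name and shapes are assumptions; the real learn_RDDPG also runs target networks and computes the critic/actor losses):

```python
import torch as T

def burn_in_forward(encoder, states, h0, c0):
    """states: (batch, seq_len, obs); h0/c0: (layers, batch, hidden)."""
    half = states.size(1) // 2
    with T.no_grad():                           # burn-in: rebuild hidden state only
        _, hidden = encoder(states[:, :half], (h0, c0))
    out, _ = encoder(states[:, half:], hidden)  # gradients flow through this half
    return out[:, -1]                           # losses use the last timestep

encoder = T.nn.LSTM(8, 16, batch_first=True)
states = T.randn(32, 10, 8)                    # whole batch in one forward pass
h0 = T.zeros(1, 32, 16)
c0 = T.zeros(1, 32, 16)
features = burn_in_forward(encoder, states, h0, c0)
```

Processing the whole batch in one forward pass, instead of looping per sample, is where the speedup comes from.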

Reference: R2D2 (Kapturowski et al., ICLR 2019) — stored state + burn-in.

Performance

  • Before: learn_RDDPG: 1,500ms per call (200x slower than DDPG)
  • After: <200ms per call (~10x improvement, on par with standard learning)

Tests

  • 7 new LSTM hidden state tests (carry, batch, backward)
  • 7 new buffer hidden state tests (store, sample, backward compat)
  • 4 new vectorized RDDPG tests (correctness, speed, multi-step)
  • 223/223 total tests pass

Breaking Changes

  • EnvironmentEncoder.forward() returns tuple (output, hidden) instead of single tensor
  • RDDPGActorNetwork.forward() returns (mu, hidden) instead of mu
  • RDDPGCriticNetwork.forward() returns (q_value, hidden) instead of q_value
  • All callers updated. Downstream RL-CT needs separate PR.
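For downstream callers, the migration amounts to unpacking a tuple. A hypothetical stand-in for RDDPGActorNetwork (real layer sizes and names differ):

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    # Illustrative stand-in; the real RDDPGActorNetwork is larger.
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(8, 16, batch_first=True)
        self.mu = nn.Linear(16, 2)

    def forward(self, obs, hidden=None):
        out, hidden = self.lstm(obs, hidden)
        return torch.tanh(self.mu(out[:, -1])), hidden  # tuple, not bare mu

actor = Actor()
obs = torch.randn(1, 5, 8)
mu, hidden = actor(obs)           # before this PR: mu = actor(obs)
mu2, hidden = actor(obs, hidden)  # hidden can now be threaded between calls
```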

🤖 Generated with Claude Code

Joshua Bloom and others added 4 commits April 8, 2026 15:19
… RDDPG networks

EnvironmentEncoder.forward now accepts optional (h_0, c_0) and returns
(output, (h_n, c_n)) instead of discarding hidden state. This enables
temporal memory across timesteps for RDDPG. Both RDDPGActorNetwork and
RDDPGCriticNetwork forward methods updated to pass through hidden state.
All callers in learn_RDDPG, DDPG_choose_action, and test files updated
to unpack the new tuple return value.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…yle)

Add optional hidden_size/num_layers params to SequenceReplayBuffer so the
LSTM (h, c) state at the start of each sequence is stored alongside the
SARSD data. sample_buffer returns a 7-tuple when hidden storage is enabled
and a 5-tuple otherwise, preserving full backward compatibility.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace per-sample loop in learn_RDDPG with batched implementation using
R2D2-style burn-in. Splits sequences into burn-in prefix (first half) and
training suffix (second half), refreshes LSTM hidden state during burn-in
with no_grad, then computes critic/actor loss on the last timestep of the
training suffix. Also updates sample_memory to handle 7-value returns from
SequenceReplayBuffer (with hidden states) and passes hidden_size/num_layers
to SequenceReplayBuffer in build_gsp_network.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jdbloom jdbloom merged commit 3b63d87 into main Apr 9, 2026
4 checks passed