This repository is the artifact of Janus, a buffer management system for distributed reinforcement learning.
Janus introduces Dual-End Staging to decouple buffer management from the actor/learner control flow:
Actor Buffer → Collection → Experience Buffer → Selection → Learner Buffer
Key features:
- Placement optimization: CPU pageable, CPU pinned, or GPU placement
- Overlap scheduling: Collection overlaps with rollout, selection overlaps with update (see the sketch after this list)
- Granularity control: Batch multiple operations to amortize overhead
- On-policy/Off-policy support: Unified interface for both training paradigms
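To make the pipeline and the overlap idea concrete, here is a minimal sketch, not the repository's implementation: names such as `collect_worker` and the dict-based rollout are illustrative. Collection runs in a background thread and drains a bounded Actor Buffer while the actor keeps producing rollouts.

```python
import queue
import threading

rollout_q = queue.Queue(maxsize=8)      # Actor Buffer: bounded queue gives backpressure

def actor_loop(num_rollouts):
    for i in range(num_rollouts):
        rollout = {"step": i}           # stand-in for a real rollout
        rollout_q.put(rollout)          # blocks if collection falls too far behind
    rollout_q.put(None)                 # sentinel: no more rollouts

def collect_worker(experience):
    # Collection overlaps with rollout: drains the queue while the actor runs.
    while (rollout := rollout_q.get()) is not None:
        experience.append(rollout)

experience = []
t = threading.Thread(target=collect_worker, args=(experience,))
t.start()
actor_loop(16)                          # rollout and collection proceed concurrently
t.join()
print(len(experience), "rollouts collected")
```

The bounded queue is what keeps the two sides decoupled without letting a fast actor run unboundedly ahead of collection; the same pattern applies on the consumer side between selection and the learner update.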
module load triton/2024.1-gcc gcc/12.3.0 cuda/12.2.1
module load scicomp-python-env/2025.2
sbatch run_janus_quick.sh

Output: All results are logged to `slurm-<jobid>.out`.
# Compare: run the 4 predefined configurations (baseline, overlap, granularity, full Janus)
python src/main.py compare
# Train: Single run with custom parameters
python src/main.py train --config configs/default.yaml
python src/main.py train --placement cuda --overlap_collect 1 --overlap_select 1 --g_collect 4
# Eval: Parameter sweep to analyze specific parameter effects
python src/main.py eval --sweep placement # Test different placement strategies
python src/main.py eval --sweep granularity # Test different g values (1,2,4,8,16)
python src/main.py eval --sweep overlap       # Test overlap effect

| Parameter | Values | Description |
|---|---|---|
| actor_device | cpu, cuda | Device for actor rollout |
| placement | cpu_pageable, cpu_pinned, cuda | Experience buffer placement |
| overlap_collect | 0, 1 | Overlap collection with rollout |
| overlap_select | 0, 1 | Overlap selection with update |
| g_collect | ≥1 (off-policy), k/M (on-policy) | Collection granularity |
| g_select | ≥1 (off-policy), k/M (on-policy) | Selection granularity |
| is_on_policy | true, false | Training algorithm type |
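These parameters map naturally onto the configuration dataclasses in src/janus/config.py. Below is a minimal sketch of what such dataclasses might look like; the field names follow the table above, but the actual definitions in the repository may differ.

```python
from dataclasses import dataclass

@dataclass
class JanusConfig:
    # Where the experience buffer lives: "cpu_pageable", "cpu_pinned", or "cuda"
    placement: str = "cpu_pinned"
    # 0/1 flags: overlap collection with rollout, selection with update
    overlap_collect: int = 1
    overlap_select: int = 1
    # Granularity: >= 1 rollouts/batches per invocation (off-policy),
    # or a fraction k/M of a rollout (on-policy)
    g_collect: float = 1.0
    g_select: float = 1.0

@dataclass
class WorkloadConfig:
    actor_device: str = "cpu"     # device for actor rollout
    is_on_policy: bool = False    # training algorithm type
    batch_size: int = 256
    buffer_capacity: int = 100000
```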
janus/
├── run_janus.sh # SLURM script: Full experiments
├── run_janus_quick.sh # SLURM script: Quick test
├── configs/
│ └── default.yaml # Default configuration
├── src/
│ ├── main.py # Entry point
│ ├── train.py # Training loop
│ ├── eval.py # Evaluation benchmark
│ ├── janus/
│ │ ├── buffer.py # JanusBuffer (orchestrator)
│ │ ├── staging.py # ActorBuffer + LearnerBuffer
│ │ ├── experience_buffer.py # ExperienceBuffer
│ │ ├── operators.py # CollectionOp + SelectionOp
│ │ ├── transfer.py # Device transfer utilities
│ │ └── config.py # Configuration dataclasses
│ └── workload/
│ ├── actor.py # Actor simulation
│ └── learner.py # Learner simulation
ActorBuffer: Producer-side staging that decouples actor rollout from collection.
- Bounded queue with backpressure
- `g_collect` controls how many rollouts are pulled per collection invocation
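A minimal sketch of the producer-side staging idea; it is illustrative only, and the real ActorBuffer in src/janus/staging.py may differ. A bounded queue provides backpressure, and `get_many` pulls up to `g_collect` rollouts per invocation.

```python
import queue

class ActorBufferSketch:
    """Bounded producer-side staging queue with backpressure."""

    def __init__(self, maxsize=16):
        self._q = queue.Queue(maxsize=maxsize)

    def put(self, rollout):
        # Blocks when full, so a fast actor cannot run unboundedly ahead of collection.
        self._q.put(rollout)

    def get_many(self, g_collect):
        # Pull up to g_collect rollouts per collection invocation.
        items = [self._q.get()]                 # block until at least one rollout is ready
        while len(items) < g_collect:
            try:
                items.append(self._q.get_nowait())
            except queue.Empty:
                break
        return items
```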
ExperienceBuffer: Central storage for RL experiences.
- Supports CPU pageable, CPU pinned, or GPU placement
- Circular buffer with efficient insert/sample operations
- Thread-safe for concurrent access
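A minimal circular-buffer sketch showing how placement might be handled. It assumes PyTorch; the flat (capacity, feature_dim) storage layout and the class name are illustrative, not the repository's actual format.

```python
import threading
import torch

class ExperienceBufferSketch:
    """Circular buffer whose storage lives on the configured device."""

    def __init__(self, capacity, feature_dim, placement="cpu_pinned"):
        use_cuda = placement == "cuda" and torch.cuda.is_available()
        pin = placement == "cpu_pinned" and torch.cuda.is_available()
        self.storage = torch.empty(capacity, feature_dim,
                                   device="cuda" if use_cuda else "cpu",
                                   pin_memory=pin)
        self.capacity, self.size, self.pos = capacity, 0, 0
        self._lock = threading.Lock()            # thread-safe for concurrent access

    def insert(self, batch):
        with self._lock:
            n = batch.shape[0]
            idx = (self.pos + torch.arange(n, device=self.storage.device)) % self.capacity
            self.storage[idx] = batch.to(self.storage.device)
            self.pos = (self.pos + n) % self.capacity
            self.size = min(self.size + n, self.capacity)

    def sample(self, batch_size):
        with self._lock:
            idx = torch.randint(0, self.size, (batch_size,), device=self.storage.device)
            return self.storage[idx]
```

Pinned (page-locked) CPU memory matters here because it allows asynchronous host-to-device copies when batches are later moved to the learner's GPU.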
LearnerBuffer: Consumer-side staging that decouples selection from learner update.
- Prefetches prepared batches
- `g_select` controls how many batches are prepared per selection invocation
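The consumer-side counterpart can be sketched the same way (illustrative; the class and method names are assumptions): a small queue of prepared batches that the selection thread keeps filled ahead of the learner.

```python
import queue

class LearnerBufferSketch:
    """Consumer-side staging: holds batches prepared ahead of the learner."""

    def __init__(self, depth=4):
        self._q = queue.Queue(maxsize=depth)

    def put_many(self, batches):
        # The selection thread pushes g_select prepared batches per invocation.
        for b in batches:
            self._q.put(b)

    def next_batch(self):
        # The learner pops a ready batch; blocks only if prefetch fell behind.
        return self._q.get()
```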
CollectionOp: Pulls from the Actor Buffer, processes data, and writes to the Experience Buffer.
- Concatenates multiple rollouts (granularity)
- Handles device transfers based on placement
- Actual tensor operations (cat, reshape, preprocess)
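A hedged sketch of how one collection step might compose the pieces above; it assumes PyTorch and reuses the illustrative sketch classes from the previous sections, so it is not the repository's CollectionOp.

```python
import torch

def collection_step(actor_buffer, experience_buffer, g_collect=4):
    """Pull up to g_collect rollouts, concatenate, move to the placement device, insert."""
    rollouts = actor_buffer.get_many(g_collect)   # assumed: list of (steps, feature_dim) tensors
    batch = torch.cat(rollouts, dim=0)            # batching amortizes per-rollout overhead
    # Pinned CPU sources allow asynchronous host-to-device copies.
    batch = batch.to(experience_buffer.storage.device, non_blocking=True)
    experience_buffer.insert(batch)
```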
SelectionOp: Samples from the Experience Buffer, prepares batches, and writes to the Learner Buffer.
- Random sampling (off-policy) or sequential iteration (on-policy)
- Shuffle and batch assembly
- Transfers to learner device (CUDA)
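And the selection side, again as a sketch under the same assumptions: off-policy draws random samples, on-policy walks minibatches in order, and each prepared batch is moved to the learner's device.

```python
import torch

def selection_step(experience_buffer, learner_buffer, batch_size=256,
                   g_select=4, is_on_policy=False, learner_device="cuda"):
    """Prepare g_select batches and hand them to the learner-side staging buffer."""
    batches = []
    for i in range(g_select):
        if is_on_policy:
            # Sequential minibatch iteration (a real implementation would also shuffle).
            start = (i * batch_size) % max(experience_buffer.size, 1)
            batch = experience_buffer.storage[start:start + batch_size]
        else:
            # Uniform random sampling for off-policy training.
            batch = experience_buffer.sample(batch_size)
        if torch.cuda.is_available():
            batch = batch.to(learner_device, non_blocking=True)
        batches.append(batch)
    learner_buffer.put_many(batches)
```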
Off-policy example:

workload:
  is_on_policy: false
  batch_size: 256
  buffer_capacity: 100000
janus:
  placement: cuda
  overlap_collect: 1
  overlap_select: 1
  g_collect: 4
  g_select: 4

On-policy example:

workload:
  is_on_policy: true
  num_minibatches: 4
  batch_size: 256
janus:
  placement: cpu_pinned
  overlap_collect: 1
  overlap_select: 1
  g_collect: 0.25  # 1/M
  g_select: 0.25   # 1/M
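A note on the fractional granularities above: following the k/M convention from the parameter table, num_minibatches: 4 gives M = 4, so g_collect: 0.25 = 1/M means one collection invocation covers a quarter of a rollout and four invocations cover the full rollout. The finer granularity presumably lets collection and selection interleave with the learner's per-minibatch updates rather than waiting for the whole rollout.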