Janus: Dual-End Staging for Efficient Buffer Management in Distributed RL

This repository is the artifact of the Janus buffer management system for distributed reinforcement learning.

Overview

Janus introduces Dual-End Staging to decouple buffer management from the actor/learner control flow:

Actor Buffer → Collection → Experience Buffer → Selection → Learner Buffer

Key features:

  • Placement optimization: CPU pageable, CPU pinned, or GPU placement
  • Overlap scheduling: Collection overlaps with rollout, selection overlaps with update
  • Granularity control: Batch multiple operations to amortize overhead
  • On-policy/Off-policy support: Unified interface for both training paradigms
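
Conceptually, a few lines of Python capture the flow above. This is a minimal sketch built on plain Python queues; the real orchestration lives in JanusBuffer (src/janus/buffer.py) and adds overlap scheduling and device placement on top.

import queue

actor_buffer = queue.Queue(maxsize=8)    # producer-side staging
learner_buffer = queue.Queue(maxsize=2)  # consumer-side staging
experience = []                          # stand-in for the experience buffer

def collection_step(g_collect=4):
    # Collection: drain up to g_collect rollouts into central storage.
    for _ in range(g_collect):
        try:
            experience.append(actor_buffer.get_nowait())
        except queue.Empty:
            break

def selection_step(batch_size=2):
    # Selection: assemble a batch and stage it for the learner.
    if len(experience) >= batch_size:
        learner_buffer.put(experience[-batch_size:])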

Requirements

On HPC (SLURM)

module load triton/2024.1-gcc gcc/12.3.0 cuda/12.2.1
module load scicomp-python-env/2025.2

Running Scripts

Running on HPC with SLURM

sbatch run_janus_quick.sh    # quick test
sbatch run_janus.sh          # full experiments

Output: All results logged to slurm-<jobid>.out

Examples

# Compare: run 4 predefined configurations (baseline, overlap, granularity, full Janus)
python src/main.py compare

# Train: Single run with custom parameters
python src/main.py train --config configs/default.yaml
python src/main.py train --placement cuda --overlap_collect 1 --overlap_select 1 --g_collect 4

# Eval: parameter sweeps that isolate the effect of one parameter
python src/main.py eval --sweep placement      # Test different placement strategies
python src/main.py eval --sweep granularity    # Test different g values (1,2,4,8,16)
python src/main.py eval --sweep overlap        # Test overlap effect

Configuration Options

Parameter        Values                            Description
---------------  --------------------------------  -------------------------------
actor_device     cpu, cuda                         Device for actor rollout
placement        cpu_pageable, cpu_pinned, cuda    Experience buffer placement
overlap_collect  0, 1                              Overlap collection with rollout
overlap_select   0, 1                              Overlap selection with update
g_collect        ≥1 (off-policy), k/M (on-policy)  Collection granularity
g_select         ≥1 (off-policy), k/M (on-policy)  Selection granularity
is_on_policy     true, false                       Training algorithm type
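
As a hedged sketch, these options might map onto a dataclass like the ones in src/janus/config.py (the field names follow the table above, but defaults and the exact layout here are assumptions, not the actual API):

from dataclasses import dataclass

@dataclass
class JanusConfig:
    actor_device: str = "cpu"       # cpu | cuda
    placement: str = "cpu_pinned"   # cpu_pageable | cpu_pinned | cuda
    overlap_collect: int = 1        # 0 | 1
    overlap_select: int = 1         # 0 | 1
    g_collect: float = 4.0          # >=1 off-policy, k/M fraction on-policy
    g_select: float = 4.0           # >=1 off-policy, k/M fraction on-policy
    is_on_policy: bool = False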

Project Structure

janus/
├── run_janus.sh                     # SLURM script: Full experiments
├── run_janus_quick.sh               # SLURM script: Quick test
├── configs/
│   └── default.yaml                 # Default configuration
├── src/
│   ├── main.py                      # Entry point
│   ├── train.py                     # Training loop
│   ├── eval.py                      # Evaluation benchmark
│   ├── janus/
│   │   ├── buffer.py                # JanusBuffer (orchestrator)
│   │   ├── staging.py               # ActorBuffer + LearnerBuffer
│   │   ├── experience_buffer.py     # ExperienceBuffer
│   │   ├── operators.py             # CollectionOp + SelectionOp
│   │   ├── transfer.py              # Device transfer utilities
│   │   └── config.py                # Configuration dataclasses
│   └── workload/
│       ├── actor.py                 # Actor simulation
│       └── learner.py               # Learner simulation

Key Components

Actor Buffer

Producer-side staging that decouples actor rollout from collection.

  • Bounded queue with backpressure
  • g_collect controls how many rollouts are pulled per collection invocation
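
A minimal sketch of this behavior, built on queue.Queue; the actual ActorBuffer in src/janus/staging.py may be implemented differently:

import queue

class ActorBuffer:
    def __init__(self, maxsize=8):
        # Bounded queue: put() blocks when full, applying backpressure to
        # the actor instead of letting rollouts pile up without limit.
        self._q = queue.Queue(maxsize=maxsize)

    def put(self, rollout):
        self._q.put(rollout)  # blocks while the queue is full

    def get_many(self, g_collect):
        # Pull up to g_collect rollouts per collection invocation.
        out = [self._q.get()]  # block for at least one rollout
        while len(out) < g_collect:
            try:
                out.append(self._q.get_nowait())
            except queue.Empty:
                break
        return out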

Experience Buffer

Central storage for RL experiences.

  • Supports CPU pageable, CPU pinned, or GPU placement
  • Circular buffer with efficient insert/sample operations
  • Thread-safe for concurrent access
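
A sketch of how these properties can fit together, assuming torch tensors of a fixed item_shape (a parameter introduced here for illustration; the real ExperienceBuffer may differ):

import threading
import torch

class ExperienceBuffer:
    def __init__(self, capacity, item_shape, placement="cpu_pinned"):
        device = "cuda" if placement == "cuda" else "cpu"
        self._data = torch.empty((capacity, *item_shape), device=device)
        if placement == "cpu_pinned":
            self._data = self._data.pin_memory()  # page-locked host memory
        self._capacity, self._next, self._size = capacity, 0, 0
        self._lock = threading.Lock()  # thread-safe concurrent access

    def insert(self, batch):
        with self._lock:
            n = batch.shape[0]
            # Circular write: wrap indices around the fixed capacity.
            idx = torch.arange(self._next, self._next + n) % self._capacity
            self._data[idx] = batch.to(self._data.device)
            self._next = (self._next + n) % self._capacity
            self._size = min(self._size + n, self._capacity)

    def sample(self, batch_size):
        with self._lock:
            idx = torch.randint(self._size, (batch_size,))
            return self._data[idx]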

Learner Buffer

Consumer-side staging that decouples selection from learner update.

  • Prefetches prepared batches
  • g_select controls how many batches are prepared per selection invocation
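
A sketch of the prefetching idea: a background thread keeps a small queue of prepared batches so the learner rarely waits. Here prepare_batch stands in for the selection step; the real LearnerBuffer may differ.

import queue
import threading

class LearnerBuffer:
    def __init__(self, prepare_batch, depth=2):
        self._q = queue.Queue(maxsize=depth)  # small prefetch depth
        self._prepare = prepare_batch
        threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            # put() blocks when the queue is full, so we never prepare
            # more than `depth` batches ahead of the learner.
            self._q.put(self._prepare())

    def next_batch(self):
        return self._q.get()  # usually returns immediately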

Collection Operator

Pulls from Actor Buffer, processes data, writes to Experience Buffer.

  • Concatenates multiple rollouts (granularity)
  • Handles device transfers based on placement
  • Performs the actual tensor operations (cat, reshape, preprocess); see the sketch below
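
Reusing the ActorBuffer and ExperienceBuffer sketches above, a collection step might look like this (illustrative; the real CollectionOp in src/janus/operators.py may do more preprocessing):

import torch

def collect(actor_buffer, experience_buffer, g_collect, placement):
    # Pull g_collect rollouts and concatenate them into one tensor, so the
    # fixed per-invocation overhead is amortized over g_collect rollouts.
    rollouts = actor_buffer.get_many(g_collect)  # assumed: list of tensors
    batch = torch.cat(rollouts, dim=0)
    # Move to the experience buffer's device before inserting; with
    # cpu_pinned placement the later host-to-device copy can be async.
    device = "cuda" if placement == "cuda" else "cpu"
    experience_buffer.insert(batch.to(device))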

Selection Operator

Samples from Experience Buffer, prepares batches, writes to Learner Buffer.

  • Random sampling (off-policy) or sequential iteration (on-policy)
  • Shuffle and batch assembly
  • Transfers to learner device (CUDA)
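
A matching sketch of a selection step, the kind of prepare_batch work the LearnerBuffer sketch above would prefetch (the real SelectionOp may differ):

import torch

def select(experience_buffer, batch_size, is_on_policy,
           epoch_data=None, step=0):
    if is_on_policy:
        # Sequential iteration over this epoch's (pre-shuffled) rollouts.
        start = step * batch_size
        batch = epoch_data[start:start + batch_size]
    else:
        # Random sampling from the replay buffer.
        batch = experience_buffer.sample(batch_size)
    # Transfer to the learner device; non_blocking only helps when the
    # source tensor lives in pinned host memory.
    return batch.to("cuda", non_blocking=True)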

Configuration Examples

Off-policy

workload:
  is_on_policy: false
  batch_size: 256
  buffer_capacity: 100000

janus:
  placement: cuda
  overlap_collect: 1
  overlap_select: 1
  g_collect: 4
  g_select: 4

On-policy

workload:
  is_on_policy: true
  num_minibatches: 4
  batch_size: 256

janus:
  placement: cpu_pinned
  overlap_collect: 1
  overlap_select: 1
  g_collect: 0.25    # 1/M
  g_select: 0.25     # 1/M
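
To read the fractional granularity: g = k/M stages k of the M minibatches per invocation, so with num_minibatches: 4, g = 1/4 = 0.25 corresponds to one minibatch per collection or selection call.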
