Janus: Dual-End Staging for Efficient Buffer Management in Distributed RL

This repository is the artifact of the Janus buffer management system for distributed reinforcement learning.

Overview

Janus introduces Dual-End Staging to decouple buffer management from the actor/learner control flow:

Actor Buffer → Collection → Experience Buffer → Selection → Learner Buffer

Key features:

  • Placement optimization: CPU pageable, CPU pinned, or GPU placement
  • Overlap scheduling: Collection overlaps with rollout, selection overlaps with update
  • Granularity control: Batch multiple operations to amortize overhead
  • On-policy/Off-policy support: Unified interface for both training paradigms
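
Conceptually, a few lines of Python capture the flow above. This is a minimal sketch built on plain Python queues; the real orchestration lives in JanusBuffer (src/janus/buffer.py) and adds overlap scheduling and device placement on top.

import queue

actor_buffer = queue.Queue(maxsize=8)    # producer-side staging
learner_buffer = queue.Queue(maxsize=2)  # consumer-side staging
experience = []                          # stand-in for the experience buffer

def collection_step(g_collect=4):
    # Collection: drain up to g_collect rollouts into central storage.
    for _ in range(g_collect):
        try:
            experience.append(actor_buffer.get_nowait())
        except queue.Empty:
            break

def selection_step(batch_size=2):
    # Selection: assemble a batch and stage it for the learner.
    if len(experience) >= batch_size:
        learner_buffer.put(experience[-batch_size:])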

Requirements

On HPC (SLURM)

module load triton/2024.1-gcc gcc/12.3.0 cuda/12.2.1
module load scicomp-python-env/2025.2

Running Scripts

Running on HPC with SLURM

sbatch run_janus_quick.sh    # quick test
sbatch run_janus.sh          # full experiments

Output: All results logged to slurm-<jobid>.out

Examples

# Compare: run 4 predefined configurations (baseline, overlap, granularity, full Janus)
python src/main.py compare

# Train: Single run with custom parameters
python src/main.py train --config configs/default.yaml
python src/main.py train --placement cuda --overlap_collect 1 --overlap_select 1 --g_collect 4

# Eval: parameter sweeps that isolate the effect of one parameter
python src/main.py eval --sweep placement      # Test different placement strategies
python src/main.py eval --sweep granularity    # Test different g values (1,2,4,8,16)
python src/main.py eval --sweep overlap        # Test overlap effect

Configuration Options

Parameter        Values                            Description
---------------  --------------------------------  -------------------------------
actor_device     cpu, cuda                         Device for actor rollout
placement        cpu_pageable, cpu_pinned, cuda    Experience buffer placement
overlap_collect  0, 1                              Overlap collection with rollout
overlap_select   0, 1                              Overlap selection with update
g_collect        ≥1 (off-policy), k/M (on-policy)  Collection granularity
g_select         ≥1 (off-policy), k/M (on-policy)  Selection granularity
is_on_policy     true, false                       Training algorithm type
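
As a hedged sketch, these options might map onto a dataclass like the ones in src/janus/config.py (the field names follow the table above, but defaults and the exact layout here are assumptions, not the actual API):

from dataclasses import dataclass

@dataclass
class JanusConfig:
    actor_device: str = "cpu"       # cpu | cuda
    placement: str = "cpu_pinned"   # cpu_pageable | cpu_pinned | cuda
    overlap_collect: int = 1        # 0 | 1
    overlap_select: int = 1         # 0 | 1
    g_collect: float = 4.0          # >=1 off-policy, k/M fraction on-policy
    g_select: float = 4.0           # >=1 off-policy, k/M fraction on-policy
    is_on_policy: bool = False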

Project Structure

janus/
├── run_janus.sh                     # SLURM script: Full experiments
├── run_janus_quick.sh               # SLURM script: Quick test
├── configs/
│   └── default.yaml                 # Default configuration
├── src/
│   ├── main.py                      # Entry point
│   ├── train.py                     # Training loop
│   ├── eval.py                      # Evaluation benchmark
│   ├── janus/
│   │   ├── buffer.py                # JanusBuffer (orchestrator)
│   │   ├── staging.py               # ActorBuffer + LearnerBuffer
│   │   ├── experience_buffer.py     # ExperienceBuffer
│   │   ├── operators.py             # CollectionOp + SelectionOp
│   │   ├── transfer.py              # Device transfer utilities
│   │   └── config.py                # Configuration dataclasses
│   └── workload/
│       ├── actor.py                 # Actor simulation
│       └── learner.py               # Learner simulation

Key Components

Actor Buffer

Producer-side staging that decouples actor rollout from collection.

  • Bounded queue with backpressure
  • g_collect controls how many rollouts are pulled per collection invocation
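
A minimal sketch of this behavior, built on queue.Queue; the actual ActorBuffer in src/janus/staging.py may be implemented differently:

import queue

class ActorBuffer:
    def __init__(self, maxsize=8):
        # Bounded queue: put() blocks when full, applying backpressure to
        # the actor instead of letting rollouts pile up without limit.
        self._q = queue.Queue(maxsize=maxsize)

    def put(self, rollout):
        self._q.put(rollout)  # blocks while the queue is full

    def get_many(self, g_collect):
        # Pull up to g_collect rollouts per collection invocation.
        out = [self._q.get()]  # block for at least one rollout
        while len(out) < g_collect:
            try:
                out.append(self._q.get_nowait())
            except queue.Empty:
                break
        return out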

Experience Buffer

Central storage for RL experiences.

  • Supports CPU pageable, CPU pinned, or GPU placement
  • Circular buffer with efficient insert/sample operations
  • Thread-safe for concurrent access
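
A sketch of how these properties can fit together, assuming torch tensors of a fixed item_shape (a parameter introduced here for illustration; the real ExperienceBuffer may differ):

import threading
import torch

class ExperienceBuffer:
    def __init__(self, capacity, item_shape, placement="cpu_pinned"):
        device = "cuda" if placement == "cuda" else "cpu"
        self._data = torch.empty((capacity, *item_shape), device=device)
        if placement == "cpu_pinned":
            self._data = self._data.pin_memory()  # page-locked host memory
        self._capacity, self._next, self._size = capacity, 0, 0
        self._lock = threading.Lock()  # thread-safe concurrent access

    def insert(self, batch):
        with self._lock:
            n = batch.shape[0]
            # Circular write: wrap indices around the fixed capacity.
            idx = torch.arange(self._next, self._next + n) % self._capacity
            self._data[idx] = batch.to(self._data.device)
            self._next = (self._next + n) % self._capacity
            self._size = min(self._size + n, self._capacity)

    def sample(self, batch_size):
        with self._lock:
            idx = torch.randint(self._size, (batch_size,))
            return self._data[idx]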

Learner Buffer

Consumer-side staging that decouples selection from learner update.

  • Prefetches prepared batches
  • g_select controls how many batches are prepared per selection invocation
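
A sketch of the prefetching idea: a background thread keeps a small queue of prepared batches so the learner rarely waits. Here prepare_batch stands in for the selection step; the real LearnerBuffer may differ.

import queue
import threading

class LearnerBuffer:
    def __init__(self, prepare_batch, depth=2):
        self._q = queue.Queue(maxsize=depth)  # small prefetch depth
        self._prepare = prepare_batch
        threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            # put() blocks when the queue is full, so we never prepare
            # more than `depth` batches ahead of the learner.
            self._q.put(self._prepare())

    def next_batch(self):
        return self._q.get()  # usually returns immediately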

Collection Operator

Pulls from Actor Buffer, processes data, writes to Experience Buffer.

  • Concatenates multiple rollouts (granularity)
  • Handles device transfers based on placement
  • Performs the actual tensor operations (cat, reshape, preprocess); see the sketch below
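
Reusing the ActorBuffer and ExperienceBuffer sketches above, a collection step might look like this (illustrative; the real CollectionOp in src/janus/operators.py may do more preprocessing):

import torch

def collect(actor_buffer, experience_buffer, g_collect, placement):
    # Pull g_collect rollouts and concatenate them into one tensor, so the
    # fixed per-invocation overhead is amortized over g_collect rollouts.
    rollouts = actor_buffer.get_many(g_collect)  # assumed: list of tensors
    batch = torch.cat(rollouts, dim=0)
    # Move to the experience buffer's device before inserting; with
    # cpu_pinned placement the later host-to-device copy can be async.
    device = "cuda" if placement == "cuda" else "cpu"
    experience_buffer.insert(batch.to(device))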

Selection Operator

Samples from Experience Buffer, prepares batches, writes to Learner Buffer.

  • Random sampling (off-policy) or sequential iteration (on-policy)
  • Shuffle and batch assembly
  • Transfers to learner device (CUDA)
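
A matching sketch of a selection step, the kind of prepare_batch work the LearnerBuffer sketch above would prefetch (the real SelectionOp may differ):

import torch

def select(experience_buffer, batch_size, is_on_policy,
           epoch_data=None, step=0):
    if is_on_policy:
        # Sequential iteration over this epoch's (pre-shuffled) rollouts.
        start = step * batch_size
        batch = epoch_data[start:start + batch_size]
    else:
        # Random sampling from the replay buffer.
        batch = experience_buffer.sample(batch_size)
    # Transfer to the learner device; non_blocking only helps when the
    # source tensor lives in pinned host memory.
    return batch.to("cuda", non_blocking=True)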

Configuration Examples

Off-policy

workload:
  is_on_policy: false
  batch_size: 256
  buffer_capacity: 100000

janus:
  placement: cuda
  overlap_collect: 1
  overlap_select: 1
  g_collect: 4
  g_select: 4

On-policy

workload:
  is_on_policy: true
  num_minibatches: 4
  batch_size: 256

janus:
  placement: cpu_pinned
  overlap_collect: 1
  overlap_select: 1
  g_collect: 0.25    # 1/M
  g_select: 0.25     # 1/M
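
To read the fractional granularity: g = k/M stages k of the M minibatches per invocation, so with num_minibatches: 4, g = 1/4 = 0.25 corresponds to one minibatch per collection or selection call.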
