vLLM V1 migration #137

Open
rafapi wants to merge 25 commits into main from vllm_v1

Conversation

rafapi (Collaborator) commented Apr 22, 2026

Upgrade to vLLM V1 and restore vLLM V0 behaviour on the vLLM V1 rollout path.

[Figures: reward and logprobs curves]

The logprobs plot shows the discrepancy with the initial vanilla vLLM upgrade (green line).

rafapi requested a review from ehsk on Apr 22, 2026 at 13:11
rafapi (Collaborator, Author) commented Apr 22, 2026

@bigximik FYI

chunk_probs = torch.exp(chunk_logprobs)
entropy -= (chunk_probs * chunk_logprobs).sum(dim=-1)

del logits, selected_logits, log_norm

rafapi (Collaborator, Author):
All the changes above are an old bugfix to ensure we don't materialise the vocab tensors by default and take a lot of VRAM for no reason.
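For context, the pattern the excerpt comes from looks roughly like the sketch below: the entropy reduction is accumulated one vocabulary slice at a time, so the full [num_tokens, vocab] probability tensor is never materialised. The function name, chunk size, and the single full-vocab logsumexp are illustrative assumptions, not the PR's actual code.

```python
import torch

def entropy_over_vocab_chunks(logits: torch.Tensor, chunk_size: int = 8192) -> torch.Tensor:
    # Normaliser over the full vocab, computed once; afterwards only small slices
    # are exponentiated, instead of a full [num_tokens, vocab] softmax.
    log_norm = torch.logsumexp(logits.float(), dim=-1, keepdim=True)
    entropy = torch.zeros(logits.shape[:-1], device=logits.device)
    for start in range(0, logits.shape[-1], chunk_size):
        chunk_logits = logits[..., start : start + chunk_size].float()
        chunk_logprobs = chunk_logits - log_norm       # log-probabilities of this vocab slice
        chunk_probs = torch.exp(chunk_logprobs)
        entropy -= (chunk_probs * chunk_logprobs).sum(dim=-1)
        del chunk_logits, chunk_logprobs, chunk_probs  # free the slice before the next one
    del log_norm
    return entropy
```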

Comment thread pipelinerl/vllm1.py
"NCCL_PROTO", "NCCL_ALGO", "NCCL_NTHREADS", "NCCL_SOCKET_NTHREADS",
):
os.environ.pop(_k, None)

rafapi (Collaborator, Author):
The code above is not strictly required, but since I tested batch invariance I decided to leave it in for completeness.

bigximik added a commit that referenced this pull request Apr 23, 2026
The merge added a pause_generation/resume_generation wrap to both HTTP and
fast-llm weight-update paths symmetrically. On the fast-llm path this
deadlocks the initial (step=0) weight broadcast: engine.pause_generation
blocks waiting for in-flight requests to drain from a generator that hasn't
started yet, so the NCCL broadcast send from fast-llm never gets a receiver.

Origin/fast-llm calls the worker RPC directly with no wrap, and the baseline
run on counting.yaml completed 10/10 iterations cleanly. This commit restores
that behavior: EngineManager no longer has a receive_weight_update_fast_llm
method, and start_fast_llm_monitoring calls collective_rpc_async directly
again. HTTP path keeps the pause/resume wrap (PR #137's intended fix).
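Purely as an illustration of the asymmetry described above, the two paths could be pictured as in the sketch below; every name here (EngineManager, pause_generation, collective_rpc_async, and the RPC strings) is assumed from the commit message, not taken from the repository.

```python
# Illustrative sketch only; names are assumed from the commit message above.
class EngineManager:
    def __init__(self, engine):
        self.engine = engine

    async def receive_weight_update_http(self, request) -> dict:
        # HTTP path: the generator is already serving requests, so draining
        # in-flight work before swapping weights is safe (PR #137's intended fix).
        await self.engine.pause_generation()
        try:
            return await self.engine.collective_rpc_async("update_weights", args=(request,))
        finally:
            await self.engine.resume_generation()

    def start_fast_llm_monitoring(self):
        # fast-llm path: call the worker RPC directly, with no pause/resume wrap.
        # Wrapping it deadlocks the step=0 broadcast: pause_generation waits for
        # in-flight requests to drain from a generator that has not started yet,
        # while the trainer's NCCL broadcast send waits for this receiver.
        return self.engine.collective_rpc_async("receive_weight_update_fast_llm")
```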
Comment thread pipelinerl/async_llm.py
response.raise_for_status()
data = await response.json()
response_data = None
for attempt in range(2):

Collaborator:
why 3 times?

rafapi (Collaborator, Author):
twice :), just a basic retry

Collaborator:
My bad ;) Can we make it configurable? And add a small delay between the retries?

rafapi (Collaborator, Author), Apr 23, 2026:
No, I’d keep this as a single immediate retry. This path should stay as fast as possible, and if these aborts happen at all, we don’t want to add extra delay here.

Collaborator:
This makes me wonder what specifically causes these aborts in this system. If it's weight-update pauses, an immediate retry might just hit another abort if the update hasn't finished yet. In that case, the retry only helps if the pause is shorter than the round-trip time of the HTTP request, which seems fragile to assume.
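If these aborts turn out to matter, the reviewer's suggestion (a configurable number of attempts with a short pause between them) could look roughly like the sketch below. This is not what the PR adopts, which keeps a single immediate retry; the aiohttp session and the max_retries/retry_delay_s parameters are assumptions.

```python
import asyncio
import aiohttp

async def post_with_retry(session: aiohttp.ClientSession, url: str, payload: dict,
                          max_retries: int = 2, retry_delay_s: float = 0.05) -> dict:
    # Retry a POST a configurable number of times, sleeping briefly between attempts.
    last_exc: Exception | None = None
    for attempt in range(max_retries):
        try:
            async with session.post(url, json=payload) as response:
                response.raise_for_status()
                return await response.json()
        except aiohttp.ClientError as exc:  # e.g. a request aborted during a weight-update pause
            last_exc = exc
            if attempt + 1 < max_retries:
                await asyncio.sleep(retry_delay_s)
    raise last_exc
```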

Comment thread pipelinerl/vllm1.py
# vLLM batch_invariant mode sets restrictive NCCL env vars (single channel,
# tree algo, simple proto, P2P disabled) that the trainer does not share.
# Clear them so the weight-update NCCL comm matches trainer defaults.
# Safe at tp=1 because no intra-engine NCCL comm has been created yet.
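To make the tp=1 assumption explicit, the cleanup could be guarded as in the sketch below, which folds in the assert suggested in the thread that follows. The function name and argument are hypothetical, not the file's actual code.

```python
import os

def clear_batch_invariant_nccl_env(tensor_parallel_size: int) -> None:
    # Only safe before any intra-engine NCCL communicator exists, i.e. at tp=1.
    assert tensor_parallel_size == 1, (
        "clearing NCCL env vars is only safe when tensor-parallel-size is 1"
    )
    for _k in (
        "NCCL_PROTO", "NCCL_ALGO", "NCCL_NTHREADS", "NCCL_SOCKET_NTHREADS",
    ):
        os.environ.pop(_k, None)
```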

Collaborator:
Shouldn't we add an assert to ensure this occurs only when tensor-parallel-size is 1?

rafapi (Collaborator, Author):
See my comment in the files; this mode should never be used. I left it there because it's a useful default for NCCL compatibility with the trainer.

Collaborator:
Do we know if this works when batch invariant is enabled and tp > 1?

rafapi (Collaborator, Author):
It makes things slightly worse: it adds about a 2x delay that compounds through the run.

Collaborator:
Great, can we also clarify it in the comments for tp > 1?
