[tx] Lazy inference engine initialization #1069
Code Review
This pull request refactors the initialization of inference engines to be lazy, deferring their creation until they are first needed for sampling. This is a good optimization that avoids allocating resources for inference engines in training-only workflows and introduces a mechanism to sleep inference engines during training passes to free up GPU memory. A medium-severity Denial of Service vulnerability was identified, as the sample method does not validate user-provided sampling parameters, which could be abused to cause excessive resource consumption. Additionally, a critical issue exists where the new lazy initialization logic can lead to a crash if num_inference_engines is configured to 0, requiring graceful handling in sample() and save_sampler_checkpoint().
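To make the review concrete, here is a minimal sketch of the lazy-initialization pattern the PR introduces, including the zero-engine guard the review asks for. Class and method names (`Sampler`, `_ensure_inference_engines`) mirror the diff, but this is an illustrative reduction, not the PR's actual implementation.

```python
class Sampler:
    """Minimal sketch of lazy inference-engine initialization.

    Engines are not allocated at construction time; training-only
    workflows that never call sample() pay no inference-engine cost.
    """

    def __init__(self, num_inference_engines: int):
        self.num_inference_engines = num_inference_engines
        self._engines = None  # deferred: nothing allocated yet

    def _ensure_inference_engines(self):
        # Create engines on first use only.
        if self._engines is None:
            self._engines = [f"engine-{i}" for i in range(self.num_inference_engines)]
        return self._engines

    def sample(self):
        engines = self._ensure_inference_engines()
        if not engines:
            # Guard for num_inference_engines == 0; without it, lazy init
            # builds an empty client and sampling crashes (the issue flagged
            # in the review comments).
            return {"status": "error", "error": "Sampling not enabled."}
        return {"status": "ok", "engines": len(engines)}
```

With `num_inference_engines=0` the first `sample()` call returns an error response instead of crashing; with a positive count, engines materialize only on that first call.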
return {req_id: error for req_id, _, _, _, _ in prepared_batch.request_batch_slices}
# 1. Ensure inference engines are initialized
self._ensure_inference_engines()
While lazy initialization is a great improvement, this change introduces a regression where sample() will crash if num_inference_engines is 0. Previously, there was a check (though likely buggy) to handle this. With lazy init, _ensure_inference_engines() will create an InferenceEngineClient with 0 engines, which then causes a crash inside _inference_engine_client.sample().
Please add a check to ensure there are engines available before proceeding with sampling.
if not self._inference_engine_client or not self._inference_engine_client.engines:
    error = types.ErrorResponse(
        error="Sampling not enabled. Inference engines were not initialized (num_inference_engines=0 in SkyRL config).",
        status="error",
    )
    return {req_id: error for req_id, _, _, _, _ in prepared_batch.request_batch_slices}

asyncio.run(self._dispatch.save_weights_for_sampler())
logger.info(f"Synced weights for {model_id} to inference engines via NCCL")
Similar to the sample method, this will crash if num_inference_engines is 0 because _dispatch.save_weights_for_sampler() will fail when there are no engines in the InferenceEngineClient. The weight sync should be skipped if no inference engines are configured.
if self._inference_engine_client and self._inference_engine_client.engines:
    asyncio.run(self._dispatch.save_weights_for_sampler())
    logger.info(f"Synced weights for {model_id} to inference engines via NCCL")
else:
    logger.info("Skipping sampler weight sync: no inference engines configured.")
Defer vLLM inference engine creation from create_model() to the first sampling-related call (save_sampler_checkpoint or sample). SFT scripts that never sample no longer pay the inference engine memory cost. Sleep inference engines before forward/forward_backward when colocate_all=True so the training model can load without OOM.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
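The sleep-before-training behavior from this commit can be sketched as follows. This is an assumed shape: `ColocatedEngine`, its `sleep()`/`wake_up()` methods, and `forward_backward` are illustrative stand-ins for the PR's actual colocation machinery.

```python
class ColocatedEngine:
    """Illustrative colocated inference engine that can release GPU memory."""

    def __init__(self):
        self.asleep = False

    def sleep(self):
        # Free the engine's GPU memory so the training model can load.
        self.asleep = True

    def wake_up(self):
        # Reacquire resources before the next sampling pass.
        self.asleep = False


def forward_backward(engines, train_step):
    # With colocate_all=True, put every inference engine to sleep before the
    # training pass runs, avoiding OOM when the training model loads.
    for engine in engines:
        engine.sleep()
    return train_step()
```

The design choice here is to trade a sleep/wake round trip per training pass for the ability to fit both the training model and the inference engines on the same GPUs.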
Map lora_config.rank and alpha to the SkyRL-Train LoRA config so that LoRA requests actually create LoRA adapters instead of silently doing full fine-tuning.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
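The mapping described in this commit can be sketched as below. The field and key names (`rank`, `alpha`, `lora_rank`, `lora_alpha`) follow the commit message, but the helper itself is a hypothetical reduction, not SkyRL-Train's real config schema.

```python
from dataclasses import dataclass


@dataclass
class LoraConfig:
    """User-facing LoRA request parameters."""
    rank: int
    alpha: int


def to_skyrl_lora_config(lora_config):
    """Map user LoRA fields onto the trainer-side config dict.

    If rank is never propagated, the trainer sees rank 0 and silently
    falls back to full fine-tuning -- the bug this commit fixes.
    """
    if lora_config is None or lora_config.rank <= 0:
        return {"lora_rank": 0}  # full fine-tuning
    return {"lora_rank": lora_config.rank, "lora_alpha": lora_config.alpha}
```

A request with rank 16 / alpha 32 now yields a config that actually creates a LoRA adapter, while a missing config still means full fine-tuning explicitly rather than by accident.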
Summary
- Defer vLLM inference engine creation from create_model() to the first sampling-related call (save_sampler_checkpoint or sample)
- save_weights_and_get_sampling_client() call