
[tx] Lazy inference engine initialization #1069

Merged
tyler-griggs merged 2 commits into main from tyler/lazy-inference-engines
Feb 12, 2026
Conversation

tyler-griggs (Member) commented Feb 11, 2026

Summary

  • Defer vLLM inference engine creation from create_model() to first sampling-related call (save_sampler_checkpoint or sample)
  • SFT scripts that never sample no longer spin up inference engines
  • RL scripts lazily initialize engines on first save_weights_and_get_sampling_client() call
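A minimal sketch of the deferral pattern described above; the engine client below is a stand-in, and only the lazy-creation guard (named `_ensure_inference_engines`, as in the diff) reflects the PR, not SkyRL's real vLLM wiring:

```python
# Minimal sketch of lazy inference-engine initialization. The engine
# client here is a stand-in; only the deferral pattern matches the PR.

class InferenceEngineClient:
    """Stand-in for the vLLM-backed engine client (assumption)."""

    def __init__(self, num_engines: int):
        self.engines = [object() for _ in range(num_engines)]


class TrainingModel:
    def __init__(self, num_inference_engines: int):
        self._num_inference_engines = num_inference_engines
        # Deferred: create_model() no longer builds inference engines.
        self._inference_engine_client = None

    def _ensure_inference_engines(self) -> None:
        # Build engines only on the first sampling-related call.
        if self._inference_engine_client is None:
            self._inference_engine_client = InferenceEngineClient(
                self._num_inference_engines
            )

    def sample(self) -> int:
        self._ensure_inference_engines()
        return len(self._inference_engine_client.engines)
```

Training-only paths (forward/forward_backward, i.e. SFT) never call `_ensure_inference_engines()`, so no engines are created for them.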


devin-ai-integration bot (Contributor) left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional findings.


gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request refactors inference engine initialization to be lazy, deferring creation until the engines are first needed for sampling. This is a good optimization: training-only workflows no longer allocate inference engine resources, and inference engines can now be put to sleep during training passes to free GPU memory. Two issues were identified. First, a medium-severity denial-of-service risk: the sample method does not validate user-provided sampling parameters, which could be abused to cause excessive resource consumption. Second, a critical issue: the new lazy initialization logic can crash when num_inference_engines is configured to 0, so sample() and save_sampler_checkpoint() need to handle that case gracefully.
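On the denial-of-service point, a hedged sketch of the kind of bounds check the reviewer is asking for (the caps and parameter names here are illustrative assumptions, not SkyRL's actual API):

```python
# Illustrative bounds check for user-supplied sampling parameters; the
# caps and parameter names are assumptions, not SkyRL's actual config.

MAX_TOKENS_CAP = 4096
MAX_SAMPLES_CAP = 8


def validate_sampling_params(max_tokens: int, num_samples: int) -> None:
    """Reject values that could exhaust GPU memory or compute time."""
    if not 0 < max_tokens <= MAX_TOKENS_CAP:
        raise ValueError(
            f"max_tokens must be in (0, {MAX_TOKENS_CAP}], got {max_tokens}"
        )
    if not 0 < num_samples <= MAX_SAMPLES_CAP:
        raise ValueError(
            f"num_samples must be in (0, {MAX_SAMPLES_CAP}], got {num_samples}"
        )
```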

    return {req_id: error for req_id, _, _, _, _ in prepared_batch.request_batch_slices}
    # 1. Ensure inference engines are initialized
    self._ensure_inference_engines()


high

While lazy initialization is a great improvement, this change introduces a regression where sample() will crash if num_inference_engines is 0. Previously, there was a check (though likely buggy) to handle this. With lazy init, _ensure_inference_engines() will create an InferenceEngineClient with 0 engines, which then causes a crash inside _inference_engine_client.sample().

Please add a check to ensure there are engines available before proceeding with sampling.

        if not self._inference_engine_client or not self._inference_engine_client.engines:
            error = types.ErrorResponse(
                error="Sampling not enabled. Inference engines were not initialized (num_inference_engines=0 in SkyRL config).",
                status="error",
            )
            return {req_id: error for req_id, _, _, _, _ in prepared_batch.request_batch_slices}

Comment on lines +605 to +606
asyncio.run(self._dispatch.save_weights_for_sampler())
logger.info(f"Synced weights for {model_id} to inference engines via NCCL")

high

Similar to the sample method, this will crash if num_inference_engines is 0 because _dispatch.save_weights_for_sampler() will fail when there are no engines in the InferenceEngineClient. The weight sync should be skipped if no inference engines are configured.

        if self._inference_engine_client and self._inference_engine_client.engines:
            asyncio.run(self._dispatch.save_weights_for_sampler())
            logger.info(f"Synced weights for {model_id} to inference engines via NCCL")
        else:
            logger.info("Skipping sampler weight sync: no inference engines configured.")

tyler-griggs force-pushed the tyler/lazy-inference-engines branch from c410869 to e8137bb on February 11, 2026 at 21:12
tyler-griggs and others added 2 commits February 12, 2026 01:42
Defer vLLM inference engine creation from create_model() to first
sampling-related call (save_sampler_checkpoint or sample). SFT scripts
that never sample no longer pay the inference engine memory cost.

Sleep inference engines before forward/forward_backward when
colocate_all=True so the training model can load without OOM.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
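The sleep step in this commit can be sketched as follows; `sleep()`/`wake_up()` mirror vLLM's sleep-mode API in spirit, but the classes and orchestration shown are assumptions:

```python
# Sketch of sleeping colocated inference engines before a training pass.
# Engine is a stand-in; real engines would offload/free GPU memory.

class Engine:
    def __init__(self):
        self.asleep = False

    def sleep(self, level: int = 1) -> None:
        self.asleep = True  # free inference-side GPU memory

    def wake_up(self) -> None:
        self.asleep = False


def run_training_step(engines, colocate_all: bool, step):
    # When training and inference share GPUs, sleep engines first so the
    # training model can load without OOM; they wake on the next sample.
    if colocate_all:
        for engine in engines:
            engine.sleep(level=1)
    return step()
```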
Map lora_config.rank and alpha to the SkyRL-Train LoRA config so that
LoRA requests actually create LoRA adapters instead of silently doing
full fine-tuning.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
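The mapping this commit describes might look roughly like the sketch below; the field names on both sides are assumptions, and only the rank/alpha pass-through is taken from the commit message:

```python
# Sketch of forwarding LoRA rank/alpha into a trainer-side config so a
# zero/absent rank is the only case that falls back to full fine-tuning.

from dataclasses import dataclass
from typing import Optional


@dataclass
class LoraConfig:
    rank: int
    alpha: float


def to_trainer_lora_config(cfg: Optional[LoraConfig]) -> dict:
    if cfg is None or cfg.rank == 0:
        # No adapter requested: full fine-tuning.
        return {"lora_rank": 0}
    return {"lora_rank": cfg.rank, "lora_alpha": cfg.alpha}
```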
tyler-griggs force-pushed the tyler/lazy-inference-engines branch from e8137bb to b2b74fc on February 12, 2026 at 01:43
@tyler-griggs tyler-griggs merged commit ecebb00 into main Feb 12, 2026
4 checks passed
