[CI] Fix `test_inference_engines_generation` after vllm 0.16.0 upgrade; Use the correct GSM8k path for `test_generator_multi_turn_gsm8k_router_replay` by SumanthRH · Pull Request #1339 · NovaSky-AI/SkyRL

SumanthRH · 2026-03-18T01:20:46Z

What does this PR do?

Fixes the CI failures currently on main: https://github.com/NovaSky-AI/SkyRL/actions/runs/23218100038/job/67484003027

- tests/backends/skyrl_train/gpu/gpu_ci/test_skyrl_gym_generator.py::test_generator_multi_turn_gsm8k_router_replay fails with `FileNotFoundError: Unable to find '/mnt/cluster_storage/data/gsm8k/validation.parquet'`. R3 PR: Rollout Routing Replay #1273 added a router replay test but used an incorrect path for the GSM8K dataset on CI.
- tests/backends/skyrl_train/gpu/gpu_ci/inference_servers/test_weight_sync.py::TestColocatedIpcWeightUpdateFlow::test_update_weights_ipc errors with an `AssertionError`. The fix is similar to the one in [CI] Skip FlashRL integration test in CI and fix failing generation test for new inference codepath #1301.

=========================== short test summary info ============================
FAILED tests/backends/skyrl_train/gpu/gpu_ci/test_engine_generation.py::test_inference_engines_generation[tp2_pp1_dp2] - ray.exceptions.ActorDiedError: The actor died because of an error raised in its creation task, ray::AsyncVLLMInferenceEngine.__init__() (pid=25165, ip=10.0.25.170, actor_id=7617504c4a769d500d1bc9ef13000000, repr=<skyrl.backends.skyrl_train.inference_engines.vllm.vllm_engine.AsyncVLLMInferenceEngine object at 0x7ea2b748c800>)
  File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
           ^^^^^^^^^^^^^^^^^^^^^
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ray/session_2026-03-17_21-54-20_786615_3475/runtime_resources/working_dir_files/s3_anyscale-production-data-cld-hxkifz7xa22mwicp21nzkds1lw_org_xc6lv84h3d7m9dljcc17esfw2i_cld_hxkifz7xa22mwicp21nzkds1lw_runtime_env_packages_pkg_95e2ed2914e100bfad4cccac453e4b5b/skyrl/backends/skyrl_train/inference_engines/vllm/vllm_engine.py", line 343, in __init__
    super().__init__(*args, **kwargs)
  File "/tmp/ray/session_2026-03-17_21-54-20_786615_3475/runtime_resources/working_dir_files/s3_anyscale-production-data-cld-hxkifz7xa22mwicp21nzkds1lw_org_xc6lv84h3d7m9dljcc17esfw2i_cld_hxkifz7xa22mwicp21nzkds1lw_runtime_env_packages_pkg_95e2ed2914e100bfad4cccac453e4b5b/skyrl/backends/skyrl_train/inference_engines/vllm/vllm_engine.py", line 112, in __init__
    self.llm = self._create_engine(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ray/session_2026-03-17_21-54-20_786615_3475/runtime_resources/working_dir_files/s3_anyscale-production-data-cld-hxkifz7xa22mwicp21nzkds1lw_org_xc6lv84h3d7m9dljcc17esfw2i_cld_hxkifz7xa22mwicp21nzkds1lw_runtime_env_packages_pkg_95e2ed2914e100bfad4cccac453e4b5b/skyrl/backends/skyrl_train/inference_engines/vllm/vllm_engine.py", line 364, in _create_engine
    engine = vllm.AsyncLLMEngine.from_engine_args(engine_args, stat_loggers=stat_loggers)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ray/.cache/uv/builds-v0/.tmpuPz58u/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 251, in from_engine_args
    return cls(
           ^^^^
  File "/home/ray/.cache/uv/builds-v0/.tmpuPz58u/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 148, in __init__
    self.engine_core = EngineCoreClient.make_async_mp_client(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ray/.cache/uv/builds-v0/.tmpuPz58u/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 121, in make_async_mp_client
    return DPAsyncMPClient(*client_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ray/.cache/uv/builds-v0/.tmpuPz58u/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 1082, in __init__
    self._ensure_stats_update_task()
  File "/home/ray/.cache/uv/builds-v0/.tmpuPz58u/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 1091, in _ensure_stats_update_task
    assert self.stats_update_address is not None
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
FAILED tests/backends/skyrl_train/gpu/gpu_ci/test_skyrl_gym_generator.py::test_generator_multi_turn_gsm8k_router_replay - FileNotFoundError: Unable to find '/mnt/cluster_storage/data/gsm8k/validation.parquet'
ERROR tests/backends/skyrl_train/gpu/gpu_ci/inference_servers/test_weight_sync.py::TestColocatedIpcWeightUpdateFlow::test_update_weights_ipc - AssertionError
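
The `FileNotFoundError` in the summary above comes down to which path the test hands to the dataset loader. Below is a minimal sketch of how the old and new paths resolve, assuming a POSIX CI machine. `resolve_data_path` is a hypothetical helper for illustration only and is not part of SkyRL.

```python
import os


def resolve_data_path(path: str) -> tuple[str, bool]:
    """Expand a leading '~' and report whether the file exists.

    Hypothetical helper for illustration; not a SkyRL function.
    """
    expanded = os.path.expanduser(path)
    return expanded, os.path.isfile(expanded)


# Path used by the router replay test before this PR: it is already
# absolute, so expanduser() leaves it unchanged.
old_path, old_exists = resolve_data_path(
    "/mnt/cluster_storage/data/gsm8k/validation.parquet"
)

# Path used after this PR: resolved against the current user's home directory.
new_path, new_exists = resolve_data_path("~/data/gsm8k/validation.parquet")

print(f"{old_path} exists={old_exists}")
print(f"{new_path} exists={new_exists}")
```

Running this on the CI worker would show which of the two locations actually holds the parquet file; the result depends on the machine, so no expected output is given here.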

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>

gemini-code-assist

Code Review

This pull request effectively addresses two CI failures. The first is a FileNotFoundError which is resolved by correcting a dataset path. The second failure, an AssertionError following a vllm upgrade, is fixed by parameterizing a test to include a MoE model, which likely covers the failing code path. The changes are well-targeted and correct. I have one minor suggestion to improve code clarity by removing a redundant parameter.

gemini-code-assist · 2026-03-18T01:25:06Z

        max_input_length=max_input_length,
        max_generate_length=1000,
-        data_path=os.path.expanduser("/mnt/cluster_storage/data/gsm8k/validation.parquet"),
+        data_path=os.path.expanduser("~/data/gsm8k/validation.parquet"),


This line explicitly sets data_path to the same value as its default in the run_generator_end_to_end function signature. To reduce redundancy and improve maintainability, you can remove this line and rely on the default value.
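
The redundancy described in this comment can be sketched as follows. `run_generator_end_to_end_stub` is a hypothetical stand-in: it assumes, as the comment states, that the real function's `data_path` default is already `~/data/gsm8k/validation.parquet`. The real signature is not shown in this thread.

```python
import os

# Assumed default, taken from the review comment rather than from the source code.
DEFAULT_DATA_PATH = os.path.expanduser("~/data/gsm8k/validation.parquet")


def run_generator_end_to_end_stub(
    max_generate_length: int = 1000,
    data_path: str = DEFAULT_DATA_PATH,
) -> str:
    """Return the dataset path the call would read from (stub only)."""
    return data_path


# Call as written in the diff: the path is passed explicitly.
explicit = run_generator_end_to_end_stub(
    max_generate_length=1000,
    data_path=os.path.expanduser("~/data/gsm8k/validation.parquet"),
)

# Call as the reviewer suggests: rely on the default.
implicit = run_generator_end_to_end_stub(max_generate_length=1000)

print(explicit == implicit)
```

Under that assumption both calls read the same file, so dropping the argument would not change behavior. Keeping it does pin the test to this path even if the default changes later.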

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 2 additional findings.

SumanthRH added 2 commits March 18, 2026 00:11

fix ci faiilures

4a6afda

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>

x

79da8fb

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>

SumanthRH marked this pull request as ready for review March 18, 2026 01:23

gemini-code-assist bot reviewed Mar 18, 2026

devin-ai-integration bot reviewed Mar 18, 2026

SumanthRH merged commit be602d8 into main Mar 18, 2026
5 of 6 checks passed

devpatelio pushed a commit that referenced this pull request Mar 20, 2026

[CI] Fix test_inference_engines_generation after vllm 0.16.0 upgrad…

7066bf4

…e; Use the correct GSM8k path for `test_generator_multi_turn_gsm8k_router_replay` (#1339)

SumanthRH merged 2 commits into main from fix-ci-failures-0317
