[DPO] Drop subprocess inference, add more framework support, doc update by wheresmyhair · Pull Request #971 · OptimalScale/LMFlow

wheresmyhair · 2026-05-22T02:53:08Z

Overview

Support more inference frameworks for DPO, update the trl version, and update the READMEs.

Detailed Description

iterative_dpo_aligner
- select the inference engine in-process: VLLMInferencer or SGLangInferencer, based on the inference_engine argument;
- fix the DataProto -> text_to_textlist conversion, which was left misaligned by [Data] Apply DataProto to vLLM Inference & Align API with SGLang #967 (n>1 rollouts are repeat-interleaved by prepare_inputs_for_inference and need to be ungrouped via meta_info["actual_n_rollouts"]).
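The ungrouping step can be illustrated with a minimal sketch. It assumes the flat output list holds the n rollouts of each prompt consecutively (the repeat-interleaved layout described above) and that a text_to_textlist instance is a dict with an `input` string and an `output` list. The function name `ungroup_rollouts` is hypothetical and is not taken from the LMFlow code.

```python
from typing import Dict, List


def ungroup_rollouts(prompts: List[str], outputs: List[str], n: int) -> List[Dict]:
    """Regroup a flat, repeat-interleaved output list into one entry per prompt.

    Hypothetical helper: `prompts` and `outputs` have the same length, and
    every prompt appears n times in a row, e.g. [p0, p0, p1, p1] for n=2.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    if len(prompts) != len(outputs) or len(outputs) % n != 0:
        raise ValueError("prompts/outputs must have equal length divisible by n")

    instances = []
    for start in range(0, len(outputs), n):
        instances.append(
            {
                "input": prompts[start],              # first copy of the repeated prompt
                "output": outputs[start:start + n],  # its n consecutive rollouts
            }
        )
    return instances
```

In the PR's terms, `n` would come from `meta_info["actual_n_rollouts"]`; slicing with any other stride is the kind of mismatch that would leave prompts and outputs misaligned.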
vllm_inferencer
- mark MemorySafeVLLMInferencer deprecated with DeprecationWarning;
- scheduled for removal in lmflow 1.1.0.
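For readers unfamiliar with the pattern, a deprecation of this kind is usually raised from the constructor. The sketch below uses a hypothetical class, `LegacyInferencer`, as a stand-in; it is not the actual MemorySafeVLLMInferencer code, and the message text is an assumption.

```python
import warnings


class LegacyInferencer:
    """Hypothetical stand-in for a class that is being phased out."""

    def __init__(self) -> None:
        warnings.warn(
            "LegacyInferencer is deprecated and scheduled for removal; "
            "use an in-process inferencer instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
```

Python hides DeprecationWarning by default for code outside `__main__`, so callers may need `-W default` or a test runner to see it.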
auto_pipeline
- relax iterative_dpo_aligner gate from vllm AND trl AND ray to trl AND (vllm OR sglang);
- ray is only needed for the opt-in distributed reward inference path, and will be removed in the future to achieve a ray-less pipeline.
setup.py
- bump trl 0.8.0 -> trl>=0.11,<0.12;
- add pybase64 to [sglang] and rich to [trl] to work around upstream packaging gaps (sglang.utils eagerly imports pybase64; trl 0.11.x lazy-imports rich).
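As a rough picture of the resulting extras, the fragment below shows how such entries could look in a setuptools `extras_require` mapping. Only the trl range, `rich`, and `pybase64` come from this description; placing trl itself in the [trl] extra and listing `sglang` unpinned are assumptions made for illustration.

```python
# Sketch of optional dependency groups; not the actual LMFlow setup.py.
extras_require = {
    # trl 0.11.x lazy-imports rich, so it is listed explicitly
    "trl": ["trl>=0.11,<0.12", "rich"],
    # sglang.utils eagerly imports pybase64; the unpinned "sglang" is an assumption
    "sglang": ["sglang", "pybase64"],
}
```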
README + 5 localized READMEs
- document optional dependency extras and the vllm/sglang environment incompatibility.

research4pan

LGTM

research4pan approved these changes May 22, 2026

research4pan merged commit 767e04c into main May 22, 2026
3 checks passed

research4pan merged 1 commit into main from lmflow-dpo
