[DPO] Drop subprocess inference, add more framework support, doc update #971

Merged: research4pan merged 1 commit into main from lmflow-dpo on May 22, 2026

@wheresmyhair
Collaborator

Overview

Support more inference frameworks (vLLM and SGLang, both in-process) for iterative DPO, update the trl version, and update the READMEs.

Detailed Description

  • iterative_dpo_aligner
    • Select the inference engine in-process: VLLMInferencer or SGLangInferencer, chosen by the inference_engine argument.
    • Fix the DataProto -> text_to_textlist conversion, which #967 ([Data] Apply DataProto to vLLM Inference & Align API with SGLang) left misaligned: with n>1, rollouts are repeat-interleaved by prepare_inputs_for_inference and must be ungrouped via meta_info["actual_n_rollouts"].
  • vllm_inferencer
    • Mark MemorySafeVLLMInferencer as deprecated with a DeprecationWarning.
    • It is scheduled for removal in lmflow 1.1.0.
  • auto_pipeline
    • Relax the iterative_dpo_aligner gate from vllm AND trl AND ray to trl AND (vllm OR sglang).
    • ray is only needed for the opt-in distributed reward inference path, and will be removed in the future to make the pipeline ray-free.
  • setup.py
    • Bump trl 0.8.0 -> trl>=0.11,<0.12.
    • Add pybase64 to [sglang] and rich to [trl] to work around upstream packaging gaps (sglang.utils eagerly imports pybase64; trl 0.11.x lazy-imports rich).
  • README + 5 localized READMEs
    • Document the optional dependency extras and the vllm/sglang environment incompatibility.
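
For reviewers unfamiliar with the misalignment, the ungrouping step can be sketched roughly as follows. This is an illustration only, not the code in this PR: the helper name `ungroup_rollouts`, the flat-list input, and the `{"input", "output"}` record layout are assumptions made for the example; only `meta_info["actual_n_rollouts"]` comes from the description above.

```python
from typing import Dict, List


def ungroup_rollouts(
    prompts: List[str],
    flat_outputs: List[str],
    n_rollouts: int,
) -> List[Dict[str, object]]:
    """Regroup repeat-interleaved outputs into one record per prompt.

    Hypothetical helper: assumes flat_outputs is ordered as
    [p0_r0, p0_r1, ..., p1_r0, p1_r1, ...], i.e. each prompt's rollouts
    are adjacent, and that prompts holds each prompt exactly once.
    """
    if n_rollouts < 1:
        raise ValueError("n_rollouts must be >= 1")
    if len(flat_outputs) != len(prompts) * n_rollouts:
        raise ValueError(
            f"expected {len(prompts) * n_rollouts} outputs, "
            f"got {len(flat_outputs)}"
        )
    records = []
    for i, prompt in enumerate(prompts):
        start = i * n_rollouts
        # Slice out the n_rollouts consecutive outputs for this prompt.
        records.append(
            {"input": prompt, "output": flat_outputs[start:start + n_rollouts]}
        )
    return records


# Stand-in for the DataProto meta_info mentioned in the description.
meta_info = {"actual_n_rollouts": 2}
prompts = ["p0", "p1"]
flat = ["p0-a", "p0-b", "p1-a", "p1-b"]
grouped = ungroup_rollouts(prompts, flat, meta_info["actual_n_rollouts"])
```

If the grouping size were ignored (treated as 1), the second rollout of "p0" would be paired with "p1", which is the kind of misalignment the fix addresses.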

- iterative_dpo_aligner: dispatch in-process VLLMInferencer or
  SGLangInferencer based on inference_engine; fix DataProto ->
  text_to_textlist conversion left misaligned by #967 (n>1 rollouts are
  repeat-interleaved by prepare_inputs_for_inference and need to be
  ungrouped via meta_info["actual_n_rollouts"]).
- vllm_inferencer: mark MemorySafeVLLMInferencer deprecated with
  DeprecationWarning; scheduled for removal in lmflow 1.1.0.
- auto_pipeline: relax iterative_dpo_aligner gate from
  vllm AND trl AND ray to trl AND (vllm OR sglang); ray is only needed
  for the opt-in distributed reward inference path.
- setup.py: bump trl 0.8.0 -> trl>=0.11,<0.12; add pybase64 to [sglang]
  and rich to [trl] to work around upstream packaging gaps
  (sglang.utils eagerly imports pybase64; trl 0.11.x lazy-imports rich).
- README + 5 localized READMEs: document optional dependency extras
  and the vllm/sglang environment incompatibility.
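
The relaxed gate described in the commit message can be expressed as a small availability check. The sketch below is an assumption about the shape of such a check, not the actual auto_pipeline code: `is_available` and `iterative_dpo_enabled` are hypothetical names, and the real implementation may detect packages differently.

```python
from importlib.util import find_spec
from typing import Callable


def is_available(package: str) -> bool:
    """Return True if a top-level package can be imported (without importing it)."""
    return find_spec(package) is not None


def iterative_dpo_enabled(available: Callable[[str], bool] = is_available) -> bool:
    """New gate: trl AND (vllm OR sglang). ray is no longer required."""
    return available("trl") and (available("vllm") or available("sglang"))


# Example with a fake environment that has trl and sglang but no vllm or ray.
fake_env = {"trl", "sglang"}
enabled = iterative_dpo_enabled(lambda pkg: pkg in fake_env)
```

Under the old gate (vllm AND trl AND ray) the same environment would have been rejected, since neither vllm nor ray is installed.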
Contributor

@research4pan left a comment

LGTM

@research4pan merged commit 767e04c into main on May 22, 2026
3 checks passed