Add DPO and ORPO preference data preprocessing pipeline utils by igorts-git · Pull Request #3895 · AI-Hypercomputer/maxtext

igorts-git · 2026-05-13T17:49:21Z

Description

To simplify code review, I am splitting the Tunix-based DPO implementation into smaller PRs.
This one adds the data reading and processing required by DPO.

The classic DPO inputs consist of three data columns: ["prompt", "chosen_response", "rejected_response"].
However, some DPO datasets use a two-column format where the prompt is the prefix to the chosen and rejected strings.
When a 2-column dataset is used, our implementation extracts the common prefix into the "prompt" field, which is then fed into the model separately.
The column names in the dataset can vary, for example ["input", "chosen", "rejected"]. Our implementation allows the user to supply the dataset column names via the train_data_columns and eval_data_columns parameters.
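
As a rough illustration of the two input layouts described above, the sketch below maps a dataset row onto prompt/chosen/rejected fields. The function name `to_prompt_chosen_rejected` and the output keys are hypothetical and are not taken from this PR. It also assumes that prefix extraction happens on raw strings, character by character, which may differ from what dpo_utils.py actually does.

```python
import os


def to_prompt_chosen_rejected(example, columns):
    """Map a dataset row onto prompt/chosen/rejected fields (illustrative only).

    `columns` lists the dataset's column names: three names for
    (prompt, chosen, rejected) or two names for (chosen, rejected).
    """
    if len(columns) == 3:
        prompt_col, chosen_col, rejected_col = columns
        return {
            "prompt": example[prompt_col],
            "chosen": example[chosen_col],
            "rejected": example[rejected_col],
        }
    if len(columns) == 2:
        chosen_col, rejected_col = columns
        chosen, rejected = example[chosen_col], example[rejected_col]
        # os.path.commonprefix compares character by character, so it works
        # on arbitrary strings, not only on file system paths.
        prompt = os.path.commonprefix([chosen, rejected])
        return {
            "prompt": prompt,
            "chosen": chosen[len(prompt):],
            "rejected": rejected[len(prompt):],
        }
    raise ValueError(f"Expected 2 or 3 column names, got {len(columns)}")


row = {
    "chosen": "Human: hi\n\nAssistant: hello",
    "rejected": "Human: hi\n\nAssistant: go away",
}
print(to_prompt_chosen_rejected(row, ["chosen", "rejected"]))
```

A character-level prefix can end in the middle of a word when both responses start with the same letters, so a real pipeline may want to cut at a turn or token boundary instead.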

Tunix requires left-padded prompts and right-padded responses. Our code implements this padding (and truncation if needed) and also provides Tunix with the corresponding masks.
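
A minimal sketch of the padding scheme, using plain Python lists. The helper name `pad_or_truncate` and the choice of 0 as the pad id are assumptions for illustration. Which end of an over-long prompt gets dropped is also an assumption here: the sketch keeps the last tokens of the prompt and the first tokens of the response.

```python
def pad_or_truncate(tokens, length, pad_id=0, left=False):
    """Return (ids, mask), each exactly `length` long (illustrative only).

    left=True:  keep the last `length` tokens, pad on the left (prompts).
    left=False: keep the first `length` tokens, pad on the right (responses).
    The mask is 1 for real tokens and 0 for padding.
    """
    tokens = list(tokens)
    if left:
        # max(..., 0) avoids a negative start index for short inputs.
        kept = tokens[max(len(tokens) - length, 0):]
    else:
        kept = tokens[:length]
    padding = [pad_id] * (length - len(kept))
    pad_mask = [0] * len(padding)
    keep_mask = [1] * len(kept)
    if left:
        return padding + kept, pad_mask + keep_mask
    return kept + padding, keep_mask + pad_mask


prompt_ids, prompt_mask = pad_or_truncate([5, 6, 7], 5, left=True)
response_ids, response_mask = pad_or_truncate([8, 9], 4, left=False)
print(prompt_ids, prompt_mask)      # [0, 0, 5, 6, 7] [0, 0, 1, 1, 1]
print(response_ids, response_mask)  # [8, 9, 0, 0] [1, 1, 0, 0]
```

With this layout the last prompt token sits directly next to the first response token when the two arrays are concatenated.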

NOTE: Once this PR is merged, the legacy DPO will stop working correctly. The follow-up PRs will enable Tunix-based DPO.

Caveat: This PR only adds support for HuggingFace datasets, while the legacy DPO implementation supported HuggingFace, TFDS and Grain. This is on par with our SFT implementation. We need to discuss the priority of supporting TFDS and Grain in post-training.

Tests

Added unit tests. Ran DPO/ORPO and performed logits comparison against the legacy implementation.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

codecov · 2026-05-13T17:53:26Z

Codecov Report

❌ Patch coverage is 95.16129% with 3 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/maxtext/input_pipeline/hf_data_processing.py	71.42%	1 Missing and 1 partial ⚠️
src/maxtext/input_pipeline/dpo_utils.py	98.18%	0 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

github-actions · 2026-05-14T23:12:36Z

🤖 Hi @igorts-git, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

github-actions

## 📋 Review Summary

The Pull Request introduces important utilities for DPO and ORPO preference data preprocessing, which is a key component for the upcoming Tunix-based alignment implementation. The core logic for handling 2-column and 3-column datasets is well-structured, but I identified a high-severity bug in the common prefix extraction and some opportunities for more flexible truncation strategies.

🔍 General Feedback

Logic Bug: The common prefix extraction logic using enumerate(zip(...)) is flawed for edge cases like identical strings or prefix strings. I have provided a more robust implementation in the inline comments.
Truncation Strategy: The current 50/50 split for prompt/response lengths and the prefix-based truncation for prompts might lead to information loss in long-context scenarios.
Test Coverage: The new unit tests are quite thorough, but adding the suggested edge cases for prefix extraction would make them even better.
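
The reviewed revision of the code is not shown in this thread, so the snippet below is only a hypothetical reconstruction of how an enumerate(zip(...)) loop can go wrong on the edge cases named above. Both function names are invented for illustration. The first version reuses the loop index after the loop ends, which drops one character whenever no mismatch is found. The second counts matching positions instead.

```python
def common_prefix_buggy(a, b):
    """Hypothetical pitfall: slices with the last loop index."""
    i = 0
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            break
    # If the loop finished without a mismatch, i is the index of the last
    # compared position, not the number of matching positions.
    return a[:i]


def common_prefix(a, b):
    """Count matching positions so that full matches are kept whole."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


for a, b in [("abc", "abd"), ("abc", "abc"), ("ab", "abc")]:
    print(repr(a), repr(b), repr(common_prefix_buggy(a, b)), repr(common_prefix(a, b)))
```

The two versions agree when the strings differ at some position and disagree when the strings are identical or one is a prefix of the other.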

github-actions · 2026-05-15T00:06:16Z

🤖 Hi @igorts-git, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

github-actions

## 📋 Review Summary

This PR introduces necessary data preprocessing utilities for DPO and ORPO, including a new Grain transform DPOTunixPrep that handles column remapping, prefix extraction, and DPO-aware padding. The implementation is well-tested and integrated into the existing Hugging Face data pipeline.

🔍 General Feedback

Robustness: The prefix extraction logic for 2-column datasets is a great addition for supporting popular preference datasets like Anthropic/hh-rlhf.
Breaking Change: As noted in the description, moving DPO parameters into a nested config block is a breaking change for existing DPO configurations.
Logic Correction: A fix is suggested for the slicing logic in _pad to correctly handle cases where the requested length is 0.
Validation: Added a suggestion for non-negativity validation on max_prompt_length to align with project standards.
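
The body of _pad is not quoted in this thread, so the following is only a guess at the kind of slicing issue the review describes. A common Python pitfall is that `seq[-n:]` returns the whole sequence when `n` is 0, because `-0` equals `0`. The helper names below are hypothetical.

```python
def last_n_naive(seq, n):
    """Intended to keep the last n items, but n == 0 keeps everything."""
    return seq[-n:]


def last_n(seq, n):
    """Keep the last n items; n == 0 yields an empty sequence."""
    # An explicit, non-negative start index avoids the -0 special case.
    return seq[max(len(seq) - n, 0):]


tokens = [11, 12, 13]
print(last_n_naive(tokens, 0))  # [11, 12, 13]
print(last_n(tokens, 0))        # []
print(last_n(tokens, 2))        # [12, 13]
```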

SurbhiJainUSC · 2026-05-19T17:32:03Z

 use_dpo: False
-dpo_label_smoothing: 0.0
-dpo_beta: 0.1
+dpo:


Do we need to add these DPO configs to base.yml too?

Sorry, I don't fully understand the comment. Are you suggesting to remove these configs from base.yml?

most likely these dpo related parameters ended up being in base.yml due to historical reasons. if we have them in the dpo.yaml and we have separate yml files in the configs/post_train directory then it makes sense to remove them from base.yml since the ones in base.yml are meant to be shareable across multiple use-cases.

OK, thanks for the details. I removed the "dpo:" section from base.yml

gagika · 2026-05-19T18:09:10Z

-dpo_label_smoothing: 0.0
-dpo_beta: 0.1


Will this break older DPO code? If so, should we just leave it as is for now, perhaps with a comment that these are used by the older DPO?

Unfortunately, this PR breaks the older DPO code, not just due to the configs, but also in how the dataset is loaded. Once this and the next PR in the series are merged, I plan to follow up with a PR that completely deletes the legacy DPO implementation.

aireenmei

LGTM. But please note that both the tfds and hf pipelines are planned for deprecation. There's already support for SFT and DPO in the Grain pipeline. It would be great if we could have follow-up changes to enable the same support in the Grain pipeline.

aireenmei · 2026-05-19T21:21:03Z

+    if self.use_dpo:
+      if self.packing:
+        raise ValueError("For DPO/ORPO, `packing` is not supported.")
+      if self.dpo.max_prompt_length is not None and self.dpo.max_prompt_length > self.max_target_length:


In the case of max_prompt_length == max_target_length, it will cause max_response_length=0 error in DPODataFormatting, should we guard it here?

Updated the guard here. FYI, there is a slightly more comprehensive assertion in dpo_utils.py that can trigger in a few more edge cases.
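
The updated guard itself is not quoted in this thread. The standalone sketch below shows one way such a check could look, assuming that the response budget is max_target_length minus max_prompt_length, so equality leaves no room for a response. The function name `validate_dpo_lengths` is hypothetical, and the real check lives on the config object rather than in a free function.

```python
def validate_dpo_lengths(max_target_length, max_prompt_length=None, packing=False):
    """Reject DPO/ORPO length settings that leave no room for a response."""
    if packing:
        raise ValueError("For DPO/ORPO, `packing` is not supported.")
    if max_prompt_length is None:
        return  # Nothing to check; the default split is decided elsewhere.
    if max_prompt_length < 0:
        raise ValueError("`max_prompt_length` must be non-negative.")
    # Using >= rather than > also rejects the equality case, which would
    # otherwise leave a response length of 0 (assumed relationship).
    if max_prompt_length >= max_target_length:
        raise ValueError(
            "`max_prompt_length` must be smaller than `max_target_length`."
        )


validate_dpo_lengths(max_target_length=1024, max_prompt_length=512)  # passes
```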

A9isha

thanks Igor

Approved barring the one comment

A9isha · 2026-05-19T21:48:17Z

 use_dpo: False
-dpo_label_smoothing: 0.0
-dpo_beta: 0.1
+dpo:


most likely these dpo related parameters ended up being in base.yml due to historical reasons. if we have them in the dpo.yaml and we have separate yml files in the configs/post_train directory then it makes sense to remove them from base.yml since the ones in base.yml are meant to be shareable across multiple use-cases.

…ities Includes robust common prefix extraction for 2-column datasets, prompt suffix truncation, customizable max_prompt_length with validation against max_target_length, and complete integration unit test coverage.

igorts-git force-pushed the igorts/dpo-input-processing branch 2 times, most recently from 30d3c25 to b8ae239 Compare May 14, 2026 21:56

igorts-git added the gemini-review label May 14, 2026

github-actions Bot reviewed May 14, 2026

igorts-git force-pushed the igorts/dpo-input-processing branch from b8ae239 to 2d7b6e0 Compare May 15, 2026 00:02

igorts-git added gemini-review and removed gemini-review labels May 15, 2026

github-actions Bot reviewed May 15, 2026

Comment thread src/maxtext/input_pipeline/dpo_utils.py

Comment thread src/maxtext/configs/types.py

igorts-git force-pushed the igorts/dpo-input-processing branch 3 times, most recently from fd544d6 to eb1d524 Compare May 15, 2026 16:57

igorts-git marked this pull request as ready for review May 15, 2026 17:31

igorts-git requested review from A9isha, NicoGrande, NuojCheng, RissyRan, SurbhiJainUSC, abhinavclemson, aireenmei, bvandermoon, gagika, gobbleturk, hengtaoguo, khatwanimohit, richjames0, shralex and vipannalla as code owners May 15, 2026 17:31

igorts-git requested review from dipannita08, jesselu-google, jiangjy1982 and suexu1025 as code owners May 15, 2026 17:31

igorts-git force-pushed the igorts/dpo-input-processing branch from eb1d524 to 6cb3fd9 Compare May 19, 2026 14:18

igorts-git requested a review from darisoy as a code owner May 19, 2026 14:18

SurbhiJainUSC reviewed May 19, 2026

gagika approved these changes May 19, 2026

igorts-git force-pushed the igorts/dpo-input-processing branch from 6cb3fd9 to 13c2e3e Compare May 19, 2026 21:03

aireenmei approved these changes May 19, 2026

github-actions Bot added pull ready labels May 19, 2026

A9isha approved these changes May 19, 2026

igorts-git force-pushed the igorts/dpo-input-processing branch 2 times, most recently from bd2c0bf to 03a2b54 Compare May 20, 2026 03:00

igorts-git force-pushed the igorts/dpo-input-processing branch from 03a2b54 to d59a15e Compare May 20, 2026 05:46

copybara-service Bot merged commit 60bc7f9 into main May 20, 2026
28 of 29 checks passed

copybara-service Bot deleted the igorts/dpo-input-processing branch May 20, 2026 18:10

5 participants
