@YixuanWang-99 YixuanWang-99 commented Jun 16, 2025

Description

This update to forward_pass_logit_checker.py enables direct comparisons between MaxText and Hugging Face model checkpoints.

Previously, the script could only compare a single checkpoint (either Hugging Face or MaxText) against a set of "golden logits." This was problematic for fine-tuned models, as their outputs often diverge from the original golden logits. Additionally, when converting models between MaxText and Hugging Face formats, it was difficult to verify the conversion's accuracy.

  • Added a new flag: --run_hf_model (defaults to False).
  • When --run_hf_model is set to True, forward_pass_logit_checker.py runs both the MaxText and Hugging Face models on the fly and compares their output logits. The comparison covers the logits for the last-token prediction, the top-k predicted tokens and their scores, and the KL divergence between the full logit distributions.
  • When --run_hf_model is not set (the default), the existing functionality is preserved. All existing shell scripts that rely on the original behavior remain unaffected and run without changes.

This enhancement is crucial for verifying that model conversions accurately preserve predictive behavior.
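
As a rough illustration of the last-token and top-k comparison described above, here is a minimal pure-Python sketch. It is not the code in forward_pass_logit_checker.py: the function names, the tolerance `atol`, and the example logits are all hypothetical, and the real script operates on model outputs rather than hand-written lists.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, k):
    """Return (token_id, probability) pairs for the k most likely tokens."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


def compare_last_token(logits_a, logits_b, k=3, atol=1e-2):
    """Compare two last-token logit vectors (hypothetical helper).

    Reports whether the top-k token ids agree in order and whether the
    largest element-wise logit difference stays within atol.
    """
    ids_a = [i for i, _ in top_k(logits_a, k)]
    ids_b = [i for i, _ in top_k(logits_b, k)]
    max_abs_diff = max(abs(a - b) for a, b in zip(logits_a, logits_b))
    return {
        "same_top_k": ids_a == ids_b,
        "max_abs_diff": max_abs_diff,
        "close": max_abs_diff <= atol,
    }


# Made-up logits standing in for the two models' last-token outputs.
maxtext_logits = [2.0, 0.5, -1.0, 3.0, 0.0]
hf_logits = [2.001, 0.499, -1.002, 3.003, 0.001]
report = compare_last_token(maxtext_logits, hf_logits)
print(report)
```

Comparing token ids in ranked order is stricter than comparing them as sets; which of the two the script uses is not stated in this description.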

Tests

Tested on the Gemma-2b model. An example command comparing MaxText and Hugging Face model runs:

python3 -m MaxText.tests.forward_pass_logit_checker MaxText/configs/base.yml tokenizer_path=assets/tokenizer.gemma load_parameters_path=gs://maxtext-model-checkpoints/gemma-2b/2025-01-23-19-20/unscanned/checkpoints/0/items run_name=forward_pass_test_gemma2b per_device_batch_size=1 model_name=gemma-2b max_prefill_predict_length=4 max_target_length=4 dataset_type=synthetic scan_layers=false attention=dot_product --max_kl_div=0.015 --run_hf_model=True --hf_model_path=google/gemma-2b

A successful check between the Hugging Face and MaxText checkpoints should pass both the similarity check and the KL-divergence check with no errors.
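
The test command above mixes key=value config overrides with --flags such as --run_hf_model=True. One way such a mix can be separated is sketched below with the standard argparse module. This is an assumption about a possible approach, not a description of how the script actually parses its arguments; `str2bool`, `build_parser`, `split_args`, and the defaults for --hf_model_path and --max_kl_div are hypothetical.

```python
import argparse


def str2bool(value):
    """Parse 'True'/'False'-style strings, since --flag=True arrives as text."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Hypothetical parser covering the three flags used in the test command."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--run_hf_model", type=str2bool, default=False)
    parser.add_argument("--hf_model_path", type=str, default="")  # assumed default
    parser.add_argument("--max_kl_div", type=float, default=None)  # assumed default
    return parser


def split_args(argv):
    """Return (parsed flags, remaining arguments left for the config loader)."""
    return build_parser().parse_known_args(argv)


flags, remaining = split_args([
    "MaxText/configs/base.yml",
    "model_name=gemma-2b",
    "--max_kl_div=0.015",
    "--run_hf_model=True",
    "--hf_model_path=google/gemma-2b",
])
print(flags.run_hf_model, flags.max_kl_div, remaining)
```

Using parse_known_args keeps the key=value overrides untouched, so omitting --run_hf_model leaves the flag at False and the remaining arguments unchanged.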

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed.

@YixuanWang-99 YixuanWang-99 changed the title from "Enable conversion from Huggingface to Maxtext" to "Enable Checkpoint Conversion from Huggingface to Maxtext" Jun 16, 2025

@hengtaoguo hengtaoguo left a comment

Excellent work!

@hengtaoguo

Hi @gagika! I've heard this might be interesting to you for loading/saving HF checkpoints. Would you like to take a look when you get a chance? Thanks a lot for your time!

@shralex shralex left a comment

Thanks Yixuan! Added a few comments

@shralex shralex left a comment

Thanks for addressing the comments! I have one small comment and also a question: did you test both directions, to and from HF? If so, can you add both to the testing section of the PR description? Currently it includes one example. Thanks!

YixuanWang-99 commented Jun 24, 2025

> Thanks for addressing the comments! I have one small comment and also a question: did you test both directions, to and from HF? If so, can you add both to the testing section of the PR description? Currently it includes one example. Thanks!

The from-HF conversion, with examples, was pushed in previous PRs #1785 and #1821. I have also revised the run name.

@YixuanWang-99 YixuanWang-99 changed the title from "Enable Checkpoint Conversion from Huggingface to Maxtext" to "Improve forward_pass_logit_checker.py to perform mutual conversion check" Jun 27, 2025

shralex commented Jun 28, 2025

@YixuanWang-99 thank you for consolidating these files. Before merging this, let's make sure that the end-to-end tests using the forward logits checker still work. Can you please run a couple of these tests?
@khatwanimohit I believe you're familiar with the forward logits checker; could you please also review?

@YixuanWang-99 YixuanWang-99 force-pushed the yixuannwang-test2 branch 2 times, most recently from b2faae0 to d8de947 Compare June 30, 2025 18:16

hengtaoguo commented Jun 30, 2025

> @YixuanWang-99 thank you for consolidating these files. Before merging this, let's make sure that the end-to-end tests using the forward logits checker still work. Can you please run a couple of these tests? @khatwanimohit I believe you're familiar with the forward logits checker; could you please also review?

Thank you for the constructive feedback! The new flag run_hf_model aims to add functionality without impacting the existing nightly tests. We ran a local end-to-end test for the gemma-2b model, and it passed with kl_div < max_kl_div (0.015).

python3 -m MaxText.tests.forward_pass_logit_checker  MaxText/configs/base.yml tokenizer_path=assets/tokenizer.gemma load_parameters_path=gs://runner-maxtext-logs/unscanned_chkpt_2025-06-30-04-17/checkpoints/0/items run_name=forward_pass_test_gemma2b per_device_batch_size=1 model_name=gemma-2b max_prefill_predict_length=4 max_target_length=4 dataset_type=synthetic scan_layers=false attention=dot_product --max_kl_div=0.015

Workload that runs a full test_gemma.sh: link
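
For readers unfamiliar with the pass criterion kl_div < max_kl_div mentioned above, the following is a minimal sketch of a per-position KL-divergence check over logits. It is an illustration under stated assumptions, not the checker's implementation: the direction KL(reference || test), the use of the worst position, and the names `kl_divergence` and `check_kl` are all hypothetical, and the sample logits are made up.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_divergence(ref_logits, test_logits):
    """KL(P_ref || P_test) in nats, computed from raw logits."""
    log_p = log_softmax(ref_logits)
    log_q = log_softmax(test_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def check_kl(ref_seq, test_seq, max_kl_div):
    """Return (worst per-position KL, whether it is below the threshold)."""
    per_position = [kl_divergence(r, t) for r, t in zip(ref_seq, test_seq)]
    worst = max(per_position)
    return worst, worst < max_kl_div


# Made-up logits for a two-token sequence over a three-token vocabulary.
ref_seq = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]
test_seq = [[1.01, 2.0, 2.99], [0.5, 0.5, 0.5]]
worst, passed = check_kl(ref_seq, test_seq, max_kl_div=0.015)
print(worst, passed)
```

Working in log space avoids dividing by very small probabilities, and adding a constant to every logit leaves the result unchanged because softmax is shift-invariant.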

@copybara-service copybara-service bot merged commit a832a34 into main Jun 30, 2025
18 checks passed
@copybara-service copybara-service bot deleted the yixuannwang-test2 branch June 30, 2025 20:54