Support gemma3 full model bidirectional ckpt conversion.#1983

Merged: copybara-service[bot] merged 1 commit into main from yixuannwang-test2 on Jul 24, 2025

Conversation

@YixuanWang-99 (Collaborator) commented Jul 17, 2025

Description

This PR adds bidirectional checkpoint conversion between MaxText and Hugging Face for the Gemma3 multimodal (vision + text) model.

  • Added parameter mappings, layerwise hook functions, and HF shapes/configs for Gemma3 multimodal conversion.
  • Removed the text-only Gemma3 support, which is now redundant.
  • Fixed the OOM error by adding a JAX config.
  • Improved to_huggingface.py so that it no longer stores the *.safetensors files, the config file, or the index file locally when publishing a checkpoint to the Hugging Face Hub or GCS, which greatly reduces disk usage.
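
To make the first bullet more concrete, a parameter mapping with per-layer hooks might look roughly like the sketch below. This is an illustration only, not the code in this PR: the parameter names, the `PARAM_MAP` table, and the `transpose_2d` hook are all hypothetical, and nested lists stand in for real arrays.

```python
from typing import Callable, Dict, List, Tuple

Matrix = List[List[float]]
Hook = Callable[[Matrix], Matrix]


def transpose_2d(w: Matrix) -> Matrix:
    """Swap rows and columns; applying it twice restores the input."""
    return [list(col) for col in zip(*w)]


def identity(w: Matrix) -> Matrix:
    """Pass the weight through unchanged."""
    return w


# Hypothetical table: source name -> (target name, forward hook, inverse hook).
PARAM_MAP: Dict[str, Tuple[str, Hook, Hook]] = {
    "src.layer_0.kernel": ("dst.layer_0.weight", transpose_2d, transpose_2d),
    "src.norm.scale": ("dst.norm.weight", identity, identity),
}


def convert(params: Dict[str, Matrix], reverse: bool = False) -> Dict[str, Matrix]:
    """Rename every weight and apply its hook, in either direction."""
    out: Dict[str, Matrix] = {}
    for src, (dst, forward, inverse) in PARAM_MAP.items():
        if reverse:
            out[src] = inverse(params[dst])
        else:
            out[dst] = forward(params[src])
    return out
```

Keeping the forward and inverse hooks side by side in one table means a single mapping can drive both conversion directions, and a round trip (`convert` followed by `convert(..., reverse=True)`) gives a cheap sanity check.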

Tests

Both conversion directions were tested by running decode on the converted checkpoints. By visual inspection, the generated outputs match the original.

python -m MaxText.decode MaxText/configs/base.yml model_name=gemma3-4b tokenizer_path=assets/tokenizer.gemma3 load_parameters_path=gs://yixuannwang-maxtext-logs/gemma3-4b/0/items per_device_batch_size=1 run_name=ht_test max_prefill_predict_length=272 max_target_length=300 steps=1 async_checkpointing=false scan_layers=false use_multimodal=true prompt=\'Describe\ image\ \<start_of_image\>\' image_path=\'MaxText/test_assets/test_image.jpg\' attention=\'dot_product\'

Original decode output
Converted Maxtext ckpt output
Converted Huggingface ckpt output

The converted Hugging Face checkpoint is published at: https://huggingface.co/yixuan-99/gemma3-4b-it/tree/main

Checkpoint conversion for the Qwen3-14b model was also tested to verify that the OOM error is fixed. Test outputs: 1) tokens 2) KL divergence
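
For readers unfamiliar with the KL divergence check, it compares the next-token distributions produced by the reference and the converted checkpoint. The following is a minimal stdlib-only sketch, not the test code used in this PR; the function names are made up for illustration.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_divergence(ref_logits: Sequence[float], test_logits: Sequence[float]) -> float:
    """KL(P_ref || P_test) in nats for a single token position."""
    log_p = log_softmax(ref_logits)
    log_q = log_softmax(test_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
```

A value close to zero at every position suggests the two checkpoints produce nearly the same distribution; what threshold counts as "close" is a judgment call that depends on dtype and hardware.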

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed.

@hengtaoguo self-assigned this Jul 17, 2025

@hengtaoguo (Collaborator) left a comment

Can you add some instructions for running the Gemma3 full model conversion? Thanks!

Also, please file a bug to improve forward_pass_logits_checker so that it includes a multimodal correctness check.

Collaborator

IIUC, this code supports two ways to save the HF ckpt: (1) upload to a GCS bucket and (2) publish directly on HF.

For the GCS option, does it still require saving all the files to a local directory first? Would it be possible to also support memory-to-GCS uploads in the future?

Collaborator Author

At this point, for both GCS and HF uploads, the .safetensors files account for most of the disk usage, and this PR uploads them directly from memory to the remote directory.
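
The general idea of keeping artifacts in memory can be sketched as follows. This is not the PR's implementation: `build_index_bytes` and `collect_small_artifacts` are hypothetical helpers, and the JSON layout assumes the usual `model.safetensors.index.json` convention (a `metadata.total_size` field plus a `weight_map` from tensor name to shard file). The same pattern would apply to weight shards whenever the serializer can return bytes instead of writing a file.

```python
import json
from typing import Any, Dict


def build_index_bytes(weight_map: Dict[str, str], total_size: int) -> bytes:
    """Serialize a shard index to bytes without touching the local disk."""
    index = {"metadata": {"total_size": total_size}, "weight_map": weight_map}
    return json.dumps(index, indent=2).encode("utf-8")


def collect_small_artifacts(
    config: Dict[str, Any], weight_map: Dict[str, str], total_size: int
) -> Dict[str, bytes]:
    """Return filename -> payload for the config and index files, all in memory."""
    return {
        "config.json": json.dumps(config, indent=2).encode("utf-8"),
        "model.safetensors.index.json": build_index_bytes(weight_map, total_size),
    }
```

Each payload can then be handed to whatever remote client is in use, provided that client accepts bytes or a file-like object.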

@shralex (Collaborator) left a comment

Thanks Yixuan! General comment: is it possible to make this implementation more modular? Right now, every type of file has its own implementation of uploading to HF, uploading to GCS, and saving to local disk. Can we create shared upload methods and then reuse them? Also, for GCS the implementation is optimized for checkpoint files but still stores other files locally, while for HF it always uploads directly.
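
One way to read this suggestion is a single upload entry point that takes a destination object, so that each file type no longer needs its own HF, GCS, and local code path. The sketch below is only an illustration of that shape: the `Uploader` protocol, `LocalUploader`, and `save_all` are hypothetical names, and the GCS and HF backends are left out; they would implement the same `put` method.

```python
from pathlib import Path
from typing import Dict, Protocol


class Uploader(Protocol):
    """Anything that can store a named blob at some destination."""

    def put(self, name: str, payload: bytes) -> None: ...


class LocalUploader:
    """Writes blobs under a root directory on the local disk."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def put(self, name: str, payload: bytes) -> None:
        target = self.root / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(payload)


def save_all(files: Dict[str, bytes], uploader: Uploader) -> None:
    """Send every file through the same code path, whatever the destination."""
    for name, payload in files.items():
        uploader.put(name, payload)
```

With this shape, checkpoint shards, the config file, and the index file all flow through `save_all`, and choosing between local disk, GCS, and HF comes down to which uploader object is passed in.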

Comment thread MaxText/utils/ckpt_conversion/utils/utils.py Outdated
@copybara-service copybara-service Bot merged commit c974396 into main Jul 24, 2025
19 checks passed
@copybara-service copybara-service Bot deleted the yixuannwang-test2 branch July 24, 2025 04:17
3 participants