Support gemma3 full model bidirectional ckpt conversion. #1983
copybara-service[bot] merged 1 commit into main
Conversation
hengtaoguo
left a comment
Can you add some instructions for running the Gemma3 full model conversion? Thanks!
Also, please file a bug for improving forward_pass_logits_checker to include a multimodal correctness check.
IIUC, this code supports two ways to save the HF ckpt: (1) upload to a GCS bucket and (2) publish directly on HF.
For the GCS option, does it still require saving all the files to a local directory first? Would it be possible to do a memory-to-GCS upload in the future as well?
At this point, for both GCS and HF uploading, the .safetensors files account for the most significant portion of disk usage, and this PR optimizes them to be uploaded from memory directly to the remote directory.
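The memory-to-remote pattern described above can be sketched as follows. This is an illustrative stand-in, not the PR's actual code: `serialize_weights`, `upload_from_memory`, and the simplified header layout are assumptions, and the `upload_fn` callback stands in for a real GCS or HF upload call (e.g. a blob upload), so no temporary files ever touch local disk.

```python
import io
import json
import struct

def serialize_weights(weights: dict) -> bytes:
    """Pack float32 weight lists into a simple header+payload byte blob.

    Loosely modeled on the safetensors layout (8-byte header length,
    JSON header, raw tensor bytes); simplified for illustration.
    """
    header, payload, offset = {}, io.BytesIO(), 0
    for name, values in weights.items():
        data = struct.pack(f"{len(values)}f", *values)
        header[name] = {"dtype": "F32", "offset": offset, "nbytes": len(data)}
        payload.write(data)
        offset += len(data)
    header_bytes = json.dumps(header).encode("utf-8")
    buf = io.BytesIO()
    buf.write(struct.pack("<Q", len(header_bytes)))  # 8-byte header length
    buf.write(header_bytes)
    buf.write(payload.getvalue())
    return buf.getvalue()

def upload_from_memory(weights: dict, dest: str, upload_fn) -> int:
    """Serialize entirely in memory and hand the bytes to the uploader."""
    blob = serialize_weights(weights)
    upload_fn(dest, blob)  # in production: GCS blob upload / HF upload
    return len(blob)

# Usage: a dict stands in for the remote bucket.
fake_bucket = {}
n = upload_from_memory({"w": [1.0, 2.0]}, "ckpt/model.safetensors",
                       lambda path, data: fake_bucket.__setitem__(path, data))
```

The same callback shape would let the checkpoint writer stay agnostic about whether the destination is GCS or the HF Hub.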
shralex
left a comment
Thanks Yixuan! General comment: is it possible to make this implementation more modular? Right now, every file type has a separate implementation for uploading to HF, uploading to GCS, and writing to local disk. Can we create a single upload method and reuse it? Also, for GCS the checkpoint-file upload is optimized, but the other files are still stored locally, while for HF everything is uploaded directly.
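The modularization suggested here could look like a single entry point with pluggable backends, so each file type goes through one code path instead of three copies. This is a hedged sketch of the suggestion, not code from the PR; all names (`upload_artifact`, `local_backend`, `memory_backend`) are assumptions, and the in-memory backend stands in for the real GCS or HF upload call.

```python
import os

def local_backend(root: str):
    """Backend that writes bytes under a local root directory."""
    def upload(relpath: str, data: bytes):
        path = os.path.join(root, relpath)
        parent = os.path.dirname(path)
        if parent:
            os.makedirs(parent, exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)
    return upload

def memory_backend(store: dict):
    """Stand-in for a remote backend (GCS blob upload / HF upload_file)."""
    def upload(relpath: str, data: bytes):
        store[relpath] = data
    return upload

def upload_artifact(relpath: str, data: bytes, backends):
    """One path for every file type (config, tokenizer, .safetensors).

    Adding a destination means adding one backend, not N per-file copies.
    """
    for upload in backends:
        upload(relpath, data)

# Usage: route a config file to an in-memory "remote" store.
store = {}
upload_artifact("config.json", b'{"model": "gemma3"}', [memory_backend(store)])
```

With this shape, the optimized memory-to-GCS path and the direct HF path would both just be backends, and nothing needs to be staged locally unless a local backend is requested.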
Description
This PR adds support for bidirectional conversion of Gemma3 multimodal (vision + text) checkpoints between MaxText and Hugging Face.
Tests
Bidirectional conversion is tested by running decode with the converted checkpoints. Eyeballing the generated outputs, they match the originals exactly.
Original decode output
Converted Maxtext ckpt output
Converted Huggingface ckpt output
The converted Hugging Face checkpoint is published at: https://huggingface.co/yixuan-99/gemma3-4b-it/tree/main
Qwen3-14B checkpoint conversion was also tested to verify that the OOM error is fixed. Test outputs: 1) tokens, 2) KL divergence.
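The kind of numerical check behind the "KL divergence" test output above can be sketched as comparing per-token logits of the original and converted checkpoints. The internals of forward_pass_logits_checker are not shown in this PR, so this is an illustrative stand-in with assumed names (`check_conversion`, `tol`), not the actual test code.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one logit vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p_logits, q_logits):
    """KL(P || Q) between two logit vectors at one token position."""
    p, q = softmax(p_logits), softmax(q_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def check_conversion(orig_logits, converted_logits, tol=1e-3):
    """Return (max per-token KL, pass/fail) across all token positions."""
    max_kl = max(kl_divergence(a, b)
                 for a, b in zip(orig_logits, converted_logits))
    return max_kl, max_kl <= tol

# Usage: identical logits give zero divergence and pass the check.
max_kl, ok = check_conversion([[0.1, 2.0, -1.0]], [[0.1, 2.0, -1.0]])
```

Extending the checker to multimodal correctness (as requested in review) would mean feeding image + text inputs through both models and applying the same comparison to the resulting logits.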
Checklist
Before submitting this PR, please make sure (put an X in the square brackets):