Bug report
The official model-conversion instructions fail on a standard TPU-v5p VM for large models such as Qwen3-235B, hitting CPU OOM errors while sharding the MoE layers. Since CPU memory on a GCP VM is fixed (400 GB), can the script be improved to work around this, or are the docs outdated?
The sharding step of the conversion is also very slow with the default simulated_cpu_devices_count=16, even for a 30B model.
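One possible direction (a sketch only, not the conversion script's actual API, and the helper names `load_layer`/`save_shard` are hypothetical): stream the checkpoint layer by layer and write each layer's shards out before loading the next, so peak host memory is roughly one layer rather than the whole 235B model.

```python
import numpy as np

def convert_streaming(num_layers, shard_count, load_layer, save_shard):
    """Shard a checkpoint one layer at a time.

    Peak host memory stays at ~one layer's weights instead of the full
    model, which matters for large MoE checkpoints on a 400 GB host.
    """
    for layer_idx in range(num_layers):
        weights = load_layer(layer_idx)            # load only this layer
        # Split along the last axis; the real script may shard differently.
        shards = np.array_split(weights, shard_count, axis=-1)
        for shard_idx, shard in enumerate(shards):
            save_shard(layer_idx, shard_idx, shard)
        del weights, shards                        # free before next layer

# Tiny in-memory demo of the calling convention.
store = {}
convert_streaming(
    num_layers=2,
    shard_count=4,
    load_layer=lambda i: np.ones((8, 16)) * i,
    save_shard=lambda l, s, w: store.__setitem__((l, s), w),
)
```

Whether this fits depends on the checkpoint format supporting per-layer reads (e.g. safetensors-style lazy loading); if every tensor has to be materialized up front, the OOM would persist regardless.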
Logs/Output
No response
Environment Information
No response
Additional Context
No response