Bug report
The official model-conversion instructions fail on a standard TPU-v5p VM for large models such as Qwen3-235B, hitting CPU OOM errors while sharding the MoE layers. Since CPU memory on a GCP VM is fixed (400 GB), can the script be improved to work around this, or are the docs outdated?
The sharding step of the conversion is also very slow with the default simulated_cpu_devices_count=16, even for a 30B model.
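One possible direction (a sketch only, not the conversion script's actual API, and the helper names `load_layer`/`save_shard` are hypothetical): stream the checkpoint layer by layer and write each layer's shards out before loading the next, so peak host memory is roughly one layer rather than the whole 235B model.

```python
import numpy as np

def convert_streaming(num_layers, shard_count, load_layer, save_shard):
    """Shard a checkpoint one layer at a time.

    Peak host memory stays at ~one layer's weights instead of the full
    model, which matters for large MoE checkpoints on a 400 GB host.
    """
    for layer_idx in range(num_layers):
        weights = load_layer(layer_idx)            # load only this layer
        # Split along the last axis; the real script may shard differently.
        shards = np.array_split(weights, shard_count, axis=-1)
        for shard_idx, shard in enumerate(shards):
            save_shard(layer_idx, shard_idx, shard)
        del weights, shards                        # free before next layer

# Tiny in-memory demo of the calling convention.
store = {}
convert_streaming(
    num_layers=2,
    shard_count=4,
    load_layer=lambda i: np.ones((8, 16)) * i,
    save_shard=lambda l, s, w: store.__setitem__((l, s), w),
)
```

Whether this fits depends on the checkpoint format supporting per-layer reads (e.g. safetensors-style lazy loading); if every tensor has to be materialized up front, the OOM would persist regardless.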
Logs/Output
No response
Environment Information
No response
Additional Context
No response