
fix: add --simulated_cpu_devices_count to to_huggingface.py to prevent OOM#3192

Merged
copybara-service[bot] merged 1 commit into AI-Hypercomputer:main from kryvokhyzha:fix/to-hf-oom-error
Feb 19, 2026

Conversation

@kryvokhyzha
Contributor

@kryvokhyzha kryvokhyzha commented Feb 19, 2026

Description

Fix OOM in to_huggingface.py caused by 16× weight replication during Orbax checkpoint restore. Add --simulated_cpu_devices_count flag (default 16).

Problem

to_huggingface.py hardcodes xla_force_host_platform_device_count=16, creating 16 simulated CPU devices. load_orbax_checkpoint() then builds a mesh with all devices and restores every parameter with PartitionSpec().
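The replication mechanism is easy to reproduce in isolation. The sketch below is illustrative (it is not the converter's actual code) and assumes a CPU-only JAX install; with an empty PartitionSpec(), device_put materializes one full copy of the array per simulated device:

```python
# The XLA flag must be set before JAX is first imported.
import os
os.environ["XLA_FLAGS"] = "--xla_force_host_platform_device_count=4"
os.environ["JAX_PLATFORMS"] = "cpu"

import jax
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec

devices = jax.devices()  # 4 simulated CPU devices
mesh = Mesh(np.array(devices), axis_names=("all",))
# Empty PartitionSpec() = no sharded axes = full replication on every device.
replicated = NamedSharding(mesh, PartitionSpec())

x = jax.device_put(np.ones((1024, 1024), np.float32), replicated)
# Each device holds its own complete copy, so host memory is 4x the array size.
print(len(x.addressable_shards), x.addressable_shards[0].data.shape)
```

With 16 devices instead of 4, the same pattern multiplies every restored parameter by 16.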

For gemma3-4b in float32 (~14.5 GiB), the checkpoint load alone therefore consumes 16 × 14.5 ≈ 232 GiB of memory, far exceeding typical CPU node RAM.
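The back-of-the-envelope math, written out as a hypothetical helper (the function name is illustrative):

```python
def replicated_restore_gib(checkpoint_gib: float, num_devices: int) -> float:
    """Approximate host RAM needed when every parameter is fully replicated
    across the simulated devices during checkpoint restore."""
    return checkpoint_gib * num_devices

# gemma3-4b in float32 is ~14.5 GiB; 16 simulated devices replicate it 16x.
print(replicated_restore_gib(14.5, 16))  # 232.0 GiB
print(replicated_restore_gib(14.5, 1))   # 14.5 GiB with a single device
```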

Fix

  • Move the JAX platform and XLA flag configuration from main() to the __main__ block (before JAX initialization), following the same pattern as to_maxtext.py.
  • Add --simulated_cpu_devices_count argparse flag (default 16) that is pre-parsed before absl.app.run(), matching the existing flag in to_maxtext.py.

This preserves backward compatibility: users who explicitly need one device can pass --simulated_cpu_devices_count=1.
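The pre-parsing pattern described above can be sketched as follows (a minimal illustration of the approach, not the repository's exact code; the helper name is hypothetical). The key constraint is ordering: the flag must be read and XLA_FLAGS exported before JAX is first imported, which is why it cannot live inside main():

```python
import argparse
import os
import sys

def preparse_device_count(argv):
    """Extract --simulated_cpu_devices_count before absl.app.run() sees argv."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--simulated_cpu_devices_count", type=int, default=16)
    known, remaining = parser.parse_known_args(argv)
    return known.simulated_cpu_devices_count, remaining

if __name__ == "__main__":
    count, remaining_argv = preparse_device_count(sys.argv[1:])
    # Must happen before the first `import jax` anywhere in the process.
    os.environ["XLA_FLAGS"] = (
        os.environ.get("XLA_FLAGS", "")
        + f" --xla_force_host_platform_device_count={count}"
    )
    os.environ["JAX_PLATFORMS"] = "cpu"
    # absl.app.run(main, ...) with the remaining argv would follow here.
```

parse_known_args leaves the config-style key=value arguments untouched for absl to consume, so existing invocations keep working.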

Tests

  • Tested gemma3-4b Orbax -> HuggingFace conversion on an n2-standard-64 node (256 GiB RAM) in GKE with 64 GiB memory limit:
    • Before fix: OOM killed at ~231 GiB during checkpoint restore
    • After fix: completed successfully with ~40 GiB peak memory
  • Verified the converted HF checkpoint loads correctly

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@kryvokhyzha kryvokhyzha changed the title Fix/to hf oom error fix: add --simulated_cpu_devices_count to to_huggingface.py to prevent OOM Feb 19, 2026
@kryvokhyzha
Contributor Author

kryvokhyzha commented Feb 19, 2026

  1. Command:
python3 -u -m maxtext.checkpoint_conversion.to_huggingface \
        maxtext/configs/base.yml \
        model_name=gemma3-4b \
        hf_access_token=${HF_AUTH_TOKEN} \
        load_parameters_path=${LOAD_PATH} \
        base_output_directory=gs://mymodel-training/checkpoints/gemma3-4b-pt-hf \
        per_device_batch_size=1 \
        run_name=export \
        scan_layers=true \
        hardware=cpu \
        skip_jax_distributed_system=True \
        checkpoint_storage_concurrent_gb=16 \
        --simulated_cpu_devices_count=1
  2. Output:
Auto-detected latest step: 0
Loading checkpoint from: gs://mymodel-training/checkpoints/gemma-3-4b-pt-orbax/0/items
2026-02-19 19:33:57.458525: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1771529637.473906     150 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1771529637.478493     150 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1771529637.491083     150 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1771529637.491102     150 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1771529637.491104     150 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1771529637.491106     150 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
Patched XLA device count → 1 in /deps/src/maxtext/checkpoint_conversion/to_huggingface.py
2026-02-19 19:34:17.660784: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1771529657.675751     283 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1771529657.680202     283 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1771529657.691829     283 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1771529657.691847     283 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1771529657.691849     283 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1771529657.691851     283 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
2026-02-19 19:34:21.971369: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
I0219 19:34:22.182761 136489015797568 max_utils.py:197] Skipping jax distributed system due to skip_jax_distributed_system=True flag.
I0219 19:34:22.205413 136489015797568 max_utils.py:328]  Setting num_slices=1 for CPU hardware type
I0219 19:34:22.206426 136489015797568 pyconfig.py:333] Config param act_quantization_calibration_method: absmax
I0219 19:34:22.206502 136489015797568 pyconfig.py:333] Config param activation_dropout_for_audio: 0.0
I0219 19:34:22.206539 136489015797568 pyconfig.py:333] Config param activation_function_for_audio: gelu
I0219 19:34:22.206569 136489015797568 pyconfig.py:333] Config param activations_in_float32: False
I0219 19:34:22.206598 136489015797568 pyconfig.py:333] Config param adam_b1: 0.9
I0219 19:34:22.206625 136489015797568 pyconfig.py:333] Config param adam_b2: 0.95
I0219 19:34:22.206650 136489015797568 pyconfig.py:333] Config param adam_eps: 1e-08
I0219 19:34:22.206678 136489015797568 pyconfig.py:333] Config param adam_eps_root: 0.0
I0219 19:34:22.206699 136489015797568 pyconfig.py:333] Config param adam_weight_decay: 0.1
I0219 19:34:22.206723 136489015797568 pyconfig.py:333] Config param add_bos: True
I0219 19:34:22.206748 136489015797568 pyconfig.py:333] Config param add_eos: True
I0219 19:34:22.206770 136489015797568 pyconfig.py:333] Config param allow_split_physical_axes: False
I0219 19:34:22.206793 136489015797568 pyconfig.py:333] Config param ar_cache_axis_order: 1,2,0,3
I0219 19:34:22.206815 136489015797568 pyconfig.py:333] Config param async_checkpointing: True
I0219 19:34:22.206837 136489015797568 pyconfig.py:333] Config param attention: autoselected
I0219 19:34:22.206865 136489015797568 pyconfig.py:333] Config param attention_bias: False
I0219 19:34:22.206885 136489015797568 pyconfig.py:333] Config param attention_dropout_for_audio: 0.0
I0219 19:34:22.206907 136489015797568 pyconfig.py:333] Config param attention_out: RematLocation.REMAT
I0219 19:34:22.206944 136489015797568 pyconfig.py:333] Config param attention_sink: False
I0219 19:34:22.206980 136489015797568 pyconfig.py:333] Config param attention_type: global
I0219 19:34:22.207005 136489015797568 pyconfig.py:333] Config param attn_logits_soft_cap: None
I0219 19:34:22.207026 136489015797568 pyconfig.py:333] Config param audio_path: 
I0219 19:34:22.207048 136489015797568 pyconfig.py:333] Config param autoregressive_decode_assert: 
I0219 19:34:22.207070 136489015797568 pyconfig.py:333] Config param base_config: None
I0219 19:34:22.207092 136489015797568 pyconfig.py:333] Config param base_emb_dim: 2560
I0219 19:34:22.207113 136489015797568 pyconfig.py:333] Config param base_mlp_dim: 10240
I0219 19:34:22.207137 136489015797568 pyconfig.py:333] Config param base_moe_mlp_dim: 7168
I0219 19:34:22.207159 136489015797568 pyconfig.py:333] Config param base_num_decoder_layers: 34
I0219 19:34:22.207179 136489015797568 pyconfig.py:333] Config param base_num_kv_heads: 4
I0219 19:34:22.207198 136489015797568 pyconfig.py:333] Config param base_num_query_heads: 8
I0219 19:34:22.207217 136489015797568 pyconfig.py:333] Config param base_output_directory: gs://mymodel-training/checkpoints/gemma3-4b-pt-hf
I0219 19:34:22.207239 136489015797568 pyconfig.py:333] Config param batch_size: 1
I0219 19:34:22.207261 136489015797568 pyconfig.py:333] Config param batch_split_factor: 1
I0219 19:34:22.207281 136489015797568 pyconfig.py:333] Config param beta_fast: 32
I0219 19:34:22.207301 136489015797568 pyconfig.py:333] Config param beta_slow: 1
I0219 19:34:22.207322 136489015797568 pyconfig.py:333] Config param bwd_quantization_calibration_method: absmax
I0219 19:34:22.207344 136489015797568 pyconfig.py:333] Config param capacity_factor: -1.0
I0219 19:34:22.207366 136489015797568 pyconfig.py:333] Config param cast_logits_to_fp32: True
I0219 19:34:22.207387 136489015797568 pyconfig.py:333] Config param chat_template_path: 
I0219 19:34:22.207406 136489015797568 pyconfig.py:333] Config param checkpoint_conversion_fn: None
I0219 19:34:22.207426 136489015797568 pyconfig.py:333] Config param checkpoint_dir: gs://mymodel-training/checkpoints/gemma3-4b-pt-hf/export/checkpoints/
I0219 19:34:22.207448 136489015797568 pyconfig.py:333] Config param checkpoint_is_quantized: False
I0219 19:34:22.207469 136489015797568 pyconfig.py:333] Config param checkpoint_period: 10000
I0219 19:34:22.207489 136489015797568 pyconfig.py:333] Config param checkpoint_storage_concurrent_gb: 16
I0219 19:34:22.207510 136489015797568 pyconfig.py:333] Config param checkpoint_storage_target_data_file_size_bytes: 2147483648
I0219 19:34:22.207530 136489015797568 pyconfig.py:333] Config param checkpoint_storage_use_ocdbt: True
I0219 19:34:22.207551 136489015797568 pyconfig.py:333] Config param checkpoint_storage_use_zarr3: True
I0219 19:34:22.207571 136489015797568 pyconfig.py:333] Config param chips_per_vm: 4
I0219 19:34:22.207593 136489015797568 pyconfig.py:333] Config param chunk_attn_window_size: 0
I0219 19:34:22.207616 136489015797568 pyconfig.py:333] Config param collect_stack_trace: False
I0219 19:34:22.207635 136489015797568 pyconfig.py:333] Config param colocated_python_data_input: False
I0219 19:34:22.207656 136489015797568 pyconfig.py:333] Config param compile_topology: 
I0219 19:34:22.207680 136489015797568 pyconfig.py:333] Config param compile_topology_num_slices: -1
I0219 19:34:22.207701 136489015797568 pyconfig.py:333] Config param compiled_trainstep_file: 
I0219 19:34:22.207723 136489015797568 pyconfig.py:333] Config param compute_axis_order: 0,1,2,3
I0219 19:34:22.207745 136489015797568 pyconfig.py:333] Config param constant_bound_config: []
I0219 19:34:22.207769 136489015797568 pyconfig.py:333] Config param context: RematLocation.REMAT
I0219 19:34:22.207792 136489015797568 pyconfig.py:333] Config param context_parallel_load_balance: True
I0219 19:34:22.207814 136489015797568 pyconfig.py:333] Config param context_parallel_size: 1
I0219 19:34:22.207837 136489015797568 pyconfig.py:333] Config param context_parallel_strategy: all_gather
I0219 19:34:22.207864 136489015797568 pyconfig.py:333] Config param conv_chunksize_for_audio: 500
I0219 19:34:22.207883 136489015797568 pyconfig.py:333] Config param conv_stride_for_vit: 14
I0219 19:34:22.207905 136489015797568 pyconfig.py:333] Config param cost_estimate_flops_bwd: -1
I0219 19:34:22.207924 136489015797568 pyconfig.py:333] Config param cost_estimate_flops_fwd: -1
I0219 19:34:22.207964 136489015797568 pyconfig.py:333] Config param custom_mesh: 
I0219 19:34:22.207985 136489015797568 pyconfig.py:333] Config param d_model_for_audio: 256
I0219 19:34:22.208007 136489015797568 pyconfig.py:333] Config param data_sharding: (('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'),)
I0219 19:34:22.208034 136489015797568 pyconfig.py:333] Config param data_shuffle_seed: 0
I0219 19:34:22.208056 136489015797568 pyconfig.py:333] Config param dataset_name: c4/en:3.0.1
I0219 19:34:22.208076 136489015797568 pyconfig.py:333] Config param dataset_path: 
I0219 19:34:22.208095 136489015797568 pyconfig.py:333] Config param dataset_type: DatasetType.TFDS
I0219 19:34:22.208119 136489015797568 pyconfig.py:333] Config param dcn_autoregressive_parallelism: 1
I0219 19:34:22.208141 136489015797568 pyconfig.py:333] Config param dcn_context_autoregressive_parallelism: 1
I0219 19:34:22.208160 136489015797568 pyconfig.py:333] Config param dcn_context_parallelism: 1
I0219 19:34:22.208181 136489015797568 pyconfig.py:333] Config param dcn_data_parallelism: -1
I0219 19:34:22.208200 136489015797568 pyconfig.py:333] Config param dcn_diloco_parallelism: 1
I0219 19:34:22.208221 136489015797568 pyconfig.py:333] Config param dcn_expert_parallelism: 1
I0219 19:34:22.208240 136489015797568 pyconfig.py:333] Config param dcn_fsdp_parallelism: 1
I0219 19:34:22.208260 136489015797568 pyconfig.py:333] Config param dcn_fsdp_transpose_parallelism: 1
I0219 19:34:22.208279 136489015797568 pyconfig.py:333] Config param dcn_parallelism: [1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0219 19:34:22.208303 136489015797568 pyconfig.py:333] Config param dcn_pipeline_parallelism: 1
I0219 19:34:22.208325 136489015797568 pyconfig.py:333] Config param dcn_sequence_parallelism: 1
I0219 19:34:22.208346 136489015797568 pyconfig.py:333] Config param dcn_tensor_parallelism: 1
I0219 19:34:22.208366 136489015797568 pyconfig.py:333] Config param dcn_tensor_sequence_parallelism: 1
I0219 19:34:22.208387 136489015797568 pyconfig.py:333] Config param dcn_tensor_transpose_parallelism: 1
I0219 19:34:22.208409 136489015797568 pyconfig.py:333] Config param debug: {'rl': False}
I0219 19:34:22.208432 136489015797568 pyconfig.py:333] Config param debug_sharding: False
I0219 19:34:22.208458 136489015797568 pyconfig.py:333] Config param decode_sampling_nucleus_p: -1
I0219 19:34:22.208480 136489015797568 pyconfig.py:333] Config param decode_sampling_strategy: SamplingStrategy.GREEDY
I0219 19:34:22.208503 136489015797568 pyconfig.py:333] Config param decode_sampling_temperature: 1.0
I0219 19:34:22.208526 136489015797568 pyconfig.py:333] Config param decode_sampling_top_k: 0
I0219 19:34:22.208545 136489015797568 pyconfig.py:333] Config param decoder_block: DecoderBlockType.GEMMA3
I0219 19:34:22.208568 136489015797568 pyconfig.py:333] Config param decoder_layer_input: RematLocation.DEVICE
I0219 19:34:22.208593 136489015797568 pyconfig.py:333] Config param deepstack_visual_indexes_for_vit: []
I0219 19:34:22.208615 136489015797568 pyconfig.py:333] Config param diloco_outer_lr: 0.3
I0219 19:34:22.208636 136489015797568 pyconfig.py:333] Config param diloco_outer_momentum: 0.9
I0219 19:34:22.208657 136489015797568 pyconfig.py:333] Config param diloco_sync_period: 36
I0219 19:34:22.208677 136489015797568 pyconfig.py:333] Config param distill_alpha: 0.5
I0219 19:34:22.208699 136489015797568 pyconfig.py:333] Config param distill_temperature: 1.0
I0219 19:34:22.208718 136489015797568 pyconfig.py:333] Config param downsample_hidden_size_for_audio: 256
I0219 19:34:22.208737 136489015797568 pyconfig.py:333] Config param dpo_beta: 0.1
I0219 19:34:22.208759 136489015797568 pyconfig.py:333] Config param dpo_label_smoothing: 0.0
I0219 19:34:22.208780 136489015797568 pyconfig.py:333] Config param dq_reduction_steps: 0
I0219 19:34:22.208801 136489015797568 pyconfig.py:333] Config param dropout_rate: 0.0
I0219 19:34:22.208822 136489015797568 pyconfig.py:333] Config param dtype: bfloat16
I0219 19:34:22.208865 136489015797568 pyconfig.py:333] Config param dtype_mm: float32
I0219 19:34:22.208895 136489015797568 pyconfig.py:333] Config param dump_hlo: False
I0219 19:34:22.208919 136489015797568 pyconfig.py:333] Config param dump_hlo_delete_local_after: True
I0219 19:34:22.208957 136489015797568 pyconfig.py:333] Config param dump_hlo_gcs_dir: gs://mymodel-training/checkpoints/gemma3-4b-pt-hf/export/xla_dump
I0219 19:34:22.208982 136489015797568 pyconfig.py:333] Config param dump_hlo_local_dir: /tmp/xla_dump/
I0219 19:34:22.209005 136489015797568 pyconfig.py:333] Config param dump_hlo_local_module_name: jit_train_step
I0219 19:34:22.209027 136489015797568 pyconfig.py:333] Config param dump_hlo_module_name: jit_train_step
I0219 19:34:22.209053 136489015797568 pyconfig.py:333] Config param dump_hlo_upload_all: False
I0219 19:34:22.209074 136489015797568 pyconfig.py:333] Config param dump_hlo_xla_flags: 
I0219 19:34:22.209094 136489015797568 pyconfig.py:333] Config param dump_jaxpr: False
I0219 19:34:22.209115 136489015797568 pyconfig.py:333] Config param dump_jaxpr_delete_local_after: True
I0219 19:34:22.209138 136489015797568 pyconfig.py:333] Config param dump_jaxpr_gcs_dir: gs://mymodel-training/checkpoints/gemma3-4b-pt-hf/export/jaxpr_dump
I0219 19:34:22.209161 136489015797568 pyconfig.py:333] Config param dump_jaxpr_local_dir: /tmp/jaxpr_dump/
I0219 19:34:22.209183 136489015797568 pyconfig.py:333] Config param dump_step: -1
I0219 19:34:22.209207 136489015797568 pyconfig.py:333] Config param emb_dim: 2560
I0219 19:34:22.209234 136489015797568 pyconfig.py:333] Config param enable_checkpoint_cloud_logger: False
I0219 19:34:22.209256 136489015797568 pyconfig.py:333] Config param enable_checkpointing: True
I0219 19:34:22.209275 136489015797568 pyconfig.py:333] Config param enable_continuous_checkpointing: False
I0219 19:34:22.209296 136489015797568 pyconfig.py:333] Config param enable_data_shuffling: True
I0219 19:34:22.209315 136489015797568 pyconfig.py:333] Config param enable_diloco: False
I0219 19:34:22.209337 136489015797568 pyconfig.py:333] Config param enable_dp_attention: False
I0219 19:34:22.209363 136489015797568 pyconfig.py:333] Config param enable_dropout: True
I0219 19:34:22.209387 136489015797568 pyconfig.py:333] Config param enable_emergency_checkpoint: False
I0219 19:34:22.209407 136489015797568 pyconfig.py:333] Config param enable_gcp_goodput_metrics: True
I0219 19:34:22.209425 136489015797568 pyconfig.py:333] Config param enable_gcp_step_deviation_metrics: True
I0219 19:34:22.209446 136489015797568 pyconfig.py:333] Config param enable_goodput_recording: False
I0219 19:34:22.209465 136489015797568 pyconfig.py:333] Config param enable_jax_profiler: False
I0219 19:34:22.209484 136489015797568 pyconfig.py:333] Config param enable_llm_inference_pool: False
I0219 19:34:22.209505 136489015797568 pyconfig.py:333] Config param enable_model_warmup: False
I0219 19:34:22.209526 136489015797568 pyconfig.py:333] Config param enable_multi_tier_checkpointing: False
I0219 19:34:22.209547 136489015797568 pyconfig.py:333] Config param enable_nnx: False
I0219 19:34:22.209567 136489015797568 pyconfig.py:333] Config param enable_orbax_v1: False
I0219 19:34:22.209586 136489015797568 pyconfig.py:333] Config param enable_padding_causal_mask: True
I0219 19:34:22.209609 136489015797568 pyconfig.py:333] Config param enable_pathways_goodput: False
I0219 19:34:22.209631 136489015797568 pyconfig.py:333] Config param enable_prefix_caching: False
I0219 19:34:22.209652 136489015797568 pyconfig.py:333] Config param enable_rampup_batch_size: False
I0219 19:34:22.209671 136489015797568 pyconfig.py:333] Config param enable_single_controller: False
I0219 19:34:22.209692 136489015797568 pyconfig.py:333] Config param enable_single_replica_ckpt_restoring: False
I0219 19:34:22.209711 136489015797568 pyconfig.py:333] Config param enable_tensorboard: True
I0219 19:34:22.209730 136489015797568 pyconfig.py:333] Config param enable_tunix_perf_metrics: False
I0219 19:34:22.209751 136489015797568 pyconfig.py:333] Config param encoder_attention_heads_for_audio: 4
I0219 19:34:22.209773 136489015797568 pyconfig.py:333] Config param encoder_ffn_dim_for_audio: 512
I0219 19:34:22.209794 136489015797568 pyconfig.py:333] Config param encoder_layers_for_audio: 2
I0219 19:34:22.209814 136489015797568 pyconfig.py:333] Config param eval_corr_lst: False
I0219 19:34:22.209835 136489015797568 pyconfig.py:333] Config param eval_data_columns: ['text']
I0219 19:34:22.209862 136489015797568 pyconfig.py:333] Config param eval_dataset_name: c4/en:3.0.1
I0219 19:34:22.209882 136489015797568 pyconfig.py:333] Config param eval_image_column: image
I0219 19:34:22.209902 136489015797568 pyconfig.py:333] Config param eval_interval: -1
I0219 19:34:22.209921 136489015797568 pyconfig.py:333] Config param eval_make_lst: False
I0219 19:34:22.209959 136489015797568 pyconfig.py:333] Config param eval_per_device_batch_size: 1
I0219 19:34:22.209984 136489015797568 pyconfig.py:333] Config param eval_sampling_strategy: greedy
I0219 19:34:22.210006 136489015797568 pyconfig.py:333] Config param eval_split: validation
I0219 19:34:22.210029 136489015797568 pyconfig.py:333] Config param eval_steps: -1
I0219 19:34:22.210051 136489015797568 pyconfig.py:333] Config param expansion_factor_real_data: -1.0
I0219 19:34:22.210071 136489015797568 pyconfig.py:333] Config param expert_shard_attention_option: fsdp
I0219 19:34:22.210093 136489015797568 pyconfig.py:333] Config param final_logits_soft_cap: None
I0219 19:34:22.210117 136489015797568 pyconfig.py:333] Config param first_num_dense_layers: 0
I0219 19:34:22.210139 136489015797568 pyconfig.py:333] Config param float32_logits: False
I0219 19:34:22.210158 136489015797568 pyconfig.py:333] Config param float32_qk_product: False
I0219 19:34:22.210179 136489015797568 pyconfig.py:333] Config param float32_weight_sum: True
I0219 19:34:22.210199 136489015797568 pyconfig.py:333] Config param force_q_layout: False
I0219 19:34:22.210218 136489015797568 pyconfig.py:333] Config param force_unroll: False
I0219 19:34:22.210239 136489015797568 pyconfig.py:333] Config param freeze_audio_encoder_params: True
I0219 19:34:22.210258 136489015797568 pyconfig.py:333] Config param freeze_vision_encoder_params: True
I0219 19:34:22.210278 136489015797568 pyconfig.py:333] Config param fused_mlp: False
I0219 19:34:22.210297 136489015797568 pyconfig.py:333] Config param fused_qkv: False
I0219 19:34:22.210315 136489015797568 pyconfig.py:333] Config param gcs_metrics: False
I0219 19:34:22.210336 136489015797568 pyconfig.py:333] Config param gdn_chunk_size: 64
I0219 19:34:22.210354 136489015797568 pyconfig.py:333] Config param gdn_conv_kernel_dim: 4
I0219 19:34:22.210375 136489015797568 pyconfig.py:333] Config param gdn_key_head_dim: 128
I0219 19:34:22.210394 136489015797568 pyconfig.py:333] Config param gdn_num_key_heads: 16
I0219 19:34:22.210415 136489015797568 pyconfig.py:333] Config param gdn_num_value_heads: 32
I0219 19:34:22.210436 136489015797568 pyconfig.py:333] Config param gdn_value_head_dim: 128
I0219 19:34:22.210454 136489015797568 pyconfig.py:333] Config param generate_padding_batch_eval: False
I0219 19:34:22.210474 136489015797568 pyconfig.py:333] Config param generate_padding_batch_train: False
I0219 19:34:22.210493 136489015797568 pyconfig.py:333] Config param generate_slice: v5e-16
I0219 19:34:22.210514 136489015797568 pyconfig.py:333] Config param generation_configs: {}
I0219 19:34:22.210535 136489015797568 pyconfig.py:333] Config param global_batch_size_to_eval_on: 1
I0219 19:34:22.210554 136489015797568 pyconfig.py:333] Config param global_batch_size_to_load: 1
I0219 19:34:22.210575 136489015797568 pyconfig.py:333] Config param global_batch_size_to_load_eval: 1
I0219 19:34:22.210596 136489015797568 pyconfig.py:333] Config param global_batch_size_to_load_increment: None
I0219 19:34:22.210618 136489015797568 pyconfig.py:333] Config param global_batch_size_to_load_start: None
I0219 19:34:22.210638 136489015797568 pyconfig.py:333] Config param global_batch_size_to_train_on: 1
I0219 19:34:22.210657 136489015797568 pyconfig.py:333] Config param global_parameter_scale: 1
I0219 19:34:22.210677 136489015797568 pyconfig.py:333] Config param global_rampup_samples: 500
I0219 19:34:22.210695 136489015797568 pyconfig.py:333] Config param goodput_upload_interval_seconds: 30
I0219 19:34:22.210715 136489015797568 pyconfig.py:333] Config param grad_dtype: float32
I0219 19:34:22.210761 136489015797568 pyconfig.py:333] Config param gradient_accumulation_steps: 1
I0219 19:34:22.210786 136489015797568 pyconfig.py:333] Config param gradient_clipping_threshold: 1.0
I0219 19:34:22.210810 136489015797568 pyconfig.py:333] Config param grain_data_source_max_workers: 16
I0219 19:34:22.210829 136489015797568 pyconfig.py:333] Config param grain_eval_files: 
I0219 19:34:22.210855 136489015797568 pyconfig.py:333] Config param grain_file_type: arrayrecord
I0219 19:34:22.210874 136489015797568 pyconfig.py:333] Config param grain_num_threads: 16
I0219 19:34:22.210895 136489015797568 pyconfig.py:333] Config param grain_num_threads_eval: 16
I0219 19:34:22.210914 136489015797568 pyconfig.py:333] Config param grain_packing_type: first_fit
I0219 19:34:22.210950 136489015797568 pyconfig.py:333] Config param grain_per_worker_buffer_size: 1
I0219 19:34:22.210979 136489015797568 pyconfig.py:333] Config param grain_per_worker_buffer_size_eval: 1
I0219 19:34:22.211001 136489015797568 pyconfig.py:333] Config param grain_prefetch_buffer_size: 500
I0219 19:34:22.211023 136489015797568 pyconfig.py:333] Config param grain_prefetch_buffer_size_eval: 500
I0219 19:34:22.211045 136489015797568 pyconfig.py:333] Config param grain_ram_budget_mb: 1024
I0219 19:34:22.211067 136489015797568 pyconfig.py:333] Config param grain_train_files: 
I0219 19:34:22.211088 136489015797568 pyconfig.py:333] Config param grain_train_mixture_config_path: 
I0219 19:34:22.211110 136489015797568 pyconfig.py:333] Config param grain_worker_count: 1
I0219 19:34:22.211132 136489015797568 pyconfig.py:333] Config param grain_worker_count_eval: 1
I0219 19:34:22.211154 136489015797568 pyconfig.py:333] Config param grpo_beta: 0.08
I0219 19:34:22.211177 136489015797568 pyconfig.py:333] Config param grpo_epsilon: 0.2
I0219 19:34:22.211199 136489015797568 pyconfig.py:333] Config param hardware: cpu
I0219 19:34:22.211221 136489015797568 pyconfig.py:333] Config param hbm_utilization_vllm: 0.72
I0219 19:34:22.211243 136489015797568 pyconfig.py:333] Config param head_dim: 256
I0219 19:34:22.211264 136489015797568 pyconfig.py:333] Config param heartbeat_reporting_interval_in_seconds: 5
I0219 19:34:22.211286 136489015797568 pyconfig.py:333] Config param hf_data_dir: 
I0219 19:34:22.211307 136489015797568 pyconfig.py:333] Config param hf_eval_files: None
I0219 19:34:22.211328 136489015797568 pyconfig.py:333] Config param hf_eval_split: 
I0219 19:34:22.211350 136489015797568 pyconfig.py:333] Config param hf_name: 
I0219 19:34:22.211371 136489015797568 pyconfig.py:333] Config param hf_path: 
I0219 19:34:22.211392 136489015797568 pyconfig.py:333] Config param hf_train_files: None
I0219 19:34:22.211414 136489015797568 pyconfig.py:333] Config param hidden_size_for_vit: 1152
I0219 19:34:22.211434 136489015797568 pyconfig.py:333] Config param hide_profiler_step_metric: False
I0219 19:34:22.211456 136489015797568 pyconfig.py:333] Config param ici_autoregressive_parallelism: 1
I0219 19:34:22.211477 136489015797568 pyconfig.py:333] Config param ici_context_autoregressive_parallelism: 1
I0219 19:34:22.211498 136489015797568 pyconfig.py:333] Config param ici_context_parallelism: 1
I0219 19:34:22.211519 136489015797568 pyconfig.py:333] Config param ici_data_parallelism: 1
I0219 19:34:22.211539 136489015797568 pyconfig.py:333] Config param ici_diloco_parallelism: 1
I0219 19:34:22.211560 136489015797568 pyconfig.py:333] Config param ici_expert_parallelism: 1
I0219 19:34:22.211581 136489015797568 pyconfig.py:333] Config param ici_fsdp_parallelism: -1
I0219 19:34:22.211604 136489015797568 pyconfig.py:333] Config param ici_fsdp_transpose_parallelism: 1
I0219 19:34:22.211626 136489015797568 pyconfig.py:333] Config param ici_parallelism: [1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0219 19:34:22.211650 136489015797568 pyconfig.py:333] Config param ici_pipeline_parallelism: 1
I0219 19:34:22.211671 136489015797568 pyconfig.py:333] Config param ici_sequence_parallelism: 1
I0219 19:34:22.211693 136489015797568 pyconfig.py:333] Config param ici_tensor_parallelism: 1
I0219 19:34:22.211714 136489015797568 pyconfig.py:333] Config param ici_tensor_sequence_parallelism: 1
I0219 19:34:22.211735 136489015797568 pyconfig.py:333] Config param ici_tensor_transpose_parallelism: 1
I0219 19:34:22.211757 136489015797568 pyconfig.py:333] Config param image_path: 
I0219 19:34:22.211779 136489015797568 pyconfig.py:333] Config param image_placeholder: <start_of_image>
I0219 19:34:22.211800 136489015797568 pyconfig.py:333] Config param image_size_for_vit: 896
I0219 19:34:22.211821 136489015797568 pyconfig.py:333] Config param index_head_dim: 128
I0219 19:34:22.211847 136489015797568 pyconfig.py:333] Config param index_n_heads: 64
I0219 19:34:22.211869 136489015797568 pyconfig.py:333] Config param index_topk: 2048
I0219 19:34:22.211890 136489015797568 pyconfig.py:333] Config param inference_benchmark_test: False
I0219 19:34:22.211912 136489015797568 pyconfig.py:333] Config param inference_metadata_file: 
I0219 19:34:22.211969 136489015797568 pyconfig.py:333] Config param inference_microbenchmark_log_file_path: 
I0219 19:34:22.211998 136489015797568 pyconfig.py:333] Config param inference_microbenchmark_loop_iters: 10
I0219 19:34:22.212022 136489015797568 pyconfig.py:333] Config param inference_microbenchmark_num_samples: [1, 2, 3, 4, 5]
I0219 19:34:22.212046 136489015797568 pyconfig.py:333] Config param inference_microbenchmark_prefill_lengths: 64,128,256,512,1024
I0219 19:34:22.212069 136489015797568 pyconfig.py:333] Config param inference_microbenchmark_stages: prefill,generate
I0219 19:34:22.212091 136489015797568 pyconfig.py:333] Config param inference_server: MaxtextInterleavedServer
I0219 19:34:22.212113 136489015797568 pyconfig.py:333] Config param inhomogeneous_layer_cycle_interval: 1
I0219 19:34:22.212135 136489015797568 pyconfig.py:333] Config param init_weights_seed: 0
I0219 19:34:22.212157 136489015797568 pyconfig.py:333] Config param input_data_sharding_logical_axes: ['activation_embed_and_logits_batch', 'activation_norm_length']
I0219 19:34:22.212179 136489015797568 pyconfig.py:333] Config param interleave_moe_layer_step: 1
I0219 19:34:22.212201 136489015797568 pyconfig.py:333] Config param intermediate_size_for_vit: 4304
I0219 19:34:22.212223 136489015797568 pyconfig.py:333] Config param jax_cache_dir: ~/jax_cache
I0219 19:34:22.212244 136489015797568 pyconfig.py:333] Config param jax_debug_log_modules: 
I0219 19:34:22.212266 136489015797568 pyconfig.py:333] Config param jax_distributed_initialization_timeout: 300
I0219 19:34:22.212288 136489015797568 pyconfig.py:333] Config param jax_profiler_port: 9999
I0219 19:34:22.212310 136489015797568 pyconfig.py:333] Config param key_proj: RematLocation.REMAT
I0219 19:34:22.212332 136489015797568 pyconfig.py:333] Config param kv_cache_buffer: 256
I0219 19:34:22.212356 136489015797568 pyconfig.py:333] Config param kv_lora_rank: 512
I0219 19:34:22.212378 136489015797568 pyconfig.py:333] Config param kv_quant_axis: KvQuantAxis.HEADS_AND_DKV
I0219 19:34:22.212403 136489015797568 pyconfig.py:333] Config param kv_quant_dtype: int8
I0219 19:34:22.212424 136489015797568 pyconfig.py:333] Config param learning_rate: 3e-05
I0219 19:34:22.212450 136489015797568 pyconfig.py:333] Config param learning_rate_final_fraction: 0.1
I0219 19:34:22.212472 136489015797568 pyconfig.py:333] Config param learning_rate_schedule_steps: 150001
I0219 19:34:22.212494 136489015797568 pyconfig.py:333] Config param load_balance_loss_weight: 0.0
I0219 19:34:22.212516 136489015797568 pyconfig.py:333] Config param load_checkpoint_only_once: False
I0219 19:34:22.212537 136489015797568 pyconfig.py:333] Config param load_from_prefill_dir: False
I0219 19:34:22.212559 136489015797568 pyconfig.py:333] Config param load_full_state_path: 
I0219 19:34:22.212581 136489015797568 pyconfig.py:333] Config param load_parameters_path: gs://mymodel-training/checkpoints/gemma-3-4b-pt-orbax/0/items
I0219 19:34:22.212605 136489015797568 pyconfig.py:333] Config param local_checkpoint_directory: 
I0219 19:34:22.212627 136489015797568 pyconfig.py:333] Config param local_checkpoint_period: 0
I0219 19:34:22.212649 136489015797568 pyconfig.py:333] Config param local_rope_max_timescale: 10000
I0219 19:34:22.212671 136489015797568 pyconfig.py:333] Config param log_config: True
I0219 19:34:22.212693 136489015797568 pyconfig.py:333] Config param log_period: 100
I0219 19:34:22.212714 136489015797568 pyconfig.py:333] Config param logical_axis_rules: (('activation_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_batch_no_exp', ('data', 'fsdp', 'fsdp_transpose')), ('activation_embed_and_logits_batch', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_embed_and_logits_batch_sequence', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence', 'autoregressive')), ('activation_kv_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence')), ('activation_length', ('sequence', 'context', 'expert')), ('activation_length', ('context', 'expert')), ('activation_attn_length', ('sequence', 'context', 'expert')), ('activation_attn_length', ('context', 'expert')), ('activation_attn_length_no_exp', ('sequence', 'context')), ('activation_attn_length_no_exp', ('context',)), ('activation_length_no_exp', ('sequence', 'context')), ('activation_length_no_exp', ('context',)), ('activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_q_length', ('context', 'expert')), ('activation_q_length_no_exp', ('context',)), ('prefill_activation_length', ('sequence', 'context')), ('prefill_activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_kv_length', ()), ('activation_attn_embed', ('tensor', 'tensor_transpose')), ('activation_embed', ('tensor', 'tensor_transpose')), ('activation_mlp', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_kv', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_prefill_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_kv_batch_no_exp', ('data', 'fsdp', 'fsdp_transpose')), ('activation_kv_head_dim', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_vocab', ('tensor', 'tensor_transpose', 
'tensor_sequence')), ('activation_vocab', ('tensor', 'tensor_transpose')), ('activation_vocab', 'tensor_sequence'), ('activation_vocab', ('sequence', 'context')), ('activation_stage', 'stage'), ('activation_exp', ('expert',)), ('decode_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_length', ('sequence',)), ('mlp', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('mlp_no_fsdp', ('tensor', 'tensor_sequence', 'autoregressive')), ('vocab', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('q_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'context', 'expert')), ('embed_no_exp', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context')), ('embed_no_exp', ('fsdp', 'sequence', 'tensor_transpose', 'context')), ('embed_no_exp', ('fsdp', 'fsdp_transpose', 'sequence', 'context')), ('embed_no_exp', ('fsdp', 'sequence', 'context')), ('embed_tensor_transpose', ('tensor_transpose',)), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'expert')), ('q_lora_up_proj', ()), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('kv_lora', ('fsdp', 
'sequence', 'context', 'expert')), ('kv_lora_up_proj', ()), ('norm', ('tensor', 'tensor_transpose')), ('layers', 'stage'), ('qkv', ()), ('kv', ()), ('kv_head_dim', ()), ('cache_batch_prefill', ()), ('cache_batch', ()), ('cache_heads_none', ()), ('cache_heads', ('autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence')), ('cache_heads', ('autoregressive', 'tensor', 'tensor_sequence')), ('cache_kv', ()), ('cache_sequence', ()), ('exp', 'expert'), ('exp_with_fsdp', 'fsdp'), ('paged_kv_heads', ('tensor',)), ('num_pages', ()), ('tokens_per_page', ()), ('paged_kv_head_dim_size', ()), ('dense_layers', ()), ('moe_layers', ()), ('engram_dim', ('tensor',)), ('mhc', ()), ('diloco', 'diloco'))
I0219 19:34:22.212846 136489015797568 pyconfig.py:333] Config param logits_dot_in_fp32: False
I0219 19:34:22.212868 136489015797568 pyconfig.py:333] Config param logits_via_embedding: True
I0219 19:34:22.212890 136489015797568 pyconfig.py:333] Config param lora_input_adapters_path: 
I0219 19:34:22.212912 136489015797568 pyconfig.py:333] Config param loss_algo: grpo
I0219 19:34:22.212948 136489015797568 pyconfig.py:333] Config param lr_schedule_type: LearningRateScheduleType.COSINE
I0219 19:34:22.212983 136489015797568 pyconfig.py:333] Config param managed_mldiagnostics: False
I0219 19:34:22.213006 136489015797568 pyconfig.py:333] Config param managed_mldiagnostics_dir: gs://mymodel-training/checkpoints/gemma3-4b-pt-hf/export/managed-mldiagnostics
I0219 19:34:22.213028 136489015797568 pyconfig.py:333] Config param managed_mldiagnostics_run_group: 
I0219 19:34:22.213050 136489015797568 pyconfig.py:333] Config param matmul_precision: MatmulPrecision.DEFAULT
I0219 19:34:22.213074 136489015797568 pyconfig.py:333] Config param max_checkify: False
I0219 19:34:22.213096 136489015797568 pyconfig.py:333] Config param max_corpus_chars: 10000000
I0219 19:34:22.213119 136489015797568 pyconfig.py:333] Config param max_num_batched_tokens: None
I0219 19:34:22.213140 136489015797568 pyconfig.py:333] Config param max_num_checkpoints_to_keep: None
I0219 19:34:22.213161 136489015797568 pyconfig.py:333] Config param max_num_images_per_example: -1
I0219 19:34:22.213183 136489015797568 pyconfig.py:333] Config param max_num_seqs: None
I0219 19:34:22.213205 136489015797568 pyconfig.py:333] Config param max_position_embeddings: 163840
I0219 19:34:22.213227 136489015797568 pyconfig.py:333] Config param max_prefill_predict_length: 64
I0219 19:34:22.213248 136489015797568 pyconfig.py:333] Config param max_sample_len_for_audio: 10000
I0219 19:34:22.213269 136489015797568 pyconfig.py:333] Config param max_segments_per_seq: -1
I0219 19:34:22.213290 136489015797568 pyconfig.py:333] Config param max_source_positions_for_audio: 1500
I0219 19:34:22.213311 136489015797568 pyconfig.py:333] Config param max_target_length: 2048
I0219 19:34:22.213332 136489015797568 pyconfig.py:333] Config param max_timescale_for_audio: 10000.0
I0219 19:34:22.213354 136489015797568 pyconfig.py:333] Config param megablox: True
I0219 19:34:22.213376 136489015797568 pyconfig.py:333] Config param mesh_axes: ['diloco', 'data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive']
I0219 19:34:22.213401 136489015797568 pyconfig.py:333] Config param metrics_dir: gs://mymodel-training/checkpoints/gemma3-4b-pt-hf/export/metrics/
I0219 19:34:22.213422 136489015797568 pyconfig.py:333] Config param metrics_file: 
I0219 19:34:22.213443 136489015797568 pyconfig.py:333] Config param mhc_expansion_rate: 1
I0219 19:34:22.213464 136489015797568 pyconfig.py:333] Config param micro_batch_size: -1
I0219 19:34:22.213486 136489015797568 pyconfig.py:333] Config param micro_batch_size_to_eval_on: 1
I0219 19:34:22.213506 136489015797568 pyconfig.py:333] Config param micro_batch_size_to_train_on: 1
I0219 19:34:22.213527 136489015797568 pyconfig.py:333] Config param mla_kv: RematLocation.REMAT
I0219 19:34:22.213549 136489015797568 pyconfig.py:333] Config param mla_naive_kvcache: True
I0219 19:34:22.213571 136489015797568 pyconfig.py:333] Config param mla_q: RematLocation.REMAT
I0219 19:34:22.213593 136489015797568 pyconfig.py:333] Config param mlp_activations: ['gelu', 'linear']
I0219 19:34:22.213615 136489015797568 pyconfig.py:333] Config param mlp_activations_limit: -1.0
I0219 19:34:22.213638 136489015797568 pyconfig.py:333] Config param mlp_bias: False
I0219 19:34:22.213660 136489015797568 pyconfig.py:333] Config param mlp_dim: 10240
I0219 19:34:22.213682 136489015797568 pyconfig.py:333] Config param mlpwi: RematLocation.REMAT
I0219 19:34:22.213704 136489015797568 pyconfig.py:333] Config param mlpwi_0: RematLocation.REMAT
I0219 19:34:22.213726 136489015797568 pyconfig.py:333] Config param mlpwi_1: RematLocation.REMAT
I0219 19:34:22.213747 136489015797568 pyconfig.py:333] Config param mlpwo: RematLocation.REMAT
I0219 19:34:22.213768 136489015797568 pyconfig.py:333] Config param moba: False
I0219 19:34:22.213790 136489015797568 pyconfig.py:333] Config param moba_chunk_size: 1024
I0219 19:34:22.213811 136489015797568 pyconfig.py:333] Config param moba_topk: 8
I0219 19:34:22.213832 136489015797568 pyconfig.py:333] Config param model_call_mode: 
I0219 19:34:22.213858 136489015797568 pyconfig.py:333] Config param model_name: gemma3-4b
I0219 19:34:22.213879 136489015797568 pyconfig.py:333] Config param moe_fsdp_use_two_stage_all_gather: False
I0219 19:34:22.213901 136489015797568 pyconfig.py:333] Config param moe_mlp_dim: 7168
I0219 19:34:22.213922 136489015797568 pyconfig.py:333] Config param monitor_goodput: False
I0219 19:34:22.213952 136489015797568 pyconfig.py:333] Config param monitor_step_time_deviation: True
I0219 19:34:22.213973 136489015797568 pyconfig.py:333] Config param mrope_section: [24, 20, 20]
I0219 19:34:22.213996 136489015797568 pyconfig.py:333] Config param mscale: 1.0
I0219 19:34:22.214018 136489015797568 pyconfig.py:333] Config param mtc_data_parallelism: 0
I0219 19:34:22.214038 136489015797568 pyconfig.py:333] Config param mtp_eval_target_module: 0
I0219 19:34:22.214059 136489015797568 pyconfig.py:333] Config param mtp_loss_scaling_factor: 0.1
I0219 19:34:22.214081 136489015797568 pyconfig.py:333] Config param mtp_num_layers: 0
I0219 19:34:22.214102 136489015797568 pyconfig.py:333] Config param mu_dtype: float32
I0219 19:34:22.214143 136489015797568 pyconfig.py:333] Config param multi_sampling: False
I0219 19:34:22.214167 136489015797568 pyconfig.py:333] Config param multi_tier_checkpointing_backup_interval_minutes: 0
I0219 19:34:22.214190 136489015797568 pyconfig.py:333] Config param muon_beta: 0.95
I0219 19:34:22.214214 136489015797568 pyconfig.py:333] Config param muon_consistent_rms: None
I0219 19:34:22.214236 136489015797568 pyconfig.py:333] Config param muon_weight_decay: 0.0
I0219 19:34:22.214257 136489015797568 pyconfig.py:333] Config param n_routing_groups: -1
I0219 19:34:22.214279 136489015797568 pyconfig.py:333] Config param n_window_for_audio: 50
I0219 19:34:22.214300 136489015797568 pyconfig.py:333] Config param n_window_infer_for_audio: 800
I0219 19:34:22.214322 136489015797568 pyconfig.py:333] Config param nope_layer_interval: -1
I0219 19:34:22.214343 136489015797568 pyconfig.py:333] Config param norm_topk_prob: False
I0219 19:34:22.214364 136489015797568 pyconfig.py:333] Config param normalization_layer_epsilon: 1e-06
I0219 19:34:22.214388 136489015797568 pyconfig.py:333] Config param normalize_embedding_logits: True
I0219 19:34:22.214410 136489015797568 pyconfig.py:333] Config param num_attention_heads_for_vit: 16
I0219 19:34:22.214432 136489015797568 pyconfig.py:333] Config param num_batches: 4
I0219 19:34:22.214453 136489015797568 pyconfig.py:333] Config param num_channels_for_vit: 3
I0219 19:34:22.214475 136489015797568 pyconfig.py:333] Config param num_conv_layers_for_audio: 3
I0219 19:34:22.214496 136489015797568 pyconfig.py:333] Config param num_decoder_layers: 34
I0219 19:34:22.214517 136489015797568 pyconfig.py:333] Config param num_diloco_replicas: 1
I0219 19:34:22.214538 136489015797568 pyconfig.py:333] Config param num_epoch: 1
I0219 19:34:22.214560 136489015797568 pyconfig.py:333] Config param num_eval_passes: 1
I0219 19:34:22.214581 136489015797568 pyconfig.py:333] Config param num_experts: 1
I0219 19:34:22.214602 136489015797568 pyconfig.py:333] Config param num_experts_per_tok: 1
I0219 19:34:22.214623 136489015797568 pyconfig.py:333] Config param num_generations: 2
I0219 19:34:22.214644 136489015797568 pyconfig.py:333] Config param num_hidden_layers_for_vit: 27
I0219 19:34:22.214665 136489015797568 pyconfig.py:333] Config param num_iterations: 1
I0219 19:34:22.214685 136489015797568 pyconfig.py:333] Config param num_kv_heads: 4
I0219 19:34:22.214706 136489015797568 pyconfig.py:333] Config param num_layers_per_pipeline_stage: 1
I0219 19:34:22.214727 136489015797568 pyconfig.py:333] Config param num_mel_bins_for_audio: 128
I0219 19:34:22.214748 136489015797568 pyconfig.py:333] Config param num_pipeline_microbatches: -1
I0219 19:34:22.214769 136489015797568 pyconfig.py:333] Config param num_pipeline_repeats: -1
I0219 19:34:22.214790 136489015797568 pyconfig.py:333] Config param num_position_embeddings_for_vit: 1024
I0219 19:34:22.214811 136489015797568 pyconfig.py:333] Config param num_query_heads: 8
I0219 19:34:22.214833 136489015797568 pyconfig.py:333] Config param num_samplers_slices: -1
I0219 19:34:22.214858 136489015797568 pyconfig.py:333] Config param num_slices: 1
I0219 19:34:22.214879 136489015797568 pyconfig.py:333] Config param num_target_devices: 1
I0219 19:34:22.214901 136489015797568 pyconfig.py:333] Config param num_test_batches: 5
I0219 19:34:22.214921 136489015797568 pyconfig.py:333] Config param num_trainer_slices: -1
I0219 19:34:22.214950 136489015797568 pyconfig.py:333] Config param num_vocab_tiling: 1
I0219 19:34:22.214972 136489015797568 pyconfig.py:333] Config param opt_type: OptimizerType.ADAMW
I0219 19:34:22.214996 136489015797568 pyconfig.py:333] Config param optimize_mesh_for_tpu_v6e: False
I0219 19:34:22.215018 136489015797568 pyconfig.py:333] Config param optimizer_memory_host_offload: False
I0219 19:34:22.215039 136489015797568 pyconfig.py:333] Config param original_max_position_embeddings: 4096
I0219 19:34:22.215059 136489015797568 pyconfig.py:333] Config param out_hidden_size_for_vit: 512
I0219 19:34:22.215080 136489015797568 pyconfig.py:333] Config param out_proj: RematLocation.REMAT
I0219 19:34:22.215102 136489015797568 pyconfig.py:333] Config param output_dim_for_audio: 512
I0219 19:34:22.215124 136489015797568 pyconfig.py:333] Config param override_logical_axis_rules: False
I0219 19:34:22.215145 136489015797568 pyconfig.py:333] Config param override_model_config: False
I0219 19:34:22.215166 136489015797568 pyconfig.py:333] Config param packing: True
I0219 19:34:22.215186 136489015797568 pyconfig.py:333] Config param pagedattn_head_dim_alignment: 128
I0219 19:34:22.215207 136489015797568 pyconfig.py:333] Config param pagedattn_max_pages_per_group: -1
I0219 19:34:22.215228 136489015797568 pyconfig.py:333] Config param pagedattn_num_pages: 64
I0219 19:34:22.215249 136489015797568 pyconfig.py:333] Config param pagedattn_pages_per_compute_block: 4
I0219 19:34:22.215270 136489015797568 pyconfig.py:333] Config param pagedattn_tokens_per_page: 32
I0219 19:34:22.215291 136489015797568 pyconfig.py:333] Config param param_scan_axis: 1
I0219 19:34:22.215312 136489015797568 pyconfig.py:333] Config param parameter_memory_host_offload: False
I0219 19:34:22.215332 136489015797568 pyconfig.py:333] Config param partial_rotary_factor: 1.0
I0219 19:34:22.215354 136489015797568 pyconfig.py:333] Config param patch_size_for_vit: 14
I0219 19:34:22.215374 136489015797568 pyconfig.py:333] Config param penalty_incorrect_answer: -1.0
I0219 19:34:22.215396 136489015797568 pyconfig.py:333] Config param penalty_incorrect_format: -0.5
I0219 19:34:22.215418 136489015797568 pyconfig.py:333] Config param per_device_batch_size: 1
I0219 19:34:22.215439 136489015797568 pyconfig.py:333] Config param per_device_batch_size_increment: 2.0
I0219 19:34:22.215461 136489015797568 pyconfig.py:333] Config param per_device_batch_size_start: 4.0
I0219 19:34:22.215482 136489015797568 pyconfig.py:333] Config param pipeline_delay_activation_forwarding: False
I0219 19:34:22.215503 136489015797568 pyconfig.py:333] Config param pipeline_fsdp_ag_once: False
I0219 19:34:22.215524 136489015797568 pyconfig.py:333] Config param pipeline_parallel_layers: 34
I0219 19:34:22.215546 136489015797568 pyconfig.py:333] Config param pixel_shuffle_ratio_for_vit: 0.5
I0219 19:34:22.215568 136489015797568 pyconfig.py:333] Config param posemb_type_for_vit: learn
I0219 19:34:22.215589 136489015797568 pyconfig.py:333] Config param position_id_per_seconds: 25
I0219 19:34:22.215610 136489015797568 pyconfig.py:333] Config param prefill_cache_axis_order: 1,2,0,3
I0219 19:34:22.215631 136489015797568 pyconfig.py:333] Config param prefill_cache_dir: 
I0219 19:34:22.215652 136489015797568 pyconfig.py:333] Config param prefill_chunk_size: 256
I0219 19:34:22.215673 136489015797568 pyconfig.py:333] Config param prefill_slice: v5e-16
I0219 19:34:22.215694 136489015797568 pyconfig.py:333] Config param prefix_caching_dram_byte: 100000000000
I0219 19:34:22.215715 136489015797568 pyconfig.py:333] Config param prefix_caching_hbm_byte: 10000000000
I0219 19:34:22.215736 136489015797568 pyconfig.py:333] Config param profile_cleanly: True
I0219 19:34:22.215756 136489015797568 pyconfig.py:333] Config param profile_periodically_period: -1
I0219 19:34:22.215777 136489015797568 pyconfig.py:333] Config param profiler: ProfilerType.NONE
I0219 19:34:22.215800 136489015797568 pyconfig.py:333] Config param profiler_steps: 5
I0219 19:34:22.215821 136489015797568 pyconfig.py:333] Config param projector_dropout_for_vit: 0.0
I0219 19:34:22.215846 136489015797568 pyconfig.py:333] Config param projector_input_dim_for_vit: 4096
I0219 19:34:22.215868 136489015797568 pyconfig.py:333] Config param projector_output_dim_for_vit: 4096
I0219 19:34:22.215890 136489015797568 pyconfig.py:333] Config param prometheus_port: 0
I0219 19:34:22.215912 136489015797568 pyconfig.py:333] Config param prompt: I love to
I0219 19:34:22.215968 136489015797568 pyconfig.py:333] Config param q_lora_rank: 0
I0219 19:34:22.215991 136489015797568 pyconfig.py:333] Config param qk_nope_head_dim: 128
I0219 19:34:22.216014 136489015797568 pyconfig.py:333] Config param qk_rope_head_dim: 64
I0219 19:34:22.216036 136489015797568 pyconfig.py:333] Config param qkv_proj: RematLocation.REMAT
I0219 19:34:22.216058 136489015797568 pyconfig.py:333] Config param quant_cfg_path: 
I0219 19:34:22.216080 136489015797568 pyconfig.py:333] Config param quantization: QuantizationType.NONE
I0219 19:34:22.216103 136489015797568 pyconfig.py:333] Config param quantization_local_shard_count: 1
I0219 19:34:22.216125 136489015797568 pyconfig.py:333] Config param quantize_kvcache: False
I0219 19:34:22.216147 136489015797568 pyconfig.py:333] Config param query_proj: RematLocation.REMAT
I0219 19:34:22.216168 136489015797568 pyconfig.py:333] Config param ragged_block_size: 256
I0219 19:34:22.216190 136489015797568 pyconfig.py:333] Config param rampup_end_step: 0
I0219 19:34:22.216212 136489015797568 pyconfig.py:333] Config param rampup_samples_per_increment_to_load: None
I0219 19:34:22.216233 136489015797568 pyconfig.py:333] Config param reasoning_end_token: </reasoning>
I0219 19:34:22.216254 136489015797568 pyconfig.py:333] Config param reasoning_start_token: <reasoning>
I0219 19:34:22.216274 136489015797568 pyconfig.py:333] Config param record_internal_nn_metrics: 0
I0219 19:34:22.216295 136489015797568 pyconfig.py:333] Config param remat_policy: full
I0219 19:34:22.216316 136489015797568 pyconfig.py:333] Config param remat_policy_for_vit: minimal
I0219 19:34:22.216337 136489015797568 pyconfig.py:333] Config param replicate_quant_scale: False
I0219 19:34:22.216358 136489015797568 pyconfig.py:333] Config param replicator_backup_interval_minutes: 0
I0219 19:34:22.216379 136489015797568 pyconfig.py:333] Config param report_heartbeat_metric_for_gcp_monitoring: False
I0219 19:34:22.216399 136489015797568 pyconfig.py:333] Config param report_performance_metric_for_gcp_monitoring: False
I0219 19:34:22.216420 136489015797568 pyconfig.py:333] Config param reshape_q: False
I0219 19:34:22.216441 136489015797568 pyconfig.py:333] Config param return_log_prob: False
I0219 19:34:22.216462 136489015797568 pyconfig.py:333] Config param reuse_example_batch: 0
I0219 19:34:22.216483 136489015797568 pyconfig.py:333] Config param reward_exact_format_match: 3.0
I0219 19:34:22.216504 136489015797568 pyconfig.py:333] Config param reward_partial_format_match: 0.5
I0219 19:34:22.216525 136489015797568 pyconfig.py:333] Config param reward_ratio_guess_to_answer_high: 0.5
I0219 19:34:22.216547 136489015797568 pyconfig.py:333] Config param reward_ratio_guess_to_answer_low: 0.25
I0219 19:34:22.216568 136489015797568 pyconfig.py:333] Config param reward_white_space_format_match: 1.5
I0219 19:34:22.216590 136489015797568 pyconfig.py:333] Config param rl: {'num_generations': 2, 'num_iterations': 1, 'grpo_beta': 0.08, 'grpo_epsilon': 0.2, 'loss_algo': 'grpo'}
I0219 19:34:22.216615 136489015797568 pyconfig.py:333] Config param rollout_data_parallelism: -1
I0219 19:34:22.216636 136489015797568 pyconfig.py:333] Config param rollout_tensor_parallelism: -1
I0219 19:34:22.216657 136489015797568 pyconfig.py:333] Config param rope_attention_scaling: False
I0219 19:34:22.216678 136489015797568 pyconfig.py:333] Config param rope_factor: 40
I0219 19:34:22.216698 136489015797568 pyconfig.py:333] Config param rope_interleave: True
I0219 19:34:22.216719 136489015797568 pyconfig.py:333] Config param rope_linear_scaling_factor: 8.0
I0219 19:34:22.216740 136489015797568 pyconfig.py:333] Config param rope_max_timescale: 1000000
I0219 19:34:22.216761 136489015797568 pyconfig.py:333] Config param rope_min_timescale: 1
I0219 19:34:22.216782 136489015797568 pyconfig.py:333] Config param rope_theta_for_vit: 10000
I0219 19:34:22.216803 136489015797568 pyconfig.py:333] Config param rope_truncate: True
I0219 19:34:22.216824 136489015797568 pyconfig.py:333] Config param rope_type: RopeType.DEFAULT
I0219 19:34:22.216850 136489015797568 pyconfig.py:333] Config param rope_use_scale: True
I0219 19:34:22.216872 136489015797568 pyconfig.py:333] Config param routed_bias: False
I0219 19:34:22.216893 136489015797568 pyconfig.py:333] Config param routed_bias_update_rate: 0.0
I0219 19:34:22.216915 136489015797568 pyconfig.py:333] Config param routed_scaling_factor: 1.0
I0219 19:34:22.216947 136489015797568 pyconfig.py:333] Config param routed_score_func: 
I0219 19:34:22.216969 136489015797568 pyconfig.py:333] Config param run_name: export
I0219 19:34:22.216991 136489015797568 pyconfig.py:333] Config param sa_block_kv: 512
I0219 19:34:22.217012 136489015797568 pyconfig.py:333] Config param sa_block_kv_compute: 512
I0219 19:34:22.217034 136489015797568 pyconfig.py:333] Config param sa_block_kv_dkv: 512
I0219 19:34:22.217055 136489015797568 pyconfig.py:333] Config param sa_block_kv_dkv_compute: 512
I0219 19:34:22.217077 136489015797568 pyconfig.py:333] Config param sa_block_kv_dq: 512
I0219 19:34:22.217097 136489015797568 pyconfig.py:333] Config param sa_block_q: 512
I0219 19:34:22.217119 136489015797568 pyconfig.py:333] Config param sa_block_q_dkv: 512
I0219 19:34:22.217140 136489015797568 pyconfig.py:333] Config param sa_block_q_dq: 512
I0219 19:34:22.217162 136489015797568 pyconfig.py:333] Config param sa_k_layout: HEAD_DIM_MINOR
I0219 19:34:22.217183 136489015797568 pyconfig.py:333] Config param sa_q_layout: HEAD_DIM_MINOR
I0219 19:34:22.217204 136489015797568 pyconfig.py:333] Config param sa_use_fused_bwd_kernel: False
I0219 19:34:22.217226 136489015797568 pyconfig.py:333] Config param sa_v_layout: HEAD_DIM_MINOR
I0219 19:34:22.217247 136489015797568 pyconfig.py:333] Config param sampler_devices_fraction: 0.5
I0219 19:34:22.217268 136489015797568 pyconfig.py:333] Config param save_checkpoint_on_completion: True
I0219 19:34:22.217291 136489015797568 pyconfig.py:333] Config param save_config_to_gcs: False
I0219 19:34:22.217312 136489015797568 pyconfig.py:333] Config param save_quantized_params_path: 
I0219 19:34:22.217333 136489015797568 pyconfig.py:333] Config param scale_embedding_for_audio: True
I0219 19:34:22.217354 136489015797568 pyconfig.py:333] Config param scan_layers: True
I0219 19:34:22.217375 136489015797568 pyconfig.py:333] Config param scan_layers_per_stage: False
I0219 19:34:22.217396 136489015797568 pyconfig.py:333] Config param scan_pipeline_iterations: True
I0219 19:34:22.217416 136489015797568 pyconfig.py:333] Config param set_remat_policy_on_layers_per_stage: False
I0219 19:34:22.217437 136489015797568 pyconfig.py:333] Config param set_remat_policy_on_pipeline_iterations: True
I0219 19:34:22.217458 136489015797568 pyconfig.py:333] Config param sft_train_on_completion_only: False
I0219 19:34:22.217478 136489015797568 pyconfig.py:333] Config param shard_exp_on_fsdp: False
I0219 19:34:22.217499 136489015797568 pyconfig.py:333] Config param shard_mode: ShardMode.AUTO
I0219 19:34:22.217523 136489015797568 pyconfig.py:333] Config param shard_optimizer_over_data: False
I0219 19:34:22.217545 136489015797568 pyconfig.py:333] Config param sharding_strategy: None
I0219 19:34:22.217566 136489015797568 pyconfig.py:333] Config param sharding_tolerance: 0.02
I0219 19:34:22.217587 136489015797568 pyconfig.py:333] Config param shardy: True
I0219 19:34:22.217608 136489015797568 pyconfig.py:333] Config param shared_experts: 1
I0219 19:34:22.217629 136489015797568 pyconfig.py:333] Config param sinkhorn_iterations: 20
I0219 19:34:22.217650 136489015797568 pyconfig.py:333] Config param skip_first_n_steps_for_profiler: 1
I0219 19:34:22.217671 136489015797568 pyconfig.py:333] Config param skip_jax_distributed_system: True
I0219 19:34:22.217693 136489015797568 pyconfig.py:333] Config param sliding_window_size: 1024
I0219 19:34:22.217714 136489015797568 pyconfig.py:333] Config param solution_end_token: </answer>
I0219 19:34:22.217737 136489015797568 pyconfig.py:333] Config param solution_start_token: <answer>
I0219 19:34:22.217759 136489015797568 pyconfig.py:333] Config param source_checkpoint_layout: orbax
I0219 19:34:22.217781 136489015797568 pyconfig.py:333] Config param sparse_matmul: True
I0219 19:34:22.217802 136489015797568 pyconfig.py:333] Config param spatial_merge_size_for_vit: 2
I0219 19:34:22.217822 136489015797568 pyconfig.py:333] Config param stack_prefill_result_cache: False
I0219 19:34:22.217846 136489015797568 pyconfig.py:333] Config param stack_trace_interval_seconds: 600
I0219 19:34:22.217868 136489015797568 pyconfig.py:333] Config param stack_trace_to_cloud: False
I0219 19:34:22.217889 136489015797568 pyconfig.py:333] Config param step_deviation_interval_seconds: 30
I0219 19:34:22.217910 136489015797568 pyconfig.py:333] Config param steps: 150001
I0219 19:34:22.217938 136489015797568 pyconfig.py:333] Config param student_overrides: {}
I0219 19:34:22.217961 136489015797568 pyconfig.py:333] Config param subslice_shape: 
I0219 19:34:22.217983 136489015797568 pyconfig.py:333] Config param swap_space_vllm_gb: 2
I0219 19:34:22.218004 136489015797568 pyconfig.py:333] Config param target_eval_loss: 0.0
I0219 19:34:22.218025 136489015797568 pyconfig.py:333] Config param teacher_overrides: {}
I0219 19:34:22.218046 136489015797568 pyconfig.py:333] Config param temperature_tuning: False
I0219 19:34:22.218067 136489015797568 pyconfig.py:333] Config param temporal_patch_size_for_vit: 2
I0219 19:34:22.218088 136489015797568 pyconfig.py:333] Config param tensorboard_dir: gs://mymodel-training/checkpoints/gemma3-4b-pt-hf/export/tensorboard/
I0219 19:34:22.218110 136489015797568 pyconfig.py:333] Config param tensors_on_device: None
I0219 19:34:22.218131 136489015797568 pyconfig.py:333] Config param tensors_to_offload: None
I0219 19:34:22.218153 136489015797568 pyconfig.py:333] Config param tile_size_for_vit: 336
I0219 19:34:22.218174 136489015797568 pyconfig.py:333] Config param tokenize_eval_data: True
I0219 19:34:22.218195 136489015797568 pyconfig.py:333] Config param tokenize_train_data: True
I0219 19:34:22.218216 136489015797568 pyconfig.py:333] Config param tokenizer_path: src/maxtext/assets/tokenizers/tokenizer.llama2
I0219 19:34:22.218238 136489015797568 pyconfig.py:333] Config param tokenizer_type: TokenizerType.SENTENCEPIECE
I0219 19:34:22.218261 136489015797568 pyconfig.py:333] Config param topk_routing_group: -1
I0219 19:34:22.218284 136489015797568 pyconfig.py:333] Config param train_data_columns: ['text']
I0219 19:34:22.218306 136489015797568 pyconfig.py:333] Config param train_fraction: 1.0
I0219 19:34:22.218328 136489015797568 pyconfig.py:333] Config param train_image_column: image
I0219 19:34:22.218349 136489015797568 pyconfig.py:333] Config param train_split: train
I0219 19:34:22.218370 136489015797568 pyconfig.py:333] Config param trainable_position_size: -1
I0219 19:34:22.218391 136489015797568 pyconfig.py:333] Config param trainer_devices_fraction: 0.5
I0219 19:34:22.218413 136489015797568 pyconfig.py:333] Config param upload_all_profiler_results: False
I0219 19:34:22.218434 136489015797568 pyconfig.py:333] Config param use_2d_fsdp_sharding: False
I0219 19:34:22.218455 136489015797568 pyconfig.py:333] Config param use_audio: False
I0219 19:34:22.218476 136489015797568 pyconfig.py:333] Config param use_audio_in_video: False
I0219 19:34:22.218497 136489015797568 pyconfig.py:333] Config param use_batch_split_schedule: False
I0219 19:34:22.218517 136489015797568 pyconfig.py:333] Config param use_chat_template: False
I0219 19:34:22.218538 136489015797568 pyconfig.py:333] Config param use_chunked_prefill: False
I0219 19:34:22.218559 136489015797568 pyconfig.py:333] Config param use_custom_sort_vjp: True
I0219 19:34:22.218580 136489015797568 pyconfig.py:333] Config param use_dpo: False
I0219 19:34:22.218600 136489015797568 pyconfig.py:333] Config param use_grpo: True
I0219 19:34:22.218621 136489015797568 pyconfig.py:333] Config param use_iota_embed: False
I0219 19:34:22.218641 136489015797568 pyconfig.py:333] Config param use_jax_splash: False
I0219 19:34:22.218662 136489015797568 pyconfig.py:333] Config param use_max_logit_estimate: -1
I0219 19:34:22.218680 136489015797568 pyconfig.py:333] Config param use_mrope: False
I0219 19:34:22.218701 136489015797568 pyconfig.py:333] Config param use_multimodal: False
I0219 19:34:22.218722 136489015797568 pyconfig.py:333] Config param use_pathways: True
I0219 19:34:22.218742 136489015797568 pyconfig.py:333] Config param use_post_attn_norm: True
I0219 19:34:22.218763 136489015797568 pyconfig.py:333] Config param use_post_ffw_norm: True
I0219 19:34:22.218784 136489015797568 pyconfig.py:333] Config param use_qk_norm: False
I0219 19:34:22.218804 136489015797568 pyconfig.py:333] Config param use_qk_norm_in_gdn: True
I0219 19:34:22.218825 136489015797568 pyconfig.py:333] Config param use_qwix_quantization: False
I0219 19:34:22.218848 136489015797568 pyconfig.py:333] Config param use_ragged_attention: False
I0219 19:34:22.218869 136489015797568 pyconfig.py:333] Config param use_random_routing: False
I0219 19:34:22.218890 136489015797568 pyconfig.py:333] Config param use_replicator_service: False
I0219 19:34:22.218911 136489015797568 pyconfig.py:333] Config param use_ring_of_experts: False
I0219 19:34:22.218939 136489015797568 pyconfig.py:333] Config param use_sft: False
I0219 19:34:22.218960 136489015797568 pyconfig.py:333] Config param use_sparse_indexer: False
I0219 19:34:22.218981 136489015797568 pyconfig.py:333] Config param use_splash_scheduler: False
I0219 19:34:22.219002 136489015797568 pyconfig.py:333] Config param use_tokamax_gmm: False
I0219 19:34:22.219023 136489015797568 pyconfig.py:333] Config param use_tokamax_splash: False
I0219 19:34:22.219044 136489015797568 pyconfig.py:333] Config param use_truncation: True
I0219 19:34:22.219064 136489015797568 pyconfig.py:333] Config param use_tunix_gradient_accumulation: False
I0219 19:34:22.219085 136489015797568 pyconfig.py:333] Config param use_untrainable_positional_embedding: False
I0219 19:34:22.219105 136489015797568 pyconfig.py:333] Config param use_vertex_tensorboard: False
I0219 19:34:22.219126 136489015797568 pyconfig.py:333] Config param using_pipeline_parallelism: False
I0219 19:34:22.219147 136489015797568 pyconfig.py:333] Config param v_head_dim: 128
I0219 19:34:22.219167 136489015797568 pyconfig.py:333] Config param value_proj: RematLocation.REMAT
I0219 19:34:22.219189 136489015797568 pyconfig.py:333] Config param vertex_tensorboard_project: 
I0219 19:34:22.219210 136489015797568 pyconfig.py:333] Config param vertex_tensorboard_region: 
I0219 19:34:22.219230 136489015797568 pyconfig.py:333] Config param video_path: 
I0219 19:34:22.219251 136489015797568 pyconfig.py:333] Config param vision_output_dim_for_vit: 4096
I0219 19:34:22.219272 136489015797568 pyconfig.py:333] Config param vllm_additional_config: {}
I0219 19:34:22.219294 136489015797568 pyconfig.py:333] Config param vllm_hf_config_path: 
I0219 19:34:22.219315 136489015797568 pyconfig.py:333] Config param vocab_size: 262144
I0219 19:34:22.219336 136489015797568 pyconfig.py:333] Config param warmup_steps_fraction: 0.1
I0219 19:34:22.219358 136489015797568 pyconfig.py:333] Config param weight_dtype: float32
I0219 19:34:22.219392 136489015797568 pyconfig.py:333] Config param weight_quantization_calibration_method: absmax
I0219 19:34:22.219416 136489015797568 pyconfig.py:333] Config param wi_combine_scopes: False
I0219 19:34:22.219438 136489015797568 pyconfig.py:333] Config param wi_tile_dlhs_batch_seq: 512
I0219 19:34:22.219460 136489015797568 pyconfig.py:333] Config param wi_tile_dlhs_buffer_count: 2
I0219 19:34:22.219482 136489015797568 pyconfig.py:333] Config param wi_tile_dlhs_embed_dim: 1024
I0219 19:34:22.219504 136489015797568 pyconfig.py:333] Config param wi_tile_dlhs_mlp_dim: 1024
I0219 19:34:22.219525 136489015797568 pyconfig.py:333] Config param wi_tile_drhs_batch_seq: 512
I0219 19:34:22.219547 136489015797568 pyconfig.py:333] Config param wi_tile_drhs_buffer_count: 2
I0219 19:34:22.219569 136489015797568 pyconfig.py:333] Config param wi_tile_drhs_embed_dim: 1024
I0219 19:34:22.219590 136489015797568 pyconfig.py:333] Config param wi_tile_drhs_mlp_dim: 1024
I0219 19:34:22.219612 136489015797568 pyconfig.py:333] Config param wi_tile_fwd_batch_seq: 512
I0219 19:34:22.219633 136489015797568 pyconfig.py:333] Config param wi_tile_fwd_buffer_count: 2
I0219 19:34:22.219654 136489015797568 pyconfig.py:333] Config param wi_tile_fwd_embed_dim: 1024
I0219 19:34:22.219676 136489015797568 pyconfig.py:333] Config param wi_tile_fwd_mlp_dim: 1024
I0219 19:34:22.219698 136489015797568 pyconfig.py:333] Config param wo_combine_scopes: False
I0219 19:34:22.219719 136489015797568 pyconfig.py:333] Config param wo_tile_dlhs_batch_seq: 512
I0219 19:34:22.219740 136489015797568 pyconfig.py:333] Config param wo_tile_dlhs_buffer_count: 2
I0219 19:34:22.219761 136489015797568 pyconfig.py:333] Config param wo_tile_dlhs_embed_dim: 1024
I0219 19:34:22.219782 136489015797568 pyconfig.py:333] Config param wo_tile_dlhs_mlp_dim: 1024
I0219 19:34:22.219803 136489015797568 pyconfig.py:333] Config param wo_tile_drhs_batch_seq: 512
I0219 19:34:22.219824 136489015797568 pyconfig.py:333] Config param wo_tile_drhs_buffer_count: 2
I0219 19:34:22.219849 136489015797568 pyconfig.py:333] Config param wo_tile_drhs_embed_dim: 1024
I0219 19:34:22.219871 136489015797568 pyconfig.py:333] Config param wo_tile_drhs_mlp_dim: 1024
I0219 19:34:22.219892 136489015797568 pyconfig.py:333] Config param wo_tile_fwd_batch_seq: 512
I0219 19:34:22.219913 136489015797568 pyconfig.py:333] Config param wo_tile_fwd_buffer_count: 2
I0219 19:34:22.219945 136489015797568 pyconfig.py:333] Config param wo_tile_fwd_embed_dim: 1024
I0219 19:34:22.219967 136489015797568 pyconfig.py:333] Config param wo_tile_fwd_mlp_dim: 1024
I0219 19:34:22.219989 136489015797568 pyconfig.py:333] Config param wsd_decay_steps_fraction: 0.1
I0219 19:34:22.220011 136489015797568 pyconfig.py:333] Config param wsd_decay_style: WsdDecayStyle.LINEAR
I0219 19:34:22.220036 136489015797568 pyconfig.py:333] Config param xprof_e2e_enable_fw_power_level_event: False
I0219 19:34:22.220058 136489015797568 pyconfig.py:333] Config param xprof_e2e_enable_fw_thermal_event: False
I0219 19:34:22.220080 136489015797568 pyconfig.py:333] Config param xprof_e2e_enable_fw_throttle_event: False
I0219 19:34:22.220101 136489015797568 pyconfig.py:333] Config param xprof_tpu_power_trace_level: 0
I0219 19:34:22.220524 136489015797568 max_utils.py:741] System Information: Jax Version: 0.8.1
I0219 19:34:22.220575 136489015797568 max_utils.py:742] System Information: Jaxlib Version: 0.8.1
I0219 19:34:22.220621 136489015797568 max_utils.py:743] System Information: Jax Backend: cpu
I0219 19:34:22.220654 136489015797568 to_huggingface.py:218] 
Loading Orbax checkpoint from: gs://mymodel-training/checkpoints/gemma-3-4b-pt-orbax/0/items
I0219 19:34:22.220706 136489015797568 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0219 19:34:22.221087 136489015797568 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7c210400dd60>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 16000000000 (14.9 GiB)
I0219 19:34:22.221148 136489015797568 abstract_checkpointer.py:35] orbax-checkpoint version: 0.11.28
W0219 19:34:22.434855 136489015797568 checkpoint.py:202] Metadata file does not exist: gs://mymodel-training/checkpoints/gemma-3-4b-pt-orbax/0/items/_CHECKPOINT_METADATA
I0219 19:34:22.660713     549 google_auth_provider.cc:181] Running on GCE, using service account mymodel-477310.svc.id.goog
I0219 19:34:23.055272 136489015797568 checkpointer.py:304] Restoring checkpoint from gs://mymodel-training/checkpoints/gemma-3-4b-pt-orbax/0/items.
I0219 19:34:32.624639 136489015797568 base_pytree_checkpoint_handler.py:128] [process=0] /jax/checkpoint/read/gbytes_per_sec: 1.530 GiB/s (total gbytes: 14.5 GiB) (time elapsed: 9 seconds) (per-host)
I0219 19:34:32.625083 136489015797568 checkpointer.py:318] Finished restoring checkpoint in 9.69 seconds from gs://mymodel-training/checkpoints/gemma-3-4b-pt-orbax/0/items.
I0219 19:34:32.625557 136489015797568 to_huggingface.py:221] Elapsed time for checkpoint load: 0.17 min
I0219 19:34:36.252872 136489015797568 utils.py:934] Detected Linen checkpoint structure
I0219 19:34:36.254182 136489015797568 utils.py:147] Warning: extra keys in param_map are skipped: {'params-vision_encoder-Gemma3VisionEncoderLayer_0-Transformer-encoderblock-MultiHeadDotProductAttention_0-out-kernel', 'params-vision_encoder-Gemma3VisionEncoderLayer_0-Transformer-encoderblock-MultiHeadDotProductAttention_0-value-kernel', 'params-vision_encoder-Gemma3VisionEncoderLayer_0-Transformer-encoder_norm-scale', 'params-vision_encoder-Gemma3VisionEncoderLayer_0-Transformer-encoderblock-LayerNorm_0-scale', 'params-vision_encoder-Gemma3VisionEncoderLayer_0-pos_embedding', 'params-vision_encoder-Gemma3VisionEncoderLayer_0-Transformer-encoderblock-MlpBlockViT_0-Dense_0-kernel', 'params-vision_encoder-Gemma3VisionEncoderLayer_0-Transformer-encoderblock-MlpBlockViT_0-Dense_1-bias', 'params-vision_encoder-Gemma3VisionEncoderLayer_0-Transformer-encoderblock-LayerNorm_1-bias', 'params-vision_encoder-Gemma3VisionEncoderLayer_0-Transformer-encoderblock-MlpBlockViT_0-Dense_1-kernel', 'params-vision_encoder-Gemma3VisionEncoderLayer_0-Transformer-encoderblock-MultiHeadDotProductAttention_0-query-bias', 'params-vision_encoder-Gemma3VisionEncoderLayer_0-Transformer-encoderblock-LayerNorm_0-bias', 'params-vision_encoder-VisionEmbedder_0-mm_soft_embedding_norm-scale', 'params-vision_encoder-Gemma3VisionEncoderLayer_0-Transformer-encoderblock-MultiHeadDotProductAttention_0-key-bias', 'params-vision_encoder-Gemma3VisionEncoderLayer_0-Transformer-encoderblock-MultiHeadDotProductAttention_0-query-kernel', 'params-vision_encoder-Gemma3VisionEncoderLayer_0-Transformer-encoderblock-MultiHeadDotProductAttention_0-value-bias', 'params-vision_encoder-Gemma3VisionEncoderLayer_0-Transformer-encoderblock-MultiHeadDotProductAttention_0-out-bias', 'params-vision_encoder-Gemma3VisionEncoderLayer_0-embedding-kernel', 'params-vision_encoder-Gemma3VisionEncoderLayer_0-Transformer-encoderblock-MlpBlockViT_0-Dense_0-bias', 
'params-vision_encoder-Gemma3VisionEncoderLayer_0-Transformer-encoder_norm-bias', 'params-vision_encoder-VisionEmbedder_0-mm_input_projection-w', 'params-vision_encoder-Gemma3VisionEncoderLayer_0-embedding-bias', 'params-vision_encoder-Gemma3VisionEncoderLayer_0-Transformer-encoderblock-MultiHeadDotProductAttention_0-key-kernel', 'params-vision_encoder-Gemma3VisionEncoderLayer_0-Transformer-encoderblock-LayerNorm_1-scale'}
I0219 19:34:36.254275 136489015797568 to_huggingface.py:262] 
Processing weights...

  0%|          | 0/132 [00:00<?, ?it/s, RAM: 26.9/251.9GB (10.7%)]I0219 19:34:36.255150 136489015797568 utils.py:255] maxtext param: params-token_embedder-embedding
I0219 19:34:36.255224 136489015797568 utils.py:271] 	unscan

  1%|          | 1/132 [00:01<02:49,  1.29s/it, RAM: 29.5/251.9GB (11.7%)]I0219 19:34:37.547821 136489015797568 utils.py:255] maxtext param: params-decoder-decoder_norm-scale
I0219 19:34:37.547920 136489015797568 utils.py:271] 	unscan
I0219 19:34:37.577153 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_0-pre_self_attention_norm-scale
I0219 19:34:37.577260 136489015797568 utils.py:287] 	scan

  2%|▏         | 3/132 [00:01<00:51,  2.48it/s, RAM: 29.5/251.9GB (11.7%)]I0219 19:34:37.730221 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_0-post_self_attention_norm-scale
I0219 19:34:37.730345 136489015797568 utils.py:287] 	scan
I0219 19:34:37.731618 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_0-self_attention-query_norm-scale
I0219 19:34:37.731712 136489015797568 utils.py:287] 	scan

  4%|▍         | 5/132 [00:01<00:31,  4.01it/s, RAM: 29.5/251.9GB (11.7%)]I0219 19:34:37.939030 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_0-self_attention-key_norm-scale
I0219 19:34:37.939126 136489015797568 utils.py:287] 	scan
I0219 19:34:37.940351 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_0-pre_ffw_norm-scale
I0219 19:34:37.940443 136489015797568 utils.py:287] 	scan
I0219 19:34:37.941660 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_0-post_ffw_norm-scale
I0219 19:34:37.941724 136489015797568 utils.py:287] 	scan
I0219 19:34:37.942892 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_0-self_attention-query-kernel
I0219 19:34:37.942982 136489015797568 utils.py:287] 	scan

  7%|▋         | 9/132 [00:01<00:17,  7.10it/s, RAM: 29.5/251.9GB (11.7%)]I0219 19:34:38.206676 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_0-self_attention-key-kernel
I0219 19:34:38.206774 136489015797568 utils.py:287] 	scan
I0219 19:34:38.437133 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_0-self_attention-value-kernel
I0219 19:34:38.437249 136489015797568 utils.py:287] 	scan

  8%|▊         | 11/132 [00:02<00:16,  7.31it/s, RAM: 29.5/251.9GB (11.7%)]I0219 19:34:38.461414 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_0-self_attention-out-kernel
I0219 19:34:38.461508 136489015797568 utils.py:287] 	scan

  9%|▉         | 12/132 [00:02<00:18,  6.53it/s, RAM: 29.7/251.9GB (11.8%)]I0219 19:34:38.691223 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_0-mlp-wi_0-kernel
I0219 19:34:38.691322 136489015797568 utils.py:287] 	scan

 10%|▉         | 13/132 [00:03<00:29,  4.04it/s, RAM: 30.5/251.9GB (12.1%)]I0219 19:34:39.316456 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_0-mlp-wi_1-kernel
I0219 19:34:39.316563 136489015797568 utils.py:287] 	scan

 11%|█         | 14/132 [00:03<00:34,  3.42it/s, RAM: 31.0/251.9GB (12.3%)]I0219 19:34:39.766273 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_0-mlp-wo-kernel
I0219 19:34:39.766367 136489015797568 utils.py:287] 	scan

 11%|█▏        | 15/132 [00:04<00:44,  2.62it/s, RAM: 31.4/251.9GB (12.5%)]I0219 19:34:40.426199 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_1-pre_self_attention_norm-scale
I0219 19:34:40.426294 136489015797568 utils.py:287] 	scan
I0219 19:34:40.427619 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_1-post_self_attention_norm-scale
I0219 19:34:40.427703 136489015797568 utils.py:287] 	scan
I0219 19:34:40.428637 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_1-self_attention-query_norm-scale
I0219 19:34:40.428701 136489015797568 utils.py:287] 	scan
I0219 19:34:40.429536 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_1-self_attention-key_norm-scale
I0219 19:34:40.429608 136489015797568 utils.py:287] 	scan
I0219 19:34:40.430420 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_1-pre_ffw_norm-scale
I0219 19:34:40.430478 136489015797568 utils.py:287] 	scan
I0219 19:34:40.431437 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_1-post_ffw_norm-scale
I0219 19:34:40.431491 136489015797568 utils.py:287] 	scan
I0219 19:34:40.432412 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_1-self_attention-query-kernel
I0219 19:34:40.432466 136489015797568 utils.py:287] 	scan
I0219 19:34:40.483455 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_1-self_attention-key-kernel
I0219 19:34:40.483574 136489015797568 utils.py:287] 	scan
I0219 19:34:40.506306 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_1-self_attention-value-kernel
I0219 19:34:40.506418 136489015797568 utils.py:287] 	scan

 18%|█▊        | 24/132 [00:04<00:11,  9.72it/s, RAM: 31.5/251.9GB (12.5%)]I0219 19:34:40.544690 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_1-self_attention-out-kernel
I0219 19:34:40.544782 136489015797568 utils.py:287] 	scan
I0219 19:34:40.624586 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_1-mlp-wi_0-kernel
I0219 19:34:40.624700 136489015797568 utils.py:287] 	scan
I0219 19:34:41.126336 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_1-mlp-wi_1-kernel
I0219 19:34:41.126456 136489015797568 utils.py:287] 	scan

 20%|██        | 27/132 [00:05<00:18,  5.72it/s, RAM: 32.6/251.9GB (12.9%)]I0219 19:34:41.666722 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_1-mlp-wo-kernel
I0219 19:34:41.666816 136489015797568 utils.py:287] 	scan
I0219 19:34:42.171260 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_2-pre_self_attention_norm-scale
I0219 19:34:42.171383 136489015797568 utils.py:287] 	scan

 22%|██▏       | 29/132 [00:05<00:19,  5.25it/s, RAM: 32.9/251.9GB (13.1%)]I0219 19:34:42.172954 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_2-post_self_attention_norm-scale
I0219 19:34:42.173028 136489015797568 utils.py:287] 	scan
I0219 19:34:42.174042 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_2-self_attention-query_norm-scale
I0219 19:34:42.174105 136489015797568 utils.py:287] 	scan
I0219 19:34:42.174966 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_2-self_attention-key_norm-scale
I0219 19:34:42.175023 136489015797568 utils.py:287] 	scan
I0219 19:34:42.175815 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_2-pre_ffw_norm-scale
I0219 19:34:42.175883 136489015797568 utils.py:287] 	scan
I0219 19:34:42.176809 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_2-post_ffw_norm-scale
I0219 19:34:42.176866 136489015797568 utils.py:287] 	scan
I0219 19:34:42.177762 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_2-self_attention-query-kernel
I0219 19:34:42.177820 136489015797568 utils.py:287] 	scan
I0219 19:34:42.228230 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_2-self_attention-key-kernel
I0219 19:34:42.228351 136489015797568 utils.py:287] 	scan
I0219 19:34:42.264835 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_2-self_attention-value-kernel
I0219 19:34:42.264958 136489015797568 utils.py:287] 	scan

 28%|██▊       | 37/132 [00:06<00:09, 10.28it/s, RAM: 33.0/251.9GB (13.1%)]I0219 19:34:42.303804 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_2-self_attention-out-kernel
I0219 19:34:42.303900 136489015797568 utils.py:287] 	scan
I0219 19:34:42.362308 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_2-mlp-wi_0-kernel
I0219 19:34:42.362428 136489015797568 utils.py:287] 	scan
I0219 19:34:42.845671 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_2-mlp-wi_1-kernel
I0219 19:34:42.845797 136489015797568 utils.py:287] 	scan

 30%|███       | 40/132 [00:07<00:13,  6.59it/s, RAM: 34.1/251.9GB (13.6%)]I0219 19:34:43.328059 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_2-mlp-wo-kernel
I0219 19:34:43.328153 136489015797568 utils.py:287] 	scan
I0219 19:34:43.819135 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_3-pre_self_attention_norm-scale
I0219 19:34:43.819257 136489015797568 utils.py:287] 	scan
I0219 19:34:43.820519 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_3-post_self_attention_norm-scale
I0219 19:34:43.820605 136489015797568 utils.py:287] 	scan

 33%|███▎      | 43/132 [00:07<00:13,  6.46it/s, RAM: 34.6/251.9GB (13.8%)]I0219 19:34:43.821824 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_3-self_attention-query_norm-scale
I0219 19:34:43.821881 136489015797568 utils.py:287] 	scan
I0219 19:34:43.822737 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_3-self_attention-key_norm-scale
I0219 19:34:43.822795 136489015797568 utils.py:287] 	scan
I0219 19:34:43.823606 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_3-pre_ffw_norm-scale
I0219 19:34:43.823660 136489015797568 utils.py:287] 	scan
I0219 19:34:43.824709 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_3-post_ffw_norm-scale
I0219 19:34:43.824766 136489015797568 utils.py:287] 	scan
I0219 19:34:43.825650 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_3-self_attention-query-kernel
I0219 19:34:43.825705 136489015797568 utils.py:287] 	scan
I0219 19:34:43.874802 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_3-self_attention-key-kernel
I0219 19:34:43.874910 136489015797568 utils.py:287] 	scan
I0219 19:34:43.895801 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_3-self_attention-value-kernel
I0219 19:34:43.895906 136489015797568 utils.py:287] 	scan
I0219 19:34:43.920958 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_3-self_attention-out-kernel
I0219 19:34:43.921065 136489015797568 utils.py:287] 	scan

 39%|███▊      | 51/132 [00:07<00:07, 11.21it/s, RAM: 34.9/251.9GB (13.8%)]I0219 19:34:43.964407 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_3-mlp-wi_0-kernel
I0219 19:34:43.964498 136489015797568 utils.py:287] 	scan
I0219 19:34:44.440402 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_3-mlp-wi_1-kernel
I0219 19:34:44.440523 136489015797568 utils.py:287] 	scan
I0219 19:34:44.931714 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_3-mlp-wo-kernel
I0219 19:34:44.931834 136489015797568 utils.py:287] 	scan

 41%|████      | 54/132 [00:09<00:13,  5.89it/s, RAM: 36.4/251.9GB (14.4%)]I0219 19:34:45.417678 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_4-pre_self_attention_norm-scale
I0219 19:34:45.417773 136489015797568 utils.py:287] 	scan
I0219 19:34:45.419103 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_4-post_self_attention_norm-scale
I0219 19:34:45.419188 136489015797568 utils.py:287] 	scan
I0219 19:34:45.420132 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_4-self_attention-query_norm-scale
I0219 19:34:45.420199 136489015797568 utils.py:287] 	scan
I0219 19:34:45.421061 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_4-self_attention-key_norm-scale
I0219 19:34:45.421124 136489015797568 utils.py:287] 	scan
I0219 19:34:45.421917 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_4-pre_ffw_norm-scale
I0219 19:34:45.421994 136489015797568 utils.py:287] 	scan
I0219 19:34:45.422986 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_4-post_ffw_norm-scale
I0219 19:34:45.423049 136489015797568 utils.py:287] 	scan
I0219 19:34:45.423983 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_4-self_attention-query-kernel
I0219 19:34:45.424042 136489015797568 utils.py:287] 	scan
I0219 19:34:45.473612 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_4-self_attention-key-kernel
I0219 19:34:45.473726 136489015797568 utils.py:287] 	scan
I0219 19:34:45.498576 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_4-self_attention-value-kernel
I0219 19:34:45.498687 136489015797568 utils.py:287] 	scan

 48%|████▊     | 63/132 [00:09<00:06, 10.41it/s, RAM: 36.4/251.9GB (14.5%)]I0219 19:34:45.523682 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_4-self_attention-out-kernel
I0219 19:34:45.523773 136489015797568 utils.py:287] 	scan
I0219 19:34:45.567882 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_4-mlp-wi_0-kernel
I0219 19:34:45.568004 136489015797568 utils.py:287] 	scan
I0219 19:34:46.049347 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_4-mlp-wi_1-kernel
I0219 19:34:46.049469 136489015797568 utils.py:287] 	scan
I0219 19:34:46.592797 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_4-mlp-wo-kernel
I0219 19:34:46.592921 136489015797568 utils.py:287] 	scan

 51%|█████     | 67/132 [00:10<00:10,  6.12it/s, RAM: 38.1/251.9GB (15.1%)]I0219 19:34:47.086299 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_5-pre_self_attention_norm-scale
I0219 19:34:47.086393 136489015797568 utils.py:287] 	scan
I0219 19:34:47.087688 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_5-post_self_attention_norm-scale
I0219 19:34:47.087772 136489015797568 utils.py:287] 	scan
I0219 19:34:47.088747 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_5-self_attention-query_norm-scale
I0219 19:34:47.088809 136489015797568 utils.py:287] 	scan
I0219 19:34:47.089673 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_5-self_attention-key_norm-scale
I0219 19:34:47.089732 136489015797568 utils.py:287] 	scan
I0219 19:34:47.090549 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_5-pre_ffw_norm-scale
I0219 19:34:47.090605 136489015797568 utils.py:287] 	scan
I0219 19:34:47.091517 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_5-post_ffw_norm-scale
I0219 19:34:47.091577 136489015797568 utils.py:287] 	scan
I0219 19:34:47.092476 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_5-self_attention-query-kernel
I0219 19:34:47.092536 136489015797568 utils.py:287] 	scan
I0219 19:34:47.143404 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_5-self_attention-key-kernel
I0219 19:34:47.143518 136489015797568 utils.py:287] 	scan
I0219 19:34:47.167069 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_5-self_attention-value-kernel
I0219 19:34:47.167177 136489015797568 utils.py:287] 	scan

 58%|█████▊    | 76/132 [00:10<00:05, 10.08it/s, RAM: 38.2/251.9GB (15.2%)]I0219 19:34:47.192813 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_5-self_attention-out-kernel
I0219 19:34:47.192902 136489015797568 utils.py:287] 	scan
I0219 19:34:47.233757 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_5-mlp-wi_0-kernel
I0219 19:34:47.233867 136489015797568 utils.py:287] 	scan
I0219 19:34:47.729310 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_5-mlp-wi_1-kernel
I0219 19:34:47.729436 136489015797568 utils.py:287] 	scan
I0219 19:34:48.215446 136489015797568 utils.py:255] maxtext param: params-decoder-layers-layers_5-mlp-wo-kernel
I0219 19:34:48.215578 136489015797568 utils.py:287] 	scan
I0219 19:34:48.714073 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_0-pre_self_attention_norm-scale
I0219 19:34:48.714190 136489015797568 utils.py:271] 	unscan

 61%|██████▏   | 81/132 [00:12<00:07,  6.63it/s, RAM: 39.8/251.9GB (15.8%)]I0219 19:34:48.714821 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_0-post_self_attention_norm-scale
I0219 19:34:48.714884 136489015797568 utils.py:271] 	unscan
I0219 19:34:48.715129 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_0-self_attention-query_norm-scale
I0219 19:34:48.715180 136489015797568 utils.py:271] 	unscan
I0219 19:34:48.715346 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_0-self_attention-key_norm-scale
I0219 19:34:48.715388 136489015797568 utils.py:271] 	unscan
I0219 19:34:48.715523 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_0-pre_ffw_norm-scale
I0219 19:34:48.715567 136489015797568 utils.py:271] 	unscan
I0219 19:34:48.715712 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_0-post_ffw_norm-scale
I0219 19:34:48.715750 136489015797568 utils.py:271] 	unscan
I0219 19:34:48.715887 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_0-self_attention-query-kernel
I0219 19:34:48.715925 136489015797568 utils.py:271] 	unscan
I0219 19:34:48.725765 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_0-self_attention-key-kernel
I0219 19:34:48.725847 136489015797568 utils.py:271] 	unscan
I0219 19:34:48.730981 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_0-self_attention-value-kernel
I0219 19:34:48.731046 136489015797568 utils.py:271] 	unscan
I0219 19:34:48.736119 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_0-self_attention-out-kernel
I0219 19:34:48.736180 136489015797568 utils.py:271] 	unscan
I0219 19:34:48.746247 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_0-mlp-wi_0-kernel
I0219 19:34:48.746314 136489015797568 utils.py:271] 	unscan

 69%|██████▉   | 91/132 [00:12<00:03, 10.84it/s, RAM: 39.9/251.9GB (15.8%)]I0219 19:34:48.821905 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_0-mlp-wi_1-kernel
I0219 19:34:48.822002 136489015797568 utils.py:271] 	unscan
I0219 19:34:48.897889 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_0-mlp-wo-kernel
I0219 19:34:48.898005 136489015797568 utils.py:271] 	unscan
I0219 19:34:48.975688 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_1-pre_self_attention_norm-scale
I0219 19:34:48.975795 136489015797568 utils.py:271] 	unscan
I0219 19:34:48.976156 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_1-post_self_attention_norm-scale
I0219 19:34:48.976208 136489015797568 utils.py:271] 	unscan
I0219 19:34:48.976376 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_1-self_attention-query_norm-scale
I0219 19:34:48.976416 136489015797568 utils.py:271] 	unscan
I0219 19:34:48.976564 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_1-self_attention-key_norm-scale
I0219 19:34:48.976602 136489015797568 utils.py:271] 	unscan

 73%|███████▎  | 97/132 [00:12<00:02, 13.38it/s, RAM: 40.1/251.9GB (15.9%)]I0219 19:34:48.977004 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_1-pre_ffw_norm-scale
I0219 19:34:48.977058 136489015797568 utils.py:271] 	unscan
I0219 19:34:48.977238 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_1-post_ffw_norm-scale
I0219 19:34:48.977277 136489015797568 utils.py:271] 	unscan
I0219 19:34:48.977427 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_1-self_attention-query-kernel
I0219 19:34:48.977464 136489015797568 utils.py:271] 	unscan
I0219 19:34:48.989092 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_1-self_attention-key-kernel
I0219 19:34:48.989171 136489015797568 utils.py:271] 	unscan
I0219 19:34:48.994445 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_1-self_attention-value-kernel
I0219 19:34:48.994522 136489015797568 utils.py:271] 	unscan
I0219 19:34:48.999253 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_1-self_attention-out-kernel
I0219 19:34:48.999317 136489015797568 utils.py:271] 	unscan
I0219 19:34:49.008338 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_1-mlp-wi_0-kernel
I0219 19:34:49.008410 136489015797568 utils.py:271] 	unscan

 79%|███████▉  | 104/132 [00:12<00:01, 17.68it/s, RAM: 40.2/251.9GB (16.0%)]I0219 19:34:49.084203 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_1-mlp-wi_1-kernel
I0219 19:34:49.084287 136489015797568 utils.py:271] 	unscan
I0219 19:34:49.168358 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_1-mlp-wo-kernel
I0219 19:34:49.168469 136489015797568 utils.py:271] 	unscan
I0219 19:34:49.262246 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_2-pre_self_attention_norm-scale
I0219 19:34:49.262355 136489015797568 utils.py:271] 	unscan
I0219 19:34:49.262707 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_2-post_self_attention_norm-scale
I0219 19:34:49.262762 136489015797568 utils.py:271] 	unscan
I0219 19:34:49.262953 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_2-self_attention-query_norm-scale
I0219 19:34:49.262999 136489015797568 utils.py:271] 	unscan
I0219 19:34:49.263156 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_2-self_attention-key_norm-scale
I0219 19:34:49.263196 136489015797568 utils.py:271] 	unscan

 83%|████████▎ | 110/132 [00:13<00:01, 20.28it/s, RAM: 40.4/251.9GB (16.0%)]I0219 19:34:49.263612 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_2-pre_ffw_norm-scale
I0219 19:34:49.263674 136489015797568 utils.py:271] 	unscan
I0219 19:34:49.263851 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_2-post_ffw_norm-scale
I0219 19:34:49.263893 136489015797568 utils.py:271] 	unscan
I0219 19:34:49.264057 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_2-self_attention-query-kernel
I0219 19:34:49.264097 136489015797568 utils.py:271] 	unscan
I0219 19:34:49.273169 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_2-self_attention-key-kernel
I0219 19:34:49.273242 136489015797568 utils.py:271] 	unscan
I0219 19:34:49.278513 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_2-self_attention-value-kernel
I0219 19:34:49.278573 136489015797568 utils.py:271] 	unscan
I0219 19:34:49.284191 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_2-self_attention-out-kernel
I0219 19:34:49.284247 136489015797568 utils.py:271] 	unscan
I0219 19:34:49.293044 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_2-mlp-wi_0-kernel
I0219 19:34:49.293113 136489015797568 utils.py:271] 	unscan

 89%|████████▊ | 117/132 [00:13<00:00, 25.76it/s, RAM: 40.6/251.9GB (16.1%)]I0219 19:34:49.373425 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_2-mlp-wi_1-kernel
I0219 19:34:49.373508 136489015797568 utils.py:271] 	unscan
I0219 19:34:49.454304 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_2-mlp-wo-kernel
I0219 19:34:49.454411 136489015797568 utils.py:271] 	unscan
I0219 19:34:49.548264 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_3-pre_self_attention_norm-scale
I0219 19:34:49.548374 136489015797568 utils.py:271] 	unscan
I0219 19:34:49.548719 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_3-post_self_attention_norm-scale
I0219 19:34:49.548774 136489015797568 utils.py:271] 	unscan
I0219 19:34:49.548960 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_3-self_attention-query_norm-scale
I0219 19:34:49.549004 136489015797568 utils.py:271] 	unscan
I0219 19:34:49.549158 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_3-self_attention-key_norm-scale
I0219 19:34:49.549201 136489015797568 utils.py:271] 	unscan

 93%|█████████▎| 123/132 [00:13<00:00, 27.62it/s, RAM: 40.7/251.9GB (16.2%)]I0219 19:34:49.549588 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_3-pre_ffw_norm-scale
I0219 19:34:49.549647 136489015797568 utils.py:271] 	unscan
I0219 19:34:49.549842 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_3-post_ffw_norm-scale
I0219 19:34:49.549885 136489015797568 utils.py:271] 	unscan
I0219 19:34:49.550048 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_3-self_attention-query-kernel
I0219 19:34:49.550090 136489015797568 utils.py:271] 	unscan
I0219 19:34:49.559005 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_3-self_attention-key-kernel
I0219 19:34:49.559075 136489015797568 utils.py:271] 	unscan
I0219 19:34:49.563655 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_3-self_attention-value-kernel
I0219 19:34:49.563712 136489015797568 utils.py:271] 	unscan
I0219 19:34:49.569391 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_3-self_attention-out-kernel
I0219 19:34:49.569448 136489015797568 utils.py:271] 	unscan
I0219 19:34:49.578373 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_3-mlp-wi_0-kernel
I0219 19:34:49.578442 136489015797568 utils.py:271] 	unscan

 98%|█████████▊| 130/132 [00:13<00:00, 33.87it/s, RAM: 40.9/251.9GB (16.2%)]I0219 19:34:49.654752 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_3-mlp-wi_1-kernel
I0219 19:34:49.654829 136489015797568 utils.py:271] 	unscan
I0219 19:34:49.730471 136489015797568 utils.py:255] maxtext param: params-decoder-layers_remainder-layers_3-mlp-wo-kernel
I0219 19:34:49.730577 136489015797568 utils.py:271] 	unscan

100%|██████████| 132/132 [00:13<00:00,  9.74it/s, RAM: 41.1/251.9GB (16.3%)]
I0219 19:34:49.807833 136489015797568 to_huggingface.py:278] Elapse for transform: 0.23 min
I0219 19:34:49.807917 136489015797568 to_huggingface.py:285] 
Saving HuggingFace model...
I0219 19:34:49.807967 136489015797568 utils.py:594] 
-> Saving model and tokenizer (if provided) to gs://mymodel-training/checkpoints/gemma3-4b-pt-hf...
I0219 19:34:49.808207 136489015797568 utils.py:566]    Using temporary local staging directory: /tmp/maxtext_hf_save_zd4vzljg
I0219 19:34:49.808262 136489015797568 utils.py:606]     Saving tokenizer files to /tmp/maxtext_hf_save_zd4vzljg...
I0219 19:34:50.128398 136489015797568 utils.py:608]     Tokenizer files saved locally: ('/tmp/maxtext_hf_save_zd4vzljg/tokenizer_config.json', '/tmp/maxtext_hf_save_zd4vzljg/special_tokens_map.json', '/tmp/maxtext_hf_save_zd4vzljg/chat_template.jinja', '/tmp/maxtext_hf_save_zd4vzljg/tokenizer.model', '/tmp/maxtext_hf_save_zd4vzljg/added_tokens.json', '/tmp/maxtext_hf_save_zd4vzljg/tokenizer.json')
I0219 19:34:50.128705 136489015797568 utils.py:692] -> Uploading /tmp/maxtext_hf_save_zd4vzljg/tokenizer_config.json to mymodel-training/checkpoints/gemma3-4b-pt-hf/tokenizer_config.json...
I0219 19:34:50.377410 136489015797568 utils.py:700] ✅ Uploaded /tmp/maxtext_hf_save_zd4vzljg/tokenizer_config.json to mymodel-training/checkpoints/gemma3-4b-pt-hf/tokenizer_config.json
I0219 19:34:50.377701 136489015797568 utils.py:704] ✅ Deleted /tmp/maxtext_hf_save_zd4vzljg/tokenizer_config.json
I0219 19:34:50.377763 136489015797568 utils.py:692] -> Uploading /tmp/maxtext_hf_save_zd4vzljg/tokenizer.json to mymodel-training/checkpoints/gemma3-4b-pt-hf/tokenizer.json...
I0219 19:34:50.925624 136489015797568 utils.py:700] ✅ Uploaded /tmp/maxtext_hf_save_zd4vzljg/tokenizer.json to mymodel-training/checkpoints/gemma3-4b-pt-hf/tokenizer.json
I0219 19:34:50.929486 136489015797568 utils.py:704] ✅ Deleted /tmp/maxtext_hf_save_zd4vzljg/tokenizer.json
I0219 19:34:50.929586 136489015797568 utils.py:692] -> Uploading /tmp/maxtext_hf_save_zd4vzljg/special_tokens_map.json to mymodel-training/checkpoints/gemma3-4b-pt-hf/special_tokens_map.json...
I0219 19:34:51.075369 136489015797568 utils.py:700] ✅ Uploaded /tmp/maxtext_hf_save_zd4vzljg/special_tokens_map.json to mymodel-training/checkpoints/gemma3-4b-pt-hf/special_tokens_map.json
I0219 19:34:51.075503 136489015797568 utils.py:704] ✅ Deleted /tmp/maxtext_hf_save_zd4vzljg/special_tokens_map.json
I0219 19:34:51.075559 136489015797568 utils.py:692] -> Uploading /tmp/maxtext_hf_save_zd4vzljg/chat_template.jinja to mymodel-training/checkpoints/gemma3-4b-pt-hf/chat_template.jinja...
I0219 19:34:51.223978 136489015797568 utils.py:700] ✅ Uploaded /tmp/maxtext_hf_save_zd4vzljg/chat_template.jinja to mymodel-training/checkpoints/gemma3-4b-pt-hf/chat_template.jinja
I0219 19:34:51.224103 136489015797568 utils.py:704] ✅ Deleted /tmp/maxtext_hf_save_zd4vzljg/chat_template.jinja
I0219 19:34:51.224155 136489015797568 utils.py:692] -> Uploading /tmp/maxtext_hf_save_zd4vzljg/tokenizer.model to mymodel-training/checkpoints/gemma3-4b-pt-hf/tokenizer.model...
I0219 19:34:51.473654 136489015797568 utils.py:700] ✅ Uploaded /tmp/maxtext_hf_save_zd4vzljg/tokenizer.model to mymodel-training/checkpoints/gemma3-4b-pt-hf/tokenizer.model
I0219 19:34:51.474383 136489015797568 utils.py:704] ✅ Deleted /tmp/maxtext_hf_save_zd4vzljg/tokenizer.model
I0219 19:34:51.474463 136489015797568 utils.py:692] -> Uploading /tmp/maxtext_hf_save_zd4vzljg/added_tokens.json to mymodel-training/checkpoints/gemma3-4b-pt-hf/added_tokens.json...
I0219 19:34:51.624598 136489015797568 utils.py:700] ✅ Uploaded /tmp/maxtext_hf_save_zd4vzljg/added_tokens.json to mymodel-training/checkpoints/gemma3-4b-pt-hf/added_tokens.json
I0219 19:34:51.624740 136489015797568 utils.py:704] ✅ Deleted /tmp/maxtext_hf_save_zd4vzljg/added_tokens.json
I0219 19:34:51.626836 136489015797568 utils.py:374]    Saved config.json to /tmp/maxtext_hf_save_zd4vzljg/config.json
I0219 19:34:51.626905 136489015797568 utils.py:692] -> Uploading /tmp/maxtext_hf_save_zd4vzljg/config.json to mymodel-training/checkpoints/gemma3-4b-pt-hf/config.json...
I0219 19:34:51.779690 136489015797568 utils.py:700] ✅ Uploaded /tmp/maxtext_hf_save_zd4vzljg/config.json to mymodel-training/checkpoints/gemma3-4b-pt-hf/config.json
I0219 19:34:51.779815 136489015797568 utils.py:704] ✅ Deleted /tmp/maxtext_hf_save_zd4vzljg/config.json
-> Uploading in-memory state_dict to mymodel-training/checkpoints/gemma3-4b-pt-hf/model-00001-of-00005.safetensors...
-> Uploading in-memory state_dict to mymodel-training/checkpoints/gemma3-4b-pt-hf/model-00004-of-00005.safetensors...
-> Uploading in-memory state_dict to mymodel-training/checkpoints/gemma3-4b-pt-hf/model-00002-of-00005.safetensors...
-> Uploading in-memory state_dict to mymodel-training/checkpoints/gemma3-4b-pt-hf/model-00005-of-00005.safetensors...
-> Uploading in-memory state_dict to mymodel-training/checkpoints/gemma3-4b-pt-hf/model-00003-of-00005.safetensors...
✅ Uploaded to mymodel-training/checkpoints/gemma3-4b-pt-hf/model-00001-of-00005.safetensors
✅ Uploaded to mymodel-training/checkpoints/gemma3-4b-pt-hf/model-00003-of-00005.safetensors
✅ Uploaded to mymodel-training/checkpoints/gemma3-4b-pt-hf/model-00005-of-00005.safetensors
✅ Uploaded to mymodel-training/checkpoints/gemma3-4b-pt-hf/model-00002-of-00005.safetensors
✅ Uploaded to mymodel-training/checkpoints/gemma3-4b-pt-hf/model-00004-of-00005.safetensors
I0219 19:35:59.956094 136489015797568 utils.py:506]    Saved model.safetensors.index.json to /tmp/maxtext_hf_save_zd4vzljg/model.safetensors.index.json
I0219 19:35:59.956249 136489015797568 utils.py:692] -> Uploading /tmp/maxtext_hf_save_zd4vzljg/model.safetensors.index.json to mymodel-training/checkpoints/gemma3-4b-pt-hf/model.safetensors.index.json...
I0219 19:36:00.169449 136489015797568 utils.py:700] ✅ Uploaded /tmp/maxtext_hf_save_zd4vzljg/model.safetensors.index.json to mymodel-training/checkpoints/gemma3-4b-pt-hf/model.safetensors.index.json
I0219 19:36:00.169610 136489015797568 utils.py:704] ✅ Deleted /tmp/maxtext_hf_save_zd4vzljg/model.safetensors.index.json
I0219 19:36:00.169655 136489015797568 utils.py:513]    Successfully uploaded model.safetensors.index.json to GCS: gs://mymodel-training/checkpoints/gemma3-4b-pt-hf
I0219 19:36:00.169904 136489015797568 utils.py:649] ✅ Model and tokenizer (if provided) successfully processed for gs://mymodel-training/checkpoints/gemma3-4b-pt-hf
I0219 19:36:00.170009 136489015797568 to_huggingface.py:294] ✅ MaxText model successfully saved in HuggingFace format at gs://mymodel-training/checkpoints/gemma3-4b-pt-hf
I0219 19:36:00.170049 136489015797568 to_huggingface.py:295] Elapse for save: 1.17 min
I0219 19:36:00.170075 136489015797568 to_huggingface.py:296] Overall Elapse: 1.63 min
I0219 19:36:00.170172 136489015797568 utils.py:764] Peak Memory: 54.10 GB
==============================================
Export complete!
HF model at: gs://mymodel-training/checkpoints/gemma3-4b-pt-hf
==============================================
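The progress bar in the log above carries a live `RAM: 40.1/251.9GB (15.9%)` postfix, and the run ends by reporting `Peak Memory: 54.10 GB`. A minimal stdlib-only sketch of both measurements (the actual script's helpers live in `utils.py`; these function names are ours, and the real code may use a library such as psutil for the live reading):

```python
import resource
import sys

def peak_memory_gb() -> float:
    """Peak resident set size of this process, in GB (like the log's final line)."""
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    scale = 1 if sys.platform == "darwin" else 1024
    return rss * scale / 1e9

def ram_postfix(used_gb: float, total_gb: float) -> str:
    """Format a progress-bar postfix like 'RAM: 40.1/251.9GB (15.9%)'."""
    return f"RAM: {used_gb:.1f}/{total_gb:.1f}GB ({100 * used_gb / total_gb:.1f}%)"
```

With tqdm, the postfix string can be attached to the bar via `bar.set_postfix_str(ram_postfix(...))` on each iteration.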

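The save phase in the log follows a stage-upload-delete pattern: files are written to a temporary local directory (`/tmp/maxtext_hf_save_...`), uploaded to GCS one at a time, and deleted immediately after each upload so local disk usage stays bounded. A hedged sketch of that pattern with a caller-supplied uploader (the function name and signature here are illustrative, not the actual `utils.py` API):

```python
import os
import tempfile

def stage_and_upload(files, upload, dest_prefix):
    """Write files to a temp staging dir, then upload and delete each one.

    files: mapping of filename -> bytes.
    upload(local_path, remote_path): caller-supplied callable, e.g. a thin
    wrapper around a GCS client. Deleting each file right after its upload
    lands mirrors the 'Uploaded ... / Deleted ...' pairs in the log.
    """
    uploaded = []
    with tempfile.TemporaryDirectory(prefix="maxtext_hf_save_") as staging:
        for name, data in files.items():
            local = os.path.join(staging, name)
            with open(local, "wb") as f:
                f.write(data)
        for name in files:
            local = os.path.join(staging, name)
            remote = f"{dest_prefix}/{name}"
            upload(local, remote)
            os.remove(local)  # free local disk as soon as the upload succeeds
            uploaded.append(remote)
    return uploaded
```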
Collaborator

@shuningjin shuningjin left a comment
Thank you for identifying the issue, proposing the fix, and performing tests! I don't see the benefit of simulated_cpu_devices_count > 1, so perhaps we can remove this entirely.

@kryvokhyzha
Contributor Author

Thank you for identifying the issue, proposing the fix, and performing tests! I don't see the benefit of simulated_cpu_devices_count > 1, so perhaps we can remove this entirely.

Done ✅
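
The fix's core pattern, as described in the PR, is to pre-parse the device-count flag and set the JAX/XLA environment configuration in the `__main__` block, before JAX is first initialized (XLA reads `XLA_FLAGS` only once, when the CPU backend is created). A minimal sketch under those assumptions; the helper name is ours, and following the review above the default is 1:

```python
import argparse
import os

def configure_simulated_cpu_devices(argv, default_count=1):
    """Pre-parse --simulated_cpu_devices_count and set the XLA env vars.

    Must run before JAX is first initialized: the
    xla_force_host_platform_device_count flag is read only once, at CPU
    backend creation. Unknown args are passed through to absl.app.run().
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--simulated_cpu_devices_count", type=int,
                        default=default_count)
    args, _ = parser.parse_known_args(argv)
    os.environ["JAX_PLATFORMS"] = "cpu"
    os.environ["XLA_FLAGS"] = (
        f"--xla_force_host_platform_device_count={args.simulated_cpu_devices_count}"
    )
    return args.simulated_cpu_devices_count
```

In the script this would run at the top of `if __name__ == "__main__":`, with `import jax` (directly or via `absl.app.run`) happening only afterwards.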

Collaborator

@shuningjin shuningjin left a comment

Thank you! Could you squash into one commit before merge?

@codecov

codecov Bot commented Feb 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@copybara-service copybara-service Bot merged commit c6f3bc2 into AI-Hypercomputer:main Feb 19, 2026
33 checks passed