Finetune loss and accuracy are poor #11

Closed
k15201363625 opened this issue Aug 2, 2021 · 10 comments
@k15201363625
  1. Fine-tuning accuracy on c3 and ccpm stays below 30.0%.
  2. With prompt tuning, the loss does not decline and accuracy stays at 0%.
  3. When will the sogou-log evaluation code be released?

Thanks very much!
@t1101675
Contributor

t1101675 commented Aug 2, 2021

Could you please share the training logs? Thanks!

@k15201363625
Author

Sorry for the late reply.
I am using the 10B model with the Chinese vocabulary.

c3 finetune args:
{ "model_config": "./configs/model/cpm2_config_xxlarge.json", "cpu_optimizer": false, "cpu_torch_adam": false, "fp16": true, "fp32_embedding": false, "fp32_layernorm": false, "fp32_tokentypes": false, "fp32_allreduce": false, "hysteresis": 2, "loss_scale": null, "loss_scale_window": 1000, "min_scale": 1, "prompt_tune": false, "prompt_config": null, "do_train": true, "do_valid": true, "do_eval": false, "do_infer": false, "train_ratio": 1.0, "train_num": -1, "dev_ratio": 1.0, "dev_num": -1, "test_ratio": 1.0, "test_num": -1, "epochs": 10, "batch_size": 4, "gradient_accumulation_steps": 8, "weight_decay": 0.01, "checkpoint_activations": true, "checkpoint_num_layers": 1, "num_checkpoints": 24, "deepspeed_activation_checkpointing": true, "clip_grad": 1.0, "train_iters": -1, "log_interval": 10, "max_save": -1, "seed": 1234, "reset_position_ids": false, "reset_attention_mask": false, "lr_decay_iters": null, "lr_decay_style": "constant", "lr": 5e-06, "warmup": 0.0, "save": "./results_xxlarge/c3/cpm2_finetune_lr0.000005const_G8/", "save_interval": 100000, "no_save_optim": false, "no_save_rng": false, "load": "./checkpoints/cpm-2-xxlarge", "load_oprimizer_states": false, "load_lr_scheduler_states": false, "no_load_optim": true, "no_load_rng": false, "finetune": false, "resume_dataloader": false, "log_file": "./results_xxlarge/c3/cpm2_finetune_lr0.000005const_G8//log.txt", "distributed_backend": "nccl", "local_rank": 0, "eval_batch_size": null, "eval_iters": 10, "eval_interval": 50, "temperature": 1.0, "top_p": null, "top_k": null, "out_seq_length": 256, "model_parallel_size": 4, "data_path": "data/c3", "data_ext": ".json", "data_name": "c3", "data_prefix": null, "num_workers": 2, "tokenizer_path": "./bpe_cn", "seq_length": 512, "enc_seq_length": 512, "dec_seq_length": 256, "deepspeed": true, "deepspeed_config": "./configs/deepspeed/ds_full_model_xxlarge.json", "deepscale": false, "deepscale_config": null, "deepspeed_mpi": false, "cuda": true, "rank": 0, "world_size": 16, 
"dynamic_loss_scale": true }
c3 finetune log:
Path: data/c3/train.json | Ratio:1.0 | Max enc len: 768 | Max dec len: 2 | Data num: 11869 Path: data/c3/dev.json | Ratio:1.0 | Max enc len: 768 | Max dec len: 2 | Data num: 3816 Total train epochs 10 | Total train iters 927 | Path: data/c3/train.json | Ratio:1.0 | Max enc len: 768 | Max dec len: 2 | Data num: 11869 Path: data/c3/dev.json | Ratio:1.0 | Max enc len: 768 | Max dec len: 2 | Data num: 3816 Total train epochs 10 | Total train iters 927 | epoch 0/ 10 | global iteration 10/ 927 | learning rate 5e-06 | lm loss 10.748 | loss scale 4096.0 | epoch 0/ 10 | global iteration 20/ 927 | learning rate 5e-06 | lm loss 8.12545 | loss scale 4096.0 | epoch 0/ 10 | global iteration 30/ 927 | learning rate 5e-06 | lm loss 6.86793 | loss scale 4096.0 | epoch 0/ 10 | global iteration 40/ 927 | learning rate 5e-06 | lm loss 6.53714 | loss scale 4096.0 | epoch 0/ 10 | global iteration 50/ 927 | learning rate 5e-06 | lm loss 6.19935 | loss scale 4096.0 | iteration 50 | eval_loss: 5.996873801495848 | eval acc(mrr): 0.2759978991596639 epoch 0/ 10 | global iteration 60/ 927 | learning rate 5e-06 | lm loss 5.86172 | loss scale 4096.0 | epoch 0/ 10 | global iteration 70/ 927 | learning rate 5e-06 | lm loss 5.5148 | loss scale 4096.0 | epoch 0/ 10 | global iteration 80/ 927 | learning rate 5e-06 | lm loss 5.1446 | loss scale 4096.0 | epoch 0/ 10 | global iteration 90/ 927 | learning rate 5e-06 | lm loss 4.77114 | loss scale 4096.0 | epoch 1/ 10 | global iteration 100/ 927 | learning rate 5e-06 | lm loss 4.41403 | loss scale 4096.0 | iteration 100 | eval_loss: 4.186605309237953 | eval acc(mrr): 0.2759978991596639 epoch 1/ 10 | global iteration 110/ 927 | learning rate 5e-06 | lm loss 4.03451 | loss scale 4096.0 | epoch 1/ 10 | global iteration 120/ 927 | learning rate 5e-06 | lm loss 3.66324 | loss scale 4096.0 | epoch 1/ 10 | global iteration 130/ 927 | learning rate 5e-06 | lm loss 3.29167 | loss scale 4096.0 | epoch 1/ 10 | global iteration 140/ 927 | learning rate 5e-06 | lm 
loss 2.923 | loss scale 4096.0 | epoch 1/ 10 | global iteration 150/ 927 | learning rate 5e-06 | lm loss 2.56698 | loss scale 4096.0 | iteration 150 | eval_loss: 2.374825203118204 | eval acc(mrr): 0.2759978991596639 epoch 1/ 10 | global iteration 160/ 927 | learning rate 5e-06 | lm loss 2.24863 | loss scale 4096.0 | epoch 1/ 10 | global iteration 170/ 927 | learning rate 5e-06 | lm loss 1.93756 | loss scale 4096.0 | epoch 1/ 10 | global iteration 180/ 927 | learning rate 5e-06 | lm loss 1.66897 | loss scale 4096.0 | epoch 2/ 10 | global iteration 190/ 927 | learning rate 5e-06 | lm loss 1.44367 | loss scale 4096.0 | epoch 2/ 10 | global iteration 200/ 927 | learning rate 5e-06 | lm loss 1.26313 | loss scale 4096.0 | iteration 200 | eval_loss: 1.1684456813235242 | eval acc(mrr): 0.2759978991596639 epoch 2/ 10 | global iteration 210/ 927 | learning rate 5e-06 | lm loss 1.12203 | loss scale 4096.0 | epoch 2/ 10 | global iteration 220/ 927 | learning rate 5e-06 | lm loss 1.0063 | loss scale 4096.0 | epoch 2/ 10 | global iteration 230/ 927 | learning rate 5e-06 | lm loss 0.929716 | loss scale 4096.0 | epoch 2/ 10 | global iteration 240/ 927 | learning rate 5e-06 | lm loss 0.872882 | loss scale 4096.0 | epoch 2/ 10 | global iteration 250/ 927 | learning rate 5e-06 | lm loss 0.837873 | loss scale 4096.0 | iteration 250 | eval_loss: 0.8134413176224011 | eval acc(mrr): 0.2694327731092437 epoch 2/ 10 | global iteration 260/ 927 | learning rate 5e-06 | lm loss 0.805735 | loss scale 4096.0 | epoch 2/ 10 | global iteration 270/ 927 | learning rate 5e-06 | lm loss 0.783518 | loss scale 4096.0 | epoch 3/ 10 | global iteration 280/ 927 | learning rate 5e-06 | lm loss 0.771855 | loss scale 4096.0 | epoch 3/ 10 | global iteration 290/ 927 | learning rate 5e-06 | lm loss 0.769435 | loss scale 4096.0 | epoch 3/ 10 | global iteration 300/ 927 | learning rate 5e-06 | lm loss 0.765096 | loss scale 4096.0 | iteration 300 | eval_loss: 0.7490392965929848 | eval acc(mrr): 0.2759978991596639 
epoch 3/ 10 | global iteration 310/ 927 | learning rate 5e-06 | lm loss 0.74909 | loss scale 4096.0 | epoch 3/ 10 | global iteration 320/ 927 | learning rate 5e-06 | lm loss 0.74265 | loss scale 4096.0 | epoch 3/ 10 | global iteration 330/ 927 | learning rate 5e-06 | lm loss 0.736001 | loss scale 4096.0 | epoch 3/ 10 | global iteration 340/ 927 | learning rate 5e-06 | lm loss 0.735703 | loss scale 4096.0 | epoch 3/ 10 | global iteration 350/ 927 | learning rate 5e-06 | lm loss 0.730404 | loss scale 4096.0 | iteration 350 | eval_loss: 0.7269730758266288 | eval acc(mrr): 0.2694327731092437 epoch 3/ 10 | global iteration 360/ 927 | learning rate 5e-06 | lm loss 0.727411 | loss scale 4096.0 | epoch 3/ 10 | global iteration 370/ 927 | learning rate 5e-06 | lm loss 0.717546 | loss scale 4096.0 | epoch 4/ 10 | global iteration 380/ 927 | learning rate 5e-06 | lm loss 0.726132 | loss scale 4096.0 | epoch 4/ 10 | global iteration 390/ 927 | learning rate 5e-06 | lm loss 0.730546 | loss scale 4096.0 | epoch 4/ 10 | global iteration 400/ 927 | learning rate 5e-06 | lm loss 0.725443 | loss scale 4096.0 | iteration 400 | eval_loss: 0.7172215768770009 | eval acc(mrr): 0.28125 epoch 4/ 10 | global iteration 410/ 927 | learning rate 5e-06 | lm loss 0.717477 | loss scale 4096.0 | epoch 4/ 10 | global iteration 420/ 927 | learning rate 5e-06 | lm loss 0.715359 | loss scale 4096.0 | epoch 4/ 10 | global iteration 430/ 927 | learning rate 5e-06 | lm loss 0.711248 | loss scale 4096.0 | epoch 4/ 10 | global iteration 440/ 927 | learning rate 5e-06 | lm loss 0.720953 | loss scale 4096.0 | epoch 4/ 10 | global iteration 450/ 927 | learning rate 5e-06 | lm loss 0.714497 | loss scale 4096.0 | iteration 450 | eval_loss: 0.7102466142978989 | eval acc(mrr): 0.2849264705882353 epoch 4/ 10 | global iteration 460/ 927 | learning rate 5e-06 | lm loss 0.703637 | loss scale 4096.0 | epoch 5/ 10 | global iteration 470/ 927 | learning rate 5e-06 | lm loss 0.715789 | loss scale 4096.0 | epoch 5/ 10 | 
global iteration 480/ 927 | learning rate 5e-06 | lm loss 0.719384 | loss scale 4096.0 | epoch 5/ 10 | global iteration 490/ 927 | learning rate 5e-06 | lm loss 0.715522 | loss scale 4096.0 | epoch 5/ 10 | global iteration 500/ 927 | learning rate 5e-06 | lm loss 0.716133 | loss scale 4096.0 | iteration 500 | eval_loss: 0.7121572051228595 | eval acc(mrr): 0.2694327731092437 epoch 5/ 10 | global iteration 510/ 927 | learning rate 5e-06 | lm loss 0.704994 | loss scale 4096.0 | epoch 5/ 10 | global iteration 520/ 927 | learning rate 5e-06 | lm loss 0.704081 | loss scale 4096.0 | epoch 5/ 10 | global iteration 530/ 927 | learning rate 5e-06 | lm loss 0.716542 | loss scale 4096.0 | epoch 5/ 10 | global iteration 540/ 927 | learning rate 5e-06 | lm loss 0.712576 | loss scale 4096.0 | epoch 5/ 10 | global iteration 550/ 927 | learning rate 5e-06 | lm loss 0.702559 | loss scale 4096.0 | iteration 550 | eval_loss: 0.7039489377947414 | eval acc(mrr): 0.2694327731092437 epoch 6/ 10 | global iteration 560/ 927 | learning rate 5e-06 | lm loss 0.704957 | loss scale 4096.0 | epoch 6/ 10 | global iteration 570/ 927 | learning rate 5e-06 | lm loss 0.713218 | loss scale 4096.0 | epoch 6/ 10 | global iteration 580/ 927 | learning rate 5e-06 | lm loss 0.709553 | loss scale 4096.0 | epoch 6/ 10 | global iteration 590/ 927 | learning rate 5e-06 | lm loss 0.710381 | loss scale 4096.0 | epoch 6/ 10 | global iteration 600/ 927 | learning rate 5e-06 | lm loss 0.704144 | loss scale 4096.0 | iteration 600 | eval_loss: 0.7039651274681091 | eval acc(mrr): 0.27941176470588236 epoch 6/ 10 | global iteration 610/ 927 | learning rate 5e-06 | lm loss 0.703877 | loss scale 4096.0 | epoch 6/ 10 | global iteration 620/ 927 | learning rate 5e-06 | lm loss 0.704596 | loss scale 4096.0 | epoch 6/ 10 | global iteration 630/ 927 | learning rate 5e-06 | lm loss 0.704305 | loss scale 4096.0 | epoch 6/ 10 | global iteration 640/ 927 | learning rate 5e-06 | lm loss 0.698795 | loss scale 4096.0 | epoch 7/ 10 | 
global iteration 650/ 927 | learning rate 5e-06 | lm loss 0.690701 | loss scale 4096.0 | iteration 650 | eval_loss: 0.6931329740195715 | eval acc(mrr): 0.28939075630252103 epoch 7/ 10 | global iteration 660/ 927 | learning rate 5e-06 | lm loss 0.704786 | loss scale 4096.0 | epoch 7/ 10 | global iteration 670/ 927 | learning rate 5e-06 | lm loss 0.698949 | loss scale 4096.0 | epoch 7/ 10 | global iteration 680/ 927 | learning rate 5e-06 | lm loss 0.695341 | loss scale 4096.0 | epoch 7/ 10 | global iteration 690/ 927 | learning rate 5e-06 | lm loss 0.684619 | loss scale 4096.0 | epoch 7/ 10 | global iteration 700/ 927 | learning rate 5e-06 | lm loss 0.693175 | loss scale 4096.0 | iteration 700 | eval_loss: 0.6982303076431531 | eval acc(mrr): 0.2865021008403361 epoch 7/ 10 | global iteration 710/ 927 | learning rate 5e-06 | lm loss 0.691223 | loss scale 4096.0 | epoch 7/ 10 | global iteration 720/ 927 | learning rate 5e-06 | lm loss 0.702129 | loss scale 4096.0 | epoch 7/ 10 | global iteration 730/ 927 | learning rate 5e-06 | lm loss 0.687939 | loss scale 4096.0 | epoch 7/ 10 | global iteration 740/ 927 | learning rate 5e-06 | lm loss 0.681826 | loss scale 4096.0 | epoch 8/ 10 | global iteration 750/ 927 | learning rate 5e-06 | lm loss 0.699984 | loss scale 4096.0 | iteration 750 | eval_loss: 0.6886968789230875 | eval acc(mrr): 0.2694327731092437 epoch 8/ 10 | global iteration 760/ 927 | learning rate 5e-06 | lm loss 0.693742 | loss scale 4096.0 | epoch 8/ 10 | global iteration 770/ 927 | learning rate 5e-06 | lm loss 0.690863 | loss scale 4096.0 | epoch 8/ 10 | global iteration 780/ 927 | learning rate 5e-06 | lm loss 0.69418 | loss scale 4096.0 | epoch 8/ 10 | global iteration 790/ 927 | learning rate 5e-06 | lm loss 0.685127 | loss scale 4096.0 | epoch 8/ 10 | global iteration 800/ 927 | learning rate 5e-06 | lm loss 0.680224 | loss scale 4096.0 | iteration 800 | eval_loss: 0.6923549502086239 | eval acc(mrr): 0.28728991596638653 epoch 8/ 10 | global iteration 810/ 
927 | learning rate 5e-06 | lm loss 0.710464 | loss scale 4096.0 | epoch 8/ 10 | global iteration 820/ 927 | learning rate 5e-06 | lm loss 0.678743 | loss scale 4096.0 | epoch 8/ 10 | global iteration 830/ 927 | learning rate 5e-06 | lm loss 0.682874 | loss scale 4096.0 | epoch 9/ 10 | global iteration 840/ 927 | learning rate 5e-06 | lm loss 0.682015 | loss scale 4096.0 | epoch 9/ 10 | global iteration 850/ 927 | learning rate 5e-06 | lm loss 0.699769 | loss scale 4096.0 | iteration 850 | eval_loss: 0.6890141349129316 | eval acc(mrr): 0.2820378151260504 epoch 9/ 10 | global iteration 860/ 927 | learning rate 5e-06 | lm loss 0.674382 | loss scale 4096.0 | epoch 9/ 10 | global iteration 870/ 927 | learning rate 5e-06 | lm loss 0.666706 | loss scale 4096.0 | epoch 9/ 10 | global iteration 880/ 927 | learning rate 5e-06 | lm loss 0.677731 | loss scale 4096.0 | epoch 9/ 10 | global iteration 890/ 927 | learning rate 5e-06 | lm loss 0.664579 | loss scale 4096.0 | epoch 9/ 10 | global iteration 900/ 927 | learning rate 5e-06 | lm loss 0.69042 | loss scale 4096.0 | iteration 900 | eval_loss: 0.6896009432668445 | eval acc(mrr): 0.29070378151260506 epoch 9/ 10 | global iteration 910/ 927 | learning rate 5e-06 | lm loss 0.67181 | loss scale 4096.0 | epoch 9/ 10 | global iteration 920/ 927 | learning rate 5e-06 | lm loss 0.667761 | loss scale 4096.0 |
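A minimal sketch for sanity-checking a run like the one above from the pasted text alone. It pulls the evaluation entries out of the flattened log and recomputes the total iteration count from the arguments. The helper names (`parse_eval_entries`, `expected_total_iters`) are hypothetical, and the iteration formula is an assumption (data-parallel size taken as world_size // model_parallel_size), not something read from the repository's code.

```python
import re

# Evaluation entries in the pasted log look like:
# "iteration 50 | eval_loss: 5.99 | eval acc(mrr): 0.27"
EVAL_RE = re.compile(
    r"iteration\s+(\d+)\s*\|\s*eval_loss:\s*([0-9.]+)\s*\|\s*eval acc\(mrr\):\s*([0-9.]+)"
)


def parse_eval_entries(log_text):
    """Return (iteration, eval_loss, eval_acc) tuples from a pasted log."""
    # Collapse whitespace so entries broken across wrapped lines still match.
    flat = " ".join(log_text.split())
    return [(int(i), float(loss), float(acc)) for i, loss, acc in EVAL_RE.findall(flat)]


def expected_total_iters(num_examples, epochs, batch_size, grad_accum,
                         world_size, model_parallel_size):
    """Assumed formula (not taken from the repository's code)."""
    data_parallel_size = world_size // model_parallel_size
    global_batch = batch_size * grad_accum * data_parallel_size
    return global_batch, (num_examples * epochs) // global_batch


# Entries copied from the c3 log above; the last one is wrapped on purpose
# to mimic the line breaks in the paste.
SAMPLE_LOG = """
iteration 50 | eval_loss: 5.996873801495848 | eval acc(mrr): 0.2759978991596639
epoch 0/ 10 | global iteration 60/ 927 | learning rate 5e-06 | lm loss 5.86172 | loss scale 4096.0 |
iteration 900 | eval_loss: 0.6896009432668445 | eval acc(mrr):
0.29070378151260506
"""

entries = parse_eval_entries(SAMPLE_LOG)
accs = [acc for _, _, acc in entries]
print("eval acc range:", min(accs), max(accs))

# c3 arguments from this comment: 11869 examples, 10 epochs, batch_size 4,
# gradient_accumulation_steps 8, world_size 16, model_parallel_size 4.
print(expected_total_iters(11869, 10, 4, 8, 16, 4))
```

Under that assumption the c3 arguments give a global batch of 128 and 927 iterations, which agrees with "Total train iters 927" in the log header. The ccpm arguments further down (21778 examples, batch_size 16, gradient_accumulation_steps 1) give 64 and 3402, which also agrees with its log.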
ccpm finetune args:
{ "model_config": "./configs/model/cpm2_config_xxlarge.json", "cpu_optimizer": false, "cpu_torch_adam": false, "fp16": true, "fp32_embedding": false, "fp32_layernorm": false, "fp32_tokentypes": false, "fp32_allreduce": false, "hysteresis": 2, "loss_scale": null, "loss_scale_window": 1000, "min_scale": 1, "prompt_tune": false, "prompt_config": null, "do_train": true, "do_valid": true, "do_eval": false, "do_infer": false, "train_ratio": 1.0, "train_num": -1, "dev_ratio": 1.0, "dev_num": -1, "test_ratio": 1.0, "test_num": -1, "epochs": 10, "batch_size": 16, "gradient_accumulation_steps": 1, "weight_decay": 0.01, "checkpoint_activations": true, "checkpoint_num_layers": 1, "num_checkpoints": 24, "deepspeed_activation_checkpointing": true, "clip_grad": 1.0, "train_iters": -1, "log_interval": 10, "max_save": -1, "seed": 1234, "reset_position_ids": false, "reset_attention_mask": false, "lr_decay_iters": null, "lr_decay_style": "constant", "lr": 3e-06, "warmup": 0.0, "save": "./results_xxlarge/ccpm/cpm2_finetune_lr0.000003const_G1_seed1234/", "save_interval": 100000, "no_save_optim": false, "no_save_rng": false, "load": "./checkpoints/cpm-2-xxlarge", "load_oprimizer_states": false, "load_lr_scheduler_states": false, "no_load_optim": true, "no_load_rng": false, "finetune": false, "resume_dataloader": false, "log_file": "./results_xxlarge/ccpm/cpm2_finetune_lr0.000003const_G1_seed1234//log.txt", "distributed_backend": "nccl", "local_rank": 0, "eval_batch_size": null, "eval_iters": 10, "eval_interval": 50, "temperature": 1.0, "top_p": null, "top_k": null, "out_seq_length": 256, "model_parallel_size": 4, "data_path": "./data/ccpm", "data_ext": ".jsonl", "data_name": "ccpm", "data_prefix": null, "num_workers": 2, "tokenizer_path": "./bpe_cn", "seq_length": 512, "enc_seq_length": 512, "dec_seq_length": 512, "deepspeed": true, "deepspeed_config": "./configs/deepspeed/ds_full_model_xxlarge.json", "deepscale": false, "deepscale_config": null, "deepspeed_mpi": false, "cuda": true, 
"rank": 0, "world_size": 16, "dynamic_loss_scale": true }
ccpm finetune log:
Path: ./data/ccpm/train.jsonl | Ratio:1.0 | Max enc len: 178 | Max dec len: 2 | Data num: 21778 Path: ./data/ccpm/dev.jsonl | Ratio:1.0 | Max enc len: 135 | Max dec len: 2 | Data num: 2720 Total train epochs 10 | Total train iters 3402 | Path: ./data/ccpm/train.jsonl | Ratio:1.0 | Max enc len: 178 | Max dec len: 2 | Data num: 21778 Path: ./data/ccpm/dev.jsonl | Ratio:1.0 | Max enc len: 135 | Max dec len: 2 | Data num: 2720 Total train epochs 10 | Total train iters 3402 | epoch 0/ 10 | global iteration 10/ 3402 | learning rate 3e-06 | lm loss 11.0489 | loss scale 512.0 | epoch 0/ 10 | global iteration 20/ 3402 | learning rate 3e-06 | lm loss 7.72363 | loss scale 512.0 | epoch 0/ 10 | global iteration 30/ 3402 | learning rate 3e-06 | lm loss 6.98914 | loss scale 512.0 | epoch 0/ 10 | global iteration 40/ 3402 | learning rate 3e-06 | lm loss 6.78841 | loss scale 512.0 | epoch 0/ 10 | global iteration 50/ 3402 | learning rate 3e-06 | lm loss 6.60709 | loss scale 512.0 | iteration 50 | eval_loss: 6.483274198713756 | eval acc(mrr): 0.24293154761904762 epoch 0/ 10 | global iteration 60/ 3402 | learning rate 3e-06 | lm loss 6.40855 | loss scale 512.0 | epoch 0/ 10 | global iteration 70/ 3402 | learning rate 3e-06 | lm loss 6.21417 | loss scale 512.0 | epoch 0/ 10 | global iteration 80/ 3402 | learning rate 3e-06 | lm loss 6.02228 | loss scale 512.0 | epoch 0/ 10 | global iteration 90/ 3402 | learning rate 3e-06 | lm loss 5.82484 | loss scale 512.0 | epoch 0/ 10 | global iteration 100/ 3402 | learning rate 3e-06 | lm loss 5.60874 | loss scale 512.0 | iteration 100 | eval_loss: 5.4880617913745695 | eval acc(mrr): 0.25967261904761907 epoch 0/ 10 | global iteration 110/ 3402 | learning rate 3e-06 | lm loss 5.40055 | loss scale 512.0 | epoch 0/ 10 | global iteration 120/ 3402 | learning rate 3e-06 | lm loss 5.18977 | loss scale 512.0 | epoch 0/ 10 | global iteration 130/ 3402 | learning rate 3e-06 | lm loss 4.96869 | loss scale 512.0 | epoch 0/ 10 | global iteration 140/ 3402 | 
learning rate 3e-06 | lm loss 4.75542 | loss scale 512.0 | epoch 0/ 10 | global iteration 150/ 3402 | learning rate 3e-06 | lm loss 4.53394 | loss scale 512.0 | iteration 150 | eval_loss: 4.396578981762841 | eval acc(mrr): 0.24702380952380953 epoch 0/ 10 | global iteration 160/ 3402 | learning rate 3e-06 | lm loss 4.31014 | loss scale 512.0 | epoch 0/ 10 | global iteration 170/ 3402 | learning rate 3e-06 | lm loss 4.08557 | loss scale 512.0 | epoch 0/ 10 | global iteration 180/ 3402 | learning rate 3e-06 | lm loss 3.8539 | loss scale 512.0 | epoch 0/ 10 | global iteration 190/ 3402 | learning rate 3e-06 | lm loss 3.64374 | loss scale 512.0 | epoch 0/ 10 | global iteration 200/ 3402 | learning rate 3e-06 | lm loss 3.41605 | loss scale 512.0 | iteration 200 | eval_loss: 3.290597881589617 | eval acc(mrr): 0.24702380952380953 epoch 0/ 10 | global iteration 210/ 3402 | learning rate 3e-06 | lm loss 3.19916 | loss scale 512.0 | epoch 0/ 10 | global iteration 220/ 3402 | learning rate 3e-06 | lm loss 2.99433 | loss scale 512.0 | epoch 0/ 10 | global iteration 230/ 3402 | learning rate 3e-06 | lm loss 2.77989 | loss scale 512.0 | epoch 0/ 10 | global iteration 240/ 3402 | learning rate 3e-06 | lm loss 2.57478 | loss scale 512.0 | epoch 0/ 10 | global iteration 250/ 3402 | learning rate 3e-06 | lm loss 2.38497 | loss scale 512.0 | iteration 250 | eval_loss: 2.264177850314549 | eval acc(mrr): 0.25037202380952384 epoch 0/ 10 | global iteration 260/ 3402 | learning rate 3e-06 | lm loss 2.19315 | loss scale 512.0 | epoch 0/ 10 | global iteration 270/ 3402 | learning rate 3e-06 | lm loss 2.01338 | loss scale 512.0 | epoch 0/ 10 | global iteration 280/ 3402 | learning rate 3e-06 | lm loss 1.84703 | loss scale 512.0 | epoch 0/ 10 | global iteration 290/ 3402 | learning rate 3e-06 | lm loss 1.69573 | loss scale 512.0 | epoch 0/ 10 | global iteration 300/ 3402 | learning rate 3e-06 | lm loss 1.55845 | loss scale 512.0 | iteration 300 | eval_loss: 1.4787132285890126 | eval acc(mrr): 
0.24702380952380953 epoch 0/ 10 | global iteration 310/ 3402 | learning rate 3e-06 | lm loss 1.42777 | loss scale 512.0 | epoch 0/ 10 | global iteration 320/ 3402 | learning rate 3e-06 | lm loss 1.32583 | loss scale 512.0 | epoch 0/ 10 | global iteration 330/ 3402 | learning rate 3e-06 | lm loss 1.22507 | loss scale 512.0 | epoch 0/ 10 | global iteration 340/ 3402 | learning rate 3e-06 | lm loss 1.1542 | loss scale 512.0 | epoch 1/ 10 | global iteration 350/ 3402 | learning rate 3e-06 | lm loss 1.07626 | loss scale 512.0 | iteration 350 | eval_loss: 1.0477620249702817 | eval acc(mrr): 0.25037202380952384 epoch 1/ 10 | global iteration 360/ 3402 | learning rate 3e-06 | lm loss 1.03146 | loss scale 512.0 | epoch 1/ 10 | global iteration 370/ 3402 | learning rate 3e-06 | lm loss 0.979402 | loss scale 512.0 | epoch 1/ 10 | global iteration 380/ 3402 | learning rate 3e-06 | lm loss 0.940498 | loss scale 512.0 | epoch 1/ 10 | global iteration 390/ 3402 | learning rate 3e-06 | lm loss 0.915173 | loss scale 512.0 | epoch 1/ 10 | global iteration 400/ 3402 | learning rate 3e-06 | lm loss 0.883658 | loss scale 512.0 | iteration 400 | eval_loss: 0.8736535012722015 | eval acc(mrr): 0.24702380952380953 epoch 1/ 10 | global iteration 410/ 3402 | learning rate 3e-06 | lm loss 0.862299 | loss scale 512.0 | epoch 1/ 10 | global iteration 420/ 3402 | learning rate 3e-06 | lm loss 0.845544 | loss scale 512.0 | epoch 1/ 10 | global iteration 430/ 3402 | learning rate 3e-06 | lm loss 0.823381 | loss scale 512.0 | epoch 1/ 10 | global iteration 440/ 3402 | learning rate 3e-06 | lm loss 0.81731 | loss scale 512.0 | epoch 1/ 10 | global iteration 450/ 3402 | learning rate 3e-06 | lm loss 0.808349 | loss scale 512.0 | iteration 450 | eval_loss: 0.7996905474435716 | eval acc(mrr): 0.24702380952380953 epoch 1/ 10 | global iteration 460/ 3402 | learning rate 3e-06 | lm loss 0.797024 | loss scale 512.0 | epoch 1/ 10 | global iteration 470/ 3402 | learning rate 3e-06 | lm loss 0.793413 | loss 
scale 512.0 | epoch 1/ 10 | global iteration 480/ 3402 | learning rate 3e-06 | lm loss 0.787075 | loss scale 512.0 | epoch 1/ 10 | global iteration 490/ 3402 | learning rate 3e-06 | lm loss 0.77995 | loss scale 512.0 | epoch 1/ 10 | global iteration 500/ 3402 | learning rate 3e-06 | lm loss 0.771585 | loss scale 512.0 | iteration 500 | eval_loss: 0.7702109430517469 | eval acc(mrr): 0.25967261904761907 epoch 1/ 10 | global iteration 510/ 3402 | learning rate 3e-06 | lm loss 0.771184 | loss scale 512.0 | epoch 1/ 10 | global iteration 520/ 3402 | learning rate 3e-06 | lm loss 0.761128 | loss scale 512.0 | epoch 1/ 10 | global iteration 530/ 3402 | learning rate 3e-06 | lm loss 0.761819 | loss scale 512.0 | epoch 1/ 10 | global iteration 540/ 3402 | learning rate 3e-06 | lm loss 0.759291 | loss scale 512.0 | epoch 1/ 10 | global iteration 550/ 3402 | learning rate 3e-06 | lm loss 0.750679 | loss scale 512.0 | iteration 550 | eval_loss: 0.7518070340156555 | eval acc(mrr): 0.24702380952380953 epoch 1/ 10 | global iteration 560/ 3402 | learning rate 3e-06 | lm loss 0.760313 | loss scale 512.0 | epoch 1/ 10 | global iteration 570/ 3402 | learning rate 3e-06 | lm loss 0.753756 | loss scale 512.0 | epoch 1/ 10 | global iteration 580/ 3402 | learning rate 3e-06 | lm loss 0.743793 | loss scale 512.0 | epoch 1/ 10 | global iteration 590/ 3402 | learning rate 3e-06 | lm loss 0.74604 | loss scale 512.0 | epoch 1/ 10 | global iteration 600/ 3402 | learning rate 3e-06 | lm loss 0.742513 | loss scale 512.0 | iteration 600 | eval_loss: 0.7369276469662076 | eval acc(mrr): 0.25037202380952384 epoch 1/ 10 | global iteration 610/ 3402 | learning rate 3e-06 | lm loss 0.738733 | loss scale 512.0 | epoch 1/ 10 | global iteration 620/ 3402 | learning rate 3e-06 | lm loss 0.742789 | loss scale 512.0 | epoch 1/ 10 | global iteration 630/ 3402 | learning rate 3e-06 | lm loss 0.739731 | loss scale 512.0 | epoch 1/ 10 | global iteration 640/ 3402 | learning rate 3e-06 | lm loss 0.734842 | loss 
scale 512.0 | epoch 1/ 10 | global iteration 650/ 3402 | learning rate 3e-06 | lm loss 0.739687 | loss scale 512.0 | iteration 650 | eval_loss: 0.7311658490271795 | eval acc(mrr): 0.24702380952380953 epoch 1/ 10 | global iteration 660/ 3402 | learning rate 3e-06 | lm loss 0.735862 | loss scale 512.0 | epoch 1/ 10 | global iteration 670/ 3402 | learning rate 3e-06 | lm loss 0.727382 | loss scale 512.0 | epoch 1/ 10 | global iteration 680/ 3402 | learning rate 3e-06 | lm loss 0.72402 | loss scale 512.0 | epoch 2/ 10 | global iteration 690/ 3402 | learning rate 3e-06 | lm loss 0.723197 | loss scale 512.0 | epoch 2/ 10 | global iteration 700/ 3402 | learning rate 3e-06 | lm loss 0.72251 | loss scale 512.0 | iteration 700 | eval_loss: 0.7268535920551845 | eval acc(mrr): 0.25037202380952384 epoch 2/ 10 | global iteration 710/ 3402 | learning rate 3e-06 | lm loss 0.72168 | loss scale 512.0 | epoch 2/ 10 | global iteration 720/ 3402 | learning rate 3e-06 | lm loss 0.721957 | loss scale 512.0 | epoch 2/ 10 | global iteration 730/ 3402 | learning rate 3e-06 | lm loss 0.722069 | loss scale 512.0 | epoch 2/ 10 | global iteration 740/ 3402 | learning rate 3e-06 | lm loss 0.722624 | loss scale 512.0 | epoch 2/ 10 | global iteration 750/ 3402 | learning rate 3e-06 | lm loss 0.719828 | loss scale 512.0 | iteration 750 | eval_loss: 0.7224317235606057 | eval acc(mrr): 0.24293154761904762 epoch 2/ 10 | global iteration 760/ 3402 | learning rate 3e-06 | lm loss 0.7227 | loss scale 512.0 | epoch 2/ 10 | global iteration 770/ 3402 | learning rate 3e-06 | lm loss 0.727933 | loss scale 512.0 | epoch 2/ 10 | global iteration 780/ 3402 | learning rate 3e-06 | lm loss 0.715458 | loss scale 512.0 | epoch 2/ 10 | global iteration 790/ 3402 | learning rate 3e-06 | lm loss 0.718815 | loss scale 512.0 | epoch 2/ 10 | global iteration 800/ 3402 | learning rate 3e-06 | lm loss 0.715942 | loss scale 512.0 | iteration 800 | eval_loss: 0.7170386584032149 | eval acc(mrr): 0.25967261904761907 epoch 2/ 
10 | global iteration 810/ 3402 | learning rate 3e-06 | lm loss 0.718093 | loss scale 512.0 | epoch 2/ 10 | global iteration 820/ 3402 | learning rate 3e-06 | lm loss 0.717979 | loss scale 512.0 | epoch 2/ 10 | global iteration 830/ 3402 | learning rate 3e-06 | lm loss 0.721751 | loss scale 512.0 | epoch 2/ 10 | global iteration 840/ 3402 | learning rate 3e-06 | lm loss 0.7173 | loss scale 512.0 | epoch 2/ 10 | global iteration 850/ 3402 | learning rate 3e-06 | lm loss 0.717908 | loss scale 512.0 | iteration 850 | eval_loss: 0.7137335340181986 | eval acc(mrr): 0.25037202380952384 epoch 2/ 10 | global iteration 860/ 3402 | learning rate 3e-06 | lm loss 0.714888 | loss scale 512.0 | epoch 2/ 10 | global iteration 870/ 3402 | learning rate 3e-06 | lm loss 0.716904 | loss scale 512.0 | epoch 2/ 10 | global iteration 880/ 3402 | learning rate 3e-06 | lm loss 0.718471 | loss scale 512.0 | epoch 2/ 10 | global iteration 890/ 3402 | learning rate 3e-06 | lm loss 0.714047 | loss scale 512.0 | epoch 2/ 10 | global iteration 900/ 3402 | learning rate 3e-06 | lm loss 0.724561 | loss scale 512.0 | iteration 900 | eval_loss: 0.7156579253219423 | eval acc(mrr): 0.24293154761904762 epoch 2/ 10 | global iteration 910/ 3402 | learning rate 3e-06 | lm loss 0.717794 | loss scale 512.0 | epoch 2/ 10 | global iteration 920/ 3402 | learning rate 3e-06 | lm loss 0.712875 | loss scale 512.0 | epoch 2/ 10 | global iteration 930/ 3402 | learning rate 3e-06 | lm loss 0.717839 | loss scale 512.0 | epoch 2/ 10 | global iteration 940/ 3402 | learning rate 3e-06 | lm loss 0.717386 | loss scale 512.0 | epoch 2/ 10 | global iteration 950/ 3402 | learning rate 3e-06 | lm loss 0.713294 | loss scale 512.0 | iteration 950 | eval_loss: 0.7126000693866185 | eval acc(mrr): 0.24293154761904762 epoch 2/ 10 | global iteration 960/ 3402 | learning rate 3e-06 | lm loss 0.717683 | loss scale 512.0 | epoch 2/ 10 | global iteration 970/ 3402 | learning rate 3e-06 | lm loss 0.71518 | loss scale 512.0 | epoch 2/ 10 
| global iteration 980/ 3402 | learning rate 3e-06 | lm loss 0.714588 | loss scale 512.0 | epoch 2/ 10 | global iteration 990/ 3402 | learning rate 3e-06 | lm loss 0.713793 | loss scale 512.0 | epoch 2/ 10 | global iteration 1000/ 3402 | learning rate 3e-06 | lm loss 0.713041 | loss scale 512.0 | iteration 1000 | eval_loss: 0.7087225729510898 | eval acc(mrr): 0.24293154761904762 epoch 2/ 10 | global iteration 1010/ 3402 | learning rate 3e-06 | lm loss 0.707068 | loss scale 1024.0 | epoch 2/ 10 | global iteration 1020/ 3402 | learning rate 3e-06 | lm loss 0.703401 | loss scale 1024.0 | epoch 3/ 10 | global iteration 1030/ 3402 | learning rate 3e-06 | lm loss 0.705485 | loss scale 1024.0 | epoch 3/ 10 | global iteration 1040/ 3402 | learning rate 3e-06 | lm loss 0.704622 | loss scale 1024.0 | epoch 3/ 10 | global iteration 1050/ 3402 | learning rate 3e-06 | lm loss 0.704849 | loss scale 1024.0 | iteration 1050 | eval_loss: 0.7088191906611124 | eval acc(mrr): 0.25967261904761907 epoch 3/ 10 | global iteration 1060/ 3402 | learning rate 3e-06 | lm loss 0.705438 | loss scale 1024.0 | epoch 3/ 10 | global iteration 1070/ 3402 | learning rate 3e-06 | lm loss 0.705283 | loss scale 1024.0 | epoch 3/ 10 | global iteration 1080/ 3402 | learning rate 3e-06 | lm loss 0.706047 | loss scale 1024.0 | epoch 3/ 10 | global iteration 1090/ 3402 | learning rate 3e-06 | lm loss 0.703389 | loss scale 1024.0 | epoch 3/ 10 | global iteration 1100/ 3402 | learning rate 3e-06 | lm loss 0.707454 | loss scale 1024.0 | iteration 1100 | eval_loss: 0.7080635854176113 | eval acc(mrr): 0.24293154761904762 epoch 3/ 10 | global iteration 1110/ 3402 | learning rate 3e-06 | lm loss 0.793805 | loss scale 1024.0 | epoch 3/ 10 | global iteration 1120/ 3402 | learning rate 3e-06 | lm loss 0.702349 | loss scale 1024.0 | epoch 3/ 10 | global iteration 1130/ 3402 | learning rate 3e-06 | lm loss 0.706519 | loss scale 1024.0 | epoch 3/ 10 | global iteration 1140/ 3402 | learning rate 3e-06 | lm loss 0.706155 | 
loss scale 1024.0 | epoch 3/ 10 | global iteration 1150/ 3402 | learning rate 3e-06 | lm loss 0.706156 | loss scale 1024.0 | iteration 1150 | eval_loss: 0.705021531808944 | eval acc(mrr): 0.25967261904761907 epoch 3/ 10 | global iteration 1160/ 3402 | learning rate 3e-06 | lm loss 0.706198 | loss scale 1024.0 | epoch 3/ 10 | global iteration 1170/ 3402 | learning rate 3e-06 | lm loss 0.709492 | loss scale 1024.0 | epoch 3/ 10 | global iteration 1180/ 3402 | learning rate 3e-06 | lm loss 0.705964 | loss scale 1024.0 | epoch 3/ 10 | global iteration 1190/ 3402 | learning rate 3e-06 | lm loss 0.706014 | loss scale 1024.0 | epoch 3/ 10 | global iteration 1200/ 3402 | learning rate 3e-06 | lm loss 0.708112 | loss scale 1024.0 | iteration 1200 | eval_loss: 0.7043665619123549 | eval acc(mrr): 0.24702380952380953 epoch 3/ 10 | global iteration 1210/ 3402 | learning rate 3e-06 | lm loss 0.707071 | loss scale 1024.0 | epoch 3/ 10 | global iteration 1220/ 3402 | learning rate 3e-06 | lm loss 0.709152 | loss scale 1024.0 | epoch 3/ 10 | global iteration 1230/ 3402 | learning rate 3e-06 | lm loss 0.703659 | loss scale 1024.0 | epoch 3/ 10 | global iteration 1240/ 3402 | learning rate 3e-06 | lm loss 0.713421 | loss scale 1024.0 | epoch 3/ 10 | global iteration 1250/ 3402 | learning rate 3e-06 | lm loss 0.707735 | loss scale 1024.0 | iteration 1250 | eval_loss: 0.7044862934521267 | eval acc(mrr): 0.24293154761904762 epoch 3/ 10 | global iteration 1260/ 3402 | learning rate 3e-06 | lm loss 0.706701 | loss scale 1024.0 | epoch 3/ 10 | global iteration 1270/ 3402 | learning rate 3e-06 | lm loss 0.708858 | loss scale 1024.0 | epoch 3/ 10 | global iteration 1280/ 3402 | learning rate 3e-06 | lm loss 0.710285 | loss scale 1024.0 | epoch 3/ 10 | global iteration 1290/ 3402 | learning rate 3e-06 | lm loss 0.705229 | loss scale 1024.0 | epoch 3/ 10 | global iteration 1300/ 3402 | learning rate 3e-06 | lm loss 0.706952 | loss scale 1024.0 | iteration 1300 | eval_loss: 0.7046898404757181 | 
eval acc(mrr): 0.24293154761904762 epoch 3/ 10 | global iteration 1310/ 3402 | learning rate 3e-06 | lm loss 0.706128 | loss scale 1024.0 | epoch 3/ 10 | global iteration 1320/ 3402 | learning rate 3e-06 | lm loss 0.70551 | loss scale 1024.0 | epoch 3/ 10 | global iteration 1330/ 3402 | learning rate 3e-06 | lm loss 0.705266 | loss scale 1024.0 | epoch 3/ 10 | global iteration 1340/ 3402 | learning rate 3e-06 | lm loss 0.70635 | loss scale 1024.0 | epoch 3/ 10 | global iteration 1350/ 3402 | learning rate 3e-06 | lm loss 0.701291 | loss scale 1024.0 | iteration 1350 | eval_loss: 0.7025331656138102 | eval acc(mrr): 0.25037202380952384 epoch 3/ 10 | global iteration 1360/ 3402 | learning rate 3e-06 | lm loss 0.699219 | loss scale 1024.0 | epoch 4/ 10 | global iteration 1370/ 3402 | learning rate 3e-06 | lm loss 0.697672 | loss scale 1024.0 | epoch 4/ 10 | global iteration 1380/ 3402 | learning rate 3e-06 | lm loss 0.698514 | loss scale 1024.0 | epoch 4/ 10 | global iteration 1390/ 3402 | learning rate 3e-06 | lm loss 0.698999 | loss scale 1024.0 | epoch 4/ 10 | global iteration 1400/ 3402 | learning rate 3e-06 | lm loss 0.701253 | loss scale 1024.0 | iteration 1400 | eval_loss: 0.7021288914339883 | eval acc(mrr): 0.24293154761904762 epoch 4/ 10 | global iteration 1410/ 3402 | learning rate 3e-06 | lm loss 0.702823 | loss scale 1024.0 | epoch 4/ 10 | global iteration 1420/ 3402 | learning rate 3e-06 | lm loss 1.08578 | loss scale 1024.0 | epoch 4/ 10 | global iteration 1430/ 3402 | learning rate 3e-06 | lm loss 0.697458 | loss scale 1024.0 | epoch 4/ 10 | global iteration 1440/ 3402 | learning rate 3e-06 | lm loss 0.68737 | loss scale 1024.0 | epoch 4/ 10 | global iteration 1450/ 3402 | learning rate 3e-06 | lm loss 0.694854 | loss scale 1024.0 | iteration 1450 | eval_loss: 0.714258862393243 | eval acc(mrr): 0.2418154761904762 epoch 4/ 10 | global iteration 1460/ 3402 | learning rate 3e-06 | lm loss 0.663357 | loss scale 1024.0 | epoch 4/ 10 | global iteration 1470/ 
3402 | learning rate 3e-06 | lm loss 0.730879 | loss scale 1024.0 | epoch 4/ 10 | global iteration 1480/ 3402 | learning rate 3e-06 | lm loss 0.706203 | loss scale 1024.0 | epoch 4/ 10 | global iteration 1490/ 3402 | learning rate 3e-06 | lm loss 0.697477 | loss scale 1024.0 | epoch 4/ 10 | global iteration 1500/ 3402 | learning rate 3e-06 | lm loss 0.699788 | loss scale 1024.0 | iteration 1500 | eval_loss: 0.7071089418161483 | eval acc(mrr): 0.24144345238095238 epoch 4/ 10 | global iteration 1510/ 3402 | learning rate 3e-06 | lm loss 0.711388 | loss scale 1024.0 | epoch 4/ 10 | global iteration 1520/ 3402 | learning rate 3e-06 | lm loss 0.707614 | loss scale 1024.0 | epoch 4/ 10 | global iteration 1530/ 3402 | learning rate 3e-06 | lm loss 0.70121 | loss scale 1024.0 | epoch 4/ 10 | global iteration 1540/ 3402 | learning rate 3e-06 | lm loss 0.701764 | loss scale 1024.0 | epoch 4/ 10 | global iteration 1550/ 3402 | learning rate 3e-06 | lm loss 0.700915 | loss scale 1024.0 | iteration 1550 | eval_loss: 0.7038169957342602 | eval acc(mrr): 0.26339285714285715 epoch 4/ 10 | global iteration 1560/ 3402 | learning rate 3e-06 | lm loss 0.703684 | loss scale 1024.0 | epoch 4/ 10 | global iteration 1570/ 3402 | learning rate 3e-06 | lm loss 0.705844 | loss scale 1024.0 | epoch 4/ 10 | global iteration 1580/ 3402 | learning rate 3e-06 | lm loss 0.704058 | loss scale 1024.0 | epoch 4/ 10 | global iteration 1590/ 3402 | learning rate 3e-06 | lm loss 0.706756 | loss scale 1024.0 | epoch 4/ 10 | global iteration 1600/ 3402 | learning rate 3e-06 | lm loss 0.703857 | loss scale 1024.0 | iteration 1600 | eval_loss: 0.7020452334767296 | eval acc(mrr): 0.25892857142857145 epoch 4/ 10 | global iteration 1610/ 3402 | learning rate 3e-06 | lm loss 0.702213 | loss scale 1024.0 | epoch 4/ 10 | global iteration 1620/ 3402 | learning rate 3e-06 | lm loss 0.709278 | loss scale 1024.0 | epoch 4/ 10 | global iteration 1630/ 3402 | learning rate 3e-06 | lm loss 0.700675 | loss scale 1024.0 | 
epoch 4/ 10 | global iteration 1640/ 3402 | learning rate 3e-06 | lm loss 0.697512 | loss scale 1024.0 | epoch 4/ 10 | global iteration 1650/ 3402 | learning rate 3e-06 | lm loss 0.699758 | loss scale 1024.0 | iteration 1650 | eval_loss: 0.7053966252576738 | eval acc(mrr): 0.2421875 epoch 4/ 10 | global iteration 1660/ 3402 | learning rate 3e-06 | lm loss 0.703553 | loss scale 1024.0 | epoch 4/ 10 | global iteration 1670/ 3402 | learning rate 3e-06 | lm loss 0.702834 | loss scale 1024.0 | epoch 4/ 10 | global iteration 1680/ 3402 | learning rate 3e-06 | lm loss 0.704672 | loss scale 1024.0 | epoch 4/ 10 | global iteration 1690/ 3402 | learning rate 3e-06 | lm loss 0.698479 | loss scale 1024.0 | epoch 4/ 10 | global iteration 1700/ 3402 | learning rate 3e-06 | lm loss 0.697557 | loss scale 1024.0 | iteration 1700 | eval_loss: 0.7015601311411176 | eval acc(mrr): 0.24293154761904762 epoch 5/ 10 | global iteration 1710/ 3402 | learning rate 3e-06 | lm loss 0.698609 | loss scale 1024.0 | epoch 5/ 10 | global iteration 1720/ 3402 | learning rate 3e-06 | lm loss 0.695714 | loss scale 1024.0 | epoch 5/ 10 | global iteration 1730/ 3402 | learning rate 3e-06 | lm loss 0.697703 | loss scale 1024.0 | epoch 5/ 10 | global iteration 1740/ 3402 | learning rate 3e-06 | lm loss 0.698369 | loss scale 1024.0 | epoch 5/ 10 | global iteration 1750/ 3402 | learning rate 3e-06 | lm loss 0.697768 | loss scale 1024.0 | iteration 1750 | eval_loss: 0.6996511377039409 | eval acc(mrr): 0.2611607142857143 epoch 5/ 10 | global iteration 1760/ 3402 | learning rate 3e-06 | lm loss 0.695816 | loss scale 1024.0 | epoch 5/ 10 | global iteration 1770/ 3402 | learning rate 3e-06 | lm loss 0.68727 | loss scale 1024.0 | epoch 5/ 10 | global iteration 1780/ 3402 | learning rate 3e-06 | lm loss 0.669676 | loss scale 1024.0 | epoch 5/ 10 | global iteration 1790/ 3402 | learning rate 3e-06 | lm loss 0.696357 | loss scale 1024.0 | epoch 5/ 10 | global iteration 1800/ 3402 | learning rate 3e-06 | lm loss 
0.653745 | loss scale 1024.0 | iteration 1800 | eval_loss: 0.7257630739893232 | eval acc(mrr): 0.24925595238095238 epoch 5/ 10 | global iteration 1810/ 3402 | learning rate 3e-06 | lm loss 0.713665 | loss scale 1024.0 | epoch 5/ 10 | global iteration 1820/ 3402 | learning rate 3e-06 | lm loss 0.70439 | loss scale 1024.0 | epoch 5/ 10 | global iteration 1830/ 3402 | learning rate 3e-06 | lm loss 0.696879 | loss scale 1024.0 | epoch 5/ 10 | global iteration 1840/ 3402 | learning rate 3e-06 | lm loss 0.705514 | loss scale 1024.0 | epoch 5/ 10 | global iteration 1850/ 3402 | learning rate 3e-06 | lm loss 0.715712 | loss scale 1024.0 | iteration 1850 | eval_loss: 0.7024027918066297 | eval acc(mrr): 0.23958333333333334 epoch 5/ 10 | global iteration 1860/ 3402 | learning rate 3e-06 | lm loss 0.701897 | loss scale 1024.0 | epoch 5/ 10 | global iteration 1870/ 3402 | learning rate 3e-06 | lm loss 0.708783 | loss scale 1024.0 | epoch 5/ 10 | global iteration 1880/ 3402 | learning rate 3e-06 | lm loss 0.706934 | loss scale 1024.0 | epoch 5/ 10 | global iteration 1890/ 3402 | learning rate 3e-06 | lm loss 0.700901 | loss scale 1024.0 | epoch 5/ 10 | global iteration 1900/ 3402 | learning rate 3e-06 | lm loss 0.700948 | loss scale 1024.0 | iteration 1900 | eval_loss: 0.700256769146238 | eval acc(mrr): 0.2585565476190476 epoch 5/ 10 | global iteration 1910/ 3402 | learning rate 3e-06 | lm loss 0.702548 | loss scale 1024.0 | epoch 5/ 10 | global iteration 1920/ 3402 | learning rate 3e-06 | lm loss 0.710741 | loss scale 1024.0 | epoch 5/ 10 | global iteration 1930/ 3402 | learning rate 3e-06 | lm loss 0.708174 | loss scale 1024.0 | epoch 5/ 10 | global iteration 1940/ 3402 | learning rate 3e-06 | lm loss 0.704977 | loss scale 1024.0 | epoch 5/ 10 | global iteration 1950/ 3402 | learning rate 3e-06 | lm loss 0.701676 | loss scale 1024.0 | iteration 1950 | eval_loss: 0.7000243976002648 | eval acc(mrr): 0.25558035714285715 epoch 5/ 10 | global iteration 1960/ 3402 | learning rate 
3e-06 | lm loss 0.708936 | loss scale 1024.0 | epoch 5/ 10 | global iteration 1970/ 3402 | learning rate 3e-06 | lm loss 0.699574 | loss scale 1024.0 | epoch 5/ 10 | global iteration 1980/ 3402 | learning rate 3e-06 | lm loss 0.692564 | loss scale 1024.0 | epoch 5/ 10 | global iteration 1990/ 3402 | learning rate 3e-06 | lm loss 0.695664 | loss scale 1024.0 | epoch 5/ 10 | global iteration 2000/ 3402 | learning rate 3e-06 | lm loss 0.696595 | loss scale 1024.0 | iteration 2000 | eval_loss: 0.6991467234634218 | eval acc(mrr): 0.24293154761904762 epoch 5/ 10 | global iteration 2010/ 3402 | learning rate 3e-06 | lm loss 0.703299 | loss scale 1024.0 | epoch 5/ 10 | global iteration 2020/ 3402 | learning rate 3e-06 | lm loss 0.704659 | loss scale 1024.0 | epoch 5/ 10 | global iteration 2030/ 3402 | learning rate 3e-06 | lm loss 0.693708 | loss scale 1024.0 | epoch 5/ 10 | global iteration 2040/ 3402 | learning rate 3e-06 | lm loss 0.691117 | loss scale 1024.0 | epoch 6/ 10 | global iteration 2050/ 3402 | learning rate 3e-06 | lm loss 0.691764 | loss scale 1024.0 | iteration 2050 | eval_loss: 0.6977309244019645 | eval acc(mrr): 0.2604166666666667 epoch 6/ 10 | global iteration 2060/ 3402 | learning rate 3e-06 | lm loss 0.685356 | loss scale 1024.0 | epoch 6/ 10 | global iteration 2070/ 3402 | learning rate 3e-06 | lm loss 0.689423 | loss scale 1024.0 | epoch 6/ 10 | global iteration 2080/ 3402 | learning rate 3e-06 | lm loss 0.686156 | loss scale 1024.0 | epoch 6/ 10 | global iteration 2090/ 3402 | learning rate 3e-06 | lm loss 0.682342 | loss scale 1024.0 | epoch 6/ 10 | global iteration 2100/ 3402 | learning rate 3e-06 | lm loss 0.672294 | loss scale 1024.0 | iteration 2100 | eval_loss: 0.7472535868485769 | eval acc(mrr): 0.2570684523809524 epoch 6/ 10 | global iteration 2110/ 3402 | learning rate 3e-06 | lm loss 0.669551 | loss scale 1024.0 | epoch 6/ 10 | global iteration 2120/ 3402 | learning rate 3e-06 | lm loss 0.663732 | loss scale 1024.0 | epoch 6/ 10 | global 
iteration 2130/ 3402 | learning rate 3e-06 | lm loss 0.656427 | loss scale 1024.0 | epoch 6/ 10 | global iteration 2140/ 3402 | learning rate 3e-06 | lm loss 0.63141 | loss scale 1024.0 | epoch 6/ 10 | global iteration 2150/ 3402 | learning rate 3e-06 | lm loss 0.711107 | loss scale 1024.0 | iteration 2150 | eval_loss: 0.7155438917023795 | eval acc(mrr): 0.23697916666666666 epoch 6/ 10 | global iteration 2160/ 3402 | learning rate 3e-06 | lm loss 0.692446 | loss scale 1024.0 | epoch 6/ 10 | global iteration 2170/ 3402 | learning rate 3e-06 | lm loss 0.691434 | loss scale 1024.0 | epoch 6/ 10 | global iteration 2180/ 3402 | learning rate 3e-06 | lm loss 0.69206 | loss scale 1024.0 | epoch 6/ 10 | global iteration 2190/ 3402 | learning rate 3e-06 | lm loss 0.699029 | loss scale 1024.0 | epoch 6/ 10 | global iteration 2200/ 3402 | learning rate 3e-06 | lm loss 0.698421 | loss scale 1024.0 | iteration 2200 | eval_loss: 0.698927933261508 | eval acc(mrr): 0.23623511904761904 epoch 6/ 10 | global iteration 2210/ 3402 | learning rate 3e-06 | lm loss 0.694253 | loss scale 1024.0 | epoch 6/ 10 | global iteration 2220/ 3402 | learning rate 3e-06 | lm loss 0.699248 | loss scale 1024.0 | epoch 6/ 10 | global iteration 2230/ 3402 | learning rate 3e-06 | lm loss 0.697865 | loss scale 1024.0 | epoch 6/ 10 | global iteration 2240/ 3402 | learning rate 3e-06 | lm loss 0.703001 | loss scale 1024.0 | epoch 6/ 10 | global iteration 2250/ 3402 | learning rate 3e-06 | lm loss 0.709318 | loss scale 1024.0 | iteration 2250 | eval_loss: 0.7021887373356592 | eval acc(mrr): 0.25595238095238093 epoch 6/ 10 | global iteration 2260/ 3402 | learning rate 3e-06 | lm loss 0.710019 | loss scale 1024.0 | epoch 6/ 10 | global iteration 2270/ 3402 | learning rate 3e-06 | lm loss 0.698222 | loss scale 1024.0 | epoch 6/ 10 | global iteration 2280/ 3402 | learning rate 3e-06 | lm loss 0.706742 | loss scale 1024.0 | epoch 6/ 10 | global iteration 2290/ 3402 | learning rate 3e-06 | lm loss 0.694774 | loss 
scale 1024.0 | epoch 6/ 10 | global iteration 2300/ 3402 | learning rate 3e-06 | lm loss 0.697905 | loss scale 1024.0 | iteration 2300 | eval_loss: 0.6983037619363694 | eval acc(mrr): 0.2585565476190476 epoch 6/ 10 | global iteration 2310/ 3402 | learning rate 3e-06 | lm loss 0.697981 | loss scale 1024.0 | epoch 6/ 10 | global iteration 2320/ 3402 | learning rate 3e-06 | lm loss 0.691042 | loss scale 1024.0 | epoch 6/ 10 | global iteration 2330/ 3402 | learning rate 3e-06 | lm loss 0.695864 | loss scale 1024.0 | epoch 6/ 10 | global iteration 2340/ 3402 | learning rate 3e-06 | lm loss 0.67273 | loss scale 1024.0 | epoch 6/ 10 | global iteration 2350/ 3402 | learning rate 3e-06 | lm loss 0.690506 | loss scale 1024.0 | iteration 2350 | eval_loss: 0.7074877307528541 | eval acc(mrr): 0.24739583333333334 epoch 6/ 10 | global iteration 2360/ 3402 | learning rate 3e-06 | lm loss 0.705128 | loss scale 1024.0 | epoch 6/ 10 | global iteration 2370/ 3402 | learning rate 3e-06 | lm loss 0.687592 | loss scale 1024.0 | epoch 6/ 10 | global iteration 2380/ 3402 | learning rate 3e-06 | lm loss 0.688323 | loss scale 1024.0 | epoch 7/ 10 | global iteration 2390/ 3402 | learning rate 3e-06 | lm loss 0.683444 | loss scale 1024.0 | epoch 7/ 10 | global iteration 2400/ 3402 | learning rate 3e-06 | lm loss 0.672334 | loss scale 1024.0 | iteration 2400 | eval_loss: 0.7278343183653695 | eval acc(mrr): 0.25 epoch 7/ 10 | global iteration 2410/ 3402 | learning rate 3e-06 | lm loss 0.675314 | loss scale 1024.0 | epoch 7/ 10 | global iteration 2420/ 3402 | learning rate 3e-06 | lm loss 0.674542 | loss scale 1024.0 | epoch 7/ 10 | global iteration 2430/ 3402 | learning rate 3e-06 | lm loss 0.660416 | loss scale 1024.0 | epoch 7/ 10 | global iteration 2440/ 3402 | learning rate 3e-06 | lm loss 0.667278 | loss scale 1024.0 | epoch 7/ 10 | global iteration 2450/ 3402 | learning rate 3e-06 | lm loss 0.63572 | loss scale 1024.0 | iteration 2450 | eval_loss: 0.7405449535165515 | eval acc(mrr): 
0.2544642857142857 epoch 7/ 10 | global iteration 2460/ 3402 | learning rate 3e-06 | lm loss 0.655351 | loss scale 1024.0 | epoch 7/ 10 | global iteration 2470/ 3402 | learning rate 3e-06 | lm loss 0.621747 | loss scale 1024.0 | epoch 7/ 10 | global iteration 2480/ 3402 | learning rate 3e-06 | lm loss 0.609748 | loss scale 1024.0 | epoch 7/ 10 | global iteration 2490/ 3402 | learning rate 3e-06 | lm loss 0.703892 | loss scale 1024.0 | epoch 7/ 10 | global iteration 2500/ 3402 | learning rate 3e-06 | lm loss 0.669335 | loss scale 1024.0 | iteration 2500 | eval_loss: 0.7237649375484103 | eval acc(mrr): 0.25558035714285715 epoch 7/ 10 | global iteration 2510/ 3402 | learning rate 3e-06 | lm loss 0.670744 | loss scale 1024.0 | epoch 7/ 10 | global iteration 2520/ 3402 | learning rate 3e-06 | lm loss 0.69109 | loss scale 1024.0 | epoch 7/ 10 | global iteration 2530/ 3402 | learning rate 3e-06 | lm loss 0.686819 | loss scale 1024.0 | epoch 7/ 10 | global iteration 2540/ 3402 | learning rate 3e-06 | lm loss 0.693898 | loss scale 1024.0 | epoch 7/ 10 | global iteration 2550/ 3402 | learning rate 3e-06 | lm loss 0.693356 | loss scale 1024.0 | iteration 2550 | eval_loss: 0.6972749559652238 | eval acc(mrr): 0.26376488095238093 epoch 7/ 10 | global iteration 2560/ 3402 | learning rate 3e-06 | lm loss 0.690967 | loss scale 1024.0 | epoch 7/ 10 | global iteration 2570/ 3402 | learning rate 3e-06 | lm loss 0.683783 | loss scale 1024.0 | epoch 7/ 10 | global iteration 2580/ 3402 | learning rate 3e-06 | lm loss 0.70324 | loss scale 1024.0 | epoch 7/ 10 | global iteration 2590/ 3402 | learning rate 3e-06 | lm loss 0.686547 | loss scale 1024.0 | epoch 7/ 10 | global iteration 2600/ 3402 | learning rate 3e-06 | lm loss 0.711012 | loss scale 1024.0 | iteration 2600 | eval_loss: 0.7167040308316549 | eval acc(mrr): 0.24144345238095238 epoch 7/ 10 | global iteration 2610/ 3402 | learning rate 3e-06 | lm loss 0.696502 | loss scale 1024.0 | epoch 7/ 10 | global iteration 2620/ 3402 | 
learning rate 3e-06 | lm loss 0.695969 | loss scale 1024.0 | epoch 7/ 10 | global iteration 2630/ 3402 | learning rate 3e-06 | lm loss 0.677303 | loss scale 1024.0 | epoch 7/ 10 | global iteration 2640/ 3402 | learning rate 3e-06 | lm loss 0.677053 | loss scale 1024.0 | epoch 7/ 10 | global iteration 2650/ 3402 | learning rate 3e-06 | lm loss 0.679518 | loss scale 1024.0 | iteration 2650 | eval_loss: 0.7189812958240509 | eval acc(mrr): 0.2537202380952381 epoch 7/ 10 | global iteration 2660/ 3402 | learning rate 3e-06 | lm loss 0.664639 | loss scale 1024.0 | epoch 7/ 10 | global iteration 2670/ 3402 | learning rate 3e-06 | lm loss 0.673837 | loss scale 1024.0 | epoch 7/ 10 | global iteration 2680/ 3402 | learning rate 3e-06 | lm loss 0.644255 | loss scale 1024.0 | epoch 7/ 10 | global iteration 2690/ 3402 | learning rate 3e-06 | lm loss 0.662967 | loss scale 1024.0 | epoch 7/ 10 | global iteration 2700/ 3402 | learning rate 3e-06 | lm loss 0.692747 | loss scale 1024.0 | iteration 2700 | eval_loss: 0.7153312705812 | eval acc(mrr): 0.2585565476190476 epoch 7/ 10 | global iteration 2710/ 3402 | learning rate 3e-06 | lm loss 0.676028 | loss scale 1024.0 | epoch 7/ 10 | global iteration 2720/ 3402 | learning rate 3e-06 | lm loss 0.661693 | loss scale 1024.0 | epoch 8/ 10 | global iteration 2730/ 3402 | learning rate 3e-06 | lm loss 0.676656 | loss scale 1024.0 | epoch 8/ 10 | global iteration 2740/ 3402 | learning rate 3e-06 | lm loss 0.649838 | loss scale 1024.0 | epoch 8/ 10 | global iteration 2750/ 3402 | learning rate 3e-06 | lm loss 0.656143 | loss scale 1024.0 | iteration 2750 | eval_loss: 0.7366618074121929 | eval acc(mrr): 0.2544642857142857 epoch 8/ 10 | global iteration 2760/ 3402 | learning rate 3e-06 | lm loss 0.641581 | loss scale 1024.0 | epoch 8/ 10 | global iteration 2770/ 3402 | learning rate 3e-06 | lm loss 0.627781 | loss scale 1024.0 | epoch 8/ 10 | global iteration 2780/ 3402 | learning rate 3e-06 | lm loss 0.642215 | loss scale 1024.0 | epoch 8/ 10 
| global iteration 2790/ 3402 | learning rate 3e-06 | lm loss 0.613275 | loss scale 2048.0 | epoch 8/ 10 | global iteration 2800/ 3402 | learning rate 3e-06 | lm loss 0.638872 | loss scale 2048.0 | iteration 2800 | eval_loss: 0.7904411596911294 | eval acc(mrr): 0.25892857142857145 epoch 8/ 10 | global iteration 2810/ 3402 | learning rate 3e-06 | lm loss 0.613528 | loss scale 2048.0 | epoch 8/ 10 | global iteration 2820/ 3402 | learning rate 3e-06 | lm loss 0.591935 | loss scale 2048.0 | epoch 8/ 10 | global iteration 2830/ 3402 | learning rate 3e-06 | lm loss 0.687106 | loss scale 2048.0 | epoch 8/ 10 | global iteration 2840/ 3402 | learning rate 3e-06 | lm loss 0.640099 | loss scale 2048.0 | epoch 8/ 10 | global iteration 2850/ 3402 | learning rate 3e-06 | lm loss 0.651373 | loss scale 2048.0 | iteration 2850 | eval_loss: 0.7521440003599439 | eval acc(mrr): 0.2544642857142857 epoch 8/ 10 | global iteration 2860/ 3402 | learning rate 3e-06 | lm loss 0.699198 | loss scale 2048.0 | epoch 8/ 10 | global iteration 2870/ 3402 | learning rate 3e-06 | lm loss 0.669626 | loss scale 2048.0 | epoch 8/ 10 | global iteration 2880/ 3402 | learning rate 3e-06 | lm loss 0.684849 | loss scale 2048.0 | epoch 8/ 10 | global iteration 2890/ 3402 | learning rate 3e-06 | lm loss 0.683451 | loss scale 2048.0 | epoch 8/ 10 | global iteration 2900/ 3402 | learning rate 3e-06 | lm loss 0.677561 | loss scale 2048.0 | iteration 2900 | eval_loss: 0.7046851075830913 | eval acc(mrr): 0.2648809523809524 epoch 8/ 10 | global iteration 2910/ 3402 | learning rate 3e-06 | lm loss 0.675278 | loss scale 2048.0 | epoch 8/ 10 | global iteration 2920/ 3402 | learning rate 3e-06 | lm loss 0.695869 | loss scale 2048.0 | epoch 8/ 10 | global iteration 2930/ 3402 | learning rate 3e-06 | lm loss 0.673335 | loss scale 2048.0 | epoch 8/ 10 | global iteration 2940/ 3402 | learning rate 3e-06 | lm loss 0.702395 | loss scale 2048.0 | epoch 8/ 10 | global iteration 2950/ 3402 | learning rate 3e-06 | lm loss 0.68072 
| loss scale 2048.0 | iteration 2950 | eval_loss: 0.7257188516003745 | eval acc(mrr): 0.25223214285714285 epoch 8/ 10 | global iteration 2960/ 3402 | learning rate 3e-06 | lm loss 0.676594 | loss scale 2048.0 | epoch 8/ 10 | global iteration 2970/ 3402 | learning rate 3e-06 | lm loss 0.646874 | loss scale 2048.0 | epoch 8/ 10 | global iteration 2980/ 3402 | learning rate 3e-06 | lm loss 0.656207 | loss scale 2048.0 | epoch 8/ 10 | global iteration 2990/ 3402 | learning rate 3e-06 | lm loss 0.661283 | loss scale 2048.0 | epoch 8/ 10 | global iteration 3000/ 3402 | learning rate 3e-06 | lm loss 0.640143 | loss scale 2048.0 | iteration 3000 | eval_loss: 0.7538890796048301 | eval acc(mrr): 0.25 epoch 8/ 10 | global iteration 3010/ 3402 | learning rate 3e-06 | lm loss 0.649307 | loss scale 2048.0 | epoch 8/ 10 | global iteration 3020/ 3402 | learning rate 3e-06 | lm loss 0.626442 | loss scale 2048.0 | epoch 8/ 10 | global iteration 3030/ 3402 | learning rate 3e-06 | lm loss 0.645292 | loss scale 2048.0 | epoch 8/ 10 | global iteration 3040/ 3402 | learning rate 3e-06 | lm loss 0.683165 | loss scale 2048.0 | epoch 8/ 10 | global iteration 3050/ 3402 | learning rate 3e-06 | lm loss 0.650318 | loss scale 2048.0 | iteration 3050 | eval_loss: 0.7200367436522529 | eval acc(mrr): 0.25744047619047616 epoch 8/ 10 | global iteration 3060/ 3402 | learning rate 3e-06 | lm loss 0.634466 | loss scale 2048.0 | epoch 9/ 10 | global iteration 3070/ 3402 | learning rate 3e-06 | lm loss 0.6692 | loss scale 2048.0 | epoch 9/ 10 | global iteration 3080/ 3402 | learning rate 3e-06 | lm loss 0.626486 | loss scale 2048.0 | epoch 9/ 10 | global iteration 3090/ 3402 | learning rate 3e-06 | lm loss 0.631398 | loss scale 2048.0 | epoch 9/ 10 | global iteration 3100/ 3402 | learning rate 3e-06 | lm loss 0.626967 | loss scale 2048.0 | iteration 3100 | eval_loss: 0.7670223528430575 | eval acc(mrr): 0.2540922619047619 epoch 9/ 10 | global iteration 3110/ 3402 | learning rate 3e-06 | lm loss 0.599213 | 
loss scale 2048.0 | epoch 9/ 10 | global iteration 3120/ 3402 | learning rate 3e-06 | lm loss 0.616626 | loss scale 2048.0 | epoch 9/ 10 | global iteration 3130/ 3402 | learning rate 3e-06 | lm loss 0.629231 | loss scale 2048.0 | epoch 9/ 10 | global iteration 3140/ 3402 | learning rate 3e-06 | lm loss 0.617427 | loss scale 2048.0 | epoch 9/ 10 | global iteration 3150/ 3402 | learning rate 3e-06 | lm loss 0.593994 | loss scale 2048.0 | iteration 3150 | eval_loss: 0.8040455466225034 | eval acc(mrr): 0.25967261904761907 epoch 9/ 10 | global iteration 3160/ 3402 | learning rate 3e-06 | lm loss 0.586816 | loss scale 2048.0 | epoch 9/ 10 | global iteration 3170/ 3402 | learning rate 3e-06 | lm loss 0.6783 | loss scale 2048.0 | epoch 9/ 10 | global iteration 3180/ 3402 | learning rate 3e-06 | lm loss 0.62898 | loss scale 2048.0 | epoch 9/ 10 | global iteration 3190/ 3402 | learning rate 3e-06 | lm loss 0.641141 | loss scale 2048.0 | epoch 9/ 10 | global iteration 3200/ 3402 | learning rate 3e-06 | lm loss 0.689595 | loss scale 2048.0 | iteration 3200 | eval_loss: 0.7497456882681165 | eval acc(mrr): 0.2622767857142857 epoch 9/ 10 | global iteration 3210/ 3402 | learning rate 3e-06 | lm loss 0.635642 | loss scale 2048.0 | epoch 9/ 10 | global iteration 3220/ 3402 | learning rate 3e-06 | lm loss 0.670698 | loss scale 2048.0 | epoch 9/ 10 | global iteration 3230/ 3402 | learning rate 3e-06 | lm loss 0.675011 | loss scale 2048.0 | epoch 9/ 10 | global iteration 3240/ 3402 | learning rate 3e-06 | lm loss 0.657949 | loss scale 2048.0 | epoch 9/ 10 | global iteration 3250/ 3402 | learning rate 3e-06 | lm loss 0.655372 | loss scale 2048.0 | iteration 3250 | eval_loss: 0.711858061097917 | eval acc(mrr): 0.25669642857142855 epoch 9/ 10 | global iteration 3260/ 3402 | learning rate 3e-06 | lm loss 0.671673 | loss scale 2048.0 | epoch 9/ 10 | global iteration 3270/ 3402 | learning rate 3e-06 | lm loss 0.664169 | loss scale 2048.0 | epoch 9/ 10 | global iteration 3280/ 3402 | learning 
rate 3e-06 | lm loss 0.678484 | loss scale 2048.0 | epoch 9/ 10 | global iteration 3290/ 3402 | learning rate 3e-06 | lm loss 0.65326 | loss scale 2048.0 | epoch 9/ 10 | global iteration 3300/ 3402 | learning rate 3e-06 | lm loss 0.665373 | loss scale 2048.0 | iteration 3300 | eval_loss: 0.7473735695793515 | eval acc(mrr): 0.25186011904761907 epoch 9/ 10 | global iteration 3310/ 3402 | learning rate 3e-06 | lm loss 0.639193 | loss scale 2048.0 | epoch 9/ 10 | global iteration 3320/ 3402 | learning rate 3e-06 | lm loss 0.638079 | loss scale 2048.0 | epoch 9/ 10 | global iteration 3330/ 3402 | learning rate 3e-06 | lm loss 0.639861 | loss scale 2048.0 | epoch 9/ 10 | global iteration 3340/ 3402 | learning rate 3e-06 | lm loss 0.591675 | loss scale 2048.0 | epoch 9/ 10 | global iteration 3350/ 3402 | learning rate 3e-06 | lm loss 0.637146 | loss scale 2048.0 | iteration 3350 | eval_loss: 0.7521008409204937 | eval acc(mrr): 0.2540922619047619 epoch 9/ 10 | global iteration 3360/ 3402 | learning rate 3e-06 | lm loss 0.624074 | loss scale 2048.0 | epoch 9/ 10 | global iteration 3370/ 3402 | learning rate 3e-06 | lm loss 0.615616 | loss scale 2048.0 | epoch 9/ 10 | global iteration 3380/ 3402 | learning rate 3e-06 | lm loss 0.686865 | loss scale 2048.0 | epoch 9/ 10 | global iteration 3390/ 3402 | learning rate 3e-06 | lm loss 0.635364 | loss scale 2048.0 | epoch 9/ 10 | global iteration 3400/ 3402 | learning rate 3e-06 | lm loss 0.628997 | loss scale 2048.0 | iteration 3400 | eval_loss: 0.7824753593830835 | eval acc(mrr): 0.24293154761904762

@t1101675
Contributor

It seems that the pre-trained weights were not loaded: the first lm loss printed should be below 5, but in the provided log it is around 10. If the model is loaded successfully, the log printed to stdout (not log.txt) should contain a message like 'successfully loaded /home/checkpoints/cpm-2-xxlarge/mp_rank_01_model_states.pt'. Otherwise, the following warning is displayed: 'WARNING: could not find the metadata file /***/latest_checkpointed_iteration.txt will not load any checkpoints and will start from random'.
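The two checks described here (the first reported lm loss and the load messages in stdout) can be automated over a captured copy of the stdout output. This is a minimal sketch, not part of the repository: the function name `diagnose_checkpoint_load` and the regular expression are hypothetical, the marker strings are taken from the messages quoted in this comment, and the threshold of 5 is the rule of thumb stated here.

```python
import re

# Marker strings copied from the messages quoted in this thread.
LOADED_MARKER = "successfully loaded"
NOT_LOADED_MARKER = "will not load any checkpoints and will start from random"

# Assumed log format: "... | lm loss 0.706156 | ..." as in the pasted training log.
LM_LOSS_RE = re.compile(r"lm loss\s+([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")


def diagnose_checkpoint_load(stdout_text, loss_threshold=5.0):
    """Summarize whether the pre-trained weights appear to have been loaded.

    stdout_text: captured stdout of the training run (not log.txt).
    loss_threshold: first lm loss expected to be below this when weights load.
    """
    match = LM_LOSS_RE.search(stdout_text)  # first "lm loss" entry only
    first_loss = float(match.group(1)) if match else None
    return {
        "loaded_message": LOADED_MARKER in stdout_text,
        "random_init_warning": NOT_LOADED_MARKER in stdout_text,
        "first_lm_loss": first_loss,
        "loss_looks_pretrained": first_loss is not None
        and first_loss < loss_threshold,
    }
```

Reading the captured stdout into a string and passing it to `diagnose_checkpoint_load` returns a small dictionary. A first lm loss near 10 with no 'successfully loaded' line would point to randomly initialized weights.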

@k15201363625
Author

Thanks. In terms of the initial loss, I think you are right, but 'successfully loaded' appears in the train_log:

[2021-08-05 19:15:21,544] [INFO] [state_dict_factory.py:165:check_ckpt_list] checkpoint file list: ['./checkpoints/cpm-2-xxlarge/100000/mp_rank_00_model_states.pt', './checkpoints/cpm-2-xxlarge/100000/mp_rank_01_model_states.pt', './checkpoints/cpm-2-xxlarge/100000/mp_rank_02_model_states.pt', './checkpoints/cpm-2-xxlarge/100000/mp_rank_03_model_states.pt']
[2021-08-05 19:15:21,544] [INFO] [state_dict_factory.py:165:check_ckpt_list] checkpoint file list: ['./checkpoints/cpm-2-xxlarge/100000/mp_rank_00_model_states.pt', './checkpoints/cpm-2-xxlarge/100000/mp_rank_01_model_states.pt', './checkpoints/cpm-2-xxlarge/100000/mp_rank_02_model_states.pt', './checkpoints/cpm-2-xxlarge/100000/mp_rank_03_model_states.pt']
[2021-08-05 19:15:22,382] [INFO] [state_dict_factory.py:56:load] mp_world_size: 4, mp_rank: 3, module_key: auto
[2021-08-05 19:15:22,383] [INFO] [state_dict_factory.py:85:load] rank: 3 loading checkpoint: ./checkpoints/cpm-2-xxlarge/100000/mp_rank_03_model_states.pt
[2021-08-05 19:15:22,389] [INFO] [state_dict_factory.py:56:load] mp_world_size: 4, mp_rank: 2, module_key: auto
[2021-08-05 19:15:22,390] [INFO] [state_dict_factory.py:85:load] rank: 2 loading checkpoint: ./checkpoints/cpm-2-xxlarge/100000/mp_rank_02_model_states.pt
[2021-08-05 19:15:22,402] [INFO] [state_dict_factory.py:56:load] mp_world_size: 4, mp_rank: 3, module_key: auto
[2021-08-05 19:15:22,403] [INFO] [state_dict_factory.py:85:load] rank: 3 loading checkpoint: ./checkpoints/cpm-2-xxlarge/100000/mp_rank_03_model_states.pt
[2021-08-05 19:15:22,407] [INFO] [state_dict_factory.py:56:load] mp_world_size: 4, mp_rank: 0, module_key: auto
[2021-08-05 19:15:22,407] [INFO] [state_dict_factory.py:85:load] rank: 0 loading checkpoint: ./checkpoints/cpm-2-xxlarge/100000/mp_rank_00_model_states.pt
[2021-08-05 19:15:22,410] [INFO] [state_dict_factory.py:56:load] mp_world_size: 4, mp_rank: 0, module_key: auto
[2021-08-05 19:15:22,411] [INFO] [state_dict_factory.py:85:load] rank: 0 loading checkpoint: ./checkpoints/cpm-2-xxlarge/100000/mp_rank_00_model_states.pt
[2021-08-05 19:15:22,551] [INFO] [state_dict_factory.py:56:load] mp_world_size: 4, mp_rank: 2, module_key: auto
[2021-08-05 19:15:22,551] [INFO] [state_dict_factory.py:85:load] rank: 2 loading checkpoint: ./checkpoints/cpm-2-xxlarge/100000/mp_rank_02_model_states.pt
[2021-08-05 19:15:24,925] [INFO] [state_dict_factory.py:56:load] mp_world_size: 4, mp_rank: 1, module_key: auto
[2021-08-05 19:15:24,926] [INFO] [state_dict_factory.py:85:load] rank: 1 loading checkpoint: ./checkpoints/cpm-2-xxlarge/100000/mp_rank_01_model_states.pt
[2021-08-05 19:15:25,028] [INFO] [state_dict_factory.py:56:load] mp_world_size: 4, mp_rank: 1, module_key: auto
[2021-08-05 19:15:25,028] [INFO] [state_dict_factory.py:85:load] rank: 1 loading checkpoint: ./checkpoints/cpm-2-xxlarge/100000/mp_rank_01_model_states.pt
[2021-08-05 19:15:26,172] [WARNING] [engine.py:1810:_get_all_zero_checkpoints] The following zero checkpoints paths are missing: ['./checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_0_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_1_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_2_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_3_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_4_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_5_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_6_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_7_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_8_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_9_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_10_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_11_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_12_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_13_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_14_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_15_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_16_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_17_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_18_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_19_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_20_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_21_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_22_mp_rank_00_optim_states.pt', 
'./checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_23_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_24_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_25_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_26_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_27_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_28_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_29_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_30_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_31_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_32_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_33_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_34_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_35_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_36_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_37_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_38_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_39_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_40_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_41_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_42_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_43_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_44_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_45_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_46_mp_rank_00_optim_states.pt', 
'./checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_47_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_48_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_49_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_50_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_51_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_52_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_53_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_54_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_55_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_56_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_57_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_58_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_59_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_60_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_61_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_62_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_63_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_64_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_65_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_66_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_67_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_68_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_69_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_70_mp_rank_00_optim_states.pt', 
'./checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_71_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_72_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_73_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_74_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_75_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_76_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_77_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_78_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_79_mp_rank_00_optim_states.pt']
[2021-08-05 19:15:26,368] [WARNING] [engine.py:1810:_get_all_zero_checkpoints] The following zero checkpoints paths are missing: ['./checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_0_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_1_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_2_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_3_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_4_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_5_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_6_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_7_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_8_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_9_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_10_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_11_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_12_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_13_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_14_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_15_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_16_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_17_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_18_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_19_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_20_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_21_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_22_mp_rank_00_optim_states.pt', 
'./checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_23_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_24_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_25_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_26_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_27_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_28_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_29_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_30_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_31_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_32_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_33_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_34_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_35_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_36_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_37_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_38_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_39_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_40_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_41_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_42_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_43_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_44_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_45_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_46_mp_rank_00_optim_states.pt', 
'./checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_47_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_48_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_49_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_50_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_51_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_52_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_53_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_54_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_55_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_56_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_57_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_58_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_59_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_60_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_61_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_62_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_63_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_64_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_65_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_66_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_67_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_68_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_69_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_70_mp_rank_00_optim_states.pt', 
'./checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_71_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_72_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_73_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_74_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_75_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_76_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_77_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_78_mp_rank_00_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_79_mp_rank_00_optim_states.pt']
[2021-08-05 19:15:32,283] [WARNING] [engine.py:1810:_get_all_zero_checkpoints] The following zero checkpoints paths are missing: ['./checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_0_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_1_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_2_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_3_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_4_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_5_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_6_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_7_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_8_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_9_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_10_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_11_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_12_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_13_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_14_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_15_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_16_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_17_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_18_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_19_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_20_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_21_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_22_mp_rank_03_optim_states.pt', 
'./checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_23_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_24_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_25_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_26_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_27_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_28_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_29_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_30_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_31_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_32_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_33_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_34_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_35_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_36_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_37_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_38_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_39_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_40_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_41_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_42_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_43_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_44_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_45_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_46_mp_rank_03_optim_states.pt', 
'./checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_47_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_48_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_49_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_50_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_51_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_52_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_53_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_54_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_55_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_56_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_57_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_58_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_59_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_60_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_61_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_62_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_63_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_64_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_65_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_66_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_67_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_68_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_69_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_70_mp_rank_03_optim_states.pt', 
'./checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_71_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_72_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_73_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_74_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_75_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_76_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_77_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_78_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_79_mp_rank_03_optim_states.pt']
[2021-08-05 19:15:32,286] [WARNING] [engine.py:1810:_get_all_zero_checkpoints] The following zero checkpoints paths are missing: ['./checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_0_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_1_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_2_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_3_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_4_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_5_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_6_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_7_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_8_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_9_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_10_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_11_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_12_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_13_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_14_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_15_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_16_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_17_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_18_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_19_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_20_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_21_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_22_mp_rank_03_optim_states.pt', 
'./checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_23_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_24_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_25_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_26_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_27_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_28_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_29_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_30_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_31_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_32_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_33_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_34_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_35_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_36_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_37_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_38_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_39_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_40_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_41_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_42_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_43_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_44_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_45_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_46_mp_rank_03_optim_states.pt', 
'./checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_47_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_48_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_49_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_50_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_51_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_52_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_53_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_54_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_55_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_56_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_57_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_58_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_59_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_60_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_61_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_62_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_63_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_64_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_65_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_66_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_67_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_68_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_69_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_70_mp_rank_03_optim_states.pt', 
'./checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_71_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_72_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_73_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_74_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_75_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_76_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_77_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_78_mp_rank_03_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_79_mp_rank_03_optim_states.pt']
[2021-08-05 19:15:32,760] [WARNING] [engine.py:1810:_get_all_zero_checkpoints] The following zero checkpoints paths are missing: ['./checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_0_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_1_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_2_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_3_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_4_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_5_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_6_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_7_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_8_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_9_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_10_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_11_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_12_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_13_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_14_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_15_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_16_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_17_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_18_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_19_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_20_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_21_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_22_mp_rank_02_optim_states.pt', 
'./checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_23_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_24_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_25_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_26_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_27_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_28_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_29_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_30_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_31_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_32_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_33_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_34_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_35_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_36_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_37_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_38_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_39_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_40_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_41_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_42_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_43_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_44_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_45_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_46_mp_rank_02_optim_states.pt', 
'./checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_47_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_48_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_49_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_50_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_51_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_52_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_53_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_54_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_55_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_56_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_57_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_58_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_59_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_60_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_61_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_62_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_63_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_64_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_65_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_66_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_67_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_68_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_69_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_70_mp_rank_02_optim_states.pt', 
'./checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_71_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_72_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_73_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_74_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_75_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_76_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_77_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_78_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_79_mp_rank_02_optim_states.pt']
[2021-08-05 19:15:32,828] [WARNING] [engine.py:1810:_get_all_zero_checkpoints] The following zero checkpoints paths are missing: ['./checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_0_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_1_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_2_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_3_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_4_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_5_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_6_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_7_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_8_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_9_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_10_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_11_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_12_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_13_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_14_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_15_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_16_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_17_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_18_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_19_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_20_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_21_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_22_mp_rank_02_optim_states.pt', 
'./checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_23_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_24_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_25_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_26_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_27_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_28_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_29_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_30_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_31_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_32_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_33_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_34_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_35_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_36_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_37_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_38_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_39_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_40_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_41_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_42_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_43_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_44_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_45_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_46_mp_rank_02_optim_states.pt', 
'./checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_47_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_48_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_49_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_50_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_51_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_52_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_53_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_54_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_55_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_56_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_57_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_58_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_59_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_60_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_61_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_62_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_63_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_64_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_65_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_66_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_67_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_68_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_69_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_70_mp_rank_02_optim_states.pt', 
'./checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_71_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_72_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_73_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_74_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_75_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_76_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_77_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_78_mp_rank_02_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_79_mp_rank_02_optim_states.pt']
[2021-08-05 19:15:35,130] [WARNING] [engine.py:1810:_get_all_zero_checkpoints] The following zero checkpoints paths are missing: ['./checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_0_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_1_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_2_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_3_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_4_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_5_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_6_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_7_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_8_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_9_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_10_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_11_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_12_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_13_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_14_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_15_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_16_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_17_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_18_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_19_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_20_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_21_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_22_mp_rank_01_optim_states.pt', 
'./checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_23_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_24_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_25_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_26_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_27_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_28_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_29_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_30_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_31_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_32_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_33_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_34_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_35_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_36_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_37_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_38_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_39_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_40_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_41_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_42_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_43_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_44_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_45_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_46_mp_rank_01_optim_states.pt', 
'./checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_47_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_48_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_49_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_50_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_51_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_52_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_53_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_54_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_55_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_56_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_57_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_58_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_59_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_60_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_61_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_62_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_63_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_64_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_65_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_66_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_67_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_68_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_69_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_70_mp_rank_01_optim_states.pt', 
'./checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_71_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_72_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_73_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_74_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_75_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_76_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_77_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_78_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_79_mp_rank_01_optim_states.pt']
[2021-08-05 19:15:35,189] [WARNING] [engine.py:1810:_get_all_zero_checkpoints] The following zero checkpoints paths are missing: ['./checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_0_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_1_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_2_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_3_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_4_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_5_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_6_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_7_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_8_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_9_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_10_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_11_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_12_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_13_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_14_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_15_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_16_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_17_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_18_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_19_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_20_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_21_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_22_mp_rank_01_optim_states.pt', 
'./checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_23_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_24_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_25_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_26_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_27_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_28_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_29_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_30_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_31_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_32_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_33_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_34_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_35_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_36_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_37_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_38_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_39_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_40_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_41_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_42_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_43_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_44_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_45_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_46_mp_rank_01_optim_states.pt', 
'./checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_47_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_48_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_49_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_50_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_51_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_52_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_53_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_54_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_55_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_56_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_57_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_58_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_59_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_60_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_61_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_62_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_63_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_64_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_65_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_66_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_67_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_68_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_69_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_70_mp_rank_01_optim_states.pt', 
'./checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_71_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_72_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_73_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_74_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_75_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_76_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_77_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_78_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_79_mp_rank_01_optim_states.pt']
[2021-08-05 19:16:12,687] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 12 Skipping step. Attempted loss scale: 65536, reducing to 65536
[2021-08-05 19:16:12,687] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 9 Skipping step. Attempted loss scale: 65536, reducing to 65536
[2021-08-05 19:16:12,687] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 10 Skipping step. Attempted loss scale: 65536, reducing to 65536
[2021-08-05 19:16:12,687] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 11 Skipping step. Attempted loss scale: 65536, reducing to 65536
[2021-08-05 19:16:12,687] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 13 Skipping step. Attempted loss scale: 65536, reducing to 65536
[2021-08-05 19:16:12,687] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 15 Skipping step. Attempted loss scale: 65536, reducing to 65536
[2021-08-05 19:16:12,687] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 8 Skipping step. Attempted loss scale: 65536, reducing to 65536
[2021-08-05 19:16:12,687] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 14 Skipping step. Attempted loss scale: 65536, reducing to 65536
[2021-08-05 19:16:37,825] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 9 Skipping step. Attempted loss scale: 65536, reducing to 32768.0
[2021-08-05 19:16:37,825] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 15 Skipping step. Attempted loss scale: 65536, reducing to 32768.0
[2021-08-05 19:16:37,825] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 11 Skipping step. Attempted loss scale: 65536, reducing to 32768.0
[2021-08-05 19:16:37,825] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 12 Skipping step. Attempted loss scale: 65536, reducing to 32768.0
[2021-08-05 19:16:37,825] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 13 Skipping step. Attempted loss scale: 65536, reducing to 32768.0
[2021-08-05 19:16:37,825] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 14 Skipping step. Attempted loss scale: 65536, reducing to 32768.0
[2021-08-05 19:16:37,825] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 10 Skipping step. Attempted loss scale: 65536, reducing to 32768.0
[2021-08-05 19:16:37,825] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 8 Skipping step. Attempted loss scale: 65536, reducing to 32768.0
[2021-08-05 19:17:03,141] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 12 Skipping step. Attempted loss scale: 32768.0, reducing to 16384.0
[2021-08-05 19:17:03,141] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 10 Skipping step. Attempted loss scale: 32768.0, reducing to 16384.0
[2021-08-05 19:17:03,141] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 11 Skipping step. Attempted loss scale: 32768.0, reducing to 16384.0
[2021-08-05 19:17:03,141] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 15 Skipping step. Attempted loss scale: 32768.0, reducing to 16384.0
[2021-08-05 19:17:03,141] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 8 Skipping step. Attempted loss scale: 32768.0, reducing to 16384.0
[2021-08-05 19:17:03,141] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 13 Skipping step. Attempted loss scale: 32768.0, reducing to 16384.0
[2021-08-05 19:17:03,141] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 9 Skipping step. Attempted loss scale: 32768.0, reducing to 16384.0
[2021-08-05 19:17:03,141] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 14 Skipping step. Attempted loss scale: 32768.0, reducing to 16384.0
[2021-08-05 19:17:28,288] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 15 Skipping step. Attempted loss scale: 16384.0, reducing to 8192.0
[2021-08-05 19:17:28,288] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 9 Skipping step. Attempted loss scale: 16384.0, reducing to 8192.0
[2021-08-05 19:17:28,288] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 10 Skipping step. Attempted loss scale: 16384.0, reducing to 8192.0
[2021-08-05 19:17:28,288] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 11 Skipping step. Attempted loss scale: 16384.0, reducing to 8192.0
[2021-08-05 19:17:28,288] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 12 Skipping step. Attempted loss scale: 16384.0, reducing to 8192.0
[2021-08-05 19:17:28,288] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 14 Skipping step. Attempted loss scale: 16384.0, reducing to 8192.0
[2021-08-05 19:17:28,288] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 13 Skipping step. Attempted loss scale: 16384.0, reducing to 8192.0
[2021-08-05 19:17:28,288] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 8 Skipping step. Attempted loss scale: 16384.0, reducing to 8192.0
[2021-08-05 19:17:53,466] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 13 Skipping step. Attempted loss scale: 8192.0, reducing to 4096.0
[2021-08-05 19:17:53,466] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 11 Skipping step. Attempted loss scale: 8192.0, reducing to 4096.0
[2021-08-05 19:17:53,466] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 12 Skipping step. Attempted loss scale: 8192.0, reducing to 4096.0
[2021-08-05 19:17:53,466] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 14 Skipping step. Attempted loss scale: 8192.0, reducing to 4096.0
[2021-08-05 19:17:53,466] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 9 Skipping step. Attempted loss scale: 8192.0, reducing to 4096.0
[2021-08-05 19:17:53,466] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 10 Skipping step. Attempted loss scale: 8192.0, reducing to 4096.0
[2021-08-05 19:17:53,466] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 15 Skipping step. Attempted loss scale: 8192.0, reducing to 4096.0
[2021-08-05 19:17:53,467] [INFO] [stage2.py:1506:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 8 Skipping step. Attempted loss scale: 8192.0, reducing to 4096.0
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
nts/cpm-2-xxlarge/100000/zero_pp_rank_0_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_1_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_2_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_3_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_4_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_5_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_6_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_7_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_8_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_9_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_10_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_11_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_12_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_13_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_14_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_15_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_16_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_17_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_18_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_19_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_20_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_21_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_22_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_23_mp_rank_01_optim_states.pt', 
'./checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_24_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_25_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_26_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_27_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_28_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_29_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_30_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_31_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_32_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_33_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_34_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_35_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_36_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_37_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_38_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_39_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_40_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_41_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_42_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_43_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_44_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_45_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_46_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_47_mp_rank_01_optim_states.pt', 
'./checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_48_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_49_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_50_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_51_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_52_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_53_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_54_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_55_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_56_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_57_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_58_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_59_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_60_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_61_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_62_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_63_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_64_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_65_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_66_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_67_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_68_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_69_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_70_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_71_mp_rank_01_optim_states.pt', 
'./checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_72_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_73_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_74_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_75_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_76_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_77_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_78_mp_rank_01_optim_states.pt', './checkpoints/cpm-2-xxlarge/100000/zero_pp_rank_79_mp_rank_01_optim_states.pt']
  successfully loaded ./checkpoints/cpm-2-xxlarge/100000/mp_rank_01_model_states.pt
  successfully loaded ./checkpoints/cpm-2-xxlarge/100000/mp_rank_02_model_states.pt
  successfully loaded ./checkpoints/cpm-2-xxlarge/100000/mp_rank_03_model_states.pt
  successfully loaded ./checkpoints/cpm-2-xxlarge/100000/mp_rank_00_model_states.pt



[2021-08-05 19:16:32,184] [INFO] [checkpointing.py:400:forward] Activation Checkpointing Information
[2021-08-05 19:16:32,184] [INFO] [checkpointing.py:402:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-08-05 19:16:32,184] [INFO] [checkpointing.py:405:forward] ----contiguous Memory Checkpointing False with 24 total layers
[2021-08-05 19:16:32,184] [INFO] [checkpointing.py:407:forward] ----Synchronization False
[2021-08-05 19:16:32,184] [INFO] [checkpointing.py:408:forward] ----Profiling time in checkpointing False
tensor(10.4336, device='cuda:0', grad_fn=<MeanBackward0>)
[2021-08-05 19:16:39,266] [INFO] [logging.py:68:log_dist] [Rank 0] rank=0 time (ms) | forward_microstep: 5993.54 | backward_microstep: 4442.34 | backward_inner_microstep: 4168.78 | backward_allreduce_microstep: 273.50 | step_microstep: 0.09
[2021-08-05 19:16:39,267] [INFO] [logging.py:68:log_dist] [Rank 0] rank=0 time (ms) | forward: 5993.52 | backward: 4442.32 | backward_inner: 4168.74 | backward_allreduce: 273.41 | step: 0.07
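
The loss-scale messages in this log keep 65536 after the first overflow and then halve on every later overflow. That pattern is consistent with dynamic loss scaling under the "hysteresis": 2 setting shown in the args. The sketch below is only a toy model of that bookkeeping: the class name ToyDynamicLossScaler and its fields are hypothetical and are not DeepSpeed's implementation, the starting scale of 65536 is taken from the log, and growing the scale again after a window of overflow-free steps is left out.

```python
class ToyDynamicLossScaler:
    """Toy model of overflow handling with hysteresis (hypothetical, not DeepSpeed code)."""

    def __init__(self, init_scale=65536.0, scale_factor=2.0, hysteresis=2, min_scale=1.0):
        self.scale = init_scale
        self.scale_factor = scale_factor
        self.min_scale = min_scale
        # Number of overflows still tolerated before the scale is actually reduced.
        self.hysteresis_left = hysteresis

    def on_overflow(self):
        """Handle one skipped step; return (attempted scale, scale for the next step)."""
        attempted = self.scale
        if self.hysteresis_left <= 1:
            # Tolerance used up: shrink the scale, but never below min_scale.
            self.scale = max(self.scale / self.scale_factor, self.min_scale)
        else:
            # First overflow(s): keep the scale and only consume tolerance.
            self.hysteresis_left -= 1
        return attempted, self.scale


scaler = ToyDynamicLossScaler()
# Five consecutive overflows, as in the log above.
history = [scaler.on_overflow() for _ in range(5)]
for attempted, next_scale in history:
    print(f"Attempted loss scale: {attempted:g}, reducing to {next_scale:g}")
```

Under these assumptions, the first overflow leaves the scale unchanged and each later one halves it, which matches the sequence 65536, 32768, 16384, 8192, 4096 reported in the log.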

@t1101675
Contributor

Did you use the Docker image we provided, or DeepSpeed from GitHub?

@k15201363625
Author

I used the latest DeepSpeed (v0.4.3 or v0.4.5).

t1101675 added a commit that referenced this issue Aug 17, 2021
@t1101675
Contributor

We have reproduced the problem with DeepSpeed v0.5.0. We think it is caused by a bug in DeepSpeed, which we have fixed in the Docker image we provide.

Specifically, the DeepSpeed ZeRO optimizer copies the fp32 states of the optimizer into the fp16 states of the model. This works well when the optimizer states (e.g. zero_pp_rank_0_mp_rank_01_optim_states.pt) are loaded. But when the optimizer states are not provided, the fp32 states in the optimizer are randomly initialized and override the pre-trained weights in the model. This is where the problem occurs.
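
The effect described above can be reproduced with a small toy model. This is only a sketch under stated assumptions: ToyMasterWeightOptimizer, its methods, and the helper to_fp16 are hypothetical names and are not DeepSpeed code. The toy assumes the fp32 master copy starts out with values unrelated to the model, and the "fix" shown (seeding the master copy from the loaded fp16 weights) is one plausible remedy that may differ from the actual change in the patched stage1.py.

```python
import random
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


class ToyMasterWeightOptimizer:
    """Toy stand-in for a mixed-precision optimizer (hypothetical, not DeepSpeed code)."""

    def __init__(self, num_params: int, seed: int = 0):
        rng = random.Random(seed)
        # Assumption for illustration: without loaded optimizer states, the
        # fp32 master copy holds values unrelated to the pre-trained model.
        self.fp32_master = [rng.uniform(-1.0, 1.0) for _ in range(num_params)]

    def init_master_from_model(self, fp16_weights):
        """Seed the fp32 master copy from the model's loaded fp16 weights."""
        self.fp32_master = [float(w) for w in fp16_weights]

    def sync_to_model(self, fp16_weights):
        """Copy the fp32 master values into the model's fp16 weights, in place."""
        for i, w in enumerate(self.fp32_master):
            fp16_weights[i] = to_fp16(w)


# Stand-in for pre-trained fp16 weights loaded from mp_rank_XX_model_states.pt.
pretrained = [to_fp16(v) for v in (0.5, -0.25, 0.125, 1.5)]

# Case 1: optimizer states are missing, so the sync overwrites the model.
model = list(pretrained)
optimizer = ToyMasterWeightOptimizer(len(model))
optimizer.sync_to_model(model)
clobbered = model != pretrained

# Case 2: the master copy is first seeded from the model, so the sync is harmless.
model = list(pretrained)
optimizer = ToyMasterWeightOptimizer(len(model))
optimizer.init_master_from_model(model)
optimizer.sync_to_model(model)
preserved = model == pretrained

print("weights clobbered without optimizer states:", clobbered)
print("weights preserved when master copy is seeded from the model:", preserved)
```

In the first case the model no longer holds its pre-trained weights after the sync, which would explain a fine-tuning run that behaves as if it started from scratch.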

To fix the problem, we recommend using our Docker image directly. If you would rather use the latest DeepSpeed, you can fix the bug by adding a few lines of code to deepspeed/runtime/zero/stage1.py. We provide the fixed stage1.py in our repository. This file is based on DeepSpeed v0.3.9, but other versions should be similar. The lines that need to be modified are marked with CPM: HACK, so you can locate them by searching for CPM: HACK.

If you have any problems, please let us know. Thanks!

@k15201363625
Author

👍🏻 Thank you!

@Tron1994

Tron1994 commented Jan 4, 2023

Has this been fixed in the latest version of DeepSpeed (v0.7.7)? It seems to work for me:
epoch 0/ 5 | global iteration 10/ 4476 | learning rate 3e-06 | lm loss 5.58512 | loss scale 4096.0 | cost 18.759696006774902
epoch 0/ 5 | global iteration 20/ 4476 | learning rate 3e-06 | lm loss 8.31435 | loss scale 4096.0 | cost 19.37296462059021
epoch 0/ 5 | global iteration 30/ 4476 | learning rate 3e-06 | lm loss 8.12611 | loss scale 4096.0 | cost 19.463809967041016

@Tron1994

@t1101675 Does cpm1-finetune have the same problem? How can I check that the checkpoint loaded successfully, by looking at the loss? If so, what should the initial loss be?
