
fail to load structbert.en.large while trying to reproduce the result of GLUE #27

Closed
LemXiong opened this issue Dec 14, 2021 · 10 comments

@LemXiong

Hi,
I downloaded structbert.en.large through the given link (https://alice-open.oss-cn-zhangjiakou.aliyuncs.com/StructBERT/en_model), but the following error occurred while running:

RuntimeError: Error(s) in loading state_dict for BertForSequenceClassificationMultiTask:
Missing key(s) in state_dict: "classifier.0.weight", "classifier.0.bias".
Unexpected key(s) in state_dict: "lm_bias", "linear.weight", "linear.bias", "LayerNorm.gamma", "LayerNorm.beta", "classifier.weight", "classifier.bias".

Do you have any idea why this happens? Thank you very much.

@luofuli
Collaborator

luofuli commented Dec 14, 2021

It seems that the real error occurred in run_classifier_multi_task.py, line 1056. Could you please comment out the try...except block, keeping only line 1056, to see what the real error is?
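
In other words, something like this (illustrative only; the actual code around line 1056 may differ, and args.init_checkpoint is a guessed name):

# Before: the fallback silently swallows the real loading error.
# try:
#     model.load_state_dict(torch.load(args.init_checkpoint, map_location="cpu"))
# except Exception:
#     ...  # fallback loading path
#
# After: keep only the load call so the original exception surfaces.
model.load_state_dict(torch.load(args.init_checkpoint, map_location="cpu"))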

@LemXiong
Author

If I use --init_checkpoint to pass the path and keep only line 1056, the error is:

RuntimeError: Error(s) in loading state_dict for BertModel:
Missing key(s) in state_dict: "embeddings.word_embeddings.weight", "embeddings.position_embeddings.weight", "embeddings.token_type_embeddings.weight", "embeddings.LayerNorm.gamma", "embeddings.LayerNorm.beta", every "encoder.layer.0.*" through "encoder.layer.23.*" parameter (attention self query/key/value, attention output dense and LayerNorm, intermediate dense, output dense and LayerNorm, each with weight and bias), "pooler.dense.weight", "pooler.dense.bias".
Unexpected key(s) in state_dict: "lm_bias", all of the keys listed above but carrying a "bert." prefix ("bert.embeddings.word_embeddings.weight" through "bert.pooler.dense.bias"), plus "classifier.weight", "classifier.bias", "linear.weight", "linear.bias", "LayerNorm.gamma", "LayerNorm.beta".

@luofuli
Collaborator

luofuli commented Dec 14, 2021

I guess you may have used --pretrain_model to load the pre-trained model. Please use the --init_checkpoint argument instead (the same as in the README.md), or share your running script.

@LemXiong
Author

I use the following command:

nohup python run_classifier_multi_task.py \
  --task_name CoLA \
  --do_train \
  --do_eval \
  --do_test \
  --amp_type O1 \
  --lr_decay_factor 1 \
  --dropout 0.1 \
  --do_lower_case \
  --detach_index -1 \
  --core_encoder bert \
  --data_dir GLUE \
  --vocab_file config/vocab.txt \
  --bert_config_file config/large_bert_config.json \
  --init_checkpoint pretrained_model/en_model \
  --max_seq_length 128 \
  --train_batch_size 16 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --fast_train \
  --save_model \
  --gradient_accumulation_steps 1 \
  --output_dir output > output.log 2>&1 &

I have tried both --pretrain_model and --init_checkpoint; they produced the first and second errors above, respectively.

@luofuli
Collaborator

luofuli commented Dec 15, 2021

We have tested that using --init_checkpoint with all code unchanged works. Please don't keep only line 1056.

@LemXiong
Author

> We have tested that using --init_checkpoint with all code unchanged works. Please don't keep only line 1056.

The unchanged code always runs successfully, but the results are only about half of those reported in the paper. And if you print the log, you will find that it executes the except branch, not the try branch.

@wangwei7175878
Collaborator

Could you please post the full log? The picture below shows the result of our reproduction under the default setting.
[Screenshot 2021-12-15 12:30 PM: reproduction results under the default setting]

@LemXiong
Author

nohup: ignoring input
12/14/2021 05:02:17 - INFO - main - COMMAND: run_classifier_multi_task.py --task_name CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI --do_train --do_eval --amp_type O1 --lr_decay_factor 1 --dropout 0.1 --do_lower_case --detach_index -1 --core_encoder bert --data_dir GLUE --vocab_file config/vocab.txt --bert_config_file config/large_bert_config.json --init_checkpoint pretrained_model/en_model --max_seq_length 128 --train_batch_size 16 --learning_rate 2e-5 --num_train_epochs 3 --fast_train --save_model --gradient_accumulation_steps 1 --output_dir output
12/14/2021 05:02:17 - INFO - main - device cuda n_gpu 1 distributed training False
12/14/2021 05:02:30 - INFO - main - LOOKING AT GLUE/MRPC/train.tsv
12/14/2021 05:02:47 - INFO - main - Install apex first if you want to use mix precition.
12/14/2021 05:02:47 - INFO - main - ***** Process training data *****
12/14/2021 05:02:51 - INFO - main - start tokenize
12/14/2021 05:02:58 - INFO - main - start tokenize
12/14/2021 05:03:22 - INFO - main - start tokenize
12/14/2021 05:03:29 - INFO - main - start tokenize
12/14/2021 05:03:48 - INFO - main - start tokenize
12/14/2021 05:04:14 - INFO - main - start tokenize
12/14/2021 05:04:24 - INFO - main - start tokenize
12/14/2021 05:04:36 - INFO - main - start tokenize
12/14/2021 05:04:46 - INFO - main - start tokenize
12/14/2021 05:04:49 - INFO - main - ***** Running training *****
12/14/2021 05:04:49 - INFO - main - Num examples = 949733
12/14/2021 05:04:49 - INFO - main - Num tasks = 9
12/14/2021 05:04:49 - INFO - main - Batch size = 16
12/14/2021 05:04:49 - INFO - main - Num steps = 178074
12/14/2021 05:04:49 - INFO - main - ***** Process dev data *****
12/14/2021 05:04:59 - INFO - main - start tokenize
12/14/2021 05:05:08 - INFO - main - start tokenize
12/14/2021 05:05:18 - INFO - main - start tokenize
12/14/2021 05:05:26 - INFO - main - start tokenize
12/14/2021 05:05:35 - INFO - main - start tokenize
12/14/2021 05:05:46 - INFO - main - start tokenize
12/14/2021 05:05:55 - INFO - main - start tokenize
12/14/2021 05:06:04 - INFO - main - start tokenize
12/14/2021 05:06:13 - INFO - main - start tokenize
12/14/2021 05:06:14 - INFO - main - ***** Running evaluation *****
12/14/2021 05:06:14 - INFO - main - Num examples = 69711
12/14/2021 05:06:14 - INFO - main - Num tasks = 9
12/14/2021 05:06:14 - INFO - main - Batch size = 8

Epoch: 0%| | 0/3 [00:00<?, ?it/s]

Iteration: 100%|██████████| 59359/59359 [53:25<00:00, 14.81it/s]
12/14/2021 06:06:04 - INFO - main - ***** Eval results *****
12/14/2021 06:06:04 - INFO - main - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: eval_accuracy = [0.3087248322147651, 0.2982643660609762, 0.6078431372549019, 0.537250594911221, 0.5476378926539698, 0.44765342960288806, 0.48394495412844035, 0.0, 0.5211267605633803]
12/14/2021 06:06:04 - INFO - main - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: eval_corrcoef = [0, 0, 0, 0, 0, 0, 0, -0.18605788966825199, 0]
12/14/2021 06:06:04 - INFO - main - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: eval_loss = 0.9975722805879683
12/14/2021 06:06:04 - INFO - main - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: eval_matthew = [-0.018148342420931135, 0, 0, 0, 0, 0, 0, 0, 0]
12/14/2021 06:06:04 - INFO - main - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: global_step = 47487
12/14/2021 06:06:04 - INFO - main - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: loss = 0.9634366636350805
12/14/2021 06:06:07 - INFO - main - Save model of Epoch 0

Epoch: 33%|███▎ | 1/3 [59:51<1:59:43, 3591.88s/it]

Iteration: 100%|██████████| 59359/59359 [53:50<00:00, 14.70it/s]
12/14/2021 07:06:37 - INFO - main - ***** Eval results *****
12/14/2021 07:06:37 - INFO - main - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: eval_accuracy = [0.3087248322147651, 0.2982643660609762, 0.6078431372549019, 0.537250594911221, 0.5476378926539698, 0.44765342960288806, 0.48394495412844035, 0.0, 0.5211267605633803]
12/14/2021 07:06:37 - INFO - main - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: eval_corrcoef = [0, 0, 0, 0, 0, 0, 0, -0.18605788966825199, 0]
12/14/2021 07:06:37 - INFO - main - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: eval_loss = 0.9975722805879683
12/14/2021 07:06:37 - INFO - main - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: eval_matthew = [-0.018148342420931135, 0, 0, 0, 0, 0, 0, 0, 0]
12/14/2021 07:06:37 - INFO - main - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: global_step = 94974
12/14/2021 07:06:37 - INFO - main - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: loss = 0.9633064780335283
12/14/2021 07:06:41 - INFO - main - Save model of Epoch 1

Epoch: 67%|██████▋ | 2/3 [2:00:25<1:00:04, 3604.43s/it]

Iteration: 100%|██████████| 59359/59359 [54:27<00:00, 14.53it/s]
12/14/2021 08:07:35 - INFO - main - ***** Eval results *****
12/14/2021 08:07:35 - INFO - main - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: eval_accuracy = [0.3087248322147651, 0.2982643660609762, 0.6078431372549019, 0.537250594911221, 0.5476378926539698, 0.44765342960288806, 0.48394495412844035, 0.0, 0.5211267605633803]
12/14/2021 08:07:35 - INFO - main - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: eval_corrcoef = [0, 0, 0, 0, 0, 0, 0, -0.18605788966825199, 0]
12/14/2021 08:07:35 - INFO - main - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: eval_loss = 0.9975722805879683
12/14/2021 08:07:35 - INFO - main - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: eval_matthew = [-0.018148342420931135, 0, 0, 0, 0, 0, 0, 0, 0]
12/14/2021 08:07:35 - INFO - main - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: global_step = 142461
12/14/2021 08:07:35 - INFO - main - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: loss = 0.9636250535845807
12/14/2021 08:07:38 - INFO - main - Save model of Epoch 2

Epoch: 100%|██████████| 3/3 [3:01:22<00:00, 3620.24s/it]

@wangwei7175878
Collaborator

The performance of multi-task mode depends on the similarity between the tasks. For example, MNLI and STS-B are similar, so STS-B results are better when MNLI and STS-B are run together than when STS-B is run alone; but CoLA is not similar to the other tasks. You can get normal accuracy by running the CoLA task alone (--task_name CoLA).
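
A minimal single-task invocation, reusing only the flags already shown in the command earlier in this thread:

python run_classifier_multi_task.py \
  --task_name CoLA \
  --do_train \
  --do_eval \
  --core_encoder bert \
  --data_dir GLUE \
  --vocab_file config/vocab.txt \
  --bert_config_file config/large_bert_config.json \
  --init_checkpoint pretrained_model/en_model \
  --max_seq_length 128 \
  --train_batch_size 16 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --output_dir output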

@LemXiong
Author

Thank you very much for your patient reply. It seems that apex must be installed to reproduce the correct results.

without apex

12/16/2021 04:50:44 - INFO - __main__ -   pretrained_model/en_model:  MRPC:  eval_accuracy = [0.5735294117647058]
12/16/2021 04:50:44 - INFO - __main__ -   pretrained_model/en_model:  MRPC:  eval_corrcoef = [0]
12/16/2021 04:50:44 - INFO - __main__ -   pretrained_model/en_model:  MRPC:  eval_loss = 0.6800402063949436
12/16/2021 04:50:44 - INFO - __main__ -   pretrained_model/en_model:  MRPC:  eval_matthew = [0]

with apex

12/16/2021 05:16:49 - INFO - __main__ -   pretrained_model/en_model:  MRPC:  eval_accuracy = [0.875]
12/16/2021 05:16:49 - INFO - __main__ -   pretrained_model/en_model:  MRPC:  eval_corrcoef = [0]
12/16/2021 05:16:49 - INFO - __main__ -   pretrained_model/en_model:  MRPC:  eval_loss = 0.46738045994539323
12/16/2021 05:16:49 - INFO - __main__ -   pretrained_model/en_model:  MRPC:  eval_matthew = [0]
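
For reference, apex is built from source; the exact flags depend on the apex and PyTorch versions, but a typical sequence is:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir \
    --global-option="--cpp_ext" --global-option="--cuda_ext" ./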
