
How to reproduce the result of StructBert on STS-B? #34

Closed
sangyx opened this issue Feb 10, 2022 · 1 comment

Comments

sangyx commented Feb 10, 2022

Hi, I cannot reproduce the result reported in the paper using the example command:

python run_classifier_multi_task.py \
  --task_name STS-B \
  --do_train \
  --do_eval \
  --do_test \
  --lr_decay_factor 1 \
  --dropout 0.1 \
  --do_lower_case \
  --detach_index -1 \
  --core_encoder bert \
  --data_dir data \
  --vocab_file config/vocab.txt \
  --bert_config_file config/large_bert_config.json \
  --init_checkpoint model/en_model \
  --max_seq_length 128 \
  --train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --fast_train \
  --gradient_accumulation_steps 1 \
  --output_dir output \
  --amp_type O1

Did I set any hyper-parameters incorrectly?

@wangwei7175878 (Collaborator) commented:

Hi, two things may cause the difference in results. First, a common trick on GLUE is to first fine-tune on MNLI and use that checkpoint as the initialization for STS-B; second, the results reported in the paper are the numbers on the test set.
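The MNLI-then-STS-B recipe could look roughly like the following two-stage invocation, reusing the flags from the command above. This is a sketch, not a confirmed recipe: the `MNLI` task name, the `output_mnli` directory, the saved checkpoint name `output_mnli/pytorch_model.bin`, and the availability of MNLI data under `data/` are all assumptions — check what `run_classifier_multi_task.py` actually accepts and saves.

```shell
# Stage 1 (assumed): fine-tune the released checkpoint on MNLI first.
# "MNLI" as a task name and the MNLI data location are assumptions.
python run_classifier_multi_task.py \
  --task_name MNLI \
  --do_train \
  --do_eval \
  --lr_decay_factor 1 \
  --dropout 0.1 \
  --do_lower_case \
  --detach_index -1 \
  --core_encoder bert \
  --data_dir data \
  --vocab_file config/vocab.txt \
  --bert_config_file config/large_bert_config.json \
  --init_checkpoint model/en_model \
  --max_seq_length 128 \
  --train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --fast_train \
  --gradient_accumulation_steps 1 \
  --output_dir output_mnli \
  --amp_type O1

# Stage 2: fine-tune on STS-B, initializing from the MNLI checkpoint.
# The exact checkpoint filename written to output_mnli is an assumption;
# inspect that directory after stage 1 and substitute the real name.
python run_classifier_multi_task.py \
  --task_name STS-B \
  --do_train \
  --do_eval \
  --do_test \
  --lr_decay_factor 1 \
  --dropout 0.1 \
  --do_lower_case \
  --detach_index -1 \
  --core_encoder bert \
  --data_dir data \
  --vocab_file config/vocab.txt \
  --bert_config_file config/large_bert_config.json \
  --init_checkpoint output_mnli/pytorch_model.bin \
  --max_seq_length 128 \
  --train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --fast_train \
  --gradient_accumulation_steps 1 \
  --output_dir output \
  --amp_type O1
```

Note also that the dev-set number from stage 2 will still differ from the paper's figure, since the reported result is on the test set.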
