
How to reproduce the result of StructBert on STS-B? #34

Closed
sangyx opened this issue Feb 10, 2022 · 1 comment

Comments

sangyx commented Feb 10, 2022

Hi, I cannot reproduce the result reported in the paper using the example command:

python run_classifier_multi_task.py \
  --task_name STS-B \
  --do_train \
  --do_eval \
  --do_test \
  --lr_decay_factor 1 \
  --dropout 0.1 \
  --do_lower_case \
  --detach_index -1 \
  --core_encoder bert \
  --data_dir data \
  --vocab_file config/vocab.txt \
  --bert_config_file config/large_bert_config.json \
  --init_checkpoint model/en_model \
  --max_seq_length 128 \
  --train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --fast_train \
  --gradient_accumulation_steps 1 \
  --output_dir output \
  --amp_type O1

Did I set any hyper-parameters incorrectly?

@wangwei7175878 (Collaborator) commented:

Hi, two things may cause the difference in results. First, a common trick on GLUE is to first fine-tune on MNLI and use that checkpoint as the initialization for STS-B; second, the results reported in the paper are the numbers on the test set.
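The MNLI-then-STS-B recipe could look roughly like the following two-stage invocation, reusing the flags from the command above. This is a sketch, not a confirmed recipe: the `MNLI` task name, the `output_mnli` directory, the saved checkpoint name `output_mnli/pytorch_model.bin`, and the availability of MNLI data under `data/` are all assumptions — check what `run_classifier_multi_task.py` actually accepts and saves.

```shell
# Stage 1 (assumed): fine-tune the released checkpoint on MNLI first.
# "MNLI" as a task name and the MNLI data location are assumptions.
python run_classifier_multi_task.py \
  --task_name MNLI \
  --do_train \
  --do_eval \
  --lr_decay_factor 1 \
  --dropout 0.1 \
  --do_lower_case \
  --detach_index -1 \
  --core_encoder bert \
  --data_dir data \
  --vocab_file config/vocab.txt \
  --bert_config_file config/large_bert_config.json \
  --init_checkpoint model/en_model \
  --max_seq_length 128 \
  --train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --fast_train \
  --gradient_accumulation_steps 1 \
  --output_dir output_mnli \
  --amp_type O1

# Stage 2: fine-tune on STS-B, initializing from the MNLI checkpoint.
# The exact checkpoint filename written to output_mnli is an assumption;
# inspect that directory after stage 1 and substitute the real name.
python run_classifier_multi_task.py \
  --task_name STS-B \
  --do_train \
  --do_eval \
  --do_test \
  --lr_decay_factor 1 \
  --dropout 0.1 \
  --do_lower_case \
  --detach_index -1 \
  --core_encoder bert \
  --data_dir data \
  --vocab_file config/vocab.txt \
  --bert_config_file config/large_bert_config.json \
  --init_checkpoint output_mnli/pytorch_model.bin \
  --max_seq_length 128 \
  --train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --fast_train \
  --gradient_accumulation_steps 1 \
  --output_dir output \
  --amp_type O1
```

Note also that the dev-set number from stage 2 will still differ from the paper's figure, since the reported result is on the test set.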
