
Reproduce the result of Chemprot using RoBERTa #22

Closed
zhutixiaojie0120 opened this issue Oct 28, 2020 · 4 comments

Comments

@zhutixiaojie0120

Has anyone tried to reproduce the ChemProt result using RoBERTa?

I used the command provided in the repo, but I only got about half the F-score reported in the paper.

-----------Command I used----------------
python -m scripts.train \
    --config training_config/classifier.jsonnet \
    --serialization_dir model_logs/chemprot-ROBERTA_CLASSIFIER_BIG-202010271621 \
    --hyperparameters ROBERTA_CLASSIFIER_BIG \
    --dataset chemprot \
    --model roberta-base \
    --device 0 \
    --perf +f1 \
    --evaluate_on_test \
    --seed 0

-------------Result I got---------------------
2020-10-28 15:47:32,735 - INFO - allennlp.models.archival - archiving weights and vocabulary to model_logs/chemprot-ROBERTA_CLASSIFIER_BIG-202010271621/model.tar.gz
2020-10-28 15:48:00,526 - INFO - allennlp.common.util - Metrics: {
"best_epoch": 2,
"peak_cpu_memory_MB": 4431.752,
"peak_gpu_0_memory_MB": 13629,
"peak_gpu_1_memory_MB": 10,
"training_duration": "0:05:36.203710",
"training_start_epoch": 0,
"training_epochs": 2,
"epoch": 2,
"training_f1": 0.5388954075483176,
"training_accuracy": 0.8424082513792276,
"training_loss": 0.528517140297649,
"training_cpu_memory_MB": 4431.752,
"training_gpu_0_memory_MB": 13629,
"training_gpu_1_memory_MB": 10,
"validation_f1": 0.5084102337176983,
"validation_accuracy": 0.8026370004120313,
"validation_loss": 0.6763799888523001,
"best_validation_f1": 0.5084102337176983,
"best_validation_accuracy": 0.8026370004120313,
"best_validation_loss": 0.6763799888523001,
"test_f1": 0.4786599434625644,
"test_accuracy": 0.7999423464975497,
"test_loss": 0.679223679412495
}

-----The result shown on the paper---------
[screenshot: ChemProt results table from the paper]

@jonhilgart22

I think the discrepancy comes down to which F1 score (macro vs. micro) they used in the paper.

@Atomu2014

Hi, could you clarify which F1 score (micro or macro) the provided training command computes? I tried to reproduce the CS-domain results: my micro F1 is close to the paper's numbers, but my macro F1 is much lower.

@kernelmachine
Contributor

Hi there, sorry about the delay here. As we report in the paper, we use the micro F1 score, which is just the "accuracy" field in this case.
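For readers puzzled by the gap: here is a minimal sketch (toy labels, not from the ChemProt data or this repo) of why micro F1 coincides with accuracy in single-label multi-class classification, while macro F1 averages per-class F1 scores and can be much lower when some classes are predicted poorly:

```python
# Micro vs. macro F1 on a toy 3-class example (pure Python, no dependencies).
from collections import Counter

def micro_macro_f1(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    # Micro: pool TP/FP/FN across all classes. With exactly one label per
    # example, every error is simultaneously one FP and one FN, so micro
    # precision == micro recall == accuracy, and hence micro F1 == accuracy.
    total_tp, total_fp, total_fn = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro_p = total_tp / (total_tp + total_fp)
    micro_r = total_tp / (total_tp + total_fn)
    micro = 2 * micro_p * micro_r / (micro_p + micro_r)
    # Macro: unweighted mean of per-class F1; a badly-predicted class
    # drags it down regardless of class frequency.
    per_class = []
    for c in sorted(set(y_true) | set(y_pred)):
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        per_class.append(2 * p * r / (p + r) if p + r else 0.0)
    macro = sum(per_class) / len(per_class)
    return micro, macro

y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 2, 2, 2, 0, 0, 2, 1]
micro, macro = micro_macro_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(micro, macro, accuracy)  # micro equals accuracy; macro is lower here
```

So a "validation_f1" around 0.51 alongside "validation_accuracy" around 0.80 is consistent with the script reporting macro F1 in the "f1" field while the paper's number corresponds to micro F1, i.e. the accuracy field.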

@snat1505027

snat1505027 commented Dec 5, 2022

Hi @zhutixiaojie0120, could you kindly share the environment.yml file you are using for this task? I am running into a lot of errors due to version mismatches.
