
Reproduce the result of Chemprot using RoBERTa #22

Closed
zhutixiaojie0120 opened this issue Oct 28, 2020 · 4 comments

Comments

@zhutixiaojie0120

Has anyone tried to reproduce the ChemProt result using RoBERTa?

I used the command provided in the repo, but I only got about half the F-score reported in the paper.

-----------Command I used----------------
python -m scripts.train \
    --config training_config/classifier.jsonnet \
    --serialization_dir model_logs/chemprot-ROBERTA_CLASSIFIER_BIG-202010271621 \
    --hyperparameters ROBERTA_CLASSIFIER_BIG \
    --dataset chemprot \
    --model roberta-base \
    --device 0 \
    --perf +f1 \
    --evaluate_on_test \
    --seed 0

-------------Result I got---------------------
2020-10-28 15:47:32,735 - INFO - allennlp.models.archival - archiving weights and vocabulary to model_logs/chemprot-ROBERTA_CLASSIFIER_BIG-202010271621/model.tar.gz
2020-10-28 15:48:00,526 - INFO - allennlp.common.util - Metrics: {
"best_epoch": 2,
"peak_cpu_memory_MB": 4431.752,
"peak_gpu_0_memory_MB": 13629,
"peak_gpu_1_memory_MB": 10,
"training_duration": "0:05:36.203710",
"training_start_epoch": 0,
"training_epochs": 2,
"epoch": 2,
"training_f1": 0.5388954075483176,
"training_accuracy": 0.8424082513792276,
"training_loss": 0.528517140297649,
"training_cpu_memory_MB": 4431.752,
"training_gpu_0_memory_MB": 13629,
"training_gpu_1_memory_MB": 10,
"validation_f1": 0.5084102337176983,
"validation_accuracy": 0.8026370004120313,
"validation_loss": 0.6763799888523001,
"best_validation_f1": 0.5084102337176983,
"best_validation_accuracy": 0.8026370004120313,
"best_validation_loss": 0.6763799888523001,
"test_f1": 0.4786599434625644,
"test_accuracy": 0.7999423464975497,
"test_loss": 0.679223679412495
}

-----The result shown on the paper---------
[screenshot: ChemProt results table from the paper]

@jonhilgart22

I think the discrepancy comes down to which F1 score (macro vs. micro) they used in the paper.

@Atomu2014

Hi, could you clarify which F1 score (micro or macro) the provided training command computes? I tried to reproduce the CS-domain results: my micro F1 is close to the paper's numbers, but my macro F1 is much lower.

@kernelmachine
Contributor

Hi there, sorry about the delay here. As we report in the paper, we use the micro F1 score, which is just the "accuracy" field in this case.
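For readers puzzled by the gap: here is a minimal sketch (toy labels, not from the ChemProt data or this repo) of why micro F1 coincides with accuracy in single-label multi-class classification, while macro F1 averages per-class F1 scores and can be much lower when some classes are predicted poorly:

```python
# Micro vs. macro F1 on a toy 3-class example (pure Python, no dependencies).
from collections import Counter

def micro_macro_f1(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    # Micro: pool TP/FP/FN across all classes. With exactly one label per
    # example, every error is simultaneously one FP and one FN, so micro
    # precision == micro recall == accuracy, and hence micro F1 == accuracy.
    total_tp, total_fp, total_fn = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro_p = total_tp / (total_tp + total_fp)
    micro_r = total_tp / (total_tp + total_fn)
    micro = 2 * micro_p * micro_r / (micro_p + micro_r)
    # Macro: unweighted mean of per-class F1; a badly-predicted class
    # drags it down regardless of class frequency.
    per_class = []
    for c in sorted(set(y_true) | set(y_pred)):
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        per_class.append(2 * p * r / (p + r) if p + r else 0.0)
    macro = sum(per_class) / len(per_class)
    return micro, macro

y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 2, 2, 2, 0, 0, 2, 1]
micro, macro = micro_macro_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(micro, macro, accuracy)  # micro equals accuracy; macro is lower here
```

So a "validation_f1" around 0.51 alongside "validation_accuracy" around 0.80 is consistent with the script reporting macro F1 in the "f1" field while the paper's number corresponds to micro F1, i.e. the accuracy field.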

@snat1505027

snat1505027 commented Dec 5, 2022

Hi @zhutixiaojie0120, could you kindly share the environment.yml file you are using for this task? I am running into a lot of errors due to version mismatches.
