Performances of BERT on ACE2005 and MAVEN #4

Closed
alderpaw opened this issue May 17, 2021 · 4 comments

@alderpaw

Hi, thanks for your work on this dataset.
I notice that you compare the performance of BiLSTM and BERT on both ACE 2005 and MAVEN, and it seems that BiLSTM outperforms BERT on ACE 2005. However, some papers report different results. For example, in https://www.aclweb.org/anthology/P19-1522/ they report an F1 score above 80 with BERT, and in https://www.aclweb.org/anthology/2020.emnlp-main.435/ the results with BERT+MLP are better than DMBERT (76.2 vs. 74.9). What do you think of these results?

@Bakser
Member

Bakser commented Aug 16, 2021

Hi, thanks for your interest in our work. I didn't reproduce the two works, so I can only offer some intuitions here.

  1. PLMEE adds its own sophisticated mechanisms on top of BERT, so its results may reasonably be much higher than those of vanilla DMBERT.
  2. BERT+MLP is more similar to DMBERT, but still not the same model. Also, I am not sure whether they ran the experiments multiple times and reported averaged results; given the large variance on the small ACE 2005 dataset, I would not be surprised if the difference comes from randomness alone (see the sketch below).
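For illustration, a minimal sketch of reporting results averaged over several random seeds, which is what the variance argument above suggests; the F1 values below are placeholders, not actual numbers from any of these papers:

```python
# Sketch: report mean ± standard deviation of F1 over several runs,
# since a single run on the small ACE 2005 test set can be misleading.
import statistics

# Hypothetical F1 scores from runs with different random seeds (placeholders).
f1_scores = [74.1, 75.8, 73.5, 76.4, 74.9]

mean_f1 = statistics.mean(f1_scores)
std_f1 = statistics.stdev(f1_scores)  # sample standard deviation
print(f"F1 = {mean_f1:.1f} ± {std_f1:.1f} over {len(f1_scores)} runs")
```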

@Bakser Bakser closed this as completed Aug 16, 2021
@alderpaw
Author

alderpaw commented Sep 1, 2021

Thanks for your reply!
I wonder whether you use the same split as HMEAE. It seems to be different from the one used in https://github.com/nlpcl-lab/ace2005-preprocessing and leads to different results.
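(As a quick way to compare two splits, a minimal sketch along these lines could be used; it assumes each split is stored as a plain-text list of ACE 2005 document IDs, one per line, and the file names are hypothetical placeholders.)

```python
# Sketch: check whether two ACE 2005 split files contain the same documents.
def load_doc_ids(path):
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}

split_a = load_doc_ids("hmeae_test_split.txt")      # hypothetical path
split_b = load_doc_ids("nlpcl_lab_test_split.txt")  # hypothetical path

print("only in A:", sorted(split_a - split_b))
print("only in B:", sorted(split_b - split_a))
print("identical:", split_a == split_b)
```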

@Bakser
Member

Bakser commented Sep 1, 2021

Hi,
We use the same split as HMEAE, which is in fact also the same as the split you mentioned. However, I found that Ziqi uploaded a wrong split file with the example logs in the HMEAE repo, and we hadn't noticed this for a long time... We have now fixed the split file in the HMEAE repo.
Thanks for bringing this to our attention, and sorry for the inconvenience caused.

@alderpaw
Author

alderpaw commented Sep 1, 2021

Thanks for your help!
