Reproduction of Doc2EDAG #37
Comments
Hi there. I think your reproduced results may be macro-averaged, while the F=79.0 reported in my paper is micro-averaged.
Thanks for your reply. Unfortunately, the results above are actually micro scores, and here are my reproduction results: We report the test-set results of the model that performs best on the dev set, which is the one from epoch 96.
By the way, I find that the main difference between your reproduction results and ours comes from the precision score. I wonder whether different Python package versions could push the model toward different "styles", such as (high precision, low recall) or (low precision, high recall).
What's more, in the README.md of this repo you said that
Your reproduced results are interesting, since the original Doc2EDAG paper says Doc2EDAG reaches a MACRO-averaged F1 score of 76.3. Your MICRO-averaged score is only 76.3, which means the macro-averaged score is far below the reported score.
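To make the micro/macro distinction in this exchange concrete, here is a minimal sketch of both averages computed from per-event-type counts. The event type names and the (TP, FP, FN) counts are made up for illustration; they are not taken from Doc2EDAG or from either reproduction.

```python
from typing import Dict, Tuple


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def micro_macro_f1(counts: Dict[str, Tuple[int, int, int]]) -> Tuple[float, float]:
    """Return (micro F1, macro F1) for per-type (TP, FP, FN) counts."""
    # Macro: compute F1 per event type, then take the unweighted mean.
    per_type_f1 = [prf(*c)[2] for c in counts.values()]
    macro = sum(per_type_f1) / len(per_type_f1)
    # Micro: pool the counts over all event types, then compute F1 once.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = prf(tp, fp, fn)[2]
    return micro, macro


# Hypothetical counts: one frequent, easy type and one rare, hard type.
counts = {"TypeA": (90, 10, 10), "TypeB": (5, 5, 15)}
micro, macro = micro_macro_f1(counts)
print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")
```

With counts like these, the frequent type dominates the micro average, so the micro score comes out higher than the macro score. The relationship can go the other way on other data, so the two numbers are not directly comparable.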
I made the reproduction with the following steps:
Now I think maybe the versions of the packages cause the differences between our reproductions.
Sorry, I don't know... Maybe you could open an issue in the original repo.
I will change my Python package versions and then run the reproduction again. Looking forward to the final results.
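One way to test the package-version hypothesis is for both sides to dump their installed versions and compare them. The sketch below uses the standard library's `importlib.metadata`; the package names in the list are assumptions chosen for illustration, since the thread does not say which packages differ.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional, Tuple


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def diff_versions(
    mine: Dict[str, Optional[str]], theirs: Dict[str, Optional[str]]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return only the packages whose versions differ between two environments."""
    names = sorted(set(mine) | set(theirs))
    return {n: (mine.get(n), theirs.get(n)) for n in names if mine.get(n) != theirs.get(n)}


# Hypothetical list of packages worth comparing; adjust to the actual environment.
PACKAGES = ["torch", "numpy", "transformers"]
print(installed_versions(PACKAGES))
```

Each person can paste the printed dictionary into the thread, and `diff_versions` then shows which packages to align before re-running.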
** Idea sharing **
While sharing what you want to do, make sure to protect your ideas.
** Problems **
If you have any questions about event extraction, make sure you have read the latest papers or searched on the Internet.
** Others **
Other things you may want to share or discuss.
Hello, Spico! I'm very glad to talk with you about event extraction. Is the order of event types (o2o, o2m, m2m) in the training data important for model performance? I find that the reproduction of Doc2EDAG in your paper is (P=86.2, R=70.8, F=79.0, overall scores), but my reproduction is only (P=79.7, R=73.2, F=76.3, overall scores). I just cloned the code from the GitHub repo given in the Doc2EDAG paper and ran it without modifying the data preprocessing.
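As a quick sanity check on the two score triples quoted above, the sketch below recomputes F1 as the harmonic mean of the quoted P and R. It assumes nothing beyond the numbers in this post. When overall scores are computed from pooled counts, F should match this harmonic mean up to rounding; a gap can appear when scores are averaged per event type or for other reasons, so the check alone does not settle how a figure was produced.

```python
def f1_from_pr(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    return 2 * p * r / (p + r) if p + r else 0.0


# (P, R, F) triples exactly as quoted in the post above.
reported = {
    "paper": (86.2, 70.8, 79.0),
    "my reproduction": (79.7, 73.2, 76.3),
}

for name, (p, r, f) in reported.items():
    implied = f1_from_pr(p, r)
    print(f"{name}: reported F={f:.1f}, harmonic mean of P and R={implied:.1f}")
# paper: reported F=79.0, harmonic mean of P and R=77.7
# my reproduction: reported F=76.3, harmonic mean of P and R=76.3
```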