Reproduction of Doc2EDAG #37

Closed
CarlanLark opened this issue Jun 30, 2022 · 9 comments
Labels: discussion (Discussion on DocEE and SentEE), help wanted (Extra attention is needed)

Comments

CarlanLark commented Jun 30, 2022

Hello, Spico! I'm very glad to talk with you about event extraction. Is the order of event types (o2o, o2m, m2m) in the training data important for model performance? I found that the reproduction of Doc2EDAG in your paper reaches (P=86.2, R=70.8, F=79.0, overall scores), but my reproduction only reaches (P=79.7, R=73.2, F=76.3, overall scores). I just git cloned the code from the GitHub repo in the Doc2EDAG paper and ran it without modifying the data preprocessing.

CarlanLark added the discussion label Jun 30, 2022
Spico197 (Owner) commented Jul 1, 2022

Hi there. I think your reproduced results may be macro averaged, while the F=79.0 reported in my paper is micro averaged.
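To make the distinction concrete: micro averaging pools TP/FP/FN over all event types before computing one F1, while macro averaging computes a per-type F1 and then takes the unweighted mean. A minimal sketch (the per-type counts below are made up for illustration; the type names are from ChFinAnn):

    # Micro vs. macro F1, with hypothetical per-type (TP, FP, FN) counts.
    counts = {
        "EquityFreeze": (50, 10, 40),   # a rare type with weaker scores
        "EquityPledge": (500, 50, 80),  # a frequent type with stronger scores
    }

    def f1(tp, fp, fn):
        p = tp / (tp + fp)
        r = tp / (tp + fn)
        return 2 * p * r / (p + r)

    # Micro: pool the counts over all types, then compute one F1.
    tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
    print("micro F1:", f1(tp, fp, fn))

    # Macro: compute F1 per type, then average; rare types weigh as much as frequent ones.
    print("macro F1:", sum(f1(*c) for c in counts.values()) / len(counts))

With imbalanced types the two can differ substantially, which is why a micro score of 79.0 and a macro score around 76 can describe the same model.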

CarlanLark (Author) commented

Thanks for your reply. Unfortunately, the results above are actually micro scores, and here are my reproduction results:
{
"MacroPrecision": 0.7605239514335949,
"MacroRecall": 0.6683277432764718,
"MacroF1": 0.7092646497139452,
"MicroPrecision": 0.797208663819402,
"MicroRecall": 0.7323874583990191,
"MicroF1": 0.7634245649911446,
"TP": 20906,
"FP": 5318,
"FN": 7639
}

We chose the results on the test set from the best model on the dev set; the best epoch id was 96.
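For reference, the micro scores above follow directly from the TP/FP/FN counts; a quick check in Python:

    tp, fp, fn = 20906, 5318, 7639
    precision = tp / (tp + fp)                          # 0.79720...
    recall = tp / (tp + fn)                             # 0.73238...
    f1 = 2 * precision * recall / (precision + recall)  # 0.76342...
    print(precision, recall, f1)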

CarlanLark (Author) commented

By the way, I find that the main difference between your reproduction results and ours comes from the precision score. I wonder whether different Python package versions could lead the model to different "styles", such as (high precision, low recall) or (low precision, high recall).

CarlanLark (Author) commented

What's more, in the README.md of this repo you said

# generate data with doc type (o2o, o2m, m2m) for better evaluation

So do you keep the document order of train.json, or do you use a new document order? I mean, if documents in the training dataset are ordered (first o2o, then o2m, then m2m), the model will be trained on o2o documents first. This order may lead the model to prefer less generation (higher precision, lower recall) rather than more generation (lower precision, higher recall), which could be the reason for the difference between your reproduction and mine.

Spico197 (Owner) commented Jul 1, 2022

> So do you keep the document order of train.json, or do you use a new document order? … This order may lead the model to prefer less generation (higher precision, lower recall) rather than more generation (lower precision, higher recall).

doc_type is only used in evaluation. We keep the same data loading strategy as the original Doc2EDAG code.
The order isn't really a problem in evaluation: doc_type is only used to produce fine-grained evaluation results for each document type.
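A minimal sketch of what such fine-grained evaluation looks like; the (doc_type, TP, FP, FN) record layout below is an assumption for illustration, not the repo's actual data structure:

    from collections import defaultdict

    # Hypothetical per-document counts: (doc_type, TP, FP, FN).
    results = [("o2o", 40, 5, 8), ("o2m", 25, 9, 12), ("m2m", 10, 6, 15)]

    # Pool the counts within each document type.
    pooled = defaultdict(lambda: [0, 0, 0])
    for doc_type, tp, fp, fn in results:
        for i, v in enumerate((tp, fp, fn)):
            pooled[doc_type][i] += v

    # Micro P/R/F1 per document type, independent of the order in train.json.
    for doc_type, (tp, fp, fn) in pooled.items():
        p, r = tp / (tp + fp), tp / (tp + fn)
        print(doc_type, p, r, 2 * p * r / (p + r))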

Spico197 (Owner) commented Jul 1, 2022

Your reproduced results are interesting, since the original Doc2EDAG paper reports a macro-averaged F1 score of 76.3. Your micro-averaged score is only 76.3, which means your macro-averaged score is far below the reported one.
Did you change the exec script that Doc2EDAG provides?

CarlanLark (Author) commented Jul 2, 2022

I made the reproduction with the following steps:

  1. git clone https://github.com/dolphin-zs/Doc2EDAG.git
  2. unzip Data.zip
  3. modify the launch command in train_multi.sh from
     python -m torch.distributed.launch --nproc_per_node ${NUM_GPUS} run_dee_task.py $*
     to
     python3 -m torch.distributed.launch --nproc_per_node ${NUM_GPUS} run_dee_task.py $*
  4. ./train_multi.sh 8 --task_name [TASK_NAME]

I didn't change any parameters in the Doc2EDAG code, and I got the results above.

Now I think the package versions may be causing the differences between our reproductions.
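One way to check is to record the exact package versions on both sides and diff them; a minimal sketch (the package list is an assumption, and importlib.metadata needs Python 3.8+):

    from importlib.metadata import PackageNotFoundError, version

    # Print versions of packages most likely to affect training behaviour.
    for pkg in ("torch", "numpy", "pytorch-pretrained-bert"):
        try:
            print(pkg, version(pkg))
        except PackageNotFoundError:
            print(pkg, "not installed")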

Spico197 (Owner) commented Jul 2, 2022

Sorry, I don't know... Maybe you could open an issue in the original repo.

Spico197 added the help wanted label Jul 2, 2022
CarlanLark (Author) commented

I will change my Python package versions and run the reproduction again. Looking forward to the final results.

Spico197 closed this as completed Jul 9, 2022