
Results for PRIMERA-arxiv #12

Closed · oaimli opened this issue May 23, 2022 · 4 comments


oaimli commented May 23, 2022

Hi,

Thanks for sharing this nice work. After running your code, I get only 28.5 ROUGE-L F-measure on the arXiv dataset, whereas your paper reports 42.6 in Table 3; my ROUGE-1 and ROUGE-2 match yours. Similarly, with led-large-16384-arxiv (i.e., the SOTA for arXiv) I can only get 46.6/19.1/27.5 for ROUGE-1/2/L, but your Table 3 reports 41.8 for ROUGE-L. Could you please explain how you obtain such high ROUGE-L values on the arXiv dataset?


Wendy-Xiao commented May 26, 2022

Hi,

Yes, this is due to an inconsistency in how sentences in the summary are handled when measuring ROUGE-L.

In previous work, the sentences in the generated and ground-truth summaries are joined with '\n' in between:

s1 . \n s2 . \n s3 . \n ...

but in our experimental setup the sentences were joined with ' ' in between:

s1 . s2 . s3 . ....

To make the results comparable with previous work, we split the summaries (both generated and ground-truth) on '.' and re-joined the sentences with '\n' in between.

ROUGE-L computed this way is consistent with previous work, and it is the number shown in our paper (the same applies to LED).
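For anyone reproducing this, here is a minimal sketch of the preprocessing, assuming the Google `rouge_score` package (the helper below is illustrative, not the exact PRIMERA evaluation script). In that package, the `rougeLsum` metric splits the text on '\n' and computes summary-level LCS, which matches the aggregation described above:

```python
# Minimal sketch, assuming the `rouge_score` package (pip install rouge-score).
# `to_newline_separated` is an illustrative helper, not the exact PRIMERA script.
from rouge_score import rouge_scorer

def to_newline_separated(summary: str) -> str:
    """Split a flat summary on '.' and re-join the sentences with newlines."""
    sentences = [s.strip() for s in summary.split(".") if s.strip()]
    return " .\n".join(sentences) + " ."

# rougeLsum treats each newline-separated line as a sentence and computes
# summary-level (union) LCS, mirroring the original ROUGE-L protocol.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeLsum"],
                                  use_stemmer=True)

generated = "s1 . s2 . s3 ."  # flat, space-joined model output
reference = "s1 . s2 . s3 ."  # flat, space-joined ground truth

scores = scorer.score(to_newline_separated(reference),   # target first
                      to_newline_separated(generated))   # prediction second
print(scores["rougeLsum"].fmeasure)
```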


oaimli commented May 26, 2022

Hi!

Thank you so much for the reply and for sharing how you obtained the numbers. After splitting sentences with '\n', I can now reproduce the results reported in the paper for the arXiv dataset. If that is the case, then in Table 3 of your paper, under the 'RougeL' column, you report ROUGE-L for Multi-News, Multi-XScience, and WCEP, but ROUGE-Lsum for the arXiv dataset, which may be a little confusing. Would you mind explaining why different measurements are used in the same column of the results table? Much appreciated!
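For other readers, a toy illustration of the gap (assuming the `rouge_score` package; not an example from the paper): flat `rougeL` takes one LCS over the whole token sequence, while `rougeLsum` takes union LCS over '\n'-separated sentences, so reordering sentences hurts the former but not the latter.

```python
# Toy illustration (not from the paper): rougeL vs rougeLsum on the same pair.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rougeL", "rougeLsum"])

# The same two sentences, in a different order.
prediction = "the cat sat .\nthe dog ran ."
reference = "the dog ran .\nthe cat sat ."

scores = scorer.score(reference, prediction)
# rougeL ignores the newlines and runs a single LCS over the full token
# sequence, so the reordering cuts it in half; rougeLsum matches sentence
# by sentence and is unaffected.
print(scores["rougeL"].fmeasure)     # 0.5
print(scores["rougeLsum"].fmeasure)  # 1.0
```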

Again, thanks for sharing your nice work.

Wendy-Xiao commented

Hi there,

Sorry for the late reply. There is no particular reason beyond matching the results of previous work on each dataset. The difference between datasets likely comes from the native format of the original datasets, i.e., some datasets store summaries as '\n'-separated sentences, while others store the summary as a single paragraph.


oaimli commented Jul 25, 2022

Thanks for your kind reply!

oaimli closed this as completed Jul 25, 2022