How was Pearson correlation calculated in the experiments? #6

Hi, thanks for the great work!

There are a few details about LaSE's correlation in the paper that I did not quite understand. For each target language, the top-5 source languages were used to evaluate LaSE's correlation with ROUGE-2 in the out-lang scenario.

Let's assume those 5 languages are lan1, lan2 .. lan5, and the target language is tgt_lan0. I'm assuming the procedure is as follows: generate summaries in tgt_lan0 from each of the 5 source languages to obtain 5 prediction sets pred1, pred2 .. pred5. Pool those prediction sets, compute LaSE against references in the source languages (ref1, ref2 .. ref5), and compute ROUGE-2 against ref0. In total, we have `len(pred1) + len(pred2) + .. + len(pred5)` scores each for LaSE and ROUGE-2.

After this, we calculate the Pearson correlation between the two 1D arrays formed from these two sets of scores. Is this interpretation correct? If so, since score scales may differ across reference-prediction pairs (e.g. a score of 0.5 might be bad for certain pairs but good for others), do you think aggregating them this way is suboptimal?
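To make the concern concrete, here is a minimal sketch of the pooled computation as I understand it, next to a per-source-language alternative. The scores are made-up placeholders, not results from the paper, and the names `pearson`, `lase`, `rouge2`, `lan1`, and `lan2` are hypothetical and used only for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Made-up placeholder scores (NOT from the paper), one list per
# hypothetical source language, one entry per generated summary.
lase = {
    "lan1": [0.1, 0.2, 0.3],
    "lan2": [0.6, 0.7, 0.8],
}
rouge2 = {
    "lan1": [0.3, 0.2, 0.1],
    "lan2": [0.8, 0.7, 0.6],
}

# Pooled: concatenate every language pair into two 1D arrays,
# then compute a single correlation over all summaries.
pooled_lase = [s for lang in lase for s in lase[lang]]
pooled_rouge2 = [s for lang in lase for s in rouge2[lang]]
pooled_r = pearson(pooled_lase, pooled_rouge2)

# Per pair: compute one correlation per source language.
per_pair_r = {lang: pearson(lase[lang], rouge2[lang]) for lang in lase}

print("pooled:", pooled_r)
print("per pair:", per_pair_r)
```

In this toy data the two metrics are perfectly anti-correlated within each language pair, yet the pooled correlation comes out strongly positive, because the pairs sit on different score scales. This is the kind of effect I am worried pooling could introduce.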

Could you help clarify this, @Tahmid04?
