
WMT15 BaryScore #3

Closed

jbgruenwald opened this issue Feb 4, 2022 · 10 comments

@jbgruenwald

I also have a problem with reproducing your results. If I run your command_line.sh on WMT15 de-en, I get a Pearson correlation of -0.3559000480869218, but in your paper you reported 75.9. How did you run these experiments?

@PierreColombo (Owner) commented Feb 4, 2022

Hi,
Thanks for opening the issue! This is indeed a problem.
We report absolute Pearson correlations.
Which model/layers do you use? What \epsilon do you use? And which distance/divergence do you use?
Kindest regards,
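A minimal sketch of that reporting convention, assuming the paper's 75.9 is |Pearson r| × 100; the file names are hypothetical placeholders, not files from this repo:

```python
# Minimal sketch (not from this repo): compare a metric's segment-level
# scores against human judgments and report |Pearson r| * 100, as the
# paper is assumed to do. File names below are hypothetical.
import numpy as np
from scipy.stats import pearsonr

metric_scores = np.loadtxt("baryscore_wmt15_deen.txt")  # hypothetical path
human_scores = np.loadtxt("human_judgments_deen.txt")   # hypothetical path

r, _ = pearsonr(metric_scores, human_scores)
print(f"absolute Pearson correlation: {100 * abs(r):.1f}")
```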

@jbgruenwald (Author)

I didn't change the model/layers/epsilon, just the default settings from the command_line.sh in your latest commit. The only things I changed were adding the lines for the Pearson correlation and putting the WMT15 newstest de-en data into the samples folder.

@PierreColombo (Owner)

Can you run it considering the last 3/5 layers and check the different divergences/metrics?
As mentioned in the mail, this code was developed while I was at IBM, which is why I cannot publish the full project.

You should be able to reproduce the exact same results (a co-author did), but I cannot tell you the exact parameters right now. I will try to get access to the file.
Did you try to reproduce the results for BLEU / BERTScore / METEOR?
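A hedged sketch of the suggested sweep, following the BaryScoreMetric usage shown in this repo's README; the constructor argument names (model_name, last_layers) are assumptions here, and the sentence pair is a placeholder:

```python
from bary_score import BaryScoreMetric  # module name as in this repo

# Placeholder hypothesis/reference pair.
hyps = ["the cat sat on the mat"]
refs = ["a cat was sitting on the mat"]

for last_layers in (3, 5):  # the "last 3/5 layers" suggested above
    metric = BaryScoreMetric(model_name="bert-base-uncased",
                             last_layers=last_layers)  # assumed kwargs
    metric.prepare_idfs(hyps, refs)
    scores = metric.evaluate_batch(hyps, refs)
    # The returned dict holds the Wasserstein variant ('baryscore_W')
    # and Sinkhorn-divergence variants, one per epsilon.
    print(last_layers, scores["baryscore_W"])
```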

@jbgruenwald (Author) commented Feb 4, 2022

OK, I just tried various layers: the last 5 layers give -0.3559000480869218, the last 3 layers give -0.44103185124368643; some other values were similar.

If I try to run DepthScore on the same data, I get a similar correlation. By the way, DepthScore logs some warnings:

```
/home/repos/nlg_eval_via_simi_measures/depth_score.py:200: RuntimeWarning: divide by zero encountered in true_divide
  square_inv_matrix = u / np.sqrt(s)
/home/repos/nlg_eval_via_simi_measures/depth_score.py:202: RuntimeWarning: invalid value encountered in matmul
  return X_transf @ square_inv_matrix
```

and InfoLM even stops with an error:

```
Traceback (most recent call last):
  File "score_cli.py", line 110, in <module>
    main()
  File "score_cli.py", line 98, in main
    preds = metric.evaluate_batch(candidate_batch, golden_batch, idf_hyps=idf_hyps, idf_ref=idf_ref)
TypeError: evaluate_batch() got an unexpected keyword argument 'idf_hyps'
```

Yes, for e.g. BERTScore I get the results that the authors reported in their paper.
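The DepthScore warnings above point at a division by singular values that can be zero in the whitening step; a minimal sketch of one possible guard, assuming s holds the singular values from an SVD (an assumption, not the repo's actual fix):

```python
import numpy as np

def safe_whiten(X_transf, u, s, eps=1e-12):
    # Hypothetical guard for depth_score.py lines 200-202: clamp
    # near-zero singular values before dividing, so u / sqrt(s) stays
    # finite and the matmul below does not propagate NaNs.
    s_safe = np.clip(s, eps, None)
    square_inv_matrix = u / np.sqrt(s_safe)
    return X_transf @ square_inv_matrix
```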

@jbgruenwald (Author)

Or could you please tell me the results for the examples in the samples folder? Maybe that helps in finding the solution.

@PierreColombo (Owner)

There are no results on WMT15 in the BERTScore paper. Can you tell me the correlations you get with BLEU and BERTScore?

@jbgruenwald (Author)

BERTScore published results on WMT18, and there is a list of how BERTScore performs using various models on WMT16 linked on their GitHub repo: https://docs.google.com/spreadsheets/d/1RKOVpselB98Nnh_EOC4A2BYn8_201tmPODpNWu4w7xI/edit#gid=0

BERTScore gives me a correlation of 0.7485 on WMT15; BLEU I haven't tried yet, I'll send that number later.

@PierreColombo (Owner) commented Feb 5, 2022

Since you mention different scores and different datasets, I released some of the raw scores; see the raw score folders.

Thanks for catching the typos in the CLI; they have been corrected.
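One plausible shape of that correction, assuming score_cli.py was forwarding the idf keywords to every metric; the names come from the traceback earlier in this thread, and this is a sketch, not the actual commit:

```python
import inspect

def call_metric(metric, candidate_batch, golden_batch,
                idf_hyps=None, idf_ref=None):
    """Hypothetical guard: forward the idf keywords only to metrics
    whose evaluate_batch() accepts them (InfoLM's does not, per the
    TypeError reported above)."""
    kwargs = {}
    if "idf_hyps" in inspect.signature(metric.evaluate_batch).parameters:
        kwargs = {"idf_hyps": idf_hyps, "idf_ref": idf_ref}
    return metric.evaluate_batch(candidate_batch, golden_batch, **kwargs)
```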

@jbgruenwald (Author)

Thanks a lot for these scores, that looks really helpful. Do you have the script that created these files in your repo? I couldn't configure score_cli.py to reproduce these models.

For example, does roberta-base_lm_wsw_nbarycentersTrue_range(8, 13) mean you used roberta-base in BaryScore? Then I guess range(8, 13) means last_layers=5? Does nbarycentersTrue mean you take the 'baryscore_W' from the score dictionary? And what do lm and wsw mean?
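One hedged reading of the range(8, 13) suffix: if it indexes the 13 hidden states of roberta-base (the embedding output plus 12 transformer layers), it selects the last five, which would match last_layers=5:

```python
# Assumed indexing: hidden state 0 is the embedding output, 1-12 are
# the transformer layers, so range(8, 13) picks the last five states.
layers = list(range(8, 13))
print(layers)  # [8, 9, 10, 11, 12]
```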

@PierreColombo (Owner)

I updated the raw score folder for full reproducibility. You can now reproduce the results.
The issue is solved, so I am closing it.
