Reproducibility Study #26

Open
KyraGolden opened this issue Apr 18, 2022 · 2 comments

Comments

@KyraGolden

Dear Jingjing,
We are trying to reproduce the experiments from your paper "Vocabulary Learning via Optimal Transport for Neural Machine Translation" for a seminar at the University of Zurich.
We are looking for the TED multilingual dataset, which we could not find in the GitHub repo.
We would also be very grateful if you could make these datasets available:

  • TED EN-X data
  • WMT-14 EN-De

Many thanks in advance.

With kind regards,

Kyra & Joëlle

@Jingjing-NLP
Owner

Jingjing-NLP commented Apr 19, 2022

Dear Kyra & Joëlle,

Thanks for your interest! You can refer to "examples/run_ende.sh" to reproduce the results on WMT-14 En-De. Because WMT-14 En-De is large, we do not provide the data itself; instead, we provide a script that downloads and processes it: https://github.com/Jingjing-NLP/VOLT/blob/master/examples/prepare-wmt14en2de.sh. If you need the processed data, I can share a processed version with you.

Best,
Jingjing

@Jingjing-NLP
Owner

The TED En-X data is stored at https://drive.google.com/drive/folders/1CkcpPu7ovPuvLpCbl1cBLnx510mT0Q2W?usp=sharing
