Europarl #25
Comments
Temporarily closing while we finish version 1.
I could pull this, clean it up, and look at how it's organized if we are still interested. The parallel texts in many languages are interesting too. For v1, do we still want to keep all the languages, though?
We are pretty much done with V1 (8 GiB short) so I went ahead and removed
the “deferred” label from all of the V2 datasets. You’re welcome to do
this, but it’s going in V2 not V1.
(Sent in reply to Travis Hoppe's comment of Sun, Sep 20, 2020, 10:59 PM.)
Starting the processing on this. For reference, the data file is 1.5 GB, but it takes over 14 hours to download from the main site.
This is complete. The processing code is at https://github.com/thoppe/The-Pile-EuroParl and a temporary download link is at https://drive.google.com/file/d/15kQ6jAGHsI3ZrA0ibXGuTmzGdib9NA63/view?usp=sharing
Once incorporated, this issue can be closed and moved to the completed section.
Transcripts from EU Parliament meetings from 1996 to 2011. Contains approximately 4.5 GB of text.
Languages: French, Italian, Spanish, Portuguese, Romanian, English, Dutch, German, Danish, Swedish, Bulgarian, Czech, Polish, Slovak, Slovene, Finnish, Hungarian, Estonian, Latvian, Lithuanian, and Greek.
Link: www.statmt.org/europarl/
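For anyone picking up the cleanup step mentioned in the comments, here is a minimal sketch of what stripping markup from a transcript might look like. It assumes the raw files are plain text with markup tags each on their own line (for example `<SPEAKER ...>` or `<P>`); that layout is an assumption and is not confirmed in this thread. The `strip_markup` helper and the sample text are hypothetical and are not taken from the linked processing repository.

```python
import re

# Assumption: markup appears as whole lines of the form "<...>".
TAG_LINE = re.compile(r"^\s*<[^>]*>\s*$")


def strip_markup(raw: str) -> str:
    """Drop tag-only lines and blank lines, keeping the spoken text."""
    kept = []
    for line in raw.splitlines():
        # Skip empty lines and lines that consist solely of one tag.
        if not line.strip() or TAG_LINE.match(line):
            continue
        kept.append(line.strip())
    return "\n".join(kept)


# Hypothetical sample in the assumed layout.
sample = """<CHAPTER ID=1>
Resumption of the session
<SPEAKER ID=1 NAME="President">
I declare resumed the session of the European Parliament.
<P>
"""

cleaned = strip_markup(sample)
print(cleaned)
```

Restricting the output to a subset of languages would be a separate filtering step, applied before or after this one depending on how the files are organized.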