Which benchmark do you use in training elmo? #25

jiangtongli · 2018-03-23T03:12:56Z

Sorry to bother you again. I found two benchmarks at https://github.com/ciprian-chelba/1-billion-word-language-modeling-benchmark: the big one (9.9 GB) and the small one (1.7 GB). Could you tell me which benchmark you used to train ELMo?

matt-peters · 2018-03-23T17:08:27Z

I downloaded the data from http://www.statmt.org/lm-benchmark/

After expanding the tarball, I used all the files in the training-monolingual.tokenized.shuffled directory (approximately 3.9 GB decompressed).
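
To check that a local copy matches the size quoted above, a minimal sketch like the following can list the files in that directory and add up their sizes. This is not part of the original reply: the helper name `list_training_shards` and the `path/to/expanded-tarball` location are hypothetical, and only the directory name `training-monolingual.tokenized.shuffled` comes from the thread.

```python
from pathlib import Path


def list_training_shards(data_dir):
    """Return (sorted file paths, total size in bytes) for one directory.

    Only regular files directly inside data_dir are counted;
    subdirectories are ignored.
    """
    files = sorted(p for p in Path(data_dir).iterdir() if p.is_file())
    total_bytes = sum(p.stat().st_size for p in files)
    return files, total_bytes


if __name__ == "__main__":
    # Hypothetical location: point this at wherever the tarball was expanded.
    root = Path("path/to/expanded-tarball")
    train_dir = root / "training-monolingual.tokenized.shuffled"
    if train_dir.is_dir():
        files, total_bytes = list_training_shards(train_dir)
        print(f"{len(files)} files, {total_bytes / 1e9:.2f} GB")
    else:
        print(f"directory not found: {train_dir}")
```

If the reported total is far from roughly 3.9 GB, the copy is probably incomplete or a different variant of the benchmark.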

matt-peters closed this as completed Mar 23, 2018
