Apart from the database and index data on Hugging Face, is the train_data.json in the repo just an example? Would you mind releasing the full train and test datasets so the results can be reproduced?
We explain how to retrieve the library and how to handle the training data in index-server/README.md. The Pile is used as the training corpus.
We used the first 400,000 texts of pile_00.json and the first 400,000 texts of pile_29.json, for a total of 800,000 texts, as the training corpus. Texts longer than 1025 are truncated.
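For anyone trying to reproduce this, here is a rough sketch of the selection and truncation steps described above. It is not the repo's actual script. It assumes each Pile shard is stored as JSON Lines with a `text` field per record, and that the 1025 limit counts characters; if the limit is in tokens, swap the slicing for the tokenizer used in the repo. The function names `truncate`, `first_texts`, and `build_corpus` are hypothetical.

```python
import itertools
import json
from typing import Iterable, Iterator, List

MAX_LEN = 1025            # limit quoted above; the unit (characters) is an assumption
TEXTS_PER_FILE = 400_000  # first N texts taken from each shard


def truncate(text: str, max_len: int = MAX_LEN) -> str:
    """Cut a text to at most max_len characters; shorter texts are unchanged."""
    return text[:max_len]


def first_texts(path: str, limit: int = TEXTS_PER_FILE) -> Iterator[str]:
    """Yield the first `limit` texts of a JSON Lines file, truncated."""
    with open(path, encoding="utf-8") as handle:
        # Skip blank lines so they do not count toward the limit.
        records = (line for line in handle if line.strip())
        for line in itertools.islice(records, limit):
            # Assumes every record has a "text" field.
            yield truncate(json.loads(line)["text"])


def build_corpus(paths: Iterable[str], limit: int = TEXTS_PER_FILE) -> List[str]:
    """Concatenate the first `limit` truncated texts of each file, in order."""
    corpus: List[str] = []
    for path in paths:
        corpus.extend(first_texts(path, limit))
    return corpus
```

With the two shards in the working directory, `build_corpus(["pile_00.json", "pile_29.json"])` would return up to 800,000 texts. Reading lazily with `itertools.islice` avoids loading a whole shard into memory just to keep its first 400,000 records.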