Add Dataset Descriptions And Instructions #358
Let me add a documentation tutorial instead, because we can already download the datasets with the existing command-line args. I'll continue with prepare_dataset.md, but a quick check before I put more work into it: does that sound good to you, @carmocca?
Yes!
This should be complete now (or at least ready for review). I suggest merging #466 and #447 first, though, because these are the OpenWebText and RedPajama documents referenced at the bottom of this doc. Note that a focus of this document is to highlight the use of
This is the dataset tutorial companion to all the datasets we added recently (Dolly, LIMA, Alpaca Libre). It's updated after the
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Thanks!
--max_seq_length