-
Notifications
You must be signed in to change notification settings - Fork 16
9. Training datasets
Willza edited this page Jun 10, 2023
·
11 revisions
Resource link: https://huggingface.co/datasets/OpenAssistant/oasst1
# Example code
import pandas as pd
from datasets import load_dataset
ds = load_dataset("OpenAssistant/oasst1")
df_train = ds['train'].to_pandas() # len(train)=84437 (95%)
df_validation = ds['validation'].to_pandas() # len(val)=4401 (5%)
Resource link: https://huggingface.co/datasets/wikitext
# Example code
from datasets import load_dataset
ds = load_dataset('wikitext', 'wikitext-2-v1')
df = ds['train'].to_pandas()
Resource link: https://huggingface.co/datasets/cerebras/SlimPajama-627B
Note: this dataset is large
from datasets import load_dataset
ds = load_dataset("cerebras/SlimPajama-627B")