I got an error when loading the dataset with Hugging Face `datasets`, as follows:
datasets.exceptions.DatasetGenerationCastError: An error occurred while generating the dataset
All the data files must have the same columns, but at some point there are 2 new columns ({'split', 'index'})
This happened while the json dataset builder was generating data using
hf://datasets/ZachW/MGTDetect_CoCo/gpt3.5-davinci3/gpt3.5-Mixed-davinci3/gpt3.5_mixed_1000_train.jsonl (at revision aa49f92a8667f5a704ff576c728765c236940c6c)
Please either edit the data files to have matching columns, or separate them into different configurations (see docs at https://hf.co/docs/hub/datasets-manual-configuration#multiple-configurations)
My code:
from datasets import load_dataset
dataset = load_dataset("ZachW/MGTDetect_CoCo")
Hi yongxin, I would suggest using json.loads() directly; you can refer to L13 in preprocess/extract_keywords.py. The two columns you mentioned are used by the crawler to log the source of the human-written text.