This folder contains the proposed dataset. See data_generation_and_evaluation for a description of how the files were obtained.
qg_train_data_raw.jsonl and qg_test_data_raw.jsonl are the raw datasets, with all sub-questions, corresponding answers, and (for the training set only) feedback produced by ChatGPT.
qg_train_dataset.jsonl and qg_test_dataset.jsonl are the form of the dataset used for training the baselines in our work. They contain only input problems, sub-questions, and rewards.
See baselines for examples of how the dataset is used.
test_chat.jsonl contains the result of ChatGPT answering its own sub-questions on the GSM8K test set.
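
For a quick look at any of the files above, a minimal Python sketch follows. It assumes only what the .jsonl extension implies: one JSON value per line. The record schema is not documented in this file, so the sketch prints whatever keys the first record has instead of assuming field names. The helper name read_jsonl is hypothetical and is not part of this repository.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Return a list with one parsed JSON value per non-empty line of `path`."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


if __name__ == "__main__":
    # Assumption: the script is run from inside this folder.
    path = Path("qg_train_dataset.jsonl")
    if path.exists():
        records = read_jsonl(path)
        print(len(records), "records")
        # Inspect the keys of the first record rather than assuming a schema.
        if records and isinstance(records[0], dict):
            print(sorted(records[0].keys()))
```

The same helper should work for the raw files and for test_chat.jsonl, provided they follow the same one-JSON-value-per-line layout.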