This repository contains the training and test datasets utilized by MemoryBench to evaluate different baselines, including generated dialogues and user feedback.
Our source code illustrates how to process raw data into formats compatible with MemoryBench for both training and testing (refer to the _load_data method within each dataset class). In contrast, this repository provides preprocessed datasets that are immediately ready for use. You are also free to override the _load_data method as needed. For example:
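    import json
    from typing import Any, Dict, List

    from datasets import load_from_disk

    # ... inside the dataset class; ROOT_PATH and DATASET_NAME are expected
    # to be defined by the surrounding code
    def _load_data(self, type_="train") -> List[Dict[str, Any]]:
        # Load the preprocessed split ("train" or "test") from disk
        data = load_from_disk(f"{ROOT_PATH}/dataset/{DATASET_NAME}/{type_}")
        data_list = []
        for item in data:
            new_item = {}
            for key, value in item.items():
                # String fields may hold JSON-encoded values; decode them when possible
                try:
                    new_item[key] = json.loads(value) if isinstance(value, str) else value
                except json.JSONDecodeError:
                    # Not valid JSON: keep the raw string
                    new_item[key] = value
            data_list.append(new_item)
        return data_list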
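The per-field decoding step in the loader can be tried without any dataset on disk. The following is a minimal sketch that applies the same rule to a single in-memory record; the helper name decode_record, the sample values, and the raw_id field are hypothetical and only serve as illustration.

```python
import json
from typing import Any, Dict


def decode_record(item: Dict[str, Any]) -> Dict[str, Any]:
    """Decode JSON-encoded string fields, leaving every other value untouched."""
    decoded = {}
    for key, value in item.items():
        if isinstance(value, str):
            try:
                decoded[key] = json.loads(value)
            except json.JSONDecodeError:
                decoded[key] = value  # plain text, not JSON: keep as-is
        else:
            decoded[key] = value
    return decoded


# Hypothetical record; the values (and the raw_id field) are invented.
sample = {
    "test_idx": 0,
    "lang": "en",
    "info": '{"reference": "example"}',
    "raw_id": "123",
}
decoded_sample = decode_record(sample)
```

Note that any string that happens to be valid JSON is converted, so a value such as "123" comes back as an integer. If a field must stay a string, an overridden _load_data could skip decoding for that key.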
Each dataset is split into training and testing sets, with the following core fields:
- test_idx: A unique identifier for each data item.
- input_prompt (or input_chat_messages): The user input, either as a string (input_prompt) or as a list of chat messages (input_chat_messages).
- dataset_name: The name of the dataset.
- lang: The language of the data item.
- info: Additional information for evaluating response quality.
- dialog: The dialogue history, where Qwen3-8B serves as the assistant and Qwen3-32B acts as the User Simulator.
- implicit_feedback: The simulated implicit feedback within the dialogue.
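As a quick sanity check after loading, the field list above can be turned into a small validator. This is only a sketch: the helper name missing_core_fields and the sample record are hypothetical, and the internal structure of dialog, info, and implicit_feedback is left as placeholders because it is not specified here.

```python
from typing import Any, Dict, List

# Core field names taken from the list above.
ALWAYS_EXPECTED = (
    "test_idx",
    "dataset_name",
    "lang",
    "info",
    "dialog",
    "implicit_feedback",
)
# A record carries the user input under one of these two names.
INPUT_FIELDS = ("input_prompt", "input_chat_messages")


def missing_core_fields(record: Dict[str, Any]) -> List[str]:
    """Return the names of core fields that are absent from a record."""
    missing = [name for name in ALWAYS_EXPECTED if name not in record]
    if not any(name in record for name in INPUT_FIELDS):
        missing.append("input_prompt or input_chat_messages")
    return missing


# Hypothetical record with placeholder values.
record = {
    "test_idx": 0,
    "input_prompt": "placeholder question",
    "dataset_name": "placeholder",
    "lang": "en",
    "info": {},
    "dialog": [],
    "implicit_feedback": [],
}
complete = missing_core_fields(record)
incomplete = missing_core_fields({"test_idx": 1, "lang": "en"})
```

For DialSim and Locomo, a check like this may report dialog and implicit_feedback as missing, since those datasets store them under baseline-specific field names.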
Additional fields may be present depending on the dataset, such as references to the corresponding raw data entry or its subclass. These fields are for reference only and are not used in MemoryBench’s training, testing, or evaluation processes.
For the DialSim and Locomo datasets, there is also a corpus split that contains the long context required by these datasets. As these datasets do not have a vanilla baseline, we include dialogue and implicit feedback from other baselines, stored in the dialog_{BASELINE_NAME} and implicit_feedback_{BASELINE_NAME} fields, respectively.
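The baseline-specific fields follow a simple naming pattern, so they can be looked up programmatically. The sketch below assumes a record is a plain dict (as returned by a loader like the one shown earlier); the helper names and the baseline name example_baseline are hypothetical and do not refer to an actual baseline in these datasets.

```python
from typing import Any, Dict, List, Tuple


def baseline_fields(record: Dict[str, Any], baseline_name: str) -> Tuple[Any, Any]:
    """Return (dialog, implicit_feedback) for one baseline, or None where absent."""
    dialog = record.get(f"dialog_{baseline_name}")
    feedback = record.get(f"implicit_feedback_{baseline_name}")
    return dialog, feedback


def available_baselines(record: Dict[str, Any]) -> List[str]:
    """List baseline names for which a dialog_{BASELINE_NAME} field exists."""
    prefix = "dialog_"
    return sorted(key[len(prefix):] for key in record if key.startswith(prefix))


# Hypothetical record; "example_baseline" is an invented name.
record = {
    "test_idx": 0,
    "dialog_example_baseline": ["placeholder turn"],
    "implicit_feedback_example_baseline": ["placeholder feedback"],
}
names = available_baselines(record)
dialog, feedback = baseline_fields(record, "example_baseline")
```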