MemoryBench-Dataset

This repository contains the training and test datasets utilized by MemoryBench to evaluate different baselines, including generated dialogues and user feedback.

Our source code illustrates how to process raw data into formats compatible with MemoryBench for both training and testing (refer to the _load_data method within each dataset class). In contrast, this repository provides preprocessed datasets that are immediately ready for use. You are also free to override the _load_data method as needed. For example:

import json
from datasets import load_from_disk

# ... Data Class

def _load_data(self, type_="train") -> Dict[str, List[Dict[str, Any]]]:
  data = load_from_disk(f"{ROOT_PATH}/dataset/{DATASET_NAME}/{type_}")
  data_list = []
  for item in data:
      new_item = {}
      for key, value in item.items():
          try:
              new_item[key] = json.loads(value) if isinstance(value, str) else value
          except:
              new_item[key] = value
      data_list.append(new_item)
  return data_list

Dataset Structure

Each dataset is split into training and testing sets, with the following core fields:

test_idx: A unique identifier for each data item.
input_prompt (or input_chat_messages): The user input, either as a string (input_prompt) or as a list of chat messages (input_chat_messages).
dataset_name: The name of the dataset.
lang: The language of the data item.
info: Additional information for evaluating response quality.
dialog: The dialogue history, where Qwen3-8B serves as the assistant and Qwen3-32B acts as the User Simulator.
implicit_feedback: The simulated implicit feedback within the dialogue.

Additional fields may be present depending on the dataset, such as references to the corresponding raw data entry or its subclass. These fields are for reference only and are not used in MemoryBench’s training, testing, or evaluation processes.

For the DialSim and Locomo datasets, there is also a corpus split that contains the long context required by these datasets. As these datasets do not have a vanilla baseline, we include dialogue and implicit feedback from other baselines, stored in the dialog_{BASELINE_NAME} and implicit_feedback_{BASELINE_NAME} fields, respectively.

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
dataset		dataset
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

MemoryBench-Dataset

Dataset Structure

About

Uh oh!

Releases

Packages

bebr2/MemoryBench-Dataset

Folders and files

Latest commit

History

Repository files navigation

MemoryBench-Dataset

Dataset Structure

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Packages