Skip to content

bebr2/MemoryBench-Dataset

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 

Repository files navigation

MemoryBench-Dataset

This repository contains the training and test datasets utilized by MemoryBench to evaluate different baselines, including generated dialogues and user feedback.

Our source code illustrates how to process raw data into formats compatible with MemoryBench for both training and testing (refer to the _load_data method within each dataset class). In contrast, this repository provides preprocessed datasets that are immediately ready for use. You are also free to override the _load_data method as needed. For example:

import json
from datasets import load_from_disk

# ... Data Class

def _load_data(self, type_="train") -> Dict[str, List[Dict[str, Any]]]:
  data = load_from_disk(f"{ROOT_PATH}/dataset/{DATASET_NAME}/{type_}")
  data_list = []
  for item in data:
      new_item = {}
      for key, value in item.items():
          try:
              new_item[key] = json.loads(value) if isinstance(value, str) else value
          except:
              new_item[key] = value
      data_list.append(new_item)
  return data_list

Dataset Structure

Each dataset is split into training and testing sets, with the following core fields:

  • test_idx: A unique identifier for each data item.
  • input_prompt (or input_chat_messages): The user input, either as a string (input_prompt) or as a list of chat messages (input_chat_messages).
  • dataset_name: The name of the dataset.
  • lang: The language of the data item.
  • info: Additional information for evaluating response quality.
  • dialog: The dialogue history, where Qwen3-8B serves as the assistant and Qwen3-32B acts as the User Simulator.
  • implicit_feedback: The simulated implicit feedback within the dialogue.

Additional fields may be present depending on the dataset, such as references to the corresponding raw data entry or its subclass. These fields are for reference only and are not used in MemoryBench’s training, testing, or evaluation processes.

For the DialSim and Locomo datasets, there is also a corpus split that contains the long context required by these datasets. As these datasets do not have a vanilla baseline, we include dialogue and implicit feedback from other baselines, stored in the dialog_{BASELINE_NAME} and implicit_feedback_{BASELINE_NAME} fields, respectively.

About

Dataset used for training and testing in MemoryBench.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published