MuDABench

A benchmark for large-scale multi-document analytical question answering

Overview

MuDABench is a benchmark for multi-document analytical question answering over large-scale document collections.

The benchmark focuses on analytical QA over Chinese A-share & US market documents, where each question requires aggregating and reasoning over information from multiple financial documents, rather than extracting an answer from a single source.

MuDABench is designed to evaluate systems that combine:

multi-document retrieval
long-context evidence aggregation
structured financial information understanding
analytical reasoning over heterogeneous documents
concise and detailed answer generation

Dataset

The public release contains:

File / Directory	Description
`data/simple.json`	166 QA samples with concise final answers
`data/complex.json`	166 QA samples with more detailed analytical final answers
`data/pdf/`	589 source PDF files referenced by the QA samples

Each QA sample is paired with document-level structured evidence and reference answers.

Hugging Face Dataset

The dataset is available on Hugging Face:

https://huggingface.co/datasets/Zhanli-Li/MuDABench

Data Format

Each item in data/simple.json or data/complex.json is a multi-document analytical QA sample.

{
  "question": "...",
  "metadata": [
    {
      "id": "uuid-used-as-pdf-filename",
      "symbol": "company ticker",
      "year": 2021,
      "doctype": "document type",
      "schema": {
        "value_xxx": "field meaning"
      },
      "value_xxx": "structured value"
    }
  ],
  "source_answer": "intermediate supporting facts (text)",
  "final_answer": "reference final answer"
}

Field Description

Field	Description
`question`	The analytical question to be answered
`metadata`	Document-level structured evidence used by the question
`metadata[].id`	UUID that matches the corresponding PDF filename stem in `data/pdf/`
`metadata[].symbol`	Company ticker / stock symbol
`metadata[].year`	Document year
`metadata[].doctype`	Type of the referenced document
`metadata[].schema`	Semantics of the structured `value_*` fields
`source_answer`	Intermediate supporting facts
`final_answer`	Reference final answer

Note: Different questions may use different subsets of value_* fields.

File Structure

MuDABench/
├── data/
│   ├── simple.json
│   ├── complex.json
│   └── pdf/
├── fig/
│   └── case.png
├── LICENSE
└── README.md

Intended Use

MuDABench can be used for:

evaluating multi-document analytical QA systems
benchmarking retrieval-augmented generation pipelines
testing long-context reasoning over financial documents
studying Chinese financial document understanding
comparing concise-answer and complex-answer QA performance

Example Use Cases

MuDABench is suitable for evaluating whether a system can:

retrieve relevant documents from a large PDF collection;
identify structured evidence across multiple companies, years, or document types;
aggregate scattered financial facts;
perform analytical comparison or reasoning;
generate faithful final answers grounded in the evidence.

Repository Links

GitHub: https://github.com/Zhanli-Li/MuDABench
Hugging Face: https://huggingface.co/datasets/Zhanli-Li/MuDABench

Citation

If you find MuDABench useful for your research, please consider citing this work.

@misc{mudabench2026,
  title        = {MuDABench: A Benchmark for Large-Scale Multi-Document Analysis},
  author       = {Li, Zhanli and others},
  year         = {2026},
  note         = {ACL 2026},
  howpublished = {\url{https://github.com/Zhanli-Li/MuDABench}}
}

License

MuDABench is released under the Apache License 2.0. See LICENSE for more details.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MuDABench

Overview

Dataset

Hugging Face Dataset

Data Format

Field Description

File Structure

Intended Use

Example Use Cases

Repository Links

Citation

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 1

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
data		data
fig		fig
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

MuDABench

Overview

Dataset

Hugging Face Dataset

Data Format

Field Description

File Structure

Intended Use

Example Use Cases

Repository Links

Citation

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 1

Packages