Break: A Question Understanding Benchmark

Break is a human-annotated dataset of natural language questions and their Question Decomposition Meaning Representations (QDMRs). Break consists of 83,978 examples sampled from 10 question answering datasets over text, images and databases. This repository contains the Break dataset along with information on the exact data format.

For more details, check out our TACL paper, "Break It Down: A Question Understanding Benchmark", and our website.
For the code and models presented in our paper, see our repository at:


  • 2/26/2020 Our paper's entire codebase is now available.
  • 1/31/2020 The entire codebase and official leaderboard will be released soon.
  • 1/31/2020 The full Break dataset has been released!

Question Answering Datasets

Data Description


  • QDMR: Contains questions over text, images and databases annotated with their Question Decomposition Meaning Representation. In addition to the train, dev and (hidden) test sets, we provide lexicon_tokens files. For each question, the lexicon file contains the set of valid tokens that could potentially appear in its decomposition (Section 3).
  • QDMR high-level: Contains questions annotated with the high-level variant of QDMR. These decompositions are exclusive to Reading Comprehension tasks (Section 2). lexicon_tokens files are also provided.
  • logical-forms: Contains questions and QDMRs annotated with full logical forms (QDMR operators and their arguments). Full logical forms were inferred by the annotation-consistency algorithm described in Section 4.3.
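Because the lexicon lists the tokens that may appear in a question's decomposition, one simple consistency check is to flag decomposition tokens that fall outside it. The sketch below is illustrative only: it assumes whitespace tokenization after dropping the ";" step delimiter and treats the lexicon as a plain collection of strings. The tokenization used in the paper and the exact encoding of allowed_tokens in the JSON files may differ, and both the helper name tokens_outside_lexicon and the lexicon in the example are hypothetical.

```python
from typing import Iterable, List


def tokens_outside_lexicon(decomposition: str, allowed_tokens: Iterable[str]) -> List[str]:
    """Return the decomposition tokens that are not in the allowed lexicon.

    Assumptions (not from the Break README): tokens are whitespace-separated
    once the ';' step delimiter is removed, and the lexicon is an iterable
    of plain string tokens.
    """
    allowed = set(allowed_tokens)
    tokens = decomposition.replace(";", " ").split()
    return [tok for tok in tokens if tok not in allowed]


# Hypothetical lexicon for the flights question; real lexicons come from
# the *_lexicon_tokens.json files.
lexicon = {"return", "flights", "from", "washington", "to", "boston",
           "in", "the", "afternoon", "#1", "#2", "#3"}
decomp = ("return flights ;return #1 from washington ;return #2 to boston "
          ";return #3 in the afternoon")
print(tokens_outside_lexicon(decomp, lexicon))  # []
```

An empty list means every token is covered; removing "boston" from the lexicon would make the function return ["boston"].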

Data Format

  • QDMR & QDMR high-level:
    • train.csv, dev.csv, test.csv:
      • question_id: The Break question id, of the format [ORIGINAL DATASET]_[original split]_[original id]. E.g., NLVR2_dev_dev-1049-1-1 is from the NLVR2 dev split, and its NLVR2 id is dev-1049-1-1.
      • question_text: Original question text.
      • decomposition: The annotated QDMR of the question, its steps delimited by ;. E.g., return flights ;return #1 from washington ;return #2 to boston ;return #3 in the afternoon.
      • operators: List of tagged QDMR operators for each step. QDMR operators are fully described in Section 2 of the paper. The 14 potential operators are: select, project, filter, aggregate, group, superlative, comparative, union, intersection, discard, sort, boolean, arithmetic, comparison. Unidentified operators are tagged with None.
      • split: The Break dataset split of the example, train / dev / test.
    • train_lexicon_tokens.json, dev_lexicon_tokens.json, test_lexicon_tokens.json:
      • "source": The source question.
      • "allowed_tokens": The set of valid lexicon tokens that can appear in the QDMR of the question.
  • logical-forms:
    • train.csv, dev.csv, test.csv:
      • question_id: Same as before.
      • question_text: Same as before.
      • decomposition: Same as before.
      • program: List of QDMR operators and arguments that the original QDMR was mapped to. E.g., for the QDMR return citations ;return #1 of Making database systems usable ;return number of #2, the program is [ SELECT['citations'], FILTER['#1', 'of Making database systems usable'], AGGREGATE['count', '#2'] ].
      • operators: Same as before.
      • split: Same as before.
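The question_id, decomposition and operators fields described above can be unpacked with a few small helpers. This is a minimal sketch, not code from the Break repository, and all function names are hypothetical. It assumes that the dataset name and split inside question_id contain no underscores (the original id may contain them), that step references are written as #k, and that the operators column is stored as a Python-style list literal such as "['select', 'filter']".

```python
import ast
import re
from typing import List, NamedTuple, Optional

# The 14 QDMR operators listed in this README; "None" marks unidentified steps.
QDMR_OPERATORS = {
    "select", "project", "filter", "aggregate", "group", "superlative",
    "comparative", "union", "intersection", "discard", "sort", "boolean",
    "arithmetic", "comparison",
}


class QuestionId(NamedTuple):
    dataset: str
    split: str
    original_id: str


def parse_question_id(question_id: str) -> QuestionId:
    # Assumption: only the first two underscores are delimiters, so any
    # underscores inside the original id are preserved.
    dataset, split, original_id = question_id.split("_", 2)
    return QuestionId(dataset, split, original_id)


def split_steps(decomposition: str) -> List[str]:
    # QDMR steps are delimited by ';'.
    return [step.strip() for step in decomposition.split(";")]


def step_references(step: str) -> List[int]:
    # '#k' refers to the result of step k (1-indexed).
    return [int(k) for k in re.findall(r"#(\d+)", step)]


def parse_operators(operators_field: str) -> List[Optional[str]]:
    # Assumption: the column is a Python-style list literal.
    ops = ast.literal_eval(operators_field)
    for op in ops:
        if op not in QDMR_OPERATORS and op not in (None, "None"):
            raise ValueError(f"unknown QDMR operator: {op!r}")
    return ops
```

For example, parse_question_id("NLVR2_dev_dev-1049-1-1") yields dataset NLVR2, split dev and original id dev-1049-1-1, and split_steps on the flights decomposition returns its four steps. Since the operators field holds one tag per step, comparing the lengths of parse_operators(...) and split_steps(...) is a cheap per-row sanity check.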
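The program field can be turned back into (operator, arguments) pairs. The sketch below assumes the program is stored in the textual form shown in the example above (upper-case operator names followed by bracketed, quoted arguments) and that no argument contains a closing bracket; if the CSV encodes programs differently, the pattern would need adjusting. parse_program is a hypothetical helper name.

```python
import ast
import re
from typing import List, Tuple

# One operator call: an upper-case name followed by a bracketed argument list.
# Assumption: arguments never contain ']' themselves.
_CALL = re.compile(r"([A-Z_]+)\[(.*?)\]")


def parse_program(program: str) -> List[Tuple[str, List[str]]]:
    """Split a program string into (operator, arguments) pairs."""
    calls = []
    for name, args in _CALL.findall(program):
        # Re-wrap the comma-separated quoted arguments as a list literal.
        calls.append((name, list(ast.literal_eval(f"[{args}]"))))
    return calls


example = ("[ SELECT['citations'], FILTER['#1', 'of Making database systems usable'], "
           "AGGREGATE['count', '#2'] ]")
print(parse_program(example))
```

On the citations example this returns [("SELECT", ["citations"]), ("FILTER", ["#1", "of Making database systems usable"]), ("AGGREGATE", ["count", "#2"])], one pair per QDMR step.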

Data Statistics

Break question decomposition datasets:

Dataset               Examples  Train           Dev            Test
QDMR                  60,150    44,321 (73.7%)  7,760 (12.9%)  8,069 (13.4%)
QDMR High-level       23,828    17,503 (73.5%)  3,130 (13.1%)  3,195 (13.4%)
logical-forms (QDMR)  59,823    44,098 (73.7%)  7,719 (12.9%)  8,006 (13.4%)
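Each percentage in the table above is that split's share of the dataset's examples. The short computation below uses only counts copied from the table to reproduce the totals and the rounded percentages; split_percentages is an illustrative helper, not part of the Break code.

```python
# Train/dev/test sizes copied from the table above.
SPLITS = {
    "QDMR": (44_321, 7_760, 8_069),
    "QDMR High-level": (17_503, 3_130, 3_195),
    "logical-forms (QDMR)": (44_098, 7_719, 8_006),
}


def split_percentages(train, dev, test):
    """Return the total and each split's share in percent, rounded to 0.1."""
    total = train + dev + test
    return total, tuple(round(100 * n / total, 1) for n in (train, dev, test))


for name, sizes in SPLITS.items():
    total, pcts = split_percentages(*sizes)
    print(name, total, pcts)
# QDMR 60150 (73.7, 12.9, 13.4)
# QDMR High-level 23828 (73.5, 13.1, 13.4)
# logical-forms (QDMR) 59823 (73.7, 12.9, 13.4)
```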

QDMR annotations by original dataset:

Dataset              Examples  Train   Dev    Test
Academic             195       195     0      0
ATIS                 4,906     4,042   457    407
GeoQuery             877       547     50     280
Spider               7,982     6,955   502    525
CLEVR-humans         13,935    9,453   2,215  2,267
NLVR2                13,517    9,915   1,805  1,797
ComQA                5,520     3,546   988    986
ComplexWebQuestions  2,988     1,985   475    528
DROP                 10,230    7,683   1,268  1,279

QDMR High-level annotations by original dataset:

Dataset              Examples  Train   Dev    Test
ComplexWebQuestions  2,991     1,988   475    528
DROP                 10,262    7,705   1,273  1,284
HotpotQA-hard        10,575    7,810   1,382  1,383
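The per-dataset tables add up to the headline figures: 60,150 QDMR examples, 23,828 high-level examples, 83,978 examples overall, drawn from 10 distinct source datasets. The check below uses only the counts listed in the two tables above.

```python
# Example counts per source dataset, copied from the tables above.
QDMR_BY_DATASET = {
    "Academic": 195, "ATIS": 4_906, "GeoQuery": 877, "Spider": 7_982,
    "CLEVR-humans": 13_935, "NLVR2": 13_517, "ComQA": 5_520,
    "ComplexWebQuestions": 2_988, "DROP": 10_230,
}
HIGH_LEVEL_BY_DATASET = {
    "ComplexWebQuestions": 2_991, "DROP": 10_262, "HotpotQA-hard": 10_575,
}

qdmr_total = sum(QDMR_BY_DATASET.values())
high_level_total = sum(HIGH_LEVEL_BY_DATASET.values())
# ComplexWebQuestions and DROP appear in both tables, so take the union.
source_datasets = set(QDMR_BY_DATASET) | set(HIGH_LEVEL_BY_DATASET)

print(qdmr_total, high_level_total, qdmr_total + high_level_total, len(source_datasets))
# 60150 23828 83978 10
```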


  title={Break It Down: A Question Understanding Benchmark},
  author={Wolfson, Tomer and Geva, Mor and Gupta, Ankit and Gardner, Matt and Goldberg, Yoav and Deutch, Daniel and Berant, Jonathan},
  journal={Transactions of the Association for Computational Linguistics},