Skip to content

chaochun/nlu-asdiv-dataset

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 

Repository files navigation

Academia Sinica Diverse MWP Dataset (ASDiv) V1.0

This repository provides ASDiv (a new diverse dataset in terms of both language patterns and problem types) for evaluating and developing MWP Solvers. It contains 2305 english Math Word Problems (MWPs), and is published in this paper "A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers".

Format

This dataset uses XML format for each MWP:

<Problem ID="nluds-0001" Grade="1" Source="http://www.k5learning.com">
	<Body>Seven red apples and two green apples are in the basket.</Body>
	<Question>How many apples are in the basket?</Question>
	<Solution-Type>Addition</Solution-Type>
	<Answer>9 (apples)</Answer>
	<Formula>7+2=9</Formula>
</Problem>

Each Problem cell includes the corresponding 3 attributes and 5 properties (Body, Question, Solution-Type, Answer, and Formula):

  • Problem attributes:

    • ID: a unique 10-bytes character identifier.
    • Grade: a grade level indicating the level of difficulty.
    • Source: an url identifying the reference source where the MWP came from.
  • Problem properties:

    • Body: the problem text includes facts and clues for solving an MWP.
    • Question: the text identify the question to solve.
    • Solution-Type: a tag which indicates a crucial math operation pattern for solving an MWP. Currently, we provide 24 different common solution types. Details are shown in our paper.
    • Answer: the annodated answer (includes associated measurement units if exists).
    • Formula: the annotated formula.

Multiple-Step MWPs

For multiple-step MWPs, the formula can be represented as an expression tree. The non-leaf nodes are operations and the leaf-nodes are operands. We denote the operation in the root node to be the Solution-type of this MWP, and the other operations on the non-leaf nodes are sub-types (in a depth-first travel manner) of the MWP and are denoted in the Subtype attribute of the Solution-type.

<Problem ID="nluds-2262" Grade="3" Source="http://www.commoncoresheets.com">
  <Body>Sam invited 9 friends to a birthday party, but 6 couldn't come. If he wanted to buy enough cupcakes so each person could have exactly 2,</Body>
  <Question>how many should he buy?</Question>
  <Solution-Type Subtype-1="Subtraction">Multiplication</Solution-Type>
  <Answer>6 (cupcakes)</Answer>
  <Formula>(9-6)*2=6</Formula>
</Problem>

Evaluation

We provide two Fold-ID sets that used in our N-folds cross-validation performance evaluation. One is for the arithmetic subset and another is used for the whole set, you can access them in the dataset/nfolds/asdiv-a and dataset/nfolds/asdiv-w sub-directories respectively.

Citation

This dataset is released under the CC BY-NC 4.0 license, and constructed by Natural Language Understanding laboratory, Institute of Information Science, Academia Sinica. If you find this work useful for your research, please cite our paper:

@inproceedings{miao-etal-2020-diverse,
  title={A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers},
  author={Miao, Shen-yun and Liang, Chao-Chun and Su, Keh-Yih},
  booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
  pages={975--984},
  year={2020}
}

Issues

Please contact ccliang@iis.sinica.edu.tw for any suggestions, comments, and issues.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published