This repository has been archived by the owner on May 17, 2018. It is now read-only.

Initial commit of the TrecQA dataset (#1)
Initial commit of the TrecQA dataset from Yao et al. "Answer Extraction as Sequence Tagging with Tree Edit Distance" in NAACL-HLT 2013. Downloaded from http://cs.jhu.edu/~xuchen/packages/jacana-qa-naacl2013-data-results.tar.bz2
lintool committed Apr 2, 2017
1 parent b4fc0e9 commit eddb13b
Showing 7 changed files with 453,691 additions and 0 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@
*~
.DS_Store
30 changes: 30 additions & 0 deletions TrecQA/README.md
@@ -0,0 +1,30 @@
TrecQA
------

The TrecQA dataset is commonly used to evaluate answer selection in question answering. It was first released, and later organized into its current form, in the following papers:

+ Wang et al. [What is the Jeopardy Model? A Quasi-Synchronous Grammar for QA.](http://www.aclweb.org/anthology/D07-1003) *EMNLP-CoNLL 2007*.
+ Heilman and Smith. [Tree Edit Models for Recognizing Textual Entailments, Paraphrases, and Answers to Questions.](http://www.aclweb.org/anthology/N10-1145) *NAACL 2010*.
+ Yao et al. [Answer Extraction as Sequence Tagging with Tree Edit Distance.](http://www.aclweb.org/anthology/N13-1106) *NAACL-HLT 2013*.

Specifically, we use the data prepared by Yao et al., downloaded from `http://cs.jhu.edu/~xuchen/packages/jacana-qa-naacl2013-data-results.tar.bz2`.

The raw data source, `jacana-qa-naacl2013-data-results.tar.bz2` with an MD5 checksum of `11f0275e95691594cd74825e0c341b7a`, is stored in this repository.
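
To confirm that a local copy of the archive matches the checksum quoted above, a check along the following lines may be useful. This is a minimal sketch, not part of the original commit: the helper names `md5_of_file` and `verify_archive` are hypothetical, and only the expected digest comes from this README.

```python
import hashlib

# MD5 digest quoted in this README for jacana-qa-naacl2013-data-results.tar.bz2.
EXPECTED_MD5 = "11f0275e95691594cd74825e0c341b7a"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Hypothetical helper: True if the file's MD5 equals the expected digest."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify_archive("TrecQA/jacana-qa-naacl2013-data-results.tar.bz2")` from the repository root should return `True` for an intact copy. MD5 is adequate here for detecting accidental corruption, but it is not a safeguard against deliberate tampering.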

For convenience, the `data/` directory contains the following splits in a pseudo-XML format:

+ `TRAIN.xml`
+ `TRAIN-ALL.xml`
+ `DEV.xml`
+ `TEST.xml`

Per the README file in `jacana-qa-naacl2013-data-results.tar.bz2`, the sources of the above files are as follows:

```
train-less-than-40.manual-edit.xml: TRAIN in paper
train2393.cleanup.xml.gz: TRAIN-ALL in paper
dev-less-than-40.manual-edit.xml: DEV in paper
test-less-than-40.manual-edit.xml: TEST in paper
```
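
The mapping above could be applied programmatically, roughly as sketched below. This is an illustration under stated assumptions rather than the procedure used for this commit: it assumes the archive members carry exactly the base names listed above (at any directory depth), that only names ending in `.gz` are gzip-compressed, and the function name `extract_splits` is hypothetical.

```python
import gzip
import tarfile
from pathlib import Path

# Source file name inside the archive -> split file name used in data/.
SOURCE_TO_SPLIT = {
    "train-less-than-40.manual-edit.xml": "TRAIN.xml",
    "train2393.cleanup.xml.gz": "TRAIN-ALL.xml",
    "dev-less-than-40.manual-edit.xml": "DEV.xml",
    "test-less-than-40.manual-edit.xml": "TEST.xml",
}


def extract_splits(archive_path, out_dir):
    """Hypothetical helper: copy the four split files out of the tarball.

    Members are matched by base name only, because the directory layout
    inside the archive is an assumption here.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    with tarfile.open(archive_path, "r:bz2") as archive:
        for member in archive:
            if not member.isfile():
                continue
            base = Path(member.name).name
            split_name = SOURCE_TO_SPLIT.get(base)
            if split_name is None:
                continue
            payload = archive.extractfile(member).read()
            if base.endswith(".gz"):
                # Decompress gzip members so the output is plain XML text.
                payload = gzip.decompress(payload)
            target = out_dir / split_name
            target.write_bytes(payload)
            written[split_name] = target
    return written
```

For example, `extract_splits("TrecQA/jacana-qa-naacl2013-data-results.tar.bz2", "TrecQA/data")` would return a dictionary from split name to the path written. Whether the result is byte-identical to the committed `data/` files is not stated in the source, so treat any difference as something to inspect rather than as an error.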

9,218 changes: 9,218 additions & 0 deletions TrecQA/data/DEV.xml

Large diffs are not rendered by default.

12,087 changes: 12,087 additions & 0 deletions TrecQA/data/TEST.xml

Large diffs are not rendered by default.

397,786 changes: 397,786 additions & 0 deletions TrecQA/data/TRAIN-ALL.xml

Large diffs are not rendered by default.

34,568 changes: 34,568 additions & 0 deletions TrecQA/data/TRAIN.xml

Large diffs are not rendered by default.

Binary file added TrecQA/jacana-qa-naacl2013-data-results.tar.bz2
Binary file not shown.
