This repository has been archived by the owner on May 17, 2018. It is now read-only.
Commit
Initial commit of the TrecQA dataset (#1)
Initial commit of the TrecQA dataset from Yao et al. "Answer Extraction as Sequence Tagging with Tree Edit Distance" in NAACL-HLT 2013. Downloaded from http://cs.jhu.edu/~xuchen/packages/jacana-qa-naacl2013-data-results.tar.bz2
Showing 7 changed files with 453,691 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
*~ | ||
.DS_Store |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
TrecQA | ||
------ | ||
|
||
The TrecQA dataset is commonly used for evaluating answer selection in question answering. It was first released, and later reorganized, in the following papers: | ||
|
||
+ Wang et al. [What is the Jeopardy Model? A Quasi-Synchronous Grammar for QA.](http://www.aclweb.org/anthology/D07-1003) *EMNLP-CoNLL 2007*. | ||
+ Heilman and Smith. [Tree Edit Models for Recognizing Textual Entailments, Paraphrases, | ||
and Answers to Questions.](http://www.aclweb.org/anthology/N10-1145) *NAACL 2010*. | ||
+ Yao et al. [Answer Extraction as Sequence Tagging with Tree Edit Distance.](http://www.aclweb.org/anthology/N13-1106) *NAACL-HLT 2013*. | ||
|
||
Specifically, we use the data prepared by Yao et al., downloaded from `http://cs.jhu.edu/~xuchen/packages/jacana-qa-naacl2013-data-results.tar.bz2`. | ||
|
||
The raw data source, `jacana-qa-naacl2013-data-results.tar.bz2` with an MD5 checksum of `11f0275e95691594cd74825e0c341b7a`, is stored in this repository. | ||
|
||
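The checksum above can be used to confirm that a copy of the archive is intact. The following is a minimal sketch, assuming the archive sits in the current working directory; the helper name `md5_of_file` and the `ARCHIVE` path are illustrative and not part of this repository.

```python
import hashlib
from pathlib import Path

# Checksum quoted in this README.
EXPECTED_MD5 = "11f0275e95691594cd74825e0c341b7a"
# Assumed location of the archive; adjust to wherever your copy lives.
ARCHIVE = Path("jacana-qa-naacl2013-data-results.tar.bz2")


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    if ARCHIVE.exists():
        actual = md5_of_file(ARCHIVE)
        print("OK" if actual == EXPECTED_MD5 else f"MISMATCH: {actual}")
    else:
        print(f"{ARCHIVE} not found")
```

MD5 is used here only because it is the checksum this README publishes; it detects accidental corruption, not deliberate tampering.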
For convenience, the `data/` directory contains the following splits in a pseudo-XML format: | ||
|
||
+ `TRAIN.xml` | ||
+ `TRAIN-ALL.xml` | ||
+ `DEV.xml` | ||
+ `TEST.xml` | ||
|
||
Per the README file in `jacana-qa-naacl2013-data-results.tar.bz2`, the sources of the above files are as follows: | ||
|
||
``` | ||
train-less-than-40.manual-edit.xml: TRAIN in paper | ||
train2393.cleanup.xml.gz: TRAIN-ALL in paper | ||
dev-less-than-40.manual-edit.xml: DEV in paper | ||
test-less-than-40.manual-edit.xml: TEST in paper | ||
``` | ||
|
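The mapping above could be applied with a short script along the following lines. This is only a sketch under assumptions: it supposes the four source files sit directly inside one extracted directory (the archive layout is not described here) and that the files in `data/` are unmodified copies, with `train2393.cleanup.xml.gz` decompressed. The function name `build_splits` and its arguments are hypothetical.

```python
import gzip
import shutil
from pathlib import Path

# Source file name (from the archive's README) -> split name in data/.
SOURCES = {
    "train-less-than-40.manual-edit.xml": "TRAIN.xml",
    "train2393.cleanup.xml.gz": "TRAIN-ALL.xml",
    "dev-less-than-40.manual-edit.xml": "DEV.xml",
    "test-less-than-40.manual-edit.xml": "TEST.xml",
}


def build_splits(extracted_dir, data_dir):
    """Copy each source file to its split name, decompressing .gz files."""
    extracted_dir, data_dir = Path(extracted_dir), Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for source_name, split_name in SOURCES.items():
        source = extracted_dir / source_name
        target = data_dir / split_name
        if source.suffix == ".gz":
            # Assumption: the gzipped file holds the plain pseudo-XML text.
            with gzip.open(source, "rb") as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
        else:
            shutil.copyfile(source, target)
        written.append(target)
    return written
```

For example, `build_splits("extracted", "data")` would write the four split files into `data/`, given an `extracted` directory laid out as assumed above.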
Binary file not shown.