Initial commit of the TrecQA dataset (#1)

Initial commit of the TrecQA dataset from Yao et al. "Answer Extraction as Sequence Tagging with Tree Edit Distance" in NAACL-HLT 2013. Downloaded from http://cs.jhu.edu/~xuchen/packages/jacana-qa-naacl2013-data-results.tar.bz2
castorini · Apr 2, 2017 · eddb13b
1 parent b4fc0e9

Showing 7 changed files with 453,691 additions and 0 deletions.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,2 @@
+*~
+.DS_Store
diff --git a/TrecQA/README.md b/TrecQA/README.md
@@ -0,0 +1,30 @@
+TrecQA
+------
+
+The TrecQA dataset is commonly used for evaluating answer selection in question answering. It was first released, and subsequently reorganized, in the following papers:
+
++ Wang et al. [What is the Jeopardy Model? A Quasi-Synchronous Grammar for QA.](http://www.aclweb.org/anthology/D07-1003) *EMNLP-CoNLL 2007*.
++ Heilman and Smith. [Tree Edit Models for Recognizing Textual Entailments, Paraphrases,
+and Answers to Questions.](http://www.aclweb.org/anthology/N10-1145) *NAACL 2010*.
++ Yao et al. [Answer Extraction as Sequence Tagging with Tree Edit Distance.](http://www.aclweb.org/anthology/N13-1106) *NAACL-HLT 2013*.
+
+Specifically, we use the data prepared by Yao et al., downloaded from `http://cs.jhu.edu/~xuchen/packages/jacana-qa-naacl2013-data-results.tar.bz2`.
+
+The raw data source, `jacana-qa-naacl2013-data-results.tar.bz2` (MD5 checksum `11f0275e95691594cd74825e0c341b7a`), is stored in this repository.
+
+For convenience, the `data/` directory contains the following splits in a pseudo-XML format:
+
++ `TRAIN.xml`
++ `TRAIN-ALL.xml`
++ `DEV.xml`
++ `TEST.xml`
+
+Per the README file in `jacana-qa-naacl2013-data-results.tar.bz2`, the sources of the above files are as follows:
+
+```
+train-less-than-40.manual-edit.xml: TRAIN in paper
+train2393.cleanup.xml.gz:           TRAIN-ALL in paper
+dev-less-than-40.manual-edit.xml:   DEV in paper
+test-less-than-40.manual-edit.xml:  TEST in paper
+```
+
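The filename mapping listed in the README can be applied mechanically. Below is a minimal Python sketch (not part of this commit) that pulls the four source files out of the tarball and writes them under the `data/` split names. Because the directory layout inside the tarball is not documented here, it assumes members can be matched by base name alone; the function name `extract_splits` is hypothetical, and the sketch makes no claim to reproduce the committed `data/` files byte-for-byte.

```python
import gzip
import os
import tarfile

# Original file name in the tarball -> split name used in data/,
# taken from the mapping in the tarball's README.
SPLITS = {
    "train-less-than-40.manual-edit.xml": "TRAIN.xml",
    "train2393.cleanup.xml.gz": "TRAIN-ALL.xml",
    "dev-less-than-40.manual-edit.xml": "DEV.xml",
    "test-less-than-40.manual-edit.xml": "TEST.xml",
}


def extract_splits(tarball_path, out_dir):
    """Hypothetical helper: extract the four TrecQA splits into out_dir.

    Assumption: members are matched by base name only, since the layout
    inside the tarball is not documented. If two members share a base
    name, the later one overwrites the earlier output file.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with tarfile.open(tarball_path, "r:bz2") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            name = os.path.basename(member.name)
            if name not in SPLITS:
                continue
            data = tar.extractfile(member).read()
            if name.endswith(".gz"):
                # TRAIN-ALL is stored gzip-compressed inside the tarball.
                data = gzip.decompress(data)
            with open(os.path.join(out_dir, SPLITS[name]), "wb") as f:
                f.write(data)
            written.append(SPLITS[name])
    return sorted(written)
```

For example, `extract_splits("jacana-qa-naacl2013-data-results.tar.bz2", "data")` would return the list of split files it wrote.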
diff --git a/TrecQA/data/DEV.xml b/TrecQA/data/DEV.xml
diff --git a/TrecQA/data/TEST.xml b/TrecQA/data/TEST.xml
diff --git a/TrecQA/data/TRAIN-ALL.xml b/TrecQA/data/TRAIN-ALL.xml
diff --git a/TrecQA/data/TRAIN.xml b/TrecQA/data/TRAIN.xml
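The README describes the `data/` splits as "pseudo-XML", which suggests a line-oriented reader rather than a strict XML parser. The Python sketch below rests on an ASSUMED layout that is not shown in this commit and should be checked against the actual files: each example is wrapped in `<QApairs id='...'>` ... `</QApairs>`, the `<question>`, `<positive>` and `<negative>` tags each sit on their own line, and the first line after an opening tag holds tab-separated tokens. The names `QAPair` and `parse_trecqa` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QAPair:
    qid: str
    question: List[str]
    positives: List[List[str]] = field(default_factory=list)
    negatives: List[List[str]] = field(default_factory=list)


def parse_trecqa(lines):
    """Parse TrecQA pseudo-XML lines into QAPair records (assumed layout).

    Only the first line after each opening <question>/<positive>/<negative>
    tag is kept (as tab-separated tokens); any further annotation lines
    before the closing tag are ignored.
    """
    pairs = []
    current = None
    section = None       # "question", "positive" or "negative"
    want_tokens = False  # True right after an opening section tag
    for raw in lines:
        line = raw.rstrip("\r\n")
        stripped = line.strip()
        if stripped.startswith("<QApairs"):
            # "<QApairs id='1.1'>" -> "1.1"
            qid = stripped.split("=", 1)[1].strip("'\">") if "=" in stripped else ""
            current = QAPair(qid=qid, question=[])
        elif stripped == "</QApairs>":
            pairs.append(current)
            current = None
        elif stripped in ("<question>", "<positive>", "<negative>"):
            section = stripped[1:-1]
            want_tokens = True
        elif stripped in ("</question>", "</positive>", "</negative>"):
            section = None
            want_tokens = False
        elif want_tokens and current is not None:
            tokens = line.split("\t")
            if section == "question":
                current.question = tokens
            elif section == "positive":
                current.positives.append(tokens)
            else:
                current.negatives.append(tokens)
            want_tokens = False
    return pairs
```

Usage, under the same assumption: `with open("data/DEV.xml") as f: pairs = parse_trecqa(f)`. Each record then groups one question with its positive and negative candidate sentences, which is the shape answer-selection evaluation typically needs.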
diff --git a/TrecQA/jacana-qa-naacl2013-data-results.tar.bz2 b/TrecQA/jacana-qa-naacl2013-data-results.tar.bz2
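The README records an MD5 checksum for the committed tarball, so a downloaded or checked-out copy can be verified before use. A minimal Python sketch using the standard `hashlib` module follows; the helper names `md5_of_file` and `verify_tarball` are hypothetical, and the expected value is the one stated in `TrecQA/README.md`.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns b"" at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Checksum stated in TrecQA/README.md for the raw tarball.
EXPECTED_MD5 = "11f0275e95691594cd74825e0c341b7a"


def verify_tarball(path, expected=EXPECTED_MD5):
    """True if the file at `path` matches the expected MD5 checksum."""
    return md5_of_file(path) == expected.lower()
```

Reading in chunks keeps memory use flat regardless of archive size. On the command line, `md5sum` (GNU coreutils) or `md5` (macOS) gives the same digest.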