NPoe/lowresourcecqa

lowresourcecqa

This is a dataset associated with the following publication:

@inproceedings{poerner-schutze-2019-multi,
    title = "Multi-View Domain Adapted Sentence Embeddings for Low-Resource Unsupervised Duplicate Question Detection",
    author = {Poerner, Nina  and Schütze, Hinrich},
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D19-1173",
    doi = "10.18653/v1/D19-1173",
    pages = "1630--1641"
}

Data

data/lowresource.zip

Contains training and test data from 12 low-resource Stack Exchange forums (December 2018 data dump). The zipfile contains 36 files (3 per forum):

$FORUM.stackexchange.train.tsv

Unlabeled questions for representation training. Unlabeled questions are all questions that are not flagged as duplicates; they may or may not be the originals of duplicates. The TSV is structured as follows:

  • qid: question ID from Stack Exchange
  • title: preprocessed question title
  • body: preprocessed question body
  • date: time stamp from Stack Exchange

We concatenate question titles and bodies when training or calculating question representations. The time stamp is for reference only; we did not use it in our experiments.
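As a rough illustration of the concatenation step, the sketch below reads one of the TSV files and joins each title with its body. Several details are assumptions rather than facts from this repository: that the columns appear in the order listed above, that fields contain no quoted tabs, that title and body are joined with a single space, and that there is no header row (pass `has_header=True` if there is one). The helper name `read_questions` is hypothetical.

```python
import csv
import io


def read_questions(handle, has_header=False):
    """Map qid -> "title body" for one train/test TSV.

    Assumes the column order qid, title, body, date described above.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    if has_header:
        next(reader, None)  # skip a header row, if the file has one
    questions = {}
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or malformed lines
        qid, title, body = row[0], row[1], row[2]
        # The date column (row[3]) is ignored, matching the note above.
        questions[qid] = f"{title} {body}".strip()
    return questions


# Tiny in-memory example with made-up content.
sample = io.StringIO(
    "101\thow to brew tea\twhat temperature should the water be\t2018-01-01\n"
)
print(read_questions(sample))
```

For a real file, open it with `open(path, encoding="utf-8", newline="")` and pass the handle in place of `sample`.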

$FORUM.stackexchange.test.tsv

All questions labeled as duplicates. Structured like $FORUM.stackexchange.train.tsv (same four columns).

$FORUM.stackexchange.gt.tsv

Ground truth question pairs. Structured as follows:

  • qid: question ID of duplicate. Always from $FORUM.stackexchange.test.tsv
  • rqid: question ID of original. May be from $FORUM.stackexchange.test.tsv or $FORUM.stackexchange.train.tsv
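Because an original can live in either file, it can help to check where each `rqid` comes from before building a candidate pool. The following is a minimal sketch, not the authors' evaluation code. It assumes a two-column TSV in the order qid, rqid with no header row, and it stores originals in a set in case a duplicate is linked to more than one original (whether that occurs in the data is not stated here). The helper names `read_ground_truth` and `source_of` are hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_ground_truth(handle, has_header=False):
    """Map each duplicate qid to the set of its original rqids."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    if has_header:
        next(reader, None)  # skip a header row, if the file has one
    originals = defaultdict(set)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        originals[row[0]].add(row[1])
    return dict(originals)


def source_of(rqid, train_ids, test_ids):
    """Report which file an original question ID belongs to."""
    if rqid in test_ids:
        return "test"
    if rqid in train_ids:
        return "train"
    return "missing"


# Made-up IDs: 7 and 8 are duplicates (test), 3 and 9 are unlabeled (train).
gt = read_ground_truth(io.StringIO("7\t3\n7\t9\n8\t7\n"))
train_ids, test_ids = {"3", "9"}, {"7", "8"}
for qid, rqids in sorted(gt.items()):
    for rqid in sorted(rqids):
        print(qid, rqid, source_of(rqid, train_ids, test_ids))
```

In practice, `train_ids` and `test_ids` would be the qid sets read from the corresponding train and test files of the same forum.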
