
CAQA

PyTorch implementation of the EMNLP 2021 paper "Contrastive Domain Adaptation for Question Answering using Limited Text Corpora"

We propose a novel framework for domain adaptation called contrastive domain adaptation for QA (CAQA). CAQA combines techniques from question generation and domain-invariant learning to answer out-of-domain questions in settings with limited text corpora. Specifically, we train a QA system on both source data and data generated for the target domain, with a contrastive adaptation loss incorporated into the training objective. Our model achieves considerable improvements over state-of-the-art baselines.
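The contrastive adaptation loss itself is not reproduced in this README. As a rough illustration only, the sketch below shows one common way such a domain-discrepancy term can be written in PyTorch: an MMD-style loss over pooled source and target feature batches. Function and variable names here are ours, not from the repository, and the actual CAQA loss may differ in detail.

```python
import torch

def contrastive_adaptation_loss(src_feats, tgt_feats, sigma=1.0):
    """MMD-style discrepancy between source and target feature batches.

    src_feats, tgt_feats: (batch, hidden) tensors of pooled representations.
    A small value means the two domains are hard to tell apart.
    """
    def gaussian_kernel(x, y):
        # pairwise squared Euclidean distances between rows of x and y
        d2 = torch.cdist(x, y) ** 2
        return torch.exp(-d2 / (2 * sigma ** 2))

    k_ss = gaussian_kernel(src_feats, src_feats).mean()
    k_tt = gaussian_kernel(tgt_feats, tgt_feats).mean()
    k_st = gaussian_kernel(src_feats, tgt_feats).mean()
    # zero when the two batches come from the same distribution
    return k_ss + k_tt - 2 * k_st

# The adaptation term would be added to the usual QA loss, weighted by beta:
# loss = qa_loss + beta * contrastive_adaptation_loss(src, tgt, sigma)
```

The `beta` and `sigma` hyperparameters in this sketch mirror the flags of the same names in the training command below.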

Citing

Please cite the following paper if you use our methods in your research:

```
@inproceedings{yue2021contrastive,
  title={Contrastive Domain Adaptation for Question Answering using Limited Text Corpora},
  author={Yue, Zhenrui and Kratzwald, Bernhard and Feuerriegel, Stefan},
  booktitle={Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year={2021}
}
```

Data & Requirements

Download the MRQA datasets with the following command:

```bash
bash download_data.sh
```

To run our code you need PyTorch and Transformers; see requirements.txt for our running environment.

Train QAGen-T5 Generation Model

Our QAGen-T5 model consists of a question generation model and an answer generation model. We train both models with question_generation, and we provide all generated synthetic data here.
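For orientation, question_generation prepares its T5 inputs by marking the answer span in the context with highlight tokens. A minimal sketch of that preprocessing step follows; the exact sentinel tokens and task prefix are assumptions based on the library's highlight-format variant, not copied from its source.

```python
def prepare_qg_input(context: str, answer: str) -> str:
    """Wrap the answer span in <hl> sentinel tokens and prepend the
    question-generation task prefix (sketch; exact format may differ)."""
    start = context.index(answer)          # assumes the answer occurs in context
    end = start + len(answer)
    highlighted = context[:start] + "<hl> " + answer + " <hl>" + context[end:]
    return "generate question: " + highlighted

print(prepare_qg_input("Paris is the capital of France.", "Paris"))
# -> generate question: <hl> Paris <hl> is the capital of France.
```

The highlighted string is then tokenized and fed to the T5 encoder, which is trained to decode the corresponding question.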

Train BERT-QA with Contrastive Adaptation Loss

With the generated synthetic data, the BERT-QA model can be trained on both SQuAD and the synthetic data. Run the following command to train a model with the proposed contrastive adaptation loss:

```bash
CUDA_VISIBLE_DEVICES=0 python src/bert_squad_trans.py \
    --do_lower_case \
    --adaptation_method=smooth_can \
    --do_train \
    --train_both \
    --beta=0.001 \
    --sigma=0.001 \
    --mrqa_train_file=./data/synthetic/qagen_t5large_LM_5_10000_HotpotQA.jsonl \
    --do_predict \
    --predict_squad
```
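The synthetic file passed via `--mrqa_train_file` follows the MRQA jsonl format: the first line is a header object, and each subsequent line holds one context together with its question-answer pairs. A minimal loader sketch (the function name is ours, not from the repository):

```python
import json

def load_mrqa_jsonl(path):
    """Read an MRQA-format jsonl file.

    Returns the header record and a list of example records; each example
    carries a "context" string and a "qas" list of question-answer pairs.
    """
    with open(path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    # the first line is metadata, the rest are training examples
    header, examples = records[0], records[1:]
    return header, examples
```

Each example in `examples` can then be flattened into (context, question, answer) triples for the BERT-QA data pipeline.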

Performance

Acknowledgement

During the implementation we based our code mostly on Transformers from Hugging Face, question_generation by Suraj Patil, and MRQA-Shared-Task-2019 by Fisch et al. Many thanks to these authors for their great work!
