TransTQA data and code

This repository contains the data and code for the EMNLP'20 paper:

A Technical Question Answering System with Transfer Learning. Wenhao Yu (ND), Lingfei Wu (IBM), Yu Deng (IBM), Ruchi Mahindru (IBM), Qingkai Zeng (ND), Sinem Guven (IBM), Meng Jiang (ND).

System Overview and Framework

Figure: High-level architecture of our proposed TransTQA system. First, the pre-trained ALBERT model is fine-tuned on an unstructured source technical corpus with a masked language model (MLM) task, i.e., θ → θ+. Second, a siamese ALBERT is initialized from the fine-tuned ALBERT and fine-tuned on source technical QA, i.e., θ+ → θ++. Third, the siamese ALBERT is further fine-tuned on target QA, i.e., θ++ → θ+++. Our deployed system uses θ+++. Given a query, the system first computes a similarity score between the query and each candidate answer, then ranks the candidates by score from highest to lowest. Finally, it returns the top-3 ranked answers.
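The retrieval step described above can be sketched as cosine-similarity ranking over sentence embeddings. This is a minimal illustration only: the deployed system encodes the query and candidate answers with the siamese ALBERT (θ+++), whereas the random vectors below are stand-ins for those embeddings.

```python
import numpy as np

def top_k_answers(query_emb, answer_embs, k=3):
    """Rank candidate answers by cosine similarity to the query;
    return the indices and scores of the top-k candidates."""
    q = query_emb / np.linalg.norm(query_emb)
    a = answer_embs / np.linalg.norm(answer_embs, axis=1, keepdims=True)
    scores = a @ q                      # cosine similarity per candidate
    order = np.argsort(-scores)         # highest score first
    return order[:k], scores[order[:k]]

# Toy embeddings standing in for siamese-ALBERT sentence vectors.
rng = np.random.default_rng(0)
answers = rng.normal(size=(10, 8))
query = answers[4] + 0.01 * rng.normal(size=8)  # nearly identical to answer 4
idx, scores = top_k_answers(query, answers)
```

Here `idx[0]` recovers candidate 4, the answer whose embedding is closest to the query.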

Environment settings

A detailed dependencies list can be found in requirements.txt and can be installed by:

pip install -r requirements.txt

If you want to run with fp16, you need to install NVIDIA Apex.

Run the code

For fine-tuning the masked language model on the technical corpus (before running the model, first download the technical corpus from here and put it into the mlm/corpus folder):

./script/run_mlm.sh
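For reference, the MLM task corrupts a fraction of input tokens and trains the model to recover them. Below is a minimal sketch assuming the standard BERT-style recipe (15% of tokens selected; of those, 80% become [MASK], 10% a random token, 10% unchanged); the repository's actual masking code may differ in details. `VOCAB` is a hypothetical toy vocabulary.

```python
import random

MASK = "[MASK]"
VOCAB = ["cpu", "disk", "kernel", "boot", "login"]  # toy vocabulary for illustration

def mask_tokens(tokens, mlm_prob=0.15, seed=0):
    """BERT-style masking. Returns (masked_tokens, labels), where labels holds
    the original token at each selected position and None elsewhere."""
    rng = random.Random(seed)
    out, labels = [], []
    for tok in tokens:
        if rng.random() < mlm_prob:
            labels.append(tok)                 # model must predict the original
            r = rng.random()
            if r < 0.8:
                out.append(MASK)               # 80%: replace with [MASK]
            elif r < 0.9:
                out.append(rng.choice(VOCAB))  # 10%: replace with a random token
            else:
                out.append(tok)                # 10%: keep the original token
        else:
            labels.append(None)                # position not used in the loss
            out.append(tok)
    return out, labels
```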

For pre-training the model (by default, we use askubuntu for source technical-domain QA pre-training):

./script/run_pretrain.sh
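A common objective for this kind of siamese QA pre-training is softmax cross-entropy with in-batch negatives: each question's paired answer is the positive, and the other answers in the batch serve as negatives. The numpy sketch below illustrates that formulation; it is an assumption for exposition, and the loss actually used in this repository may differ.

```python
import numpy as np

def in_batch_softmax_loss(q_embs, a_embs):
    """Contrastive loss with in-batch negatives.
    Row i of q_embs is paired with row i of a_embs (the positive);
    every other answer in the batch acts as a negative."""
    q = q_embs / np.linalg.norm(q_embs, axis=1, keepdims=True)
    a = a_embs / np.linalg.norm(a_embs, axis=1, keepdims=True)
    sims = q @ a.T                                    # (B, B) similarity matrix
    logits = sims - sims.max(axis=1, keepdims=True)   # stabilize the softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))               # NLL of the true pairs

# Perfectly aligned pairs give a lower loss than uninformative embeddings.
loss_matched = in_batch_softmax_loss(np.eye(4), np.eye(4))
loss_uniform = in_batch_softmax_loss(np.eye(4), np.ones((4, 4)))
```

With uninformative answer embeddings the loss collapses to log(B), the entropy of a uniform guess over the batch.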

For model transfer learning (we provide two target datasets: stackunix and techqa (ACL 2020)):

./script/run_transfer.sh

Note that you should specify the paths to the pre-trained model and the dataset.

Citation

If you find this repository useful in your research, please consider citing our paper:

@inproceedings{yu2020technical,
  title={A Technical Question Answering System with Transfer Learning},
  author={Yu, Wenhao and Wu, Lingfei and Deng, Yu and Mahindru, Ruchi and Zeng, Qingkai and Guven, Sinem and Jiang, Meng},
  booktitle={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year={2020}
}

About

Author: Wenhao Yu (wyu1@nd.edu). EMNLP'20. Transfer Learning for Technical Question Answering.
