Controllable Open-ended Question Generation with A New Question Type Ontology

Code for ACL 2021 paper "Controllable Open-ended Question Generation with A New Question Type Ontology".

Raw data

Question type annotation

Reddit

Our Yahoo dataset is based on the Yahoo Answers L6 dataset. After obtaining a license for the L6 dataset, please email Shuyang (caoshuy@umich.edu) with proof of the license attached to obtain our Yahoo dataset.

Environment

Our experiments are based on PyTorch 1.7.0 and Fairseq at commit 0db28cd with a small modification; the modified copy is included under lib/fairseq. Newer versions of Fairseq may also work. For graph neural networks, we use PyTorch Geometric 1.7.2.

# virtual environment
conda create -n open_ended_qg python=3.7
conda activate open_ended_qg

# install pytorch
conda install pytorch==1.7.0 torchvision==0.8.0 torchaudio==0.7.0 cudatoolkit=10.2 -c pytorch

# install fairseq, note that you need to follow the instructions in fairseq/README.md 
# to install other dependencies (e.g., apex for training)
cd lib/fairseq
pip install -e .
# fix hydra error
pip install hydra-core==1.0.7

# install torch-geometric
pip install torch-scatter==2.0.7 -f https://data.pyg.org/whl/torch-1.7.0+cu102.html
pip install torch-sparse==0.6.9 -f https://data.pyg.org/whl/torch-1.7.0+cu102.html
pip install torch-geometric==1.7.2
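
As an optional sanity check, you can confirm that the installed versions match from within the activated environment; the lines below use the standard import names of the packages installed above:

# optional: verify the environment
python -c "import torch; print(torch.__version__)"                      # expect 1.7.0
python -c "import fairseq; print(fairseq.__version__)"
python -c "import torch_geometric; print(torch_geometric.__version__)"  # expect 1.7.2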

Note: we use AllenNLP during data processing, which requires a different PyTorch version. Please use a different virtual environment for AllenNLP.
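
As a rough sketch only (the environment name is arbitrary, and the exact AllenNLP/PyTorch versions should follow the README in data_preprocess), a separate preprocessing environment could look like:

# separate environment for AllenNLP-based preprocessing
conda create -n open_ended_qg_preprocess python=3.7
conda activate open_ended_qg_preprocess
# install AllenNLP (pin the version required by data_preprocess, if specified)
pip install allennlp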

Data Preprocess

Preprocessed binarized Reddit data can be downloaded from here.

For data preprocessing, please refer to the README in data_preprocess.


Run our models

Please download the generation models from here and put them under $MODEL/generation_models. The binarized dataset should be under $DATA/binarized_data.
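
For reference, a minimal sketch of the expected layout ($MODEL and $DATA are roots of your choice; only the generation_models, binarized_data, and output directories referenced by the scripts matter):

# example setup; adjust the two roots to your own paths
export MODEL=/path/to/models
export DATA=/path/to/data
mkdir -p $MODEL/generation_models $DATA/binarized_data $DATA/output
# put the downloaded generation models under $MODEL/generation_models
# and the binarized dataset under $DATA/binarized_data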

To convert the fairseq generation output to text, use convert_output.py:

python convert_output.py --generate-dir <result_dir>

JointGen

cd gen_scripts
./jointgen.sh $DATA/output/jointgen

ExplGen

cd gen_scripts
./explgen.sh $DATA/output/explgen

TplGen

cd gen_scripts
./tplgen_question_generation.sh $DATA/output/tplgen_question

ExplGen: conditioned on top 9 types

cd gen_scripts
./explgen_9types.sh $DATA/output/explgen_9types

TplGen: conditioned on top 9 types

cd gen_scripts
./tplgen_question_generation_9types.sh $DATA/output/tplgen_question_9types
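
For example, assuming JointGen was run as above, its raw fairseq output under $DATA/output/jointgen can be converted to text with:

python convert_output.py --generate-dir $DATA/output/jointgen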

Train our models

Please set BART_PATH to the path of the bart.large model checkpoint, which can be downloaded here.

export BART_PATH=<path_to_bart_large_dir>/model.pt

JointGen

cd train_scripts
CUDA_VISIBLE_DEVICES=0,1 ./jointgen.sh $BART_PATH $MODEL/jointgen

ExplGen

cd train_scripts
CUDA_VISIBLE_DEVICES=0,1 ./explgen.sh $BART_PATH $MODEL/explgen

TplGen: template generation

cd train_scripts
CUDA_VISIBLE_DEVICES=0,1 ./tplgen_template_generation.sh $BART_PATH $MODEL/tplgen_template_generation

TplGen: question generation

cd train_scripts
CUDA_VISIBLE_DEVICES=0,1 ./tplgen_question_generation.sh $BART_PATH $MODEL/tplgen_question_generation