This is the official code for the EMNLP Findings paper Dior-CVAE: Pre-trained Language Models and Diffusion Priors for Variational Dialog Generation by Tianyu Yang, Thy Thy Tran, and Iryna Gurevych.
Please use the following citation:
@inproceedings{yang-etal-2023-dior-cvae,
title = {Dior-CVAE: Pre-trained Language Models and Diffusion Priors for Variational Dialog Generation},
author = {Yang, Tianyu and Tran, Thy Thy and Gurevych, Iryna},
booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
year = "2023",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/",
}
Abstract: Current variational dialog models have employed pre-trained language models (PLMs) to parameterize the likelihood and posterior distributions. However, the Gaussian assumption made on the prior distribution is incompatible with these distributions, thus restricting the diversity of generated responses. These models also suffer from posterior collapse, i.e., the decoder tends to ignore latent variables and directly access information captured in the encoder through the cross-attention mechanism. In this work, we propose Dior-CVAE, a hierarchical conditional variational autoencoder (CVAE) with an informative prior parameterized by a diffusion model to address these challenges. We employ a diffusion model to increase the complexity of the prior distribution and its compatibility with the distributions produced by a PLM. We also propose memory dropout in the cross-attention mechanism, which actively encourages the use of latent variables for response generation. Overall, experiments across two commonly used open-domain dialog datasets show that our method can generate more diverse responses without large-scale dialog pre-training.
Contact person: Tianyu Yang, yang@ukp.tu-darmstadt.de
https://www.ukp.tu-darmstadt.de/
Don't hesitate to send us an e-mail or report an issue if something is broken (and it shouldn't be) or if you have further questions.
This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.
data/
-- this folder contains all the needed data.
generation_output/
-- this folder contains the generation results.
tensorboard/
-- this folder contains the tensorboard records.
preprocess/
-- this folder contains the code for preprocessing the data.
improved_diffusion/
-- this folder contains the code for the diffusion model.
diffusion/
-- this folder contains the code for the denoising network.
checkpoints/
-- this folder is used to save the checkpoint weights during training.
- Dependent package information is listed in requirements.txt
- Install the environment with:
pip install -r requirements.txt
You need to download and preprocess the datasets with the following steps:
- Download the datasets DailyDialog and PersonaChat:
bash preprocess/get_data.sh
- Preprocess the dataset coarsely:
bash preprocess/process.sh
- Preprocess the dataset into jsonl format:
python preprocess.py
Note that you need to adapt the relevant paths in the script.
You need to prepare the dataset in jsonl format, where each line is a JSON object like:
{"source": "<the dialog context>", "target": "<the response>"}
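For illustration, here is a minimal sketch of writing and reading dialog pairs in this jsonl format. The file name and the example dialogs below are hypothetical, not part of the repository:

```python
import json

# Hypothetical dialog pairs in the format described above: one JSON object
# per line with "source" (dialog context) and "target" (response).
pairs = [
    {"source": "Hi , how are you ? How was your weekend ?",
     "target": "It was great , I went hiking with friends ."},
    {"source": "What do you usually do after work ?",
     "target": "I like to read or watch a movie ."},
]

# Write one JSON object per line (jsonl).
with open("train.jsonl", "w", encoding="utf-8") as f:
    for pair in pairs:
        f.write(json.dumps(pair) + "\n")

# Read it back line by line.
with open("train.jsonl", encoding="utf-8") as f:
    examples = [json.loads(line) for line in f]

print(len(examples))  # 2
```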
- Train the model with:
python main.py --train_file [path to training set] --valid_file [path to valid set] --dataset_type wp --per_gpu_train_batch_size 16 --model_name [config info of this training] --cycle_annealing --diffusion_prior --pretrained_model facebook/bart-base --bart
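The `--cycle_annealing` flag indicates that the KL weight is annealed cyclically during training, a common remedy for posterior collapse in VAEs. As an illustrative sketch only: the function name, cycle count, and ramp ratio below are assumptions, not the repository's actual schedule.

```python
def cyclical_beta(step, total_steps, n_cycles=4, ratio=0.5):
    """Sketch of a cyclical KL-weight schedule (hypothetical, not the
    repo's code). Within each cycle, beta ramps linearly from 0 to 1
    over the first `ratio` fraction of the cycle, then stays at 1."""
    cycle_len = total_steps / n_cycles
    pos = (step % cycle_len) / cycle_len  # position within the current cycle
    return min(pos / ratio, 1.0)

# Example: 100 training steps split into 4 cycles of 25 steps each.
betas = [cyclical_beta(s, 100) for s in range(100)]
```

At each cycle boundary beta resets to 0, forcing the decoder to periodically relearn to use the latent variables instead of ignoring them.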
- For beam search (the number of beams defaults to 10), run:
python main.py --generation --test_file [path to test set] --model_name [config info of training] --load_epoch [the number of epoch to load] --num_beams 10 --diffusion_prior --pretrained_model facebook/bart-base --bart
- For greedy decoding, run:
python main.py --generation --test_file [path to test set] --model_name [config info of training] --load_epoch [the number of epoch to load] --greedy_decoding --diffusion_prior --pretrained_model facebook/bart-base --bart
- For top-k, top-p sampling, run:
python main.py --generation --test_file [path to test set] --model_name [config info of training] --load_epoch [the number of epoch to load] --top_k 50 --top_p 0.9 --diffusion_prior --pretrained_model facebook/bart-base --bart
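As background on the `--top_k 50 --top_p 0.9` options, here is a minimal sketch of how top-k and top-p (nucleus) filtering jointly restrict the candidate tokens before sampling. The function below is hypothetical and purely illustrative, not the repository's implementation:

```python
import math

def top_k_top_p_filter(logits, top_k=50, top_p=0.9):
    """Return a renormalized distribution over the tokens kept by
    combined top-k and top-p (nucleus) filtering. Illustrative sketch."""
    # Softmax over the logits (stabilized by subtracting the max).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]

    # Sort token indices by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Keep at most top_k tokens, stopping once cumulative mass >= top_p.
    kept, cum = [], 0.0
    for i in order[:top_k]:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break

    # Renormalize over the kept tokens; all others get probability 0.
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

# Toy vocabulary of 4 tokens; the last one is filtered out.
dist = top_k_top_p_filter([2.0, 1.0, 0.5, -1.0], top_k=3, top_p=0.9)
```

A token is then drawn from the filtered distribution, trading off diversity (larger candidate set) against fluency (dropping the low-probability tail).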
- Run the evaluation code:
python evaluation.py
Note that you need to adapt the relevant paths in the script.
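Distinct-n (the ratio of unique n-grams to total n-grams across the generated responses) is a standard diversity metric for open-domain dialog. A minimal sketch of it follows; the exact metrics computed by evaluation.py may differ, so treat this as an assumption:

```python
def distinct_n(responses, n):
    """Distinct-n: unique n-grams divided by total n-grams over a list
    of whitespace-tokenized responses. Higher means more diverse."""
    total, unique = 0, set()
    for resp in responses:
        tokens = resp.split()
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(ngrams)
        unique.update(ngrams)
    return len(unique) / total if total else 0.0

# Toy generated responses (hypothetical).
responses = ["i am fine thanks", "i am good", "that sounds great"]
d1 = distinct_n(responses, 1)  # unigram diversity
d2 = distinct_n(responses, 2)  # bigram diversity
```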
Information about the hardware used in the experiments:
Linux version 5.15.0-83-generic (buildd@lcy02-amd64-027) (gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #92-Ubuntu SMP Mon Aug 14 09:30:42 UTC 2023
GPU: Tesla V100-SXM3-32GB
Mem: 1510G
CPU: Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz