This is the official code for the EMNLP Findings paper Dior-CVAE: Pre-trained Language Models and Diffusion Priors for Variational Dialog Generation by Tianyu Yang, Thy Thy Tran, and Iryna Gurevych.
Please use the following citation:
@inproceedings{yang-etal-2023-dior-cvae,
title = {Dior-CVAE: Pre-trained Language Models and Diffusion Priors for Variational Dialog Generation},
author = {Yang, Tianyu and Tran, Thy Thy and Gurevych, Iryna},
booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
year = "2023",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/",
}
Abstract: Current variational dialog models have employed pre-trained language models (PLMs) to parameterize the likelihood and posterior distributions. However, the Gaussian assumption made on the prior distribution is incompatible with these distributions, thus restricting the diversity of generated responses. These models also suffer from posterior collapse, i.e., the decoder tends to ignore latent variables and directly access information captured in the encoder through the cross-attention mechanism. In this work, we propose Dior-CVAE, a hierarchical conditional variational autoencoder (CVAE) with an informative prior parameterized by a diffusion model to address these challenges. We employ a diffusion model to increase the complexity of the prior distribution and its compatibility with the distributions produced by a PLM. We also propose memory dropout in the cross-attention mechanism, which actively encourages the use of latent variables for response generation. Overall, experiments across two commonly used open-domain dialog datasets show that our method can generate more diverse responses without large-scale dialog pre-training.
Contact person: Tianyu Yang, yang@ukp.tu-darmstadt.de
https://www.ukp.tu-darmstadt.de/
Don't hesitate to send us an e-mail or report an issue if something is broken (and it shouldn't be) or if you have further questions.
This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.
data/
-- this folder contains all the needed data.
generation_output/
-- this folder contains the generation results.
tensorboard/
-- this folder contains the tensorboard records.
preprocess/
-- this folder contains the code for preprocessing the data.
improved_diffusion/
-- this folder contains the code for the diffusion model.
diffusion/
-- this folder contains the code for the denoising network.
checkpoints/
-- this folder is used to save the checkpoint weights during training.
- Dependent package information is listed in requirements.txt
- Install the environment with:
pip install -r requirements.txt
You need to download and preprocess the datasets with the following steps:
- Download the datasets DailyDialog and PersonaChat:
bash preprocess/get_data.sh
- Preprocess the dataset coarsely:
bash preprocess/process.sh
- Preprocess the dataset into jsonl format:
python preprocess.py
Note that you need to adapt the relevant paths in the script.
You need to prepare the dataset in jsonl format, where each line is a JSON object like:
{"source": "<the dialog context>", "target": "<the response>"}
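For illustration, here is a minimal sketch of writing and reading dialog pairs in this jsonl format. The file name and the example dialogs below are hypothetical, not part of the repository:

```python
import json

# Hypothetical dialog pairs in the format described above: one JSON object
# per line with "source" (dialog context) and "target" (response).
pairs = [
    {"source": "Hi , how are you ? How was your weekend ?",
     "target": "It was great , I went hiking with friends ."},
    {"source": "What do you usually do after work ?",
     "target": "I like to read or watch a movie ."},
]

# Write one JSON object per line (jsonl).
with open("train.jsonl", "w", encoding="utf-8") as f:
    for pair in pairs:
        f.write(json.dumps(pair) + "\n")

# Read it back line by line.
with open("train.jsonl", encoding="utf-8") as f:
    examples = [json.loads(line) for line in f]

print(len(examples))  # 2
```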
- Train the model with:
python main.py --train_file [path to training set] --valid_file [path to valid set] --dataset_type wp --per_gpu_train_batch_size 16 --model_name [config info of this training] --cycle_annealing --diffusion_prior --pretrained_model facebook/bart-base --bart
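The `--cycle_annealing` flag indicates that the KL weight is annealed cyclically during training, a common remedy for posterior collapse in VAEs. As an illustrative sketch only: the function name, cycle count, and ramp ratio below are assumptions, not the repository's actual schedule.

```python
def cyclical_beta(step, total_steps, n_cycles=4, ratio=0.5):
    """Sketch of a cyclical KL-weight schedule (hypothetical, not the
    repo's code). Within each cycle, beta ramps linearly from 0 to 1
    over the first `ratio` fraction of the cycle, then stays at 1."""
    cycle_len = total_steps / n_cycles
    pos = (step % cycle_len) / cycle_len  # position within the current cycle
    return min(pos / ratio, 1.0)

# Example: 100 training steps split into 4 cycles of 25 steps each.
betas = [cyclical_beta(s, 100) for s in range(100)]
```

At each cycle boundary beta resets to 0, forcing the decoder to periodically relearn to use the latent variables instead of ignoring them.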
- For beam search (the number of beams defaults to 10), run:
python main.py --generation --test_file [path to test set] --model_name [config info of training] --load_epoch [the number of epoch to load] --num_beams 10 --diffusion_prior --pretrained_model facebook/bart-base --bart
- For greedy decoding, run:
python main.py --generation --test_file [path to test set] --model_name [config info of training] --load_epoch [the number of epoch to load] --greedy_decoding --diffusion_prior --pretrained_model facebook/bart-base --bart
- For top-k, top-p sampling, run:
python main.py --generation --test_file [path to test set] --model_name [config info of training] --load_epoch [the number of epoch to load] --top_k 50 --top_p 0.9 --diffusion_prior --pretrained_model facebook/bart-base --bart
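As background on the `--top_k 50 --top_p 0.9` options, here is a minimal sketch of how top-k and top-p (nucleus) filtering jointly restrict the candidate tokens before sampling. The function below is hypothetical and purely illustrative, not the repository's implementation:

```python
import math

def top_k_top_p_filter(logits, top_k=50, top_p=0.9):
    """Return a renormalized distribution over the tokens kept by
    combined top-k and top-p (nucleus) filtering. Illustrative sketch."""
    # Softmax over the logits (stabilized by subtracting the max).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]

    # Sort token indices by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Keep at most top_k tokens, stopping once cumulative mass >= top_p.
    kept, cum = [], 0.0
    for i in order[:top_k]:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break

    # Renormalize over the kept tokens; all others get probability 0.
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

# Toy vocabulary of 4 tokens; the last one is filtered out.
dist = top_k_top_p_filter([2.0, 1.0, 0.5, -1.0], top_k=3, top_p=0.9)
```

A token is then drawn from the filtered distribution, trading off diversity (larger candidate set) against fluency (dropping the low-probability tail).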
- Run the evaluation code:
python evaluation.py
Note that you need to adapt the relevant paths in the script.
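Distinct-n (the ratio of unique n-grams to total n-grams across the generated responses) is a standard diversity metric for open-domain dialog. A minimal sketch of it follows; the exact metrics computed by evaluation.py may differ, so treat this as an assumption:

```python
def distinct_n(responses, n):
    """Distinct-n: unique n-grams divided by total n-grams over a list
    of whitespace-tokenized responses. Higher means more diverse."""
    total, unique = 0, set()
    for resp in responses:
        tokens = resp.split()
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(ngrams)
        unique.update(ngrams)
    return len(unique) / total if total else 0.0

# Toy generated responses (hypothetical).
responses = ["i am fine thanks", "i am good", "that sounds great"]
d1 = distinct_n(responses, 1)  # unigram diversity
d2 = distinct_n(responses, 2)  # bigram diversity
```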
Information about the hardware used in the experiments:
Linux version 5.15.0-83-generic (buildd@lcy02-amd64-027) (gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #92-Ubuntu SMP Mon Aug 14 09:30:42 UTC 2023
GPU: Tesla V100-SXM3-32GB
Mem: 1510G
CPU: Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz