Reinforced Dialog System For Learning

This is the repo for the NAACL 2022 paper "Learning as Conversation: Dialogue Systems Reinforced for Information Acquisition".

Build the environment

We recommend creating a conda environment:

conda env create -f conda_env.yml

Also, following this GitHub repo, you have to apply the patch provided in the patch folder: find the path where the transformers library is installed and replace the original generation_utils.py file in the transformers library with the patch/generation_utils.py file.
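A minimal sketch of how to do this, assuming the conda environment above is active (the first line simply asks Python where transformers is installed):

TRANSFORMERS_DIR=$(python -c "import transformers, os; print(os.path.dirname(transformers.__file__))")
cp patch/generation_utils.py "$TRANSFORMERS_DIR/generation_utils.py"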

You may choose to download the preprocessed datasets, or build them yourself from scratch.

Download preprocessed datasets

Process the datasets yourself from scratch

Please prepare the data in a separate directory (e.g. you may name it Talk_) under the same parent directory:

mkdir -p ../Talk_/data/WoW-raw
cd ../Talk_/data/WoW-raw
wget http://parl.ai/downloads/wizard_of_wikipedia/wizard_of_wikipedia.tgz
tar zxvf wizard_of_wikipedia.tgz 
mv valid_random_split.json dev.json

These commands download and decompress the Wizard of Wikipedia dataset, which will be used to pre-tune our teacher and student bots.

You may continue to create other folders under the Talk_ directory; these will be used to save hyper-parameters, model dumps, and logs:

cd ../Talk_
mkdir -p za/args
mkdir saved_models
mkdir logs
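
The resulting layout under the parent directory should look roughly like this (the folder-to-content mapping is our reading of the steps above):

../Talk_/
├── data/WoW-raw/    # Wizard of Wikipedia dump
├── za/args/         # hyper-parameter files
├── saved_models/    # model dumps
└── logs/            # training logs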

The downloaded Wizard of Wikipedia dataset may be missing some information. Please refer to

scripts/prepare_data/load_wikipedia_into_mysql.py

to build a MySQL database for Wikipedia (please revise the code to fit your MySQL settings).

When building the dataset with the scripts in scripts/prepare_data/prepare_wow_wiz_app, these scripts will use the Wikipedia database to fill in the missing information.
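
As a minimal sketch, you may need to create the target database before running the loader script; the database name here is illustrative, and the name expected by the script may differ:

mysql -u root -p -e "CREATE DATABASE IF NOT EXISTS wikipedia CHARACTER SET utf8mb4;"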

Use the following script to prepare the data for pre-tuning the wizard model and the apprentice model:

python scripts/prepare_data/prepare_wow_wiz_app/prepare_wow_1.1.py

To build the datasets for RL-piloted fine-tuning (Wikipedia, CNN-DailyMail, Paper Abstracts), please refer to the scripts in the following folder:

scripts/prepare_data/prepare_finetune_datasets

To build the coherence evaluation dataset (WoW-coherence), run the following Python script:

python shell/prepare_data/prepare_coh-1.5.py

Our pre-trained and fine-tuned model dumps

You may train your own models following the two-phase procedure below:

Phase 1: Pre-tuning

To pre-tune the wizard model, run the following command:

bash shell/train_wiz.sh

To pre-tune the apprentice model, run the following command:

bash shell/train_app.sh

To train the coherence evaluation model on the WoW-coherence dataset, run the following command:

bash shell/train_coh.sh

Phase 2: RL-piloted fine-tuning

To fine-tune the selector model using RL, run the following command:

bash shell/rl_self_play.sh

You may revise the train_file_rl and validation_file_rl parameters to select the fine-tuning datasets.
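
For example, inside shell/rl_self_play.sh these parameters would point at one of the fine-tuning datasets built earlier; the paths below are purely illustrative and not the repo's actual file names:

--train_file_rl ../Talk_/data/finetune/wiki_train.json \
--validation_file_rl ../Talk_/data/finetune/wiki_dev.json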

The fine-tuning can take up to two days on a single A100 GPU.
