BotsTalk: Machine-Sourced Framework for Automatic Curation of Large-scale Multi-skill Dialogue Datasets


Official PyTorch implementation of our EMNLP paper:
Minju Kim*, Chaehyeong Kim*, Yongho Song*, Seung-won Hwang and Jinyoung Yeo. BotsTalk: Machine-Sourced Framework for Automatic Curation of Large-scale Multi-skill Dialogue Datasets. EMNLP, 2022 [Paper] (* equal contribution)

Reference

If you use the materials in this repository as part of any published research, we ask you to cite the following paper:

@inproceedings{Kim2022botstalk,
  title={BotsTalk: Machine-Sourced Framework for Automatic Curation of Large-scale Multi-skill Dialogue Datasets},
  author={Kim, Minju and Kim, Chaehyeong and Song, Yongho and Hwang, Seung-won and Yeo, Jinyoung},
  booktitle={EMNLP},
  year=2022
}

BSBT dataset

The version of the BSBT dataset used in the paper can be downloaded here.

Running Experiments

Building BSBT dataset

python scripts/self_mix.py \
--subtasks convai2,wizard_of_wikipedia,empatheticdialogues \
--num-self-mixs 5 \
--selfmix-max-turns 6 \
--datatype train \
--expert-model-files zoo:dodecadialogue/convai2_ft/model,zoo:dodecadialogue/wizard_of_wikipedia_ft/model,zoo:dodecadialogue/empathetic_dialogues_ft/model \
--expert-model-opt-files opt_files/conv.opt,opt_files/wow.opt,opt_files/ed.opt \
--display-examples True \
--task convai2 --seed_messages_from_task 1 \
--model-file zoo:dodecadialogue/convai2_ft/model \
--skip-generation False --inference nucleus \
--beam-size 3 \
--beam-min-length 10 --beam-block-ngram 3 --beam-context-block-ngram 3 \
--save-format parlai \
--ranker-model-files zoo:pretrained_transformers/model_poly/model,/your_path/empathetic_dialogues_poly/model.checkpoint,/your_path/wizard_of_wikipedia_poly/model.checkpoint \
--outfile your_path/output/test_files.txt
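The command above passes four comma-separated lists (`--subtasks`, `--expert-model-files`, `--expert-model-opt-files`, `--ranker-model-files`), each with three entries. Assuming these lists are matched by position, so that the n-th expert, opt file, and ranker all belong to the n-th subtask (an assumption based on the example, not something the script documents here), a small pre-flight check can catch a dropped or extra entry before a long run. The helper names `per_skill_lists` and `check_alignment` are hypothetical and not part of this repository.

```python
import shlex

# Options from the command above that appear to take one entry per skill.
# Treating them as position-aligned is an assumption, not documented behavior.
PER_SKILL_OPTIONS = (
    "--subtasks",
    "--expert-model-files",
    "--expert-model-opt-files",
    "--ranker-model-files",
)


def per_skill_lists(command):
    """Return {option: [entries]} for each per-skill option found in the command."""
    # Replace backslash line continuations so shlex sees one logical line.
    tokens = shlex.split(command.replace("\\\n", " "))
    lists = {}
    for option in PER_SKILL_OPTIONS:
        if option in tokens:
            value = tokens[tokens.index(option) + 1]
            lists[option] = value.split(",")
    return lists


def check_alignment(command):
    """Return the shared list length, or raise ValueError if lengths differ."""
    lengths = {opt: len(vals) for opt, vals in per_skill_lists(command).items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"per-skill options differ in length: {lengths}")
    # No per-skill option present counts as zero skills.
    return next(iter(lengths.values()), 0)
```

Passing the full command text to `check_alignment` returns the number of skills when the lists agree and raises `ValueError` otherwise. The check only compares list lengths; it does not verify that the model or opt files exist.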

Have any questions?

Please contact Minju Kim at minnju@yonsei.ac.kr.

License

This repository is MIT licensed. See the LICENSE file for details.
