TOAD

This software project accompanies the research paper TOAD: Task-Oriented Automatic Dialogs with Diverse Response Styles, which was accepted at ACL 2024.

TOAD is a synthetic task-oriented dialog (TOD) dataset that simulates realistic app context interactions and provides multiple system response styles, which vary in verbosity and in whether they mirror the user's expressions.

Run Data Synthesis

Preparation:

Install dependencies from requirements.txt.
We use an OpenAI-compatible API to make requests to LLMs. Set the environment variables OPENAI_API_KEY, BASE_URL (optional), and ENGINE (e.g. "gpt-3.5-turbo") to configure the backend LLM. You can use a dotenv file.
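As a rough illustration of the configuration described above, the sketch below reads the three variables from the environment. Only the variable names come from this README; the helper name load_llm_config, the returned dictionary layout, and the choice to treat ENGINE as required and a missing BASE_URL as None are assumptions made for the example, not the repository's actual code.

```python
import os
from typing import Mapping, Optional


def load_llm_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect backend LLM settings from environment variables.

    Hypothetical helper for illustration; not part of this repository.
    """
    env = os.environ if env is None else env

    # Assumption: OPENAI_API_KEY and ENGINE are required, BASE_URL is optional.
    missing = [name for name in ("OPENAI_API_KEY", "ENGINE") if not env.get(name)]
    if missing:
        raise KeyError("Missing required environment variables: " + ", ".join(missing))

    return {
        "api_key": env["OPENAI_API_KEY"],
        # None is used here to mean "no custom endpoint was configured" (assumption).
        "base_url": env.get("BASE_URL") or None,
        "engine": env["ENGINE"],
    }
```

If the variables live in a dotenv file, a loader such as load_dotenv() from the python-dotenv package can populate os.environ before calling the helper; whether this repository relies on that package is not confirmed here.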

Synthesis: The data synthesis pipeline is divided into 3 steps. The generated files will be stored in data/.

Step 1: Context generation

Run python -m context_generation.occupation_generator to synthesize occupations.json (you can skip this step and re-use the existing file).
Run python -m context_generation.persona_generator to synthesize personas.jsonl using occupations.
Run python -m context_generation.context_generator to synthesize contexts.jsonl using personas.
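The three commands above depend on each other's output, so they need to run in order. A minimal sketch of a wrapper that does this is shown below. The module names are taken from the commands above; the function names build_step1_commands and run_step1 and the skip_occupations flag are hypothetical and exist only for this example.

```python
import subprocess
import sys

# Module names as listed in Step 1, in the order they must run.
STEP1_MODULES = [
    "context_generation.occupation_generator",
    "context_generation.persona_generator",
    "context_generation.context_generator",
]


def build_step1_commands(skip_occupations: bool = False) -> list:
    """Return one `python -m <module>` argument list per Step 1 module."""
    # Skipping the first module re-uses the existing occupations.json.
    modules = STEP1_MODULES[1:] if skip_occupations else STEP1_MODULES
    return [[sys.executable, "-m", module] for module in modules]


def run_step1(skip_occupations: bool = False) -> None:
    """Run the Step 1 modules sequentially, stopping at the first failure."""
    for command in build_step1_commands(skip_occupations):
        subprocess.run(command, check=True)
```

Calling run_step1(skip_occupations=True) from the repository root would run only the persona and context generators, assuming the backend LLM is already configured.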

Step 2: Dialog generation

Run the code in dialog_generation to synthesize dialogs from the generated contexts. Example command:

python -m dialog_generation.main \
    --phenomena='compound' \
    --output_dir='data/dialogs' \
    --number_of_data=1000 \
    --full_options_mode \
    --thread_num=15

--phenomena specifies the phenomena to be used in dialog generation. It can be one of compound, compositional, none.
--output_dir specifies the path to save the generated dialogs.
--number_of_data specifies the number of dialogs to generate.
--full_options_mode requests generation of all 6 response style options.
--thread_num specifies the number of threads to run in parallel.
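To generate dialogs for more than one phenomenon, the example command can be assembled programmatically. The sketch below builds the argument list from the flags described above and rejects unsupported phenomena values early. The flag names and allowed values come from this README; the function name build_dialog_command and its defaults (copied from the example command) are assumptions for illustration.

```python
import sys

# Allowed values for --phenomena, as listed above.
PHENOMENA = ("compound", "compositional", "none")


def build_dialog_command(
    phenomena: str,
    output_dir: str = "data/dialogs",
    number_of_data: int = 1000,
    full_options_mode: bool = True,
    thread_num: int = 15,
) -> list:
    """Build the argument list for `python -m dialog_generation.main`."""
    if phenomena not in PHENOMENA:
        raise ValueError(f"phenomena must be one of {PHENOMENA}, got {phenomena!r}")

    command = [
        sys.executable, "-m", "dialog_generation.main",
        f"--phenomena={phenomena}",
        f"--output_dir={output_dir}",
        f"--number_of_data={number_of_data}",
        f"--thread_num={thread_num}",
    ]
    # --full_options_mode is a switch, so it is added only when enabled.
    if full_options_mode:
        command.append("--full_options_mode")
    return command
```

The returned list can be passed to subprocess.run(command, check=True) from the repository root, once per phenomenon of interest.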

To customize dialog generation by modifying schema.json, refer to the documentation in that directory.

Step 3: Quality control

Run python -m quality_control.main to filter out inconsistent dialogs using the LLM.

Citation

@inproceedings{liu2024toad,
    title = "{TOAD}: Task-Oriented Automatic Dialogs with Diverse Response Styles", 
    author = "Liu, Yinhong  and
      Fang, Yimai  and
      Vandyke, David  and
      Collier, Nigel",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2024",
    year = "2024",
    url = "https://arxiv.org/abs/2402.10137"
}
