LUCID

This software project accompanies the research paper LUCID: LLM-Generated Utterances for Complex and Interesting Dialogues.

LUCID is a highly automated, LLM-driven data generation system for task-oriented dialogues. LUCID aims to produce realistic, diverse and challenging conversations, with highly accurate labels. LUCID takes a modularised approach to data generation, compartmentalising the data generation task into manageable steps that an LLM can consistently perform accurately. For more details, please see our paper.

This repo contains the code for the data generation system (which can be used to generate more data), the data we have already generated for our paper (LUCIDv1.0), and the code for our baseline models.

Documentation

Getting Started

Step 1: Generating intents

To create new intents from a description:

Open lucid_generate_data/run_scripts/create_intents_from_description.py
In this file, update INTENTS, a dictionary containing the domains and the desired intent descriptions within each domain
Once finished, run the file from the root directory (python lucid_generate_data/run_scripts/create_intents_from_description.py)
The new intents will be generated in lucid_generate_data/intents_for_data_generation
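
As an illustration of the kind of dictionary the step above describes, here is a minimal sketch. The domain names and intent descriptions are invented, and the exact structure the script expects (here, each domain mapping to a list of description strings) is an assumption; check the existing INTENTS value in create_intents_from_description.py before editing it. The count_intents helper is hypothetical and only there to sanity-check the dictionary.

```python
# Hypothetical INTENTS dictionary: domain name -> list of intent descriptions.
# The domains, descriptions and list-of-strings layout are assumptions,
# not taken from the LUCID code.
INTENTS = {
    "transport": [
        "book a train ticket",
        "order a taxi",
    ],
    "food": [
        "order a takeaway",
    ],
}


def count_intents(intents):
    """Return the total number of intent descriptions across all domains."""
    return sum(len(descriptions) for descriptions in intents.values())


# Quick sanity check before running the generation script.
for domain, descriptions in INTENTS.items():
    print(f"{domain}: {len(descriptions)} intent description(s)")
print(f"total: {count_intents(INTENTS)}")
```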

Step 2: Generating conversations

Open lucid_generate_data/run_scripts/run_conversations.py

Inside this file, decide how many conversations to generate per intent (CONVS_PER_INTENT) and the maximum number of intents for a conversation (MAX_INTENTS_IN_CONVERSATION)
You also need to specify the conversational phenomena that you would like for the conversation (UNHAPPY_PATHS). Note that for the data generated for the paper, these were randomly sampled for each conversation (with either 0, 1 or 2 unhappy paths per conversation).
Your saved conversations will be stored in lucid_generate_data/saved_conversations
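
The settings above could be sketched as follows. The three setting names come from this README; the values, the phenomenon labels in AVAILABLE_UNHAPPY_PATHS, and the sample_unhappy_paths helper are hypothetical. Choosing the number of unhappy paths uniformly from 0, 1 and 2 is also an assumption, since the README only states that they were randomly sampled per conversation.

```python
import random

# Setting names are from the README; the values are illustrative only.
CONVS_PER_INTENT = 2
MAX_INTENTS_IN_CONVERSATION = 3

# Hypothetical labels; the phenomenon names used in the repo may differ.
AVAILABLE_UNHAPPY_PATHS = ["phenomenon_a", "phenomenon_b", "phenomenon_c"]


def sample_unhappy_paths(available, rng):
    """Pick 0, 1 or 2 distinct unhappy paths for one conversation.

    Assumption: the count is drawn uniformly from {0, 1, 2}.
    """
    k = rng.choice([0, 1, 2])
    return rng.sample(available, k)


rng = random.Random(0)  # fixed seed so repeated runs give the same draws
for conversation_index in range(CONVS_PER_INTENT):
    UNHAPPY_PATHS = sample_unhappy_paths(AVAILABLE_UNHAPPY_PATHS, rng)
    print(conversation_index, UNHAPPY_PATHS)
```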

Step 3: Data formatting and post-processing

To assemble your generated conversations into your final dataset, run lucid_generate_data/compile_data.py
Your final dataset will be called LUCID_data.json
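
Once the dataset has been compiled, a quick check that the file loads can be useful. The sketch below assumes only that LUCID_data.json is valid JSON; it makes no assumption about the schema, and the location of the file (here, the current working directory) is a guess. The summarise_dataset helper is hypothetical and not part of the LUCID code.

```python
import json
from pathlib import Path


def summarise_dataset(path):
    """Load a JSON file and return (top-level type name, number of top-level entries)."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    size = len(data) if isinstance(data, (list, dict)) else 1
    return type(data).__name__, size


# Assumption: the compiled file sits in the current working directory.
dataset_path = Path("LUCID_data.json")
if dataset_path.exists():
    kind, size = summarise_dataset(dataset_path)
    print(f"Loaded a {kind} with {size} top-level entries")
else:
    print(f"{dataset_path} not found; run compile_data.py first")
```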

Step 4: Running our baseline model

To run the LUCID baselines, please use: python running_baseline/run_llm.py
