
A Corpus of Controlled Opinionated and Knowledgeable Movie Discussions for Training Neural Conversation Models

clp-research/komodis-dataset

KOMODIS dataset

This repository accompanies the paper "A Corpus of Controlled Opinionated and Knowledgeable Movie Discussions for Training Neural Conversation Models", available at http://arxiv.org/abs/2003.13342.

We introduce KOMODIS (Knowledgeable and Opinionated MOvie DIScussions), an augmented dialogue dataset crowd-sourced via Amazon Mechanical Turk. Each dialogue is based on two feature structures about the same movie, one for each crowd-worker.
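To make the pairing of feature structures concrete, here is a minimal sketch of how one dialogue setup could be represented in Python. The class and field names (FeatureStructure, DialogueSetup, facts, opinions) are hypothetical and do not reflect the dataset's actual JSON schema; the only property taken from the description above is that each dialogue has two feature structures, one per crowd-worker, about the same movie.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical, simplified feature structure for one crowd-worker.
# Field names are illustrative assumptions, not the KOMODIS schema.
@dataclass
class FeatureStructure:
    movie: str
    facts: list[str] = field(default_factory=list)
    opinions: dict[str, str] = field(default_factory=dict)  # entity -> attitude


@dataclass
class DialogueSetup:
    worker_a: FeatureStructure
    worker_b: FeatureStructure

    def __post_init__(self) -> None:
        # Both feature structures must describe the same movie.
        if self.worker_a.movie != self.worker_b.movie:
            raise ValueError("both feature structures must describe the same movie")

    def shared_facts(self) -> list[str]:
        # Facts given to both workers, in worker A's order.
        known_to_b = set(self.worker_b.facts)
        return [fact for fact in self.worker_a.facts if fact in known_to_b]
```

Keeping facts and opinions in separate fields mirrors the "knowledgeable" and "opinionated" aspects named in the dataset title, but how the released data actually encodes them should be checked against data/dataset.json.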

Dialogue examples

For detailed information, please check the paper. Below are two dialogue examples.

Data

We provide the full post-processed dialogue dataset in data/dataset.json.
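Until data/example.json is available, a safe first step is to load the file as plain JSON and inspect its top-level structure without assuming any particular schema. The following is a minimal sketch using only the standard library; the helper names load_komodis and describe are hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def load_komodis(path: str | Path = "data/dataset.json") -> Any:
    """Load the dataset file as plain JSON (no schema assumed)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def describe(data: Any) -> str:
    """Summarize the top-level JSON structure."""
    if isinstance(data, dict):
        keys = sorted(data)
        return f"dict with {len(keys)} keys: {keys[:10]}"
    if isinstance(data, list):
        return f"list with {len(data)} items"
    return type(data).__name__
```

Run from the repository root, `print(describe(load_komodis()))` reports whether the file is a dict or a list and, for a dict, its first few keys.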

For an explanation of how to read and use the structured data, please see data/example.json (to be uploaded soon).

Model

In model/, we provide a PyTorch baseline script for training a GPT-2 model on our dataset.

To train a model, run the train.py script:

python train.py --dataset komodis

Additional command-line arguments are documented in the script itself. Before training, download the pretrained GPT-2 weights from https://github.com/huggingface/transformers and store them in data/pretrained_models/gpt2/ and data/pretrained_weights/tokenizers.
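One way to fetch the weights is through the transformers library's from_pretrained and save_pretrained methods, as in the sketch below. The directory names are copied from this README, but whether train.py expects exactly the files that save_pretrained writes (which vary across transformers versions) is an assumption; the helper names prepare_dirs and download_gpt2 are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Paths taken from this README; the expected file layout inside them is an assumption.
MODEL_DIR = Path("data/pretrained_models/gpt2")
TOKENIZER_DIR = Path("data/pretrained_weights/tokenizers")


def prepare_dirs(root: Path = Path(".")) -> tuple[Path, Path]:
    """Create the model and tokenizer directories under root."""
    model_dir = root / MODEL_DIR
    tokenizer_dir = root / TOKENIZER_DIR
    model_dir.mkdir(parents=True, exist_ok=True)
    tokenizer_dir.mkdir(parents=True, exist_ok=True)
    return model_dir, tokenizer_dir


def download_gpt2(root: Path = Path(".")) -> None:
    """Download GPT-2 (small) and save it locally; needs transformers, torch and network access."""
    from transformers import GPT2LMHeadModel, GPT2Tokenizer  # imported lazily

    model_dir, tokenizer_dir = prepare_dirs(root)
    GPT2LMHeadModel.from_pretrained("gpt2").save_pretrained(model_dir)
    GPT2Tokenizer.from_pretrained("gpt2").save_pretrained(tokenizer_dir)
```

If train.py fails to load the saved files, compare the filenames it looks for with those written to the two directories.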
