Cornell Conversational Analysis Toolkit (ConvoKit)

This toolkit contains tools to extract conversational features and analyze social phenomena in conversations. Several large conversational datasets are included together with scripts exemplifying the use of the toolkit on these datasets.

The toolkit currently implements features for:

Datasets

These datasets are included for ready use with the toolkit:

  • Conversations Gone Awry Corpus: a collection of conversations from Wikipedia talk pages that derail into personal attacks (1,270 conversations, 6,963 comments)

  • Tennis Corpus: transcripts of post-match press conferences for tennis singles matches at major tournaments from 2007 to 2015 (6,467 post-match press conferences)

  • Wikipedia Talk Pages Corpus: collection of conversations from Wikipedia editors' talk pages

  • Supreme Court Corpus: collection of conversations from the U.S. Supreme Court Oral Arguments

  • Parliament Corpus: parliamentary question periods from May 1979 to December 2016 (216,894 question-answer pairs)

These datasets can be downloaded using the convokit.download() helper function. Alternatively, you can access them directly here.
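A minimal sketch of combining the download helper with corpus loading. It assumes that convokit.download() returns a local path that convokit.Corpus(filename=...) accepts; the function name load_packaged_corpus and the dataset identifier in the comment are hypothetical and used only for illustration.

```python
def load_packaged_corpus(dataset_name):
    """Download one of the packaged datasets and load it as a corpus."""
    # Imported lazily so this helper can be defined without convokit installed.
    import convokit

    # Assumption: download() returns a path usable as the `filename` argument.
    path = convokit.download(dataset_name)
    return convokit.Corpus(filename=path)


# Example call (the identifier is an assumption; check the toolkit's
# documentation for the exact dataset names):
# corpus = load_packaged_corpus("parliament-corpus")
```

The first call may take a while for the larger datasets, since the data has to be fetched before it can be loaded.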

Data format

To use the toolkit with your own dataset, the dataset must be in the toolkit's standard JSON format (see Data_format.md in the repository).
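As a rough illustration of preparing your own data, the sketch below writes a tiny two-utterance conversation to a JSON file. The field names (id, user, root, reply-to, timestamp, text) and the list-of-utterances layout are assumptions made for this example, not taken from this README; Data_format.md is the authoritative description.

```python
import json
import os
import tempfile

# ASSUMPTION: one JSON object per utterance, with these field names.
# Check Data_format.md for the fields the toolkit actually expects.
utterances = [
    {"id": "u0", "user": "alice", "root": "u0", "reply-to": None,
     "timestamp": 0, "text": "Could you take a look at this draft?"},
    {"id": "u1", "user": "bob", "root": "u0", "reply-to": "u0",
     "timestamp": 1, "text": "Sure, I will read it tonight."},
]


def write_corpus(records, path):
    """Serialize a list of utterance records to a JSON file and return its path."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)
    return path


# Write to a temporary directory so the example leaves the working tree untouched.
corpus_path = write_corpus(
    utterances, os.path.join(tempfile.mkdtemp(), "my_corpus.json")
)
```

The resulting file path is what you would pass as the filename argument when loading the corpus.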

Installation

This toolkit requires Python 3.

  1. Download the toolkit: pip3 install convokit
  2. Download spaCy's English model: python3 -m spacy download en

Alternatively, visit our GitHub page to install from source.
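After installing, a quick check along these lines can confirm that the packages are importable from the interpreter you plan to use. This is only a sketch using the Python standard library; the helper name missing_packages is hypothetical, and the check does not verify that the spaCy English model itself was downloaded.

```python
import importlib.util


def missing_packages(names=("convokit", "spacy")):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means both packages were found.
print(missing_packages())
```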

Usage

See the example IPython notebooks linked above to familiarize yourself with how to use the different modules of the toolkit. The basic process is:

  1. Import convokit into your Python 3 project.
  2. Load a corpus of conversations using corpus = convokit.Corpus(filename=...); this can be your own corpus or one of those provided with the toolkit.
  3. Use convokit functionality to extract features from the conversations. For example, ps = convokit.PolitenessStrategies(corpus) extracts the politeness strategies used in all the conversations.
  4. Have fun analyzing conversations.
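The steps above can be sketched as a single helper. It uses only the two calls named in the steps, convokit.Corpus(filename=...) and convokit.PolitenessStrategies(corpus); the function name extract_politeness and the example path are hypothetical, and what you can do with the returned ps object depends on the toolkit version, so consult the documentation for that part.

```python
def extract_politeness(corpus_path):
    """Load a corpus file and extract politeness strategies from it."""
    # Step 1: import the toolkit (lazily, so this helper can be defined
    # even where convokit is not installed).
    import convokit

    # Step 2: load a corpus of conversations from a file.
    corpus = convokit.Corpus(filename=corpus_path)

    # Step 3: extract politeness strategies for all conversations.
    ps = convokit.PolitenessStrategies(corpus)
    return corpus, ps


# Example call (hypothetical path to a corpus in the toolkit's JSON format):
# corpus, ps = extract_politeness("my_corpus.json")
```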

Documentation

Documentation is hosted here.

The documentation is built with Sphinx (pip3 install sphinx). To build it yourself, navigate to doc/ and run make html.

Acknowledgements

Andrew Wang (azw7@cornell.edu) wrote the Coordination code and its example script, wrote the helper functions, and designed the structure of the toolkit.

Ishaan Jhaveri (iaj8@cornell.edu) refactored the Question Typology code and wrote the respective example scripts.

Jonathan Chang (jpc362@cornell.edu) wrote the example script for Conversations Gone Awry.
