Releases: CornellNLP/ConvoKit
ConvoKit Version 4.1.1
We’re excited to release ConvoKit 4.1.1, which adds support for Gemini API keys in the GenAI module.
What’s New
Added support for Gemini API keys in the GenAI module: #347
ConvoKit Version 4.1.0
We're excited to release ConvoKit 4.1.0, featuring a revamped ConvoKit website, new multilingual WikiConv datasets, five other new datasets, and new examples for recently released features.
1. ConvoKit Website Revamp
We have completely redesigned the main ConvoKit website to better accommodate the ever-growing list of available datasets and features. The new site supports searching for datasets and features by tag, along with light and dark modes.
2. Multilingual WikiConv
We have added datasets from the German, Russian, Chinese, and Greek versions of Wikipedia Talk Pages, in addition to the existing English WikiConv dataset. For more information, check out the corpora on the WikiConv page, which also contains updated download instructions.
3. Additional Datasets
Thanks to contributions from students in CS 6742, we also have five new datasets covering a wide range of domains and conversation dynamics. More information on these corpora can be found on our website and documentation site.
- New datasets:
4. New Examples
We have added new examples for Redirection and Utterance Likelihood, Talk Time Sharing Dynamics, Summary of Conversation Dynamics (SCD), ConDynS, and Pivotal Moments to our examples page. Check them out here.
ConvoKit Version 4.0.0
We're excited to release ConvoKit 4.0.0, featuring major enhancements that bring LLM-powered analysis capabilities to conversational data processing.
1. GenAI Module: LLM Prompt Transformer
The new GenAI module introduces LLMPromptTransformer, a flexible transformer that lets users apply LLM prompts to arbitrary tasks while integrating seamlessly with ConvoKit's conversational elements. With the new module, it is easy to:
- Apply prompts at multiple levels: utterances, conversations, speakers, or the entire corpus
- Use multiple LLM providers (OpenAI GPT, Google Gemini, local models)
- Manage API keys and model settings through a unified configuration
- Store LLM responses directly as metadata on corpus objects
For more information, see the GenAI module guide.
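To make the list above concrete, here is a minimal, self-contained Python sketch of the general pattern: fill a prompt template for each utterance (or each conversation), send it to a model, and store the reply as metadata on that object. It does not use ConvoKit at all; the Utterance and Conversation dataclasses, apply_prompt, and fake_llm are hypothetical stand-ins, and the real LLMPromptTransformer's constructor and parameters may differ, so consult the GenAI module guide for its actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical stand-ins for corpus objects; these are NOT ConvoKit classes.
@dataclass
class Utterance:
    id: str
    speaker: str
    text: str
    meta: Dict[str, str] = field(default_factory=dict)


@dataclass
class Conversation:
    id: str
    utterances: List[Utterance]
    meta: Dict[str, str] = field(default_factory=dict)


def apply_prompt(
    conversations: List[Conversation],
    template: str,
    llm: Callable[[str], str],
    output_field: str,
    level: str = "utterance",
) -> None:
    """Fill `template` for each object at `level`, call `llm`, and store the reply in its meta."""
    for convo in conversations:
        if level == "conversation":
            # Render the whole conversation as a transcript and prompt once per conversation.
            transcript = "\n".join(f"{u.speaker}: {u.text}" for u in convo.utterances)
            convo.meta[output_field] = llm(template.format(text=transcript))
        elif level == "utterance":
            # Prompt once per utterance and attach the answer to that utterance.
            for utt in convo.utterances:
                utt.meta[output_field] = llm(template.format(text=utt.text))
        else:
            raise ValueError(f"unsupported level: {level}")


def fake_llm(prompt: str) -> str:
    """Offline placeholder for a real model client (hypothetical)."""
    return "question" if prompt.rstrip().endswith("?") else "statement"
```

For example, applying the template "Label: {text}" at the utterance level to a conversation containing "Can you help?" and "Sure." stores "question" and "statement" under the chosen metadata key. Passing the model as a plain callable keeps the sketch provider-agnostic, which loosely mirrors the multi-provider support described above.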
2. Summary of Conversation Dynamics (SCD)
We introduce the SCD Transformer, which generates structured summaries of conversational dynamics using the LLM Prompt Transformer. For more details, see the SCD page.
3. Conversation Dynamics Similarity (ConDynS)
We present ConDynS, a similarity measure for comparing conversations with respect to their dynamics, as introduced in "A Similarity Measure for Comparing Conversational Dynamics". For more information, see the ConDynS page.
ConvoKit Version 3.5.0
We are excited to release ConvoKit version 3.5.0, which introduces the new TalkTimeSharing Transformer, a module for analyzing how talk time is distributed between speakers as a conversation unfolds, capturing both overall balance and moment-to-moment dynamics. The release includes a demo applying the method to several conversational datasets. Please see PR #276 for details!
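A rough sketch of the two quantities mentioned above, overall balance and moment-to-moment dynamics, for a two-speaker conversation. This is not the TalkTimeSharing implementation: approximating talk time by whitespace-token counts and measuring dynamics with fixed-size sliding windows over turns are assumptions made for illustration, and the helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Turn = Tuple[str, str]  # (speaker, text)


def talk_time(turns: List[Turn]) -> Dict[str, int]:
    """Total talk time per speaker, approximated here by whitespace-token counts."""
    totals: Dict[str, int] = defaultdict(int)
    for speaker, text in turns:
        totals[speaker] += len(text.split())
    return dict(totals)


def share(turns: List[Turn], speaker: str) -> float:
    """Fraction of all tokens spoken by `speaker`; 0.5 is perfectly balanced for two speakers."""
    totals = talk_time(turns)
    total = sum(totals.values())
    return totals.get(speaker, 0) / total if total else 0.0


def windowed_share(turns: List[Turn], speaker: str, window: int) -> List[float]:
    """Share of `speaker` within each sliding window of `window` consecutive turns."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [share(turns[i:i + window], speaker) for i in range(len(turns) - window + 1)]
```

For example, with turns [("A", "hi there how are you"), ("B", "fine"), ("A", "great"), ("B", "and you doing well today")], share(turns, "A") is 0.5, while windowed_share(turns, "A", 2) moves from about 0.83 down to about 0.17: the conversation is balanced overall, but the floor shifts from A to B over time.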
ConvoKit Version 3.4.0 and 3.4.1
We’re excited to announce ConvoKit 3.4.0, which includes:
- New TransformerDecoderModel and TransformerEncoderModel classes, enabling the use of LLMs for forecasting tasks.
- An expanded CGA-CMV corpus, now containing 19,578 conversations and 116,793 comments.
Version 3.4.1 addresses issues reported by @kakeith, including macOS installation problems caused by unsupported dependency packages and a download error for the FOMC Corpus. This update introduces:
- A new optional installation target for LLM-related packages required by certain transformers (currently Forecaster, Redirection, Pivotal Moments, and Utterance Simulator). These can be installed via:
  pip install 'convokit[llm]'
We appreciate the efforts of all contributors in making this release, and thank @kakeith for raising the issue that helped us improve!
ConvoKit Version 3.3.0
We are excited to release ConvoKit 3.3.0, which updates the Supreme Court Corpus with merged and cleaned utterance data covering 1955–2023, including newly added transcripts from 2019–2023. Extensive validation and manual checks were performed to ensure the dataset's integrity. For full details, see the pull request.
ConvoKit Version 3.2.0
We are excited to announce the release of ConvoKit 3.2.0! This version introduces a framework for identifying pivotal moments in conversations. We also release a demonstration of the framework via this notebook. For more information, please check the PR for the new features.
ConvoKit Version 3.1.0
We are excited to announce the release of ConvoKit 3.1.0! This version introduces a framework for measuring redirection in conversation flow, as described in this paper. We also release a demonstration of the framework on Supreme Court oral arguments via Google Colab. In addition to redirection, we provide a generalized transformer for annotating utterance-level likelihoods given a defined conversation context. For more information, see the PR for the new features (#250).
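The idea of annotating an utterance with its likelihood given a chosen context can be illustrated with a toy model. The sketch below substitutes an add-one-smoothed unigram model, estimated from the context, for the language models the actual transformers rely on; the function name and scoring are hypothetical and are not the redirection measure from the paper. They only show the shape of the computation: the same utterance scores higher when its context makes its words more expected.

```python
import math
from collections import Counter
from typing import List


def mean_log_likelihood(utterance: str, context: List[str]) -> float:
    """Average per-token log-probability of `utterance` under an add-one-smoothed
    unigram model estimated from `context` (a toy stand-in for a real language model)."""
    context_tokens = [tok for text in context for tok in text.lower().split()]
    utt_tokens = utterance.lower().split()
    if not utt_tokens:
        raise ValueError("utterance must contain at least one token")
    counts = Counter(context_tokens)
    # The vocabulary covers both context and utterance tokens so unseen words get nonzero mass.
    vocab = set(context_tokens) | set(utt_tokens)
    denom = len(context_tokens) + len(vocab)
    return sum(math.log((counts[tok] + 1) / denom) for tok in utt_tokens) / len(utt_tokens)
```

For example, the reply "the court should rule" scores higher given the context ["what should the court do", "the court must rule"] than given ["lunch was good", "see you later"]. Swapping in a real model would change the numbers but not the interface: an utterance, a context, and a scalar likelihood to store on the utterance.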
ConvoKit Version 3.0.2
We are excited to release ConvoKit 3.0.2! This minor update resolves installation issues related to older versions of SciPy by requiring a more recent version of the package. We found that Google Colab, which preloads packages at runtime, may still produce errors. These can be resolved by restarting the session and re-running the code cells so that the correct package versions are imported. For more details, please refer to our Troubleshooting page and pull request #257.
ConvoKit Version 3.0.1
We are excited to announce the release of ConvoKit 3.0.1, which focuses on bug fixes, adding new datasets, and dependency upgrades. Key updates include:
- Fixed an issue with ConvoKit's download method that prevented datasets from being downloaded to the configured directory.
- Fixed support for downloading non-corpus objects.
- Updated the conversational forecasting transformer to make it more flexible.
- Added five new datasets, with documentation available on our website and documentation site.
- Addressed compatibility issues related to NumPy by building against NumPy 2.0+ and upgrading dependency packages accordingly.
Our Troubleshooting page covers some potential issues, especially those involving NumPy. If you encounter any problems, feel free to join our Discord community for support or submit an issue on GitHub. Thank you!
Note that we no longer support Python 3.8 (end of life) or 3.9 (not supported by NumPy 2.0.0+).
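A small pre-flight check along these lines can confirm that an environment matches the requirements above before installing. Treating Python 3.10 as the minimum is an inference from the dropped versions rather than a stated requirement, and the helper names python_supported and numpy_major are hypothetical.

```python
import sys
from typing import Sequence

# Assumption: with 3.8 and 3.9 dropped, 3.10 is treated as the minimum here.
MIN_PYTHON = (3, 10)


def python_supported(version: Sequence[int]) -> bool:
    """True if the (major, minor) prefix of `version` is at least MIN_PYTHON."""
    return tuple(version[:2]) >= MIN_PYTHON


def numpy_major(version_string: str) -> int:
    """Major component of a NumPy version string such as '2.0.1'."""
    return int(version_string.split(".")[0])


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "supported:", python_supported(sys.version_info))
    try:
        import numpy

        print("NumPy", numpy.__version__, "is 2.x or newer:", numpy_major(numpy.__version__) >= 2)
    except ImportError:
        print("NumPy is not installed")
```

Running this inside a freshly restarted notebook session is also a quick way to confirm which package versions were actually imported.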
You can refer to the following pull requests for more details:
- Fixing bugs:
- New datasets:
- Dependency packages:
Contributors:
- Kaixiang Zhang (Sean)
- Ethan Xia
- Yash Chatha
- Laerdon Yah-Sung Kim
- Jonathan P. Chang