Release notes
Release date: Wednesday August 14th, 2024
Version 2.2 introduces new capabilities to improve speaker labeling during speaker separation. It also introduces new features for preprocessing videos with multiple speakers to better support downstream facial expressivity and emotional expressivity features.
If you have feedback or questions, please bring it up in the Discussions tab.
WillisDiarize v1.0
A new function was added for correcting speaker labeling errors after speech transcription. This function takes the JSON file of a transcript as input, passes it through an ensemble of LLMs, and outputs the corrected JSON file.
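Since the function's input and output share the same JSON structure, with only the speaker labels changing, the data flow can be sketched as follows. The segment schema and the `fix_labels` stand-in below are illustrative assumptions, not the actual OpenWillis format or correction logic (the real function consults an ensemble of LLMs):

```python
import json

# Illustrative transcript schema: segments with start/end times and speaker labels.
transcript = {
    "segments": [
        {"start": 0.0, "end": 2.1, "speaker": "spk_0", "text": "How have you been sleeping?"},
        {"start": 2.3, "end": 4.0, "speaker": "spk_0", "text": "Not well, maybe four hours a night."},
    ]
}

def fix_labels(data):
    """Stand-in for the diarization-correction step: here we simply relabel
    the answer turn, where the real function would decide which segments
    are mislabeled. Input and output keep the same JSON shape."""
    corrected = json.loads(json.dumps(data))  # round-trip through JSON for a deep copy
    corrected["segments"][1]["speaker"] = "spk_1"
    return corrected

corrected = fix_labels(transcript)
print(corrected["segments"][1]["speaker"])  # spk_1
```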
WillisDiarize AWS v1.0
WillisDiarize AWS performs the same task as the previous function, but is best suited for users operating within their own EC2 instance. This function assumes the user has already deployed the WillisDiarize model as a SageMaker endpoint (see the Getting Started page for instructions).
Speech transcription with AWS v1.2 / Speech transcription with Whisper v1.2
Updated to add the option of applying the WillisDiarize functions to correct speaker labeling errors before the JSON output file is created.
Speech characteristics v3.1
Added functionality to compute only selected sets of speech coherence variables, avoiding unnecessary computational burden, via the `option` parameter.
Vocal acoustics v2.1
Updated to include the option of calculating framewise summary statistics only for voiced segments longer than 100 ms.
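The filtering step that this option enables can be sketched as below; the 100 ms threshold comes from the release note, while the segment times and representation are illustrative:

```python
# Voiced segments as (start, end) times in seconds.
voiced_segments = [(0.00, 0.05), (0.20, 0.45), (0.50, 0.58), (0.70, 1.10)]

MIN_DURATION = 0.100  # 100 ms threshold from the release note

# Keep only segments long enough to yield stable framewise statistics.
long_enough = [(s, e) for s, e in voiced_segments if (e - s) > MIN_DURATION]
print(long_enough)  # the 250 ms and 400 ms segments survive
```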
Video preprocessing for faces v1.0
This function adds preprocessing capabilities for video files containing the faces of more than one individual. For contexts such as video calls and recordings of clinic visits, it detects unique faces; the output can be used to apply the `facial_expressivity` and `emotional_expressivity` functions to a single face in the video.
Video cropping v1.0
This function, designed to be used in conjunction with `preprocess_face_video`, allows the user to adjust parameters related to cropping and trimming videos to extract frames for each unique face.
General updates
Updated Pyannote from 3.0.0 to 3.1.1 to match WhisperX dependencies.
Release date: Thursday March 21st, 2024
Version 2.1 adds new measures to the vocal acoustics and speech characteristics analyses, specifically relating to major depressive disorder (MDD), schizophrenia, and Parkinson's disease. It also extends speaker identification support (in the speech transcription functions) to more clinical interview scripts.
If you have feedback or questions, please bring it up in the Discussions tab.
Speech characteristics v3.0
Adds new measures based on recent reports in the scientific literature on speech characteristics associated with schizophrenia and depression; the function now includes variables such as speech coherence, sentence tangentiality, and semantic perplexity, along with improved measurement of parts of speech.
Vocal acoustics v2.0
New measures were added, grouped into several categories:
- Relative variations and durations of pauses related to Parkinson’s Disease
- Depression related cepstral variables
- Vocal tremor variables (to be run in sustained vowel phonation)
- Advanced variables in Normalized Amplitude Quotient (NAQ), opening quotient (OQ) and Harmonic Richness Factor (HRF)
Added functionality to compute only selected sets of variables, avoiding unnecessary computational burden, via the `option` parameter.
Removed:
- Min/max features, which we noticed were not useful or interpretable
- Pause characteristics (these were redundant with speech characteristics)
Speech Transcription with AWS v1.1 / Speech Transcription with Whisper v1.1
Added more clinical interview support for:
- HAM-A, conducted in accordance with Hamilton Anxiety Rating Scale (SIGH-A)
- CAPS past week conducted in accordance with DSM-5 (CAPS-5) Past Week Version
- CAPS past month conducted in accordance with DSM-5 (CAPS-5) Past Month Version
- CAPS DSM IV conducted in accordance with Clinician-Administered PTSD Scale For DSM-IV.
- MINI conducted in accordance with Version 7.0.2 for DSM-5
- CAINS conducted in accordance with CAINS (v1.0)
Release date: Monday February 5th, 2024
Version 2.0 adds support for GPS analysis and addresses potential issues caused by the `min_turn_length` functionality in the speech characteristics function.
If you have feedback or questions, please bring it up in the Discussions tab.
General updates
Upgraded requirements by bumping transformers to version 4.36.0 and downgrading Vosk to version 0.3.44 to avoid installation issues on macOS.
Speech characteristics v2.3
Updated the logic for calculating variables that are affected when a minimum turn length is specified.
GPS analysis v1.0
A new function for GPS analysis was added; it calculates clinically meaningful measures from passively collected GPS data, such as:
- Time and speed of travel
- Time spent idle
- Home-related variables, such as time spent at home and maximum distance from home
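A sketch of how such measures can be derived from raw (timestamp, latitude, longitude) points follows. The haversine formula is standard; the home coordinates, one-minute sampling, and 50 m "at home" threshold are illustrative assumptions, not the function's actual defaults:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two GPS points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# (unix_time, lat, lon) samples, one per minute; home location assumed known.
home = (40.6782, -73.9442)  # illustrative coordinates
points = [
    (0,   40.6782, -73.9442),
    (60,  40.6782, -73.9442),
    (120, 40.6900, -73.9500),
    (180, 40.7000, -73.9600),
]

AT_HOME_KM = 0.05  # within 50 m counts as "at home" (assumed threshold)

# Attribute each inter-sample interval to "home" if it starts at home.
time_at_home = sum(
    t2 - t1
    for (t1, la, lo), (t2, _, _) in zip(points, points[1:])
    if haversine_km(la, lo, *home) <= AT_HOME_KM
)
max_dist_from_home = max(haversine_km(la, lo, *home) for _, la, lo in points)
print(time_at_home)          # seconds spent at home
print(max_dist_from_home)    # maximum distance from home, in km
```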
OpenWillis was developed by a small team of clinicians, scientists, and engineers based in Brooklyn, NY.