
Release notes V1


OpenWillis v1.6

Release date: Wednesday November 15th, 2023

Version 1.6 brings significant changes that add flexibility in speech transcription, speaker separation, and the subsequent quantification of speech characteristics.

Users can now easily choose between different models for speech transcription and separate audio files with multiple speakers regardless of which transcription model was used. The speech characteristics function has been updated to support outputs from any of these routes.

If you have feedback or questions, please reach out.

Contributors

General updates

There are now three separate speech transcription functions: one using Vosk, one using WhisperX, and one using Amazon Transcribe, each with its own pros and cons as described below.

Speech Transcription with Vosk: Speech transcription conducted locally on a user’s machine; needs fewer computational resources but is less accurate
Speech Transcription with Whisper: Speech transcription conducted locally on a user’s machine; needs greater computational resources but is more accurate
Speech Transcription with AWS: Speech transcription conducted via the Amazon Transcribe API; requires (typically) paid access to the API and AWS resources
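
To make the choice concrete, here is a minimal sketch of selecting one of the three routes. All function names and arguments below are illustrative assumptions based on the descriptions above, not the library’s exact signatures; see each function’s page on this wiki for the actual API.

```python
import openwillis as ow  # assumed import name for the OpenWillis package

AUDIO = "interview.wav"

# Local and lightweight, but less accurate (assumed name and signature)
transcript_json, transcript_text = ow.speech_transcription_vosk(filepath=AUDIO)

# Local and more accurate, but heavier on compute (assumed name and signature)
# transcript_json, transcript_text = ow.speech_transcription_whisper(filepath=AUDIO)

# Cloud-based via Amazon Transcribe; requires AWS credentials (assumed name and signature)
# transcript_json, transcript_text = ow.speech_transcription_aws(filepath=AUDIO)
```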

The first speech transcription function does not provide speaker labels, whereas the latter two do. Consequently, there are now two different speaker separation functions:

Speaker Separation without Labels: Separates a source audio file with multiple speakers into individual audio files, one for each speaker, without needing prior labeling of speakers in the transcript JSON
Speaker Separation with Labels: Separates a source audio file with multiple speakers into individual audio files, one for each speaker, assuming prior labeling of speakers in the transcript JSON

Finally, the Speech Characteristics function has been updated to support JSON transcripts from each of the speech transcription functions. It also includes bug fixes for cases where certain variables were not being calculated in some contexts.
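
Continuing the illustrative names above, a hedged sketch of how the two speaker separation routes and the updated Speech Characteristics function might chain onto a transcript; the argument and return names are assumptions, not the exact API.

```python
import openwillis as ow  # assumed import name

AUDIO = "interview.wav"

# Route 1: a transcript without speaker labels (e.g. from the Vosk function)
json_conf, _ = ow.speech_transcription_vosk(filepath=AUDIO)                 # assumed name
speaker_audio = ow.speaker_separation_nolabels(filepath=AUDIO,              # assumed name
                                               json_conf=json_conf)

# Route 2: a transcript that already carries speaker labels (WhisperX or AWS)
# json_conf, _ = ow.speech_transcription_whisper(filepath=AUDIO)            # assumed name
# speaker_audio = ow.speaker_separation_labels(filepath=AUDIO, json_conf=json_conf)

# Speech Characteristics now accepts transcript JSONs from any of the routes above
words, phrases, turns, summary = ow.speech_characteristics(json_conf=json_conf)  # assumed outputs
```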


OpenWillis v1.5

Release date: Thursday Oct 5th, 2023

Version 1.5 brings refined methods for speech transcription and speaker separation. OpenWillis is now able to use Whisper for speech transcription. This integration ensures consistent transcription accuracy, whether processed locally or on cloud-based servers, and introduces support for multiple languages.

If you have feedback or questions, please reach out.

Contributors

General updates

Through the integration of Whisper as one of the available transcription models, the speech transcription and speaker separation functions have been updated to allow a processing workflow similar to that of the cloud-based speech transcription and speaker separation functions. This also prompted a revision of the Speech Characteristics function so that it supports JSON files produced by Whisper.

Speech transcription v2.0

The new version of the speech transcription function can use WhisperX to transcribe speech to text; it can label speakers when multiple speakers are present and includes integrated speaker identification for structured clinical interviews.

Speaker separation v2.0

The speaker separation function has been updated to support JSON files with labeled speakers that the user can now obtain by leaning on WhisperX during speech transcription. In this scenario, it simply splits the speakers based on the labels in the JSON file.
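
For illustration, a speaker-labeled transcript of the kind the function can now split on might look roughly like the snippet below; the field names are assumptions, since the exact schema produced by WhisperX may differ.

```python
# Illustrative (assumed) shape of a speaker-labeled transcript; the real
# WhisperX/OpenWillis JSON schema may use different field names.
labeled_transcript = {
    "segments": [
        {"start": 0.0, "end": 4.2, "speaker": "speaker0",
         "text": "Can you tell me how you have been sleeping?"},
        {"start": 4.5, "end": 9.8, "speaker": "speaker1",
         "text": "Not well, I keep waking up during the night."},
    ]
}

# With labels already present, separation reduces to grouping segments by label
# and cutting the source audio at the corresponding timestamps.
by_speaker = {}
for seg in labeled_transcript["segments"]:
    by_speaker.setdefault(seg["speaker"], []).append((seg["start"], seg["end"]))
print(by_speaker)  # {'speaker0': [(0.0, 4.2)], 'speaker1': [(4.5, 9.8)]}
```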

Speech characteristics v2.1

The speech characteristics function now supports JSON files acquired through WhisperX. All output variables remain the same.


OpenWillis v1.4

Release date: September 26th, 2023

Version 1.4 includes support for analysis of MADRS clinical interview recordings and addresses known bugs associated with certain functions in the previous release.

If you have feedback or questions, please reach out.

Contributors

General updates

  • Updated requirements.txt to address an issue with the eye blink rate v1.0 function
  • The summary output in all functions is now a single row to maintain consistency across functions; the facial expressivity and emotional expressivity functions were updated to achieve this.

Facial expressivity v2.0

The facial expressivity function now allows for more nuance when analyzing expressivity in specific areas of the face. Previously, the function would output expressivity either for each of several hundred individual landmarks or for the face as a whole. The new version adds variables for overall expressivity in the lower face, upper face, lips, and eyebrows, as well as a mouth open vs. closed indicator.
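
A minimal sketch of how the new region-level outputs might be read; the function name, return values, and column names are assumptions based on the description above rather than the exact output schema.

```python
import openwillis as ow  # assumed import name

# Assumed signature: a framewise dataframe plus a one-row summary dataframe
framewise, summary = ow.facial_expressivity(filepath="participant_video.mp4")

# Hypothetical column names for the new region-level variables
for column in ["lower_face", "upper_face", "lips", "eyebrows", "mouth_openness"]:
    if column in summary.columns:
        print(column, summary[column].iloc[0])
```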

We hope that these updates make the function more useful to researchers studying behaviors such as flattened affect in schizophrenia or facial tremors in motor disorders such as Parkinson’s Disease.

Speech transcription cloud v1.1

The speech transcription cloud function was previously able to identify each speaker as a clinician or a participant in addition to simply labeling them as speaker0 or speaker1. This was only supported if the audio file contained a recording of a PANSS interview. The function has now been updated to support MADRS interviews that follow the Structured Interview Guide for the MADRS (SIGMA).
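
Illustratively, requesting clinician/participant identification for a MADRS recording might look like the call below; the function name, parameter names, and accepted values are all assumptions, so check the speech transcription cloud documentation for the actual interface.

```python
import openwillis as ow  # assumed import name

# Hypothetical `context` parameter indicating the structured interview type;
# the real function may expose this differently.
transcript_json, transcript_text = ow.speech_transcription_cloud(
    filepath="madrs_interview.wav",  # hypothetical input argument
    language="en-US",                # hypothetical parameter
    context="madrs",                 # previously only PANSS interviews were supported
)
```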

Speaker separation v1.1

For researchers using the local workflow for speech transcription and speaker separation rather than the cloud-based functions, we have also updated the speaker separation function to support speaker identification for MADRS recordings.


OpenWillis v1.3

Release date: August 17th, 2023

Version 1.3 adds support for multi-speaker speech analysis and video-based eye blink detection.

If you have feedback or questions, please reach out.

Contributors

General updates

  • Updated dependency versions for improved optimization:
    • Tensorflow: 2.9.0 to 2.11.1
    • Protobuf: 3.20.0 to 3.20.2

Version 2.0 of the speech characteristics function processes multi-speaker JSONs, allowing analysis of a user-selected speaker. Outputs are now segmented by word, phrase, turn, and overall file. Refer to speech transcription cloud v1.0 for how to acquire labeled transcripts.
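
As an illustration of the speaker selection and segmentation described above, a call might look like the sketch below; the parameter and return names are assumptions.

```python
import json
import openwillis as ow  # assumed import name

# A labeled multi-speaker transcript, e.g. from speech transcription cloud v1.0
with open("transcript_labeled.json") as f:
    labeled_json = json.load(f)

# Assumed parameter and return names: four levels of output, segmented by
# word, phrase, turn, and the overall file.
words, phrases, turns, summary = ow.speech_characteristics(
    json_conf=labeled_json,
    speaker_label="speaker0",  # hypothetical way to pick which speaker to analyze
)
```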

The new eye blink rate function allows for precise quantification of both basic blink rates and blink characteristics from videos of an individual.
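
A hedged sketch of calling the new function; the name and outputs are assumptions based on this description.

```python
import openwillis as ow  # assumed import name

# Assumed signature: a per-blink dataframe plus a summary containing the
# overall blink rate and blink characteristics
blinks, summary = ow.eye_blink_rate(filepath="participant_video.mp4")
print(summary)
```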

For improved scalability, we’ve isolated speaker separation based on pre-labeled multi-speaker JSONs into its own function. The existing speaker separation v1.1 function is now meant to work on JSONs without speaker labels.


OpenWillis v1.2

Release date: June 14th, 2023

The v1.2 release improves OpenWillis’ speech analysis capabilities and streamlines processing workflows.

If you have feedback or questions, please do reach out.

Contributors

General updates

  1. For better accessibility, all method description documentation has been moved from Google Docs to the repo’s wiki, a much more appropriate place for it.
  2. The usage examples from the notebook included in the code have been moved to the same method description documents in the wiki, consolidating this information in one place.

Repository updates

We have restructured the folder organization: Functions are now categorized based on the modality of data they process. This will feel more intuitive to independent contributors.

Function updates

We've separated speech transcription into two functions:

  1. Speech transcription v1.1: This uses locally executable models for speech transcription, maintaining the functionality of the previous version of the same method.
  2. Speech transcription cloud v1.0: This new function uses cloud-based models for speech transcription, specifically incorporating Amazon Transcribe. Users must input their own AWS credentials for this; a hedged sketch of supplying them follows this list. A notable feature of this version is its ability to label speakers in a dual-speaker audio file. In the case of clinical interview recordings, speakers can also be identified as 'clinician' or 'participant', with these labels included in the outputted JSON.
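
For the cloud-based function, credentials are typically supplied through the standard AWS mechanisms (environment variables or a shared credentials file read by boto3). Below is a hedged sketch, with the OpenWillis call itself being an assumed name and signature.

```python
import os
import openwillis as ow  # assumed import name

# Standard AWS credential environment variables, picked up by boto3 under the hood
os.environ["AWS_ACCESS_KEY_ID"] = "YOUR_ACCESS_KEY_ID"
os.environ["AWS_SECRET_ACCESS_KEY"] = "YOUR_SECRET_ACCESS_KEY"
os.environ["AWS_DEFAULT_REGION"] = "us-east-1"

# Assumed function name and arguments for speech transcription cloud v1.0
transcript_json, transcript_text = ow.speech_transcription_cloud(
    filepath="dual_speaker_interview.wav",
    speaker_labels=True,  # hypothetical flag: label speakers as speaker0 / speaker1
)
```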

The speaker separation function has been updated to accommodate both transcription workflows:

  1. The locally executable speaker separation models remain the same; the difference is that they now use the JSON output from the speech transcription v1.1 function for improved efficiency.
  2. When the user employs the speech transcription cloud v1.0 function to obtain a JSON with speaker labels included, the speaker separation function can simply use those labels to separate the audio into individual files for each speaker. This is a much faster option.

In response to these function modifications, we are also releasing speech characteristics v1.1, which lets users choose which speaker to calculate speech characteristics for, using the speaker labels included in the JSON output of the cloud-based speech transcription function.
