Speaker separation cloud v1.0
| Date completed | June 21, 2023 |
| Release where first appeared | OpenWillis v1.3 |
| Researcher / Developer | Vijay Yadav |
The `json_response` input comes from the cloud-based `speech_transcription_cloud` function:

```python
import openwillis as ow

signal_label = ow.speaker_separation_cloud(filepath='data.wav', json_response='{}')
```
To save the separated audio files from the `signal_label` output:

```python
import openwillis as ow

out_dir = 'separated_audio'  # directory where the WAV files will be written
ow.to_audio('data.wav', signal_label, out_dir)
```
The `speaker_separation_cloud` function separates an audio file containing two speakers into two audio signals, each containing speech from only one of the two speakers. The `to_audio` function saves those audio signals as audio files.
The assumption is that the user has already used the `speech_transcription_cloud` function to acquire a JSON transcript with labeled speakers. This function uses the timepoints in the JSON to slice the audio file. It returns a dictionary whose keys are the speaker labels ('speaker0' and 'speaker1', or 'clinician' and 'participant') and whose values are the corresponding audio signals in numpy array format.
The `to_audio` function exports audio signals from a dictionary to individual WAV files. It takes a file path, a dictionary containing labeled speakers and their respective audio signals as numpy arrays, and an output directory. Each speaker's WAV file is saved in the specified output directory under a unique name in the format "filename_speakerlabel.wav", where "filename" is the original file name and "speakerlabel" is the speaker's label.
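The export step described above can be sketched with the standard-library `wave` module. This is an illustrative stand-in for `to_audio`, not its actual implementation: the `export_speakers` name, the 16 kHz default sample rate, and the 16-bit mono output format are all assumptions made for the example.

```python
import os
import tempfile
import wave

import numpy as np

def export_speakers(filepath, labels, out_dir, sample_rate=16000):
    """Write each labeled signal to <out_dir>/<filename>_<label>.wav."""
    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(filepath))[0]
    paths = []
    for label, signal in labels.items():
        out_path = os.path.join(out_dir, f"{stem}_{label}.wav")
        data = np.asarray(signal, dtype=np.int16)  # 16-bit PCM samples
        with wave.open(out_path, 'wb') as wf:
            wf.setnchannels(1)            # mono
            wf.setsampwidth(2)            # 2 bytes = 16 bits per sample
            wf.setframerate(sample_rate)
            wf.writeframes(data.tobytes())
        paths.append(out_path)
    return paths

labels = {'speaker0': [2, 35, 56, -52], 'speaker1': [12, 45, 26, -12]}
out_dir = tempfile.mkdtemp()
paths = export_speakers('data.wav', labels, out_dir)
```

The resulting files follow the "filename_speakerlabel.wav" naming convention, e.g. `data_speaker0.wav` and `data_speaker1.wav`.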
`filepath`
| Type | str |
| Description | path to the audio file to be separated |

`json_response`
| Type | json |
| Description | JSON output that lists each word transcribed, the confidence level associated with that word's transcription, its utterance start time, and its utterance end time. |

`signal_label`
| Type | dictionary |
| Description | A dictionary with the speaker label as the key and the audio signal numpy array as the value. |
What the dictionary looks like:

labels = {'speaker0': [2, 35, 56, -52, …, 13, -14], 'speaker1': [12, 45, 26, -12, …, 43, -54]}
Below are dependencies specific to the calculation of this measure.
| Dependency | License | Justification |