
Speaker Separation with Labels v1.0

anzar edited this page Nov 14, 2023 · 3 revisions
Date completed: November 14th, 2023
Release where first appeared: OpenWillis v1.6
Researcher / Developer: Vijay Yadav

1 – Use

import openwillis as ow

speaker_dict = ow.speaker_separation_labels(filepath = '', transcript_json = '')

To use the speaker_dict output to save the separated audio files:

ow.to_audio(filepath = '', speaker_dict = speaker_dict, output_dir = '')

2 – Methods

This function splits an audio file with multiple speakers into individual files, one per speaker. It assumes the speakers were already labeled during transcription, and it is meant to work alongside the Speech transcription with AWS and Speech transcription with Whisper functions in OpenWillis.

The user must provide the source audio file that was originally transcribed and the JSON output from the speech transcription function they used. In return, the function will output a speaker dictionary, named speaker_dict.

This dictionary can then be fed into the to_audio function to save the separated audio files in the specified output directory output_dir. The naming convention for these files is filename_speakerlabel, with filename indicating the original audio source's name and speakerlabel denoting the respective speaker's label from the JSON output.
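The filename_speakerlabel convention can be illustrated with a small sketch. This is not the to_audio implementation: the helper separated_paths, the ".wav" extension, the example paths, and the labels "speaker0" and "speaker1" are all assumptions made for illustration.

```python
from pathlib import Path

def separated_paths(filepath, speaker_labels, output_dir, ext=".wav"):
    """Build one output path per speaker as <filename>_<speakerlabel><ext>.

    Hypothetical helper; the extension is an assumption, not taken from OpenWillis.
    """
    stem = Path(filepath).stem  # original file name without its extension
    return {
        label: Path(output_dir) / f"{stem}_{label}{ext}"
        for label in speaker_labels
    }

# Hypothetical source file and speaker labels
paths = separated_paths("recordings/interview.wav", ["speaker0", "speaker1"], "separated")
for label, path in paths.items():
    print(label, "->", path)
```

Under these assumptions, each speaker label from the JSON output maps to exactly one file in output_dir, named after the original recording.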


3 – Inputs

3.1 – filepath

Type: String
Description: Path to audio file to be separated

3.2 – transcript_json

Type: JSON
Description: JSON output from the Speech transcription with AWS/Whisper functions

4 – Outputs

4.1 – speaker_dict

Type: Dictionary
Description: A dictionary with the speaker label as the key and the audio signal numpy array as the value.
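As a rough sketch of how a dictionary of this shape might be inspected, the snippet below computes how many seconds of audio each speaker has. It assumes each value is a one-dimensional (mono) signal and that the sample rate is known to the caller; neither is stated on this page. The helper speaker_durations and the labels are hypothetical, and plain lists stand in for numpy arrays so the example runs without extra dependencies (len() behaves the same on a 1-D numpy array).

```python
def speaker_durations(speaker_dict, sample_rate):
    """Return seconds of audio per speaker label.

    Assumes each value is a 1-D sequence of samples (hypothetical helper).
    """
    return {
        label: len(signal) / sample_rate
        for label, signal in speaker_dict.items()
    }

# Stand-in data: lists of samples in place of numpy arrays,
# with hypothetical labels and an assumed 16 kHz sample rate.
example_dict = {"speaker0": [0.0] * 32000, "speaker1": [0.0] * 16000}
durations = speaker_durations(example_dict, sample_rate=16000)
for label, seconds in durations.items():
    print(f"{label}: {seconds:.1f} s")
```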