Speaker separation cloud v1.0

Date completed: June 21, 2023
Release where first appeared: OpenWillis v1.3
Researcher / Developer: Vijay Yadav

1 – Use

1.1 – Processing

The JSON input comes from the cloud-based speech_transcription_cloud function:

import openwillis as ow

# json_response is the JSON transcript returned by speech_transcription_cloud;
# the '{}' below is only a placeholder for that output
signal_label = ow.speaker_separation_cloud(filepath = 'data.wav', json_response = '{}')

1.2 – Saving audio files

To use the signal_label output to save the separated audio files:

import openwillis as ow

# signal_label: dictionary returned by speaker_separation_cloud
# out_dir: directory where the separated WAV files will be written
ow.to_audio('data.wav', signal_label, out_dir)

2 – Methods

The speaker_separation_cloud function separates an audio file with two speakers into two audio signals, each containing speech from only one of the two speakers. The to_audio function is then used to save those signals as audio files.

2.1 – Following use of the speech_transcription_cloud function

The assumption is that the user has already used the speech_transcription_cloud function to acquire a JSON transcript with labeled speakers. This function uses the timepoints in the JSON to slice the audio file. It returns an audio signal dictionary whose keys are the speaker labels (either 'speaker0' and 'speaker1', or 'clinician' and 'participant') and whose values hold the corresponding audio signals in numpy array format.
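As a rough illustration of this slicing step (a minimal sketch, not the OpenWillis implementation; the segment layout and the helper name slice_by_speaker are assumptions):

import numpy as np
from scipy.io import wavfile

def slice_by_speaker(filepath, segments):
    # segments: list of dicts like {'speaker': 'speaker0', 'start': 0.0, 'end': 1.2},
    # with start/end times in seconds taken from the transcription JSON
    rate, signal = wavfile.read(filepath)
    speakers = {}
    for seg in segments:
        start, end = int(seg['start'] * rate), int(seg['end'] * rate)
        label = seg['speaker']
        # append this segment's samples to the speaker's running signal
        prev = speakers.get(label, np.empty(0, dtype=signal.dtype))
        speakers[label] = np.concatenate([prev, signal[start:end]])
    return speakers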

2.2 – Following use of the to_audio function

The to_audio function exports audio signals from a dictionary to individual WAV files. It takes a file path, a dictionary mapping speaker labels to their respective audio signals as numpy arrays, and an output directory. One WAV file per speaker is saved in the specified output directory with a unique name in the format "filename_speakerlabel.wav", where "filename" is the original file name and "speakerlabel" is the label of the speaker.
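A minimal sketch of this export step, assuming scipy for WAV writing (illustrative behavior only, not the OpenWillis source):

import os
from scipy.io import wavfile

def export_speakers(filepath, signal_label, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    rate, _ = wavfile.read(filepath)  # reuse the original file's sample rate
    base = os.path.splitext(os.path.basename(filepath))[0]
    for label, signal in signal_label.items():
        # e.g. data.wav + 'speaker0' -> out_dir/data_speaker0.wav
        wavfile.write(os.path.join(out_dir, f"{base}_{label}.wav"), rate, signal)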


3 – Inputs

3.1 – filepath

Type str
Description path to audio file to be separated

3.2 – json_response

Type json
Description JSON output listing each word transcribed, the confidence associated with that word's transcription, and its utterance start and end times.
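For illustration only, a transcript entry might look like the fragment below; the exact field names and nesting depend on the cloud transcription service and will differ in practice:

{
  "words": [
    {"word": "hello", "confidence": 0.98, "start": 0.00, "end": 0.42, "speaker": "speaker0"},
    {"word": "there", "confidence": 0.95, "start": 0.42, "end": 0.80, "speaker": "speaker1"}
  ]
}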

4 – Outputs

4.1 – signal_label

Type dictionary
Description A dictionary with the speaker label as the key and the audio signal numpy array as the value.

What the dictionary looks like:

labels = {'speaker0': [2, 35, 56, -52, …, 13, -14], 'speaker1': [12, 45, 26, -12, …, 43, -54]}


5 – Dependencies

Below are the dependencies specific to the calculation of this measure.

Dependency | License | Justification