
Vocal acoustics v1.0

anzar edited this page Jun 14, 2023 · 1 revision
Date completed April 13, 2023
Release where first appeared OpenWillis v1.0
Researcher / Developer Vijay Yadav

1 – Use

import openwillis as ow

framewise, pauses, summary = ow.vocal_acoustics(audio_path = 'audio.wav')

2 – Methods

Calculates a set of vocal acoustic features from the input audio (only .wav files are supported).

  • First, the vocal acoustic properties that have framewise values are calculated with Parselmouth and saved in framewise. These are the fundamental frequency (f0), the first four formant frequencies, loudness, and the harmonics-to-noise ratio (hnr).
  • Pydub is used to detect the presence of voice in the audio file. This information is compiled into the pauses output, which lists each pause, when it started, when it ended, and its duration.
  • In the summary output, the mean, standard deviation, minimum, maximum, and range of each of the variables from the first step are saved.
  • The information stored in pauses is compiled into three variables, also saved in summary:
    • Number of pauses per minute (pause_rate)
    • Mean duration of pauses (pause_meandur), measured in seconds
    • Silence ratio (silence_ratio), the percentage of frames with no voice detected
  • Parselmouth is also used to calculate variables that describe the entire audio file rather than individual frames. These are saved directly in the summary output and cover the jitter, shimmer, and glottal-to-noise excitation ratio (gne_ratio) variables listed in section 4.3.
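
The summary statistics described in the third step can be sketched in plain Python. This is a minimal illustration, not the OpenWillis implementation: the function name `summarize_framewise` and the dict-of-lists input are hypothetical, and the use of the sample standard deviation is an assumption, since the page does not say which estimator is used.

```python
import statistics


def summarize_framewise(framewise):
    """Compute mean, stdev, min, max and range for each framewise variable.

    framewise: dict mapping a variable name (e.g. "f0") to a list of
    per-frame values. Hypothetical stand-in for the framewise output.
    """
    summary = {}
    for name, values in framewise.items():
        lowest, highest = min(values), max(values)
        summary[f"{name}_mean"] = statistics.mean(values)
        # Assumption: sample standard deviation (needs at least two frames).
        summary[f"{name}_stdev"] = statistics.stdev(values)
        summary[f"{name}_min"] = lowest
        summary[f"{name}_max"] = highest
        summary[f"{name}_range"] = highest - lowest
    return summary


# Made-up values, chosen only to keep the arithmetic easy to follow.
frames = {"f0": [100.0, 110.0, 120.0], "loudness": [50.0, 48.0, 49.0]}
result = summarize_framewise(frames)
```

The resulting keys follow the `<variable>_<statistic>` naming used in the summary output (for example `f0_mean` and `f0_range`).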

3 – Inputs

3.1 – audio_path

Type str
Description path to the audio file; only .wav files are supported
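
Because only .wav files are supported, a caller may want to check the path before invoking the function. The helper below is a hypothetical convenience and is not part of OpenWillis; it inspects only the file extension, not the file contents.

```python
from pathlib import Path


def is_wav_path(audio_path: str) -> bool:
    """Return True if the path ends in .wav (case-insensitive).

    Hypothetical helper: checks the extension only and does not
    verify that the file exists or is a valid WAV file.
    """
    return Path(audio_path).suffix.lower() == ".wav"
```

A caller could skip or convert any file for which `is_wav_path` returns False before passing it as `audio_path`.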

4 – Outputs

4.1 – framewise

Type pandas.DataFrame
Description framewise output of the acoustic properties that can be calculated for individual frames; columns represent variables, rows represent frames

What the data frame looks like:

frame f0 f1 f2 f3 f4 loudness hnr
0
1
...

4.2 – pauses

Type pandas.DataFrame
Description list of all pauses detected in the audio file, with start times, end times, and durations, all in seconds; precursor for the pause variables in the summary output

What the data frame looks like:

pause_start pause_end pause_duration
...
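
The three pause variables in the summary output can be derived from rows like these. The sketch below is an illustration under stated assumptions, not the library's code: the function name `summarize_pauses` and the `total_duration_s` parameter are hypothetical, and silence_ratio is approximated here as the share of total time spent in pauses, whereas the method description defines it over frames.

```python
def summarize_pauses(pauses, total_duration_s):
    """Derive pause_rate, pause_meandur and silence_ratio.

    pauses: list of (pause_start, pause_end) tuples in seconds.
    total_duration_s: length of the audio in seconds (hypothetical
    parameter; must be greater than zero).
    """
    durations = [end - start for start, end in pauses]
    count = len(durations)
    minutes = total_duration_s / 60.0
    return {
        # Number of pauses per minute of audio.
        "pause_rate": count / minutes,
        # Mean pause duration in seconds; 0.0 when no pauses were found.
        "pause_meandur": sum(durations) / count if count else 0.0,
        # Assumption: time-based percentage rather than a frame count.
        "silence_ratio": 100.0 * sum(durations) / total_duration_s,
    }


# Made-up pauses in a 30-second recording.
example = summarize_pauses([(1.0, 1.5), (4.0, 5.0), (8.0, 8.5)], total_duration_s=30.0)
```

With these made-up inputs there are three pauses totalling 2 seconds in half a minute of audio, so pause_rate comes out at 6 pauses per minute.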

4.3 – summary

Type pandas.DataFrame
Description final output of all vocal acoustic measures calculated from the input audio file

The data frame is the transpose of the table below:

f0_mean
f0_stdev
f0_min
f0_max
f0_range
f1_mean
f1_stdev
f1_min
f1_max
f1_range
...
loudness_mean
loudness_stdev
loudness_min
loudness_max
loudness_range
hnr
jitter
jitter_abs
jitter_rap
jitter_ppq5
jitter_ddp
shimmer
shimmer_db
shimmer_apq3
shimmer_apq5
shimmer_apq11
shimmer_dda
gne_ratio
pause_meandur
pause_rate
silence_ratio

5 – Example use

Here, we use this function to process a sample audio file included in the repository.

import openwillis as ow

framewise, pauses, summary = ow.vocal_acoustics(audio_path = 'data/trim.wav')
framewise.head(2)
frame f0 loudness hnr form1freq form2freq form3freq form4freq
0 107.72 49.71 7.88 439.77 1720.29 2662.75 4328.91
1 105.88 48.59 9.10 376.80 2513.84 2667.70 4105.55

6 – Dependencies

Below are dependencies specific to calculation of this measure.

Dependency License Justification
Parselmouth GPL 3.0 License Python interface to the Praat software, a long-trusted source for measurement methods in vocal acoustics
Pydub MIT License Open-source and accurate methods for analysis of audio files; used here to separate speech from silence in audio files