Vocal acoustics v2.0
| Date completed | Release where first appeared | Researcher / Developer |
| --- | --- | --- |
| March 19, 2024 | OpenWillis v2.1 | Vijay Yadav, Georgios Efstathiadis |
```python
import openwillis as ow

framewise, summary = ow.vocal_acoustics(audio_path = 'audio.wav', option = 'simple')
```
Calculates a set of vocal acoustic features from the input audio (only .wav and .mp3 files are supported).
- First, a set of vocal acoustic properties with framewise values is calculated through Parselmouth and saved in `framewise`. This includes the following variables:
- Fundamental frequency (f0), measured in Hertz
- Formant frequencies 1 through 4 (f1, f2, f3, and f4), measured in Hertz
- Loudness, measured in decibels
- Harmonics-to-noise ratio (hnr)
- In the summary output, the mean, standard deviation, and range of each of the variables from the first step are saved.
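As a rough illustration (not OpenWillis's implementation), the summary statistics for one framewise variable could be derived like this, using made-up f0 values:

```python
import statistics

# Hypothetical framewise f0 values in Hz, standing in for one
# column of the framewise output.
f0 = [107.7, 105.9, 110.2, 108.4, 109.1]

f0_mean = statistics.mean(f0)    # mean of the framewise values
f0_std = statistics.stdev(f0)    # sample standard deviation
f0_range = max(f0) - min(f0)     # range (max minus min)
```

The same three statistics would be computed for each of the framewise variables listed above.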
- The pause information is compiled into three variables, also saved in summary:
- Number of pauses per minute (`pause_rate`)
- Mean duration of pauses (`pause_meandur`), measured in seconds
- Silence ratio (`silence_ratio`), the percentage of frames with no voice detected
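Given a list of detected silence intervals, the three pause variables follow directly from their definitions. A minimal sketch with toy values (the interval list, total duration, and detection step are assumptions, not OpenWillis internals):

```python
# Hypothetical silence intervals as (start, end) pairs in seconds,
# e.g. as produced by a silence detector such as pydub's.
silences = [(1.2, 1.8), (4.0, 4.5), (7.3, 8.1)]
total_duration = 10.0  # assumed total audio length in seconds

pause_durations = [end - start for start, end in silences]

pause_rate = len(silences) / (total_duration / 60)            # pauses per minute
pause_meandur = sum(pause_durations) / len(pause_durations)   # mean pause duration, s
silence_ratio = sum(pause_durations) / total_duration * 100   # % of audio with no voice
```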
- Parselmouth is used to calculate an additional set of variables that pertain to the entire audio file rather than individual frames. These are saved directly in the summary output:
- Jitter (absolute)
- Jitter (rap)
- Jitter (ppq5)
- Jitter (ddp)
- Shimmer (absolute)
- Shimmer (db)
- Shimmer (apq3)
- Shimmer (apq5)
- Shimmer (apq11)
- Shimmer (dda)
- Glottal-to-noise excitation ratio
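Parselmouth obtains these measures through Praat's voice analysis. For intuition, the simplified textbook definitions of absolute jitter, local jitter, and local shimmer can be sketched as follows (toy period and amplitude values; this is not the exact Praat algorithm, which adds period-validity constraints):

```python
# Hypothetical consecutive glottal periods (s) and peak amplitudes.
periods = [0.0100, 0.0103, 0.0098, 0.0101, 0.0099]
amps = [0.82, 0.80, 0.85, 0.81, 0.83]

def mean_abs_diff(xs):
    """Mean absolute difference between consecutive values."""
    return sum(abs(a - b) for a, b in zip(xs[1:], xs)) / (len(xs) - 1)

jitter_absolute = mean_abs_diff(periods)                         # in seconds
jitter_local = jitter_absolute / (sum(periods) / len(periods))   # relative to mean period
shimmer_local = mean_abs_diff(amps) / (sum(amps) / len(amps))    # relative to mean amplitude
```

The rap/ppq5/ddp jitter variants and apq3/apq5/apq11/dda shimmer variants average over longer neighborhoods of periods in the same spirit.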
- Additionally, the following measures derived from the outputs above are saved in the summary output (Kovac et al., 2023):
- Pitch variation (relF0SD), defined as the standard deviation of the F0 contour of voiced segments longer than 100 ms, relative to its mean
- Speech loudness variation (relSE0SD), defined as the standard deviation of the energy of voiced segments longer than 100 ms, relative to its mean
- SPIR, the number of pauses (longer than 50 ms and shorter than 2 s) relative to total speech time
- DurMED, the median duration of silences longer than 50 ms and shorter than 2 s
- DurMAD, the median absolute deviation of the durations of silences longer than 50 ms and shorter than 2 s
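The Kovac et al. (2023) definitions above can be sketched on toy data as follows (the segment and pause lists and the total speech time are made up for illustration):

```python
import statistics

# Hypothetical voiced segments as (duration_s, framewise f0 values)
# and hypothetical pause durations in seconds.
voiced = [(0.25, [110.0, 112.0, 108.0]),
          (0.05, [90.0]),                 # too short: excluded (< 100 ms)
          (0.30, [115.0, 111.0, 113.0])]
pauses = [0.08, 1.5, 0.03, 2.5]
speech_time = 12.0  # assumed total speech time in seconds

# relF0SD: SD of the F0 contour of voiced segments > 100 ms, relative to its mean
f0 = [v for dur, vals in voiced if dur > 0.100 for v in vals]
relF0SD = statistics.stdev(f0) / statistics.mean(f0)

# SPIR: pauses between 50 ms and 2 s, relative to total speech time
spir = sum(1 for p in pauses if 0.050 < p < 2.0) / speech_time
```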
- The mean and variance of cepstral features are also calculated and saved in the summary output (Silva et al., 2021; Jiang et al., 2017; Dumpala et al., 2021; Berardi et al., 2023):
- The first 14 Mel-Frequency Cepstral Coefficients (MFCCs), a low-dimensional representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency
- Cepstral Peak Prominence (CPP), a measure of breathiness and overall dysphonia
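The nonlinear mel scale underlying the MFCCs compresses frequency resolution at high frequencies to mimic human pitch perception. A common (HTK-style) formulation of the mapping is shown below; note that librosa's default mel filterbank uses the slightly different Slaney variant:

```python
import math

def hz_to_mel(f_hz):
    # HTK-style mel scale: approximately linear below ~1 kHz,
    # logarithmic above it.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10 ** (m / 2595.0) - 1.0)
```

In MFCC extraction, triangular filters spaced evenly on this mel axis are applied to the power spectrum before the log and cosine transform.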
- Vocal tremor statistics, calculated from tremor.praat, are also saved in the summary output (these are meaningful only when the input is a sustained vowel phonation). They include the following:
- Frequency contour magnitude (FCoM)
- (Maximum) frequency tremor cyclicality (FTrC)
- Number of frequency modulations above thresholds (FMoN)
- (Strongest) frequency tremor frequency (FTrF)
- Frequency tremor intensity index (FTrI) at FTrF
- Frequency tremor power index (FTrP) at FTrF
- Frequency tremor cyclicality intensity product (FTrCIP) at FTrF
- Frequency tremor product sum (FTrPS)
- Frequency contour harmonicity-to-noise ratio (FCoHNR)
- Amplitude contour magnitude (ACoM)
- (Maximum) amplitude tremor cyclicality (ATrC)
- Number of amplitude modulations above thresholds (AMoN)
- (Strongest) amplitude tremor frequency (ATrF)
- Amplitude tremor intensity index (ATrI)
- Amplitude tremor power index (ATrP)
- Amplitude tremor cyclicality intensity product (ATrCIP)
- Amplitude tremor product sum (ATrPS)
- Amplitude contour harmonicity-to-noise ratio (ACoHNR)
- Glottal features are also calculated and saved in the summary output (note that they take a long time to compute). These include:
- Average Harmonic Richness Factor (HRF), the ratio of the sum of harmonic amplitudes to the amplitude of the fundamental frequency
- Variability of the Harmonic Richness Factor (HRF)
- Average Normalized Amplitude Quotient (NAQ) for consecutive glottal cycles, the ratio of the amplitude quotient to the duration of the glottal cycle
- Variability of the Normalized Amplitude Quotient (NAQ) for consecutive glottal cycles
- Average Opening Quotient (OQ) for consecutive glottal cycles, the ratio of opening-phase duration to glottal cycle duration
- Variability of the Opening Quotient (OQ) for consecutive glottal cycles
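The per-cycle ratios behind NAQ and OQ are simple to state. A minimal sketch on one hypothetical glottal cycle (all values are made up; the actual glottal-flow estimation is done by DisVoice):

```python
# Hypothetical values for a single glottal cycle.
T0 = 0.010       # glottal cycle duration in seconds
t_open = 0.006   # duration of the opening phase in seconds
aq = 0.0004      # amplitude quotient for the cycle, in seconds

naq = aq / T0      # Normalized Amplitude Quotient
oq = t_open / T0   # Opening Quotient
```

The summary output reports the mean and variability of these ratios over all consecutive glottal cycles.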
`audio_path`

| Type | Description |
| --- | --- |
| str | Path to the audio file; only .wav and .mp3 files are supported |

`option`

| Type | Description |
| --- | --- |
| str | Determines which measures are calculated; can be 'simple', 'advanced', or 'tremor'. Default is 'simple' |
| Option | List of variables calculated |
| --- | --- |
| simple | Parselmouth measures, pause measures, cepstral measures |
| tremor | Simple measures + tremor measures |
| advanced | Simple measures + tremor measures + glottal measures |
`framewise`

| Type | Description |
| --- | --- |
| pd.DataFrame | Framewise output of acoustic properties that can be calculated for individual frames; columns represent variables, rows represent frames |
What the data frame looks like:
| frame | f0 | f1 | f2 | f3 | f4 | loudness | hnr |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | | | | | | | |
| 1 | | | | | | | |
| ... | | | | | | | |
`summary`

| Type | Description |
| --- | --- |
| pd.DataFrame | Final output of all vocal acoustic measures calculated from the input audio file |
Here, we use this function to process a sample audio file included in the repository.
```python
import openwillis as ow

framewise, summary = ow.vocal_acoustics(audio_path = 'data/trim.wav')
framewise.head(2)
```
| frame | f0 | loudness | hnr | form1freq | form2freq | form3freq | form4freq |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 107.72 | 49.71 | 7.88 | 439.77 | 1720.29 | 2662.75 | 4328.91 |
| 1 | 105.88 | 48.59 | 9.10 | 376.80 | 2513.84 | 2667.70 | 4105.55 |
Below are dependencies specific to calculation of this measure.
| Dependency | License | Justification |
| --- | --- | --- |
| Parselmouth | GPL 3.0 License | Python implementation of the Praat software library, a long-trusted source of measurement methods in vocal acoustics |
| Pydub | MIT License | Open-source and accurate methods for analysis of audio files; used to parse speech versus silence in audio files |
| DisVoice | MIT License | Only the glottal module is used, for calculation of advanced glottal features (HRF, NAQ, QOQ); it is a Python implementation of the widely used MATLAB COVAREP project |
| pysptk | MIT License | A Python wrapper for the Speech Signal Processing Toolkit (SPTK), used in DisVoice feature calculations |
| librosa | ISC License | A package for music and audio analysis, used in cepstral variable calculation, specifically extraction of MFCCs |
Berardi, M. L., Brosch, K., Pfarr, J., Schneider, K., Sültmann, A., Thomas-Odenthal, F., Wroblewski, A., Usemann, P., Philipsen, A., Dannlowski, U., Nenadić, I., Kircher, T., Krug, A., Stein, F., & Dietrich, M. (2023). Relative importance of speech and voice features in the classification of schizophrenia and depression. Translational Psychiatry, 13(1). https://doi.org/10.1038/s41398-023-02594-0
Dumpala, S. H., Rempel, S., Dikaios, K., Sajjadian, M., Uher, R., & Oore, S. (2021). Estimating Severity of Depression From Acoustic Features and Embeddings of Natural Speech. IEEE ICASSP 2021. https://doi.org/10.1109/icassp39728.2021.9414129
Jiang, H., Hu, B., Liu, Z., Yan, L., Wang, T., Liu, F., Kang, H., & Li, X. (2017). Investigation of different speech types and emotions for detecting depression using different classifiers. Speech Communication, 90, 39–46. https://doi.org/10.1016/j.specom.2017.04.001
Kovac, D., Mekyska, J., Brabenec, L., Košťálová, M., & Rektorová, I. (2023). Research on passive assessment of Parkinson’s Disease utilising speech biomarkers. In Pervasive Computing Technologies for Healthcare (pp. 259–273). https://doi.org/10.1007/978-3-031-34586-9_18
Silva, W. J., Lopes, L. W., Galdino, M. K. C., & Almeida, A. A. (2021). Voice Acoustic Parameters as Predictors of Depression. Journal of Voice. https://doi.org/10.1016/j.jvoice.2021.06.018
OpenWillis was developed by a small team of clinicians, scientists, and engineers based in Brooklyn, NY.