# Extraction of phonation features from audio files

Compute phonation features from sustained vowels and continuous speech.

For continuous speech, the features are computed over voiced segments.

Seven descriptors are computed:
1. First derivative of the fundamental frequency
2. Second derivative of the fundamental frequency
3. Jitter
4. Shimmer
5. Amplitude perturbation quotient
6. Pitch perturbation quotient
7. Logarithmic energy
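To make the first four descriptors concrete, here is a minimal sketch based on common textbook formulations. It is not taken from the library, whose exact formulas may differ. The names `differences` and `local_perturbation` and the input values are hypothetical, and the perturbation measure is applied directly to F0 values for brevity.

```python
from statistics import mean


def differences(values):
    """First-order differences between consecutive values."""
    return [b - a for a, b in zip(values, values[1:])]


def local_perturbation(values):
    """Mean absolute consecutive difference, as a percentage of the mean value."""
    return 100.0 * mean(abs(d) for d in differences(values)) / mean(values)


# Hypothetical F0 contour (Hz) and peak amplitudes, one value per frame.
f0 = [120.0, 122.0, 121.0, 125.0, 124.0]
amplitudes = [0.50, 0.52, 0.49, 0.51, 0.50]

df0 = differences(f0)                     # first derivative of F0
ddf0 = differences(df0)                   # second derivative of F0
jitter = local_perturbation(f0)           # frequency perturbation, in percent
shimmer = local_perturbation(amplitudes)  # amplitude perturbation, in percent
```

Each differencing step shortens the sequence by one value, which is one reason per-frame descriptors need to be aligned before they can be stacked into a matrix.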

Static or dynamic matrices can be computed:

The static matrix contains 29 features: (seven descriptors) x (4 functionals: mean, standard deviation, skewness, kurtosis) = 28 values, plus the degree of unvoiced.
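The arithmetic behind the 29 features can be sketched as follows. This is an illustration only: it uses population moments and excess kurtosis, which may not match the estimators used by the library, and the helpers `functionals` and `static_vector`, the sample values, and the position of the degree of unvoiced at the end of the vector are all assumptions.

```python
from math import sqrt


def functionals(values):
    """Return (mean, std, skewness, kurtosis) from population moments."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    std = sqrt(m2)
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2 - 3.0  # excess kurtosis (0 for a Gaussian)
    return mu, std, skewness, kurtosis


def static_vector(descriptors, degree_unvoiced):
    """Flatten per-frame descriptors into one vector, grouped by functional."""
    stats = [functionals(values) for values in descriptors.values()]
    # All means first, then all stds, then skewness, then kurtosis.
    vector = [s[k] for k in range(4) for s in stats]
    vector.append(degree_unvoiced)  # assumed position: last
    return vector


names = ["DF0", "DDF0", "Jitter", "Shimmer", "apq", "ppq", "logE"]
# Hypothetical per-frame values, only to exercise the functions.
descriptors = {name: [float(i + k) ** 2 for k in range(5)]
               for i, name in enumerate(names)}
vector = static_vector(descriptors, degree_unvoiced=0.25)
print(len(vector))
```

With seven descriptors the vector has 7 x 4 + 1 = 29 entries. A descriptor with zero variance would cause a division by zero in this sketch, so real code would need to guard against it.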

The dynamic matrix contains the seven descriptors computed over frames of 40 ms, one row per frame.
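As a rough illustration of frame-based processing, the sketch below splits a signal into 40 ms frames and computes a log energy per frame. The 20 ms hop, the 8 kHz sample rate, the energy definition (10 log10 of the mean squared amplitude), and the helper names are assumptions made for this example. They are not confirmed details of the library.

```python
from math import log10


def split_into_frames(samples, sample_rate, frame_ms=40, hop_ms=20):
    """Split a signal into fixed-length frames (hop_ms is an assumed value)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop_len)]


def log_energy(frame, eps=1e-10):
    """Log of the mean squared amplitude in dB; eps avoids log10(0)."""
    energy = sum(s * s for s in frame) / len(frame)
    return 10.0 * log10(energy + eps)


sample_rate = 8000             # assumed sample rate in Hz
samples = [0.1] * sample_rate  # one second of a constant, synthetic signal
frames = split_into_frames(samples, sample_rate)
energies = [log_energy(frame) for frame in frames]
print(len(frames), len(frames[0]))
```

At 8 kHz a 40 ms frame holds 320 samples, so one second of audio yields 49 frames with a 20 ms hop.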

Notes:

1. For dynamic features, the first 11 frames of each recording are discarded so that the APQ and PPQ descriptors can be stacked with the remaining ones.
2. The fundamental frequency is computed with the RAPT algorithm. To use the PRAAT method instead, change the "self.pitch_method" variable in the class constructor.
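The alignment described in note 1 can be sketched as follows. The sketch assumes that the APQ and PPQ sequences are 11 values shorter than the other descriptors, which is an interpretation of the note and not a confirmed detail of the library. The function `stack_rows` and the sample values are hypothetical.

```python
def stack_rows(full_length, shorter):
    """Trim the leading frames of longer sequences so all columns align."""
    n_frames = min(len(values) for values in shorter.values())
    columns = {}
    for name, values in {**full_length, **shorter}.items():
        # Keep only the last n_frames values of each descriptor.
        columns[name] = values[len(values) - n_frames:]
    names = list(columns)
    rows = [[columns[name][i] for name in names] for i in range(n_frames)]
    return names, rows


# Hypothetical descriptors: 20 frames for DF0 and logE, 9 for apq and ppq.
full_length = {"DF0": [float(i) for i in range(20)],
               "logE": [-float(i) for i in range(20)]}
shorter = {"apq": [1.0] * 9, "ppq": [2.0] * 9}

names, rows = stack_rows(full_length, shorter)
print(names, len(rows))
```

Here the first 11 values of `DF0` and `logE` are dropped, so every row of the stacked matrix refers to the same frame.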


In [1]:
import os
from tempfile import TemporaryDirectory

from disvoice import Phonation

In [2]:
audio_path = os.path.join(os.environ['PROJECT_DIR'], 'audios', 'OSR_us_000_0030_8k.wav')

In [3]:
import logging

# Suppress matplotlib font warnings (the logger is addressed by name, so no matplotlib import is needed)
logging.getLogger('matplotlib.font_manager').setLevel(logging.ERROR)

with TemporaryDirectory() as temp_dir:
    phonation = Phonation(temp_dir=temp_dir)
    features_static = phonation.extract_features_file(audio_path, static=True, plots=False, fmt="dataframe")
    features_dynamic = phonation.extract_features_file(audio_path, static=False, plots=False, fmt="dataframe")

In [4]:
features_static

Unnamed: 0,avg DF0,avg DDF0,avg Jitter,avg Shimmer,avg apq,avg ppq,avg logE,std DF0,std DDF0,std Jitter,...,skewness apq,skewness ppq,skewness logE,kurtosis DF0,kurtosis DDF0,kurtosis Jitter,kurtosis Shimmer,kurtosis apq,kurtosis ppq,kurtosis logE
0,-0.058501,-0.017433,3.440231,5.790716,30.707316,3.781551,-22.551062,12.281038,18.976227,5.208262,...,1.474444,2.772781,-0.895794,12.98495,11.139913,17.14917,16.477441,3.508748,9.046955,0.573518


In [5]:
features_dynamic

Unnamed: 0,DF0,DDF0,Jitter,Shimmer,apq,ppq,logE
0,17.878860,22.589783,2.746126,0.000000,1.596087,6.582022,-13.577238
1,-5.400208,-23.279068,2.293697,15.869283,1.835650,3.730968,-12.445401
2,-4.510513,0.889694,2.847681,5.849544,49.164437,0.958094,-11.493121
3,5.599915,10.110428,2.127637,1.758244,74.955984,6.678155,-9.738120
4,4.183960,-1.415955,2.762840,0.000000,12.045754,6.566669,-8.584700
...,...,...,...,...,...,...,...
904,4.407417,15.093826,0.634683,1.245423,36.501317,3.757366,-27.618290
905,-1.248093,-5.655510,3.074614,0.202874,27.058643,5.665347,-30.930276
906,-6.046173,-4.798080,9.983157,3.786988,75.378467,0.060009,-33.183852
907,19.631699,25.677872,2.842269,0.000000,80.002849,6.430687,-28.863415
