# Extraction of articulation features from audio files

Compute articulation features from continuous speech.

122 descriptors are computed:

- 1 - 22. Bark band energies in onset transitions (22 BBE onset).
- 23 - 34. Mel-frequency cepstral coefficients in onset transitions (12 MFCC onset).
- 35 - 46. First derivative of the MFCCs in onset transitions (12 DMFCC onset).
- 47 - 58. Second derivative of the MFCCs in onset transitions (12 DDMFCC onset).
- 59 - 80. Bark band energies in offset transitions (22 BBE offset).
- 81 - 92. MFCCs in offset transitions (12 MFCC offset).
- 93 - 104. First derivative of the MFCCs in offset transitions (12 DMFCC offset).
- 105 - 116. Second derivative of the MFCCs in offset transitions (12 DDMFCC offset).
- 117. First formant frequency.
- 118. First derivative of the first formant frequency.
- 119. Second derivative of the first formant frequency.
- 120. Second formant frequency.
- 121. First derivative of the second formant frequency.
- 122. Second derivative of the second formant frequency.
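
The numbering above can be turned into a list of column labels. The following is a minimal sketch rather than the library's own code: the `BBEon_`, `DDMFCCon_`, `DDMFCCoff_`, `F1` ... `DDF2` labels match the DataFrame output shown later in this notebook, while the remaining labels (`MFCCon_`, `DMFCCon_`, `BBEoff_`, `MFCCoff_`, `DMFCCoff_`) are assumed to follow the same pattern. The helper name `descriptor_names` is hypothetical.

```python
def descriptor_names():
    """Return 122 descriptor labels in the order of the list above.

    Label spelling is partly an assumption: only some of these names
    are visible in the DataFrame output of this notebook.
    """
    names = []
    for segment in ("on", "off"):  # onset block first, then offset block
        # 22 Bark band energies per segment type
        names += [f"BBE{segment}_{i}" for i in range(1, 23)]
        # 12 MFCCs, then their first and second derivatives
        for prefix in ("MFCC", "DMFCC", "DDMFCC"):
            names += [f"{prefix}{segment}_{i}" for i in range(1, 13)]
    # Formant frequencies and their derivatives (descriptors 117 - 122)
    names += ["F1", "DF1", "DDF1", "F2", "DF2", "DDF2"]
    return names


names = descriptor_names()
print(len(names))            # 122
print(names[0], names[58])   # BBEon_1 BBEoff_1
```

Descriptor number `k` in the list corresponds to `names[k - 1]`, so descriptor 117 is `names[116]`.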

In addition, static (computed over the whole utterance) or dynamic (computed at frame level) features can be extracted:

- The static feature vector has 488 features: 122 descriptors x 4 functionals (mean, standard deviation, skewness, kurtosis).

- The dynamic matrix contains 58 descriptors (22 BBE, 12 MFCC, 12 DMFCC, 12 DDMFCC) computed on 40 ms frames of the onset segments.
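
To make the 122 x 4 = 488 arithmetic of the static vector concrete, here is a minimal NumPy sketch that collapses a frame-level matrix into one vector of functionals. It is not the library's implementation: the function name `static_functionals` is hypothetical, the functional-major ordering (all means, then all standard deviations, and so on) is inferred from the column order of the static DataFrame shown later, and the use of population statistics and excess kurtosis is an assumption. The input is synthetic placeholder data, not real features.

```python
import numpy as np


def static_functionals(frames):
    """Collapse a (n_frames, n_descriptors) matrix into one static vector.

    Returns [means, stds, skewnesses, kurtoses] concatenated, so the
    output length is 4 * n_descriptors.
    """
    frames = np.asarray(frames, dtype=float)
    mean = frames.mean(axis=0)
    std = frames.std(axis=0)  # population standard deviation (assumption)
    # Avoid division by zero for constant columns; their skewness is
    # then 0 and their excess kurtosis is -3.
    safe_std = np.where(std > 0, std, 1.0)
    z = (frames - mean) / safe_std
    skewness = (z ** 3).mean(axis=0)
    kurtosis = (z ** 4).mean(axis=0) - 3.0  # excess kurtosis (assumption)
    return np.concatenate([mean, std, skewness, kurtosis])


# Synthetic placeholder: 200 frames x 122 descriptors
rng = np.random.default_rng(0)
demo = rng.normal(size=(200, 122))
static_vector = static_functionals(demo)
print(static_vector.shape)  # (488,)
```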

The first two frames of each recording are excluded from the dynamic analysis so that the first and second derivatives of the MFCCs can be stacked alongside them.
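
The reason two frames are lost can be sketched as follows, assuming the derivatives are plain frame-to-frame differences (the library may compute them differently). Each differencing step shortens the sequence by one frame, so the second derivative has two frames fewer than the MFCCs, and the other blocks must be trimmed to match before stacking. The function name `stack_with_derivatives` and the toy inputs are illustrative only.

```python
import numpy as np


def stack_with_derivatives(bbe, mfcc):
    """Stack BBE, MFCC, DMFCC and DDMFCC into one (n_frames - 2, 58) matrix.

    bbe:  (n_frames, 22) Bark band energies
    mfcc: (n_frames, 12) MFCCs
    Derivatives are first-order differences along time (assumption).
    """
    bbe = np.asarray(bbe, dtype=float)
    mfcc = np.asarray(mfcc, dtype=float)
    d_mfcc = np.diff(mfcc, axis=0)     # n_frames - 1 rows
    dd_mfcc = np.diff(d_mfcc, axis=0)  # n_frames - 2 rows
    # Drop the leading frames so that every block has n_frames - 2 rows
    return np.hstack([bbe[2:], mfcc[2:], d_mfcc[1:], dd_mfcc])


# Toy input: 10 frames, every MFCC column equal to t**2
t = np.arange(10, dtype=float)[:, None]
toy_mfcc = np.repeat(t ** 2, 12, axis=1)
toy_bbe = np.zeros((10, 22))
stacked = stack_with_derivatives(toy_bbe, toy_mfcc)
print(stacked.shape)  # (8, 58)
```

With ten input frames the stacked matrix has eight rows and 22 + 12 + 12 + 12 = 58 columns, which matches the descriptor count of the dynamic matrix.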

#### Notes:
1. The fundamental frequency is computed using the PRAAT algorithm. To use the RAPT method, change the "self.pitch method" variable in the class constructor.

2. The formant frequencies are computed using Praat.


In [1]:
import os
from tempfile import TemporaryDirectory

from disvoice import Articulation

################################################################################
###          (please add 'export KALDI_ROOT=<your_path>' in your $HOME/.profile)
###          (or run as: KALDI_ROOT=<your_path> python <your_script>.py)
################################################################################


In [2]:
audio_path = os.path.join(os.environ['PROJECT_DIR'], 'audios', 'OSR_us_000_0030_8k.wav')

In [3]:
import logging
import matplotlib.font_manager as fm

# Suppress font warnings
logging.getLogger('matplotlib.font_manager').setLevel(logging.ERROR)

with TemporaryDirectory() as temp_dir:
    articulation = Articulation(temp_dir=temp_dir)
    features_static = articulation.extract_features_file(audio_path, static=True, plots=False, fmt="dataframe")
    features_dynamic = articulation.extract_features_file(audio_path, static=False, plots=False, fmt="dataframe")

In [4]:
features_static

| | avg BBEon_1 | avg BBEon_2 | avg BBEon_3 | avg BBEon_4 | avg BBEon_5 | avg BBEon_6 | avg BBEon_7 | avg BBEon_8 | avg BBEon_9 | avg BBEon_10 | ... | kurtosis DDMFCCoff_9 | kurtosis DDMFCCoff_10 | kurtosis DDMFCCoff_11 | kurtosis DDMFCCoff_12 | kurtosis F1 | kurtosis DF1 | kurtosis DDF1 | kurtosis F2 | kurtosis DF2 | kurtosis DDF2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -2.542912 | -2.208225 | -2.422675 | -2.995519 | -3.402027 | -4.161382 | -4.120139 | -4.162634 | -4.207636 | -4.278871 | ... | 1.524634 | 3.324434 | 2.113409 | 0.605703 | 1.499489 | 4.145998 | 3.999157 | 0.463639 | 4.521872 | 4.500206 |


In [5]:
features_dynamic

| | BBEon_1 | BBEon_2 | BBEon_3 | BBEon_4 | BBEon_5 | BBEon_6 | BBEon_7 | BBEon_8 | BBEon_9 | BBEon_10 | ... | DDMFCCon_3 | DDMFCCon_4 | DDMFCCon_5 | DDMFCCon_6 | DDMFCCon_7 | DDMFCCon_8 | DDMFCCon_9 | DDMFCCon_10 | DDMFCCon_11 | DDMFCCon_12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -1.816360 | -1.120524 | -1.814429 | -2.805558 | -2.125560 | -2.388739 | -3.270600 | -2.271640 | -2.539541 | -3.127944 | ... | 10.082840 | -3.127076 | -1.461429 | 1.390143 | -4.218781 | -8.131559 | -3.765467 | 5.525076 | -4.887705 | -9.422140 |
| 1 | -3.547371 | -2.770981 | -3.032056 | -3.709576 | -3.908765 | -5.717900 | -5.603095 | -5.416730 | -4.978442 | -5.604150 | ... | -15.662475 | 13.217981 | -1.917713 | -6.630940 | 13.217208 | 10.508193 | 16.838117 | -19.275331 | 20.171197 | 0.049376 |
| 2 | -3.479554 | -3.465163 | -3.641804 | -3.775019 | -4.507940 | -5.827165 | -4.654594 | -4.930025 | -4.366652 | -4.984956 | ... | 8.031270 | -7.717962 | 1.276864 | 3.169509 | -5.079582 | -7.747613 | -7.256674 | 11.010452 | -12.506819 | 1.008664 |
| 3 | -1.796232 | -0.677130 | -1.330062 | -3.130833 | -3.793016 | -4.173329 | -4.195295 | -3.784495 | -3.731152 | -4.313415 | ... | 2.785947 | -3.155843 | -1.491007 | -5.843211 | 5.179020 | -1.676895 | -0.504131 | 5.417559 | 0.575893 | 1.415070 |
| 4 | -2.001505 | -2.807916 | -2.590931 | -3.214853 | -3.980035 | -4.405516 | -4.475764 | -4.514409 | -3.449280 | -3.348848 | ... | -7.549854 | 6.859757 | 3.113630 | 11.111005 | -9.409234 | 3.025913 | -0.944906 | -5.925921 | -7.881275 | 4.005316 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 170 | -3.448896 | -2.078614 | -1.990119 | -3.199511 | -3.697083 | -4.093710 | -4.277461 | -4.641791 | -4.301754 | -4.591862 | ... | 8.510786 | -5.771531 | 1.922919 | -2.320703 | 6.083039 | -6.871660 | 2.221071 | 2.624397 | -2.163539 | -1.878584 |
| 171 | -0.831051 | 0.167061 | -0.488164 | -0.839675 | -0.969961 | -1.811701 | -2.190678 | -1.982427 | -2.725100 | -2.490531 | ... | 0.701102 | -4.583303 | -1.802910 | -1.161929 | -7.354570 | 5.806035 | -3.570228 | 1.059499 | 5.593093 | -5.238790 |
| 172 | -2.599252 | -2.594206 | -3.786128 | -3.282752 | -3.173048 | -4.094819 | -3.550626 | -2.775175 | -2.162926 | -0.552564 | ... | -10.507345 | 17.001978 | 8.054626 | -6.355431 | 7.351775 | 7.608827 | 6.811062 | -31.042956 | 17.587322 | 7.492774 |
| 173 | -4.153315 | -3.720331 | -3.010083 | -2.444595 | -2.622359 | -3.687546 | -3.780620 | -2.980153 | -1.892596 | -1.229907 | ... | 5.225195 | -10.098891 | -5.066624 | 7.480690 | -4.571053 | -8.708182 | -1.800346 | 23.778432 | -18.343626 | -3.629535 |
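
Because the dynamic columns are named `<descriptor>_<index>`, they can be grouped by descriptor type before further analysis. The helper below is a small sketch, not part of the library: `group_columns` is a hypothetical name, and it only relies on the naming pattern visible in the output above.

```python
from collections import defaultdict


def group_columns(columns):
    """Group column labels such as 'BBEon_3' by the text before the last '_'.

    Labels without a numeric suffix (for example 'F1') are skipped.
    """
    groups = defaultdict(list)
    for name in columns:
        prefix, sep, index = str(name).rpartition("_")
        if sep and index.isdigit():
            groups[prefix].append(name)
    return dict(groups)


# A few labels taken from the output above
cols = ["BBEon_1", "BBEon_2", "DDMFCCon_11", "DDMFCCon_12"]
groups = group_columns(cols)
print(groups["BBEon"])  # ['BBEon_1', 'BBEon_2']
```

Assuming `features_dynamic` is a pandas DataFrame, as the `fmt="dataframe"` argument suggests, `features_dynamic[group_columns(features_dynamic.columns)["BBEon"]]` would then select only the Bark band energy columns.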
