# Extraction of glottal features from audio files

Compute features based on the glottal source reconstruction from sustained vowels.
Nine descriptors are computed:

1. Variability of time between consecutive glottal closure instants (GCI)
2. Average opening quotient (OQ) for consecutive glottal cycles: ratio of the opening phase duration to the duration of the glottal cycle
3. Variability of opening quotient (OQ) for consecutive glottal cycles: ratio of the opening phase duration to the duration of the glottal cycle
4. Average normalized amplitude quotient (NAQ) for consecutive glottal cycles: ratio of the amplitude quotient to the duration of the glottal cycle
5. Variability of normalized amplitude quotient (NAQ) for consecutive glottal cycles: ratio of the amplitude quotient to the duration of the glottal cycle
6. Average H1H2: difference between the first two harmonics of the glottal flow signal
7. Variability of H1H2: difference between the first two harmonics of the glottal flow signal
8. Average harmonic richness factor (HRF): ratio of the sum of the harmonic amplitudes to the amplitude of the fundamental frequency
9. Variability of HRF
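
As a rough illustration of descriptors 1 to 3, the sketch below derives cycle durations from GCI times and an opening quotient per cycle. The function name `cycle_descriptors`, its inputs, and the toy values are hypothetical, and "variability" is assumed to mean the standard deviation; how the library itself estimates GCIs and opening phases is not shown here.

```python
import numpy as np

def cycle_descriptors(gci_times, opening_durations):
    """Toy versions of descriptors 1-3 (illustrative, not the library code).

    gci_times: glottal closure instants in seconds, one per cycle boundary.
    opening_durations: opening phase duration in seconds for each cycle.
    """
    gci_times = np.asarray(gci_times, dtype=float)
    opening_durations = np.asarray(opening_durations, dtype=float)
    # Duration of each glottal cycle = time between consecutive GCIs
    cycle_durations = np.diff(gci_times)
    # Assumption: "variability" is measured as a standard deviation
    var_gci = np.std(cycle_durations)
    # Opening quotient per cycle: opening phase duration / cycle duration
    oq = opening_durations / cycle_durations
    return var_gci, np.mean(oq), np.std(oq)

# Four GCIs -> three cycles of 8 ms, 10 ms and 8 ms
gci = [0.000, 0.008, 0.018, 0.026]
opening = [0.004, 0.005, 0.004]
var_gci, avg_oq, std_oq = cycle_descriptors(gci, opening)
```

In this toy input every cycle is open for half of its duration, so the average OQ is 0.5 and its variability is essentially zero, while the GCI variability is non-zero because the middle cycle is longer.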

Static or dynamic matrices can be computed:

- The static matrix has 36 features: (9 descriptors) x (4 functionals: mean, std, skewness, kurtosis).
- The dynamic matrix contains the 9 descriptors computed for frames of 200 ms length.
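
The relation between the two matrices can be sketched with NumPy alone: each of the four functionals is applied column-wise to a dynamic matrix and the results are concatenated into one row. This is a minimal sketch, not the library's implementation; `static_functionals` is a hypothetical helper, the input is random toy data, and the use of population standard deviation and excess (Fisher) kurtosis is an assumption.

```python
import numpy as np

def static_functionals(dynamic):
    """Collapse a (n_frames, n_descriptors) matrix into one static row."""
    dynamic = np.asarray(dynamic, dtype=float)
    mean = dynamic.mean(axis=0)
    std = dynamic.std(axis=0)
    # Standardize each column, guarding against zero variance
    safe_std = np.where(std > 0, std, 1.0)
    z = (dynamic - mean) / safe_std
    skewness = (z ** 3).mean(axis=0)
    # Assumption: excess (Fisher) kurtosis, so a normal distribution gives 0
    kurtosis = (z ** 4).mean(axis=0) - 3.0
    # Order: all means, then all stds, then skewness, then kurtosis
    return np.hstack([mean, std, skewness, kurtosis]).reshape(1, -1)

rng = np.random.default_rng(0)
dynamic = rng.normal(size=(39, 9))   # 39 frames x 9 descriptors (toy data)
static = static_functionals(dynamic)
print(static.shape)  # (1, 36)
```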

In [11]:
# import sys
# # sys.path.append("../")
# print(sys.path)
import numpy as np
import pandas as pd
from glottal import Glottal

In [2]:
glottalf=Glottal()
file_audio="../audios/001_readtext_PCGITA.wav"

## Extract features and return them as a numpy array

In [20]:
features1=glottalf.extract_features_file(file_audio, static=True, plots=False, fmt="npy")

  f_spec=20*np.log10(np.abs(np.fft.fft(f_win, fs)))


In [21]:
features2=glottalf.extract_features_file(file_audio, static=True, plots=False, fmt="npy", general=True)

used general


In [18]:
print(features1[features1 < 200])
print(features1)
print(features2)

[ 3.52637694e-03  5.16052014e-03  3.13140601e-03  2.61152193e-01
  2.39859677e-01  1.21485048e+01  9.66461320e+00  5.02069727e-03
  2.50342625e-03  9.96783527e-04  1.70564502e-01  3.16361774e-01
  4.79026846e+00  3.18451955e+00  4.25562992e+00 -1.53000098e-01
 -1.05363232e+00  3.04246886e+00  5.13925566e+00 -2.87619217e-01
 -4.03783454e-01  7.17631355e+00  7.21391412e+00  2.54393678e+01
 -6.19878305e-01  2.19882579e+00  1.60824027e+01  3.03795199e+01
 -3.92701594e-01  1.70611389e-01  4.99664115e+01  5.02081597e+01]
[[ 3.52637694e-03  5.16052014e-03  3.13140601e-03  2.61152193e-01
   2.39859677e-01  1.21485048e+01  9.66461320e+00  9.80000383e+02
   4.72589228e+03  5.02069727e-03  2.50342625e-03  9.96783527e-04
   1.70564502e-01  3.16361774e-01  4.79026846e+00  3.18451955e+00
   7.33095124e+03  3.07047921e+04  4.25562992e+00 -1.53000098e-01
  -1.05363232e+00  3.04246886e+00  5.13925566e+00 -2.87619217e-01
  -4.03783454e-01  7.17631355e+00  7.21391412e+00  2.54393678e+01
  -6.19878305e-01

In [15]:
f1 = pd.DataFrame(features1)
f2 = pd.DataFrame(features2)
f1.head()


Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,26,27,28,29,30,31,32,33,34,35
0,0.003526,0.005161,0.003131,0.261152,0.23986,12.148505,9.664613,980.000383,4725.892281,0.005021,...,7.213914,25.439368,-0.619878,2.198826,16.082403,30.37952,-0.392702,0.170611,49.966412,50.20816


In [16]:
f2.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,26,27,28,29,30,31,32,33,34,35
0,0.003534,0.00517,0.003135,0.263821,0.241072,12.120237,9.538392,286.20946,1768.781987,0.005022,...,6.880847,25.388567,-0.623173,2.212246,15.809674,30.54767,-0.455838,0.007379,43.616506,46.659228


In [4]:
print(len(features1))
print(features1.shape)

print(len(features2))
print(features2.shape)

1
(1, 36)
1
(1, 36)


In [None]:
features1 == features2

array([[False, False, False, False, False, False, False, False, False,
        False, False, False, False, False, False, False, False, False,
        False, False, False, False, False, False, False, False, False,
        False, False, False, False, False, False, False, False, False]])
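
Exact `==` on floating-point features reports `False` even when two values agree to several digits, so a tolerance-based comparison is usually more informative. The sketch below uses a handful of values copied from the `f1.head()` and `f2.head()` outputs; the 1% relative tolerance is an arbitrary choice for illustration.

```python
import numpy as np

# A few values copied from the two outputs shown above (columns 0, 1, 2, 7)
a = np.array([0.003526, 0.005161, 0.003131, 980.000383])
b = np.array([0.003534, 0.005170, 0.003135, 286.209460])

exact = a == b                       # bit-for-bit equality
close = np.isclose(a, b, rtol=0.01)  # equal within 1% relative tolerance
```

With this tolerance the first three features count as matching, while the fourth (column 7) clearly differs between the two extraction modes.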

In [19]:
# Use one shared mask so both arrays keep the same feature positions
mask = (features1 < 100) & (features2 < 100)
mse = np.mean((features1[mask] - features2[mask]) ** 2)
print(f"Mean Squared Error: {mse}")

Mean Squared Error: 1.6888122548301414
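
A single MSE value does not show which features disagree, and absolute errors favour features with large magnitudes. A per-feature relative difference is one possible complement; the sketch below is an illustration using values copied from the `f1.head()` and `f2.head()` outputs, and the choice of normalizing by the larger magnitude is an assumption, not part of the library.

```python
import numpy as np

# Values copied from the f1.head() and f2.head() outputs (columns 0, 5, 7, 35)
f1 = np.array([0.003526, 12.148505, 980.000383, 50.208160])
f2 = np.array([0.003534, 12.120237, 286.209460, 46.659228])

# Relative difference, scaled by the larger magnitude of each pair
denom = np.maximum(np.abs(f1), np.abs(f2))
rel_diff = np.abs(f1 - f2) / denom

# Index (within this small sample) of the feature that disagrees the most
worst = int(np.argmax(rel_diff))
```

Here the third sampled feature (column 7) stands out, with a relative difference above 0.5, while the first two differ by well under 1%.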


## Extract static features and return them as a dataframe 

In [6]:
features1=glottalf.extract_features_file(file_audio, static=True, plots=False, fmt="csv")
print(features1)

   global avg var GCI  global avg avg NAQ  global avg std NAQ  \
0             0.00105            0.004712            0.002313   

   global avg avg QOQ  global avg std QOQ  global avg avg H1H2  \
0            0.483546            0.248339            18.348063   

   global avg std H1H2  global avg avg HRF  global avg std HRF  \
0            12.408972           56.071506          195.619791   

   global std var GCI  ...  global skewness std HRF  global kurtosis var GCI  \
0            0.000279  ...                 2.128254                12.084682   

   global kurtosis avg NAQ  global kurtosis std NAQ  global kurtosis avg QOQ  \
0                -0.206489                -0.599053                  0.04543   

   global kurtosis std QOQ  global kurtosis avg H1H2  \
0                -0.260547                 -0.494493   

   global kurtosis std H1H2  global kurtosis avg HRF  global kurtosis std HRF  
0                  1.435911                 2.482266                 4.523592  

[1 rows

## Extract dynamic features and return them as a dataframe

In [9]:
features1=glottalf.extract_features_file(file_audio, static=False, plots=False, fmt="csv")
features1.head()

| | var GCI | avg NAQ | std NAQ | avg QOQ | std QOQ | avg H1H2 | std H1H2 | avg HRF | std HRF |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.002405 | 0.004921 | 0.002326 | 0.458432 | 0.256464 | 24.431431 | 12.206051 | 44.903851 | 170.956369 |
| 1 | 0.000952 | 0.004901 | 0.002396 | 0.463102 | 0.257962 | 20.481655 | 12.524803 | 59.413526 | 128.012155 |
| 2 | 0.00086 | 0.004637 | 0.002754 | 0.417272 | 0.286215 | 20.46812 | 13.119944 | 30.793197 | 108.499991 |
| 3 | 0.001048 | 0.004448 | 0.00268 | 0.334249 | 0.228981 | 24.19663 | 15.142982 | -11.732551 | 95.18391 |
| 4 | 0.001077 | 0.004742 | 0.002282 | 0.398391 | 0.246383 | 18.933918 | 14.51194 | 87.467043 | 327.42986 |


In [10]:
features1.shape

(39, 9)

In [12]:
features2=glottalf.extract_features_file(file_audio, static=True, plots=False, fmt='csv')

In [13]:
features2.head()

Unnamed: 0,global avg var GCI,global avg avg NAQ,global avg std NAQ,global avg avg QOQ,global avg std QOQ,global avg avg H1H2,global avg std H1H2,global avg avg HRF,global avg std HRF,global std var GCI,...,global skewness std HRF,global kurtosis var GCI,global kurtosis avg NAQ,global kurtosis std NAQ,global kurtosis avg QOQ,global kurtosis std QOQ,global kurtosis avg H1H2,global kurtosis std H1H2,global kurtosis avg HRF,global kurtosis std HRF
0,0.00105,0.004712,0.002313,0.483546,0.248339,18.348063,12.408972,56.071506,195.619791,0.000279,...,2.128254,12.084682,-0.206489,-0.599053,0.04543,-0.260547,-0.494493,1.435911,2.482266,4.523592


In [None]:
features2.shape

(1, 36)

For females: shimmer, AQ  
For males:   To1, OQ1, OQ2, OQa, SQ2  
For both:    Tc, HRF, CIQ, OQa, QoQ, SQ2

In [15]:
features2.columns

Index(['global avg var GCI', 'global avg avg NAQ', 'global avg std NAQ',
       'global avg avg QOQ', 'global avg std QOQ', 'global avg avg H1H2',
       'global avg std H1H2', 'global avg avg HRF', 'global avg std HRF',
       'global std var GCI', 'global std avg NAQ', 'global std std NAQ',
       'global std avg QOQ', 'global std std QOQ', 'global std avg H1H2',
       'global std std H1H2', 'global std avg HRF', 'global std std HRF',
       'global skewness var GCI', 'global skewness avg NAQ',
       'global skewness std NAQ', 'global skewness avg QOQ',
       'global skewness std QOQ', 'global skewness avg H1H2',
       'global skewness std H1H2', 'global skewness avg HRF',
       'global skewness std HRF', 'global kurtosis var GCI',
       'global kurtosis avg NAQ', 'global kurtosis std NAQ',
       'global kurtosis avg QOQ', 'global kurtosis std QOQ',
       'global kurtosis avg H1H2', 'global kurtosis std H1H2',
       'global kurtosis avg HRF', 'global kurtosis std HRF'],
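
The column names follow a regular pattern, `global <functional> <descriptor>`, with the functional varying slowest. The sketch below rebuilds the 36 names from that pattern, which can help locate a named feature by position. Whether the `"npy"` output uses the same ordering is an assumption that should be checked before relying on it.

```python
functionals = ["avg", "std", "skewness", "kurtosis"]
descriptors = ["var GCI", "avg NAQ", "std NAQ", "avg QOQ", "std QOQ",
               "avg H1H2", "std H1H2", "avg HRF", "std HRF"]

# Functional varies slowest, descriptor fastest, as in the listing above
columns = [f"global {f} {d}" for f in functionals for d in descriptors]

# Position of one named feature in a 36-element static vector
idx = columns.index("global skewness avg H1H2")
```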
    

## Extract dynamic features and return them as a torch tensor

In [7]:
features1=glottalf.extract_features_file(file_audio, static=False, plots=False, fmt="torch")
print(features1.dtype)
print(features1.size())

  return array(a, dtype, copy=False, order=order)


torch.float64
torch.Size([19, 9])


## Extract static features from a path and return them as a numpy array

In [8]:
path_audio="../audios/"
features1=glottalf.extract_features_path(path_audio, static=True, plots=False, fmt="npy")
print(features1.shape)

Processing 001_readtext_PCGITA.wav:  50%|█████     | 2/4 [01:04<01:00, 30.02s/it]

Utterance likely to contain creak


Processing 098_u1_PCGITA.wav: 100%|██████████| 4/4 [03:49<00:00, 57.34s/it]      

(4, 36)
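
The `(4, 36)` shape is consistent with one static row per audio file. A minimal sketch of that stacking follows, assuming each file yields a `(1, 36)` array; `stack_static` and `fake_extractor` are hypothetical stand-ins and do not read any audio or call the library.

```python
import numpy as np

def stack_static(file_names, extract_one):
    """Stack one (1, n_features) row per file into (n_files, n_features).

    extract_one is any callable mapping a file name to a (1, n_features)
    array; it stands in for a per-file extractor and is hypothetical here.
    """
    rows = [np.asarray(extract_one(name)).reshape(1, -1) for name in file_names]
    return np.vstack(rows)

# Stand-in extractor returning a dummy 36-feature row (no audio is read)
def fake_extractor(name):
    return np.full((1, 36), float(len(name)))

names = ["a.wav", "bb.wav", "ccc.wav", "dddd.wav"]
features = stack_static(names, fake_extractor)
print(features.shape)  # (4, 36)
```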



