
FCN_GCI

Detection of GCIs from raw speech signals using a fully-convolutional network (FCN)

Code for running Glottal Closure Instant (GCI) detection using the fully-convolutional neural network models described in the following publication:

GCI detection from raw speech using a fully-convolutional network
Luc Ardaillon, Axel Roebel.
Submitted to arXiv on 22 Oct 2019.

We kindly request academic publications making use of our FCN models to cite the aforementioned paper.

Description

The code provided in this repository performs GCI detection using a fully-convolutional neural network. Note that it also allows predicting the glottal flow shape (normalized in amplitude), from which more information than the GCIs may be extracted.

The provided code allows running GCI detection on given speech sound files using the provided pretrained models, but no code is currently provided to train the model on new data.
All pre-trained models evaluated in the above-mentioned paper are provided.
The models "FCN_synth_GF" and "FCN_synth_tri" have been trained on a large database of high-quality synthetic speech (obtained by resynthesizing the BREF [1] and TIMIT [2] databases using the PaN vocoder [3, and 4 Section 3.5.2]). The difference between these two models is that "FCN_synth_tri" predicts a triangular curve from which the GCIs are extracted by simple peak-picking on the maxima, while "FCN_synth_GF" predicts the glottal flow shape and performs the peak-picking on its negative derivative. The "FCN_CMU__10_90" and "FCN_CMU__60_20_20" models have been trained on the CMU database (with different train/validation/test splits) using a triangle shape as target.
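The peak-picking step described above can be sketched as follows. This is an illustrative reimplementation, not the repository's code: the function name `gcis_from_curve`, the `max_f0` distance bound, and the use of `scipy.signal.find_peaks` are all assumptions.

```python
import numpy as np
from scipy.signal import find_peaks

def gcis_from_curve(curve, fs=16000, model="tri", max_f0=500.0):
    """Extract GCI times (in seconds) from a predicted shape curve.

    For the triangle target ("tri"), GCIs are taken as the maxima of
    the curve itself; for the glottal-flow target ("GF"), as the maxima
    of the curve's negative derivative. `max_f0` sets the minimum
    allowed spacing between two consecutive GCIs (illustrative value).
    """
    curve = np.asarray(curve, dtype=float)
    if model == "GF":
        # Negative time-derivative of the predicted glottal flow.
        curve = -np.diff(curve)
    # Simple peak-picking with a minimum inter-peak distance of one
    # period at max_f0.
    peaks, _ = find_peaks(curve, distance=int(fs / max_f0))
    return peaks / fs
```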

The models, algorithm, training, and evaluation procedures, as well as the constitution of the databases, have been described in our publication "GCI detection from raw speech using a fully-convolutional network" (https://arxiv.org/abs/1910.10235).

Below are the results of our evaluations comparing our models to the SEDREAMS [5] and DPI [6] algorithms in terms of IDR, MR, FAR, and IDA. The evaluation has been conducted on both a test database of synthetic speech and two datasets of real speech samples from the CMU ARCTIC [7] and PTDB-TUG [8] databases. All models and algorithms have been evaluated on 16kHz audio.

IDR, MR, and FAR are given in %, and IDA in ms.

| Model | IDR (synth) | IDR (CMU) | IDR (PTDB) | MR (synth) | MR (CMU) | MR (PTDB) | FAR (synth) | FAR (CMU) | FAR (PTDB) | IDA (synth) | IDA (CMU) | IDA (PTDB) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FCN-synth-tri | 99.90 | 97.95 | 95.37 | 0.08 | 1.89 | 3.40 | 0.02 | 0.17 | 1.22 | 0.08 | 0.26 | 0.32 |
| FCN-synth-GF | 99.91 | 98.43 | 95.64 | 0.06 | 1.20 | 2.91 | 0.04 | 0.37 | 1.45 | 0.11 | 0.34 | 0.38 |
| FCN-CMU-10/90 | 49.63 | 99.39 | 90.13 | 48.05 | 0.50 | 8.91 | 0.51 | 0.11 | 0.95 | 0.52 | 0.10 | 0.26 |
| FCN-CMU-60/20/20 | 60.06 | 99.52 | 88.17 | 39.14 | 0.40 | 11.00 | 0.64 | 0.08 | 0.81 | 0.50 | 0.09 | 0.26 |
| SEDREAMS | 89.26 | 99.04 | 95.34 | 3.86 | 0.21 | 2.15 | 6.88 | 0.75 | 2.51 | 0.68 | 0.36 | 0.62 |
| DPI | 88.22 | 98.69 | 91.3 | 2.14 | 0.23 | 2.16 | 9.64 | 1.08 | 6.53 | 0.83 | 0.23 | 0.49 |
| DCNN (from [5]) | — | 99.3 | — | — | 0.3 | — | — | 0.4 | — | — | 0.2 | — |
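For reference, these four metrics follow the standard GCI evaluation protocol: each larynx cycle around a reference GCI counts as correctly identified (IDR) if exactly one detection falls inside it, as a miss (MR) if none does, and as a false alarm (FAR) if several do; IDA is the standard deviation of the timing error over correctly identified cycles. A minimal sketch in Python, assuming midpoints between consecutive reference GCIs as cycle boundaries (the function name and this boundary convention are our assumptions, not the repository's code):

```python
import numpy as np

def gci_metrics(ref, det):
    """Compute IDR, MR, FAR (in %) and IDA (in ms) from sorted arrays
    of reference and detected GCI times, both in seconds."""
    ref = np.asarray(ref, dtype=float)
    det = np.asarray(det, dtype=float)
    # Larynx cycles delimited by midpoints between consecutive
    # reference GCIs (a common convention; the paper may differ).
    mids = (ref[:-1] + ref[1:]) / 2.0
    hits = misses = false_alarms = 0
    errors = []
    for i, t in enumerate(ref):
        lo = mids[i - 1] if i > 0 else -np.inf
        hi = mids[i] if i < len(mids) else np.inf
        in_cycle = det[(det > lo) & (det <= hi)]
        if in_cycle.size == 0:
            misses += 1           # no detection in this cycle
        elif in_cycle.size == 1:
            hits += 1             # correctly identified cycle
            errors.append(in_cycle[0] - t)
        else:
            false_alarms += 1     # multiple detections in one cycle
    n = len(ref)
    idr = 100.0 * hits / n
    mr = 100.0 * misses / n
    far = 100.0 * false_alarms / n
    ida = 1000.0 * float(np.std(errors)) if errors else float("nan")
    return idr, mr, far, ida
```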

Example command-line usage (using provided pretrained models)

Default analysis:

This will run glottal flow prediction and GCI detection with the FCN_synth_GF model on the input file, and store the output files (the predicted glottal flow as a 16kHz wav file, and the GCI markers as an sdif file) in the same folder as the input file:

```
python /path_to/FCN-f0/FCN_GCI.py -i /path_to/test_file.wav
```

Note that you may also specify a directory of audio files as input instead of a single file.

Run the analysis on a whole folder of audio files and specify an output directory:

```
python /path_to/FCN-f0/FCN_GCI.py -i /path_to/audio_files -o /path_to/output_directory
```

If the output directory doesn't exist, it will be created.

Run the analysis using a specific model (the default is FCN_synth_GF), e.g. the FCN_synth_tri model:

```
python /path_to/FCN-f0/FCN_GCI.py -i /path_to/audio_files -m FCN_synth_tri -o /path_to/output.FCN-synth-tri.GCI.sdif
```

The possible tags for pre-trained models are "FCN_synth_GF", "FCN_synth_tri", "FCN_CMU__10_90", and "FCN_CMU__60_20_20".

Example figures

Example of prediction of triangle shape from a real speech extract:

Example of prediction of glottal flow shape from a real speech extract:

Dependencies

- keras
- tensorflow
- scipy
- numpy
- pysndfile (optional)
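Assuming a standard pip setup (package names as on PyPI; no versions are pinned here, so you may need versions matching the pretrained models), the dependencies can be installed with:

```shell
pip install keras tensorflow scipy numpy
pip install pysndfile  # optional
```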

References

[1] J. L. Gauvain, L. F. Lamel, and M. Eskenazi, "Design Considerations and Text Selection for BREF, a large French Read-Speech Corpus", 1st International Conference on Spoken Language Processing, ICSLP, http://www.limsi.fr/~lamel/kobe90.pdf

[2] V. Zue, S. Seneff, and J. Glass, "Speech Database Development At MIT : TIMIT And Beyond"

[3] Stefan Huber and Axel Roebel, "On glottal source shape parameter transformation using a novel deterministic and stochastic speech analysis and synthesis system", in Interspeech 2015

[4] L. Ardaillon, "Synthesis and expressive transformation of singing voice", Ph.D. dissertation, EDITE; UPMC-Paris 6 Sorbonne Universités, 2017 (Section 3.5.2 : "PaN engine")

[5] Thomas Drugman and Thierry Dutoit, "Glottal Closure and Opening Instant Detection from Speech Signals", in Interspeech 2009

[6] A. P. Prathosh, T. V. Ananthapadmanabha, and A. G. Ramakrishnan, "Epoch Extraction Based on Integrated Linear Prediction Residual Using Plosion Index"

[7] John Kominek and Alan W Black, "THE CMU ARCTIC SPEECH DATABASES", in 5th ISCA Speech Synthesis Workshop, 2004

[8] Gregor Pirker, Michael Wohlmayr, Stefan Petrik, and Franz Pernkopf, "A Pitch Tracking Corpus with Evaluation on Multipitch Tracking Scenario"
