Low Level Speech Model (version 2) for high-quality speech analysis/synthesis
libllsm2 is a C library providing data structures and routines for analysis (parametrization), modification and synthesis of digital speech signals.
LLSM is a two-layer model of speech. The first layer (layer 0) is a signal-level parametrization that separately models harmonic and noise (e.g. aspiration and consonants) components; the second layer (layer 1) is an acoustic-level parametrization that performs an approximated glottal inverse filtering on the harmonic component.
- libllsm2 can be viewed as a lossy speech coder, but it differs from conventional speech coders in being designed for modifications (rather than compression).
libllsm2 is designed for use in speech synthesis and modification applications, where the analysis of speech signals is performed once and for all. Therefore, the library is made to analyze speech carefully and synthesize speech rapidly. Things you can do with libllsm2 include:
- Modification of glottal tension (the Rd parameter in a LF model)
- Modification of aspiration noise
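For readers unfamiliar with Rd: it is a single shape parameter from which the LF model's waveform quotients can be predicted, with lower values corresponding to a tenser voice and higher values to a laxer one. The sketch below uses the regression formulas from Fant's 1995 reparametrization of the LF model. The README does not state whether libllsm2 uses exactly this mapping, and `toy_lf_shape` and `toy_rd_to_shape` are hypothetical names, not library functions.

```c
#include <assert.h>

/* Hypothetical container for LF-model shape quotients (not libllsm2 API). */
typedef struct {
  double ra; /* return phase duration relative to the period (ta / T0) */
  double rk; /* asymmetry of the glottal pulse ((te - tp) / tp) */
  double rg; /* T0 / (2 tp) */
} toy_lf_shape;

/* Map Rd to (Ra, Rk, Rg) with Fant's 1995 regressions.
   Assumes Rd lies in the main range 0.3 to 2.7 of that parametrization. */
toy_lf_shape toy_rd_to_shape(double rd) {
  assert(rd >= 0.3 && rd <= 2.7);
  toy_lf_shape s;
  s.ra = (-1.0 + 4.8 * rd) / 100.0;
  s.rk = (22.4 + 11.8 * rd) / 100.0;
  s.rg = s.rk / (4.0 * (0.11 * rd / (0.5 + 1.2 * s.rk) - s.ra));
  return s;
}
```

Modifying glottal tension then amounts to changing one scalar, for example calling `toy_rd_to_shape(0.8)` instead of `toy_rd_to_shape(1.5)`, and regenerating the glottal pulse from the resulting quotients.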
Using libllsm2 in statistical parametric speech synthesizers is possible, but you may need to assign default values for parameters typically not modelled by those statistical systems (e.g. phases and the temporal structure of turbulent noise).
What's new in libllsm2
libllsm2 is a rewrite of libllsm with the following features and changes:
- Clean and well-structured API.
- Better documentation (with Doxygen) and testing.
- New feature: synthesis can be done at a different sampling rate without changing the model.
- New feature: fast and accurate speech analysis/synthesis based on Chirp-Z Transform.
- New feature: real-time synthesis.
mkdir external build
- Create a symbolic link to
- Create a symbolic link to
ciglet; create a symbolic link named
Note: in all of my C libraries there's a macro named FP_TYPE, which is either float or double. It may need to be specified as a compiler flag (i.e.,
The test speech test/arctic_a0001.wav is a sample taken from the CMU Arctic database.
Run doxygen doxyfile to generate the API documentation. Examples of using libllsm2 are given in the tests.
libllsm2 is licensed under GPLv3.
I have a pending patent on LLSM-related technologies. Under the terms of GPLv3, the patent license is granted to libllsm2 users, free from royalty.
Please contact me (k.hua.kanru [at] ieee [dot] org) for an alternatively licensed version for commercial purposes.
Currently there's no publication directly associated with LLSM. However, there is a poster on the pseudo glottal inverse filtering method used in layer 1 of LLSM.
K. Hua, "Speech Analysis/Synthesis by Non-parametric Separation of Vocal Source and Tract Responses," presented at Speech Processing Courses in Crete, 2016.
The following are the major publications that LLSM draws inspiration from.
G. Degottex, P. Lanchantin, A. Roebel, and X. Rodet, "Mixed source model and its adapted vocal tract filter estimate for voice transformation and synthesis," Speech Communication, vol. 55, no. 2, pp. 278–294, 2013.
J. P. Cabral, K. Richmond, J. Yamagishi, and S. Renals, "Glottal Spectral Separation for Speech Synthesis," IEEE Journal of Selected Topics in Signal Processing, vol. 8, no. 2, pp. 195–208, 2014.
Y. Pantazis and Y. Stylianou, "Improving the modeling of the noise part in the harmonic plus noise model of speech," 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, 2008.
I. Saratxaga, I. Hernáez, D. Erro, E. Navas, and J. Sánchez, "Simple representation of signal phase for harmonic speech models," Electronics Letters, vol. 45, no. 7, p. 381, 2009.