Articulatory Synthesis and Inversion

Machine learning models for articulatory-to-acoustic synthesis and acoustic-to-articulatory inversion.

Correspondence to:

Installation

git clone https://github.com/articulatory/articulatory.git
cd articulatory
pip3 install -e .

EMA-to-Speech

Training

cd egs/ema/voc1
mkdir downloads
# download the MNGU0 dataset and move the emadata/ folder to downloads/
python3 local/mk_ema_feats.py
python3 local/pitch.py downloads/emadata/cin_us_mngu0 --hop 80
python3 local/combine_feats.py downloads/emadata/cin_us_mngu0 --feats pitch actions -o fnema
./run.sh --conf conf/e2w_hifigan.yaml --stage 1 --tag e2w_hifi --train_set mngu0_train_fnema --dev_set mngu0_val_fnema --eval_set mngu0_test_fnema
  • Stage 1 in ./run.sh performs preprocessing, so it only needs to be run once per train/dev/eval triple. Stage 2 is training, so subsequent experiments with the same data can start directly from ./run.sh --stage 2.
  • Replace conf/e2w_hifigan.yaml with conf/e2w_hifigan_car.yaml to use our autoregressive model (HiFi-CAR), as in the example below.
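
For example, a follow-up run that reuses the stage-1 preprocessing and trains the autoregressive model could look like the following (the --tag value is just an illustrative experiment name; the other flags mirror the command above):

# skip preprocessing and train the HiFi-CAR variant on the same data
./run.sh --conf conf/e2w_hifigan_car.yaml --stage 2 --tag e2w_hifi_car --train_set mngu0_train_fnema --dev_set mngu0_val_fnema --eval_set mngu0_test_fnema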

Inference

Pretrained synthesis models are linked below:

  • EMA-to-speech (HiFi-CAR): weights here. Inputs are 200-Hz, 12-dimensional EMA features (tongue dorsum x, y; tongue body x, y; tongue tip x, y; lower incisor x, y; upper lip x, y; lower lip x, y).
  • Normalized pitch + EMA-to-speech (13-dimensional input): here.
  • Tract variable (TV)-to-speech: here.
  • Normalized pitch + TV-to-speech: here. The TV features are LA, LP, JA, TTCL, TTCD, TMCL, TMCD, TRCL, and TRCD, as described here.

python3 local/predict_wav.py \
        --scp [feature_scp_file] \
        --outdir [output_dir] \
        --checkpoint [model_ckpt_file] \
        --config [model_config_file]
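
As a minimal sketch of preparing inputs, the EMA trajectories can be stored as arrays and listed in an scp file. The .npy format and the "utt_id path" scp line below are assumptions about the expected on-disk layout rather than documented guarantees:

import numpy as np

# 5 s of 200-Hz, 12-dimensional EMA features in the channel order listed
# above (tongue dorsum x, y, ..., lower lip x, y); zeros as a placeholder
ema = np.zeros((1000, 12), dtype=np.float32)
np.save("utt0001.npy", ema)

# hypothetical scp line mapping an utterance ID to its feature file
with open("feats.scp", "w") as f:
    f.write("utt0001 utt0001.npy\n")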

Speech-to-EMA

Here is a link to the weights of an already-trained articulatory inversion model. Inputs to this model are 16 kHz waveforms and the first 12 dimensions of the outputs are EMA features (lower incisor x, y, upper lip x, y, lower lip x, y, tongue tip x, y, tongue body x, y, tongue dorsum x, y).

cd egs/ema/voc1
python3 local/predict_ema.py [model_dir] [input_wav_dir] [output_dir]
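
Since the model expects 16 kHz input, audio at other sampling rates should be resampled first. A minimal sketch, assuming librosa and soundfile are available (any resampler that writes 16 kHz wavs works):

import librosa
import soundfile as sf

# load an arbitrary wav and resample it to the 16 kHz the model expects
y, sr = librosa.load("speech.wav", sr=16000)
sf.write("input_wav_dir/speech_16k.wav", y, 16000)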

Speech-to-EMA with a linear regression model can be done with egs/ema/voc1/local/linear_inference.py. The weights for this model are here, and the order of the EMA features is: tongue dorsum x, y, tongue body x, y, tongue tip x, y, lower incisor x, y, upper lip x, y, lower lip x, y.
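
Conceptually, the linear model maps each frame of acoustic features to a 12-dimensional EMA frame with a single matrix product; feature extraction and the actual weight-file layout are handled by linear_inference.py. The file names and shapes below are illustrative assumptions:

import numpy as np

W = np.load("linear_weights.npy")  # hypothetical (feat_dim, 12) weight matrix
feats = np.load("utt_feats.npy")   # hypothetical (T, feat_dim) acoustic features
ema = feats @ W                    # (T, 12) EMA trajectory in the order above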

Creating Your Own Speech Synthesizer

cd egs
mkdir <your_id>
cp -r TEMPLATE/voc1 <your_id>
  • To use your own model, add the model code to a new file in articulatory/models and an extra line referencing that file in articulatory/models/__init__.py. Then, change generator_type or discriminator_type in the .yaml config to the name of the new model class (see the sketch after this list).
  • To customize the loss function, similarly modify the code in articulatory/losses. Then, call the loss function in articulatory/bin/train.py. Existing loss functions can be toggled on/off and modified through the .yaml config, e.g., in the "STFT LOSS SETTING" and "ADVERSARIAL LOSS SETTING" sections.
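
A minimal sketch of registering a custom generator, assuming the models are standard torch.nn.Module subclasses as in ParallelWaveGAN, which this repository is based on (the file and class names are hypothetical):

# articulatory/models/my_generator.py
import torch

class MyGenerator(torch.nn.Module):
    """Toy generator mapping (B, 12, T) articulatory features to (B, 1, T) waveforms."""

    def __init__(self, in_channels=12, out_channels=1):
        super().__init__()
        self.net = torch.nn.Conv1d(in_channels, out_channels, kernel_size=7, padding=3)

    def forward(self, c):
        return self.net(c)

# articulatory/models/__init__.py -- add one line:
#   from .my_generator import *
# then point the .yaml config at the new class:
#   generator_type: MyGenerator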

Papers

If you find this repository useful, please cite the corresponding paper(s):

Deep Speech Synthesis from Articulatory Representations
Interspeech 2022

@inproceedings{peter2022artsyn,
  title={Deep Speech Synthesis from Articulatory Representations},
  author={Wu, Peter and Watanabe, Shinji and Goldstein, Louis and Black, Alan W and Anumanchipalli, Gopala Krishna},
  booktitle={Interspeech},
  year={2022}
}

Speaker-Independent Acoustic-to-Articulatory Speech Inversion
ICASSP 2023

@inproceedings{peter2023artinv,
  title={Speaker-Independent Acoustic-to-Articulatory Speech Inversion},
  author={Wu, Peter and Chen, Li-Wei and Cho, Cheol Jun and Watanabe, Shinji and Goldstein, Louis and Black, Alan W and Anumanchipalli, Gopala K},
  booktitle={ICASSP},
  year={2023}
}

Evidence of Vocal Tract Articulation in Self-Supervised Learning of Speech
ICASSP 2023

@inproceedings{cho2023evidence,
  title={Evidence of Vocal Tract Articulation in Self-Supervised Learning of Speech},
  author={Cho, Cheol Jun and Wu, Peter and Mohamed, Abdelrahman and Anumanchipalli, Gopala K},
  booktitle={ICASSP},
  year={2023}
}

Acknowledgements

Based on https://github.com/kan-bayashi/ParallelWaveGAN.