A PyTorch implementation of Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
pip3 install -r requirements.txt
Hyperparameters.py --- contains all hyperparameters
Network.py --- encoder/decoder
Modules.py --- some modules for Tacotron
Loss.py --- calculates the loss
Data.py --- loads the dataset
utils.py --- utility functions for loading and saving data
Synthesis.py --- generates wav files
- Download a multi-speaker dataset.
- Preprocess your data and write your own get_XX_data function in Data.py.
- Adjust the hyperparameters in Hyperparameters.py.
- Make a directory named log in the parent of the parent directory of the Tacotron code (that is, alongside the code directory):
--- log
|     |
|     --- log[log_number]
|
--- code
      |
      --- Tacotron
            |
            --- train.py
            |
            --- Network.py
            |
            ......
- Run train.py:
python3 train.py [log_number] [dataset_size] [start_epoch]
[log_number]: the number of the log directory, i.e. log[log_number]
[dataset_size]: an integer number of samples, or all to use the whole dataset
[start_epoch]: the epoch to start training from (0 to train from scratch)
For example:
python3 train.py 0 all 0
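The README does not document what get_XX_data in Data.py must return, so the following is only a hypothetical sketch. It assumes your preprocessed dataset has a metadata.csv with one wav_name|transcript row per utterance and the audio in a wavs/ folder; both the layout and the name get_my_data are illustrative assumptions, not part of this repo.

```python
from pathlib import Path
from typing import List, Tuple


def get_my_data(root: str) -> List[Tuple[str, str]]:
    """Hypothetical get_XX_data: return (wav_path, transcript) pairs.

    Assumes root/metadata.csv holds one "wav_name|transcript" row per
    utterance and the audio files live in root/wavs/. Adapt the return
    type to whatever Data.py actually expects.
    """
    root_dir = Path(root)
    pairs = []
    with open(root_dir / "metadata.csv", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # split only on the first "|" so transcripts may contain "|"
            wav_name, transcript = line.split("|", 1)
            pairs.append((str(root_dir / "wavs" / wav_name), transcript))
    return pairs
```

Check the real signature in Data.py before wiring a loader like this in; the pair format here is chosen only because it is the minimum a text-to-speech pipeline needs.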
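The log directory step can be sketched with pathlib: starting from the Tacotron directory, go up two levels (Tacotron -> code -> project root) and create log there. The helper name make_log_dir and the subdirectory name log0, log1, ... (read off the log[log_number] entry in the tree) are assumptions; train.py may create or name the subdirectory differently.

```python
from pathlib import Path


def make_log_dir(tacotron_dir: str, log_number: int) -> Path:
    """Hypothetical helper: create <root>/log/log<log_number>.

    <root> is the parent of the parent of the Tacotron code directory.
    The "log" + number naming is an assumption based on the layout
    described in this README.
    """
    tacotron = Path(tacotron_dir).resolve()
    # Tacotron/ -> code/ -> project root, which is where "log" must live
    root = tacotron.parent.parent
    log_dir = root / "log" / f"log{log_number}"
    log_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return log_dir
```

For example, make_log_dir("code/Tacotron", 0) would create log/log0 next to the code directory.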
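The three positional arguments of train.py can be read as follows. parse_train_args is a hypothetical helper that only restates the rules listed above; train.py's own argument handling may differ (for instance, in exactly how it applies an integer dataset_size).

```python
from typing import List, Optional, Tuple


def parse_train_args(argv: List[str]) -> Tuple[int, Optional[int], int]:
    """Interpret `train.py [log_number] [dataset_size] [start_epoch]`.

    Returns (log_number, dataset_size, start_epoch), where dataset_size
    is None when the argument is "all" (use the whole dataset).
    """
    if len(argv) != 3:
        raise SystemExit(
            "usage: python3 train.py [log_number] [dataset_size] [start_epoch]"
        )
    log_number = int(argv[0])
    # "all" means the whole dataset; anything else must be an integer
    dataset_size = None if argv[1] == "all" else int(argv[1])
    start_epoch = int(argv[2])  # 0 means train from scratch
    return log_number, dataset_size, start_epoch
```

With the example command above, parse_train_args(["0", "all", "0"]) returns (0, None, 0).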
Run generate.py, modifying the text in generate.py before running it. Only Chinese is supported.