<a href="https://colab.research.google.com/github/ShinAsakawa/ShinAsakawa.github.io/blob/master/2022notebooks/ch00_Quick_start.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Quick start

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/r9y9/ttslearn/blob/master/notebooks/ch00_Quick-start.ipynb)

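Welcome to the quick start page for "Pythonで学ぶ音声合成" (Text-to-speech with Python)!

This page, provided as a notebook, shows sample code and audio samples for the three speech synthesis methods explained in the book. It is a good first step if you would like to try things hands-on before reading the explanations.

Pre-trained models for the methods shown here are distributed through the GitHub repository. Rather than only listening to the audio samples, please try synthesizing speech yourself.
To understand the methods in detail, refer to the source code together with the book.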

## Setup

### Installing ttslearn

In [1]:
%%capture
try:
    import ttslearn
except ImportError:
    !pip install ttslearn

In [2]:
import ttslearn
ttslearn.__version__

'0.2.2'
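
The install cell above imports the package to find out whether it is present. As an alternative sketch using only the standard library, the package can be looked up without importing it, and its version read from the installed metadata. `installed_version` is a helper name introduced here for illustration, and the sketch assumes that the distribution name matches the import name, as it does for `ttslearn` when installed with `pip install ttslearn`.

```python
import importlib.util
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it cannot be imported."""
    # find_spec locates a top-level package without executing its code.
    if importlib.util.find_spec(package) is None:
        return None
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata exists under this name.
        return "unknown"


print(installed_version("ttslearn"))
```

A result of `None` means the package still needs to be installed.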

### Importing packages

In [3]:
%pylab inline
import IPython
from IPython.display import Audio
import librosa
import librosa.display
from tqdm.notebook import tqdm
import torch

Populating the interactive namespace from numpy and matplotlib


## DNN-based speech synthesis (Chapters 5 and 6)

In [4]:
from ttslearn.dnntts import DNNTTS
dnntts_engine = DNNTTS()

The use of pre-trained models is permitted for non-commercial use only.
Please visit https://github.com/r9y9/ttslearn to confirm the license.
Downloading: "https://github.com/r9y9/ttslearn/releases/download/v0.2.0/dnntts.tar.gz"


dnntts.tar.gz: 0.00B [00:00, ?B/s]

In [11]:
%time wav, sr = dnntts_engine.tts("ワード2ベック")
IPython.display.display(Audio(wav, rate=sr))

CPU times: user 546 ms, sys: 7.17 ms, total: 553 ms
Wall time: 550 ms
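
To keep a synthesized utterance instead of only playing it inline, the waveform can be written to a WAV file. The following is a minimal sketch using only the standard library. `save_wav` is a helper introduced here for illustration. It assumes that `wav` is a one-dimensional sequence of 16-bit integer samples and that `sr` is the sampling rate in Hz; this page does not state either explicitly, so check `wav.dtype` before relying on it.

```python
import struct
import wave


def save_wav(path, samples, sample_rate):
    """Write mono 16-bit PCM samples to `path` (assumes integer-valued samples)."""
    # Clamp to the int16 range so struct.pack cannot overflow.
    clipped = [max(-32768, min(32767, int(s))) for s in samples]
    frames = struct.pack(f"<{len(clipped)}h", *clipped)
    with wave.open(path, "wb") as f:
        f.setnchannels(1)  # mono
        f.setsampwidth(2)  # 2 bytes = 16 bits per sample
        f.setframerate(int(sample_rate))
        f.writeframes(frames)


# In the notebook, after `wav, sr = dnntts_engine.tts(...)`:
# save_wav("dnntts_sample.wav", wav, sr)
```

The file name in the commented usage line is only an example.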


In [7]:
%time wav, sr = dnntts_engine.tts("失語ロイドっておもしろいよねー")
IPython.display.display(Audio(wav, rate=sr))

CPU times: user 910 ms, sys: 13 ms, total: 923 ms
Wall time: 892 ms

## WaveNet-based speech synthesis (Chapters 7 and 8)


In [8]:
from ttslearn.wavenet import WaveNetTTS
wavenet_engine = WaveNetTTS()

The use of pre-trained models is permitted for non-commercial use only.
Please visit https://github.com/r9y9/ttslearn to confirm the license.
Downloading: "https://github.com/r9y9/ttslearn/releases/download/v0.2.0/wavenettts.tar.gz"


wavenettts.tar.gz: 0.00B [00:00, ?B/s]

In [9]:
%time wav, sr = wavenet_engine.tts("小さな鰻屋に、熱気のようなものがみなぎる", tqdm=tqdm)
IPython.display.display(Audio(wav, rate=sr))

  0%|          | 0/52640 [00:00<?, ?it/s]

CPU times: user 8min 3s, sys: 8.03 s, total: 8min 11s
Wall time: 8min 14s
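
The timing above can be expressed as a real-time factor (RTF): wall-clock synthesis time divided by the duration of the generated audio. The sketch below assumes that the progress-bar total (52640) is the number of generated samples and that the sampling rate is 16 kHz. Neither is stated on this page, so in the notebook use `len(wav)` and the `sr` returned by `tts` instead. `real_time_factor` is a helper name introduced here for illustration.

```python
def real_time_factor(wall_seconds, num_samples, sample_rate):
    """Return synthesis time divided by audio duration (above 1 means slower than real time)."""
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("num_samples and sample_rate must be positive")
    audio_seconds = num_samples / sample_rate
    return wall_seconds / audio_seconds


# Wall time reported above: 8 min 14 s.
# The sample count and the 16 kHz rate are assumptions, not values read from `wav` and `sr`.
wall_seconds = 8 * 60 + 14
rtf = real_time_factor(wall_seconds, num_samples=52640, sample_rate=16000)
print(f"RTF: {rtf:.1f}")
```

Under these assumptions, the run above is roughly 150 times slower than real time. The same calculation can be applied to the DNN and Tacotron 2 timings for comparison.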


## Tacotron 2 (Chapters 9 and 10)

In [None]:
from ttslearn.tacotron import Tacotron2TTS
tacotron_engine = Tacotron2TTS()

The use of pre-trained models is permitted for non-commercial use only.
Please visit https://github.com/r9y9/ttslearn to confirm the license.
Downloading: "https://github.com/r9y9/ttslearn/releases/download/v0.2.0/tacotron2.tar.gz"


tacotron2.tar.gz: 0.00B [00:00, ?B/s]

In [None]:
%time wav, sr = tacotron_engine.tts("昼にはペスカトーレを、夜には寿司をパクパク食べた。", tqdm=tqdm)
IPython.display.display(Audio(wav, rate=sr))

  0%|          | 0/51200 [00:00<?, ?it/s]

CPU times: user 6min 52s, sys: 3.18 s, total: 6min 55s
Wall time: 6min 56s


## Closing remarks

In [None]:
text = "これから音声合成を始める皆様にとって、少しでも学習の助けになれば幸いです。"
print(text)

# Synthesize the same sentence with all three engines, in the order listed.
for name, engine in [
    ("DNNTTS", dnntts_engine),
    ("WaveNet TTS", wavenet_engine),
    ("Tacotron 2", tacotron_engine),
]:
    %time wav, sr = engine.tts(text, tqdm=tqdm)
    IPython.display.display(Audio(wav, rate=sr))

これから音声合成を始める皆様にとって、少しでも学習の助けになれば幸いです。
CPU times: user 1.88 s, sys: 70.6 ms, total: 1.95 s
Wall time: 1.9 s


  0%|          | 0/97680 [00:00<?, ?it/s]

CPU times: user 14min 47s, sys: 8.37 s, total: 14min 55s
Wall time: 14min 59s


  0%|          | 0/93200 [00:00<?, ?it/s]

CPU times: user 13min 14s, sys: 7.57 s, total: 13min 22s
Wall time: 13min 26s
