<img src="https://nv-adlr.github.io/images/waveglow_logo.png" width=300 align=center >




# Part1. Voice Synthesis with NVIDIA WaveGlow Model
 


by **Hyungon Ryu** | Sr. Solution Architect at NVIDIA


---



----



  **Content**
- **Part1. Voice Synthesis with NVIDIA  WaveGlow Model**
- Part2. Voice Synthesis with NVIDIA Tacotron2 + WaveGlow



In this Jupyter notebook, I demonstrate voice synthesis from Mel spectrograms with the WaveGlow model, using the provided pretrained WaveGlow parameters. You can reproduce the synthesis in the Google COLAB environment with a Tesla K80 in under 10 minutes, including the time needed to download the weight file. With a Tesla T4 or Tesla V100, you can synthesize voice in real time.
Visit NVIDIA ADLR's WaveGlow [blog](https://nv-adlr.github.io/WaveGlow) to hear the sound quality of the WaveGlow model.


----

## Step1. DevOps


### allocate GPU

When I created this Jupyter notebook, I already selected GPU as the preferred accelerator option. Before we get started, let's check that a GPU is allocated.


To enable it yourself, open the **[EDIT]** menu, select **[Notebook Settings]**, and check the **[GPU]** option.

#### check Tesla K80
Google COLAB provides a Tesla K80 GPU.
You can see the assigned GPU information with the simple command `nvidia-smi`.

In [0]:
!nvidia-smi | grep Tesla

|   0  Tesla K80           On   | 00000000:00:04.0 Off |                    0 |
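The same check can be sketched in Python. This is only an illustration: the helper names `parse_gpu_names` and `list_gpus` are my own, and the parsing assumes the usual `nvidia-smi -L` line layout (`GPU 0: <name> (UUID: ...)`).

```python
import re
import shutil
import subprocess


def parse_gpu_names(listing):
    """Extract GPU names from `nvidia-smi -L` style text (assumed line layout)."""
    names = []
    for line in listing.splitlines():
        # Expected shape: "GPU 0: Tesla K80 (UUID: GPU-...)"; the UUID part is optional.
        match = re.match(r"\s*GPU \d+:\s*(.+?)\s*(?:\(UUID:.*)?$", line)
        if match:
            names.append(match.group(1))
    return names


def list_gpus():
    """Return the GPU names, or an empty list when nvidia-smi is not available."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(["nvidia-smi", "-L"], stdout=subprocess.PIPE,
                            universal_newlines=True)
    return parse_gpu_names(result.stdout)


print(list_gpus() or "No GPU allocated; check Notebook Settings")
```

An empty list here usually means the GPU option was not selected in the notebook settings.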


#### system information and configure
You can see the detailed specs of the free system offered by Google COLAB. In particular, the `nvidia-smi` tool lets you set the Tesla K80's application clock to its highest rate of 875 MHz.

In [0]:
%%bash
# check the environment
echo "Check H/W"
lscpu | grep 'CPU(s):            '
lscpu | grep GHz
echo "memory" && free -m | cut -c-49 |  head -n 2 
echo "storage" && df -h |  cut -c-60 | head -n 2
df -h |  grep '/dev/sda1'
echo " " && nvidia-smi -L | cut -c-17
echo "confure Max Application Clock for K80 875Mhz"
nvidia-smi -ac 2505,875 && nvidia-smi -pm 1
echo " " &&echo "Check S/W"
cat /etc/*-release | grep PRETTY_NAME
python --version 
nvcc --version | grep  tools

Check H/W
CPU(s):              2
Model name:          Intel(R) Xeon(R) CPU @ 2.30GHz
memory
              total        used        free      
Mem:          13022        4431         142      
storage
Filesystem      Size  Used Avail Use% Mounted on
overlay         359G   12G  329G   4% /
/dev/sda1       365G   15G  351G   5% /opt/bin
 
GPU 0: Tesla K80 
confure Max Application Clock for K80 875Mhz
Applications clocks set to "(MEM 2505, SM 875)" for GPU 00000000:00:04.0
All done.
Persistence mode is already Enabled for GPU 00000000:00:04.0.
All done.
 
Check S/W
PRETTY_NAME="Ubuntu 18.04.1 LTS"
Python 3.6.6
Cuda compilation tools, release 9.2, V9.2.148


### clone WaveGlow  Model

Copy NVIDIA's [WaveGlow](https://github.com/NVIDIA/waveglow) model to COLAB with the `git clone` command. Note that the WaveGlow model uses tacotron2 as a submodule to create the Mel spectrogram.

This notebook is based on commit [f4c04e2](https://github.com/NVIDIA/waveglow/commit/f4c04e2d968de01b22d2fb092bbbf0cec0b6586f) and the Google COLAB environment as of October 10, 2018.

In [0]:
%%bash
git clone https://github.com/NVIDIA/waveglow.git
cd waveglow
git submodule init
git submodule update

Submodule path 'tacotron2': checked out 'fc0cf6a89a47166350b65daa1beaa06979e4cddf'


Cloning into 'waveglow'...
Submodule 'tacotron2' (http://github.com/NVIDIA/tacotron2) registered for path 'tacotron2'
Cloning into '/content/waveglow/tacotron2'...


### install requirements

The WaveGlow model has been tested with PyTorch 0.4.0. You also need libraries such as librosa to handle audio and Mel spectrogram files. Installation takes about one minute, depending on the network environment.

In [0]:
%%time
%%bash 
pip install torch==0.4.0 matplotlib==2.1.0 tensorflow  inflect==0.2.5 \
 librosa==0.6.0 scipy==1.0.0 tensorboardX==1.1 Unidecode==1.0.22 pillow 

---

## Step2. Prepare WaveGlow Weight Files



### 2-1 WaveGlow weight from NVIDIA ADLR

NVIDIA provides pre-trained WaveGlow weights for voice synthesis.



### 2-2 download checkpoint file directly from Google Drive
You can download the checkpoint file directly from Google Drive.

#### define python function
I borrow charlesreid1's [python code](https://gist.githubusercontent.com/charlesreid1/4f3d676b33b95fce83af08e4ec261822/raw/4ec8b6b6f306a70fc229d01404ded90162f56a82/get_drive_file.py).

In [0]:
import requests

def download_file_from_google_drive(id, destination):
    def get_confirm_token(response):
        for key, value in response.cookies.items():
            if key.startswith('download_warning'):
                return value

        return None

    def save_response_content(response, destination):
        CHUNK_SIZE = 32768

        with open(destination, "wb") as f:
            for chunk in response.iter_content(CHUNK_SIZE):
                if chunk: # filter out keep-alive new chunks
                    f.write(chunk)

    URL = "https://docs.google.com/uc?export=download"

    session = requests.Session()

    response = session.get(URL, params = { 'id' : id }, stream = True)
    token = get_confirm_token(response)

    if token:
        params = { 'id' : id, 'confirm' : token }
        response = session.get(URL, params = params, stream = True)

    save_response_content(response, destination)




#### download waveglow_old.pt (2GB)
It takes about 15 seconds to download the checkpoint file directly from Google Drive.

In [0]:
%%time
destination="/content/waveglow_old.pt"
file_id="1cjKPHbtAMh_4HTHmuIGNkbOkPBD9qwhj"
download_file_from_google_drive(file_id, destination)

CPU times: user 5.3 s, sys: 5.88 s, total: 11.2 s
Wall time: 14.8 s


Check the WaveGlow file in the local storage of the COLAB VM.

In [0]:
%%bash
ls -alh "/content/waveglow_old.pt"

-rw-r--r-- 1 root root 2.0G Nov 12 11:00 /content/waveglow_old.pt
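Beyond `ls`, a quick sanity check in Python can catch a failed download. The sketch below is a heuristic of my own, not part of the WaveGlow repository: the helper name `looks_like_checkpoint` and the 1 MB threshold are assumptions, and it relies on the assumption that a failed Google Drive download tends to leave a small HTML page instead of the binary file.

```python
import os


def looks_like_checkpoint(path, min_bytes=1024 * 1024):
    """Heuristic check: the file exists, is reasonably large, and is not an HTML page."""
    if not os.path.isfile(path):
        return False
    if os.path.getsize(path) < min_bytes:
        return False
    with open(path, "rb") as f:
        head = f.read(64).lstrip().lower()
    # An HTML warning page would start with a doctype or an <html> tag.
    return not (head.startswith(b"<!doctype") or head.startswith(b"<html"))
```

For example, `looks_like_checkpoint("/content/waveglow_old.pt")` should return `True` after a successful download of the 2GB file.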



## Step3. Voice Synthesis from provided Mel files

### 3-1.  public Mel files

NVIDIA provides [Mel files](https://drive.google.com/file/d/1g_VXK2lpP9J25dQFhQwx7doWl_p20fXA/view?usp=sharing) generated from real voice in the [GitHub repository](https://github.com/NVIDIA/waveglow), so that you can reproduce the samples on the ADLR [WaveGlow page](https://nv-adlr.github.io/WaveGlow).






### 3-2 download Mel files
It takes only about one second to download the Mel files (1.5MB) provided by the ADLR WaveGlow team.

In [0]:
%%time
destination="/content/mel_spectrograms.zip"
file_id="1g_VXK2lpP9J25dQFhQwx7doWl_p20fXA"
download_file_from_google_drive(file_id, destination)

CPU times: user 36.7 ms, sys: 9.35 ms, total: 46.1 ms
Wall time: 1.66 s


In [0]:
%%bash
ls -alh "/content/mel_spectrograms.zip"

-rw-r--r-- 1 root root 1.5M Nov 12 11:00 /content/mel_spectrograms.zip


### 3-3. Decompress Mel files

An abnormal behavior was observed in COLAB. The root cause was that the compressed file includes some macOS-related files. Delete all macOS-related files after extracting the zip file.

In [0]:
%%bash
unzip mel_spectrograms.zip
rm -rf mel_spectrograms/.DS_Store
rm -rf __MACOSX 

Archive:  mel_spectrograms.zip
   creating: mel_spectrograms/
  inflating: mel_spectrograms/LJ001-0153.wav.pt  
  inflating: mel_spectrograms/LJ001-0096.wav.pt  
  inflating: mel_spectrograms/LJ001-0094.wav.pt  
  inflating: mel_spectrograms/.DS_Store  
   creating: __MACOSX/
   creating: __MACOSX/mel_spectrograms/
  inflating: __MACOSX/mel_spectrograms/._.DS_Store  
  inflating: mel_spectrograms/LJ001-0079.wav.pt  
  inflating: mel_spectrograms/LJ001-0051.wav.pt  
  inflating: mel_spectrograms/LJ001-0063.wav.pt  
  inflating: mel_spectrograms/LJ001-0173.wav.pt  
  inflating: mel_spectrograms/LJ001-0102.wav.pt  
  inflating: mel_spectrograms/LJ001-0015.wav.pt  
  inflating: mel_spectrograms/LJ001-0072.wav.pt  


### 3-4 . Generate Audio

Now we synthesize the voice from the provided Mel spectrograms. Note that loading the 2GB parameter file takes some time.

In [0]:
%%bash
rm -rf audio_mel_ref 
mkdir audio_mel_ref 
cd waveglow
python inference.py -f <(ls /content/mel_spectrograms/*.pt) -w /content/waveglow_old.pt -o /content/audio_mel_ref   -s 0.6

/content/audio_mel_ref/LJ001-0015.wav_synthesis.wav
/content/audio_mel_ref/LJ001-0051.wav_synthesis.wav
/content/audio_mel_ref/LJ001-0063.wav_synthesis.wav
/content/audio_mel_ref/LJ001-0072.wav_synthesis.wav
/content/audio_mel_ref/LJ001-0079.wav_synthesis.wav
/content/audio_mel_ref/LJ001-0094.wav_synthesis.wav
/content/audio_mel_ref/LJ001-0096.wav_synthesis.wav
/content/audio_mel_ref/LJ001-0102.wav_synthesis.wav
/content/audio_mel_ref/LJ001-0153.wav_synthesis.wav
/content/audio_mel_ref/LJ001-0173.wav_synthesis.wav
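The `<(ls ...)` process substitution above is specific to bash. If you prefer to build the file list in Python, a sketch like the following writes the same kind of list to a text file that can be passed to `inference.py -f`. The helper name `write_mel_file_list` and the list file name in the usage note are my own choices.

```python
import glob
import os


def write_mel_file_list(mel_dir, list_path):
    """Write the sorted paths of all .pt files in mel_dir to list_path, one per line."""
    mel_files = sorted(glob.glob(os.path.join(mel_dir, "*.pt")))
    with open(list_path, "w") as f:
        for path in mel_files:
            f.write(path + "\n")
    return mel_files
```

For example, `write_mel_file_list("/content/mel_spectrograms", "/content/waveglow/mel_files.txt")` followed by `python inference.py -f mel_files.txt ...` inside the `waveglow` folder.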




### 3-5. Check Voice Quality

**Generated Voice** from provided Mel

Select one example, LJ001-0153.wav, to check the quality of the voice generated from the Mel of a real voice.

Sentence: "DUMMY/LJ001-0153.wav| only nominally so, however, in many cases, since when he uses a headline he counts that in,"

In [0]:
audio_file_synth = "/content/audio_mel_ref/LJ001-0153.wav_synthesis.wav"
import IPython.display as ipd
ipd.Audio(audio_file_synth, rate=22050)

##  Step4. Voice Synthesis  from Mel of Real Audio

You can also generate audio from real voice files. In this step, a real recording (LJ001-0153.wav) is downloaded, converted to a Mel spectrogram, and synthesized again with the provided pretrained WaveGlow parameters.



In [0]:
%%time
destination="/content/LJ001-0153.wav"
file_id="1kM_7q5dVGkf4CV97cc7rY07JLwB9VaAL"
download_file_from_google_drive(file_id, destination)

CPU times: user 28.9 ms, sys: 3.64 ms, total: 32.5 ms
Wall time: 676 ms


In [0]:
ls -alh /content/LJ001-0153.wav

-rw-r--r-- 1 root root 279K Nov 11 23:05 /content/LJ001-0153.wav


### 4-2. Generate Mel from Real Audio
I created a Mel spectrogram from the real audio (LJ001-0153.wav) in the **`Mel_real`** folder, as configured by config.json.

```
    "data_config": {
        "training_files":"train_files.txt",
        "segment_length": 16000,
        "sampling_rate": 22050,
        "filter_length": 1024,
        "hop_length": 256,
        "win_length": 1024,
        "mel_fmin": 0.0,
        "mel_fmax": 8000.0
    },
```
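To get a feel for these numbers, the relation between audio length and Mel frames can be worked out from `sampling_rate` and `hop_length`. The sketch below is plain arithmetic with helper names of my own; the exact frame count of a real file also depends on how the STFT pads the signal, so treat the result as an approximation.

```python
def frames_per_second(sampling_rate=22050, hop_length=256):
    """Number of Mel frames produced per second of audio."""
    return sampling_rate / hop_length


def approx_num_frames(num_samples, hop_length=256):
    """Approximate Mel frame count, ignoring STFT padding (an assumption)."""
    return num_samples // hop_length


print(frames_per_second())           # 86.1328125
print(approx_num_frames(6 * 22050))  # 516 frames for about 6 seconds of audio
```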

  

In [0]:
%%time
%%bash
rm -rf Mel_real
mkdir Mel_real
cd waveglow
ls /content/LJ001-0153.wav > /content/waveglow/test_files.txt
# mel2samp.py also reads train_files.txt ("training_files" in config.json)
ls /content/LJ001-0153.wav > /content/waveglow/train_files.txt 
python mel2samp.py -f test_files.txt -o /content/Mel_real -c config.json
ls /content/Mel_real/

/content/Mel_real/LJ001-0153.wav.pt
LJ001-0153.wav.pt
CPU times: user 2.39 ms, sys: 6.4 ms, total: 8.8 ms
Wall time: 2.54 s


### 4-3. Generate Synthetic Audio
About 6 seconds of voice can be generated in about 20 seconds. Most of that time is spent loading the 2GB weight file. The actual speech synthesis runs in real time, and some additional time is needed to save the speech file.

In [0]:
%%time
%%bash
rm -rf audio_real
mkdir audio_real
ls /content/Mel_real/*.pt > /content/waveglow/mel_files_real.txt
cd waveglow
python inference.py -f mel_files_real.txt -w /content/waveglow_old.pt -o /content/audio_real  -s 0.6

/content/audio_real/LJ001-0153.wav_synthesis.wav


CPU times: user 2.89 ms, sys: 8.89 ms, total: 11.8 ms
Wall time: 15.9 s
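To relate the wall time to the length of the generated audio, you can compute a rough ratio of elapsed time to audio duration. This is a sketch with hypothetical helper names, assuming the output is a PCM WAV file that Python's standard `wave` module can read. Keep in mind that the wall time above includes loading the weight file, so the ratio overstates the cost of the synthesis itself.

```python
import wave


def wav_duration_seconds(path):
    """Duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())


def real_time_factor(elapsed_seconds, audio_seconds):
    """Elapsed time divided by audio duration; below 1.0 means faster than real time."""
    return elapsed_seconds / audio_seconds
```

For example, `real_time_factor(15.9, wav_duration_seconds("/content/audio_real/LJ001-0153.wav_synthesis.wav"))` gives the end-to-end ratio for this run.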


### 4-4. compare  Audio Quality

Sentence: "DUMMY/LJ001-0153.wav| only nominally so, however, in many cases, since when he uses a headline he counts that in,"


**Real Audio**


In [0]:
import IPython.display as ipd
audio_file_real ="/content/LJ001-0153.wav"
ipd.Audio(audio_file_real, rate=22050)

**Synthesized Audio**



In [0]:
audio_file_synth = "/content/audio_real/LJ001-0153.wav_synthesis.wav"
ipd.Audio(audio_file_synth, rate=22050)

If the synthesized sound quality differs from the real audio, the preprocessing option settings may be incorrect, as discussed in [issue7](https://github.com/NVIDIA/waveglow/issues/7).
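One way to spot such a mismatch is to compare the preprocessing options in use against the values shown in the config.json excerpt earlier. The sketch below is only an illustration: `EXPECTED_DATA_CONFIG` copies the values from that excerpt, `diff_data_config` is a hypothetical helper, and I am assuming these are the settings the pretrained weights expect.

```python
# Values copied from the "data_config" excerpt of config.json shown earlier.
EXPECTED_DATA_CONFIG = {
    "sampling_rate": 22050,
    "filter_length": 1024,
    "hop_length": 256,
    "win_length": 1024,
    "mel_fmin": 0.0,
    "mel_fmax": 8000.0,
}


def diff_data_config(actual, expected=EXPECTED_DATA_CONFIG):
    """Return {key: (actual_value, expected_value)} for every mismatching key."""
    return {
        key: (actual.get(key), value)
        for key, value in expected.items()
        if actual.get(key) != value
    }
```

For example, load the file with `json.load(open("/content/waveglow/config.json"))["data_config"]` and pass the result to `diff_data_config`; an empty dictionary means the options match.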


---


## Summary

With this Jupyter notebook, you can easily demonstrate speech synthesis.

I would especially like to thank Rafael Valle for his urgent commit while I was validating this notebook.


## Reference
- paper  https://arxiv.org/abs/1811.00002

- blog https://nv-adlr.github.io/WaveGlow 

- github https://github.com/NVIDIA/waveglow


