This repository has been archived by the owner on Jan 11, 2022. It is now read-only.

Adding PyTorch wrapper for NV-Wavenet #7

Merged
merged 2 commits into NVIDIA:master on May 18, 2018

Conversation

RPrenger

No description provided.

@PetrochukM
Contributor

PetrochukM commented May 18, 2018

This is awesome. I've reviewed the README for the pull request, and I have a couple of questions:

  • embedding_prev and embedding_cur: are these used for global conditioning via embeddings, such as speaker embeddings? Why are there separate previous and current embeddings?
  • cond_input is where we would locally condition on the mel-spectrogram, right?
  • Does this PyTorch module allow for training, or just inference? If it does not allow for training, what is the recommended method for training?
  • For the initial 2x1 causal convolution, where can we set the weight matrix?

@rafaelvalle Reposted after: NVIDIA/tacotron2#3 (comment)

@RPrenger
Author

RPrenger commented May 18, 2018

Hi @PetrochukM.

  1. The embedding_prev and embedding_cur are used on the audio, not on the conditioning inputs. Right now, nv-wavenet only works with a one-hot representation of audio, so there's an embedding matrix at the beginning. The reason there's a prev and a curr is that the DeepVoice WaveNet implementation had a causal convolution at the beginning with kernel size = 2. The curr embedding is for the current audio sample, and the prev embedding is for the audio sample before it. I used a similar convention for the dilated causal convolutions later. If your WaveNet just uses one embedding, you can set embedding_prev to all zeros and it'll have no effect on the output (we actually do this with our network).

  2. cond_input is a little more complicated than just the mels or features. The nv-wavenet code only does the auto-regressive part of the inference; all the computation that can be done in a non-auto-regressive way (in parallel) is done beforehand. So all the input preprocessing and upsampling are done beforehand, as are the convolutions applied to the upsampled features (which are potentially different for each layer). The cond_input is therefore a very large (2R x batch_size x num_layers x samples) tensor that must be calculated before inference can run. However, because this calculation can be done in parallel across time, it's much faster than the part of inference nv-wavenet is doing.

  3. Right now this is just a wrapper for the nv-wavenet code, so inference only. We're working on open-sourcing our WaveNet training code, which will include code for translating itself to the nv-wavenet wrapper (this is what we used for nv_wavenet_test.py). But the code to translate to the wrapper isn't complicated. If your WaveNet fits the constraints of nv-wavenet, it's just a matter of feeding your tensors into the NVWaveNet constructor. The nv_wavenet_test.py example code might help (it's very short). Translation was just a matter of saving the tensors and parameters in a dictionary with the right keys.

  4. The initial 2x1 causal convolution is set with the embedding_prev and embedding_curr inputs (see answer 1). Because nv-wavenet only works with one-hot representations of audio, an initial convolution can be written as an embedding. If you're not using a one-hot representation, the nv-wavenet code won't work yet.
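To illustrate the point in answer 1: on one-hot audio an embedding lookup is just a row selection, so zeroing embedding_prev removes its contribution entirely. A minimal plain-Python sketch (toy sizes; the names here are illustrative, not the wrapper's actual API):

```python
# Sketch: the initial "convolution" over one-hot audio is two embedding
# lookups whose results are summed. Zeroing the prev table makes the
# output depend only on the curr sample.

def lookup(table, index):
    """A one-hot vector times a matrix is just a row lookup."""
    return table[index]

A = 4          # toy number of audio quantization levels
D = 3          # toy embedding dimension
embedding_curr = [[float(i * D + j) for j in range(D)] for i in range(A)]
embedding_prev = [[0.0] * D for _ in range(A)]   # all zeros: no effect

x_prev, x_curr = 2, 1   # toy quantized audio samples
out = [p + c for p, c in zip(lookup(embedding_prev, x_prev),
                             lookup(embedding_curr, x_curr))]
# With embedding_prev zeroed, out equals embedding_curr[x_curr].
```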
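The precomputed conditioning tensor from answer 2 can be sketched as follows. All sizes are toy values, and the per-layer convolution is replaced by a dummy scaling; the point is only the (2R x batch_size x num_layers x samples) layout, computed once and in parallel over time:

```python
# Sketch (illustrative, not the real preprocessing code): build the
# per-layer conditioning tensor ahead of the auto-regressive loop.

R = 2              # toy residual-channel count (real networks use e.g. 64)
batch_size = 1
num_layers = 3
samples = 5

# stand-in for upsampled conditioning features, one value per time step
upsampled = [[float(t) for t in range(samples)] for _ in range(batch_size)]

cond_input = [[[[upsampled[b][t] * (layer + 1)   # stand-in for a per-layer conv
                 for t in range(samples)]
                for layer in range(num_layers)]
               for b in range(batch_size)]
              for _ in range(2 * R)]
# Shape: (2R, batch_size, num_layers, samples); every entry is independent
# across time, so this whole tensor can be computed in parallel up front.
```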
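Answer 3's "dictionary with the right keys" idea might look roughly like this. The key names below are hypothetical placeholders; check nv_wavenet_test.py for the actual keys the wrapper expects:

```python
# Sketch: collect trained tensors under known keys before handing them
# to the wrapper. Key names and the helper are hypothetical.

def export_for_nv_wavenet(model_tensors):
    """Gather required tensors into a dict, failing loudly if any are missing."""
    required = ["embedding_prev", "embedding_curr", "conv_out_weight"]
    missing = [k for k in required if k not in model_tensors]
    if missing:
        raise KeyError(f"missing tensors: {missing}")
    return {k: model_tensors[k] for k in required}

toy = {
    "embedding_prev": [[0.0, 0.0]],   # toy stand-ins for real tensors
    "embedding_curr": [[1.0, 2.0]],
    "conv_out_weight": [[0.5]],
}
checkpoint = export_for_nv_wavenet(toy)
# checkpoint would then be fed to the NVWaveNet constructor.
```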
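Answer 4's claim, that a kernel-size-2 convolution on one-hot input is equivalent to two embedding lookups, can be checked with a small plain-Python sketch (toy shapes; names and the tap-to-table mapping are illustrative):

```python
# Sketch: a 2x1 causal conv weight W of shape (out_ch, in_levels, 2),
# split along the kernel axis, yields a "prev" and a "curr" embedding table.

A, out_ch = 3, 2   # toy quantization levels and output channels
# W[o][i][k]: weight from input level i at kernel tap k to output channel o
W = [[[float(o * 10 + i * 2 + k) for k in range(2)] for i in range(A)]
     for o in range(out_ch)]

def conv_on_one_hot(W, x_prev, x_curr):
    # One-hot input selects a single weight per tap, so the convolution
    # reduces to picking the (x_prev, tap 0) and (x_curr, tap 1) weights.
    return [W[o][x_prev][0] + W[o][x_curr][1] for o in range(len(W))]

# The same computation expressed as two embedding tables:
embedding_prev = [[W[o][i][0] for o in range(out_ch)] for i in range(A)]
embedding_curr = [[W[o][i][1] for o in range(out_ch)] for i in range(A)]

x_prev, x_curr = 2, 0
via_conv = conv_on_one_hot(W, x_prev, x_curr)
via_embed = [p + c for p, c in zip(embedding_prev[x_prev],
                                   embedding_curr[x_curr])]
# via_conv and via_embed are identical: the conv IS the pair of embeddings.
```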

@maozhiqiang

maozhiqiang commented May 18, 2018

Hi @RPrenger, when I run python pytorch/nv_wavenet_test.py I get an error:

GPUassert: invalid device function ../nv_wavenet_util.cuh 48

@BrianPharris BrianPharris merged commit cc364ca into NVIDIA:master May 18, 2018
@PetrochukM
Contributor

Hi @RPrenger,

Similarly, I get an error:

$ python3.6 nv-wavenet/pytorch/nv_wavenet_test.py
GPUassert: invalid device function ../nv_wavenet_util.cuh 48

4 participants