


# NU-Wave 2: A General Neural Audio Upsampling Model for Various Sampling Rates
---



## Resources

- 📃 [Paper](https://arxiv.org/abs/2206.08545)
- 📚 [Project Page](https://mindslab-ai.github.io/nuwave2)
- 🎬 [Examples](https://mindslab-ai.github.io/nuwave2)
- 💻 [Code](https://github.com/mindslab-ai/nuwave2)

## Abstract

[Abstract](https://arxiv.org/pdf/2206.08545.pdf): *Conventionally, audio super-resolution models fixed the initial and the target sampling rates, which necessitate the model to be trained for each pair of sampling rates. We introduce NU-Wave 2, a diffusion model for neural audio upsampling that enables the generation of 48 kHz audio signals from inputs of various sampling rates with a single model. Based on the architecture of NU-Wave, NU-Wave 2 uses short-time Fourier convolution (STFC) to generate harmonics to resolve the main failure modes of NU-Wave, and incorporates bandwidth spectral feature transform (BSFT) to condition the bandwidths of inputs in the frequency domain. We experimentally demonstrate that NU-Wave 2 produces high-resolution audio regardless of the sampling rate of input while requiring fewer parameters than other models. The official code and the audio samples are available at* https://mindslab-ai.github.io/nuwave2
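
To make the idea of conditioning on input bandwidth more concrete, the sketch below marks which STFT frequency bins of a 48 kHz signal an input of a given sampling rate can occupy. The function name `band_mask` and the bin-rounding rule are illustrative assumptions, not the repository's implementation; `n_fft=1024` mirrors `filter_length` in the config used later in this notebook.

```python
from typing import List


def band_mask(sr_in: int, sr_target: int = 48000, n_fft: int = 1024) -> List[int]:
    """Return a 0/1 mask over the n_fft // 2 + 1 STFT bins of the target-rate signal."""
    if not 0 < sr_in <= sr_target:
        raise ValueError("sr_in must be in (0, sr_target]")
    n_bins = n_fft // 2 + 1
    # Bin k is centered at k * sr_target / n_fft Hz, and the input only
    # carries content up to its own Nyquist frequency, sr_in / 2 Hz.
    cutoff = (n_fft * sr_in) // (2 * sr_target)  # highest bin at or below sr_in / 2
    return [1 if k <= cutoff else 0 for k in range(n_bins)]


for sr in (8000, 16000, 24000, 48000):
    mask = band_mask(sr)
    print(sr, "Hz input:", sum(mask), "of", len(mask), "bins carry input content")
```

A mask like this is one plausible form for a per-bin bandwidth indicator; how the model consumes such information is described in the paper.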


## Authors

Seungu Han<sup>1,2</sup>,
Junhyeok Lee<sup>1</sup>
<br>
<sup>1</sup>*MINDsLab Inc., Republic of Korea,*<br>
<sup>2</sup>*Seoul National University, Republic of Korea*

## Citation

### Plain Text


```
Han, Seungu, and Junhyeok Lee. "NU-Wave 2: A General Neural Audio Upsampling Model for Various Sampling Rates." arXiv preprint arXiv:2206.08545 (2022).
```



### BibTeX

```
@misc{https://doi.org/10.48550/arxiv.2206.08545,
  doi = {10.48550/ARXIV.2206.08545},
  url = {https://arxiv.org/abs/2206.08545},
  author = {Han, Seungu and Lee, Junhyeok},
  keywords = {Audio and Speech Processing (eess.AS), Machine Learning (cs.LG), FOS: Electrical engineering, electronic engineering, information engineering, FOS: Computer and information sciences},
  title = {NU-Wave 2: A General Neural Audio Upsampling Model for Various Sampling Rates},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
```



# Set up the notebook

In [None]:
# @markdown Mount your Google Drive at `/content/gdrive`
from google.colab import drive
drive.mount('/content/gdrive')

In [None]:
# @markdown Clone the repository
!git clone --recursive https://github.com/mindslab-ai/nuwave2.git
%cd nuwave2

In [None]:
# @markdown Install requirements
!pip install -r requirements.txt

## Prepare the [`VCTK`](https://datashare.ed.ac.uk/handle/10283/3443) dataset

### Download from an official source

In [None]:
# @markdown Download the dataset from an official source
!wget -O VCTK-Corpus-0.92.zip "https://datashare.ed.ac.uk/bitstream/handle/10283/3443/VCTK-Corpus-0.92.zip?sequence=2&isAllowed=y"

In [None]:
# @markdown Copy the dataset to your gdrive
!mkdir -p /content/gdrive/MyDrive/datasets
!cp VCTK-Corpus-0.92.zip /content/gdrive/MyDrive/datasets/

### Download from your gdrive

In [None]:
# @markdown Copy the dataset from your gdrive
!cp /content/gdrive/MyDrive/datasets/VCTK-Corpus-0.92.zip VCTK-Corpus-0.92.zip
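
The two download routes above can also be combined into a single check that reuses the Drive copy when it exists. This is a minimal sketch; the helper name `restore_from_cache` is hypothetical, and the paths simply repeat the ones used in the cells above.

```python
import shutil
from pathlib import Path


def restore_from_cache(cache_dir: str, work_dir: str,
                       name: str = "VCTK-Corpus-0.92.zip") -> bool:
    """Copy `name` from cache_dir into work_dir if needed.

    Returns True when the archive is available in work_dir afterwards.
    """
    cached = Path(cache_dir) / name
    local = Path(work_dir) / name
    if local.exists():
        return True  # already present, nothing to do
    if cached.exists():
        local.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(cached, local)
        return True
    return False  # not cached: the archive still has to be downloaded


if restore_from_cache("/content/gdrive/MyDrive/datasets", "."):
    print("archive ready")
else:
    print("archive not cached; run the download cell first")
```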

In [None]:
# @title 🤗 Signal completion 📣🎶✨

from google.colab import output
output.eval_js('new Audio("https://cdn.pixabay.com/download/audio/2021/08/04/audio_0625c1539c.mp3?filename=success-1-6297.mp3").play()')

### Prepare the data

In [None]:
vctk_dir = "vctk"
%env VCTK_DIR=$vctk_dir

In [None]:
!mkdir -p $vctk_dir
!unzip VCTK-Corpus-0.92.zip -d $vctk_dir

In [None]:
# @markdown Remove the speakers `p280` and `p315`

!rm -r $vctk_dir/txt/p280
!rm -r $vctk_dir/wav48_silence_trimmed/p280
!rm -r $vctk_dir/txt/p315
!rm -r $vctk_dir/wav48_silence_trimmed/p315
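
If the cell above is re-run, or a speaker folder is missing from one of the subdirectories, `rm -r` reports an error. A tolerant Python equivalent could look like the sketch below; `remove_speakers` is a hypothetical helper, and the directory names are taken from the commands above.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def remove_speakers(vctk_dir: str,
                    speakers: Iterable[str] = ("p280", "p315"),
                    subdirs: Iterable[str] = ("txt", "wav48_silence_trimmed")) -> List[str]:
    """Delete each speaker's folder under the given subdirs.

    Folders that do not exist are skipped. Returns the paths that were removed.
    """
    removed = []
    for sub in subdirs:
        for spk in speakers:
            target = Path(vctk_dir) / sub / spk
            if target.is_dir():
                shutil.rmtree(target)
                removed.append(str(target))
    return removed


# Prints an empty list when the folders are already gone.
print(remove_speakers("vctk"))
```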

In [None]:
%%writefile hparameter.yaml
# @title ### Modify the config dataset path

train:
  batch_size: 12
  lr: 2e-4
  weight_decay: 0.00
  num_workers: 8
  gpus: 2 #ddp
  opt_eps: 1e-9
  beta1: 0.9
  beta2: 0.99

data:
  timestamp_path: 'vctk-silence-labels/vctk-silences.0.92.txt'
  base_dir: 'vctk/wav48_silence_trimmed/'
  dir: 'vctk/wav48_silence_trimmed_wav/'
  format: '*mic1.wav'
  cv_ratio: (100./108., 8./108., 0.00) #train/val/test

audio:
  filter_length: 1024
  hop_length: 256
  win_length: 1024
  sampling_rate: 48000
  sr_min: 6000
  sr_max: 48000
  length: 32768 #32*1024 samples ~ 0.68 sec at 48 kHz

arch:
  residual_layers: 15 #
  residual_channels: 64
  pos_emb_dim: 512
  bsft_channels: 64

logsnr:
  logsnr_min: -20.0
  logsnr_max: 20.0

dpm:
  max_step: 1000
  pos_emb_scale: 50000
  pos_emb_channels: 128 
  infer_step: 8
  infer_schedule: "torch.tensor([-2.6, -0.8, 2.0, 6.4, 9.8, 12.9, 14.4, 17.2])"

log:
  name: 'nuwave2'
  checkpoint_dir: 'checkpoint'
  tensorboard_dir: 'tensorboard'
  test_result_dir: 'test_sample/result'


In [None]:
!python utils/flac2wav.py
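
The config points `base_dir` at the extracted FLAC files and `dir` at a parallel `wav48_silence_trimmed_wav` tree, with `format: '*mic1.wav'`. The sketch below shows one way an input path could map to its converted counterpart. `wav_path_for` is a hypothetical helper, the example file name is illustrative, and the actual layout produced by `utils/flac2wav.py` may differ.

```python
from pathlib import Path


def wav_path_for(flac_path: str,
                 base_dir: str = "vctk/wav48_silence_trimmed",
                 out_dir: str = "vctk/wav48_silence_trimmed_wav") -> Path:
    """Mirror a FLAC path under base_dir into out_dir with a .wav suffix."""
    rel = Path(flac_path).relative_to(base_dir)  # e.g. p225/p225_001_mic1.flac
    return Path(out_dir) / rel.with_suffix(".wav")


print(wav_path_for("vctk/wav48_silence_trimmed/p225/p225_001_mic1.flac"))
```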

# Train!

In [None]:
%%writefile hparameter.yaml
# @title Modify the config

train:
  batch_size: 6
  lr: 2e-4
  weight_decay: 0.00
  num_workers: 2
  gpus: 1 #ddp
  opt_eps: 1e-9
  beta1: 0.9
  beta2: 0.99

data:
  timestamp_path: 'vctk-silence-labels/vctk-silences.0.92.txt'
  base_dir: 'vctk/wav48_silence_trimmed/'
  dir: 'vctk/wav48_silence_trimmed_wav/'
  format: '*mic1.wav'
  cv_ratio: (100./108., 8./108., 0.00) #train/val/test

audio:
  filter_length: 1024
  hop_length: 256
  win_length: 1024
  sampling_rate: 48000
  sr_min: 6000
  sr_max: 48000
  length: 32768 #32*1024 samples ~ 0.68 sec at 48 kHz

arch:
  residual_layers: 15 #
  residual_channels: 64
  pos_emb_dim: 512
  bsft_channels: 64

logsnr:
  logsnr_min: -20.0
  logsnr_max: 20.0

dpm:
  max_step: 1000
  pos_emb_scale: 50000
  pos_emb_channels: 128 
  infer_step: 8
  infer_schedule: "torch.tensor([-2.6, -0.8, 2.0, 6.4, 9.8, 12.9, 14.4, 17.2])"

log:
  name: 'nuwave2'
  checkpoint_dir: 'checkpoint'
  tensorboard_dir: 'tensorboard'
  test_result_dir: 'test_sample/result'


In [None]:
# @markdown Start a `TensorBoard`
%load_ext tensorboard
%tensorboard --logdir=./tensorboard --bind_all

In [None]:
!python trainer.py

In [None]:
!pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html
!pip install torchtext==0.11.0