<a href="https://colab.research.google.com/github/magenta/ddsp/blob/main/ddsp/colab/tutorials/1_synths_and_effects.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


##### Copyright 2021 Google LLC.

Licensed under the Apache License, Version 2.0 (the "License");





In [None]:
# Copyright 2021 Google LLC. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

# DDSP Synths and Effects

This notebook demonstrates the use of several of the Synths and Effects Processors in the DDSP library. While the core functions are also directly accessible through `ddsp.core`, using Processors is the preferred API for end-to-end training.

As demonstrated in the [0_processors.ipynb](colab/tutorials/0_processors.ipynb) tutorial, Processors contain the necessary nonlinearities and preprocessing in their `get_controls()` method to convert generic neural network outputs into valid processor controls, which are then converted to a signal by `get_signal()`. The two methods are called in series by `__call__()`.
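
The two-stage pattern described above can be sketched without DDSP at all. The `ToyGain` class below is a hypothetical stand-in written in plain NumPy, not a DDSP class; it only illustrates how an unconstrained value is squashed into a valid control in `get_controls()` and then applied in `get_signal()`.

```python
import numpy as np


class ToyGain:
  """Hypothetical processor (not part of DDSP): scales audio by a gain in (0, 1)."""

  def get_controls(self, audio, raw_gain):
    # Squash the unconstrained "network output" into a valid gain with a sigmoid.
    gain = 1.0 / (1.0 + np.exp(-np.asarray(raw_gain, dtype=np.float64)))
    return {'audio': np.asarray(audio, dtype=np.float64), 'gain': gain}

  def get_signal(self, audio, gain):
    # Apply the already-valid control to produce the output signal.
    return audio * gain

  def __call__(self, audio, raw_gain):
    # Run the two stages in series.
    controls = self.get_controls(audio, raw_gain)
    return self.get_signal(**controls)


toy = ToyGain()
out = toy(np.ones(4), raw_gain=0.0)  # sigmoid(0) = 0.5, so every sample is halved.
```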

While each processor is capable of a wide range of expression, we focus on simple examples here for clarity.

In [None]:
#@title Install DDSP

#@markdown Install ddsp in a conda environment with Python 3.9 for compatibility.
#@markdown This transfers a lot of data and _should take about 5 minutes_.
#@markdown You can ignore warnings.

!rm -rf /content/miniconda
!curl -L https://repo.anaconda.com/miniconda/Miniconda3-py39_23.11.0-2-Linux-x86_64.sh -o miniconda.sh
!chmod +x miniconda.sh
!sh miniconda.sh -b -p /content/miniconda
!/content/miniconda/bin/pip install tensorflow==2.11 tensorflow-probability==0.19.0 tensorflow-datasets==4.9.0 ddsp==3.7.0
print('\nDone installing DDSP in conda environment!')

In [None]:
#@title Import display helpers

import warnings
warnings.filterwarnings("ignore")

import base64
import io
import os

import numpy as np
import matplotlib.pyplot as plt
from IPython import display
from scipy.io import wavfile
from scipy import signal as scipy_signal

from google.colab import files as colab_files
from google.colab import output

sample_rate = 16000


def play(array_of_floats, sample_rate=sample_rate):
  """Play audio in colab using HTML5 audio widget."""
  if isinstance(array_of_floats, list):
    array_of_floats = np.array(array_of_floats)
  if len(array_of_floats.shape) == 2:
    array_of_floats = array_of_floats[0]
  normalizer = float(np.iinfo(np.int16).max)
  # Clip to [-1, 1] so out-of-range samples do not wrap around in int16.
  array_of_ints = np.array(
      np.clip(np.asarray(array_of_floats), -1.0, 1.0) * normalizer,
      dtype=np.int16)
  memfile = io.BytesIO()
  wavfile.write(memfile, sample_rate, array_of_ints)
  html = """<audio controls>
              <source controls src="data:audio/wav;base64,{base64_wavfile}"
              type="audio/wav" />
              Your browser does not support the audio element.
            </audio>"""
  html = html.format(
      base64_wavfile=base64.b64encode(memfile.getvalue()).decode('ascii'))
  memfile.close()
  display.display(display.HTML(html))


def specplot(audio, vmin=-5, vmax=1, rotate=True, size=512 + 256):
  """Plot the log magnitude spectrogram of audio."""
  if isinstance(audio, list):
    audio = np.array(audio)
  if len(audio.shape) == 2:
    audio = audio[0]
  f, t, Sxx = scipy_signal.stft(audio, fs=sample_rate, nperseg=size,
                                 noverlap=size * 3 // 4)
  logmag = np.log10(np.abs(Sxx) + 1e-7)
  if rotate:
    logmag = np.flipud(logmag)
  plt.matshow(logmag, vmin=vmin, vmax=vmax, cmap=plt.cm.magma, aspect='auto')
  plt.xticks([])
  plt.yticks([])
  plt.xlabel('Time')
  plt.ylabel('Frequency')


def record_audio(seconds=3, sample_rate=sample_rate):
  """Record audio from the browser microphone."""
  record_js_code = """
  const sleep  = time => new Promise(resolve => setTimeout(resolve, time))
  const b2text = blob => new Promise(resolve => {
    const reader = new FileReader()
    reader.onloadend = e => resolve(e.srcElement.result)
    reader.readAsDataURL(blob)
  })

  var record = time => new Promise(async resolve => {
    stream = await navigator.mediaDevices.getUserMedia({ audio: true })
    recorder = new MediaRecorder(stream)
    chunks = []
    recorder.ondataavailable = e => chunks.push(e.data)
    recorder.start()
    await sleep(time)
    recorder.onstop = async ()=>{
      blob = new Blob(chunks)
      text = await b2text(blob)
      resolve(text)
    }
    recorder.stop()
  })
  """
  print('Starting recording for {} seconds...'.format(seconds))
  display.display(display.Javascript(record_js_code))
  audio_string = output.eval_js('record(%d)' % (seconds * 1000.0))
  print('Finished recording!')
  audio_bytes = base64.b64decode(audio_string.split(',')[1])
  from pydub import AudioSegment
  segment = AudioSegment.from_file(io.BytesIO(audio_bytes))
  segment = segment.set_frame_rate(sample_rate).set_channels(1).set_sample_width(2)
  samples = np.array(segment.get_array_of_samples()).astype(np.float32)
  samples = samples / float(np.iinfo(np.int16).max)
  return samples


def upload_audio(sample_rate=sample_rate):
  """Upload audio files and return (filenames, audio_arrays)."""
  from pydub import AudioSegment
  audio_files = colab_files.upload()
  fnames = list(audio_files.keys())
  audios = []
  for fname in fnames:
    segment = AudioSegment.from_file(io.BytesIO(audio_files[fname]))
    segment = segment.set_frame_rate(sample_rate).set_channels(1).set_sample_width(2)
    samples = np.array(segment.get_array_of_samples()).astype(np.float32)
    samples = samples / float(np.iinfo(np.int16).max)
    audios.append(samples)
  return fnames, audios


print('Helpers imported!')

# Synths

Synthesizers, located in `ddsp.synths`, take network outputs and produce a signal (usually used as audio). 

## Harmonic

The harmonic synthesizer models a sound as a linear combination of harmonic sinusoids. Amplitude envelopes are generated with 50%-overlapping Hann windows. The final audio is cropped to `n_samples`.

Inputs:
* `amplitudes`: Amplitude envelope of the synthesizer output.
* `harmonic_distribution`: Normalized amplitudes of each harmonic.
* `frequencies`: Frequency in Hz of base oscillator.
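
As a rough illustration of how these three inputs combine, here is a minimal NumPy sketch of additive harmonic synthesis. It assumes the controls are already at audio rate, so it skips the Hann-window upsampling and the `scale_fn` nonlinearity; `harmonic_sketch` is a hypothetical helper, not the DDSP implementation.

```python
import numpy as np


def harmonic_sketch(amplitude, harmonic_distribution, f0_hz, sample_rate=16000):
  """Sum harmonic sinusoids from per-sample controls.

  Shapes: amplitude [n_samples], harmonic_distribution [n_samples, n_harmonics],
  f0_hz [n_samples].
  """
  n_harmonics = harmonic_distribution.shape[-1]
  harmonic_numbers = np.arange(1, n_harmonics + 1)
  # Frequency of every harmonic at every sample, [n_samples, n_harmonics].
  harmonic_freqs = f0_hz[:, np.newaxis] * harmonic_numbers
  harmonic_amps = amplitude[:, np.newaxis] * harmonic_distribution
  # Silence harmonics at or above Nyquist to avoid aliasing.
  harmonic_amps = np.where(harmonic_freqs < sample_rate / 2.0, harmonic_amps, 0.0)
  # Phase is the cumulative sum of angular frequency per sample.
  phases = np.cumsum(2.0 * np.pi * harmonic_freqs / sample_rate, axis=0)
  return np.sum(harmonic_amps * np.sin(phases), axis=-1)


n_samples = 16000
amplitude = np.linspace(0.5, 0.0, n_samples)           # Linear fade-out.
harmonic_distribution = np.full([n_samples, 4], 0.25)  # Four equal harmonics.
f0_hz = np.full(n_samples, 440.0)
audio = harmonic_sketch(amplitude, harmonic_distribution, f0_hz)
```

Because the distribution sums to one, the overall envelope is set by `amplitude` alone, which is the role the input list above gives it.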

In [None]:
#@title Run all synth demos (Harmonic, FilteredNoise, Wavetable)

SCRIPT = r'''
import os
import warnings
warnings.filterwarnings("ignore")

import numpy as np
import ddsp
import tensorflow as tf

output_dir = '/content/synth_outputs'
os.makedirs(output_dir, exist_ok=True)
sample_rate = 16000

# ===========================================================================
# Harmonic Synth
# ===========================================================================
print('--- Harmonic Synth ---')
n_frames = 1000
hop_size = 64
n_samples_h = n_frames * hop_size

# Amplitude [batch, n_frames, 1].
amps = np.linspace(1.0, -3.0, n_frames)
amps = amps[np.newaxis, :, np.newaxis]

# Harmonic Distribution [batch, n_frames, n_harmonics].
n_harmonics = 20
harmonic_distribution = np.ones([n_frames, 1]) * np.linspace(1.0, -1.0, n_harmonics)[np.newaxis, :]
harmonic_distribution = harmonic_distribution[np.newaxis, :, :]

# Fundamental frequency in Hz [batch, n_frames, 1].
f0_hz = 440.0 * np.ones([1, n_frames, 1])

# Create synthesizer and generate audio.
harmonic_synth = ddsp.synths.Harmonic(n_samples=n_samples_h,
                                      scale_fn=ddsp.core.exp_sigmoid,
                                      sample_rate=sample_rate)
audio_harmonic = harmonic_synth(amps, harmonic_distribution, f0_hz)
audio_h = audio_harmonic.numpy() if hasattr(audio_harmonic, 'numpy') else np.array(audio_harmonic)
np.save(os.path.join(output_dir, 'audio_harmonic.npy'), audio_h)
print('Harmonic audio saved.')

# ===========================================================================
# FilteredNoise Synth
# ===========================================================================
print('--- FilteredNoise Synth ---')
n_frames_fn = 250
n_frequencies = 1000
n_samples_fn = 64000

# Bandpass filters, [n_batch, n_frames, n_frequencies].
magnitudes = [tf.sin(tf.linspace(0.0, w, n_frequencies)) for w in np.linspace(8.0, 80.0, n_frames_fn)]
magnitudes = 0.5 * tf.stack(magnitudes)**4.0
magnitudes = magnitudes[tf.newaxis, :, :]

# Create synthesizer and generate audio.
filtered_noise_synth = ddsp.synths.FilteredNoise(n_samples=n_samples_fn, scale_fn=None)
audio_fn = filtered_noise_synth(magnitudes)
audio_fn_np = audio_fn.numpy() if hasattr(audio_fn, 'numpy') else np.array(audio_fn)
np.save(os.path.join(output_dir, 'audio_filtered_noise.npy'), audio_fn_np)
print('FilteredNoise audio saved.')

# ===========================================================================
# Wavetable Synth
# ===========================================================================
print('--- Wavetable Synth ---')
n_samples_wt = 64000
n_wavetable = 2048
n_frames_wt = 100

# Amplitude [batch, n_frames, 1].
amps_wt = tf.linspace(0.5, 1e-3, n_frames_wt)[tf.newaxis, :, tf.newaxis]

# Fundamental frequency in Hz [batch, n_frames, 1].
f0_hz_wt = 110 * tf.linspace(1.5, 1, n_frames_wt)[tf.newaxis, :, tf.newaxis]

# Wavetables [batch, n_frames, n_wavetable].
wavetable_sin = tf.sin(tf.linspace(0.0, 2.0 * np.pi, n_wavetable))
wavetable_sin = wavetable_sin[tf.newaxis, tf.newaxis, :]
wavetable_square = tf.cast(wavetable_sin > 0.0, tf.float32) * 2.0 - 1.0
wavetables = tf.concat([wavetable_square, wavetable_sin], axis=1)
wavetables = ddsp.core.resample(wavetables, n_frames_wt)

# Create synthesizer and generate audio.
wavetable_synth = ddsp.synths.Wavetable(n_samples=n_samples_wt,
                                        sample_rate=sample_rate,
                                        scale_fn=None)
audio_wt = wavetable_synth(amps_wt, wavetables, f0_hz_wt)
audio_wt_np = audio_wt.numpy() if hasattr(audio_wt, 'numpy') else np.array(audio_wt)
np.save(os.path.join(output_dir, 'audio_wavetable.npy'), audio_wt_np)
print('Wavetable audio saved.')

print('\nDone! All synth outputs saved.')
'''

with open('/content/synth_demos.py', 'w') as f:
  f.write(SCRIPT)

!unset PYTHONPATH PYTHONHOME && /content/miniconda/bin/python /content/synth_demos.py

In [None]:
# Harmonic Synth
audio_harmonic = np.load('/content/synth_outputs/audio_harmonic.npy')
play(audio_harmonic)
specplot(audio_harmonic)

## Filtered Noise



The filtered noise synthesizer is a subtractive synthesizer that shapes white noise with a series of time-varying filter banks. 

Inputs:
* `magnitudes`: Amplitude envelope of each filter bank (linearly spaced from 0 Hz to the Nyquist frequency).
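
To make the subtractive idea concrete, the sketch below shapes white noise with a single, static set of magnitudes by multiplying in the frequency domain. It is a simplification under stated assumptions: one frame only (no time variation), and `filter_noise_sketch` is a hypothetical NumPy helper rather than the method `FilteredNoise` uses internally.

```python
import numpy as np


def filter_noise_sketch(magnitudes, n_samples, seed=0):
  """Shape white noise with one static set of filter-bank magnitudes."""
  rng = np.random.default_rng(seed)
  noise = rng.uniform(-1.0, 1.0, n_samples)
  spectrum = np.fft.rfft(noise)
  # Interpolate the magnitudes onto the FFT bins (0 Hz to Nyquist).
  bins = np.linspace(0.0, 1.0, spectrum.shape[0])
  bands = np.linspace(0.0, 1.0, len(magnitudes))
  response = np.interp(bins, bands, magnitudes)
  return np.fft.irfft(spectrum * response, n=n_samples)


# Keep the lower half of the spectrum and remove the upper half.
lowpass_magnitudes = np.concatenate([np.ones(50), np.zeros(50)])
filtered = filter_noise_sketch(lowpass_magnitudes, n_samples=16000)
```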

In [None]:
# FilteredNoise Synth
audio_fn = np.load('/content/synth_outputs/audio_filtered_noise.npy')
play(audio_fn)
specplot(audio_fn)

## Wavetable

The wavetable synthesizer generates audio through interpolative lookup from small chunks of waveforms (wavetables) provided by the network. In principle, it is very similar to the `Harmonic` synth, but with a parameterization in the waveform domain and generation using linear interpolation vs. cumulative summation of sinusoid phases.

Inputs:
* `amplitudes`: Amplitude envelope of the synthesizer output.
* `wavetables`: A series of wavetables that are interpolated to cover `n_samples`.
* `frequencies`: Frequency in Hz of base oscillator.
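
The interpolative lookup can be sketched in a few lines of NumPy. This version assumes a single static wavetable and a constant frequency, whereas the synth accepts time-varying versions of both; `wavetable_sketch` is a hypothetical helper for illustration only.

```python
import numpy as np


def wavetable_sketch(wavetable, f0_hz, n_samples, sample_rate=16000):
  """Read one static wavetable at f0_hz using linear interpolation."""
  n_wavetable = len(wavetable)
  # Phase in cycles, wrapped to [0, 1).
  phase = (f0_hz * np.arange(n_samples) / sample_rate) % 1.0
  position = phase * n_wavetable
  index_low = np.floor(position).astype(int) % n_wavetable
  index_high = (index_low + 1) % n_wavetable  # Wrap around the end of the table.
  fraction = position - np.floor(position)
  return (1.0 - fraction) * wavetable[index_low] + fraction * wavetable[index_high]


# One cycle of a sine wave stored in a 2048-point table.
sine_table = np.sin(2.0 * np.pi * np.arange(2048) / 2048)
audio = wavetable_sketch(sine_table, f0_hz=110.0, n_samples=16000)
```

With a sine table, the output closely tracks a 110 Hz sinusoid; the small residual is the linear interpolation error.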

In [None]:
# Wavetable Synth - notice the aliasing artifacts from linear interpolation.
audio_wt = np.load('/content/synth_outputs/audio_wavetable.npy')
play(audio_wt)
specplot(audio_wt)

# Effects

Effects, located in `ddsp.effects`, are different in that they take network outputs to transform a given audio signal. Some effects, such as Reverb, optionally have trainable parameters of their own.

## Reverb

There are several types of reverberation processors in ddsp.

* Reverb
* ExpDecayReverb
* FilteredNoiseReverb

Unlike other processors, reverbs also have the option to treat the impulse response as a 'trainable' variable rather than requiring it from network outputs. This is helpful, for instance, if the room environment is the same for the whole dataset. To make the reverb trainable, pass the kwarg `trainable=True` to the constructor.
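
At its core, a convolutional reverb convolves the audio with an impulse response. The NumPy sketch below builds an exponentially decaying noise impulse response and applies it with an FFT. The parameter values and the exact decay formula are assumptions chosen for illustration, and `exp_decay_reverb_sketch` is a hypothetical helper, not the `ExpDecayReverb` implementation.

```python
import numpy as np


def exp_decay_reverb_sketch(audio, gain=0.05, decay=4.0, reverb_length=8000,
                            seed=0):
  """Add an exponentially decaying noise reverb tail to mono audio."""
  rng = np.random.default_rng(seed)
  time = np.linspace(0.0, 1.0, reverb_length)
  impulse_response = (
      gain * np.exp(-decay * time) * rng.uniform(-1.0, 1.0, reverb_length))
  impulse_response[0] = 0.0  # Keep the dry signal out of the wet path.
  # Linear convolution via FFT, zero-padded to avoid circular wrap-around.
  n_fft = len(audio) + reverb_length - 1
  wet = np.fft.irfft(
      np.fft.rfft(audio, n_fft) * np.fft.rfft(impulse_response, n_fft), n_fft)
  return audio + wet[:len(audio)]


# Feeding in a single click reveals the impulse response itself.
click = np.zeros(16000)
click[0] = 1.0
click_with_reverb = exp_decay_reverb_sketch(click)
```

In the trainable case described above, the impulse response (or the parameters that generate it) would be a learned variable instead of an argument.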

In [None]:
#@markdown Record or Upload Audio

record_or_upload = "Upload (.mp3 or .wav)" #@param ["Record", "Upload (.mp3 or .wav)"]

record_seconds = 5  #@param {type:"number", min:1, max:10, step:1}

if record_or_upload == "Record":
  audio_input = record_audio(seconds=record_seconds)
else:
  filenames, audios = upload_audio()
  audio_input = audios[0]

# Add batch dimension
audio_input = audio_input[np.newaxis, :]

# Save for effects scripts
np.save('/content/audio_input.npy', audio_input)

# Listen.
specplot(audio_input)
play(audio_input)

In [None]:
#@title Run effects demos (Reverb, FIRFilter, ModDelay)

SCRIPT = r'''
import os
import warnings
warnings.filterwarnings("ignore")

import numpy as np
import ddsp
import tensorflow as tf

output_dir = '/content/effects_outputs'
os.makedirs(output_dir, exist_ok=True)
sample_rate = 16000

# Load input audio
audio = np.load('/content/audio_input.npy')

# ===========================================================================
# ExpDecayReverb
# ===========================================================================
print('--- ExpDecayReverb ---')
reverb = ddsp.effects.ExpDecayReverb(reverb_length=48000)
gain = [[-2.0]]
decay = [[2.0]]
audio_reverb = reverb(audio, gain, decay)
audio_reverb_np = audio_reverb.numpy() if hasattr(audio_reverb, 'numpy') else np.array(audio_reverb)
np.save(os.path.join(output_dir, 'audio_exp_reverb.npy'), audio_reverb_np)

# ===========================================================================
# FilteredNoiseReverb
# ===========================================================================
print('--- FilteredNoiseReverb ---')
reverb2 = ddsp.effects.FilteredNoiseReverb(reverb_length=48000, scale_fn=None)

n_frames = 1000
n_frequencies = 100
frequencies = np.linspace(0, sample_rate / 2.0, n_frequencies)
center_frequency = 4000.0 * np.linspace(0, 1.0, n_frames)
width = 500.0
gauss = lambda x, mu: 2.0 * np.pi * width**-2.0 * np.exp(- ((x - mu) / width)**2.0)
magnitudes = np.array([gauss(frequencies, cf) for cf in center_frequency])
magnitudes = magnitudes[np.newaxis, ...]
magnitudes /= magnitudes.sum(axis=-1, keepdims=True) * 5

audio_fn_reverb = reverb2(audio, magnitudes)
audio_fn_reverb_np = audio_fn_reverb.numpy() if hasattr(audio_fn_reverb, 'numpy') else np.array(audio_fn_reverb)
np.save(os.path.join(output_dir, 'audio_fn_reverb.npy'), audio_fn_reverb_np)
np.save(os.path.join(output_dir, 'fn_reverb_magnitudes.npy'), magnitudes)

# ===========================================================================
# FIRFilter
# ===========================================================================
print('--- FIRFilter ---')
fir_filter = ddsp.effects.FIRFilter(scale_fn=None)

n_seconds = audio.shape[-1] / sample_rate  # Samples per example, not total elements.
frame_rate = 100
n_frames_fir = int(n_seconds * frame_rate)
n_samples_fir = int(n_frames_fir * sample_rate / frame_rate)
audio_trimmed = audio[:, :n_samples_fir]

n_frequencies_fir = 1000
frequencies_fir = np.linspace(0, sample_rate / 2.0, n_frequencies_fir)
lfo_rate = 0.5
n_cycles = n_seconds * lfo_rate
center_frequency_fir = 1000 + 500 * np.sin(np.linspace(0, 2.0*np.pi*n_cycles, n_frames_fir))
width_fir = 500.0
gauss_fir = lambda x, mu: 2.0 * np.pi * width_fir**-2.0 * np.exp(- ((x - mu) / width_fir)**2.0)

magnitudes_fir = np.array([gauss_fir(frequencies_fir, cf) for cf in center_frequency_fir])
magnitudes_fir = magnitudes_fir[np.newaxis, ...]
magnitudes_fir /= magnitudes_fir.max(axis=-1, keepdims=True)

audio_fir = fir_filter(audio_trimmed, magnitudes_fir)
audio_fir_np = audio_fir.numpy() if hasattr(audio_fir, 'numpy') else np.array(audio_fir)
np.save(os.path.join(output_dir, 'audio_fir.npy'), audio_fir_np)
np.save(os.path.join(output_dir, 'fir_magnitudes.npy'), magnitudes_fir)

# ===========================================================================
# ModDelay (Flanger, Chorus, Vibrato)
# ===========================================================================
print('--- ModDelay ---')

def sin_phase(mod_rate, n_samples):
    n_seconds = n_samples / sample_rate
    phase = tf.sin(tf.linspace(0.0, mod_rate * n_seconds * 2.0 * np.pi, n_samples))
    return phase[tf.newaxis, :, tf.newaxis]

def modulate_audio(audio, center_ms, depth_ms, mod_rate, name):
    mod_delay = ddsp.effects.ModDelay(center_ms=center_ms,
                                      depth_ms=depth_ms,
                                      gain_scale_fn=None,
                                      phase_scale_fn=None)
    phase = sin_phase(mod_rate, audio.shape[-1])
    gain = 1.0 * np.ones_like(audio)[..., np.newaxis]
    audio_out = 0.5 * mod_delay(audio, gain, phase)
    out_np = audio_out.numpy() if hasattr(audio_out, 'numpy') else np.array(audio_out)
    np.save(os.path.join(output_dir, f'audio_{name}.npy'), out_np)
    print(f'{name} audio saved.')

modulate_audio(audio, center_ms=0.75, depth_ms=0.75, mod_rate=0.25, name='flanger')
modulate_audio(audio, center_ms=25.0, depth_ms=1.0, mod_rate=2.0, name='chorus')
modulate_audio(audio, center_ms=25.0, depth_ms=12.5, mod_rate=5.0, name='vibrato')

print('\nDone! All effects outputs saved.')
'''

with open('/content/effects_demos.py', 'w') as f:
  f.write(SCRIPT)

!unset PYTHONPATH PYTHONHOME && /content/miniconda/bin/python /content/effects_demos.py

In [None]:
# ExpDecayReverb
audio_reverb = np.load('/content/effects_outputs/audio_exp_reverb.npy')
print('ExpDecayReverb')
specplot(audio_reverb)
play(audio_reverb)

In [None]:
# FilteredNoiseReverb
audio_fn_reverb = np.load('/content/effects_outputs/audio_fn_reverb.npy')
fn_reverb_mags = np.load('/content/effects_outputs/fn_reverb_magnitudes.npy')

print('FilteredNoiseReverb')
specplot(audio_fn_reverb)
play(audio_fn_reverb)
plt.matshow(np.rot90(fn_reverb_mags[0]), aspect='auto')
plt.title('Impulse Response Frequency Response')
plt.xlabel('Time')
plt.ylabel('Frequency')
plt.xticks([])
_ = plt.yticks([])

## FIR Filter

Linear time-varying finite impulse response (LTV-FIR) filters are a broad class of filters whose frequency response can change over time.
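
One common way to obtain an FIR filter from a desired frequency response is the frequency sampling method: take an inverse FFT of the magnitudes, center the result, and window it. The sketch below does this for a single frame; a time-varying filter would repeat it per frame and overlap-add the results. `fir_from_magnitudes` and its `window_size` default are hypothetical choices for illustration, not the `FIRFilter` internals.

```python
import numpy as np


def fir_from_magnitudes(magnitudes, window_size=257):
  """Design a linear-phase FIR filter from magnitudes sampled 0 Hz to Nyquist."""
  # Zero-phase impulse response, centered on index 0 (it wraps around).
  impulse_response = np.fft.irfft(magnitudes)
  # Rotate so the peak sits in the middle, making the filter causal.
  impulse_response = np.fft.fftshift(impulse_response)
  # Keep a Hann-windowed region around the peak to limit the filter length.
  center = len(impulse_response) // 2
  half = window_size // 2
  taps = impulse_response[center - half:center + half + 1]
  return taps * np.hanning(window_size)


sample_rate = 16000
frequencies = np.linspace(0.0, sample_rate / 2.0, 513)
lowpass = (frequencies < 2000.0).astype(np.float64)  # Pass below 2 kHz.
taps = fir_from_magnitudes(lowpass)

# A 500 Hz tone should pass through; a 6 kHz tone should be strongly attenuated.
t = np.arange(sample_rate) / sample_rate
tones = np.sin(2.0 * np.pi * 500.0 * t) + np.sin(2.0 * np.pi * 6000.0 * t)
filtered_tones = np.convolve(tones, taps, mode='same')
```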

In [None]:
# FIRFilter
audio_fir = np.load('/content/effects_outputs/audio_fir.npy')
fir_mags = np.load('/content/effects_outputs/fir_magnitudes.npy')

print('FIR Filter')
play(audio_fir)
specplot(audio_fir)
_ = plt.matshow(np.rot90(fir_mags[0]), aspect='auto')
plt.title('Frequency Response')
plt.xlabel('Time')
plt.ylabel('Frequency')
plt.xticks([])
_ = plt.yticks([])

## ModDelay

Variable length delay lines create an instantaneous pitch shift that can be useful in a variety of time modulation effects such as [vibrato](https://en.wikipedia.org/wiki/Vibrato), [chorus](https://en.wikipedia.org/wiki/Chorus_effect), and [flanging](https://en.wikipedia.org/wiki/Flanging). 
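
A variable delay line can be sketched as a fractional read position that trails the write position by a time-varying number of samples. The NumPy example below uses linear interpolation between neighboring samples and reuses the vibrato settings from this notebook (25 ms center, 12.5 ms depth, 5 Hz rate). `variable_delay_sketch` is a hypothetical helper and makes no claim about how `ModDelay` is implemented.

```python
import numpy as np


def variable_delay_sketch(audio, delay_samples):
  """Read audio through a per-sample fractional delay with linear interpolation."""
  n_samples = len(audio)
  read_position = np.arange(n_samples) - delay_samples
  # Hold the first or last sample when the read position leaves the signal.
  read_position = np.clip(read_position, 0.0, n_samples - 1.0)
  index_low = np.floor(read_position).astype(int)
  index_high = np.minimum(index_low + 1, n_samples - 1)
  fraction = read_position - index_low
  return (1.0 - fraction) * audio[index_low] + fraction * audio[index_high]


sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2.0 * np.pi * 440.0 * t)
# Delay in milliseconds, modulated by a 5 Hz sinusoid.
delay_ms = 25.0 + 12.5 * np.sin(2.0 * np.pi * 5.0 * t)
vibrato = variable_delay_sketch(tone, delay_ms * sample_rate / 1000.0)
```

While the delay is changing, the read position moves faster or slower than real time, which is what produces the instantaneous pitch shift.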

In [None]:
print('Flanger')
audio_flanger = np.load('/content/effects_outputs/audio_flanger.npy')
play(audio_flanger)
specplot(audio_flanger)

print('Chorus')
audio_chorus = np.load('/content/effects_outputs/audio_chorus.npy')
play(audio_chorus)
specplot(audio_chorus)

print('Vibrato')
audio_vibrato = np.load('/content/effects_outputs/audio_vibrato.npy')
play(audio_vibrato)
specplot(audio_vibrato)