<a href="https://colab.research.google.com/github/magenta/ddsp/blob/main/ddsp/colab/tutorials/0_processor.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


##### Copyright 2021 Google LLC.

Licensed under the Apache License, Version 2.0 (the "License");





In [None]:
# Copyright 2021 Google LLC. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

# DDSP Processor Demo

This notebook introduces the signal `Processor()` object, the main object type in the DDSP library. It is the base class for Synthesizers and Effects, which share these methods:

* `get_controls()`: inputs -> controls.
* `get_signal()`: controls -> signal.
* `__call__()`: inputs -> signal. (i.e. `get_signal(**get_controls())`)

Where:
* `inputs` is a variable number of tensor arguments (depending on the processor), often the outputs of a neural network.
* `controls` is a dictionary of tensors, scaled and constrained specifically for the processor.
* `signal` is an output tensor (usually audio or a control signal for another processor).
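
The relationship between the three methods can be sketched with a toy processor in plain NumPy. This is a minimal illustration, not the DDSP implementation: `ToyGain` and its clipping rule are hypothetical, chosen only to show how `__call__()` chains `get_controls()` and `get_signal()`.

```python
import numpy as np


class ToyGain:
  """Hypothetical processor (not part of ddsp): scales audio by a bounded gain."""

  def get_controls(self, audio, raw_gain):
    """inputs -> controls: constrain the raw gain to the range [0, 1]."""
    gain = np.clip(np.asarray(raw_gain, dtype=np.float64), 0.0, 1.0)
    return {'audio': np.asarray(audio, dtype=np.float64), 'gain': gain}

  def get_signal(self, audio, gain):
    """controls -> signal: apply the (now valid) gain."""
    return audio * gain

  def __call__(self, audio, raw_gain):
    """inputs -> signal: get_signal(**get_controls(...))."""
    return self.get_signal(**self.get_controls(audio, raw_gain))


toy = ToyGain()
audio_in = np.array([0.5, -0.5, 0.25])
# A raw gain of 3.0 is out of range, so get_controls() clips it to 1.0.
toy_out = toy(audio_in, raw_gain=3.0)
```

Keeping the constraint logic in `get_controls()` means the synthesis code in `get_signal()` can assume its arguments are already valid.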

Let's see why this is a helpful approach by looking at the specific example of the `Harmonic()` synthesizer processor. 

In [None]:
#@title Install DDSP

#@markdown Install ddsp in a conda environment with Python 3.9 for compatibility.
#@markdown This transfers a lot of data and _should take about 5 minutes_.
#@markdown You can ignore warnings.

!rm -rf /content/miniconda
!curl -L https://repo.anaconda.com/miniconda/Miniconda3-py39_23.11.0-2-Linux-x86_64.sh -o miniconda.sh
!chmod +x miniconda.sh
!sh miniconda.sh -b -p /content/miniconda
!/content/miniconda/bin/pip install tensorflow==2.11 tensorflow-probability==0.19.0 tensorflow-datasets==4.9.0 ddsp==3.7.0
print('\nDone installing DDSP in conda environment!')

In [None]:
#@title Import display helpers

#@markdown These helper functions run in the Colab kernel (no ddsp needed)
#@markdown and are used for audio playback and spectrogram plotting.

import warnings
warnings.filterwarnings("ignore")

import base64
import io

import numpy as np
import matplotlib.pyplot as plt
from IPython import display
from scipy.io import wavfile
from scipy import signal as scipy_signal

sample_rate = 16000


def play(array_of_floats, sample_rate=sample_rate):
  """Play audio in colab using HTML5 audio widget."""
  if isinstance(array_of_floats, list):
    array_of_floats = np.array(array_of_floats)
  if len(array_of_floats.shape) == 2:
    array_of_floats = array_of_floats[0]
  normalizer = float(np.iinfo(np.int16).max)
  # Clip to [-1, 1] so out-of-range samples do not wrap around in int16.
  array_of_ints = np.array(
      np.clip(np.asarray(array_of_floats), -1.0, 1.0) * normalizer,
      dtype=np.int16)
  memfile = io.BytesIO()
  wavfile.write(memfile, sample_rate, array_of_ints)
  html = """<audio controls>
              <source controls src="data:audio/wav;base64,{base64_wavfile}"
              type="audio/wav" />
              Your browser does not support the audio element.
            </audio>"""
  html = html.format(
      base64_wavfile=base64.b64encode(memfile.getvalue()).decode('ascii'))
  memfile.close()
  display.display(display.HTML(html))


def specplot(audio, vmin=-5, vmax=1, rotate=True, size=512 + 256):
  """Plot the log magnitude spectrogram of audio."""
  if isinstance(audio, list):
    audio = np.array(audio)
  if len(audio.shape) == 2:
    audio = audio[0]
  f, t, Sxx = scipy_signal.stft(audio, fs=sample_rate, nperseg=size,
                                 noverlap=size * 3 // 4)
  logmag = np.log10(np.abs(Sxx) + 1e-7)
  if rotate:
    logmag = np.flipud(logmag)
  plt.matshow(logmag, vmin=vmin, vmax=vmax, cmap=plt.cm.magma, aspect='auto')
  plt.xticks([])
  plt.yticks([])
  plt.xlabel('Time')
  plt.ylabel('Frequency')


print('Helpers imported!')

# Example: harmonic synthesizer

The harmonic synthesizer models a sound as a linear combination of harmonic sinusoids. Amplitude envelopes are generated with 50% overlapping Hann windows. The final audio is cropped to `n_samples`.
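
As a rough illustration of "a linear combination of harmonic sinusoids", the sketch below sums harmonics of a fixed fundamental in NumPy. It is a simplification that assumes constant amplitudes and frequency, and it omits the windowed amplitude envelopes described above. `additive_synth` is a hypothetical helper, not a DDSP function.

```python
import numpy as np


def additive_synth(f0_hz, harmonic_amps, n_samples, sample_rate=16000):
  """Sum sinusoids at integer multiples of f0_hz (hypothetical helper)."""
  t = np.arange(n_samples) / sample_rate  # Time in seconds, [n_samples].
  harmonic_numbers = np.arange(1, len(harmonic_amps) + 1)  # 1, 2, 3, ...
  # Phase of each harmonic over time, [n_harmonics, n_samples].
  phases = 2.0 * np.pi * f0_hz * harmonic_numbers[:, np.newaxis] * t
  # Weighted sum over harmonics, [n_samples].
  amps = np.asarray(harmonic_amps, dtype=np.float64)[:, np.newaxis]
  return np.sum(amps * np.sin(phases), axis=0)


# Three harmonics of 440 Hz whose amplitudes sum to 1.0, one second long.
toy_audio = additive_synth(440.0, [0.6, 0.3, 0.1], n_samples=16000)
```

Because the harmonic amplitudes sum to 1.0, the output stays within [-1, 1], which is one reason a normalized harmonic distribution is convenient.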

## `__init__()`

All member variables are initialized in the constructor, which makes it easy to change them as hyperparameters using the [gin](https://github.com/google/gin-config) dependency injection library. All processors also have a `name` that is used by `ProcessorGroup()`.


## `get_controls()` 

The outputs of a neural network are often not properly scaled and constrained. The `get_controls()` method returns a dictionary of valid control parameters computed from those outputs.



**3 inputs (`amps`, `harmonic_distribution`, `f0_hz`)**
* `amplitude`: Amplitude envelope of the synthesizer output.
* `harmonic_distribution`: Normalized amplitudes of each harmonic.
* `fundamental_frequency`: Frequency in Hz of the base oscillator.



In [None]:
# Generate some arbitrary inputs.

n_frames = 1000
hop_size = 64
n_samples = n_frames * hop_size

# Amplitude [batch, n_frames, 1].
# Make amplitude linearly decay over time.
amps = np.linspace(1.0, -3.0, n_frames)
amps = amps[np.newaxis, :, np.newaxis]

# Harmonic Distribution [batch, n_frames, n_harmonics].
# Make harmonics decrease linearly with frequency.
n_harmonics = 30
harmonic_distribution = (np.linspace(-2.0, 2.0, n_frames)[:, np.newaxis] + 
                         np.linspace(3.0, -3.0, n_harmonics)[np.newaxis, :])
harmonic_distribution = harmonic_distribution[np.newaxis, :, :]

# Fundamental frequency in Hz [batch, n_frames, 1].
f0_hz = 440.0 * np.ones([1, n_frames, 1], dtype=np.float32)

# Save inputs for use by the conda script.
np.save('/content/amps.npy', amps)
np.save('/content/harmonic_distribution.npy', harmonic_distribution)
np.save('/content/f0_hz.npy', f0_hz)

In [None]:
# Plot it!
time = np.linspace(0, n_samples / sample_rate, n_frames)

plt.figure(figsize=(18, 4))
plt.subplot(131)
plt.plot(time, amps[0, :, 0])
plt.xticks([0, 1, 2, 3, 4])
plt.title('Amplitude')

plt.subplot(132)
plt.plot(time, harmonic_distribution[0, :, :])
plt.xticks([0, 1, 2, 3, 4])
plt.title('Harmonic Distribution')

plt.subplot(133)
plt.plot(time, f0_hz[0, :, 0])
plt.xticks([0, 1, 2, 3, 4])
_ = plt.title('Fundamental Frequency')

Consider the plots above as outputs of a neural network. These outputs violate the synthesizer's expectations:
* Amplitude is not >= 0 (non-negative amplitudes avoid phase shifts).
* Harmonic distribution is not normalized (normalizing factorizes timbre and amplitude).
* Fundamental frequency * n_harmonics > Nyquist frequency (440 * 30 = 13200 > 8000), which will lead to [aliasing](https://en.wikipedia.org/wiki/Aliasing).
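
The Nyquist arithmetic in the last point can be worked through per harmonic. The sketch below uses the values from the input cell above (16 kHz sample rate, 440 Hz fundamental, 30 harmonics) and assumes a harmonic counts as aliased when its frequency is at or above half the sample rate.

```python
import numpy as np

sample_rate = 16000
nyquist = sample_rate / 2.0  # 8000 Hz.
f0 = 440.0
n_harmonics = 30  # Same value as in the input cell.

# Frequency of each harmonic: 440, 880, ..., 13200 Hz.
harmonic_freqs = f0 * np.arange(1, n_harmonics + 1)
# Assumption: harmonics at or above Nyquist are treated as aliased.
is_aliased = harmonic_freqs >= nyquist

n_aliased = int(is_aliased.sum())
first_aliased = int(np.argmax(is_aliased)) + 1  # 1-based harmonic number.
print(f'{n_aliased} of {n_harmonics} harmonics are at or above {nyquist:.0f} Hz; '
      f'the first is harmonic {first_aliased} '
      f'({harmonic_freqs[first_aliased - 1]:.0f} Hz).')
```

With these values, harmonic 18 (7920 Hz) is the last one below Nyquist, so harmonics 19 through 30 would alias if left in place.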


In [None]:
#@title Run Harmonic synth demo (get_controls, exp_sigmoid, get_signal, __call__)

#@markdown This script runs inside the conda environment and performs:
#@markdown 1. Creates a Harmonic synthesizer
#@markdown 2. Runs `get_controls()` on the inputs above
#@markdown 3. Computes `exp_sigmoid()` for visualization
#@markdown 4. Runs `get_signal()` and `__call__()` to synthesize audio

SCRIPT = r'''
import os
import sys
import warnings
warnings.filterwarnings("ignore")

import numpy as np
import ddsp
import tensorflow as tf

output_dir = '/content/proc_demo1'
os.makedirs(output_dir, exist_ok=True)

# Load inputs
amps = np.load('/content/amps.npy')
harmonic_distribution = np.load('/content/harmonic_distribution.npy')
f0_hz = np.load('/content/f0_hz.npy')

n_frames = 1000
hop_size = 64
n_samples = n_frames * hop_size
sample_rate = 16000

# Create synthesizer
harmonic_synth = ddsp.synths.Harmonic(n_samples=n_samples,
                                      sample_rate=sample_rate)

# --- get_controls() ---
controls = harmonic_synth.get_controls(amps, harmonic_distribution, f0_hz)
print('controls keys:', list(controls.keys()))

# Save controls for plotting
for key, val in controls.items():
    v = val.numpy() if hasattr(val, 'numpy') else np.array(val)
    np.save(os.path.join(output_dir, f'controls_{key}.npy'), v)

# --- exp_sigmoid() ---
x = tf.linspace(-10.0, 10.0, 1000)
y = ddsp.core.exp_sigmoid(x)
np.save(os.path.join(output_dir, 'exp_sigmoid_x.npy'), x.numpy())
np.save(os.path.join(output_dir, 'exp_sigmoid_y.npy'), y.numpy())

# --- get_signal() ---
audio_get_signal = harmonic_synth.get_signal(**controls)
audio_gs = audio_get_signal.numpy() if hasattr(audio_get_signal, 'numpy') else np.array(audio_get_signal)
np.save(os.path.join(output_dir, 'audio_get_signal.npy'), audio_gs)

# --- __call__() ---
audio_call = harmonic_synth(amps, harmonic_distribution, f0_hz)
audio_c = audio_call.numpy() if hasattr(audio_call, 'numpy') else np.array(audio_call)
np.save(os.path.join(output_dir, 'audio_call.npy'), audio_c)

print('Done! Outputs saved to', output_dir)
'''

with open('/content/proc_demo1.py', 'w') as f:
  f.write(SCRIPT)

!unset PYTHONPATH PYTHONHOME && /content/miniconda/bin/python /content/proc_demo1.py

In [None]:
# Load controls from the script output
controls_amplitudes = np.load('/content/proc_demo1/controls_amplitudes.npy')
controls_harmonic_distribution = np.load('/content/proc_demo1/controls_harmonic_distribution.npy')
controls_f0_hz = np.load('/content/proc_demo1/controls_f0_hz.npy')
print('Controls keys: amplitudes, harmonic_distribution, f0_hz')

In [None]:
# Now let's see what they look like...
time = np.linspace(0, n_samples / sample_rate, n_frames)

plt.figure(figsize=(18, 4))
plt.subplot(131)
plt.plot(time, controls_amplitudes[0, :, 0])
plt.xticks([0, 1, 2, 3, 4])
plt.title('Amplitude')

plt.subplot(132)
plt.plot(time, controls_harmonic_distribution[0, :, :])
plt.xticks([0, 1, 2, 3, 4])
plt.title('Harmonic Distribution')

plt.subplot(133)
plt.plot(time, controls_f0_hz[0, :, 0])
plt.xticks([0, 1, 2, 3, 4])
_ = plt.title('Fundamental Frequency')

Notice that:
* Amplitudes are now all positive
* The harmonic distribution sums to 1.0
* All harmonics that are above the Nyquist frequency now have an amplitude of 0.
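
The last two points can be sketched in NumPy. This approximates what happens to the harmonic distribution after scaling, under the assumption that harmonics at or above the Nyquist frequency are zeroed before each frame is normalized. `normalize_below_nyquist` is a hypothetical helper written for this illustration, not the DDSP function.

```python
import numpy as np


def normalize_below_nyquist(harm_amps, f0_hz, sample_rate=16000):
  """Zero harmonics at or above Nyquist, then make each frame sum to 1.

  Hypothetical helper.
  harm_amps: non-negative amplitudes, [n_frames, n_harmonics].
  f0_hz: fundamental frequency in Hz, [n_frames, 1].
  """
  n_harmonics = harm_amps.shape[-1]
  # Frequency of every harmonic in every frame, [n_frames, n_harmonics].
  harmonic_freqs = f0_hz * np.arange(1, n_harmonics + 1)
  bandlimited = np.where(harmonic_freqs >= sample_rate / 2.0, 0.0, harm_amps)
  # Assumes at least one harmonic per frame survives the band limit.
  return bandlimited / np.sum(bandlimited, axis=-1, keepdims=True)


# Two frames of 30 equal-strength harmonics at 440 Hz.
dist = normalize_below_nyquist(np.ones([2, 30]), 440.0 * np.ones([2, 1]))
```

Here the 18 harmonics below 8000 Hz each end up with amplitude 1/18, and the remaining 12 are zero.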

The amplitudes and harmonic distribution are scaled by an "exponentiated sigmoid" function (`ddsp.core.exp_sigmoid`). There is nothing particularly special about this function (other functions can be specified as `scale_fn=` during construction), but it has several nice properties:
* Output grows roughly exponentially with input (before saturating), so the input acts like a log-amplitude, matching the roughly logarithmic human perception of loudness.
* Centered at 0, with max and min in reasonable range for normalized neural network outputs.
* Max value of 2.0 to prevent signal getting too loud.
* Threshold value of 1e-7 for numerical stability during training.
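
For intuition, here is a NumPy sketch of an exponentiated sigmoid. The formula and the default values (`exponent=10.0`, `max_value=2.0`, `threshold=1e-7`) are assumptions based on the properties listed above; check the `ddsp.core.exp_sigmoid` source for the authoritative definition. `exp_sigmoid_np` is a hypothetical name.

```python
import numpy as np


def exp_sigmoid_np(x, exponent=10.0, max_value=2.0, threshold=1e-7):
  """NumPy sketch of an exponentiated sigmoid (defaults are assumptions)."""
  x = np.asarray(x, dtype=np.float64)
  sigmoid = 1.0 / (1.0 + np.exp(-x))
  # Raising the sigmoid to log(exponent) makes the output fall off like
  # exponent**x for negative x; threshold keeps it strictly positive.
  return max_value * sigmoid**np.log(exponent) + threshold


x_demo = np.array([-10.0, 0.0, 10.0])
y_demo = exp_sigmoid_np(x_demo)
# y_demo approaches `threshold` for very negative inputs and `max_value`
# for very positive ones.
```

On a log-scaled y axis this curve is close to a straight line for negative inputs, which is the behavior the semilog plot in the next cell shows.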

In [None]:
x = np.load('/content/proc_demo1/exp_sigmoid_x.npy')
y = np.load('/content/proc_demo1/exp_sigmoid_y.npy')

plt.figure(figsize=(18, 4))
plt.subplot(121)
plt.plot(x, y)

plt.subplot(122)
_ = plt.semilogy(x, y)

## `get_signal()`

Synthesizes audio from controls.

In [None]:
audio = np.load('/content/proc_demo1/audio_get_signal.npy')

play(audio)
specplot(audio)

## `__call__()` 

Synthesizes audio directly from the raw inputs. `get_controls()` is called internally to turn them into valid control parameters.

In [None]:
audio = np.load('/content/proc_demo1/audio_call.npy')

play(audio)
specplot(audio)

# Example: Just for fun... 
Let's run another example where we tweak some of the controls...

In [None]:
## Some weird control envelopes...

n_frames = 1000
hop_size = 64
n_samples = n_frames * hop_size

# Amplitude [batch, n_frames, 1].
amps2 = np.ones([n_frames]) * -5.0
amps2[:50] +=  np.linspace(0, 7.0, 50)
amps2[50:200] += 7.0
amps2[200:900] += (7.0 - np.linspace(0.0, 7.0, 700))
amps2 *= np.abs(np.cos(np.linspace(0, 2*np.pi * 10.0, n_frames)))
amps2 = amps2[np.newaxis, :, np.newaxis]

# Harmonic Distribution [batch, n_frames, n_harmonics].
n_harmonics = 20
harmonic_distribution2 = np.ones([n_frames, 1]) * np.linspace(1.0, -1.0, n_harmonics)[np.newaxis, :]
for i in range(n_harmonics):
  harmonic_distribution2[:, i] = 1.0 - np.linspace(i * 0.09, 2.0, n_frames)
  harmonic_distribution2[:, i] *= 5.0 * np.abs(np.cos(np.linspace(0, 2*np.pi * 0.1 * i, n_frames)))
  if i % 2 != 0:
    harmonic_distribution2[:, i] = -3
harmonic_distribution2 = harmonic_distribution2[np.newaxis, :, :]

# Fundamental frequency in Hz [batch, n_frames, 1].
f0_hz2 = np.ones([n_frames]) * 200.0
f0_hz2[:100] *= np.linspace(2, 1, 100)**2
f0_hz2[200:1000] += 20 * np.sin(np.linspace(0, 8.0, 800) * 2 * np.pi * np.linspace(0, 1.0, 800))  * np.linspace(0, 1.0, 800)
f0_hz2 = f0_hz2[np.newaxis, :, np.newaxis]

# Save for conda script
np.save('/content/amps2.npy', amps2)
np.save('/content/harmonic_distribution2.npy', harmonic_distribution2)
np.save('/content/f0_hz2.npy', f0_hz2)

In [None]:
#@title Run Harmonic synth with weird controls

SCRIPT = r'''
import os
import warnings
warnings.filterwarnings("ignore")

import numpy as np
import ddsp

output_dir = '/content/proc_demo2'
os.makedirs(output_dir, exist_ok=True)

# Load inputs
amps = np.load('/content/amps2.npy')
harmonic_distribution = np.load('/content/harmonic_distribution2.npy')
f0_hz = np.load('/content/f0_hz2.npy')

n_frames = 1000
hop_size = 64
n_samples = n_frames * hop_size
sample_rate = 16000

# Create synthesizer
harmonic_synth = ddsp.synths.Harmonic(n_samples=n_samples,
                                      sample_rate=sample_rate)

# Get valid controls
controls = harmonic_synth.get_controls(amps, harmonic_distribution, f0_hz)

# Save controls for plotting
for key, val in controls.items():
    v = val.numpy() if hasattr(val, 'numpy') else np.array(val)
    np.save(os.path.join(output_dir, f'controls_{key}.npy'), v)

# Synthesize
audio = harmonic_synth.get_signal(**controls)
audio_np = audio.numpy() if hasattr(audio, 'numpy') else np.array(audio)
np.save(os.path.join(output_dir, 'audio.npy'), audio_np)

print('Done! Outputs saved to', output_dir)
'''

with open('/content/proc_demo2.py', 'w') as f:
  f.write(SCRIPT)

!unset PYTHONPATH PYTHONHOME && /content/miniconda/bin/python /content/proc_demo2.py

In [None]:
# Load and plot controls
controls2_amplitudes = np.load('/content/proc_demo2/controls_amplitudes.npy')
controls2_harmonic_distribution = np.load('/content/proc_demo2/controls_harmonic_distribution.npy')
controls2_f0_hz = np.load('/content/proc_demo2/controls_f0_hz.npy')

time = np.linspace(0, n_samples / sample_rate, n_frames)

plt.figure(figsize=(18, 4))
plt.subplot(131)
plt.plot(time, controls2_amplitudes[0, :, 0])
plt.xticks([0, 1, 2, 3, 4])
plt.title('Amplitude')

plt.subplot(132)
plt.plot(time, controls2_harmonic_distribution[0, :, :])
plt.xticks([0, 1, 2, 3, 4])
plt.title('Harmonic Distribution')

plt.subplot(133)
plt.plot(time, controls2_f0_hz[0, :, 0])
plt.xticks([0, 1, 2, 3, 4])
_ = plt.title('Fundamental Frequency')

In [None]:
audio2 = np.load('/content/proc_demo2/audio.npy')

play(audio2)
specplot(audio2)