In [1]:
import numpy as np
import matplotlib.pyplot as plt

## Harmonic Oscillator

At the core of DDSP's synthesis is the **sinusoidal oscillator**. A bank of oscillators generates a signal $x(n)$ over discrete time $n$ as:

$\quad\large x(n) = \sum\limits_{k=1}^{K} A_k(n) \sin(\phi_k(n))$

where:
- $A_k(n)$: time-varying amplitude of the $k$-th sinusoidal component.
- $\phi_k(n)$: instantaneous phase.

The phase $\phi_k(n)$ evolves by integrating the instantaneous frequency $f_k(n)$:

$\quad\large \phi_k(n) = 2\pi \sum\limits_{m=0}^{n} f_k(m) + \phi_{0,k}$

where:
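- $f_k(m)$: instantaneous frequency in cycles per sample, i.e. the frequency in Hz divided by the sample rate.
- $\phi_{0,k}$: initial phase (can be randomized, fixed, or learned).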
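
The phase integration above amounts to a cumulative sum. Below is a minimal sketch, assuming the frequency is given in Hz and must be divided by a sample rate to obtain cycles per sample. The function name `phase_from_frequency` and the 16 kHz default are illustrative choices, not taken from the text.

```python
import numpy as np

def phase_from_frequency(freq_hz, sample_rate=16000, initial_phase=0.0):
    """Integrate an instantaneous frequency track (Hz) into phase (radians)."""
    freq_hz = np.asarray(freq_hz, dtype=np.float64)
    # Convert Hz to cycles per sample (assumed unit convention).
    cycles_per_sample = freq_hz / sample_rate
    # The running sum over m = 0..n is a cumulative sum along time.
    return 2.0 * np.pi * np.cumsum(cycles_per_sample, axis=0) + initial_phase

# One second of a constant 440 Hz tone at the assumed sample rate.
f = np.full(16000, 440.0)
phi = phase_from_frequency(f)
```

For a constant frequency the phase grows linearly, so after one second it has advanced by $2\pi \cdot 440$ radians.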

For a **harmonic oscillator**, all frequencies are integer multiples of the fundamental frequency $f_0(n)$:

$\quad f_k(n) = k \cdot f_0(n)$

Thus, the oscillator is fully defined by:
- $f_0(n)$: fundamental frequency.
- $A_k(n)$: harmonic amplitudes.
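
Putting the pieces together, a harmonic oscillator bank can be sketched in a few lines of NumPy. This is an illustrative sketch rather than a reference implementation: the function name, the array shapes, the 16 kHz sample rate, and the step that silences harmonics above the Nyquist frequency are all assumptions added here.

```python
import numpy as np

def harmonic_oscillator(f0_hz, amplitudes, sample_rate=16000):
    """Sum of K harmonics.

    f0_hz:      (n_samples,) fundamental frequency in Hz.
    amplitudes: (n_samples, K) per-harmonic amplitudes A_k(n).
    Returns:    (n_samples,) audio signal x(n).
    """
    f0_hz = np.asarray(f0_hz, dtype=np.float64)
    amplitudes = np.asarray(amplitudes, dtype=np.float64)
    n_harmonics = amplitudes.shape[1]
    k = np.arange(1, n_harmonics + 1)          # harmonic numbers 1..K
    freqs = f0_hz[:, None] * k[None, :]        # f_k(n) = k * f0(n)
    # Assumption: zero out harmonics at or above Nyquist to avoid aliasing.
    amplitudes = np.where(freqs < sample_rate / 2, amplitudes, 0.0)
    # Phase is the cumulative sum of frequency; initial phase is 0 here.
    phases = 2.0 * np.pi * np.cumsum(freqs / sample_rate, axis=0)
    return np.sum(amplitudes * np.sin(phases), axis=1)

# Four equally weighted harmonics of a 220 Hz fundamental, one second long.
n = 16000
f0 = np.full(n, 220.0)
amps = np.full((n, 4), 0.25)
x = harmonic_oscillator(f0, amps)
```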

The harmonic amplitudes are **factorized** for interpretability:

$\quad A_k(n) = A(n) \cdot c_k(n)$

where:
- $A(n)$: global amplitude (controls loudness).
- $c_k(n)$: normalized harmonic distribution (controls spectral variations), satisfying:

$\quad\sum\limits_{k=1}^{K} c_k(n) = 1, \quad c_k(n) \geq 0$

To ensure non-negativity, a **modified sigmoid nonlinearity** is applied to the network's outputs. The sum-to-one constraint additionally requires dividing the result by its sum over harmonics.
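
The factorization can be illustrated with a small sketch. It assumes the raw harmonic distribution is already non-negative and is normalized by dividing by its sum over harmonics. The function name `harmonic_amplitudes` and the `eps` guard against division by zero are hypothetical additions.

```python
import numpy as np

def harmonic_amplitudes(global_amp, raw_distribution, eps=1e-12):
    """Return (A_k, c_k) from a global amplitude and non-negative raw weights.

    global_amp:       (n,) global amplitude A(n).
    raw_distribution: (n, K) non-negative, unnormalized harmonic weights.
    """
    raw = np.asarray(raw_distribution, dtype=np.float64)
    # Normalize each time step so the harmonic weights sum to one.
    c = raw / (np.sum(raw, axis=1, keepdims=True) + eps)
    # A_k(n) = A(n) * c_k(n)
    a_k = np.asarray(global_amp, dtype=np.float64)[:, None] * c
    return a_k, c

raw = np.array([[1.0, 1.0, 2.0],
                [3.0, 0.0, 1.0]])
a_k, c = harmonic_amplitudes(np.array([0.5, 1.0]), raw)
```

Because $c_k(n)$ sums to one, the per-harmonic amplitudes at each time step sum to $A(n)$, which is what lets $A(n)$ act as a single loudness control.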


## Harmonic Synthesizer

The harmonic synthesizer generates **101 harmonics**. Amplitude and harmonic distribution parameters are **upsampled** using overlapping **Hamming window** envelopes with:
- Frame size: 128.
- Hop size: 64.

The initial phase of every harmonic is fixed at 0. Absolute harmonic phase offsets don't impact perceptual quality, and spectrogram losses are insensitive to them, so nothing is lost by this choice.
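
One way to picture the window-based upsampling is as a normalized overlap-add. Each frame-rate value is spread over a Hamming window, and the overlapping contributions are then divided by the summed window weights. The sketch below is an assumption about how such a scheme could look, not the implementation used by any particular library. The function name and the output-length convention are illustrative.

```python
import numpy as np

def upsample_with_windows(frames, frame_size=128, hop_size=64):
    """Upsample a 1-D frame-rate control signal with overlapping Hamming windows."""
    frames = np.asarray(frames, dtype=np.float64)
    n_frames = frames.shape[0]
    # Periodic Hamming window (drop the last point of the symmetric one).
    window = np.hamming(frame_size + 1)[:-1]
    n_out = (n_frames - 1) * hop_size + frame_size
    out = np.zeros(n_out)
    norm = np.zeros(n_out)
    for i, value in enumerate(frames):
        start = i * hop_size
        out[start:start + frame_size] += value * window   # weighted frame value
        norm[start:start + frame_size] += window          # total window weight
    # Hamming windows never reach zero, so the division is safe.
    return out / norm

frames = np.array([0.0, 1.0, 0.5, 0.5])
envelope = upsample_with_windows(frames)
```

Since every output sample is a weighted average of neighboring frame values, the envelope stays within the range of the inputs and varies smoothly between frames.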

Non-negativity is enforced for amplitudes, harmonic distributions, and filtered noise magnitudes via a **modified sigmoid**:

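$\quad\large y = 2.0 \cdot \sigma(x)^{\ln 10} + 10^{-7}$

This modification stabilizes training in three ways: the output is scaled to a maximum of 2.0, the exponent $\ln 10$ steepens the slope, and the additive $10^{-7}$ keeps the output above a small minimum value.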
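
A minimal NumPy sketch of this nonlinearity follows, assuming the exponent is the natural logarithm of 10. The function and parameter names are illustrative, and the sigmoid is computed through `np.logaddexp` only to avoid overflow for large negative inputs.

```python
import numpy as np

def modified_sigmoid(x, exponent=10.0, max_value=2.0, threshold=1e-7):
    """Scaled, exponentiated sigmoid with a small positive floor."""
    x = np.asarray(x, dtype=np.float64)
    # Numerically stable sigmoid: 1 / (1 + exp(-x)) = exp(-log(1 + exp(-x))).
    sigmoid = np.exp(-np.logaddexp(0.0, -x))
    return max_value * sigmoid ** np.log(exponent) + threshold

y = modified_sigmoid(np.array([-1000.0, 0.0, 1000.0]))
```

The output is strictly positive and saturates just above 2.0, so it can be used directly for amplitudes or as unnormalized harmonic weights.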
