# Spectral Analysis for Modal Parameter Linear Estimate

## Setup

### Libraries
Install the `sample` package and its dependencies.
The extras also install the dependencies of helper functions, such as the plotting utilities.

In [None]:
import sys
!$sys.executable -m pip install -qU lim-sample[notebooks,plots]==2.1.0
import sample

sample(logo=dict(size_inches=6))

### Generate test audio
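We will synthesize a modal-like sound with three modal frequencies using simple additive synthesis. Two of them (1097 Hz and 1103 Hz) are only 6 Hz apart, so they beat.  
We will also add white Gaussian noise with a standard deviation of -45 dB (relative to the unit total amplitude of the partials) to mimic a poor recording environment.  
The sampling frequency is 44100 Hz and the duration is 2 seconds.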

In [None]:
import numpy as np
from IPython import display as ipd
from matplotlib import pyplot as plt

from sample.sample import additive_synth
from sample.utils import dsp as dsp_utils


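def resize(diag: float = 8.485, aspect: float = 1, shape=(1, 1)):
  """Set the current figure size from a per-panel diagonal (inches) and aspect.

  `shape` is the (rows, cols) panel layout; width and height scale with it.
  """
  plt.gcf().set_size_inches(
      diag * np.true_divide([aspect, 1], np.sqrt(aspect * aspect + 1)) *
      np.flip(shape))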


ground_truth = {
    "freqs": [440, 1103, 1097],
    "decays": [1, 0.75, 2],
    "amps": [1, 0.8, 0.2],
}
ground_truth["amps"] = np.array(ground_truth["amps"]) / sum(
    ground_truth["amps"])

fs = 44100
x = additive_synth(np.arange(int(2 * fs)) / fs, **ground_truth)

# Add noise
np.random.seed(42)
x += np.random.randn(np.size(x)) * dsp_utils.db2a(-45)
x /= np.max(np.abs(x))
t = np.arange(x.size) / fs

ipd.display(ipd.Audio(x, rate=fs))

plt.plot(t, x, alpha=.5, zorder=100)
plt.grid()
resize(aspect=16 / 9)
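The noise level above is given in decibels. Assuming `dsp_utils.db2a` follows the usual amplitude convention $a = 10^{\text{dB}/20}$, -45 dB corresponds to a noise standard deviation of about 0.0056 before normalization. A minimal standalone sketch of the conversion (the helpers below are hypothetical stand-ins, not the `sample` API):

```python
import numpy as np


def db2a(db):
  """Convert decibels to a linear amplitude ratio (20*log10 convention)."""
  return np.power(10.0, np.asarray(db, dtype=float) / 20.0)


def a2db(a):
  """Convert a linear amplitude ratio to decibels."""
  return 20.0 * np.log10(np.asarray(a, dtype=float))


noise_std = db2a(-45)  # about 0.00562
doubling_db = a2db(2)  # about 6.02 dB, the offset used later for spectral halving
print(f"{noise_std:.5f} {doubling_db:.2f}")
```

The same convention explains the +6.02 dB offset that appears in the regression section: doubling an amplitude adds $20\log_{10}2 \approx 6.02$ dB.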

## Interface
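The SAMPLE model exposes a scikit-learn-like API: hyperparameters are passed to the constructor, with double underscores (`__`) addressing the parameters of nested components, and the model is trained by calling `fit` (which also accepts nested parameters, here the sampling frequency).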

In [None]:
model = sample.SAMPLE(sinusoidal__tracker__max_n_sines=16,
                      sinusoidal__tracker__peak_threshold=-30,
                      sinusoidal__intermediate__save=True)
model.fit(x, sinusoidal__tracker__fs=fs)
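Following the scikit-learn convention, a key such as `sinusoidal__tracker__max_n_sines` refers to the `max_n_sines` parameter of the `tracker` inside the `sinusoidal` component. The toy sketch below (`split_nested` is a hypothetical helper, not part of `sample` or scikit-learn) shows how such flat keys map onto a nested structure:

```python
def split_nested(params: dict) -> dict:
  """Group flat 'a__b__c' keys into nested dicts (scikit-learn style naming)."""
  nested = {}
  for key, value in params.items():
    *path, leaf = key.split("__")
    node = nested
    for part in path:
      node = node.setdefault(part, {})  # descend, creating levels as needed
    node[leaf] = value
  return nested


print(split_nested({
    "sinusoidal__tracker__max_n_sines": 16,
    "sinusoidal__tracker__peak_threshold": -30,
    "sinusoidal__intermediate__save": True,
}))
```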

## Sinusoidal Model
SAMPLE is based on Serra's *Spectral Modelling Synthesis* (SMS),
an analysis and synthesis system for musical sounds that
decomposes the sound into a deterministic (sinusoidal)
component and a stochastic (residual) component.
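In other words, a sound is modelled as a sum of sinusoids plus a residual, $x = x_{\text{sin}} + x_{\text{res}}$. A toy NumPy illustration, assuming the sinusoidal part is known exactly (in practice it must be estimated by the analysis described below; `toy_decomposition` is a hypothetical helper):

```python
import numpy as np


def toy_decomposition(fs: int = 8000, seed: int = 0):
  """Build a toy sound as deterministic (sinusoidal) + stochastic parts."""
  t = np.arange(fs) / fs  # one second
  rng = np.random.default_rng(seed)
  # Deterministic part: one exponentially decaying sinusoid
  deterministic = 0.5 * np.exp(-2 * t / 0.8) * np.sin(2 * np.pi * 440 * t)
  # Stochastic part: white Gaussian noise
  stochastic = 0.01 * rng.standard_normal(t.size)
  return deterministic + stochastic, deterministic, stochastic


x_toy, x_sin, x_noise = toy_decomposition()
# Subtracting the (here perfectly known) sinusoidal part leaves the residual
residual = x_toy - x_sin
```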

The main components of the sinusoidal analysis are the peak detection
and the peak continuation algorithms.

### STFT
The peak detection and continuation algorithms operate on the Short-Time Fourier Transform (STFT) of the sound, computed with zero-phase windowing.

In [None]:
from sample import plots

stft = np.array([mx for mx, _ in model.sinusoidal.intermediate["stft"]]).T
f = fs * np.arange(stft.shape[0]) / model.sinusoidal.w.size

plots.tf_plot(stft,
              tlim=t[[0, -1]],
              flim=f[[0, -1]],
              ylim=[0, 2000],
              aspect_ratio=16 / 9)
resize(aspect=16 / 9)
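Zero-phase windowing places the centre of each windowed frame at index 0 of the FFT buffer: the second half of the frame goes to the start of the buffer, the first half wraps around to the end, and any zero-padding sits in between. For a symmetric window this removes the linear-phase term the window's delay would otherwise add, so the measured phase of a peak refers to the frame centre. A minimal sketch for one frame, modelled on the SMS approach and assuming an odd-length window and an FFT size `n` no smaller than the window (`zero_phase_spectrum` is illustrative, not the `sample` implementation):

```python
import numpy as np


def zero_phase_spectrum(frame: np.ndarray, w: np.ndarray, n: int):
  """Magnitude (dB) and phase of one frame using zero-phase windowing."""
  m = w.size
  half_lo = (m + 1) // 2  # centre sample and everything after it
  half_hi = m // 2  # samples before the centre
  # Normalize the window so a sinusoid of amplitude A peaks near A/2
  # (half of the energy on each side of the spectrum)
  xw = frame * w / np.sum(w)
  buf = np.zeros(n)
  buf[:half_lo] = xw[half_hi:]  # centre and second half go to the start
  buf[n - half_hi:] = xw[:half_hi]  # first half wraps around to the end
  spectrum = np.fft.rfft(buf)
  mag_db = 20 * np.log10(np.maximum(np.abs(spectrum), np.finfo(float).eps))
  return mag_db, np.angle(spectrum)
```

With this normalization a unit-amplitude sinusoid shows up at about -6.02 dB, which is the spectral halving compensated for later in the regression step.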

### Peak detection
The peak detection algorithm detects peaks in each STFT frame of the analysed
sound as local maxima of the magnitude spectrum that exceed a threshold (set by `sinusoidal__tracker__peak_threshold` above).

In [None]:
mx, px = model.sinusoidal.intermediate["stft"][0]
ploc, pmag, pph = model.sinusoidal.intermediate["peaks"][0]

ax = plt.subplot(121)
plt.fill_between(f, np.full(mx.shape, -120), mx, alpha=.1)
plt.plot(f, mx)
plt.scatter(ploc * fs / model.sinusoidal.w.size, pmag, c="C0")
plt.ylim([-60, plt.ylim()[1]])
plt.grid()
plt.title("magnitude")

plt.subplot(122, sharex=ax)
plt.plot(f, px)
plt.scatter(ploc * fs / model.sinusoidal.w.size, pph)
plt.ylim([np.min(px[f < 2000]), np.max(px[f < 2000])])
plt.grid()
plt.title("phase")
plt.xlim([0, 2000])
resize(shape=(1, 2))
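A minimal sketch of peak picking on one magnitude frame: a bin is a peak if it exceeds the threshold and both of its neighbours. Refining each peak by fitting a parabola through the three bins around it is a common refinement in SMS-style analysis; whether and how `sample` applies it is an assumption here. Both function names are hypothetical:

```python
import numpy as np


def detect_peaks(mag_db: np.ndarray, threshold: float) -> np.ndarray:
  """Indices of local maxima of a dB magnitude spectrum above a threshold."""
  inner = mag_db[1:-1]
  is_peak = (inner > threshold) & (inner > mag_db[:-2]) & (inner > mag_db[2:])
  return np.flatnonzero(is_peak) + 1


def interpolate_peaks(mag_db: np.ndarray, ploc: np.ndarray):
  """Refine peak locations/heights with a parabola through 3 bins."""
  left, mid, right = mag_db[ploc - 1], mag_db[ploc], mag_db[ploc + 1]
  # Strict local maxima guarantee a negative denominator (no division by 0)
  offset = 0.5 * (left - right) / (left - 2 * mid + right)
  iloc = ploc + offset  # fractional bin index
  imag = mid - 0.25 * (left - right) * offset  # interpolated height (dB)
  return iloc, imag
```

A fractional bin index `iloc` converts to hertz as `iloc * fs / n`, where `n` is the FFT size.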

### Peak continuation
The peak continuation algorithm organizes the peaks into temporal tracks,
with every track representing the time-varying behaviour of a partial.
For every peak in a trajectory, the instantaneous frequency, magnitude
and phase are stored to allow further manipulation and resynthesis.
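Peak continuation can be illustrated with a greedy nearest-frequency matcher: each active track claims the closest unclaimed peak in the next frame if it lies within a maximum frequency deviation, unmatched peaks start new tracks, and unmatched tracks end for good. This is a simplified sketch under those assumptions (hypothetical names, frequencies only, no magnitude or phase bookkeeping), not the SAMPLE tracker:

```python
from typing import Dict, List

import numpy as np


def continue_peaks(frames: List[np.ndarray], max_dev: float) -> List[Dict]:
  """Greedily link per-frame peak frequencies (Hz) into tracks."""
  tracks = []  # every track ever created
  active = []  # indices into `tracks` that are still alive
  for i, freqs in enumerate(frames):
    claimed = set()
    still_active = []
    for ti in active:
      last = tracks[ti]["freq"][-1]
      candidates = [j for j in range(len(freqs)) if j not in claimed]
      if candidates:
        # Closest unclaimed peak to the track's last frequency
        j = min(candidates, key=lambda c: abs(freqs[c] - last))
        if abs(freqs[j] - last) <= max_dev:
          tracks[ti]["freq"].append(float(freqs[j]))
          claimed.add(j)
          still_active.append(ti)
      # Tracks without a match become inactive and are never revived
    for j, fj in enumerate(freqs):
      if j not in claimed:  # unmatched peaks start new tracks
        tracks.append({"start_frame": i, "freq": [float(fj)]})
        still_active.append(len(tracks) - 1)
    active = still_active
  return tracks
```

The greedy matching depends on the order in which tracks are visited; a globally optimal assignment would avoid that at a higher cost.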

The general-purpose SMS method allows peak-track data structures to be recycled: if a trajectory
becomes inactive, it can later be picked up again when a newly detected partial arises.
Our implementation does not allow this.

Moreover, two tracks that do not overlap in time but have approximately the same
average frequency can be considered to belong to the same partial and merged into a single track.
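A minimal sketch of such merging, assuming each track is a dict with a `start_frame` index and a list of per-frame frequencies `freq` (a simplified, assumed format; `merge_tracks` is hypothetical). Tracks are visited in order of onset, and a track is appended to an earlier one if it starts after that one ends and their mean frequencies are close; the gap is filled with NaNs:

```python
from typing import Dict, List

import numpy as np


def merge_tracks(tracks: List[Dict], max_freq_diff: float) -> List[Dict]:
  """Greedily merge time-disjoint tracks with close mean frequencies."""
  merged = []
  for tr in sorted(tracks, key=lambda item: item["start_frame"]):
    for m in merged:
      end = m["start_frame"] + len(m["freq"])  # first frame after `m` ends
      close = abs(np.nanmean(m["freq"]) - np.mean(tr["freq"])) <= max_freq_diff
      if tr["start_frame"] >= end and close:
        gap = [float("nan")] * (tr["start_frame"] - end)  # silent frames
        m["freq"] = m["freq"] + gap + list(tr["freq"])
        break
    else:  # no compatible earlier track: keep it as a new one
      merged.append({"start_frame": tr["start_frame"], "freq": list(tr["freq"])})
  return merged
```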

In [None]:
plots.sine_tracking_2d(model)
resize(shape=(1, 2))

In [None]:
plots.sine_tracking_3d(model)
resize()

## Regression
Partials of a modal impact sound are characterized by exponentially decaying amplitudes.
Our model for modal partials is
$$x(t) = m\cdot e^{-2\frac{t}{d}}\cdot \sin{\left(2\pi f t + \phi\right)}$$

The magnitude in decibels is a linear function of time
$$m_{dB}(t) = 20\log_{10}{\left(m\cdot e^{-2\frac{t}{d}}\right)} = 20\log_{10}{m} - 40\frac{\log_{10}{e}}{d} \cdot t$$

$$k = - 40\frac{\log_{10}{e}}{d}$$
$$q = 20\log_{10}{m}$$

$$m_{dB}(t) = kt + q$$
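Inverting these definitions recovers the modal parameters from the fitted line:

$$d = -\frac{40\log_{10}{e}}{k}$$
$$m = 10^{\frac{q}{20}}$$

For example, a decay time of $d = 1$ s corresponds to a slope of $k = -40\log_{10}{e} \approx -17.37$ dB/s.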

We use linear regression to find an initial estimate of the parameters $k$ and $q$ from the magnitude tracks. Then, we refine the estimate by fitting a semi-linear *hinge* function. Finally, the amplitude is doubled (+6.02 dB) to compensate for the fact that we are looking at only half of the spectrum (the positive frequencies).
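The linear step can be sketched as an ordinary least-squares fit of a dB magnitude track, followed by inverting $k$ and $q$ and doubling the amplitude. This minimal sketch assumes a clean, purely linear decay (no noise floor and no hinge refinement); `fit_decay` is a hypothetical helper, not the SAMPLE implementation:

```python
import numpy as np


def fit_decay(t: np.ndarray, mag_db: np.ndarray):
  """Estimate decay d (s) and amplitude m from a dB magnitude track."""
  k, q = np.polyfit(t, mag_db, deg=1)  # slope (dB/s) and intercept (dB)
  d = -40 * np.log10(np.e) / k
  m = 10 ** (q / 20)
  return d, 2 * m  # doubling compensates for looking at half the spectrum
```

For instance, a track measured as $20\log_{10}\left(0.4\, e^{-2t/0.75}\right)$ yields $d = 0.75$ s and an amplitude of $0.8$.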

In [None]:
t_x = np.arange(x.size) / fs
for i, ((freq, decay, amp), track) in enumerate(
    zip(model.param_matrix_.T, model.sinusoidal.tracks_)):
  c = "C{}".format(i)
  # Time axis of the track (s): frame indices divided by the frame rate
  t_track = (track["start_frame"] +
             np.arange(track["freq"].size)) / model.sinusoidal.frame_rate
  # +6.02 dB (a factor of 2) compensates for spectral halving
  plt.plot(t_track, track["mag"] + 6.02, c=c, alpha=.33, linewidth=3)
  # Fitted exponential envelope, in dB
  plt.plot(t_x, 20 * np.log10(amp * np.exp(-2 * t_x / decay)), "--", c=c)

plt.title("fitted curves")
plt.grid()
plt.ylabel("magnitude (dB)")
plt.xlabel("time (s)")
plt.legend(["track", "fitted"])
resize()
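A magnitude track typically stops decaying linearly once it sinks into the noise floor, which is one reason to refine the plain linear estimate. The exact hinge function used by SAMPLE is not spelled out here; as an assumption, the sketch below models a track as a linear decay followed by a constant floor and picks the breakpoint that minimizes the squared error, fitting the line before it and the floor after it. Unlike a true hinge, the two pieces are not forced to meet. `fit_line_then_floor` is hypothetical:

```python
import numpy as np


def fit_line_then_floor(t: np.ndarray, mag_db: np.ndarray, min_len: int = 3):
  """Least-squares fit of a linear decay followed by a constant floor.

  Returns (k, q, floor, j): slope, intercept, floor level, breakpoint index.
  """
  best = None
  for j in range(min_len, t.size + 1):
    k, q = np.polyfit(t[:j], mag_db[:j], deg=1)  # linear part before j
    sse = np.sum((k * t[:j] + q - mag_db[:j])**2)
    floor = np.mean(mag_db[j:]) if j < t.size else -np.inf
    if j < t.size:
      sse += np.sum((mag_db[j:] - floor)**2)  # constant part from j on
    if best is None or sse < best[0]:
      best = (sse, k, q, floor, j)
  return best[1:]
```

The exhaustive search costs one small regression per candidate breakpoint, which is cheap for tracks of a few hundred frames.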

Frequency is simply estimated as the mean frequency of the peak track.

## Resynthesize
Let's resynthesize the sound from the estimated parameters using additive synthesis.

In [None]:
def label_and_play(i: int, k: str, y: np.ndarray):
  return ipd.display(ipd.HTML(f"<h1>{k}</h1>"),
                     ipd.Audio(y, rate=fs, normalize=False))


_, axs = plots.resynthesis(x, {"Resynthesis": model},
                           tf_kws=dict(ylim=(0, 2000), aspect_ratio=1),
                           foreach=label_and_play)
resize(shape=(2, len(axs) - 1))
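Conceptually, resynthesis evaluates the modal model $x(t) = m\cdot e^{-2\frac{t}{d}}\cdot \sin{\left(2\pi f t + \phi\right)}$ for every estimated partial and sums the results. A minimal NumPy sketch, assuming arrays of frequencies (Hz), decays (s) and amplitudes and a zero initial phase; `resynthesize` is a hypothetical helper, not `sample.sample.additive_synth`, whose internals may differ:

```python
import numpy as np


def resynthesize(t, freqs, decays, amps):
  """Sum of exponentially decaying sinusoids m * exp(-2t/d) * sin(2*pi*f*t)."""
  t = np.asarray(t, dtype=float)[:, None]  # (n_samples, 1) for broadcasting
  freqs, decays, amps = (np.asarray(a, dtype=float)[None, :]
                         for a in (freqs, decays, amps))
  partials = amps * np.exp(-2 * t / decays) * np.sin(2 * np.pi * freqs * t)
  return partials.sum(axis=1)  # one column per partial, summed over partials
```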

## BeatsDROP
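Partials that are closer in frequency than the STFT can resolve, as the 1097 Hz and 1103 Hz partials may be, appear as a single track whose amplitude beats. We can also apply a regression algorithm to disentangle such beating partials!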
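Beating follows from the sum-to-product identity: two equal-amplitude sinusoids at $f_1$ and $f_2$ combine into one sinusoid at the mean frequency whose envelope magnitude peaks $|f_1 - f_2|$ times per second,
$$\sin{\left(2\pi f_1 t\right)} + \sin{\left(2\pi f_2 t\right)} = 2\cos{\left(\pi (f_1 - f_2) t\right)}\sin{\left(\pi (f_1 + f_2) t\right)}$$
For the 1097 Hz and 1103 Hz partials of the test sound this gives 6 beats per second; with unequal amplitudes, as in the test sound, the envelope dips without reaching zero. A quick numerical check of the identity:

```python
import numpy as np

sr = 44100
tb = np.arange(sr) / sr  # one second of time stamps
f1, f2 = 1097.0, 1103.0

pair = np.sin(2 * np.pi * f1 * tb) + np.sin(2 * np.pi * f2 * tb)
# Same signal as a carrier at the mean frequency times a slow envelope
envelope = 2 * np.cos(np.pi * (f1 - f2) * tb)
product = envelope * np.sin(np.pi * (f1 + f2) * tb)

beat_rate = abs(f1 - f2)  # envelope magnitude peaks per second
```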

In [None]:
import sample.beatsdrop.regression
from sample import beatsdrop as bd

_, axs = plots.beatsdrop_comparison(model, {
    "BeatsDROP": bd.regression.DualBeatRegression(),
    "Baseline": bd.regression.BeatRegression(),
},
                                    x,
                                    track_i=np.argmax(model.freqs_),
                                    transpose=True)
resize(7, 16 / 9, shape=axs.shape)

The `SAMPLEBeatsDROP` class integrates BeatsDROP into SAMPLE.

In [None]:
from sklearn import base

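# Clone the hyperparameters so that the two models do not share components
model_bd = sample.SAMPLEBeatsDROP(**base.clone(model).get_params())
model_bd.fit(x, sinusoidal__tracker__fs=fs)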

In [None]:
_, axs = plots.resynthesis(x, {
    "SAMPLE": model,
    "SAMPLE+BeatsDROP": model_bd
},
                           tf_kws=dict(ylim=(0, 2000), aspect_ratio=1),
                           foreach=label_and_play)
resize(4 * np.sqrt(2), shape=(2, len(axs) - 1))