# Spectral Analysis for Modal Parameter Linear Estimate

## Setup

### Libraries
Install the `lim-sample` package (imported as `sample`) and its dependencies.
The extras install the dependencies of helper functions, such as the plotting utilities.

In [None]:
import sys
!$sys.executable -m pip install -qU lim-sample[notebooks,plots]==1.3.0

### Generate test audio
We synthesize a modal-like sound with three modal frequencies (440, 650 and 690 Hz) using simple additive synthesis.

The sampling frequency is 44100 Hz and the duration is 2 seconds.

We also add Gaussian noise at -40 dB (`noise_db=-40`) to mimic a noisy recording environment.
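The implementation of `test_audio` is not shown here. To give a rough idea of such a signal, the sketch below sums three exponentially decaying sinusoids (the partial model used in the Regression section) and adds Gaussian noise. The decay times, amplitudes, the peak normalisation and the way `noise_db` is referenced (noise standard deviation relative to a peak of 1) are our own assumptions, and `modal_test_audio` is a hypothetical name.

```python
import numpy as np

def modal_test_audio(fs=44100, duration=2.0, freqs=(440.0, 650.0, 690.0),
                     decays=(0.5, 0.3, 0.4), amps=(1.0, 0.5, 0.5),
                     noise_db=-40.0, seed=0):
    """Hypothetical modal test signal: damped sinusoids plus Gaussian noise."""
    t = np.arange(int(round(duration * fs))) / fs
    x = np.zeros_like(t)
    for f, d, a in zip(freqs, decays, amps):
        # one modal partial: a * exp(-2 t / d) * sin(2 pi f t)
        x += a * np.exp(-2 * t / d) * np.sin(2 * np.pi * f * t)
    x /= np.max(np.abs(x))  # assumption: peak-normalise the clean signal
    rng = np.random.default_rng(seed)
    # assumption: noise_db sets the noise standard deviation relative to a peak of 1
    x += 10 ** (noise_db / 20) * rng.standard_normal(t.size)
    return x
```

With these made-up decays the 440 Hz partial dominates the spectrum, and after about 1.5 s the signal is essentially buried in the noise floor.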

In [None]:
from matplotlib import pyplot as plt
from librosa.display import waveplot, specshow  # waveplot was removed in librosa 0.10 (replaced by waveshow)
from IPython.display import Audio as play
from sample.utils import test_audio
import numpy as np

def resize(w=12, h=6):
  plt.gcf().set_size_inches([w, h])

fs = 44100
x = test_audio(fs=fs, noise_db=-40)

waveplot(x, sr=fs, alpha=.5, zorder=100)
plt.grid()
resize()
play(x, rate=fs)

## Interface
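Using the SAMPLE model is simplified by a scikit-learn-like API: parameters of nested components are passed as `<component>__<parameter>` keyword arguments, and `fit` returns the fitted model.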
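In scikit-learn, keyword arguments of the form `<component>__<parameter>` are routed to a nested object, and `fit` returns `self` so calls can be chained; the `sinusoidal_model__...` arguments in the next cell appear to follow the same convention. The toy classes below only illustrate that routing and chaining pattern. They are not SAMPLE's code: `NestedParams`, `SinusoidalModelStub` and their default values are hypothetical.

```python
class SinusoidalModelStub:
    """Hypothetical stand-in for a nested analysis component."""
    def __init__(self):
        self.max_n_sines = 100
        self.peak_threshold = -80.0
        self.save_intermediate = False

class NestedParams:
    """Toy estimator routing '<component>__<param>' keywords, scikit-learn style."""
    def __init__(self, **params):
        self.sinusoidal_model = SinusoidalModelStub()
        self.set_params(**params)

    def set_params(self, **params):
        for key, value in params.items():
            component, sep, name = key.partition("__")
            target = getattr(self, component, None)
            # reject keys that are not '<existing component>__<existing attribute>'
            if not sep or target is None or not hasattr(target, name):
                raise ValueError(f"invalid parameter {key!r}")
            setattr(target, name, value)
        return self

    def fit(self, x):
        # a real estimator would analyse x here; returning self enables chaining
        return self
```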

In [None]:
from sample import SAMPLE
sample = SAMPLE(
    sinusoidal_model__max_n_sines=10,
    sinusoidal_model__peak_threshold=-30,
    sinusoidal_model__save_intermediate=True
).fit(x)

## Sinusoidal Model
SAMPLE is based on Serra's *Spectral Modeling Synthesis* (SMS),
an analysis and synthesis system for musical sounds based
on the decomposition of the sound into a deterministic
(sinusoidal) component and a stochastic (residual) component.

The main components of the sinusoidal analysis are the peak detection
and the peak continuation algorithms.

### STFT
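Peak detection and continuation operate on the Short-Time Fourier Transform (STFT) of the sound. Each frame is zero-phase windowed: before the FFT, the windowed frame is circularly shifted so that its centre sample sits at time zero, so the phase of each peak refers to the centre of the frame.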
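A minimal sketch of one zero-phase windowed frame, written in the style of the `dftAnal` function from Serra's sms-tools; we have not checked that SAMPLE uses exactly this code. The window is normalised to unit sum, the windowed frame is split at its centre, and the two halves are swapped into an `n_fft`-long buffer so that the centre sample lands at index 0. `zero_phase_dft` is a hypothetical name, and an odd window length keeps the window exactly symmetric about its centre sample.

```python
import numpy as np

def zero_phase_dft(frame, window, n_fft):
    """Magnitude (dB) and unwrapped phase of one zero-phase windowed frame."""
    m = window.size
    if m < 2 or frame.size != m or n_fft < m:
        raise ValueError("frame must match the window, and 2 <= window size <= n_fft")
    xw = frame * (window / window.sum())  # unit-sum window
    half_1 = (m + 1) // 2  # centre sample and everything after it
    half_2 = m // 2        # everything before the centre sample
    buf = np.zeros(n_fft)
    buf[:half_1] = xw[half_2:]   # centre sample goes to index 0
    buf[-half_2:] = xw[:half_2]  # earlier samples wrap around to the end
    spectrum = np.fft.rfft(buf)  # n_fft // 2 + 1 bins
    mag_db = 20 * np.log10(np.maximum(np.abs(spectrum), np.finfo(float).eps))
    phase = np.unwrap(np.angle(spectrum))
    return mag_db, phase
```

With the unit-sum window, a unit-amplitude sinusoid centred on a bin peaks at about -6.02 dB (half its amplitude), which is the same 6.02 dB that the Regression section adds back to the magnitude tracks.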

In [None]:
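# magnitude spectra (dB) of all analysis frames, arranged as (frequency, time)
stft_mag = np.array([
  mx
  for mx, _ in sample.sinusoidal_model.intermediate_["stft"]
]).T

# pass the analysis hop size so that the time axis is in seconds
specshow(stft_mag, sr=fs, hop_length=sample.sinusoidal_model.h, x_axis="time", y_axis="hz")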
plt.ylim([0, 2000])
resize()

### Peak detection
The peak detection algorithm detects peaks in each STFT frame of the analysed
sound as local maxima of the magnitude spectrum.
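Local maxima are found on integer FFT bins. SMS-style analysis commonly refines each maximum by fitting a parabola through the peak bin and its two neighbours (in dB), which yields a fractional bin location and a better magnitude estimate; the phase is then linearly interpolated at that location. The sketch below assumes this approach and a fixed threshold in dB. Whether SAMPLE's `peak_threshold` plays exactly this role is an assumption, and `detect_peaks` / `interpolate_peaks` are hypothetical names.

```python
import numpy as np

def detect_peaks(mag_db, threshold_db):
    """Bins that are strict local maxima of mag_db and lie above threshold_db."""
    centre = mag_db[1:-1]
    is_peak = ((centre > threshold_db)
               & (centre > mag_db[:-2])   # higher than the left neighbour
               & (centre > mag_db[2:]))   # higher than the right neighbour
    return np.flatnonzero(is_peak) + 1    # shift back to full-spectrum bin indices

def interpolate_peaks(mag_db, phase, ploc):
    """Parabolic interpolation of peak location (bins) and magnitude (dB).

    The phase is linearly interpolated at the fractional location.
    """
    left, centre, right = mag_db[ploc - 1], mag_db[ploc], mag_db[ploc + 1]
    # strict local maxima guarantee a negative (non-zero) denominator
    offset = 0.5 * (left - right) / (left - 2 * centre + right)
    iploc = ploc + offset
    ipmag = centre - 0.25 * (left - right) * offset
    ipphase = np.interp(iploc, np.arange(phase.size), phase)
    return iploc, ipmag, ipphase
```

Multiplying a (fractional) bin location by `fs / n_fft` converts it to hertz, which is how the peaks are placed on the frequency axis in the next cell.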

In [None]:
mx, px = sample.sinusoidal_model.intermediate_["stft"][0]  # magnitude (dB) and phase of the first frame
ploc, pmag, pph = sample.sinusoidal_model.intermediate_["peaks"][0]  # peak locations (bins), magnitudes, phases
n_fft = 2 * (mx.size - 1)  # the one-sided spectrum holds n_fft // 2 + 1 bins
f = fs * np.arange(mx.size) / n_fft  # bin frequencies (Hz)
f_peaks = ploc * fs / n_fft  # peak frequencies (Hz)

ax = plt.subplot(121)
plt.fill_between(f, np.full(mx.shape, -120), mx, alpha=.1)
plt.plot(f, mx)
plt.scatter(f_peaks, pmag, c="C0")
plt.ylim([-60, plt.ylim()[1]])
plt.grid()
plt.title("magnitude")

plt.subplot(122, sharex=ax)
plt.plot(f, px)
plt.scatter(f_peaks, pph, c="C0")
plt.ylim([np.min(px[f < 2000]), np.max(px[f < 2000])])
plt.grid()
plt.title("phase")
plt.xlim([0, 2000])
resize()

### Peak continuation
The peak continuation algorithm organizes the peaks into temporal tracks,
with every track representing the time-varying behaviour of a partial.
For every peak in a trajectory, the instantaneous frequency, magnitude
and phase are stored to allow further manipulation and resynthesis.

The general-purpose SMS method allows the peak-track data structures to be recycled: if a trajectory
becomes inactive, it can later be picked up by a newly detected partial.
Our implementation doesn't allow this: once a track becomes inactive, it is never resumed.

Moreover, two tracks that do not overlap in time but have approximately the same
average frequency can be considered as belonging to the same partial and merged into the same track.
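A greedy sketch of both ideas, assuming peaks are given as frequencies in Hz for each frame. In `continue_peaks`, each active track takes the closest unassigned peak within `max_dev_hz`; a track that finds no match ends for good (no recycling), and unassigned peaks start new tracks. `group_tracks` then groups tracks that never overlap in time and whose mean frequencies are within `max_dev_hz`. All names and the matching rule are hypothetical; a real implementation would also carry magnitudes and phases and might resolve competing matches more carefully than this first-come, first-served loop.

```python
import numpy as np

def continue_peaks(frames, max_dev_hz):
    """Greedy peak continuation (sketch).

    frames: one sequence of peak frequencies (Hz) per STFT frame.
    Returns tracks as dicts {"start_frame": int, "freq": list of Hz}.
    """
    finished, active = [], []
    for n, peaks in enumerate(frames):
        free = [float(p) for p in peaks]  # peaks of this frame not yet assigned
        survivors = []
        for track in active:
            last = track["freq"][-1]
            if free:
                j = int(np.argmin([abs(p - last) for p in free]))
                if abs(free[j] - last) <= max_dev_hz:
                    track["freq"].append(free.pop(j))
                    survivors.append(track)
                    continue
            finished.append(track)  # no match: the track ends and is never resumed
        # every unassigned peak starts a new track
        survivors.extend({"start_frame": n, "freq": [p]} for p in free)
        active = survivors
    return finished + active

def group_tracks(tracks, max_dev_hz):
    """Indices of tracks to merge: no time overlap and close mean frequency."""
    groups = []
    for i in sorted(range(len(tracks)), key=lambda j: tracks[j]["start_frame"]):
        start = tracks[i]["start_frame"]
        freqs = list(tracks[i]["freq"])
        for g in groups:
            no_overlap = g["end"] <= start  # g["end"] is one past the last frame
            if no_overlap and abs(np.mean(g["freqs"]) - np.mean(freqs)) <= max_dev_hz:
                g["members"].append(i)
                g["freqs"].extend(freqs)
                g["end"] = start + len(freqs)
                break
        else:
            groups.append({"members": [i], "freqs": freqs, "end": start + len(freqs)})
    return [g["members"] for g in groups]
```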

In [None]:
from sample import plots
plots.sine_tracking_2d(sample.sinusoidal_model)
resize()

In [None]:
from sample import plots
plots.sine_tracking_3d(sample.sinusoidal_model)
resize(6, 6)

## Regression
Partials of a modal impact sound are characterized by exponentially decaying amplitudes.
Our model for modal partials is
$$x(t) = m\cdot e^{-2\frac{t}{d}}\cdot \sin{\left(2\pi f t + \phi\right)}$$

The magnitude in decibels is a linear function of time
$$m_{dB}(t) = 20\log_{10}{\left(m\cdot e^{-2\frac{t}{d}}\right)} = 20\log_{10}{m} - 40\frac{\log_{10}{e}}{d} \cdot t$$

$$k = - 40\frac{\log_{10}{e}}{d}$$
$$q = 20\log_{10}{m}$$

$$m_{dB}(t) = kt + q$$
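The mapping between $(m, d)$ and $(k, q)$ is easy to check numerically. A small sketch (the function names are hypothetical):

```python
import numpy as np

LOG10_E = np.log10(np.e)  # log10(e), about 0.4343

def kq_from_md(m, d):
    """Line parameters of m_dB(t) = k t + q for amplitude m and decay time d (s)."""
    k = -40 * LOG10_E / d  # slope in dB/s
    q = 20 * np.log10(m)   # intercept in dB
    return k, q

def md_from_kq(k, q):
    """Inverse mapping: amplitude m and decay time d (s) from slope k and intercept q."""
    m = 10 ** (q / 20)
    d = -40 * LOG10_E / k
    return m, d
```

For example, $d = 0.5$ s gives $k \approx -34.74$ dB/s. In general, at $t = d$ the amplitude has dropped by a factor $e^{-2}$, i.e. by $40\log_{10}e \approx 17.37$ dB, so $k\,d \approx -17.37$ dB whatever the value of $d$.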

We use linear regression to find an initial estimate of the parameters $k$ and $q$ from the magnitude tracks. Then, we refine the estimate by fitting a semi-linear *hinge* function. Finally, the amplitude is doubled (a gain of $20\log_{10}2 \approx 6.02$ dB) to compensate for the fact that we only analyse half of the spectrum.
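The exact hinge function is not spelled out here, so the following is only a sketch under an assumption: we take the hinge to be a line that flattens out after a knee time $t_0$, $m_{dB}(t) = k\min(t, t_0) + q$, as when a decaying partial sinks into the noise floor. For a fixed $t_0$ this model is linear in $k$ and $q$, so every candidate knee is an ordinary least-squares fit (`numpy.polyfit`), and we keep the knee with the smallest residual. `fit_line` and `fit_hinge` are hypothetical names.

```python
import numpy as np

def fit_line(t, mag_db):
    """Ordinary least-squares line: slope k (dB/s) and intercept q (dB)."""
    k, q = np.polyfit(t, mag_db, 1)
    return k, q

def fit_hinge(t, mag_db):
    """Assumed hinge model m_dB(t) = k * min(t, t0) + q, fitted by knee search."""
    best = None
    for t0 in t[1:]:  # t0 = t[0] would make min(t, t0) constant (degenerate fit)
        u = np.minimum(t, t0)
        k, q = np.polyfit(u, mag_db, 1)  # linear in (k, q) for a fixed knee
        err = np.sum((k * u + q - mag_db) ** 2)
        if best is None or err < best[0]:
            best = (err, k, q, t0)
    _, k, q, t0 = best
    return k, q, t0
```

On data that decays and then flattens, the plain line fit returns a shallower slope because the flat tail pulls it up, while the hinge fit recovers the decay. The linear amplitude then follows from $q$ with the doubling described above, `a = 2 * 10 ** (q / 20)`.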

In [None]:
t_x = np.arange(x.size) / fs
for i, ((f, d, a), track) in enumerate(zip(sample.param_matrix_.T, sample.sinusoidal_model.tracks_)):
    c = "C{}".format(i)
    # frame times (s) of the track
    t_t = (track["start_frame"] + np.arange(track["freq"].size)) * sample.sinusoidal_model.h / sample.sinusoidal_model.fs
    # add 20*log10(2) (about 6.02 dB) to compensate for spectral halving
    plt.plot(t_t, track["mag"] + 20 * np.log10(2), c=c, alpha=.33, linewidth=3)
    # fitted exponential decay, in dB
    plt.plot(t_x, 20*np.log10(a * np.exp(-2*t_x / d)), "--", c=c)

plt.title("fitted curves")
plt.grid()
plt.ylabel("magnitude (dB)")
plt.xlabel("time (s)")
plt.legend(["track", "fitted"])
resize(6, 6)

The frequency of each partial is simply estimated as the mean frequency of its peak track.
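Putting the estimates together for a single track: a minimal sketch assuming a track is given as arrays of peak frequencies (Hz) and magnitudes (dB) starting at frame `start_frame`, with hop size `hop`, mirroring the time axis used in the plotting cell above. For brevity it uses the plain regression only, without the hinge refinement; `estimate_partial` is a hypothetical name.

```python
import numpy as np

def estimate_partial(freq_hz, mag_db, start_frame, hop, fs):
    """Frequency (Hz), decay time d (s) and linear amplitude of one peak track."""
    freq_hz = np.asarray(freq_hz, dtype=float)
    mag_db = np.asarray(mag_db, dtype=float)
    t = (start_frame + np.arange(mag_db.size)) * hop / fs  # frame times (s)
    k, q = np.polyfit(t, mag_db, 1)  # m_dB(t) ~ k t + q
    f = float(np.mean(freq_hz))      # frequency: mean of the track
    d = -40 * np.log10(np.e) / k     # invert k = -40 log10(e) / d
    a = 2 * 10 ** (q / 20)           # double: only half the spectrum is analysed
    return f, d, a
```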

## Resynthesis
Let's resynthesize the sound from the estimated parameters via additive synthesis.
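Additive resynthesis sums one damped sinusoid per partial. A sketch assuming a parameter matrix laid out like `sample.param_matrix_` in the regression plot (rows: frequency in Hz, decay time $d$ in seconds, linear amplitude) and zero initial phase for every partial. Whether `sample.predict` handles phases this way is an assumption, and `additive_synth` is a hypothetical name.

```python
import numpy as np

def additive_synth(params, t):
    """Sum of damped sinusoids a * exp(-2 t / d) * sin(2 pi f t).

    params: array of shape (3, n_partials) with rows frequency (Hz),
            decay time d (s) and linear amplitude; t: sample times (s).
    Assumes zero initial phase for every partial.
    """
    f, d, a = np.asarray(params, dtype=float)
    t = np.asarray(t, dtype=float)
    # broadcast each partial (column) against the time axis (row)
    partials = (a[:, None] * np.exp(-2 * t[None, :] / d[:, None])
                * np.sin(2 * np.pi * f[:, None] * t[None, :]))
    return partials.sum(axis=0)
```

Under these assumptions, `additive_synth(sample.param_matrix_, np.arange(x.size) / fs)` could be compared against `x_hat` below.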

In [None]:
x_hat = sample.predict(np.arange(x.size) / fs)

waveplot(x_hat, sr=fs, alpha=.5, zorder=100)
plt.grid()
resize()
play(x_hat, rate=fs)

Play back the original sound to compare.

In [None]:
play(x, rate=fs)

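Or play both at the same time in stereo (original on the left channel, resynthesis on the right), and compare their waveforms and spectrograms.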

In [None]:
from librosa import stft, amplitude_to_db

ax = plt.subplot(211)
x_dual = np.array([x, x_hat])
for label, xi in zip(("original", "resynthesis"), x_dual):
    waveplot(xi, sr=fs, alpha=.5, zorder=100, label=label, ax=ax)
plt.grid()
plt.legend()

X_db = amplitude_to_db(np.abs(stft(x)), ref=np.max)
ax = plt.subplot(223, sharex=ax)
specshow(X_db, ax=ax, sr=fs, x_axis="time", y_axis="hz")
ax.set_title("original")

X_hat_db = amplitude_to_db(np.abs(stft(x_hat)), ref=np.max)
ax = plt.subplot(224, sharex=ax, sharey=ax)
specshow(X_hat_db, ax=ax, sr=fs, x_axis="time", y_axis="hz")
ax.set_title("resynthesis")
ax.set_ylim([0, 2000])

resize(12, 12)
play(x_dual, rate=fs)