<div style="margin: 0 auto 10px; height: 70px; border: 2px solid gray; border-radius: 6px;">
  <div style="float: left; margin: 5px 10px 5px 10px; "><img src="img/bfh.jpg" /></div>
  <div style="float: right; margin: 20px 30px 0; font-size: 15pt; font-weight: bold; color: #98b7d2;"><a href="https://moodle.bfh.ch/course/view.php?id=39255" style="color: #98b7d2;">BTE5476 - Project-Oriented Digital Signal Processing </a></div>
</div>
<div style="clear: both; font-size: 30pt; font-weight: bold; color: #64788b; margin-left: 30px;">
    Acoustic Reverberation
</div>

In [None]:
import ipywidgets as widgets
import matplotlib.pyplot as plt
import numpy as np
import scipy.signal as sp
from IPython.display import Audio
from scipy.io import wavfile

In [None]:
plt.rcParams['figure.figsize'] = 14, 4 

In [None]:
def load_wav(file):
    sf, x = wavfile.read(f'data/{file}.wav')
    return sf, (x - np.mean(x)) / 32767.0

# Room Acoustics

When a sound source is placed inside an enclosed space, the sound waves emanating from it will interact with the surrounding walls and will "bounce around" the room as a result of multiple specular and diffuse reflections. A listener within the space will thus hear a superposition of delayed and attenuated copies of the source signal and the resulting perceptual effect of this phenomenon is called _reverberation_. 

Virtually all of our listening experiences (either in real life or when listening to recorded music) are characterized by some reverberation, so much so that a completely "dry" signal (that is, a signal with no reverb such as a synthetic waveform heard using headphones) will sound quite unnatural to our ears. 

In music production, especially in a studio setting where different instrumental parts are often recorded independently and later mixed together, it is common practice to record such tracks as "dry" as possible (e.g. by placing microphones very close to the source) and only later add a global reverberation to the final mix. In this notebook we will look at the properties of a reverberant space and illustrate some of the methods that can be used to add realistic yet synthetic reverberation to an audio recording.

## RT60

<img width="300" style="float: right; margin: 0 30px 0 30px;" src="img/RT60.jpg"> 

On a *macroscopic* level, the fundamental parameter of a reverberant space is the "length" of the reverb. We know from experience that large spaces (such as a cathedral) will sound very different than a small room or an open-air concert. The RT60 metric quantifies this intuition.

**Reverberation Time 60**: time until the sound pressure level measured by the listener drops by 60 dB after the sound source stops.

| max RT60 |  room type  |  good for |
|:---------:|-------------|------|
| .3s | recording studio | no reverb |
|  1s |  lecture hall | speech |
| 1.5s |  concert hall | music |
| > 2s |  cathedral |   |
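
As a rough illustration of the definition, the sketch below estimates RT60 from an impulse response by backward-integrating its energy (the so-called energy decay curve) and fitting a line to the decay in dB. The function name `estimate_rt60`, the fit range between -5 dB and -25 dB, and the synthetic decaying-noise test signal are all assumptions made for this example.

```python
import numpy as np

def estimate_rt60(h, fs, db_start=-5.0, db_end=-25.0):
    """Estimate RT60 (seconds) by fitting a line to the energy decay curve."""
    # energy decay curve: energy remaining in the response after each sample
    edc = np.cumsum(h[::-1] ** 2)[::-1]
    edc_db = 10 * np.log10(edc / edc[0])
    # fit only a portion of the decay and extrapolate the slope to -60 dB
    idx = np.where((edc_db <= db_start) & (edc_db >= db_end))[0]
    slope, _ = np.polyfit(idx / fs, edc_db[idx], 1)  # slope in dB per second
    return -60.0 / slope

# synthetic check: noise whose envelope decays by 60 dB in 0.8 seconds
fs, rt60 = 8000, 0.8
t = np.arange(int(2 * fs)) / fs
rng = np.random.default_rng(0)
h = rng.standard_normal(len(t)) * 10 ** (-3 * t / rt60)
rt60_est = estimate_rt60(h, fs)
```

The estimate should land close to the 0.8 seconds used to build the test signal; with measured responses the usable fit range depends on the noise floor.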

# The Room Impulse Response (RIR)

<img width="200" style="float: right; margin: 0 30px 20px 30px;" src="img/reflections.png"> 

On a more detailed level, a reverberant room acts like a filter whose impulse response is dependent on the geometry of the enclosure, on the materials used for the walls and the objects in the space, and on the positions of both the sound source and of the listener. 

To understand the properties of an RIR, recall that the sound waves produced by the source will reach the listener multiple times:
 * via a direct path, if source and listener are in a direct line of sight
 * after being reflected once by a wall (first-order or "early" reflections)
 * after multiple reflections by different walls (high-order or "late" reflections)

The number of reflections grows exponentially with increasing order and therefore the RIR becomes denser and denser towards its "tail"; at the same time, because of absorption losses, the amplitude of a reflection is an exponentially decreasing function of its order and so the typical time-domain envelope of an RIR is the one shown below:

<img width="600" style="display: block; margin: 40px auto;" src="img/rir.png"> 

# Convolution Reverb

If the RIR is known, reverberation can be added to a dry audio signal via a simple convolution. RIRs can be obtained in various ways:

 * via direct measurement
 * via room acoustics simulation methods
 * (implicitly) using synthetic reverb algorithms.

Commercial reverb plug-ins generally offer a rich library of real-world and artificial RIRs.

<img width="600" style="display: block; margin: 20px auto;" src="img/convrev.png"> 
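
A minimal sketch of convolution reverb is shown below. Since no measured RIR is assumed to be at hand, the helper `synthetic_rir` builds a toy response out of exponentially decaying white noise; this stand-in, the function names, and the wet/dry mixing rule are assumptions made for illustration only.

```python
import numpy as np
from scipy.signal import fftconvolve

def synthetic_rir(rt60, fs, seed=0):
    """Toy RIR: white noise shaped by an envelope that drops 60 dB in rt60 seconds."""
    n = np.arange(int(1.5 * rt60 * fs))
    rng = np.random.default_rng(seed)
    h = rng.standard_normal(len(n)) * 10 ** (-3 * n / (rt60 * fs))
    h[0] = 1.0  # direct path
    return h / np.max(np.abs(h))

def convolution_reverb(x, h, wet=0.3):
    """Mix the dry signal with its convolution with the RIR h."""
    y = fftconvolve(x, h)                    # full convolution, len(x) + len(h) - 1 samples
    dry = np.r_[x, np.zeros(len(h) - 1)]     # pad the dry signal to the same length
    return (1 - wet) * dry + wet * y

x = np.r_[np.ones(100), np.zeros(900)]       # short burst standing in for a dry recording
y = convolution_reverb(x, synthetic_rir(rt60=0.5, fs=8000))
```

Using an FFT-based convolution keeps the cost manageable even when the RIR is several seconds long.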


# RIR measurement

<img width="200" style="float: right; margin: 0 30px 20px 30px;" src="img/icos.jpg"> 

Source and microphone are placed in the chosen position; the source generates a test signal $x[n]$ and the microphone captures the reverberant sound

$$ y[n] = (h\ast x)[n] $$

from which the RIR $h[n]$ can be obtained via deconvolution. The typical source signals are

 * an impulse-like sound (starter gun, popping a balloon, etc.); if $x[n] \approx \delta[n]$ then $h[n] = y[n]$ and no deconvolution is necessary
 * a specific test signal with longer duration, designed to make deconvolution easy (e.g. a chirp)

Many complications arise in practice:
 * approximating a true omnidirectional source (e.g. using the icosahedral speaker shown in the figure)
 * reducing the impact of noise
 * accounting for the mechanics of human hearing (e.g. binaural RIRs, with one microphone per ear)

Here is an example of an RIR measured in a concert hall:
<img width="600" style="display: block; margin: 0 auto;" src="img/concerthall.png"> 

## ESS test signals

A popular source signal for RIR measurements is the Exponentially Swept Sine, a chirp with instantaneous frequency increasing exponentially from $f_1$ Hz to $f_2$ Hz over $T$ seconds:

$$
    x[n] = \sin\left( 2\pi\frac{f_1 T}{R}(e^{\frac{R}{TF_s}n} - 1)\right), \qquad n = 0, 1, \ldots, TF_s - 1
$$

where $F_s$ is the sampling frequency and $R = \ln(f_2 / f_1)$ is the sweep rate.

The ESS signal makes deconvolution easy since $(x \ast x_{inv})[n] \approx K\delta[n]$ for 
$$
    x_{inv}[n] = x[-n] e^{-\frac{R}{TF_s}n}
$$

In [None]:
def ess(f1, f2, T, fs):
    R = np.log(f2 / f1)
    n = np.arange(0, int(T * fs))
    
    # ESS generation
    x = np.sin((2 * np.pi * f1 * T / R) * (np.exp(n / fs * R / T) - 1))
    
    # Inverse signal
    x_inv = x[::-1] * np.exp(-n / fs * R / T)

    return x, x_inv

In [None]:
x, x_inv = ess(10, 100, 3, 1000)

plt.subplot(311)
plt.plot(x);
plt.subplot(312)
plt.plot(x_inv);
plt.subplot(313)
plt.plot(sp.fftconvolve(x, x_inv, mode='same'));
plt.tight_layout();

In [None]:
Audio(ess(200, 1000, 3, 8000)[0], rate=8000)
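
To see how the inverse signal is used in a measurement, the sketch below simulates a noise-free recording of a toy "room" (a direct path plus a single echo) and recovers its impulse response by filtering the recording with $x_{inv}[n]$. The helper `ess_pair` simply repeats the formulas above so that the example is self-contained; the toy response `h_true` and the sweep parameters are arbitrary choices made for illustration.

```python
import numpy as np
from scipy.signal import fftconvolve

def ess_pair(f1, f2, T, fs):
    """Exponentially swept sine and its inverse signal."""
    R = np.log(f2 / f1)
    n = np.arange(int(T * fs))
    x = np.sin((2 * np.pi * f1 * T / R) * (np.exp(n / fs * R / T) - 1))
    x_inv = x[::-1] * np.exp(-n / fs * R / T)
    return x, x_inv

fs = 8000
x, x_inv = ess_pair(50, 3500, 2, fs)

# toy "room": direct path plus one echo, 100 samples later at half the amplitude
h_true = np.zeros(400)
h_true[0], h_true[100] = 1.0, 0.5

y = fftconvolve(x, h_true)      # what the microphone would record (no noise)
h_est = fftconvolve(y, x_inv)   # deconvolution: filter the recording with the inverse signal

# (x * x_inv)[n] peaks at lag len(x) - 1, so the estimated RIR starts there;
# dividing by the value at the peak removes the unknown scaling factor K
start = len(x) - 1
h_est = h_est[start:start + len(h_true)] / h_est[start]
```

Because $(x \ast x_{inv})[n]$ is only approximately an impulse, the recovered taps are band-limited versions of the true ones and small ripples appear around them.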

# Room Acoustics Simulation Methods

In room acoustics, the goal is to determine the RIR from an abstract description of a room's geometry and physical properties. Several algorithmic approaches exist, with different strengths and weaknesses and vastly different accuracy levels and computational costs.

<img width="600" style="display: block; margin: 20px auto;" src="img/methods.jpg"> 


## Geometric Methods

Geometric methods assume that sound propagates from the source in the form of straight rays rather than as a wave. These sound rays are reflected by the surrounding walls and then reflected again and again and, in principle, all the trajectories that ultimately reach the listener can be determined exactly using only elementary geometrical considerations.

The accuracy of these methods decreases for low-frequency sounds, that is, when the wavelengths become commensurate with the room's dimensions and the propagation mechanism can no longer be properly modeled if wave phenomena are ignored. (In particular, sounds at lower frequencies easily propagate around an obstacle via edge diffraction whereas, using a ray-based model, they would appear completely blocked.)

Below we will briefly describe two of the most common geometric methods for RIR estimation; for a much more in-depth analysis that includes lots of interactive demos, [this website](https://interactiveacoustics.info/) is an excellent resource.

### Image-Source methods

> An ideal specular reflection from a rigid surface can be represented by a mirror image source that is obtained by reflecting the sound source against the reflecting surface.

<img width="200" style="float: left; margin: 0 30px 20px 30px;" src="img/firstorder.png"> 
<img width="200" style="float: left; margin: 0 50px 20px 30px;" src="img/secondorder.png"> 


 * simple computations using law of reflection (billiard ball)
 * image sources are reflected too, leading to higher-order reflections
 * image sources do not depend on listener's position
 * ray paths are computed from listener to each image source: exact method (in theory)
 * each path of length $d$ contributes to the RIR via a scaling factor $\propto 1/d$

Example of a 5th-order reflection:

<img width="600" style="display: block; margin: 20px auto;" src="img/highorder.jpg"> 
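
The idea can be sketched in a few lines for the simplest possible case: a 2D rectangular room and first-order reflections only. The function names, the single frequency-independent `reflection` coefficient, and the rounding of each path delay to the nearest sample are simplifying assumptions made for this example.

```python
import numpy as np

def first_order_images(source, room):
    """Mirror a 2D source against the four walls of the rectangle [0, Lx] x [0, Ly]."""
    sx, sy = source
    Lx, Ly = room
    return np.array([
        [-sx, sy],           # wall x = 0
        [2 * Lx - sx, sy],   # wall x = Lx
        [sx, -sy],           # wall y = 0
        [sx, 2 * Ly - sy],   # wall y = Ly
    ])

def image_source_rir(source, listener, room, fs, c=343.0, reflection=0.8):
    """RIR containing the direct path and the first-order reflections only."""
    points = np.vstack([source, first_order_images(source, room)])
    gains = np.r_[1.0, np.full(4, reflection)]    # one wall bounce per image source
    d = np.linalg.norm(points - np.asarray(listener), axis=1)
    delays = np.round(d / c * fs).astype(int)     # path length converted to samples
    h = np.zeros(delays.max() + 1)
    np.add.at(h, delays, gains / d)               # each path is scaled by 1/d
    return h

h = image_source_rir(source=(1.0, 0.5), listener=(3.5, 2.0), room=(4.0, 3.0), fs=16000)
```

Higher orders are obtained by mirroring the image sources again, which is where the exponential growth in the number of sources comes from.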

Issues:
 * the number of sources grows exponentially with the maximum order
 * for polygonal rooms, image sources must be validated with respect to the listener's position
 * difficult to model non-ideal reflections (absorption, scattering)

Many efficient algorithmic implementations have been proposed that prune the image source tree before RIR computation.

### Ray tracing

<img width="200" style="float: right; margin: 0 30px 20px 30px;" src="img/diffuse.png"> 

Similar to the image-source method but:
 * it uses a finite number of sound "rays" emanating in random directions from the source (stochastic method)
 * ray paths are tracked through multiple reflections
 * not all paths will "hit" the listener and listener must have nonzero "volume"
 * advantage: it can model diffuse reflections (surface scattering), important for RIR tail

## Wave-based Methods

<img width="300" style="float: right; margin: 30px 40px 20px 30px;" src="img/wave.png"> 

These methods focus on the wave nature of sound and its impact on reverberation:
 * diffusion and diffraction
 * material absorption
 * frequency-dependent room "modes" (eg. resonances or nulls)

The wave equation (with boundary conditions set by the shape of the room) is solved numerically using finite element methods.
 * computationally expensive
 * cost proportional to max frequency
 * most useful for low frequencies
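
For the idealized case of a rectangular room with perfectly rigid walls no numerical solver is needed, since the mode frequencies have the closed form $f = \frac{c}{2}\sqrt{(n_x/L_x)^2 + (n_y/L_y)^2 + (n_z/L_z)^2}$. The sketch below lists the lowest modes under that assumption; the function name `room_modes` and the example dimensions are hypothetical, and real rooms with absorbing walls or irregular shapes require the numerical methods described above.

```python
import numpy as np
from itertools import product

def room_modes(dims, max_order=2, c=343.0):
    """Sorted mode frequencies (Hz) of a rigid-walled rectangular room."""
    dims = np.asarray(dims, dtype=float)
    freqs = []
    for n in product(range(max_order + 1), repeat=3):
        if any(n):  # skip the trivial (0, 0, 0) combination
            freqs.append(0.5 * c * np.sqrt(np.sum((np.array(n) / dims) ** 2)))
    return np.sort(freqs)

modes = room_modes((5.0, 4.0, 3.0))   # room dimensions in meters
```

The lowest mode is set by the longest dimension of the room, which is why modal effects matter mostly at low frequencies.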

## Hybrid methods

<img width="600" style="display: block; margin: 20px auto;" src="img/tradeoffs.jpg"> 

## Software libraries


 * [Pyroomacoustics](https://pyroomacoustics.readthedocs.io/en/pypi-release/index.html): developed at EPFL (in my lab), offers image source and some ray tracing. Demo notebook available
 * [Wayverb](https://reuk.github.io/wayverb/): very good introduction to room acoustics (most illustrations here were "borrowed" from the site) and very complete library offering all 3 methods

# Reverb simulation using DSP

In many audio applications it's not really necessary to use a realistic RIR (either measured or synthesized) and we are only interested in making the sound less dry by adding some reverb with a definable RT60 value. In these cases a realistic reverberation can be simulated at an extremely low computational cost by using just a few low-order IIR filters.

## Echo vs reverberation

Sound propagates in the air as a pressure wave and, if the wave encounters a hard surface, the discontinuity in the propagation medium causes a partial reflection. A listener hearing both the original sound and its reflection will experience:
 * a distinct echo if the two sounds are separated in time by at least 100ms
 * a reverb-like coloration otherwise
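
To get a feeling for the distances involved, the sketch below converts the extra path length travelled by a reflection into a delay and applies the 100 ms rule of thumb. The speed of sound of 343 m/s and the function names are assumptions made for this example.

```python
def reflection_delay(direct_m, reflected_m, c=343.0):
    """Delay in seconds between the direct sound and a reflection."""
    return (reflected_m - direct_m) / c

def perceived_as(delay_s, threshold_s=0.1):
    """Apply the 100 ms rule of thumb to a reflection delay."""
    return "echo" if delay_s >= threshold_s else "reverb-like coloration"

# a reflection travelling 40 m more than the direct sound arrives about 117 ms late
far_wall = perceived_as(reflection_delay(5.0, 45.0))
# in a small room the extra path is only a few meters
near_wall = perceived_as(reflection_delay(5.0, 15.0))
```

A distinct echo therefore requires an extra path of roughly 34 m or more, which is why echoes are only heard in very large spaces or outdoors.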

### Slapback echo

The so-called "slapback" echo, very popular in 1950s rockabilly music, uses a propagation delay of about 100ms, that is, a delay at the limit of echo perception that sounds a bit like reverb. It can be implemented via a simple FIR with transfer function
$$
    S(z) = (1-g) + gz^{-D}
$$
where $D \approx \lfloor F_s / 10 \rfloor$; the parameter $g$, between 0 and 1, determines the amount of slapback.

In [None]:
hb_sf, hb = load_wav('heartbreak')
Audio(hb, rate = hb_sf)

In [None]:
g, D = 0.3, int(hb_sf * 0.10)
# FIR taps: (1 - g) at n = 0 and g at n = D, so there are D - 1 zeros in between
h = np.r_[1-g, np.zeros(D - 1), g]

Audio(sp.lfilter(h, [1], hb), rate=hb_sf)

### Tap delay echoes (FIR)

For a sound source within an enclosed space of reasonable size, the early reflections in the reverberation can be modeled as multiple short-time echoes with varying amplitudes and delays.

A finite number $M$ of overlapping echoes can be generated via a so-called tap delay filter, namely, an FIR with transfer function

$$
    T(z) = c\sum_{k=0}^{M-1} g_k z^{-D_k}
$$
where $D_k$ is the arrival delay of the $k$-th echo and $g_k$ the corresponding attenuation. The resulting echo pattern doesn't sound very natural because it stops abruptly after the last tap.
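
A tap delay filter is straightforward to build, as in the minimal sketch below. The function name `tap_delay` and the three delay and gain values are purely illustrative and are not taken from any published design.

```python
import numpy as np
from scipy.signal import lfilter

def tap_delay(delays, gains, c=1.0):
    """FIR coefficients with gain c * g_k at delay D_k (in samples)."""
    h = np.zeros(int(max(delays)) + 1)
    for D, g in zip(delays, gains):
        h[int(D)] += c * g
    return h

# three hypothetical echoes; the first tap at delay 0 is the direct sound
h = tap_delay(delays=[0, 190, 430], gains=[1.0, 0.6, 0.35])

# filtering a unit impulse returns the taps themselves, followed by zeros
y = lfilter(h, [1], np.r_[1.0, np.zeros(499)])
```

The filter can be applied to any audio signal with `lfilter(h, [1], x)`, exactly as in the slapback example.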

### Recursive echoes (comb filters)

Suppose a sound source is placed at equal distance from two parallel reflecting surfaces, and consider a listener in the same spot as the source; the sound will bounce back and forth and each reflection will reach the listener with a delay that is a multiple of the round-trip delay from source to surface. Because of the attenuation caused by the reflections and by propagation loss in air, each echo will be fainter than the previous.

<img width="600" style="float: right; margin: 0 0 30px 40px;" src="img/comb.png"> 

We can model this phenomenon with a simple IIR filter with a single feedback tap, in which the feedback delay $D$ is equal to the propagation time between reflections and the feedback gain $g$ is less than one in magnitude:

$$
    y[n] = c\,x[n] + g\,y[n - D]
$$

The transfer function of the filter generating a recursive echo is 
$$
    C(z) = \frac{c}{1 - gz^{-D}}
$$
which is stable for $|g|<1$. The filter is often called a "comb" filter because its magnitude response exhibits a series of regularly-spaced peaks at frequencies that are multiples of $\omega_0 = 2\pi / D$:

In [None]:
def comb(delay: int, gain: float) -> tuple[np.ndarray, np.ndarray]:
    """Return the coefficients (b, a) of the comb filter 1 / (1 - gain * z^-delay)."""
    return np.ones(1), np.r_[1, np.zeros(int(delay) - 1), -gain]

In [None]:
D, g = 16, 0.7
w, H = sp.freqz(*comb(D, g), 2048);
plt.plot(w, np.abs(H));

The impulse response of a comb filter is an exponentially decaying sequence where only one sample every $D$ is nonzero:
$$
    c[n] = \begin{cases} c\,g^k & n = kD, k \in \mathbb{N} \\ 0 & \mathrm{otherwise} \end{cases}
$$

In [None]:
plt.stem(sp.lfilter(*comb(D, g), np.r_[1, np.zeros(100)]));
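
Since the $k$-th echo has amplitude $c\,g^k$ and arrives after $kD/F_s$ seconds, the echoes drop by 60 dB when $20\log_{10} g^k = -60$, that is, after $k = -3/\log_{10} g$ round trips. A single comb filter therefore has $\mathrm{RT60} = -3D / (F_s \log_{10} g)$. The two helper functions below, whose names are made up for this example, convert between gain and RT60 for a positive gain smaller than one.

```python
import numpy as np

def comb_rt60(delay, gain, fs):
    """RT60 in seconds of a comb filter with the given delay (samples) and gain."""
    return -3.0 * delay / (fs * np.log10(gain))

def comb_gain(delay, rt60, fs):
    """Feedback gain that gives a comb filter the requested RT60."""
    return 10 ** (-3.0 * delay / (fs * rt60))

# gain needed for a one-second decay with a 30 ms delay at 44.1 kHz
g_1s = comb_gain(delay=int(0.03 * 44100), rt60=1.0, fs=44100)
```

Longer delays need smaller gains to reach the same decay time, which is why each comb filter in a bank is usually given its own gain.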

In [None]:
sf, piano = load_wav('piano')
Audio(piano, rate=sf)

In [None]:
Audio(sp.lfilter(*comb(int(sf * 0.3), 0.7), piano), rate=sf)

## Allpass filters

A filter with transfer function 

$$
    A(z) = \frac{z^{-D} - \alpha}{1 - \alpha z^{-D}}
$$

is called _allpass_ since its magnitude response is constant and equal to one, $|A(e^{j\omega})| = 1$. By definition, an allpass filter doesn't change the spectral energy distribution of a signal; on the other hand, its nonlinear phase response will cause the input signal to "spread out" in the time domain, which can be used to simulate the diffuse reflections that contribute to the tail end of an RIR.

<img width="600" style="display: block; margin: 20px auto;" src="img/allpass.png"> 


In [None]:
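def allpass(delay: int, alpha: float) -> tuple[np.ndarray, np.ndarray]:
    """Return the coefficients (b, a) of the allpass filter (z^-delay - alpha) / (1 - alpha * z^-delay)."""
    delay = int(delay)
    return np.r_[-alpha, np.zeros(delay - 1), 1], np.r_[1, np.zeros(delay - 1), -alpha]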

In [None]:
D, alpha = 17, 0.7
w, H = sp.freqz(*allpass(D, alpha), 2048);
plt.plot(w, np.abs(H));
plt.plot(w, np.angle(H));

The impulse response of an allpass filter is also a spaced-out exponentially decaying sequence:
$$
    a[n] = \begin{cases}
        0 & n < 0\\
        -\alpha & n = 0 \\
        \alpha^{k-1} (1-\alpha^2) & n = kD, \; k \geq 1 \\
        0 & \text{otherwise}
        \end{cases}
$$

In [None]:
plt.stem(sp.lfilter(*allpass(D, alpha), np.r_[1, np.zeros(200)]));

Let's load an impulse-like audio excerpt and see how an allpass filter can spread out the peak in time without affecting the acoustic properties too much:

In [None]:
sf, snare = load_wav('snare')
snare = snare[:10000]

D, alpha = 210, 0.7
ap_snare = sp.lfilter(*allpass(D, alpha), snare);

plt.plot(snare);
plt.plot(ap_snare);

In [None]:
Audio(snare, rate=sf)

In [None]:
Audio(ap_snare, rate=sf)

## Synthetic reverberators

Comb and allpass filters are the building blocks used by the two well-known reverb simulators described below. Both methods use a bank of comb filters to simulate early reflections and one or more allpass filters to produce the dense tail of the impulse response due to diffusion. 

The delays of the comb filters are a function of the size of the room; to increase the density of the early reflections, multiple comb filters are used in parallel, making sure the delay values are coprime so that the nonzero values of their impulse responses coincide as rarely as possible. The delays of the allpass filters are chosen so that the overall impulse response of the reverberator exhibits the desired RT60 value.

### Adjusting the delays

Many implementations of these reverberation algorithms can be found in the literature, with delay and gain values that have been carefully selected so that the delays are coprime and the reverberation quality is good. The delay values of each implementation are specified in samples and therefore assume a specific sampling rate for the input signals. When adapting one of these designs to a different rate, we need to scale the delay values appropriately while trying to preserve coprimality. 

In the examples below this is achieved by rounding the rate-adjusted delay values to the nearest prime number, using the following function:

In [None]:
def closest_primes(values, margin=1.5):
    # round to the nearest integer; 2 is the smallest prime
    values = np.maximum(np.rint(values).astype(int), 2)
    limit = int(np.max(values) * margin)

    # mark all prime numbers up to the limit using Eratosthenes sieve
    a = np.ones(limit).astype(bool)
    a[1] = a[::2] = False
    a[2] = True  # 2 is the only even prime
    for i in range(3, int(np.sqrt(limit)) + 1):
        if a[i]:
            a[i*i::2*i] = False
    if not np.any(a[np.max(values):]):
        raise ValueError('increase the top margin for prime search')

    vp = np.zeros_like(values)
    for ix, v in enumerate(values):
        # find neighboring primes (smaller and greater) and keep the closest
        vs = vg = v
        while not a[vs]:
            vs -= 1
        while not a[vg]:
            vg += 1
        vp[ix] = vs if np.abs(v - vs) < np.abs(v - vg) else vg   
        
    return vp

In [None]:
closest_primes([3239, 3073, 3941, 4321])
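
A quick way to check that a set of delays is indeed pairwise coprime is to test the greatest common divisor of every pair. The helper below is a small sketch whose name, `pairwise_coprime`, is an assumption of this example.

```python
from itertools import combinations
from math import gcd

def pairwise_coprime(values):
    """True if no two values share a common factor greater than one."""
    return all(gcd(int(a), int(b)) == 1 for a, b in combinations(values, 2))

ok = pairwise_coprime([3, 5, 7])     # no shared factors
bad = pairwise_coprime([4, 6, 7])    # 4 and 6 share the factor 2
```

Note that the delays do not need to be prime themselves; rounding to primes is just a simple way to guarantee that distinct values are coprime.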

### Schroeder's reverberator

[Schroeder's algorithm](https://ieeexplore.ieee.org/document/1166351), published in 1961, uses a bank of four comb filters with coprime delays for the early reflections and a cascade of three allpass filters of increasing order for the diffuse reflections. The original audio (the "dry" signal) and the output of the reverberator (the "wet" signal) can be mixed in different proportions to choose the final amount of added reverb.

<img width="600" style="display: block; margin: 0px auto;" src="img/schroeder.png"> 


In [None]:
def schroeder(x, sf, wet=0.2):
    # delays and gains for allpass and comb filters; these produce a reverb with RT60 of about 1 second
    #  and were proposed by John Chowning (CCRMA, Stanford)
    AP = np.array([[347, 113, 37], [0.7, 0.7, 0.7]])
    CF = np.array([[1687, 1601, 2053, 2251], [0.773, 0.802, 0.753, 0.733]])
    DESIGN_SF = 25000  # filter delays are relative to this sampling rate

    # recompute delays wrt current sampling rate
    AP[0] = closest_primes(AP[0] * sf / DESIGN_SF)
    CF[0] = closest_primes(CF[0] * sf / DESIGN_SF)

    # comb filters
    y = np.zeros(len(x))
    for n in range(len(CF[0])):
        y += sp.lfilter(*comb(CF[0][n], CF[1][n]), x)

    # allpass
    for n in range(len(AP[0])):
        y = sp.lfilter(*allpass(AP[0][n], AP[1][n]), y)
    
    return wet * y + (1 - wet) * x

In [None]:
plt.plot(schroeder(np.r_[1, np.zeros(25000)], 25000, wet=0.5));

In [None]:
# test audio file; add silence at the end to hear the reverberation's decay

audio_sf, audio = load_wav('guitar')
audio = np.r_[audio, np.zeros(audio_sf)]
Audio(audio, rate=audio_sf)

In [None]:
Audio(schroeder(audio, audio_sf), rate=audio_sf)

### Moorer's reverberator

Moorer's algorithm, published in 1979, creates a finite number of early reflections using an FIR tap delay; this "early" reverb is then processed by a structure very similar to Schroeder's reverberator and the result (the "late" reverberation) is delayed and appended after the "early" portion. The resulting impulse response has a clearer early/late structure and is less dense than in the previous case.

<img width="600" style="display: block; margin: 0px auto;" src="img/moorer.png"> 


In [None]:
def moorer(x, sf, wet=0.1):
    AP = np.array([[225], [0.7]])
    CF = np.array([[1801, 1478, 2011, 2123], [0.805, 0.827, 0.783, 0.764]])
    TD = np.array([[40, 70, 149], [.4, .3, .2]])
    DESIGN_SF = 44100  

    AP[0] = closest_primes(AP[0] * sf / DESIGN_SF)
    CF[0] = closest_primes(CF[0] * sf / DESIGN_SF)
    TD[0] = closest_primes(TD[0] * sf / DESIGN_SF)

    td_max = int(max(TD[0])) + 1
    h = np.zeros(td_max)
    for k in range(len(TD[0])):
        h[int(TD[0][k])] = TD[1][k]
    early = sp.lfilter(h, [1], x)
    
    # late reverberation: the comb filter bank is driven by the early reflections
    y = np.zeros(len(x))
    for k in range(len(CF[0])):
        y += sp.lfilter(*comb(CF[0][k], CF[1][k]), early)
    
    # Allpass stage
    y = sp.lfilter(*allpass(AP[0][0], AP[1][0]), y)
    
    # Delay
    late = np.r_[np.zeros(td_max), y][:len(early)]
    
    return wet * (early + late) + (1 - wet) * x

In [None]:
plt.plot(moorer(np.r_[1, np.zeros(44100)], 44100, wet=0.3));

In [None]:
Audio(audio, rate=audio_sf)

In [None]:
Audio(moorer(audio, audio_sf, wet=0.5), rate=audio_sf)

In [None]:
Audio(moorer(hb, hb_sf, wet=.3), rate=hb_sf)