<div style="margin: 0 auto 30px; height: 60px; border: 2px solid gray; border-radius: 6px;">
  <div style="float: left;"><img src="img/epfl.png" /></div>
  <div style="float: right; margin: 20px 30px 0; font-size: 10pt; font-weight: bold;"><a href="https://moodle.epfl.ch/course/view.php?id=18253">COM202 - Signal Processing</a></div>
</div>
<div style="clear: both; font-size: 30pt; font-weight: bold; color: #483D8B;">
    Lab 7: Discrete-time Filters
</div>

In this notebook we will learn how to implement and use two discrete-time  filters called the Leaky Integrator and the Moving Average. In spite of their simplicity, these two lowpass filters are extremely useful and they often represent a good initial stand-in for more complex filters when prototyping a signal processing application.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import IPython
import scipy.signal as sp
from scipy.io import wavfile

# interactivity library:
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets

In [None]:
plt.rcParams["figure.figsize"] = (14,4)

# Implementing discrete-time filters

Although filtering algorithms are available in SciPy, it's always instructive to try and code a digital filter from scratch in order to understand the finer details of a practical implementation. 

## Filters as Python functions

Using Python and NumPy, filters can be implemented as standalone [pure functions](https://en.wikipedia.org/wiki/Pure_function); the arguments are going to be:
 * an array containing the entire input signal,
 * the filter description,
 
and the return value is an array containing the entire output signal, as shown in this template:

In [None]:
def dt_filter(x: np.ndarray, filter_parameters) -> np.ndarray:
    y = np.zeros(len(x))
    for n in range(0, len(x)):
        y[n] = ...  # compute each output sample        
    return y

### Causality and zero initial conditions

Filters are considered causal by default, and therefore the computation of each output sample $y[n]$ involves only the current and _past_ input samples and _past_ output samples.

The input array passed as an argument is interpreted as containing the values $x[0], x[1], \ldots, x[N-1]$. The input is also assumed to be a causal sequence, so that $x[n] = 0$ for $n < 0$; if the algorithm requires accessing values of $x[n]$ for negative values of the index $n$, these values are assumed to be zero.

Similarly, we assume _zero initial conditions_ for recursive filters that use past _output_ values in the computation of the current output sample. Zero initial conditions imply that $y[n] = 0$ for $n < 0$.

### Termination

The input to the filtering function is an array of length $N$ and, by convention, the return value is also an array of length $N$. Note however that this does not mean that $y[n] = 0$ for $n \ge N$; it simply means that no more output samples can be computed unless more input samples are provided.

In fact, if for instance we were to assume that $x[n]=0$ for $n \ge N$, an IIR filter would produce an infinite-length output sequence $y[n]$. Sometimes, it is useful to append a suitable amount of zeros to the input array so that the output can naturally decay to a small amplitude once the actual data in the array has been processed.
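
As a rough illustration of the zero-padding idea, the sketch below runs a one-pole recursive lowpass on a 20-sample step, first as-is and then with 80 zeros appended. The helper `one_pole`, the value `lam = 0.9` and the padding length are assumptions chosen only for this example.

```python
import numpy as np

def one_pole(x: np.ndarray, lam: float) -> np.ndarray:
    # hypothetical helper: y[n] = lam * y[n-1] + (1 - lam) * x[n], zero initial conditions
    y = np.zeros(len(x))
    prev = 0.0
    for n in range(len(x)):
        prev = lam * prev + (1 - lam) * x[n]
        y[n] = prev
    return y

lam = 0.9                 # illustrative value
x = np.ones(20)           # the actual data: N = 20 samples
pad = 80                  # number of appended zeros (arbitrary choice)

y_short = one_pole(x, lam)                        # 20 samples in, 20 samples out
y_long = one_pole(np.r_[x, np.zeros(pad)], lam)   # 100 samples, including the decaying tail

print(len(y_short), len(y_long))
print(y_long[19], y_long[-1])   # the tail decays geometrically as lam ** k
```

The first 20 output samples are identical in both cases; the padding only makes the decaying tail visible.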

## The Leaky Integrator

The following function implements a Leaky Integrator described by the constant-coefficient difference equation (CCDE) 

$$
    y[n] = \lambda y[n-1] + (1-\lambda)x[n], \qquad 0 < \lambda < 1.
$$

In [None]:
def leaky(x: np.ndarray, lam: float) -> np.ndarray:
    y = np.zeros(len(x))
    for n in range(0, len(x)):
        y[n] = lam * y[n-1] + (1 - lam) * x[n]        
    return y

### Ensuring zero initial conditions 

The Leaky Integrator is a recursive (IIR) filter and its output at time $n$ depends on the output at time $n-1$; assuming zero initial conditions, when $n=0$ the required "previous output value" is $y[-1] = 0$. 

In the implementation above, when `n > 0` the expression `y[n-1]` is indeed pointing to the previously computed output value. In the first iteration, however, `n` is equal to zero and so the expression `y[n-1]` is equivalent to `y[-1]`. Contrary to many other programming languages, Python allows negative indexing, so that `y[-1]` actually points to the _last_ element in the array `y`. Since the output array `y` is pre-allocated and filled with zeros, `y[-1]` is indeed equal to zero as required.
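
The negative-index trick works only because `y` is pre-allocated with zeros. As a sketch of an alternative that does not rely on it, the variant below keeps the previous output in an explicit variable; the name `leaky_explicit` is hypothetical and not part of the lab code.

```python
import numpy as np

def leaky_explicit(x: np.ndarray, lam: float) -> np.ndarray:
    y = np.zeros(len(x))
    y_prev = 0.0                      # zero initial condition: y[-1] = 0
    for n in range(len(x)):
        y[n] = lam * y_prev + (1 - lam) * x[n]
        y_prev = y[n]                 # becomes y[n-1] at the next iteration
    return y

# Python's negative indexing: index -1 is the last element, not "one before the first"
a = np.array([10.0, 20.0, 30.0])
print(a[-1])

# impulse input: the output should follow (1 - lam) * lam ** n
lam = 0.95
h = leaky_explicit(np.r_[1.0, np.zeros(9)], lam)
print(np.allclose(h, (1 - lam) * lam ** np.arange(10)))
```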

### Testing the code

The impulse response of the Leaky Integrator is $h[n] = (1-\lambda)\lambda^n u[n]$ and we can verify that the above implementation is correct by comparing the theoretical values to the output of the function when the input is a (truncated) delta sequence:

In [None]:
N = 100
lam = 0.95
plt.stem(leaky(np.r_[1, np.zeros(N-1)], lam), label=r"leaky(x, $\lambda$)");
plt.plot((1 - lam) * lam ** np.arange(0, N), 'C2', label=r"$h[n]$");
plt.legend();

## The Moving Average

The causal Moving Average filter of length $M$ is described by the CCDE

$$
    y[n] = \frac{1}{M}\sum_{k=0}^{M-1}x[n-k].
$$

The Moving Average is an FIR filter, that is, each output sample is computed using only a finite number of input samples (here the current sample and the $M-1$ previous ones); in a practical implementation, values for $x[n]$ when $n<0$ are equal to zero because of the causality assumption for $x[n]$.

### Exercise: implement the Moving Average filter

Complete the code below

In [None]:
def mavg(x: np.ndarray, M: int) -> np.ndarray:
    ...        
    return y

In [None]:
# SOLUTION

def mavg(x: np.ndarray, M: int) -> np.ndarray:
    N = len(x)
    y = np.zeros(N)
    # prepend M-1 zeros to x so that we can always "go back" M samples to compute the average
    x = np.r_[np.zeros(M-1), x]
    for n in range(0, N):
        # we can use the built-in averaging function in numpy over the previous M samples
        y[n] = np.mean(x[n:n+M])        
    return y

Let's test your implementation:

In [None]:
y = mavg((-1) ** np.arange(0, 40), 20)  # test signal, filtered
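# compare with a tolerance: exact equality tests on floating-point sums are fragile
print('good job!' if np.isclose(np.sum(y[1::2]), 0) and np.isclose(np.sum(y[:20:2]), 0.5) else 'Sorry, try again!')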
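
As a cross-check that does not depend on an explicit loop, the causal Moving Average can also be written as a convolution with a length-$M$ rectangular impulse response, truncated to the first $N$ output samples. This is only a sketch; `mavg_conv` is a hypothetical name.

```python
import numpy as np

def mavg_conv(x: np.ndarray, M: int) -> np.ndarray:
    # the full convolution has len(x) + M - 1 samples; keep the first len(x)
    # to follow the same-length, zero-initial-conditions convention
    return np.convolve(x, np.ones(M) / M)[:len(x)]

x = (-1.0) ** np.arange(0, 40)      # same alternating test signal as above
y = mavg_conv(x, 20)
print(y[:4])      # partial averages while the window fills up
print(y[19:23])   # once the window holds 20 samples, the average is zero
```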

# Applications


## Denoising

In a denoising scenario we have a "clean" signal $x[n]$ that has been corrupted by an additive noise signal $\eta[n]$; we only have access to $\hat{x}[n] = x[n] + \eta[n]$ and we would like to recover $x[n]$.

In general, without further assumptions, this is not a solvable problem. However, it is generally the case that the signal and the noise have very different characteristics and, in this case, we can try to reduce the amount of noise via filtering. Typically, if we look in the time domain:
 * the clean signal is varying slowly and smoothly
 * the noise is low-amplitude with respect to the signal and it varies very fast from one sample to the next.
 
These two characteristics translate to the following properties in the frequency domain:
 * the clean signal contains most of its energy in the low frequencies around zero
 * the noise has a full-band spectrum, with almost equal energy at all frequencies.

### A signal generator

The following function can be used to generate an $N$-point smooth signal together with a noise-corrupted version at the specified signal-to-noise ratio; the spectrum of the smooth signal will contain most of its energy in the $[-2\pi B, 2\pi B]$ range. You don't need to worry about how the function works; simply use it as a black box.

In [None]:
def sig_gen(N: int, SNR: float, B=0.04, x=None) -> [np.ndarray, np.ndarray]:
    if x is None:
        X = np.r_[0, np.random.uniform(-1, 1, 2 * int(N * B) + 1)]
        x = np.real(np.fft.ifft(X, 2*N))[:N] / np.sqrt(2 * B / 3 / N)
    a = np.sqrt((3.0 / 8.0) / np.power(10, SNR / 10)) 
    return x, x + np.random.uniform(-a, a, len(x))

In [None]:
# Only for the very curious students!!

def sig_gen(N: int, SNR: float, B=0.04, x=None) -> [np.ndarray, np.ndarray]:
    if x is None:
        # build a smooth signal by creating a DFT vector X of length 2 * int(N * B) + 2 and
        #  taking a 2N-point inverse DFT. The resulting signal will be bandlimited to 2pi * B.
        # X[0] = 0 to ensure zero mean; other values randomly distributed over [-1, 1]
        X = np.r_[0, np.random.uniform(-1, 1, 2 * int(N * B) + 1)]
        # by Parseval, the energy of the 2N-point complex signal is the number of nonzero DFT
        #  coefficients times their variance, divided by the DFT length: (2NB)(1/3)/(2N) = B / 3
        # We take the real part only, so the energy is B / 6 and the power is B / (12 N)
        # Dividing by sqrt(2 * B / (3 * N)) therefore yields a power of 1/8, i.e. a peak value
        #  of about 0.5 if we pretend the signal is sinusoidal (peak = RMS * sqrt(2))
        x = np.real(np.fft.ifft(X, 2*N))[:N] / np.sqrt(2 * B / 3 / N)
    # amplitude of the noise from the desired SNR, knowing that now the signal power is 0.125
    a = np.sqrt((3.0 / 8.0) / np.power(10, SNR / 10)) 
    return x, x + np.random.uniform(-a, a, len(x))

Use the following interactive widget to play with the SNR and the B parameters and try to get a feel for their effect on the  signal generated by the function:

In [None]:
def display(SNR=15, B=0.02):
    x, x_hat = sig_gen(1000, SNR, B, display.prev[1] if B == display.prev[0] else None)
    display.prev = [B, x]
    plt.plot(x, 'C0', lw=2, label='clean');
    plt.plot(x_hat, 'C3', lw=1, label='noisy');
    plt.ylim(-1.2,1.2);
    plt.legend(loc="upper right");

display.prev = [0, None]
    
interact(display, SNR=(0.0, 50.0), B=(0.01, 0.09, 0.01));

### Exercise: checking the SNR

Given a noise-corrupted signal $\hat{x}[n] = x[n] + \eta[n]$, the signal-to-noise ratio is expressed in dB and is computed as 

$$
    \text{SNR}_{\hat{x}} = 10 \log_{10}\left(\frac{E_x}{E_\eta}\right)
$$

where $E_x$ is the energy of the clean signal and $E_\eta$ is the energy of the noise. 

Generate a noisy signal and verify numerically that the SNR of the sequence returned by `sig_gen()` is indeed close to the SNR passed as an argument to the function.

In [None]:
N, SNR = 1000, 30
x, x_hat = sig_gen(N, SNR)
E_x = ...
E_eta = ...
SNR_exp = ...

In [None]:
# SOLUTION

x, x_hat = sig_gen(N, SNR)
E_x = np.sum(x ** 2)
E_eta = np.sum((x_hat - x) ** 2)
SNR_exp = 10 * np.log10(E_x / E_eta)
print(SNR_exp)
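
The same computation can be wrapped in a small helper and checked on a case where the answer is known in advance. In this sketch, `snr_db` is a hypothetical helper and the test signals are made up for illustration: a unit-amplitude constant with a ±0.1 alternating error has an energy ratio of 100, that is, 20 dB.

```python
import numpy as np

def snr_db(x: np.ndarray, x_hat: np.ndarray) -> float:
    # energy of the clean signal over energy of the error, in dB
    E_x = np.sum(x ** 2)
    E_eta = np.sum((x_hat - x) ** 2)
    return 10 * np.log10(E_x / E_eta)

x = np.ones(100)
eta = 0.1 * (-1.0) ** np.arange(100)   # deterministic "noise" of amplitude 0.1
print(snr_db(x, x + eta))
```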

### Denoising: time-domain experiments

The following interactive widget allows you to play with the SNR of the noisy signal and with the parameter $\lambda$ of a leaky integrator to see the denoising performance of the filter in the time domain. Try to find a value for $\lambda$ that provides a good compromise between removal of the noise and preservation of the original clean signal.

In [None]:
def display(SNR=12, lam=0.5):
    if SNR != display.state[0]:
        display.state = [SNR, *sig_gen(500, SNR, x=display.state[1])]
    x, x_hat = display.state[1], display.state[2] 
    plt.plot(x_hat, 'C3', lw=1, label='noisy');
    plt.plot(x, 'C0', lw=2, label='clean');
    plt.plot(leaky(x_hat, lam), 'C2', lw=2, label='denoised');
    plt.ylim(-1.2,1.2)
    plt.legend(loc="upper right");
display.state = [0, None]
    
interact(display, SNR=(0.0, 50.0), lam=(0.49,0.99,0.02));

### Exercise: denoising in frequency

Plot the magnitude spectra of the clean, noisy, and denoised signals, together with the magnitude response of the leaky integrator using the values for SNR and for $\lambda$ that you chose before using the widget. 

Remember that the magnitude response of the Leaky Integrator is 

$$
    |H(e^{j\omega})| = \frac{(1-\lambda)}{\sqrt{1 - 2\lambda \cos\omega + \lambda^2}}.
$$

To obtain the plot:
 * plot the filter's magnitude response over the $[-\pi, \pi]$ interval using the analytic expression above
 * compute $E_x$, the energy of the clean signal, then normalize the clean, noisy and denoised signals by $E_x$ before computing their DFTs
 * plot the magnitude of the DFTs so that they are aligned with the frequency response of the filter (exactly in the same way as in Question 7 in Homework set 5)

In [None]:
# your code here

In [None]:
# SOLUTION
PTS, SNR, lam = 500, 12, 0.8

w = np.linspace(-np.pi, np.pi, 2000)
H = (1 - lam) / np.sqrt(1 - 2 * lam * np.cos(w) + lam**2)

x, x_hat = sig_gen(PTS, SNR)
d = leaky(x_hat, lam)
E_x = np.sum(x ** 2)

for sig, label, color in [(x, 'clean', 'C0'), (x_hat, 'noisy', 'C3'), (d, 'denoised', 'C2')]:
    plt.figure()
    S = np.abs(np.fft.fftshift(np.fft.fft(sig / E_x)))
    # after fftshift the DFT bins sit at -pi, ..., pi - 2*pi/PTS (PTS even), hence endpoint=False;
    # for PTS odd the bounds would have to be shifted slightly.
    plt.plot(np.linspace(-np.pi, np.pi, len(S), endpoint=False), S, c=color, label=label);
    plt.plot(w, H, 'C1', lw=3, label='mag resp')
    plt.xlabel('$\\omega$')
    plt.legend();
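
One way to sanity-check the analytic expression of the magnitude response is to compare it with the DFT of the (truncated) impulse response $h[n] = (1-\lambda)\lambda^n u[n]$. The sketch below assumes $\lambda = 0.8$ and a truncation length of 512 samples, which is long enough for the discarded tail to be negligible.

```python
import numpy as np

lam, L = 0.8, 512
h = (1 - lam) * lam ** np.arange(L)          # truncated impulse response

w = 2 * np.pi * np.arange(L) / L             # DFT frequencies in [0, 2*pi)
H_dft = np.abs(np.fft.fft(h))
H_formula = (1 - lam) / np.sqrt(1 - 2 * lam * np.cos(w) + lam ** 2)

print(np.max(np.abs(H_dft - H_formula)))     # truncation error, should be tiny
print(H_formula[0], H_formula[L // 2])       # gain at omega = 0 and at omega = pi
```

The gain is one at $\omega = 0$ and $(1-\lambda)/(1+\lambda)$ at $\omega = \pi$, which is the lowpass behavior we expect.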

### Exercise: denoising with the Moving Average filter

Modify the time-domain denoising widget and the code you wrote for the previous exercise so that they use a Moving Average filter instead of a Leaky Integrator. Which filter works best in your opinion?

For the time-domain widget, we simply need to change the function parameter to receive $M$, the length of the Moving Average filter, and change the filtering function:

In [None]:
# SOLUTION

def display(SNR=12, M=5):
    if SNR != display.state[0]:
        display.state = [SNR, *sig_gen(500, SNR, x=display.state[1])]
    x, x_hat = display.state[1], display.state[2] 
    plt.plot(x_hat, 'C3', lw=1, label='noisy');
    plt.plot(x, 'C0', lw=2, label='clean');
    plt.plot(mavg(x_hat, M), 'C2', lw=2, label='denoised');
    plt.ylim(-1.2,1.2)
    plt.legend(loc="upper right");
display.state = [0, None]
    
interact(display, SNR=(0.0, 50.0), M=(2, 100));

For the plot of the various spectra, the only change is the formula for the frequency response of the filter, which in this case is

$$
    |H(e^{j\omega})| = \frac{1}{M}\left|\frac{\sin(M \omega/2)}{\sin(\omega / 2)}\right|.
$$

In [None]:
PTS, SNR, M = 500, 12, 20

w = np.linspace(-np.pi, np.pi, 2000)
H = np.abs(np.sin(M * w / 2) / np.sin(w / 2) / M)

x, x_hat = sig_gen(PTS, SNR)
d = mavg(x_hat, M)
E_x = np.sum(x ** 2)

for sig, label, color in [(x, 'clean', 'C0'), (x_hat, 'noisy', 'C3'), (d, 'denoised', 'C2')]:
    plt.figure()
    S = np.abs(np.fft.fftshift(np.fft.fft(sig / E_x)))
    # after fftshift the DFT bins sit at -pi, ..., pi - 2*pi/PTS (PTS even), hence endpoint=False;
    # for PTS odd the bounds would have to be shifted slightly.
    plt.plot(np.linspace(-np.pi, np.pi, len(S), endpoint=False), S, c=color, label=label);
    plt.plot(w, H, 'C1', lw=3, label='mag resp')
    plt.legend();

## Detrending

In many (if not all) signal processing applications, we prefer signals to be _balanced,_ that is, we want the average value of the signal to be zero. Indeed, all physical processing devices (and digital devices in particular) can only deal with a finite range of possible input values before things like distortion or breakdown start to happen, and this nominal input range is usually centered around zero. If a signal is not balanced, it will not be able to fully use the available input range of a processing device.

As an example, assume that a processing device can only accept input values in the interval $[-1, 1]$; a signal $x[n]$ such that $\max_n\{x[n]\} = 0.8$ and $\min_n\{x[n]\} = -0.8$ will be processed without problems; but the unbalanced signal $y[n] = x[n] + 0.5$ will exceed the device's input limits even though the signal's range, $\max_n\{y[n]\} - \min_n\{y[n]\}$, is the same as for $x[n]$.

In these cases, a Leaky Integrator or a Moving Average filter can be used to _detrend_ a signal, that is, remove its estimated mean value to obtain a balanced signal.
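
The numerical example above can be reproduced with a few lines of code. The sketch below uses a sinusoid of amplitude 0.8 as a stand-in for $x[n]$ (an assumption made only for illustration) and counts how many samples fall outside the $[-1, 1]$ input range before and after removing the mean.

```python
import numpy as np

n = np.arange(200)
x = 0.8 * np.sin(2 * np.pi * n / 50)     # balanced signal, range within [-0.8, 0.8]
y = x + 0.5                              # unbalanced version, same peak-to-peak range

def out_of_range(sig: np.ndarray) -> int:
    # number of samples that a [-1, 1] device could not represent
    return int(np.sum(np.abs(sig) > 1))

print(out_of_range(x), out_of_range(y))
print(x.max() - x.min(), y.max() - y.min())   # identical peak-to-peak range
print(out_of_range(y - np.mean(y)))           # balanced again after removing the mean
```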

As you may have noticed while playing with the interactive widget before, if you push the value of $\lambda$ very close to one in a Leaky integrator, the output of the filter tends to converge to the mean.

### Exercise: average estimation with Leaky Integrator and Moving Average

The following cell creates a signal and offsets it by a constant amount. Use a Leaky Integrator and a Moving Average filter to estimate the value of the offset in order to balance back the signal. Plot the value of the estimated mean and find the values for $\lambda$ and $M$ that provide a comparable performance.

In [None]:
offset = 0.5
x, _ = sig_gen(2000, 100)
x_off = x + offset

In [None]:
avg_li = ...
avg_ma = ...


In [None]:
# SOLUTION 

avg_li = leaky(x_off, 0.995)
avg_ma = mavg(x_off, 300)

plt.axhline(offset, label='true offset')
plt.plot(avg_li, 'C2', label='LI estimate' )
plt.plot(avg_ma, 'C3', label='MA estimate')
plt.legend();

In [None]:
plt.plot(x, 'C0', lw=3, label='ideal')
plt.plot(x_off - avg_li, 'C2', label='leaky' )
plt.plot(x_off - avg_ma, 'C3', label='m-avg')
plt.legend();

## VU meters

<div style="float: right; margin: 10px;"><img src="img/vumeter.gif" width="180"></div>

An analog [VU-meter](https://en.wikipedia.org/wiki/VU_meter), like the one shown on the right, is a device used in audio recording equipment to visually monitor the _short-term power_ of a signal, namely the power of the signal computed over a short time window spanning a few milliseconds of past data. 

For a discrete-time signal, the short-term power at time $n$ can be computed as 
$$
    p_M[n] = \frac{1}{M}\sum_{k = 0}^{M-1}|x[n-k]|^2
$$
which is clearly the result of filtering the _squared_ input signal with an $M$-point Moving Average.
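
As a quick check of this definition, the short-term power of a sinusoid of amplitude $A$ should settle to $A^2/2$ once the window is full. The sketch below implements the Moving Average with `np.convolve`; the amplitude, period, and window length are arbitrary choices, with $M$ set to a whole number of periods so that the average is exact.

```python
import numpy as np

A, period, M = 0.5, 16, 64                 # M spans exactly 4 periods
n = np.arange(400)
x = A * np.cos(2 * np.pi * n / period)

# M-point causal Moving Average of the squared signal, truncated to len(x)
p = np.convolve(x ** 2, np.ones(M) / M)[:len(x)]

print(p[M - 1:M + 3])    # steady-state values
print(A ** 2 / 2)        # theoretical power of the sinusoid
```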

### Exercise: short-term power estimation

If you remember our [previous lab on DTMF signals](https://github.com/LCAV/COM202/tree/main/05-DTMF), one of the most important steps in the decoding process was the _segmentation_ of the input signal to isolate the different digit tones. This was accomplished by computing the local power of the signal and by comparing it to a threshold to separate the silent gaps. Let's load a DTMF signal and play it:

In [None]:
fs, dtmf = wavfile.read('data/dtmf.wav')
IPython.display.Audio(dtmf, rate=fs)

Complete the function below so that it returns an estimate of the local power of the input signal over a window spanning the given number of milliseconds. 

In [None]:
def vu_meter(x: np.ndarray, fs: int, span_ms: float) -> np.ndarray:
    ...

In [None]:
# SOLUTION

def vu_meter(x: np.ndarray, fs: int, span_ms: float) -> np.ndarray:
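    # convert milliseconds to samples
    M = int(span_ms * fs / 1000)
    # square in floating point: integer PCM samples (e.g. int16) would overflow when squared
    return mavg(x.astype(float) ** 2, M)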

Let's see the results:

In [None]:
plt.plot(dtmf, label="DTMF");
plt.plot(vu_meter(dtmf, fs, 20), label="VU Meter output");
plt.legend();
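
To connect this with the segmentation step mentioned above, here is a minimal sketch of thresholding the short-term power to locate the active segments. It runs on a synthetic signal with two tone bursts rather than on the DTMF file, and the sampling rate, window length and threshold are assumptions picked for this toy example, not values from the DTMF lab.

```python
import numpy as np

fs = 8000                                              # assumed sampling rate for the toy signal
tone = np.sin(2 * np.pi * 440 * np.arange(800) / fs)   # 100 ms burst
gap = np.zeros(800)                                    # 100 ms of silence
x = np.r_[gap, tone, gap, tone, gap]

M = int(20 * fs / 1000)                                # 20 ms window -> 160 samples
power = np.convolve(x ** 2, np.ones(M) / M)[:len(x)]

active = power > 0.1                                   # threshold chosen by eye
# rising and falling edges of the boolean mask mark the segment boundaries
edges = np.flatnonzero(np.diff(active.astype(int)))
starts, ends = edges[::2] + 1, edges[1::2] + 1
print(list(zip(starts, ends)))
```

Note that the detected segments are delayed with respect to the true bursts, since the causal window needs some time to fill up and to empty.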

# A last trick for the road

Let's finish with a fun and surprising trick. Let's load an audio file and play it; you shouldn't hear anything:

In [None]:
fs, s = wavfile.read('data/testing.wav')
IPython.display.Audio(s, rate=fs)

What if we filter the signal before playing? 

In [None]:
M = 36
IPython.display.Audio(mavg(s, M), rate=fs)

Still nothing. But check this out:

In [None]:
IPython.display.Audio(mavg(s ** 2, M), rate=fs)

Cool, isn't it? Can you figure out what happened and why squaring the signal made the audio magically appear? 

If you feel like investigating, start by looking at plots of the signal both in the time and in the frequency domain, and then try to understand how the original signal was generated. If after a while you're still clueless, you may want to use this [hint](https://en.wikipedia.org/wiki/Crystal_detector). 

## Solution

OK, this is going to be long so take it as a little "detective story" in Signal Processing and proceed only if you are interested!

### An initial time-domain exploration

As a first step in our investigation, let's look at the mystery signal in the time domain:

In [None]:
plt.plot(s);

This plot does not help much; however, if we zoom in on the details, we can see that the signal oscillates very fast:

In [None]:
chunk = s[50000:50480] 
plt.plot(chunk);

Note that the sampling frequency of the signal is 96 kHz, which is very high, and so the 480 samples shown in the previous plot correspond to only 5 ms of audio data. Yet, over this short time span the signal appears to oscillate more than 200 times, which corresponds to a frequency of at least 40 kHz, totally outside of the human hearing range.

By the way, you don't need to count the oscillations by hand; a good estimate is provided by half the number of zero crossings of the signal, and to find the number of zero crossings we simply need to count the number of times two successive samples have opposite signs:

In [None]:
np.sum(chunk[:-1] * chunk[1:] < 0)
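
To see how the zero-crossing count turns into a frequency estimate, the sketch below applies it to a synthetic sinusoid of known frequency sampled at 96 kHz. The test frequency (42 kHz) is an arbitrary choice, and the phase offset is there only to avoid samples that are exactly zero.

```python
import numpy as np

fs, f0 = 96000, 42000                       # sampling rate and test frequency in Hz
n = np.arange(480)                          # 5 ms of data at 96 kHz
chunk = np.sin(2 * np.pi * f0 / fs * n + 0.1)

crossings = np.sum(chunk[:-1] * chunk[1:] < 0)
duration = len(chunk) / fs                  # in seconds
f_est = (crossings / 2) / duration          # one oscillation = two zero crossings
print(crossings, f_est)
```

The estimate is only accurate to within about one zero crossing over the observation window, which here corresponds to roughly 100 Hz.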

### Moving to the frequency domain

If we move to the frequency domain, we can confirm that the mystery signal is indeed bandpass; we can use the plotting tricks developed in lab 4 to label the frequency axis to see that the spectral content occupies a small region centered around 44 kHz and extending by approximately 4000 Hz to the left and to the right of the center frequency.

In [None]:
L = int(len(s)/2)
S = np.abs(np.fft.fft(s))[:L]
plt.semilogy(np.linspace(0, fs/2, L), S);
print('peak at ', np.argmax(S) / L / 2 * fs)

A plausible hypothesis at this point is that the signal is a modulated audio signal and we can check this quickly by demodulating and playing the result:

In [None]:
demod = s * np.cos(2 * np.pi * (44000.0 / 96000.0) * np.arange(0, len(s)))
IPython.display.Audio(demod, rate=fs)

Bingo! But there are two more issues: how does the squaring operation manage to replace demodulation and why do we hear an annoying whistle?

### Demodulation via squaring

The first part is relatively easy: if we assume that the original signal is of the form $s[n] = x[n]\cos(\omega_c n)$, where $\omega_c = 2\pi (44000 / 96000) = (11/12)\pi$ is the carrier frequency, then 

$$
   s^2[n] = x^2[n]\cos^2(\omega_c n) = \frac{1}{2}x^2[n] + \frac{1}{2}x^2[n]\cos(2\omega_c n)
$$

The squaring operation, therefore, performs a demodulation of sorts and returns a baseband signal ($x^2[n]$) plus a modulated copy at twice the original carrier frequency. There are some details to keep in mind though: in general, the square of a _balanced_ audio signal will be a severely distorted version of the original, since the negative portions of the waveform become positive. You can check this using the original signal used to create this example:

In [None]:
fs_o, x_orig = wavfile.read('data/speech.wav')
x_orig = x_orig / 32768.0  # rescale the range over [-1, 1]
IPython.display.Audio(x_orig, rate=fs_o)

In [None]:
IPython.display.Audio(x_orig * x_orig, rate=fs_o)

However, we can avoid this problem if we ensure that the signal is always positive:

In [None]:
offset = 1
x = x_orig + offset
IPython.display.Audio(x * x, rate=fs_o)

Now, when this positive signal is modulated, we will obtain

$$
    s[n] = x[n]\cos(\omega_c n) = \cos(\omega_c n) + x_{\text{orig}}[n]\cos(\omega_c n)
$$

and the first term in the expression for $s[n]$ explains the prominent spectral lines at $\pm\omega_c$ that were apparent in the spectrum of the modulated signal. But, with this trick, when we "demodulate" with a squaring operation, the baseband copy $x^2[n]$ won't sound distorted. 
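
To see why the offset helps, it may be useful to expand the baseband term of $s^2[n]$ explicitly (assuming `offset = 1` as in the code above):

$$
    \frac{1}{2}x^2[n] = \frac{1}{2}\left(1 + x_{\text{orig}}[n]\right)^2 = \frac{1}{2} + x_{\text{orig}}[n] + \frac{1}{2}x_{\text{orig}}^2[n].
$$

The result contains an undistorted copy of $x_{\text{orig}}[n]$, an inaudible constant, and a quadratic term $x_{\text{orig}}^2[n]/2$ that stays small as long as $|x_{\text{orig}}[n]|$ is small compared to the offset.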

### The irritating "whistle"

We still have to understand (and fix) the fact that we hear an audible whistle if we try to play $s^2[n]$ as-is, without filtering. It turns out that this problem is due to the combined effects of the signal's amplitude offset and of aliasing. 

Remember that the carrier frequency here is $\omega_c = (11/12)\pi$ so that $2\omega_c > \pi$; because of the $2\pi$ periodicity of discrete-time sinusoids, $\cos(2\omega_c n) = \cos(2(11/12)\pi n) = \cos((\pi/6) n)$ and so the spectrum of the demodulated signal will contain a strong spectral line at $\omega_w = \pi/6$.

At our sampling rate of 96 kHz, this corresponds to a frequency $f_w = 8$ kHz, which is perfectly (and disturbingly) audible. 

In [None]:
S = np.abs(np.fft.fft(s * s))[:L]
plt.plot(np.linspace(0, fs/2, L), S);
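
The arithmetic behind the 8 kHz figure can be sketched in a couple of lines: a sinusoid at frequency $f$ sampled at $f_s$ is indistinguishable from one whose frequency is the distance between $f$ and the nearest multiple of $f_s$. The helper name `aliased_frequency` is hypothetical.

```python
def aliased_frequency(f: float, fs: float) -> float:
    # fold f into [0, fs/2]: distance from f to the nearest integer multiple of fs
    return abs(f - fs * round(f / fs))

fs = 96000
fc = 44000                       # carrier frequency in Hz
print(aliased_frequency(2 * fc, fs))
```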

To remove this component we need to filter out the cross-modulation terms around $\omega_w$; unfortunately, since the original signal extends up to 4 kHz, a simple Leaky Integrator would not work because its transition band is too wide. You can try it here and see that no value of $\lambda$ will be able to remove enough of the whistle while preserving the speech signal.

In [None]:
lam = 0.98
IPython.display.Audio(leaky(s * s, lam), rate=fs)

In [None]:
plt.plot(np.linspace(0, fs/2, L), S / np.max(S), label="signal spectrum");

w = np.linspace(0, np.pi, L)[1:]
plt.plot(
    fs * w / np.pi / 2, 
    (1 - lam) / np.sqrt(1 - 2 * lam * np.cos(w) + lam**2),
    label="leaky integrator response",
);
plt.legend();

Normally in this situation we should use a more advanced filter with a steep transition band, such as an elliptic filter with cutoff frequency above 4 kHz:

In [None]:
b, a = sp.ellip(6, .1, 60, 5000 / (fs / 2))  # 5 kHz cutoff; Wn is normalized to the Nyquist frequency fs/2
IPython.display.Audio(sp.lfilter(b, a, s * s), rate=fs)

In [None]:
plt.plot(np.linspace(0, fs/2, L), S / np.max(S), label="signal spectrum");

wb, Hb = sp.freqz(b, a, L);
plt.plot(fs * wb / np.pi / 2, np.abs(Hb), label="filter spectrum");
plt.legend();

In this case, however, we can be clever and use a simple Moving Average because:
 * the frequency of the whistle is a rational multiple of $2\pi$;
 * we know that the frequency response of a Moving Average of length $M$ is exactly zero at all multiples of $2\pi/M$ that are not also multiples of $2\pi$.
 
The spectral line causing the whistle sound is at $\omega_w = \pi/6 = 2\pi/12$, so any MA filter where $M$ is a multiple of 12 will kill it. We want to choose $M$ as large as possible but sufficiently small to preserve the speech signal; $M=36$ seems to be a good compromise:

In [None]:
plt.plot(np.linspace(0, fs/2, L), S / np.max(S), label="signal spectrum");

M = 36
w = np.linspace(0, np.pi, L)[1:]
plt.plot(
    fs * w / np.pi / 2, 
    np.abs(np.sin(M * w / 2) / np.sin(w / 2) / M),
    label="MA spectrum",
);
plt.legend();
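
As a final sanity check of the argument, the sketch below filters a pure sinusoid at $\omega_w = \pi/6$ with Moving Averages of different lengths, using `np.convolve` as a stand-in for the `mavg()` function. Once the window is full, the output should vanish when $M$ is a multiple of 12 and not otherwise; the lengths 30 and 36 are just example values.

```python
import numpy as np

n = np.arange(300)
whistle = np.cos(np.pi / 6 * n)            # spectral line at omega = pi/6 (period 12)

def ma_steady_state_peak(x: np.ndarray, M: int) -> float:
    # M-point causal Moving Average via convolution; skip the first M-1
    # samples, where the window is not yet full
    y = np.convolve(x, np.ones(M) / M)[:len(x)]
    return float(np.max(np.abs(y[M - 1:])))

for M in (30, 36):
    print(M, ma_steady_state_peak(whistle, M))
```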