# Exercise 6 Solution: Speech Enhancement via Wiener Filtering

## 1. Noise Power Estimation

### 1.1 Initialization of $\hat\sigma_n^2[k,-1]$ and $Q[k,-1]$
Initialize the noise PSD estimate and the speech‐presence probability prior.

In [None]:
import numpy as np

# Assume Y[k, l] STFT frames are computed later; here we show initialization
# Let Y_mag2[:, 0] be the periodogram of frame 0
# Initialize noise PSD estimate to the periodogram of first frame
sigma_n2 = Y_mag2[:, 0].copy()  # shape (num_bins,)

# Initialize speech-presence probability prior Q to 0.5 for all bins
Q_prev = np.full_like(sigma_n2, 0.5)

print("Initialized noise PSD and speech-presence prior Q.")

**Answer:**
- We set \(\hat\sigma_n^2[k,-1] = |Y[k,0]|^2\), assuming the first frame is noise-only.
- We choose an uninformative prior \(Q[k,-1]=0.5\), indicating equal likelihood for speech/noise.
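
The initialization cell relies on `Y_mag2` already existing. The following is a minimal sketch of how it might be produced, assuming a Hann-windowed STFT. The helper name `stft_periodogram`, the frame length `Nw`, the hop size `hop`, and the synthetic white-noise signal are illustrative assumptions and not part of the exercise.

```python
import numpy as np

def stft_periodogram(x, Nw=512, hop=256):
    """Return the one-sided STFT Y (bins x frames) and its periodogram |Y|^2.

    Hypothetical helper: frames the signal, applies a Hann window and takes
    the real FFT of every frame.
    """
    win = np.hanning(Nw)
    num_frames = 1 + (len(x) - Nw) // hop
    frames = np.stack(
        [x[l * hop : l * hop + Nw] * win for l in range(num_frames)], axis=1
    )
    Y = np.fft.rfft(frames, axis=0)           # shape (Nw // 2 + 1, num_frames)
    return Y, np.abs(Y) ** 2

# Synthetic stand-in for the noisy recording (assumption: 1 s of white noise at 16 kHz)
rng = np.random.default_rng(0)
x = rng.standard_normal(16000)

Y, Y_mag2 = stft_periodogram(x)
sigma_n2 = Y_mag2[:, 0].copy()                # noise PSD initialised from frame 0
Q_prev = np.full_like(sigma_n2, 0.5)          # uninformative prior for every bin
print(Y_mag2.shape, sigma_n2.shape)
```

The exercise's own STFT (for example `scipy.signal.stft`, which would pair with the `istft` call used later) may scale and pad differently; only the bins-by-frames layout matters for the cells that follow.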

### 1.2 Speech‐Presence Probability $P(H_1\mid Y[k,\ell])$
Compute the a posteriori probability of speech presence and display it.

In [None]:
import matplotlib.pyplot as plt

# A posteriori SNR γ = |Y|^2 / sigma_n2
gamma = Y_mag2 / sigma_n2[:, None]

# Decision-directed a priori SNR ξ
alpha = 0.98
# For first frame, set xi = alpha + (1-alpha)*max(gamma0-1,0)
xi = np.empty_like(gamma)
xi[:, 0] = alpha * 1 + (1-alpha) * np.maximum(gamma[:, 0] - 1, 0)
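num_frames = Y_mag2.shape[1]
for l in range(1, num_frames):
    # Wiener gain of the previous frame, derived from its a priori SNR
    # (the gain matrix G is only computed in Section 2.2)
    G_prev = xi[:, l-1] / (1 + xi[:, l-1])
    xi[:, l] = (alpha * G_prev**2 * gamma[:, l-1]
                + (1 - alpha) * np.maximum(gamma[:, l] - 1, 0))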

# Compute speech presence probability using the likelihood ratio test (Sohn et al.)
prior = 0.5
v = xi / (1 + xi) * gamma
P = 1 / (1 + (1 - prior) / prior * (1 + xi) * np.exp(-v))

# Display as image
plt.figure(figsize=(6, 4))
plt.imshow(P, origin='lower', aspect='auto', cmap='inferno')
plt.colorbar(label='P(H1|Y)')
plt.title('Speech-Presence Probability')
plt.xlabel('Frame index ℓ')
plt.ylabel('Frequency bin k')
plt.show()

**Answer:**
- **Speech-present**: P ≈ 1 in high-energy TF bins corresponding to formant/harmonic regions.
- **Noise-only**: P ≈ 0 in low-energy or silent regions.
- The map aligns with the spectrogram: speech bands show high probability.
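
To make the behaviour of the probability formula concrete, here is a small sketch that evaluates it for a few hand-picked SNR values. The function name `speech_presence_prob` and the test values are assumptions chosen for illustration; the expression itself is the one used in the cell above.

```python
import numpy as np

def speech_presence_prob(xi, gamma, prior=0.5):
    """A posteriori speech-presence probability P(H1 | Y) per TF bin."""
    xi = np.asarray(xi, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    v = xi / (1.0 + xi) * gamma
    return 1.0 / (1.0 + (1.0 - prior) / prior * (1.0 + xi) * np.exp(-v))

# Fixed a priori SNR, increasing a posteriori SNR (illustrative values)
gamma = np.array([0.5, 1.0, 5.0, 20.0])
P = speech_presence_prob(xi=10.0, gamma=gamma)
print(np.round(P, 3))

# With xi = 0 the observation carries no evidence and P falls back to the prior
print(speech_presence_prob(xi=0.0, gamma=3.0))
```

For a fixed a priori SNR the probability grows monotonically with the a posteriori SNR, which is why high-energy bins saturate near 1 while low-energy bins fall below the prior.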

### 1.3 Estimated Noise PSD $\hat\sigma_n^2[k,\ell]$
Update the noise PSD using the recursive formula and plot its spectrogram.

In [None]:
# Noise PSD update parameters
beta = 0.8  # smoothing factor for noise update

# Initialize noise PSD matrix
sigma_n2_est = np.zeros_like(Y_mag2)
sigma_n2_est[:, 0] = sigma_n2

for l in range(1, num_frames):
    sigma_n2_est[:, l] = (beta * sigma_n2_est[:, l-1]
                          + (1 - beta) * Y_mag2[:, l] * (1 - P[:, l]))

# Plot noise PSD spectrogram
plt.figure(figsize=(6, 4))
plt.imshow(10*np.log10(sigma_n2_est + 1e-12), origin='lower', aspect='auto', cmap='viridis')
plt.colorbar(label='Noise PSD [dB]')
plt.title('Estimated Noise PSD')
plt.xlabel('Frame index ℓ')
plt.ylabel('Frequency bin k')
plt.show()

**Answer:**
- The estimate tracks the noise floor: low in speech-absent regions, higher where noise dominates.
- Occasional overshoots occur when speech leaks into the estimate, causing transient spikes.
- Overestimating the noise PSD leads to speech distortion; underestimating it leaves residual noise.
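
The two extreme cases of the recursive update can be checked on a toy input. This is a sketch only: the wrapper `update_noise_psd`, the single-bin input and the constant periodogram value are assumptions made for illustration.

```python
import numpy as np

def update_noise_psd(Y_mag2, P, sigma0, beta=0.8):
    """Recursive noise PSD update from the cell above, wrapped in a function."""
    sigma = np.zeros_like(Y_mag2, dtype=float)
    sigma[:, 0] = sigma0
    for l in range(1, Y_mag2.shape[1]):
        sigma[:, l] = (beta * sigma[:, l - 1]
                       + (1 - beta) * Y_mag2[:, l] * (1 - P[:, l]))
    return sigma

# One frequency bin, 20 frames, constant periodogram of 2.0 (illustrative values)
Y_mag2 = np.full((1, 20), 2.0)

noise_only = update_noise_psd(Y_mag2, P=np.zeros((1, 20)), sigma0=2.0)
speech_on = update_noise_psd(Y_mag2, P=np.ones((1, 20)), sigma0=2.0)

print(noise_only[0, -1])   # stays at the noise level while P = 0
print(speech_on[0, -1])    # shrinks by a factor beta per frame while P = 1
```

With P = 1 the second term vanishes, so the estimate decays geometrically instead of being held. After a long speech segment this would show up as an underestimated noise floor.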

## 2. A Priori SNR Estimation & Wiener Filtering

### 2.1 Initialization of $\hat S[k,-1]$
Initialize the clean-speech spectrum estimate before Wiener filtering.

In [None]:
# Initialize clean speech magnitude squared spectrum to the noisy one
S_hat = np.zeros_like(Y_mag2)
S_hat[:, 0] = Y_mag2[:, 0]

print("Initialized a priori clean spectrum estimate.")

**Answer:**
- We set \(|\hat S[k,-1]|^2 = |Y[k,0]|^2\), i.e. the initial clean-speech estimate is the noisy frame itself (unity gain), so the filter adapts from frame 1 onward.

### 2.2 Spectrogram Comparison ($\alpha=0.98$, $G_{\min}=0$)
Compute and display noisy vs. Wiener-filtered spectrograms.

In [None]:
from scipy.signal import istft

# Wiener filter gain
G_min = 0.0
G = np.zeros_like(Y_mag2)

for l in range(num_frames):
    # Update the clean-speech PSD estimate (decision-directed)
    if l > 0:
        S_hat[:, l] = (alpha * G[:, l-1]**2 * Y_mag2[:, l-1]
                       + (1 - alpha) * np.maximum(Y_mag2[:, l] - sigma_n2_est[:, l], 0))
    # A priori SNR and Wiener gain, floored at G_min
    xi_l = S_hat[:, l] / sigma_n2_est[:, l]
    G[:, l] = np.maximum(xi_l / (1 + xi_l), G_min)

# Apply the real-valued gain to the complex STFT Y. This keeps the noisy phase,
# so no extra phase factor is applied (assumes Y is the complex STFT).
Y_filtered = G * Y
_, enhanced = istft(Y_filtered, fs=fs, window=win, nperseg=Nw, noverlap=Nw-hop)

# Plot spectrograms
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.imshow(20*np.log10(np.abs(Y)+1e-12), origin='lower', aspect='auto', vmin=-80, vmax=0)
plt.title('Noisy Spectrogram'); plt.xlabel('Time'); plt.ylabel('Freq')

plt.subplot(1, 2, 2)
plt.imshow(20*np.log10(np.abs(Y_filtered)+1e-12), origin='lower', aspect='auto', vmin=-80, vmax=0)
plt.title('Enhanced Spectrogram'); plt.xlabel('Time')

plt.tight_layout()
plt.show()

**Answer:**
- **Noise floor** is significantly reduced in the enhanced spectrogram.
- Formant bands appear sharper against the lowered background.
- With \(G_{\min}=0\) the gain is not floored, so silent bins are attenuated deeply, but isolated residual peaks may become audible as musical noise.
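
The gain computation can also be tried in isolation on synthetic data. The sketch below wraps the decision-directed loop in a function; the name `wiener_gain_dd`, the small constant `eps` guarding the division, and the synthetic periodogram (exponentially distributed noise plus a constant offset standing in for speech) are assumptions for illustration only.

```python
import numpy as np

def wiener_gain_dd(Y_mag2, sigma_n2_est, alpha=0.98, G_min=0.0, eps=1e-12):
    """Decision-directed Wiener gain per TF bin (sketch of the loop above)."""
    num_bins, num_frames = Y_mag2.shape
    G = np.zeros((num_bins, num_frames))
    S_prev = Y_mag2[:, 0]                          # initial clean-speech PSD estimate
    for l in range(num_frames):
        if l > 0:
            ml = np.maximum(Y_mag2[:, l] - sigma_n2_est[:, l], 0.0)
            S_prev = alpha * G[:, l - 1] ** 2 * Y_mag2[:, l - 1] + (1 - alpha) * ml
        xi = S_prev / (sigma_n2_est[:, l] + eps)   # a priori SNR
        G[:, l] = np.maximum(xi / (1.0 + xi), G_min)
    return G

rng = np.random.default_rng(1)
Y_mag2 = rng.exponential(1.0, size=(4, 200))       # noise-only periodogram, PSD = 1
Y_mag2[0, 100:] += 100.0                           # strong "speech" in bin 0, second half
sigma_n2_est = np.ones_like(Y_mag2)                # assume a perfect noise estimate

G = wiener_gain_dd(Y_mag2, sigma_n2_est, G_min=0.1)
print(G[0, :100].mean(), G[0, 100:].mean())
```

In the noise-only half the gain sits near the floor, while in the high-SNR half it approaches 1, which is the contrast visible between background and formant bands in the enhanced spectrogram.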

## 3. Parameter Tuning

### 3.1 Noisy vs. Enhanced Signal
Listen to the noisy and enhanced signals to assess subjective quality.

**Answer:**
- **Noise suppression:** ~10–15 dB reduction in silent regions.
- **Speech distortion:** Slight muffling of consonants and reduced high-frequency energy.
- **Artifacts:** Musical noise appears as tonal warbles in low-energy gaps.
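
A suppression figure in dB can be backed by a measurement over a segment known to contain only noise. The helper below is a hypothetical sketch: the name `suppression_db`, the sample range, and the synthetic signals are assumptions, and the real noisy and enhanced recordings would take their place.

```python
import numpy as np

def suppression_db(noisy, enhanced, start, stop):
    """Level reduction in dB over samples [start, stop), assumed to be noise-only."""
    p_noisy = np.mean(noisy[start:stop] ** 2)
    p_enh = np.mean(enhanced[start:stop] ** 2)
    return 10.0 * np.log10(p_noisy / p_enh)

# Synthetic check: scaling the amplitude by 0.25 lowers the level by about 12 dB
rng = np.random.default_rng(2)
noisy = rng.standard_normal(16000)
enhanced = 0.25 * noisy
print(round(suppression_db(noisy, enhanced, 0, 4000), 2))
```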

### 3.2 Varying $\alpha$ between 0 and 1
Experiment with different values of the decision-directed smoothing factor α.

**Answer:**
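- **Low α (e.g. 0.1):** The a priori SNR follows the instantaneous estimate almost frame by frame, so the gain fluctuates and musical noise is strong.
- **High α (e.g. 0.9):** The a priori SNR is heavily smoothed, giving a smooth noise floor, but residual noise remains.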
- A balanced α≈0.7 gives moderate suppression with controlled musical noise.
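
The effect of α on gain fluctuation can be sketched numerically on noise-only input. The function `dd_gain_track`, the exponential model for the a posteriori SNR, and the use of the gain standard deviation as a rough proxy for musical noise are all assumptions of this sketch, not part of the exercise.

```python
import numpy as np

def dd_gain_track(gamma, alpha):
    """Wiener gain over time for one bin, using the decision-directed a priori SNR."""
    xi = np.empty_like(gamma)
    xi[0] = alpha + (1 - alpha) * max(gamma[0] - 1, 0.0)
    for l in range(1, len(gamma)):
        G_prev = xi[l - 1] / (1 + xi[l - 1])
        xi[l] = alpha * G_prev**2 * gamma[l - 1] + (1 - alpha) * max(gamma[l] - 1, 0.0)
    return xi / (1 + xi)

# Noise-only input: the a posteriori SNR of complex Gaussian noise is
# exponentially distributed with mean 1
rng = np.random.default_rng(3)
gamma = rng.exponential(1.0, size=5000)

for alpha in (0.1, 0.7, 0.98):
    G = dd_gain_track(gamma, alpha)
    print(f"alpha={alpha:4.2f}  mean gain={G.mean():.3f}  gain std={G.std():.3f}")
```

A larger gain standard deviation on noise-only frames indicates a more erratic residual; it is not a perceptual measure, so listening remains the final check.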

### 3.3 Varying $G_{\min}$ between 0 and 1
Adjust the minimum gain to trade off noise floor vs. musical noise.

**Answer:**
- **G_min≈0:** Maximum suppression, strong musical noise.
- **G_min≈0.2:** Some residual noise but reduced musical noise.
- Typical choice G_min=0.1–0.2 balances smoothness and suppression.
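
Since the gain is never allowed below \(G_{\min}\), the floor directly bounds the attenuation in dB. A short sketch of that relation follows; the helper name `max_attenuation_db` is an assumption.

```python
import numpy as np

def max_attenuation_db(G_min):
    """Largest attenuation (in dB) a gain floor G_min allows; infinite for G_min = 0."""
    return np.inf if G_min <= 0 else 20.0 * np.log10(1.0 / G_min)

for G_min in (0.0, 0.1, 0.2, 1.0):
    print(f"G_min={G_min:3.1f}  max attenuation={max_attenuation_db(G_min):.1f} dB")
```

A floor of 0.1 caps the suppression at 20 dB and a floor of 0.2 at roughly 14 dB, which is the residual noise traded for fewer musical-noise artifacts.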