# Exercise 5 Solution: Cepstral Techniques

## 1. Generating Voiced Excitation Source
Generate a 32 ms pulse sequence for a voiced source at 100 Hz.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# 1. Sampling rate and window length
fs = 16000  # Hz
duration_ms = 32  # ms
N = fs * duration_ms // 1000  # number of samples (32 ms -> 512); integer arithmetic avoids float truncation

# 2. Fundamental frequency and period
f0 = 100  # Hz
period_samples = int(fs / f0)  # samples per pitch period

# 3. Generate voiced excitation u[n]
u = np.zeros(N)
u[::period_samples] = 1  # impulse at each period

# 4. Plot the pulse train
plt.figure(figsize=(8, 2))
plt.stem(np.arange(N)/fs*1000, u, basefmt=" ")
plt.xlabel('Time [ms]')
plt.ylabel('Amplitude')
plt.title(f'Voiced Excitation: {f0} Hz Pulse Train ({duration_ms} ms)')
plt.show()


**Answer:**
- Pulses occur every 1/f₀ = 10 ms (160 samples at fₛ = 16 kHz).
- The 32 ms frame (512 samples) contains exactly 4 pulses, at 0, 10, 20 and 30 ms. The pulse train models the glottal source of voiced speech.
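
As a quick check of these numbers, the pulse positions can be listed directly. The sketch below rebuilds the excitation with the same parameters as the cell above and converts the nonzero sample indices to milliseconds; the names `pulse_idx` and `pulse_ms` are introduced here for illustration only.

```python
import numpy as np

fs = 16000                      # sampling rate in Hz, as in the cell above
f0 = 100                        # fundamental frequency in Hz
N = fs * 32 // 1000             # 32 ms -> 512 samples
period_samples = fs // f0       # 160 samples per pitch period

u = np.zeros(N)
u[::period_samples] = 1.0       # one impulse per pitch period

pulse_idx = np.flatnonzero(u)            # sample indices of the impulses
pulse_ms = pulse_idx / fs * 1000         # the same positions in milliseconds
print(pulse_idx, pulse_ms)
```

The last pulse sits at sample 480, so a fifth pulse (sample 640) would fall outside the 512-sample frame.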

## 2. Artificial Speech Generation and Spectra
Filter the excitation with the vocal tract filter (coefficients from `filter-data.txt`) and compare the spectra.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import lfilter, freqz

# 1. Load filter coefficients
a = np.loadtxt('filter-data.txt')  # LPC filter a coefficients

# 2. Filter the excitation to create speech signal s[n]
s = lfilter([1.0], a, u)

# 3. Apply Hann window
w = np.hanning(N)
uw = u * w
sw = s * w

# 4. FFT parameters
Nfft = N
freqs = np.fft.rfftfreq(Nfft, 1/fs)

# 5. Compute spectra
# Evaluate H at the rfft bin frequencies so that H, Uw and Sw share one frequency axis
w_h, H = freqz([1.0], a, worN=freqs, fs=fs)
Uw = np.fft.rfft(uw, n=Nfft)
Sw = np.fft.rfft(sw, n=Nfft)

# 6. Plot amplitude and log-amplitude spectra
eps = np.finfo(float).eps
plt.figure(figsize=(12, 6))

plt.subplot(2, 3, 1)
plt.plot(w_h, np.abs(H)); plt.title('|H(ω)|'); plt.xlabel('Hz')

plt.subplot(2, 3, 2)
plt.plot(freqs, np.abs(Uw)); plt.title('|U_w(ω)|'); plt.xlabel('Hz')

plt.subplot(2, 3, 3)
plt.plot(freqs, np.abs(Sw)); plt.title('|S_w(ω)|'); plt.xlabel('Hz')

plt.subplot(2, 3, 4)
plt.plot(w_h, 20*np.log10(np.abs(H)+eps)); plt.title('20log|H|'); plt.xlabel('Hz')

plt.subplot(2, 3, 5)
plt.plot(freqs, 20*np.log10(np.abs(Uw)+eps)); plt.title('20log|U_w|'); plt.xlabel('Hz')

plt.subplot(2, 3, 6)
plt.plot(freqs, 20*np.log10(np.abs(Sw)+eps)); plt.title('20log|S_w|'); plt.xlabel('Hz')

plt.tight_layout(); plt.show()


**Answers:**
- **|H(ω)|**: Formant peaks of the vocal tract filter.
- **|U_w(ω)|**: Harmonics at multiples of f₀.
- **|S_w(ω)|**: Harmonic structure shaped by filter envelope.
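- **Log spectra** compress the dynamic range, so weak high-frequency harmonics and formants become visible. The logarithm also turns the product |U|·|H| into the sum log|U| + log|H|, which is the property the cepstrum exploits.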
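
The cepstral sections that follow rely on the logarithm turning a spectral product into a sum. A minimal sketch of that property, under stated assumptions: the two-tap filter `h` is a stand-in chosen for illustration (it is not the content of `filter-data.txt`), no window is applied, and the FFT is zero-padded so that the linear convolution is represented exactly.

```python
import numpy as np

fs, N = 16000, 512
u = np.zeros(N)
u[::160] = 1.0                         # 100 Hz pulse train, as in Section 1

h = np.array([1.0, -0.9])              # assumed stand-in filter, NOT filter-data.txt
s = np.convolve(u, h)                  # linear convolution, length N + 1

nfft = 1024                            # >= len(s), so there is no circular wrap-around
U = np.fft.rfft(u, nfft)
H = np.fft.rfft(h, nfft)
S = np.fft.rfft(s, nfft)

# The unwindowed pulse train has exact spectral zeros between its harmonics.
# The logarithm is undefined there, so those bins are left out of the comparison.
ok = np.abs(U) > 1e-6
log_S = np.log(np.abs(S[ok]))
log_sum = np.log(np.abs(U[ok])) + np.log(np.abs(H[ok]))
max_dev = np.max(np.abs(log_S - log_sum))
print(max_dev)                         # deviation at rounding-error level
```

With a window applied after filtering, as in the cell above, the relation only holds approximately, which is one reason the separated envelope is an estimate rather than an exact copy of |H|.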

## 3. Real Cepstrum Computation
Compute the real cepstra of the filter, excitation, and speech signals.

In [None]:
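import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import freqz

eps = np.finfo(np.float64).eps

# Compute cepstra from the full two-sided log-magnitude spectra (length Nfft),
# so that the cepstral index equals the quefrency in samples.
# (An ifft of the one-sided rfft output would halve the quefrency axis.)
_, H_full = freqz([1.0], a, worN=Nfft, whole=True)
ch = np.real(np.fft.ifft(np.log(np.abs(H_full)+eps)))
cu = np.real(np.fft.ifft(np.log(np.abs(np.fft.fft(uw, Nfft))+eps)))
cs = np.real(np.fft.ifft(np.log(np.abs(np.fft.fft(sw, Nfft))+eps)))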

plt.figure(figsize=(10, 6))
plt.subplot(3,1,1); plt.stem(ch, basefmt=" "); plt.title('Cepstrum of Filter')
plt.subplot(3,1,2); plt.stem(cu, basefmt=" "); plt.title('Cepstrum of Excitation')
plt.subplot(3,1,3); plt.stem(cs, basefmt=" "); plt.title('Cepstrum of Speech')
plt.tight_layout(); plt.show()


**Answers:**
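- The real cepstrum is even, c[n] = c[N−n], because log|X| is a real and even function of frequency.
- The periodic excitation yields a cepstral peak at the quefrency of the pitch period (10 ms, i.e. 160 samples), with weaker peaks at its multiples.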

## 4. Fundamental Frequency Estimation from Cepstrum
Locate the quefrency peak corresponding to f₀.

In [None]:
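# Peak search range: 2 ms–20 ms, capped at half the frame length.
# Below about 2 ms the cepstrum is dominated by the vocal tract envelope;
# above len(cs)/2 the real cepstrum only mirrors the lower quefrencies.
min_q = int(fs*0.002); max_q = min(int(fs*0.02), len(cs)//2)
cs_search = cs.copy()
cs_search[:min_q] = -np.inf; cs_search[max_q:] = -np.inf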

peak = np.argmax(cs_search)
f0_est = fs/peak
print(f"Estimated f₀: {f0_est:.2f} Hz")  # should ≈100 Hz


**Answer:** The peak lies at about 160 samples (10 ms), so f₀ ≈ 100 Hz, matching the original.

### a) Is there any symmetry within the real cepstrum of the signals? Why?

**Answer:**  
The real cepstrum is defined as  
\[
c[n] = \Re\{\mathrm{IDFT}\{\ln|X(e^{j\omega})|\}\}.
\]  
For any real time-domain signal \(x[n]\), its magnitude spectrum \(|X(e^{j\omega})|\) is an even function of frequency (\(|X(e^{j\omega})|=|X(e^{-j\omega})|\)). Taking the logarithm preserves that even symmetry, and the inverse DFT of a real, even spectrum yields a **real and even** sequence in quefrency. Hence

\[
c[n] = c[-n]\quad(\text{or, in discrete periodic form, }c[n] = c[N-n]),
\]

and you observe mirror symmetry about \(n=0\).
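
The symmetry can be checked numerically. The sketch below uses an arbitrary random test signal (an assumption made for illustration, not one of the exercise signals) and measures both the imaginary part of the inverse DFT and the deviation from \(c[n] = c[N-n]\).

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(64)            # arbitrary real test signal (assumption)

eps = np.finfo(float).eps
log_mag = np.log(np.abs(np.fft.fft(x)) + eps)   # real and even in frequency
c = np.fft.ifft(log_mag)

max_imag = np.max(np.abs(c.imag))                        # ~0: the cepstrum is real
max_asym = np.max(np.abs(c.real[1:] - c.real[:0:-1]))    # ~0: c[n] == c[N-n]
print(max_imag, max_asym)
```

Both values stay at rounding-error level, so only the first half of the cepstrum carries independent information.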

---

### b) Why is the real (and also the complex) cepstrum of any real-valued time-domain signal real-valued?

**Answer:**  
1. **Real cepstrum:**  
   - We compute \(\ln|X(e^{j\omega})|\), which is a real, even function of \(\omega\).  
   - Its inverse DFT therefore produces a purely **real** sequence \(c[n]\).

2. **Complex cepstrum:**  
   - We compute \(\ln X(e^{j\omega})\), including both magnitude and (unwrapped) phase.  
   - For a real \(x[n]\), the spectrum obeys Hermitian symmetry:  
     \[
       X(e^{j\omega}) = X^*(e^{-j\omega}).
     \]  
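   - The magnitude is therefore even and the (unwrapped) phase is odd, so \(\ln X(e^{j\omega}) = \ln|X(e^{j\omega})| + j\arg X(e^{j\omega})\) satisfies  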
     \[
       \ln X(e^{j\omega}) = \bigl[\ln X(e^{-j\omega})\bigr]^*,
     \]  
     i.e. it is Hermitian-symmetric.  
   - The inverse DFT of a Hermitian-symmetric function is real.  

In both cases, the inherent spectral symmetry of real signals forces the cepstrum to be real-valued.
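
A small numerical illustration for the complex cepstrum, under a simplifying assumption: the test signal \(x[n] = a^n\) is chosen so that the real part of its spectrum is positive everywhere. The phase then stays within \((-\pi/2, \pi/2)\) and the principal value returned by `np.log` needs no unwrapping; general signals would require phase unwrapping first.

```python
import numpy as np

a, N = 0.5, 64
x = a ** np.arange(N)                  # real, decaying test signal (assumption)

X = np.fft.fft(x)
log_X = np.log(X)                      # complex log: ln|X| + j*arg(X), principal value
c_complex = np.fft.ifft(log_X)

max_imag = np.max(np.abs(c_complex.imag))
print(max_imag)                        # numerically zero
print(c_complex.real[1:4])             # close to a**n / n for n = 1, 2, 3
```

For this signal the complex cepstrum is known in closed form, \(\hat{x}[n] \approx a^n/n\) for \(n \ge 1\), which the printed values reproduce.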

---

### c) For voiced sounds, a cepstral peak at a distinct position can be observed. Why is that so? Explain how the position of the peak is related to the fundamental frequency.

**Answer:**  
- **Periodic excitation:** Voiced sounds are produced by a quasi-periodic impulse train at period \(T_0 = 1/f_0\).  
- **Harmonic spectrum:** A periodic pulse train in time yields a comb of harmonics in frequency at multiples of \(f_0\).  
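- **Log spectrum ripple:** \(\ln|S(e^{j\omega})|\) therefore contains a ripple that repeats every \(f_0\) Hz along the frequency axis, i.e. with spacing \(\Delta\omega = 2\pi f_0/f_s\) in normalized angular frequency.  
- **Cepstral impulse:** The inverse DFT treats the log-magnitude spectrum like a signal. A ripple with period \(f_0\) along the frequency axis has the "frequency" \(1/f_0 = T_0\) and therefore produces a sharp peak at quefrency  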
  \[
    n_0 = \frac{f_s}{f_0}\quad\text{(samples)},
  \]  
  corresponding exactly to the pitch period \(T_0\).

Thus the cepstral peak appears at the quefrency index \(n_0\), and you can recover the fundamental frequency via  
\[
f_0 = \frac{f_s}{n_0}.
\]
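
The relation \(f_0 = f_s/n_0\) can be tried on Hann-windowed pulse trains with different pitch. This is a minimal sketch: `cepstral_f0` is a hypothetical helper written for this illustration, the three test frequencies are arbitrary choices, and no vocal tract filter is applied. The search stops at half the frame length because the real cepstrum mirrors itself beyond that point.

```python
import numpy as np

def cepstral_f0(frame, fs, f0_max=500.0):
    """Return (peak index n0, fs / n0) from the real cepstrum of one frame."""
    n = len(frame)
    eps = np.finfo(float).eps
    log_mag = np.log(np.abs(np.fft.fft(frame)) + eps)
    cep = np.fft.ifft(log_mag).real
    q_min = int(fs / f0_max)           # shortest pitch period considered
    q_max = n // 2                     # beyond n/2 the cepstrum is a mirror image
    n0 = q_min + int(np.argmax(cep[q_min:q_max]))
    return n0, fs / n0

fs, n = 16000, 512
results = {}
for f0 in (80, 100, 125):              # assumed test pitches
    u = np.zeros(n)
    u[::fs // f0] = 1.0                # pulse train with period fs / f0 samples
    results[f0] = cepstral_f0(u * np.hanning(n), fs)
    print(f0, results[f0])
```

Each peak index equals the pitch period in samples (200, 160 and 128), so dividing the sampling rate by it recovers the pitch.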








## 5. Cepstral Liftering and Reconstruction
Separate filter and source using cepstral liftering.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

def lifter(cep, L):
    # Low-time lifter: keep quefrencies 0..L and their mirror images N-L..N-1
    N = len(cep)
    ch = np.zeros(N); ch[:L+1] = cep[:L+1]
    if L > 0:
        ch[-L:] = cep[-L:]
    cu = cep - ch
    return ch, cu

L_ms = 1.5; L = round(L_ms/1000*fs)
ch, cu = lifter(cs, L)
# ch and cu are even sequences, so their DFTs are real log-magnitude spectra
f_est = np.fft.rfftfreq(len(cs), 1/fs)
H_est = np.exp(np.fft.rfft(ch).real); U_est = np.exp(np.fft.rfft(cu).real)

plt.figure(figsize=(8,4))
plt.plot(w_h, 20*np.log10(np.abs(H)+eps), label='Orig H')
plt.plot(f_est, 20*np.log10(np.abs(H_est)+eps), '--', label='Est H'); plt.legend()
plt.title('Filter Envelope Estimation via Liftering'); plt.xlabel('Hz'); plt.ylabel('dB')
plt.show()


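**Answer:** A cutoff of L = 1.5 ms gives the best envelope match. Liftering keeps the low-quefrency coefficients, which describe the slowly varying filter envelope, and assigns the high-quefrency coefficients, including the pitch peak at 10 ms, to the rapidly varying excitation.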
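
The separation step can also be sketched without the exercise data. The example below makes several assumptions: the vocal tract is replaced by a single two-pole resonator (pole radius 0.95 at 1 kHz, values picked for illustration), and `split_cepstrum` is a hypothetical helper name. It shows that the low-quefrency part stays even, so its DFT is a real log envelope, and that the pitch peak ends up in the high-quefrency part.

```python
import numpy as np

fs, N = 16000, 512
eps = np.finfo(float).eps

# Synthetic voiced frame: 100 Hz pulse train through a two-pole resonator.
# Pole radius and resonance frequency are assumed values for illustration.
u = np.zeros(N)
u[::160] = 1.0
r, theta = 0.95, 2 * np.pi * 1000 / fs
n = np.arange(N)
h = r**n * np.sin((n + 1) * theta) / np.sin(theta)   # resonator impulse response
s = np.convolve(u, h)[:N] * np.hanning(N)

cep = np.fft.ifft(np.log(np.abs(np.fft.fft(s)) + eps)).real

def split_cepstrum(cep, L):
    """Put quefrencies 0..L and their mirror images in `low`, the rest in `high`."""
    low = np.zeros_like(cep)
    low[:L + 1] = cep[:L + 1]
    if L > 0:
        low[-L:] = cep[-L:]
    return low, cep - low

L = round(1.5e-3 * fs)                 # 1.5 ms -> 24 samples
low, high = split_cepstrum(cep, L)

log_env = np.fft.fft(low)              # real-valued because `low` is even
envelope_db = 20 / np.log(10) * log_env.real   # natural log -> dB
```

Keeping both the positive and the mirrored negative quefrencies is what makes the liftered spectrum real; dropping one side would add a spurious phase term.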

## 6. Real Speech Cepstrogram and Liftering
Apply to `speech1.wav`.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from scipy.signal import stft

fs_s, x_s = wavfile.read('speech1.wav'); x_s = x_s.astype(float)
Nw = int(0.032*fs_s); hop = int(0.016*fs_s)
f_s, t_s, Z_s = stft(x_s, fs_s, window='hann', nperseg=Nw, noverlap=Nw-hop)
log_spec = 20*np.log10(np.abs(Z_s)+eps)

plt.figure(figsize=(10,4))
plt.pcolormesh(t_s, f_s, log_spec, shading='gouraud')
plt.title('Real Speech Log Spectrogram'); plt.xlabel('Time [s]'); plt.ylabel('Frequency [Hz]'); plt.colorbar()
plt.tight_layout(); plt.show()

# Cepstrogram
num_fr = Z_s.shape[1]
ceps = np.zeros((num_fr, Nw))
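for i in range(num_fr):
    mag = np.abs(Z_s[:,i])
    # Z_s holds the one-sided spectrum (Nw//2+1 bins); irfft restores the
    # even symmetry before inverting, which a zero-padded ifft would not do
    ceps[i,:] = np.fft.irfft(np.log(mag+eps), n=Nw)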

plt.figure(figsize=(10,4))
plt.imshow(ceps.T, aspect='auto', origin='lower', extent=[0, t_s[-1], 0, Nw/fs_s])
plt.title('Cepstrogram'); plt.xlabel('Time [s]'); plt.ylabel('Quefrency [s]'); plt.colorbar()
plt.tight_layout(); plt.show()


**Answers:**
- Spectrogram shows formants and harmonics.
- Cepstrogram shows ridges at quefrencies of about 2–8 ms in voiced segments. The ridge position is the pitch period, corresponding to f₀ ≈ 125–500 Hz.
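
Since `speech1.wav` is not reproduced here, the ridge-following idea can be tried on a synthetic signal. The sketch below is an illustration under assumptions: a constant 125 Hz pulse train through a one-pole filter stands in for voiced speech, the frames are cut manually instead of with `scipy.signal.stft`, and the pitch is taken as the strongest cepstral peak between 2 ms and half the frame length.

```python
import numpy as np

fs = 16000
f0 = 125                                   # assumed test pitch (period = 128 samples)
eps = np.finfo(float).eps

# One second of synthetic "voiced speech": pulse train through a one-pole filter
u = np.zeros(fs)
u[::fs // f0] = 1.0
h = 0.9 ** np.arange(200)                  # truncated impulse response of 1/(1 - 0.9 z^-1)
x = np.convolve(u, h)[:fs]

Nw, hop = 512, 256                         # 32 ms frames, 16 ms hop
starts = np.arange(0, len(x) - Nw + 1, hop)
frames = x[starts[:, None] + np.arange(Nw)] * np.hanning(Nw)

# Cepstrogram: one real cepstrum per row
log_mag = np.log(np.abs(np.fft.rfft(frames, axis=1)) + eps)
ceps = np.fft.irfft(log_mag, n=Nw, axis=1)

# Pitch track: strongest peak between 2 ms and half the frame length
q_min, q_max = int(0.002 * fs), Nw // 2
n0 = q_min + np.argmax(ceps[:, q_min:q_max], axis=1)
f0_track = fs / n0
print(np.median(f0_track))
```

On real speech the peak search would additionally need a voicing decision, for example a threshold on the peak height, because unvoiced frames have no pitch ridge.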