### PNS ELEC4 EIEL821 Spectral Analysis 

# Chapter 3: Parametric estimation

This Jupyter notebook studies parametric techniques for spectral analysis, with particular focus on **AR modeling**.


## Model mismatch


**[HAY96, p. 441]** The performance of parametric techniques depends strongly on how well the assumed model fits the process. If the model is inappropriate for the process under analysis, the parametric approach leads to inaccurate or misleading spectral estimates. To illustrate this phenomenon, let us first consider a stochastic process consisting of **two sinusoids in additive noise**:

$$X(n) = 10\sin(0.4\pi n + \phi_1) + 5\sin(0.6\pi n + \phi_2) + \nu(n)$$

where $\nu(n)$ is a zero-mean unit-variance white noise, and the random phases $\phi_1$ and $\phi_2$ are independent and uniformly distributed in $[0, 2\pi)$ rad.
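
For the comparison with the theoretical PSD requested below, it may help to write the ACS and PSD of $X(n)$ explicitly. Assuming the phases and the noise are mutually independent, each sinusoid $A_i\sin(\omega_i n + \phi_i)$ contributes $\frac{A_i^2}{2}\cos(\omega_i k)$ to the ACS, so that with $A_1 = 10$, $A_2 = 5$ and $\sigma_\nu^2 = 1$ (here $\delta(k)$ is the Kronecker delta and $\delta(\omega)$ the Dirac delta):

$$r_X(k) = 50\cos(0.4\pi k) + 12.5\cos(0.6\pi k) + \delta(k)$$

$$S_X(\omega) = 1 + 50\pi\left[\delta(\omega - 0.4\pi) + \delta(\omega + 0.4\pi)\right] + 12.5\pi\left[\delta(\omega - 0.6\pi) + \delta(\omega + 0.6\pi)\right], \quad \omega \in [-\pi, \pi).$$

The theoretical PSD is thus a flat 0 dB noise floor plus four spectral lines, which any estimate from $N = 64$ samples can only approximate by peaks of finite height and width.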


* Generate 50 realizations of $N = 64$ samples of process $X(n)$. Plot one realization.


* Justify why an AR(4) model would be appropriate for this process. Determine the AR model coefficients by solving the Yule-Walker equations based on biased ACS estimates. Derive the AR spectral estimate (in dB) of each signal realization. Superimpose the 50 spectral realizations in the same figure, with $\omega$ in the interval $[0, \pi]$ rad/sample.


* Compute the average spectral estimate and plot it (in dB) in a different figure. Compare the average with the theoretical PSD of the process considered in this exercise. Superimpose (in dB) the variance of the spectral realizations. Compute the average variance over the whole frequency range.


* Repeat using the periodogram as spectral estimator. Compare.
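
A quick numerical check of the AR(4) argument, as a sketch: each noiseless real sinusoid obeys $s(n) = 2\cos(\omega_0)\,s(n-1) - s(n-2)$, so the sum of two of them is annihilated by the fourth-order polynomial $A(z) = \prod_{i=1}^{2}\left(1 - 2\cos(\omega_i)z^{-1} + z^{-2}\right)$, whose roots lie on the unit circle at $\mathrm{e}^{\pm\jmath\omega_1}$ and $\mathrm{e}^{\pm\jmath\omega_2}$. The helper `annihilating_filter` below is a hypothetical name used only for illustration. With the additive noise, $A(z)$ applied to $X(n)$ leaves $A(z)\nu(n)$ rather than white noise, so the AR(4) model is an approximation that should work best at high SNR.

```python
import numpy as np

def annihilating_filter(omegas):
    """Return [1, a_1, ..., a_2M] for A(z) = prod_i (1 - 2 cos(w_i) z^-1 + z^-2)."""
    a = np.array([1.0])
    for w0 in omegas:
        # each real sinusoid at w0 satisfies s(n) = 2 cos(w0) s(n-1) - s(n-2)
        a = np.convolve(a, [1.0, -2.0*np.cos(w0), 1.0])
    return a

rng = np.random.default_rng(0)
n = np.arange(64)
phi = 2*np.pi*rng.random(2)
# noiseless part of X(n)
s = 10*np.sin(0.4*np.pi*n + phi[0]) + 5*np.sin(0.6*np.pi*n + phi[1])

a = annihilating_filter([0.4*np.pi, 0.6*np.pi])   # 5 coefficients -> order 4
resid = np.convolve(s, a, mode='valid')           # sum_k a_k s(n-k), n = 4..63
print('order:', len(a) - 1, ' max |residual|:', np.max(np.abs(resid)))
```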


Now assume that the process under analysis is actually governed by the **MA(2) model**:

$$X(n) = \varepsilon(n) - \varepsilon(n-2)$$

where $\varepsilon(n)$ is a zero-mean unit-variance white noise, with theoretical PSD given by

$$S_X(\omega) = \left|1 - \mathrm{e}^{-\jmath 2\omega}\right|^2 = (1 - \mathrm{e}^{-\jmath 2\omega})(1 - \mathrm{e}^{\jmath 2\omega}) = 2 - 2\cos(2\omega).$$
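
The closed form can be cross-checked numerically on a 1024-point grid over $[0, 2\pi)$ by evaluating $|B(\mathrm{e}^{\jmath\omega})|^2$, with $B(z) = 1 - z^{-2}$, through a zero-padded FFT. The sketch also computes the ACS, $r_X(0) = 2$, $r_X(\pm 2) = -1$ and zero elsewhere. Note the PSD nulls at $\omega = 0$ and $\omega = \pi$: an AR spectrum $\sigma^2/|A(\mathrm{e}^{\jmath\omega})|^2$ is strictly positive and cannot reproduce exact zeros, which is one way to anticipate the mismatch studied here.

```python
import numpy as np

Nfft = 1024
w = 2*np.pi*np.arange(Nfft)/Nfft            # Nfft points in [0, 2*pi)
b = np.array([1.0, 0.0, -1.0])              # X(n) = e(n) - e(n-2), unit-variance e(n)
S_fft = np.abs(np.fft.fft(b, Nfft))**2      # |B(e^{jw})|^2 via zero-padded FFT
S_closed = 2 - 2*np.cos(2*w)                # closed form above

# ACS of the MA(2) process: r(k) = sum_m b_m b_{m+k}, lags -2..2
r = np.correlate(b, b, mode='full')
print('max |S_fft - S_closed| =', np.max(np.abs(S_fft - S_closed)), ' ACS:', r)
```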


* Repeat the above exercise for 50 realizations of this MA(2) random process. Compare the theoretical PSD with the AR(4) and periodogram estimates. Conclude.


*Hint:* For the spectral representations, sample the frequency axis at $N_\mathrm{FFT} = 1024$ equally spaced points in the interval $[0, 2\pi)$ rad/sample.
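
For the periodogram comparison, a minimal sketch on this frequency grid is shown below (`periodogram` is a hypothetical helper, not part of the template). It computes $P(\omega_k) = \frac{1}{N}\left|\sum_{n=0}^{N-1} x(n)\,\mathrm{e}^{-\jmath\omega_k n}\right|^2$ with $\omega_k = 2\pi k/N_\mathrm{FFT}$ through a zero-padded FFT. This equals the DTFT of the same biased ACS estimate used by the AR method: the periodogram uses all $2N-1$ lags directly, whereas the AR($p$) estimate uses only lags $0$ to $p$ through the model.

```python
import numpy as np

def periodogram(x, Nfft):
    """Periodogram of each realization x[R, N] on Nfft points in [0, 2*pi) rad/sample."""
    N = x.shape[1]
    X = np.fft.fft(x, Nfft, axis=1)     # zero-padded DFT along the time axis
    return np.abs(X)**2/N

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 64))        # placeholder data; use the notebook's realizations
P = periodogram(x, 1024)
Px_dB = 10*np.log10(P[:, :1024//2 + 1]) # w in [0, pi], in dB
```

The result has the same shape as the `Sx` array returned by `ar`, so both estimators can be plotted and averaged with the same code.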


In [3]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.linalg import toeplitz

pi = np.pi
fft = np.fft.fft


# ======================================================================
def gen_twosin(w, R, N):
# Generates R random realizations of two sinusoids in random noise
#
# -- Output
# x[R, N] : R process realizations of N samples 
# 
# -- Input
# w[2] : frequency of sinusoids (rad/sample): [w1, w2]
# R    : number of realizations
# N    : number of samples

    x = np.zeros((R, N))
    n = np.arange(N)
    A = [10, 5]                        # sinusoid amplitudes
    for i in range(R):
        np.random.seed(i)              # one reproducible seed per realization
        phi = 2*pi*np.random.random(2) # independent phases, uniform in [0, 2*pi)
        x[i, :] = (A[0]*np.sin(w[0]*n + phi[0]) + A[1]*np.sin(w[1]*n + phi[1])
                   + np.random.randn(N))  # zero-mean unit-variance white noise
    return x

# ======================================================================
def gen_ma2(R, N):
# Generates R random realizations of MA(2) process
#
# -- Output
# x[R, N] : R process realizations of N samples 
# 
# -- Input
# R    : number of realizations
# N    : number of samples

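    x = np.zeros((R, N))
    for i in range(R):
        np.random.seed(i)              # one reproducible seed per realization
        e = np.random.randn(N + 2)     # unit-variance white noise (2 extra samples of memory)
        x[i, :] = e[2:] - e[:-2]       # X(n) = e(n) - e(n-2)
    return x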

# ======================================================================
def plot_realization(x, info):
# Plots one signal realization
#
# x[R, N] : R realizations of N-sample random process
# info    : a string with information to show in plot title

    [R, N] = x.shape

    # plot the first signal realization
    xabsmax = np.max(abs(x))
    fig = plt.figure(figsize = (10, 4))
    fig.suptitle(info)
    plt.stem(range(N), x[0, :])
    plt.xlabel(r'$n$')
    plt.ylabel(r'$x(n)$')
    plt.axis([-1, N, -1.1*xabsmax, 1.1*xabsmax])
    plt.grid()
    
    return

# ======================================================================
def acs_estimates(x, max_lag):
# Computes biased autocorrelation sequence (ACS) estimates for input
# signal realization at given lags
#
# -- Output
# r[R, max_lag+1] : biased ACS estimates at lags 0 to max_lag for each of 
#                   the R input signal realizations
#
# -- Input
# x[R, N]      : input signal (R realizations; N samples per realization) 
# max_lag      : maximum time lag for which the ACS is to be estimated

    [R, N] = x.shape
    r = np.zeros((R, max_lag+1), dtype = np.result_type(x.dtype, float))
    for k in range(max_lag+1):
        # r(k) = (1/N) sum_{t=k}^{N-1} x(t) x*(t-k), for all realizations at once
        r[:, k] = np.sum(x[:, k:]*np.conjugate(x[:, :N-k]), axis = 1)/N
    return r

# ======================================================================
def ar(x, p, Nfft, verbose = True):
# Computes AR(p) spectral estimate of random process realizations
#
# -- Output
# Sx[R, Nfft] : R realizations of spectral estimates over Nfft frequency points in [0, 2*pi] rad/sample
# r[R, p+1]   : R realizations of ACS estimates, from lag 0 to p
# sg2[R]      : innovation variance (modeling error) for each realization
# 
# -- Input
# x[R, N] : R realizations of N-sample random process
# p       : AR model order
# Nfft    : number of FFT points
# verbose : if true, verbose operation
      
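    [R, N] = x.shape
    Sx = np.zeros((R, Nfft))
    sg2 = np.zeros(R)
    # exactly Nfft points in [0, 2*pi) (a float-step np.arange can be off by one)
    w = 2*pi*np.arange(Nfft)/Nfft
    r = acs_estimates(x, p)                               # biased ACS, lags 0..p
    expjw = np.exp(-1j*np.outer(np.arange(1, p+1), w))   # e^{-j w k}, k = 1..p
    for i in range(R):
        Rx = toeplitz(r[i, :p])                  # p x p autocorrelation matrix
        rp = r[i, 1:p+1]                         # [r(1), ..., r(p)]
        a = np.linalg.solve(Rx, -rp)             # Yule-Walker equations: Rx a = -rp
        sg2[i] = np.real(r[i, 0] + np.dot(np.conjugate(rp), a))   # innovation variance
        Sx[i, :] = sg2[i]/np.abs(1 + np.dot(a, expjw))**2         # AR(p) spectrum
    if verbose:
        print('AR(%d): mean innovation variance = %.4f' % (p, np.mean(sg2)))
    return Sx, r, sg2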

# ======================================================================


N = 64       # total length of signal
R = 50       # number of realizations
w1 = 0.4*pi
w2 = 0.6*pi
max_lag = 50
p = 4        # AR model order
Nfft = 1024  # FFT length

x = gen_twosin([w1, w2], R, N)

# <TO BE COMPLETED>
# .
# .
# .



## Spectral line splitting

**[HAY96, p. 443]** An artifact that may be observed with the autocorrelation method is **spectral line splitting**: a single spectral peak splits into two (or more) distinct, well-separated peaks. This phenomenon typically occurs when $X(n)$ is overmodeled, i.e., when the assumed order $p$ is too large compared with the actual order of the underlying model.


To illustrate this phenomenon, consider the **AR(2) process** given by the difference equation:

$$
X(n) = -0.9X(n - 2) + \varepsilon(n)
$$

where $\varepsilon(n)$ is a zero-mean unit-variance white noise (innovation process).


* Generate 5 independent random realizations of $N = 64$ samples of this AR(2) process. 


* Repeat the experiment of the previous section assuming an AR(4) model, and then an AR(12) model. What can be observed?
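
As a reference for these experiments, the true PSD of the AR(2) process is $S_X(\omega) = 1/\left|1 + 0.9\,\mathrm{e}^{-\jmath 2\omega}\right|^2$. Its poles sit at $z = \pm\jmath\sqrt{0.9}$, close to the unit circle, so on $[0, \pi]$ it has a single sharp peak at $\omega = \pi/2$ reaching $1/0.1^2 = 100$ (20 dB). The short sketch below evaluates it on the same 1024-point grid; a split peak around $\pi/2$ in the AR(4) or AR(12) estimates is therefore an artifact, not a feature of the process.

```python
import numpy as np

Nfft = 1024
w = 2*np.pi*np.arange(Nfft)/Nfft             # [0, 2*pi) rad/sample
a = np.array([1.0, 0.0, 0.9])                # A(z) = 1 + 0.9 z^-2
S_true = 1.0/np.abs(np.fft.fft(a, Nfft))**2  # sigma^2/|A(e^{jw})|^2 with sigma^2 = 1
S_true_dB = 10*np.log10(S_true)

half = Nfft//2 + 1                           # w in [0, pi]
k_peak = np.argmax(S_true[:half])
print('peak at w =', w[k_peak]/np.pi, 'pi rad/sample, level', S_true_dB[k_peak], 'dB')
```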



In [None]:
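from scipy.signal import lfilter

# ======================================================================
def gen_ar2(R, N, Nburn = 500):
# Generates R random realizations of the AR(2) process
#
# -- Output
# x[R, N] : R process realizations of N samples 
# 
# -- Input
# R     : number of realizations
# N     : number of samples
# Nburn : number of initial samples discarded to remove the filter transient

    x = np.zeros((R, N))
    for i in range(R):
        np.random.seed(i)                      # one reproducible seed per realization
        e = np.random.randn(N + Nburn)         # zero-mean unit-variance innovation
        y = lfilter([1], [1, 0, 0.9], e)       # X(n) = -0.9 X(n-2) + e(n)
        x[i, :] = y[Nburn:]                    # keep the (near-)stationary part
    return x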

# ======================================================================

R = 5   # number of realizations
Sx_bounds = [-20, 50, -20, 50]

x = gen_ar2(R, N) # generate random realizations of AR(2) process

# <TO BE COMPLETED>
# .
# .
# .
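
A useful control for this experiment, under the assumption that the exact ACS is known: feeding the exact ACS of the AR(2) process to the Yule-Walker equations at orders 2, 4 and 12 returns the AR(2) coefficients padded with zeros and $\sigma^2_\varepsilon = 1$, so the overmodeled spectrum is identical to the true one. This suggests that the split peaks observed with AR(4) and AR(12) stem from the estimation error of the biased ACS on $N = 64$ samples, which the extra coefficients are free to fit. The helper `yule_walker` is a hypothetical name; the exact ACS follows from $r_X(0) = 1/(1 - 0.81)$, $r_X(k) = -0.9\,r_X(k-2)$ and $r_X(1) = 0$.

```python
import numpy as np
from scipy.linalg import toeplitz

def yule_walker(r, p):
    """Order-p Yule-Walker solution from real ACS values r[0..p].
    Returns a = [a_1, ..., a_p] (A(z) = 1 + sum_k a_k z^-k) and sigma^2."""
    a = np.linalg.solve(toeplitz(r[:p]), -r[1:p+1])
    sg2 = r[0] + np.dot(r[1:p+1], a)
    return a, sg2

# exact ACS of X(n) = -0.9 X(n-2) + e(n), unit-variance e(n), lags 0..12
r = np.zeros(13)
r[0::2] = (-0.9)**np.arange(7)/(1 - 0.81)   # odd lags are zero

for p in (2, 4, 12):
    a, sg2 = yule_walker(r, p)
    print(p, np.round(a, 6), round(float(sg2), 6))
```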

## Model order selection

**Akaike's information criterion (AIC)**, the **minimum description length (MDL)** criterion and Akaike's **final prediction error (FPE)** criterion are given, respectively, by:

$$\mathrm{AIC}(p) = N\log \sigma^2_\varepsilon + 2p$$

$$\mathrm{MDL}(p) = N\log \sigma^2_\varepsilon + (\log N)p$$

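$$\mathrm{FPE}(p) = \sigma^2_\varepsilon \frac{N + p + 1}{N - p - 1},$$

where $\sigma^2_\varepsilon$ is the innovation variance (modeling error) estimated for the assumed order $p$, and $N$ is the number of samples per realization.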
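
These criteria only need $\sigma^2_\varepsilon$ as a function of $p$, which the Levinson-Durbin recursion delivers for all orders in one pass through $\sigma^2_\varepsilon(m) = \sigma^2_\varepsilon(m-1)\,(1 - k_m^2)$, where $k_m$ is the reflection coefficient. As a sanity check, the sketch below feeds it the exact ACS of the AR(2) process of the previous section with $N = 64$; the helper names `levinson_durbin` and `order_criteria` are hypothetical, real-valued data and the natural logarithm are assumed.

```python
import numpy as np

def levinson_durbin(r, pmax):
    """Innovation variance sigma^2(p), p = 1..pmax, from real ACS values r[0..pmax]."""
    a = np.zeros(0)                  # AR coefficients [a_1, ..., a_m] of the current order
    sg2 = r[0]
    sg2_all = np.zeros(pmax)
    for m in range(1, pmax + 1):
        k = -(r[m] + np.dot(a, r[m-1:0:-1]))/sg2    # reflection coefficient k_m
        a = np.concatenate([a + k*a[::-1], [k]])     # order-m coefficients
        sg2 = sg2*(1 - k**2)                         # sigma^2(m) = sigma^2(m-1)(1 - k_m^2)
        sg2_all[m-1] = sg2
    return sg2_all

def order_criteria(sg2, N):
    """AIC, MDL and FPE for orders p = 1..len(sg2)."""
    p = np.arange(1, len(sg2) + 1)
    aic = N*np.log(sg2) + 2*p
    mdl = N*np.log(sg2) + np.log(N)*p
    fpe = sg2*(N + p + 1)/(N - p - 1)
    return p, aic, mdl, fpe

# exact ACS of X(n) = -0.9 X(n-2) + e(n), unit-variance e(n), lags 0..10
r = np.zeros(11)
r[0::2] = (-0.9)**np.arange(6)/(1 - 0.81)

p, aic, mdl, fpe = order_criteria(levinson_durbin(r, 10), N = 64)
print(p[np.argmin(aic)], p[np.argmin(mdl)], p[np.argmin(fpe)])   # prints: 2 2 2
```

With estimated rather than exact ACS values, $\sigma^2_\varepsilon$ keeps decreasing slightly beyond $p = 2$, and the criteria differ in how strongly their penalty terms counter that decrease.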



* Consider the **AR(2) model** of the previous exercise. For the realizations of $N = 64$ samples, compute and plot the AIC, MDL and FPE criteria as a function of the assumed model order $p$ in the range $[1, 10]$. Estimate the modeling error as the average innovation variance over the available signal realizations. Compare the different model order selection methods. Do they confirm that $p = 2$ is the right order for this AR process?


* Repeat for the **two sinusoids in additive noise** studied at the beginning of this notebook. Do the order selection criteria confirm that AR(4) is a suitable model for this process?


* What about the selected order for the **MA(2) process**?


* Justify the shape of the curves for the different criteria in each case.
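
The modeling error required by these criteria can be obtained by solving the Yule-Walker equations for each order and averaging over realizations, as the first bullet suggests. In the sketch below (`avg_innovation_variance` is a hypothetical helper; real-valued signals and biased ACS estimates are assumed), the averaged innovation variance is non-increasing in $p$: with biased estimates, each additional order multiplies $\sigma^2_\varepsilon$ by $1 - k_p^2 \le 1$. This is why the $\log \sigma^2_\varepsilon$ term alone would always favor larger orders, and why AIC, MDL and FPE add a penalty that grows with $p$.

```python
import numpy as np
from scipy.linalg import toeplitz

def avg_innovation_variance(x, pmax):
    """Yule-Walker innovation variance for p = 1..pmax, averaged over the rows of x[R, N]."""
    R, N = x.shape
    # biased ACS r[i, k] = (1/N) sum_t x_i(t) x_i(t-k), lags 0..pmax
    r = np.array([[np.dot(xi[k:], xi[:N-k])/N for k in range(pmax + 1)] for xi in x])
    sg2 = np.zeros((R, pmax))
    for i in range(R):
        for p in range(1, pmax + 1):
            a = np.linalg.solve(toeplitz(r[i, :p]), -r[i, 1:p+1])   # Yule-Walker
            sg2[i, p-1] = r[i, 0] + np.dot(r[i, 1:p+1], a)           # modeling error
    return sg2.mean(axis=0)

rng = np.random.default_rng(1)
x = rng.standard_normal((5, 64))     # placeholder data; use e.g. gen_ar2(R, N) in the notebook
sg2_mean = avg_innovation_variance(x, 10)
```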


In [None]:
# ======================================================================
def model_order(x, pmax, info):
# Determines and plots model order selection criteria (AIC, MDL, FPE)
#
# -- Input
# x[R, N] : input signal realizations
# pmax    : maximum model order
# info    : string with additional information for plots

    # <TO BE COMPLETED>
    # .
    # .
    # .
    
    return

# ======================================================================


pmax = 10

x = gen_ar2(R, N)
info = 'AR(2) process'
model_order(x, pmax, info)

x = gen_twosin([w1, w2], R, N)
info = 'two sinusoids in additive noise'
model_order(x, pmax, info)

x = gen_ma2(R, N)
info = 'MA(2) process'
model_order(x, pmax, info)
