### PNS ELEC4 EIEL821 Spectral Analysis 

# Chapter 1: Random processes - solutions

This notebook reviews some basic concepts of stochastic processes, including stationarity and ergodicity.

## Random realizations

* Generate 100 realizations of the random process 

$$X(t) = \sin(w_0t+\theta)$$

with $w_0 = 2\pi/100$, where each realization consists of 200 samples, $t = 0, 1, \dots, 199$. The random variable $\theta$ is uniformly distributed over the interval $[0, 2\pi[$. Store the samples of the $i$th realization along the $i$th row of a matrix `X` with dimensions $100\times 200$.


* Plot two realizations.

In [None]:
#%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

pi = np.pi

K = 100      # number of realizations
T = 200      # total length of signal
T0 = 100     # signal period in samples
Tplot = 200  # samples to plot
X = np.zeros((K, T))
             
# generate random realizations
for i in range(K): # same as "for i in np.arange(K):" in this case
    np.random.seed(i)
    X[i, :] = np.sin(2*pi*np.arange(T)/T0 + 2*pi*np.random.random())

Xabsmax = np.max(abs(X))

# plot two signal realizations
for i in range(2):
    plt.figure(figsize = (10, 4))
    plt.stem(X[i, range(Tplot)])
    plt.xlabel('$t$')
    plt.ylabel(r'$x(t, \theta_' + str(i) + ')$')
    plt.title(r'$X(t)$, realization $\theta_' + str(i) + '$')
    plt.axis([-1,Tplot, -1.1*Xabsmax, 1.1*Xabsmax])
    plt.grid()
    

## Mean and variance

From a finite number of realizations $\{x(t, \theta_k)\}_{k=1}^K$, the mean and variance can be approximated by the **sample averages** as follows:
$$\mu_X(t) = \int_{-\infty}^{+\infty}xf_X(x;t)dx = \int_{-\infty}^{+\infty} x(t, \theta)f_\Theta(\theta)d\theta \approx \frac{1}{K} \sum_{k=1}^K x(t, \theta_k)$$

$$\sigma^2_X(t) = \int_{-\infty}^{+\infty}[x - \mu_X(t)]^2 f_X(x;t)dx = \int_{-\infty}^{+\infty} [x(t, \theta) - \mu_X(t)]^2f_\Theta(\theta)d\theta \approx \frac{1}{K} \sum_{k=1}^K [x(t, \theta_k) - \mu_X(t)]^2 = \frac{1}{K} \sum_{k=1}^K x(t, \theta_k)^2 - \mu^2_X(t)$$

where $\{\theta_k\}_{k=1}^K$ are the realizations of random variable $\theta$ characterizing the random process.

* Compute and plot the mean $\mu_X(t)$ and variance $\sigma^2_X(t)$ of random process $X(t)$.


* What can we say about the variability of these statistics?

*Remark 1:* results may show some variance due to estimation from a finite sample size.

*Remark 2:* the sample averages used here are not to be confused with the time averages used when studying and exploiting ergodicity (see exercise below). 
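
As a sanity check on the sample averages, the estimates can be compared with the theoretical values for a uniformly distributed phase, $\mu_X(t) = 0$ and $\sigma_X^2(t) = 1/2$. The sketch below is illustrative only: it draws its own realizations with a seeded `np.random.default_rng` generator, and the names `X_demo`, `mu_hat` and `var_hat` are assumptions that do not appear elsewhere in the notebook.

```python
import numpy as np

# Assumed setup mirroring the exercise: K realizations, T samples, period T0.
K, T, T0 = 100, 200, 100
rng = np.random.default_rng(0)            # seeded generator (assumption, for repeatability)
theta = rng.uniform(0.0, 2*np.pi, K)      # one random phase per realization
X_demo = np.sin(2*np.pi*np.arange(T)/T0 + theta[:, None])   # shape (K, T)

mu_hat = X_demo.mean(axis=0)                      # sample mean over realizations
var_hat = (X_demo**2).mean(axis=0) - mu_hat**2    # sample variance over realizations

# Largest deviation from the theoretical mean (0) and variance (1/2) over all t
max_mean_err = np.abs(mu_hat).max()
max_var_err = np.abs(var_hat - 0.5).max()
print(max_mean_err, max_var_err)
```

Both deviations should be small but nonzero, and they are expected to shrink as the number of realizations `K` grows.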

In [None]:
Tval = np.arange(T)
mu = np.zeros(T)
sg2 = np.zeros(T)

mu = X.mean(axis = 0) # compute average over realizations (axis = 0)
X2 = X**2
sg2 = X2.mean(axis = 0) - mu**2

# plot mean
plt.figure(figsize = (10, 4))
plt.stem(Tval, mu)
plt.xlabel('$t$')
plt.ylabel(r'$\mu_X(t)$')        # raw strings avoid invalid escape sequences such as '\m'
plt.title(r'Mean, $\mu_X(t)$')
plt.axis([0, T, -1.1*Xabsmax, 1.1*Xabsmax]);
plt.grid()
    
# plot variance
plt.figure(figsize = (10, 4))
plt.stem(Tval, sg2)
plt.xlabel('$t$')
plt.ylabel(r'$\sigma_X^2(t)$')   # raw strings avoid invalid escape sequences such as '\s'
plt.title(r'Variance, $\sigma_X^2(t)$')
plt.axis([0, T, -1.1*Xabsmax, 1.1*Xabsmax]);
plt.grid()

## Autocorrelation sequence (ACS)

The ACS is defined as $$R_X(t_1, t_2) = \mathrm{E}\{X(t_1)X(t_2)\}.$$

When analyzing real data, the expectation can be approximated by the sample average:
$$\mathrm{E}\{X(t_1)X(t_2)\} = \iint_{-\infty}^{+\infty} x_1x_2f_X(x_1, x_2; t_1, t_2)dx_1dx_2 = \int_{-\infty}^{+\infty} x(t_1, \theta)x(t_2, \theta)f_\Theta(\theta)d\theta \approx \frac{1}{K} \sum_{k=1}^K x(t_1, \theta_k)x(t_2, \theta_k).$$

* Compute and plot the autocorrelation sequence for $t_1 = 0$ and $t_2 \in [0, 100[$. 


* Repeat for $t_1 = 10$ and then for $t_1 = 20$.
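
The sample average over realizations can also be written as a single matrix product. The sketch below assumes its own seeded realizations (`X_demo`) and a hypothetical helper `acs_row`; it compares the estimate with the theoretical value for the random-phase sinusoid, $R_X(t_1, t_2) = \frac{1}{2}\cos\left(w_0(t_2 - t_1)\right)$.

```python
import numpy as np

K, T, T0 = 100, 200, 100
rng = np.random.default_rng(1)            # seeded generator (assumption)
theta = rng.uniform(0.0, 2*np.pi, K)
X_demo = np.sin(2*np.pi*np.arange(T)/T0 + theta[:, None])   # shape (K, T)

def acs_row(X, t1, n_lags):
    """Ensemble-average estimate of R_X(t1, t2) for t2 = 0, ..., n_lags-1."""
    return X[:, t1] @ X[:, :n_lags] / X.shape[0]

t2 = np.arange(100)
errors = {}
for t1 in (0, 10, 20):
    R_hat = acs_row(X_demo, t1, 100)
    R_theory = 0.5*np.cos(2*np.pi*(t2 - t1)/T0)
    errors[t1] = np.abs(R_hat - R_theory).max()   # worst-case estimation error
    print(t1, errors[t1])
```

The matrix product computes the same quantity as an explicit loop over realizations, only faster.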

In [None]:
T1 = 21   # maximum number of lags in t1
T2 = 100  # number of lags in t2

T1val = np.array([0, 10, 20])
T2val = np.arange(T2)
Rx = np.zeros((T1, T2))

for t1 in T1val:
    for t2 in T2val: # same as "for t2 in range(T2):" in this case
        c = 0
        for k in range(K):
            c += X[k, t1]*X[k, t2]
        Rx[t1, t2] = c/K
        
Rxabsmax = abs(Rx).max()

# plot ACS
for t1 in T1val:
    plt.figure(figsize = (10, 4))
    plt.stem(T2val, Rx[t1, :])
    plt.xlabel('$t_2$')
    plt.ylabel('$R_X(' + str(t1) + ', t_2)$')
    plt.title('Autocorrelation sequence, $t_1 = $' + str(t1))
    plt.axis([0, T2, -1.1*Rxabsmax, 1.1*Rxabsmax]);
    plt.grid()

## Wide sense stationarity (WSS)
A random process $X(t)$ is said to be WSS if $\mu_X(t)$ is constant and $R_X(t_1, t_2)$ only depends on the time lag $\tau \overset{\mathrm{def}}{=} (t_2 - t_1)$.

* Show $R_X(t_1, t_2)$ for different values of $(t_1, t_2)$ with the same lag $\tau = (t_2 - t_1)$. Justify that process $X(t)$ is WSS.

*Remark:* results for different values of $(t_1, t_2)$ with the same time lag may show some variance due to the finite sample size used in ACS estimation. 
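
One way to look at all pairs $(t_1, t_2)$ sharing a lag at once is to estimate the full matrix $R_X(t_1, t_2)$ and inspect its diagonals, since every entry on the diagonal with offset $\tau$ has $t_2 - t_1 = \tau$. The sketch below uses its own seeded realizations; `X_demo` and `R_full` are illustrative names, not part of the exercise code.

```python
import numpy as np

K, T, T0 = 100, 200, 100
rng = np.random.default_rng(2)            # seeded generator (assumption)
theta = rng.uniform(0.0, 2*np.pi, K)
X_demo = np.sin(2*np.pi*np.arange(T)/T0 + theta[:, None])

R_full = X_demo.T @ X_demo / K            # R_full[t1, t2] estimates R_X(t1, t2)

summary = {}
for lag in (0, 10, 20):
    d = np.diagonal(R_full, offset=lag)   # all pairs with t2 - t1 = lag
    summary[lag] = (d.mean(), d.max() - d.min())
    print(lag, summary[lag])
```

For a WSS process the spread along each diagonal should be small compared with the diagonal's mean level at lag 0, with the remaining spread attributable to the finite number of realizations.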

In [None]:
print('>>> Lag tau = 0')
print('R_X(0, 0) = ', round(Rx[0, 0], 4))
print('R_X(10, 10) = ', round(Rx[10, 10],4))
print('R_X(20, 20) = ', round(Rx[20, 20],4))

print('>>> Lag tau = 10')
print('R_X(0, 10) = ', round(Rx[0, 10],4))
print('R_X(10, 20) = ', round(Rx[10, 20], 4))
print('R_X(20, 30) = ', round(Rx[20, 30], 4))

print('>>> Lag tau = 20')
print('R_X(0, 20) = ', round(Rx[0, 20], 4))
print('R_X(10, 30) = ', round(Rx[10, 30], 4))
print('R_X(20, 40) = ', round(Rx[20, 40], 4))


* Compute and plot the autocorrelation sequence $R_X(\tau) = R_X(0, \tau)$ for $\tau\in ]-100, 100[$.

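*Hint:* the ACS presents Hermitian symmetry, i.e., $R_X(-\tau) = R_X^*(\tau)$, where $(\cdot)^*$ denotes complex conjugation. For a real-valued process such as this one, this reduces to even symmetry, $R_X(-\tau) = R_X(\tau)$.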

In [None]:
Rxtau = Rx[0, :]
Rxtau = np.concatenate((Rxtau[range(T2-1, 0, -1)], Rxtau)) # exploit even symmetry to get negative lag values
tau = np.arange(-T2+1, T2)

plt.figure(figsize = (10, 4))
plt.stem(tau, Rxtau)
plt.xlabel(r'$\tau$')
plt.ylabel(r'$R_X(\tau)$')
plt.title(r'Autocorrelation sequence $R_X(\tau)$')
plt.axis([-T2, T2, -1.1*max(abs(Rxtau)), 1.1*max(abs(Rxtau))]);
plt.grid()

## Ergodicity

**Time averages** are defined as

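$$m_X \overset{\mathrm{def}}{=} \langle x(t) \rangle = \frac{1}{T}\sum_{t=0}^{T-1} x(t)$$

$$\Gamma_X(\tau) \overset{\mathrm{def}}{=} \langle x(t)x(t+\tau) \rangle = \frac{1}{T}\sum_{t=0}^{T-1} x(t)x(t+\tau).$$

Since only the samples $t = 0, \dots, T-1$ are available, products with $t+\tau$ outside this range are left out of the sum.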


* For any realization of the random process, compute the above time averages. Compute $\Gamma_X(\tau)$ in the interval $\tau\in]-100, 100[$.


* Compare time averages $m_X$ and $\Gamma_X(\tau)$ with ensemble averages $\mu_X$ and $R_X(\tau)$. Is the process ergodic?
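
The time-average ACS of a single realization can also be obtained with `np.correlate`, which in `'full'` mode returns the sums of products for all lags from $-(T-1)$ to $T-1$. The sketch below is a minimal illustration on one freshly drawn realization (an assumption, not the notebook's matrix `X`), and cross-checks the result against a direct summation over the available sample pairs.

```python
import numpy as np

T, T0, T2 = 200, 100, 100
rng = np.random.default_rng(3)            # seeded generator (assumption)
x = np.sin(2*np.pi*np.arange(T)/T0 + rng.uniform(0.0, 2*np.pi))   # one realization

m_x = x.mean()                                  # time-average mean
full = np.correlate(x, x, mode='full') / T      # lags -(T-1), ..., T-1, normalized by T
gamma = full[T - T2 : T + T2 - 1]               # keep lags -(T2-1), ..., T2-1
lags = np.arange(-T2 + 1, T2)

# Reference: direct summation over the sample pairs available at each lag
gamma_ref = np.array([
    np.sum(x[max(0, -l):min(T, T - l)] * x[max(0, l):min(T, T + l)])
    for l in lags
]) / T
print(m_x, np.abs(gamma - gamma_ref).max())
```

Because the record spans an integer number of periods, `m_x` is zero up to rounding and the lag-0 value equals 1/2, matching the ensemble averages.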

In [None]:
m = X.mean(axis = 1)          # compute average along time (axis = 1)
Gamma = np.zeros((K, 2*T2-1))

for k in range(K):
    for i in range(-T2+1, T2):
        for t in range(max([0, -i]), min([T, T-i])):
            Gamma[k, T2-1+i] += X[k, t]*X[k, t+i]

Gamma /= T

k = 0   # chosen realization for visualization
print('m_X = ', round(m[k], 4))
print('mu_X = ', round(mu.mean(), 4))

plt.figure(figsize = (10, 4))
plt.stem(tau, Rxtau)
plt.stem(tau, Gamma[k,:], 'r')
plt.xlabel(r'$\tau$')
plt.ylabel(r'$R_X(\tau), \Gamma_X(\tau)$')
plt.title('Ergodicity - ACS estimates from sample and time averages')
plt.legend((r'$R_X$', r'$\Gamma_X$'))   # raw string avoids the invalid escape sequence '\G'
plt.axis([-T2, T2, -1.1*max(abs(Rxtau)), 1.1*max(abs(Rxtau))]);
plt.grid()


## Estimation accuracy: bias, variance and mean square error

Time averages $m_X$ and $\Gamma_X(\tau)$ can be considered as estimates of the ensemble averages $\mu_X$ and $R_X(\tau)$, respectively.

Each realization $x(t, \theta_k)$ of the random process $X(t)$ generates a different value of $m_X$ and $\Gamma_X(\tau)$. Hence, these quantities are random variables.

* Compute the bias, variance and mean square error (MSE) of $m_X$ and $\Gamma_X(\tau)$ as estimators of $\mu_X$ and $R_X(\tau)$, respectively. Plot the results for $\Gamma_X(\tau)$ as a function of $\tau$.


* Why does the estimation accuracy degrade as $|\tau|$ increases?
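
A short derivation may help with the last question. It assumes that the process is WSS and that $\Gamma_X(\tau)$ is computed with the $1/T$ normalization over the $T - |\tau|$ sample pairs that are actually available. Taking the expectation of the time average gives

$$\mathrm{E}\{\Gamma_X(\tau)\} = \frac{1}{T}\sum_{t=\max(0,-\tau)}^{\min(T,\,T-\tau)-1} \mathrm{E}\{x(t)x(t+\tau)\} = \frac{T-|\tau|}{T}\, R_X(\tau),$$

so that

$$\mathrm{bias}\{\Gamma_X(\tau)\} = \mathrm{E}\{\Gamma_X(\tau)\} - R_X(\tau) = -\frac{|\tau|}{T}\, R_X(\tau).$$

Under these assumptions the magnitude of the bias grows linearly with $|\tau|$, because fewer and fewer products enter the sum while the normalization stays fixed at $1/T$.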


In [None]:
mbias = m.mean() - mu.mean() # compute mean along realizations
mvar = m.var()               # compute variance along realizations
mmse = mbias**2 + mvar

print('bias{m_X} =', round(mbias, 4))
print('var{m_X} =', round(mvar, 4))
print('MSE{m_X} = ', round(mmse, 4))

Gammabias = Gamma.mean(axis = 0) - Rxtau
Gammavar = Gamma.var(axis = 0)
Gammamse = Gammabias**2 + Gammavar

plt.figure(figsize = (10, 4))
plt.stem(tau, Gammabias)
plt.stem(tau, Gammavar, 'r')
plt.stem(tau, Gammamse, 'g')
plt.xlabel(r'$\tau$')
plt.ylabel(r'$\Gamma_X(\tau)$ - bias, variance and MSE')
plt.title('Ergodicity - ACS estimate from time average')
plt.legend(('bias', 'variance', 'MSE'))
plt.axis([-T2, T2, -1.1*max(abs(Rxtau)), 1.1*max(abs(Rxtau))]);
plt.grid()


## Further work

* Repeat the above experiments with a Gaussian random process $X(t)$, where each realization $x(t, \theta)$ is composed of a sequence of independent and identically distributed standard (zero-mean, unit-variance) Gaussian random variables.


* Justify the shape of the ACS and anticipate the shape of the PSD. Why is this process called *'white'*?
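
As a rough numerical illustration of what to expect, the sketch below estimates the ACS of i.i.d. standard Gaussian samples by ensemble averaging, together with a periodogram averaged over realizations as a crude PSD estimate. The averaged periodogram is only one possible estimator, chosen here for brevity, and the names `W`, `R_w` and `P_w` are assumptions.

```python
import numpy as np

K, T = 100, 200
rng = np.random.default_rng(4)            # seeded generator (assumption)
W = rng.standard_normal((K, T))           # K realizations of i.i.d. N(0, 1) samples

# Ensemble-average ACS estimate R_W(0, t2) for t2 = 0, ..., 99
R_w = W[:, 0] @ W[:, :100] / K

# Crude PSD estimate: periodogram of each realization, averaged over realizations
P_w = (np.abs(np.fft.rfft(W, axis=1))**2 / T).mean(axis=0)

print(R_w[0], np.abs(R_w[1:]).max(), P_w.mean())
```

The ACS estimate should be close to 1 at lag 0 and close to 0 elsewhere, and the PSD estimate should fluctuate around a constant level across frequency.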

In [None]:
# generate random realizations
for i in range(K):  # K realizations, as defined above
    np.random.seed(i)
    X[i, :] = np.random.normal(size = T)

Xabsmax = np.max(abs(X))

# plot two signal realizations
for i in range(2):
    plt.figure(figsize = (10, 4))
    plt.stem(X[i, range(Tplot)])
    plt.xlabel('$t$')
    plt.ylabel(r'$x(t, \theta_' + str(i) + ')$')
    plt.title(r'$X(t)$, realization $\theta_' + str(i) + '$')
    plt.axis([-1,Tplot, -1.1*Xabsmax, 1.1*Xabsmax])
    plt.grid()


# etc
# .
# .
# .
