## Problem 2.1: Signal Processing as Linear Algebra

We need to express the convolution between:
- a discrete-time signal $x_n$ of length $N$, and
- an FIR filter of length $L$ with coefficients $\theta_l$

as a matrix product $y = X\theta$.

### Solution

#### Filter Coefficient Vector $\theta$
$\theta$ is constructed as a column vector:

$$\theta = \begin{bmatrix}
\theta_0 \\
\theta_1 \\
\vdots \\
\theta_{L-1}
\end{bmatrix}$$

#### Input Signal Matrix $X$
$X$ is built so that row $n$ holds the signal samples that contribute to output $y_n$, ordered from most recent to oldest. Samples before the start of the signal are taken to be zero ($x_n = 0$ for $n < 0$):

$$X = \begin{bmatrix}
x_0 & 0 & 0 & \cdots & 0 \\
x_1 & x_0 & 0 & \cdots & 0 \\
x_2 & x_1 & x_0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x_{N-1} & x_{N-2} & x_{N-3} & \cdots & x_{N-L}
\end{bmatrix}$$

The dimensions are:
- $X$: $N \times L$ matrix
- $\theta$: $L \times 1$ vector
- $y$: $N \times 1$ vector

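With this construction, row $n$ of the product $X\theta$ gives:

$$y_n = \sum_{l=0}^{L-1} x_{n-l}\theta_l, \qquad n = 0, \dots, N-1,$$

with $x_{n-l} = 0$ whenever $n - l < 0$. This is the convolution of $x_n$ with the filter, truncated to the first $N$ output samples.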
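The construction above can be checked numerically. The following is a minimal sketch in plain Python; the helper names (`build_signal_matrix`, `matvec`, `direct_convolution`) and the example values of `x` and `theta` are illustrative assumptions, not part of the problem statement.

```python
def build_signal_matrix(x, L):
    """Return the N x L matrix X with X[n][l] = x[n-l], and 0 where n-l < 0."""
    N = len(x)
    return [[x[n - l] if n - l >= 0 else 0.0 for l in range(L)] for n in range(N)]


def matvec(X, theta):
    """Plain matrix-vector product y = X theta."""
    return [sum(row[l] * theta[l] for l in range(len(theta))) for row in X]


def direct_convolution(x, theta):
    """First N samples of the convolution sum y_n = sum_l x_{n-l} theta_l."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for l, coeff in enumerate(theta):
            if n - l >= 0:  # samples before the start of the signal are zero
                acc += x[n - l] * coeff
        y.append(acc)
    return y


x = [1.0, 2.0, 3.0, 4.0]   # example signal, N = 4 (hypothetical values)
theta = [0.5, 0.25]        # example filter, L = 2 (hypothetical values)

X = build_signal_matrix(x, len(theta))
y_matrix = matvec(X, theta)
y_direct = direct_convolution(x, theta)

print(X)         # [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0], [4.0, 3.0]]
print(y_matrix)  # [0.5, 1.25, 2.0, 2.75]
```

Both routes give the same output vector, which is what the matrix formulation is meant to guarantee.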

## Problem 2.2: Biased vs Unbiased Parameter Estimation

### Mean Square Error Decomposition

Let's decompose the Mean Square Error (MSE) into bias and variance components.

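Definition, for an estimator $\hat{\theta}$ of the true value $\theta_o$:
$$MSE = E[(\hat{\theta} - \theta_o)^2]$$

Add and subtract $E[\hat{\theta}]$ inside the squared term:
$$MSE = E[(\hat{\theta} - E[\hat{\theta}] + E[\hat{\theta}] - \theta_o)^2]$$

Expanding:
$$MSE = E[(\hat{\theta} - E[\hat{\theta}])^2 + (E[\hat{\theta}] - \theta_o)^2 + 2(\hat{\theta} - E[\hat{\theta}])(E[\hat{\theta}] - \theta_o)]$$

The cross term has zero expectation, because $E[\hat{\theta}] - \theta_o$ is a constant and $E[\hat{\theta} - E[\hat{\theta}]] = 0$:
$$MSE = E[(\hat{\theta} - E[\hat{\theta}])^2] + (E[\hat{\theta}] - \theta_o)^2$$

With $Bias(\hat{\theta},\theta_o) = E[\hat{\theta}] - \theta_o$, this gives the final decomposition:
$$MSE = \underbrace{Var(\hat{\theta})}_{\text{Variance}} + \underbrace{(Bias(\hat{\theta},\theta_o))^2}_{\text{Squared Bias}}$$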
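As a numerical sanity check of this decomposition, here is a small Monte Carlo sketch. The estimator (a sample mean scaled by a factor `shrink`), the Gaussian data model, and all numeric values are assumptions chosen only for illustration. Because the same empirical averages are used on both sides, the two printed numbers agree up to floating-point rounding.

```python
import random

random.seed(0)

theta_o = 2.0     # true parameter (hypothetical value)
n_samples = 10    # observations per experiment
n_trials = 5000   # number of repeated experiments
shrink = 0.8      # hypothetical shrinkage factor; makes the estimator biased

# Repeat the experiment many times and record the estimate each time.
estimates = []
for _ in range(n_trials):
    data = [random.gauss(theta_o, 1.0) for _ in range(n_samples)]
    estimates.append(shrink * sum(data) / n_samples)

mean_est = sum(estimates) / n_trials
mse = sum((e - theta_o) ** 2 for e in estimates) / n_trials
variance = sum((e - mean_est) ** 2 for e in estimates) / n_trials
bias = mean_est - theta_o

print(f"MSE               = {mse:.6f}")
print(f"Variance + Bias^2 = {variance + bias ** 2:.6f}")
```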

### Analysis of Biased vs Unbiased Estimators

For minimizing MSE:

1. **Unbiased Estimators**:
   - Have $Bias(\hat{\theta},\theta_o) = 0$
   - MSE equals variance: $MSE = Var(\hat{\theta})$
   - Cannot trade bias for variance, so their variance may be larger than that of a well-chosen biased estimator

2. **Biased Estimators**:
   - Have non-zero bias but potentially lower variance
   - Can achieve lower MSE despite bias
   - Examples like Ridge Regression deliberately introduce bias to reduce variance

#### Conclusion
A biased estimator is often better suited for minimizing MSE because:
1. It can achieve a better trade-off between bias and variance
2. Small bias is often acceptable if it leads to substantial variance reduction
3. In practice, the total MSE is what matters for prediction accuracy
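
To make the trade-off concrete, consider a simple illustration that is not part of the original problem: estimating a mean $\theta_o$ from $n$ i.i.d. samples with variance $\sigma^2$ using the scaled sample mean $c\bar{x}$. Its variance is $c^2\sigma^2/n$ and its bias is $(c-1)\theta_o$, so $MSE(c) = c^2\sigma^2/n + (c-1)^2\theta_o^2$. The choice $c = 1$ is unbiased, while $c^* = \theta_o^2 / (\theta_o^2 + \sigma^2/n) < 1$ minimizes the MSE. The sketch below evaluates both; the function name and numeric values are assumptions, and since $c^*$ depends on the unknown $\theta_o$ it is only an illustration, not a practical estimator.

```python
def mse_scaled_mean(c, theta_o, sigma2, n):
    """MSE of the estimator c * sample_mean: variance plus squared bias."""
    variance = c ** 2 * sigma2 / n
    bias = (c - 1.0) * theta_o
    return variance + bias ** 2


theta_o, sigma2, n = 1.0, 4.0, 10   # hypothetical values

mse_unbiased = mse_scaled_mean(1.0, theta_o, sigma2, n)

# MSE-optimal scaling factor (requires knowing theta_o, so illustrative only).
c_star = theta_o ** 2 / (theta_o ** 2 + sigma2 / n)
mse_shrunk = mse_scaled_mean(c_star, theta_o, sigma2, n)

print(f"unbiased (c = 1):    MSE = {mse_unbiased:.4f}")
print(f"biased (c = {c_star:.4f}): MSE = {mse_shrunk:.4f}")
```

With these values the biased estimator has a strictly smaller MSE than the unbiased one, even though its bias is non-zero.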

## Problem 2.3: All Norms are Convex Functions

### Setup and Definitions

1. **Norm Definition**: A norm $\|\cdot\|$ on a vector space $V$ satisfies:
   - Positive definiteness: $\|x\| \geq 0$ and $\|x\| = 0$ iff $x = 0$
   - Homogeneity: $\|αx\| = |α|\|x\|$ for all scalars $α$
   - Triangle inequality: $\|x + y\| \leq \|x\| + \|y\|$

2. **Convexity Definition**: A function $f$ is convex if for any two points $x_1, x_2$ and any $t \in [0,1]$:
   $$f(tx_1 + (1-t)x_2) \leq tf(x_1) + (1-t)f(x_2)$$

### Proof

Take any two points $x_1, x_2$ in the vector space and any $t \in [0,1]$.

#### Start from the Left-Hand Side
Take $f = \|\cdot\|$ and start with the left side of the convexity inequality:
$$\|tx_1 + (1-t)x_2\|$$

#### Apply Triangle Inequality
$$\|tx_1 + (1-t)x_2\| \leq \|tx_1\| + \|(1-t)x_2\|$$

#### Apply Homogeneity
$$\|tx_1\| + \|(1-t)x_2\| = |t|\|x_1\| + |1-t|\|x_2\|$$

#### Simplify
Since $t \in [0,1]$, both $t$ and $1-t$ are non-negative:
- $|t| = t$
- $|1-t| = 1-t$

Therefore:
$$|t|\|x_1\| + |1-t|\|x_2\| = t\|x_1\| + (1-t)\|x_2\|$$

#### Final Result
Putting it all together:
$$\|tx_1 + (1-t)x_2\| \leq t\|x_1\| + (1-t)\|x_2\|$$

This is exactly the convexity inequality with $f(x) = \|x\|$, so every norm is a convex function.
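
The inequality can also be spot-checked numerically. The sketch below tests it on random vectors for three common norms ($\ell_1$, $\ell_2$, $\ell_\infty$); the choice of norms, the helper names, and the sampling ranges are assumptions made for illustration. A numerical check like this supports the proof but does not replace it.

```python
import random


def norm_l1(v):
    return sum(abs(c) for c in v)


def norm_l2(v):
    return sum(c * c for c in v) ** 0.5


def norm_linf(v):
    return max(abs(c) for c in v)


def convexity_gap(norm, x1, x2, t):
    """Return t*||x1|| + (1-t)*||x2|| - ||t*x1 + (1-t)*x2||, which should be >= 0."""
    mix = [t * a + (1 - t) * b for a, b in zip(x1, x2)]
    return t * norm(x1) + (1 - t) * norm(x2) - norm(mix)


random.seed(1)
min_gap = float("inf")
for _ in range(1000):
    x1 = [random.uniform(-5, 5) for _ in range(4)]
    x2 = [random.uniform(-5, 5) for _ in range(4)]
    t = random.random()
    for norm in (norm_l1, norm_l2, norm_linf):
        min_gap = min(min_gap, convexity_gap(norm, x1, x2, t))

# The smallest gap should be non-negative up to floating-point rounding.
print(f"smallest gap over all trials: {min_gap:.3e}")
```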

## Problem 2.4: Correlation Functions

Given:
- AR(1) process: $x_n = ax_{n-1} + v_n$
- New signal: $y_n = x_n + b$
- Conditions:
  - $a \in ]-1,1[$ (AR coefficient)
  - $v_n$ is white noise with variance $σ^2_v$
  - $E[x_n] = 0$
  - $b \in \mathbb{R}$


For a stationary AR(1) process with $E[x_n] = 0$ and $|a| < 1$, the standard results are:
1. Variance of $x_n$: $σ^2_x = \frac{σ^2_v}{1-a^2}$
2. Auto-correlation function of $x_n$: $r_x(k) = σ^2_x a^{|k|}$

### Computing $E[y_n]$

$$E[y_n] = E[x_n + b] = E[x_n] + b = 0 + b = b$$

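### Deriving Auto-correlation Function $r_y(k)$

Using the mean-removed definition, the auto-correlation function for $y_n$ is:
$$r_y(k) = E[(y_n - E[y_n])(y_{n-k} - E[y_{n-k}])]$$

Substituting $y_n = x_n + b$ and $E[y_n] = E[y_{n-k}] = b$:
$$\begin{align*}
r_y(k) &= E[(x_n + b - b)(x_{n-k} + b - b)] \\
&= E[x_n x_{n-k}] \\
&= r_x(k) = σ^2_x a^{|k|} \\
&= \frac{σ^2_v}{1-a^2} a^{|k|}
\end{align*}$$

The offset $b$ cancels, so $r_y(k)$ does not depend on $b$.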
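
The result can be cross-checked by simulation. The following sketch generates an AR(1) process with an added offset and estimates the mean-removed auto-correlation at lag 2. Gaussian white noise, the burn-in length, the sample size, and the helper names are all assumptions made for this illustration; the estimate is random, so it only approximates the closed-form value.

```python
import random


def simulate_ar1_with_offset(a, b, sigma_v, n, seed=0, burn_in=1000):
    """Generate y_n = x_n + b with x_n = a*x_{n-1} + v_n (Gaussian v_n assumed)."""
    rng = random.Random(seed)
    x = 0.0
    for _ in range(burn_in):  # discard the start-up transient
        x = a * x + rng.gauss(0.0, sigma_v)
    y = []
    for _ in range(n):
        x = a * x + rng.gauss(0.0, sigma_v)
        y.append(x + b)
    return y


def sample_autocorrelation(y, k):
    """Mean-removed sample estimate of r_y(k); symmetric in k."""
    k = abs(k)
    n = len(y)
    mean = sum(y) / n
    return sum((y[i] - mean) * (y[i - k] - mean) for i in range(k, n)) / (n - k)


a, b, sigma_v = 0.8, 0.5, 1.0
k = -2

y = simulate_ar1_with_offset(a, b, sigma_v, n=200_000)
estimate = sample_autocorrelation(y, k)
theory = sigma_v ** 2 / (1 - a ** 2) * a ** abs(k)

print(f"sample mean of y : {sum(y) / len(y):.3f} (theory: {b})")
print(f"estimated r_y({k}): {estimate:.3f} (theory: {theory:.3f})")
```

Rerunning with a different `b` and the same seed changes the sample mean but leaves the estimated auto-correlation essentially unchanged, matching the derivation.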

### Computing $r_y(-2)$ for Given Values

Given:
$a = 0.8$, $b = 0.5$, $σ^2_v = 1$

First, compute $σ^2_x$:
$$σ^2_x = \frac{σ^2_v}{1-a^2} = \frac{1}{1-0.8^2} = \frac{1}{1-0.64} = \frac{1}{0.36} \approx 2.778$$

Then, compute $r_y(-2)$:
$$r_y(-2) = σ^2_x a^{|-2|} \approx 2.778 \times (0.8)^2 = 2.778 \times 0.64 \approx 1.778$$
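
The arithmetic can be reproduced exactly with rational numbers. This is a small sketch using Python's `fractions` module; the function name `autocorrelation_ar1` is a hypothetical helper, and writing $a = 0.8$ as $4/5$ is only a convenience to avoid rounding.

```python
from fractions import Fraction


def autocorrelation_ar1(k, a, sigma_v2):
    """r(k) = sigma_v^2 / (1 - a^2) * a^|k| for a stationary AR(1) process."""
    return sigma_v2 / (1 - a ** 2) * a ** abs(k)


a = Fraction(4, 5)       # a = 0.8
sigma_v2 = Fraction(1)   # noise variance

sigma_x2 = autocorrelation_ar1(0, a, sigma_v2)    # 25/9
r_minus_2 = autocorrelation_ar1(-2, a, sigma_v2)  # 16/9

print(sigma_x2, round(float(sigma_x2), 3))    # 25/9 2.778
print(r_minus_2, round(float(r_minus_2), 3))  # 16/9 1.778
```

The offset $b = 0.5$ does not appear in the computation, since it cancels in the mean-removed auto-correlation.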

In summary, the auto-correlation function for signal $y_n$ is:
$$r_y(k) = \frac{σ^2_v}{1-a^2} a^{|k|}$$

For the given values:
$$r_y(-2) \approx 1.778$$