# Notes on Week 8 Lectures: Building Blocks

## Matrices

### M1: Introduction to Vectors and Matrices

We denote the $i^{\text{th}}$ row of matrix $A$ by

$$A_{i \bullet}$$

Similarly, we denote the $j^{\text{th}}$ column of matrix $A$ by

$$A_{\bullet j}$$

If $v$ is a $(p \times 1)$ column-vector and $x$ is a $(1 \times q)$ row-vector, then

$$v \cdot x = Y$$

is called the **outer product** and $Y$ is a $(p \times q)$ matrix.
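As a small illustration of the dimensions involved, the sketch below builds an outer product entry by entry in plain Python. The helper name `outer` and the example values are hypothetical, chosen only for illustration.

```python
def outer(v, x):
    """Outer product of a length-p column-vector v and a length-q row-vector x."""
    # Entry (i, j) of the result is v[i] * x[j]
    return [[vi * xj for xj in x] for vi in v]

v = [1, 2, 3]    # p = 3
x = [10, 20]     # q = 2
Y = outer(v, x)  # a 3 x 2 matrix
print(Y)         # [[10, 20], [20, 40], [30, 60]]
```

The result has one row per entry of $v$ and one column per entry of $x$.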

We denote a column-vector whose entries are all 1 by $\iota$. That is

$$\iota = \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}$$

For this course, we sometimes refer to this as **the unit vector** (note this is in contrast to the typical meaning of unit vector: $\vert v \vert = 1$).

### M2: Special Matrix Operations

For this course, the transpose of $A$ may be denoted $A'$ or by $A^T$.

### M3: Vectors and Differentiation

The gradient of a function $f(\mathbf{b}) = f(b_1, \ldots, b_q)$ is the column-vector

$$ \frac{\partial f}{\partial \mathbf{b}} (\mathbf{b}) = \begin{bmatrix} \frac{\partial f}{\partial b_1} (\mathbf{b}) \\ \vdots \\ \frac{\partial f}{\partial b_q} (\mathbf{b}) \end{bmatrix} $$

The **Hessian** is given by

$$ \frac{\partial^2 f}{\partial \mathbf{b} \, \partial \mathbf{b}^T} (\mathbf{b}) = \begin{bmatrix} \frac{\partial^2 f}{\partial b_1 \partial b_1} (\mathbf{b}) & \ldots & \frac{\partial^2 f}{\partial b_1 \partial b_q} (\mathbf{b}) \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial b_q \partial b_1} (\mathbf{b}) & \ldots & \frac{\partial^2 f}{\partial b_q \partial b_q} (\mathbf{b}) \end{bmatrix} $$

When $f$ is twice continuously differentiable, the Hessian is symmetric, since the mixed partial derivatives can be taken in either order.
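To make the gradient and Hessian concrete, here is a minimal sketch that approximates both with central finite differences. The function $f(b_1, b_2) = b_1^2 + 3 b_1 b_2 + 2 b_2^2$, the helper names `gradient` and `hessian`, and the step sizes are assumptions made for this example only; they are not from the lecture.

```python
def f(b):
    # Hypothetical example: f(b1, b2) = b1^2 + 3*b1*b2 + 2*b2^2
    return b[0] ** 2 + 3 * b[0] * b[1] + 2 * b[1] ** 2

def shifted(b, steps):
    """Copy of b with each (index, amount) pair in steps added to it."""
    out = list(b)
    for idx, amount in steps:
        out[idx] += amount
    return out

def gradient(f, b, h=1e-5):
    # Central difference in each coordinate
    return [(f(shifted(b, [(i, h)])) - f(shifted(b, [(i, -h)]))) / (2 * h)
            for i in range(len(b))]

def hessian(f, b, h=1e-4):
    q = len(b)
    H = [[0.0] * q for _ in range(q)]
    for i in range(q):
        for j in range(q):
            # Central difference for the second partial derivative in b_i and b_j
            H[i][j] = (f(shifted(b, [(i, h), (j, h)]))
                       - f(shifted(b, [(i, h), (j, -h)]))
                       - f(shifted(b, [(i, -h), (j, h)]))
                       + f(shifted(b, [(i, -h), (j, -h)]))) / (4 * h * h)
    return H

b0 = [1.0, 2.0]
g = gradient(f, b0)  # analytic gradient at b0 is (8, 11)
H = hessian(f, b0)   # analytic Hessian is [[2, 3], [3, 4]]
print(g)
print(H)
```

For this quadratic the off-diagonal entries agree up to rounding error, which matches the symmetry of the Hessian.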

## Probability

### P1: Random Variables

When given $n$ random variables, there are $\frac{n(n-1)}{2}$ covariance combinations of different random variables since

$$ {n \choose 2} = \frac{n!}{2!(n-2)!} = \frac{n(n-1)}{2} $$

and, of course, $n$ variances.

We denote the covariance matrix by

$$
    \Sigma =
        \begin{bmatrix}
            \sigma_{11}^2 & \ldots & \sigma_{1n}^2 \\
            \vdots & \ddots & \vdots \\
            \sigma_{n1}^2 & \ldots & \sigma_{nn}^2
        \end{bmatrix}
$$
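The counting argument can be checked quickly. The sketch below uses a hypothetical helper, `count_entries`, to count the variances and the distinct covariances among $n$ random variables. Because the covariance matrix is symmetric, the two counts together give its $\frac{n(n+1)}{2}$ distinct entries.

```python
from math import comb

def count_entries(n):
    """Return (number of variances, number of distinct covariances) for n variables."""
    return n, comb(n, 2)

for n in (2, 3, 5):
    variances, covariances = count_entries(n)
    # A symmetric n x n matrix has n(n+1)/2 distinct entries in total
    assert variances + covariances == n * (n + 1) // 2
    print(n, variances, covariances)
```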

### P2: Probability Distributions

If $x$ and $y$ are independent, then

$$
\begin{align*}
    E[xy] &= E[x] E[y] \\
    E[g(x) h(y)] &= E[g(x)] E[h(y)]
\end{align*}
$$

If $x$ is normally distributed with mean $\mu$ and variance $\sigma^2$ we write

$$ x \sim N(\mu, \sigma^2) $$

If $x$ is an $n \times 1$ vector of normally distributed random variables with mean vector $\mu_{n \times 1}$ and variance matrix $\Sigma_{n \times n}$ then we write

$$ x \sim N(\mu, \Sigma) $$

where, in particular, each component satisfies $x_i \sim N(\mu_i, \sigma_{ii}^2)$, with $\sigma_{ii}^2$ the $i^{\text{th}}$ diagonal entry of $\Sigma$.

If $y_{m \times 1} = A_{m \times n} x + b_{m \times 1}$ for a constant matrix $A$ and a constant vector $b$, then

$$ y \sim N(A\mu+b, A \Sigma A^T) $$
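As a worked instance of this rule, the sketch below computes $A\mu + b$ and $A \Sigma A^T$ for a small case where $y = x_1 + x_2 + 3$. The numbers and the helpers `matmul` and `transpose` are assumptions made only for illustration.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

mu = [[1.0], [2.0]]               # mean vector, n = 2
Sigma = [[4.0, 1.0], [1.0, 9.0]]  # covariance matrix of x
A = [[1.0, 1.0]]                  # m = 1, so y = x_1 + x_2 + b
b = [[3.0]]

# Mean of y: A mu + b
mean_y = [[am[0] + bi[0]] for am, bi in zip(matmul(A, mu), b)]
# Covariance of y: A Sigma A^T
cov_y = matmul(matmul(A, Sigma), transpose(A))

print(mean_y)  # [[6.0]]
print(cov_y)   # [[15.0]]
```

The variance 15 agrees with the scalar rule $\text{var}[x_1 + x_2] = \text{var}[x_1] + \text{var}[x_2] + 2\,\text{cov}[x_1, x_2] = 4 + 9 + 2$.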

If the $y_i$ are independent and $y_i \sim N(\mu, \sigma^2)$ for every $i$, then we say the $y_i$ are normally and independently distributed (with a common distribution) and denote this by NID. That is

$$ y_i \sim \text{NID}(\mu, \sigma^2) $$

When the distribution is not necessarily normal we write IID and say the variables are independent and identically distributed.

If $z = \sum_{i=1}^n y_i^2$ where $y_i \sim \text{NID}(0,1)$ then $z \sim \chi^2(n)$. That is, $z$ has a chi-square distribution with $n$ degrees of freedom.
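The construction can be simulated directly. The sketch below draws $z$ many times from standard normal variates and checks that the average is near $n$, using the fact that a $\chi^2(n)$ variable has mean $n$. The seed, the replication count, and the helper name `chi_square_draw` are arbitrary choices for this example.

```python
import random

rng = random.Random(0)  # fixed, arbitrary seed so the run is repeatable

def chi_square_draw(n, rng):
    """One draw of z = y_1^2 + ... + y_n^2 with y_i ~ NID(0, 1)."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))

n = 5
draws = [chi_square_draw(n, rng) for _ in range(20000)]
mean_z = sum(draws) / len(draws)
print(mean_z)  # should be close to n = 5
```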

If $y \sim N(0,1)$ and $z \sim \chi^2(\nu)$ with $y,z$ independent, then

$$ \frac{y}{ \sqrt{ \frac{z}{\nu} } } \sim t(\nu) $$

where $t(\nu)$ is the Student's $t$ distribution with $\nu$ degrees of freedom.

Note that

$$ \lim_{\nu \rightarrow \infty} t(\nu) = N(0,1) $$

Let $z_1 \sim \chi^2(d_1)$ and $z_2 \sim \chi^2(d_2)$ be independent. Then

$$ \frac{z_1/d_1}{z_2/d_2} \sim F(d_1,d_2) $$

where $F(d_1,d_2)$ is an $F$-distribution with $(d_1,d_2)$ degrees of freedom.
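Both ratios can be built from standard normal draws exactly as defined above. The following sketch does so and compares the sample means with two known values: the mean of $t(\nu)$ is $0$ for $\nu > 1$, and the mean of $F(d_1, d_2)$ is $\frac{d_2}{d_2 - 2}$ for $d_2 > 2$. The helper names, the seed, and the degrees of freedom are assumptions chosen for illustration.

```python
import math
import random

rng = random.Random(1)  # fixed, arbitrary seed

def chi_square_draw(d, rng):
    """Sum of d squared independent N(0, 1) draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d))

def t_draw(nu, rng):
    """y / sqrt(z / nu) with y ~ N(0, 1) and z ~ chi-square(nu), independent."""
    y = rng.gauss(0.0, 1.0)
    z = chi_square_draw(nu, rng)
    return y / math.sqrt(z / nu)

def f_draw(d1, d2, rng):
    """(z1 / d1) / (z2 / d2) with independent chi-square draws z1 and z2."""
    z1 = chi_square_draw(d1, rng)
    z2 = chi_square_draw(d2, rng)
    return (z1 / d1) / (z2 / d2)

reps = 20000
t_draws = [t_draw(10, rng) for _ in range(reps)]
f_draws = [f_draw(3, 20, rng) for _ in range(reps)]

t_mean = sum(t_draws) / reps
f_mean = sum(f_draws) / reps
print(t_mean)  # should be close to 0
print(f_mean)  # should be close to 20 / 18
```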

## Parameter Estimation

### S1: Parameter Estimation

For the purposes of this course, we must always start with an assumption about what kind of distribution observations come from, and that all observations are IID.

For example, we might have a sample (observations) $\left\{ y_1, y_2, \ldots , y_n \right\}$ and we would assume that $y_i \sim \text{NID}(\mu, \sigma^2)$.

A **statistic** is a function of random variables

$$
    g(y) = g(y_1, y_2, \ldots , y_n)
$$

If $y_i$ are observations (which are random variables) $g$ would be a function of those observations. Thus a statistic is a random variable.

An **estimator** is a statistic related to a parameter (for example $\mu$).

In general we denote an estimator by $\hat{\theta}$ for parameter $\theta$.

If $E[\hat{\theta}] = \theta$ then we say $\hat{\theta}$ is unbiased. Otherwise we say it is biased and the bias is equal to $E[\hat{\theta}] - \theta$.

We say that $\hat{\theta}$ is an **efficient estimator** when $\text{var}[\hat{\theta}]$ is the smallest among a given set of estimators (typically the unbiased ones).

Given observations $\left\{ y_1, \ldots, y_n \right\}$ (NID), if we define

$$
    m = \frac{1}{n} \sum_{i=1}^n y_i
$$

then $E[m] = \mu$, so $m$ is an unbiased estimator of $\mu$.

Given the same observations, recall $\sigma^2 = \text{var}[y_i] = E[(y_i - \mu)^2]$. We want to find an estimator of $\sigma^2$. We do not know $\mu$, but we can estimate it with $m$. Thus, define

$$
    z_i = y_i - m
$$

It follows that $z$ is a linear transformation of $y$ since

$$
    z = y - \iota m = y - \iota \cdot \frac{1}{n} \iota^T y = \left( I_n - \frac{1}{n} \iota \iota^T \right) y = My
$$

It is not hard to show that

- $M$ is symmetric: $M = M^T$
- $\text{tr}(M) = n-1$
- $M$ is idempotent: $M^2 = M$

which, together with $y \sim N(\mu \iota, \sigma^2 I_n)$ and $M \iota = 0$, can be used to show by linearity that

$$
    z \sim N(0, \sigma^2M)
$$
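The properties of $M$ are easy to check numerically for a small case. The sketch below builds $M = I_n - \frac{1}{n} \iota \iota^T$ for $n = 4$ and verifies symmetry, the trace, idempotency, and that $My$ returns the deviations from the mean. The helpers `matmul` and `close` and the sample values are hypothetical.

```python
n = 4
# M = I_n - (1/n) * iota * iota^T: 1 - 1/n on the diagonal, -1/n elsewhere
M = [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)] for i in range(n)]

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def close(A, B, tol=1e-12):
    """True when two matrices agree entry by entry up to tol."""
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

symmetric = close(M, [list(col) for col in zip(*M)])  # M = M^T
trace = sum(M[i][i] for i in range(n))                # should equal n - 1
idempotent = close(matmul(M, M), M)                   # M^2 = M

y = [2.0, 4.0, 6.0, 8.0]
m = sum(y) / n
z = [sum(M[i][j] * y[j] for j in range(n)) for i in range(n)]  # z = M y

print(symmetric, trace, idempotent)
print(z)  # the deviations y_i - m
```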

We want to find an unbiased estimator for $\sigma^2$ using $z$. The derivation in these notes differs from the lecture, which uses vector and matrix calculations. What we want to do is check whether $E[\sum_{i=1}^n z_i^2] = E[z^T z]$ equals $\sigma^2$ and, if it does not, find a way to rescale $z^T z$ to get an unbiased estimator. Keep in mind that the estimator must not depend on $\mu$, since we do not know $\mu$. To that end, observe that

$$
    \sum_{i=1}^n (y_i - m)^2 = \sum_{i=1}^n y_i^2 - nm^2
$$

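From training exercise P1.4 (a) we know that $\sigma_{m}^2 = \frac{\sigma^2}{n}$. Since $E[w^2] = \text{var}[w] + (E[w])^2$ for any random variable $w$, we have $E[y_i^2] = \sigma^2 + \mu^2$ and $E[m^2] = \frac{\sigma^2}{n} + \mu^2$. Thus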

$$
\begin{align*}
    E\left[ \sum_{i=1}^n z_i^2 \right] &= E\left[ \sum_{i=1}^n (y_i - m)^2 \right] \\
    &= E\left[ \sum_{i=1}^n y_i^2 - nm^2 \right] \\
    &= \sum_{i=1}^n E\left[ y_i^2 \right] - nE\left[ m^2 \right] \\
    &= \sum_{i=1}^n \left( \sigma^2 + \mu^2 \right) - n \left( \frac{\sigma^2}{n} + \mu^2 \right)\\
    &= n \left( \sigma^2 + \mu^2  \right) - n \left( \frac{\sigma^2}{n} + \mu^2 \right) \\
    &= n \sigma^2 + n \mu^2 - \sigma^2 - n \mu^2 \\
    &= (n-1) \sigma^2
\end{align*}
$$

This implies then that

$$
    E\left[ \frac{1}{n-1} z^T z \right] = \sigma^2
$$

so that $\frac{1}{n-1} z^T z$ is an unbiased estimator of $\sigma^2$. Thus we define

$$
    s^2 = \frac{1}{n-1} \sum_{i=1}^n (y_i - m)^2
$$

as our estimator for $\sigma^2$ and it is unbiased:

$$
    E[s^2] = \sigma^2
$$
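A simulation can illustrate (though not prove) the role of the divisor $n - 1$. The sketch below repeatedly draws samples of size $n$, computes $z^T z$, and averages the estimators with divisors $n - 1$ and $n$. The parameter values, the seed, and the replication count are arbitrary assumptions for this example.

```python
import random

rng = random.Random(42)  # fixed, arbitrary seed
mu, sigma, n, reps = 1.0, 2.0, 5, 20000

sum_s2 = 0.0      # running total of the estimator with divisor n - 1
sum_biased = 0.0  # running total of the estimator with divisor n
for _ in range(reps):
    y = [rng.gauss(mu, sigma) for _ in range(n)]
    m = sum(y) / n
    ss = sum((yi - m) ** 2 for yi in y)  # z^T z
    sum_s2 += ss / (n - 1)
    sum_biased += ss / n

mean_s2 = sum_s2 / reps
mean_biased = sum_biased / reps
print(mean_s2)      # should be close to sigma^2 = 4
print(mean_biased)  # should be close to (n - 1) / n * sigma^2 = 3.2
```

Dividing by $n$ instead gives an estimator with expectation $\frac{n-1}{n} \sigma^2$, so it underestimates $\sigma^2$ on average.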

It can be shown that

$$
    \frac{z^T z}{\sigma^2} \sim \chi^2(n-1)
$$

and that $m$ and $s^2$ are independent.

We say that an estimator $\hat{\theta}$ is **consistent** when

$$
    \lim_{n \rightarrow \infty} E[\hat{\theta}] = \theta \hspace{1mm} \text{ and } \hspace{1mm} \lim_{n \rightarrow \infty} \text{var}[\hat{\theta}] = 0
$$

It can be shown that both $m$ and $s^2$ are consistent.
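As a rough illustration of consistency, the sketch below computes $m$ and $s^2$ from one simulated sample at each of several sample sizes. A single sample per $n$ only suggests the trend and is not a proof. The function name `estimates`, the parameter values, and the seed are assumptions made for this example.

```python
import random

def estimates(n, rng, mu=1.0, sigma=2.0):
    """Return (m, s^2) computed from n simulated N(mu, sigma^2) observations."""
    y = [rng.gauss(mu, sigma) for _ in range(n)]
    m = sum(y) / n
    s2 = sum((yi - m) ** 2 for yi in y) / (n - 1)
    return m, s2

rng = random.Random(7)  # fixed, arbitrary seed
results = {n: estimates(n, rng) for n in (10, 1000, 100000)}
for n, (m, s2) in results.items():
    # The estimates should tend to settle near mu = 1 and sigma^2 = 4 as n grows
    print(n, m, s2)
```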