# Least Squares for Response Surface Work

Link: [http://charlesreid1.com/wiki/Empirical_Model-Building_and_Response_Surfaces#Chapter_3:_Least_Squares_for_Response_Surface_Work](http://charlesreid1.com/wiki/Empirical_Model-Building_and_Response_Surfaces#Chapter_3:_Least_Squares_for_Response_Surface_Work)

## Method of Least Squares

Least squares helps you to understand a model of the form:

$$
y = f(x,t) + \epsilon
$$

where:

$$
E(y) = \eta = f(x,t)
$$

is the mean level of the response $y$ which is affected by $k$ variables $(x_1, x_2, ..., x_k) = \mathbf{x}$

It also involves $p$ parameters $(t_1, t_2, ..., t_p) = \mathbf{t}$

$\epsilon$ is experimental error

To examine this model, experiments would be run at $n$ different sets of conditions, $\mathbf{x}_1, \mathbf{x}_2, ..., \mathbf{x}_n$

and the corresponding values of the response, $y_1, y_2, ..., y_n$, would then be observed

Two important questions:

1. Does the postulated model accurately represent the data?

2. If the model does accurately represent the data, what are the best estimates of the parameters $\mathbf{t}$?

Start with the second question.

-----

Given: function $f(x,t)$ for each experimental run

$n$ discrepancies:

$$
\{ y_1 - f(x_1,t) \}, \{ y_2 - f(x_2,t) \}, ..., \{ y_n - f(x_n,t) \}
$$

Method of least squares selects the value of $t$ that makes the sum of squares smallest:

$$
S(t) = \sum_{u=1}^{n} \left[ y_u - f \left( x_u, t \right) \right]^2
$$

$S(t)$ is the sum of squares function

Minimizing choice of $t$ is denoted

$$
\hat{t}
$$
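A minimal sketch of the sum of squares function, to make the definition concrete. The straight-line model `line` and the data are made up for illustration; they are assumptions, not part of the chapter.

```python
def sum_of_squares(f, xs, ys, t):
    """Return S(t) = sum over runs u of (y_u - f(x_u, t))**2."""
    return sum((y - f(x, t)) ** 2 for x, y in zip(xs, ys))


def line(x, t):
    """Hypothetical model f(x, t) = t[0] + t[1] * x, assumed for illustration."""
    return t[0] + t[1] * x


xs = [0.0, 1.0, 2.0]   # made-up experimental settings
ys = [1.0, 3.0, 5.0]   # made-up responses, lying exactly on y = 1 + 2x

s_exact = sum_of_squares(line, xs, ys, (1.0, 2.0))  # parameters that fit perfectly
s_off = sum_of_squares(line, xs, ys, (0.0, 2.0))    # intercept off by 1 at every run

print(s_exact, s_off)  # 0.0 3.0
```

The least-squares estimate is whichever `t` makes this function smallest; here that is `(1.0, 2.0)`, since a sum of squares cannot go below zero.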

are least-squares estimates of $t$ good?

Their goodness depends on the nature of the distribution of their errors

Least-squares estimates are appropriate if you can assume that experimental errors:

$$
\epsilon_u = y_u - \eta_u
$$

are statistically independent, have constant variance, and are normally distributed

these are "standard assumptions"

## Linear models

This is a special case, where

$$
\eta = f(x,t) = t_1 z_1 + t_2 z_2 + ... + t_p z_p
$$

adding experimental error $\epsilon = y - \eta$:

$$
y = t_1 z_1 + t_2 z_2 + ... + t_p z_p + \epsilon
$$

model of this form is linear in the parameters

### Algorithm

Formulate a problem with $n$ observed responses, $p$ parameters...

This yields $n$ equations of the form:

$$
\begin{array}{rcl}
y_1 &=& t_1 z_{11} + t_2 z_{21} + ... \\
y_2 &=& t_1 z_{12} + t_2 z_{22} + ...
\end{array}
$$

etc...

This can be written in matrix form:

$$
\mathbf{y} = \mathbf{Z t} + \boldsymbol{\epsilon}
$$

and the dimensions of each matrix are:

* $y = n \times 1$
* $Z = n \times p$
* $t = p \times 1$
* $\epsilon = n \times 1$

the sum of squares function is given by:

$$
S(\mathbf{t}) = \sum_{u=1}^{n} \left( y_u - t_1 z_{1u} - t_2 z_{2u} - ... - t_p z_{pu} \right)^2
$$

or,

$$
S(t) = ( y - Zt )^{\prime} ( y - Zt )
$$

setting the derivative of $S(t)$ with respect to $t$ to zero gives the normal equations:

$$
\mathbf{ Z^{\prime} Z t = Z^{\prime} y }
$$
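The normal equations can be solved directly with NumPy. This is a sketch on made-up data (the matrix `Z` and vector `y` below are assumptions for illustration), cross-checked against `np.linalg.lstsq`.

```python
import numpy as np

# Made-up data: n = 4 runs, p = 2 regressors (a column of ones and one variable).
Z = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 3.0, 2.0, 5.0])

# Solve the normal equations Z'Z t = Z'y for t.
t_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)

# Cross-check with NumPy's least-squares routine.
t_lstsq, *_ = np.linalg.lstsq(Z, y, rcond=None)

# The residual vector is perpendicular to every column of Z.
residual = y - Z @ t_hat

print(t_hat)            # approximately [1.1, 1.1]
print(Z.T @ residual)   # approximately [0, 0]
```

For larger or badly conditioned problems, `np.linalg.lstsq` is usually the safer choice, because forming $\mathbf{Z^{\prime} Z}$ explicitly squares the condition number.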

### Rank of Z

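If there are linear relationships between the different regressors (the $z$'s), then $\mathbf{Z}$ is not of full rank and $\mathbf{Z^{\prime} Z}$ becomes singular, so the normal equations have no unique solution

e.g. if there is a relationship $z_2 = c z_1$, then $\eta = t_1 z_1 + t_2 z_2 = (t_1 + c t_2) z_1$, so you can only estimate the linear combination $t_1 + c t_2$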

reason: when $z_2 = c z_1$, changes in $z_1$ can't be distinguished from changes in $z_2$

$Z$ (an $n \times p$ matrix) is said to be of full rank $p$ if there are no linear relationships (with coefficients $a_1, ..., a_p$ not all zero) of the form:

$$
a_1 z_1 + a_2 z_2 + ... + a_p z_p = 0
$$

if there are $q > 0$ independent linear relationships, then $Z$ has rank $p - q$
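A small numerical illustration of rank deficiency, using made-up regressors with $z_2 = c z_1$ (the values of `z1` and `c` are assumptions for illustration):

```python
import numpy as np

z1 = np.array([1.0, 2.0, 3.0, 4.0])   # made-up regressor
c = 2.0
z2 = c * z1                           # exact linear relationship z2 = c * z1
Z = np.column_stack([z1, z2])         # n = 4, p = 2

rank = np.linalg.matrix_rank(Z)             # p - q = 2 - 1 = 1
gram_rank = np.linalg.matrix_rank(Z.T @ Z)  # Z'Z is singular: rank 1, not 2

# Two different parameter vectors with the same t1 + c*t2 give the same eta,
# so the data cannot tell them apart.
eta_a = Z @ np.array([3.0, 0.0])   # t1 + c*t2 = 3
eta_b = Z @ np.array([1.0, 1.0])   # t1 + c*t2 = 3
same_eta = np.allclose(eta_a, eta_b)

print(rank, gram_rank, same_eta)   # 1 1 True
```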

## Analysis of Variance: 1 regressor

Assume simple model $y = \beta + \epsilon$

This states that $y$ is varying about an unknown mean $\beta$

Suppose we have 3 observations of $y$, $\mathbf{y} = (4, 1, 1)' $

Then the model can be written as $y = z_1 t + \epsilon$

and $z_1 = (1, 1, 1) '$

and $t = \beta$

so that

```
[ 4 ]   [ 1 ]     [ \epsilon_1 ]
[ 1 ] = [ 1 ] t + [ \epsilon_2 ]
[ 1 ]   [ 1 ]     [ \epsilon_3 ]
```

Suppose a value of the parameter $t$ was postulated in advance, e.g. $t_0 = 0.5$

Then you could check the null hypothesis, e.g. $H_0 : t = t_0 = 0.5$

If true, the mean observation vector is given by $\eta_0 = z_1 t_0$

or,

```
[ 0.5 ]   [ 1 ]
[ 0.5 ] = [ 1 ] 0.5
[ 0.5 ]   [ 1 ]
```

and the appropriate "observation breakdown" (the split of the deviation $y - \eta_0$ into a part explained by the fitted model and a residual part) is:

$$
y - \eta_0 = ( \hat{y} - \eta_0 ) + ( y - \hat{y} )
$$

Associated with this observation breakdown is an analysis of variance table:

| Source | Degrees of freedom (df) | Sum of squares (square of length), SS | Mean square, MS | Expected value of mean square, E(MS) |
|---|---|---|---|---|
| Model | 1 | $\vert \hat{y} - \eta_0 \vert^2 = ( \hat{t} - t_0 )^2 \sum z_1^2 = 6.75$ | 6.75 | $\sigma^2 + ( t - t_0 )^2 \sum z_1^2$ |
| Residual | 2 | $\vert y - \hat{y} \vert^2 = \sum ( y - \hat{t} z_1 )^2 = 6.00$ | 3.00 | $\sigma^2$ |
| Total | 3 | $\vert y - \eta_0 \vert^2 = \sum ( y - \eta_0 )^2 = 12.75$ | | |

Sum of squares: squared lengths of vectors

Degrees of freedom: number of dimensions in which vector can move (geometric interpretation)

The model $y = z_1 t + \epsilon$ says whatever the data is, the systematic part $\hat{y} - \eta_0 = ( \hat{t} - t_0) z_1$ of $y - \eta_0$ must lie in the direction of $z_1$, which gives $\hat{y} - \eta_0$ only one degree of freedom.

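Whatever the data, the residual vector $y - \hat{y}$ must be perpendicular to $z_1$, because the normal equation is exactly the condition $z_1^{\prime} ( y - \hat{y} ) = 0$; it is confined to the plane perpendicular to $z_1$, and so it can move in 2 directions and has 2 degrees of freedom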

Now, looking at the null hypothesis: 

The component $\vert \hat{y} - \eta_0 \vert^2 = ( \hat{t} - t_0 )^2 \sum z_1^2$ is a measure of discrepancy between POSTULATED model $\eta_0 = z_1 t_0$ and ESTIMATED model $\hat{y} = z_1 \hat{t}$

Making "standard assumptions" (earlier), expected value of sum of squares, assuming model is true, is $( t - t_0 )^2 \sum z_1^2 + \sigma^2$

For the residual component it is $2 \sigma^2$ (or, in general, $\nu_2 \sigma^2$, where $\nu_2$ is number of degrees of freedom of residuals)

Thus a measure of discrepancy from the null hypothesis $t = t_0$ is $F = \frac{ \vert \hat{y} - \eta_0 \vert^2 / 1 }{ \vert y - \hat{y} \vert^2 / 2 }$

if the null hypothesis were true, then the top and bottom would both estimate the same $\sigma^2$

If $t \neq t_0$, the top has expected value larger than $\sigma^2$, so an $F$ well above 1 indicates departure from null hypothesis

The MORE $F$ exceeds 1, the more doubtful the null hypothesis becomes
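The numbers in the analysis of variance table can be reproduced in a few lines. This sketch uses only the data given above ($y = (4, 1, 1)'$, $z_1 = (1, 1, 1)'$, $t_0 = 0.5$); the variable names are my own.

```python
y = [4.0, 1.0, 1.0]
z1 = [1.0, 1.0, 1.0]
t0 = 0.5                      # value postulated by the null hypothesis

sum_z1_sq = sum(z * z for z in z1)
# One-regressor least-squares estimate: t_hat = sum(z1*y) / sum(z1^2), here the mean of y.
t_hat = sum(z * v for z, v in zip(z1, y)) / sum_z1_sq

ss_model = (t_hat - t0) ** 2 * sum_z1_sq                      # |y_hat - eta_0|^2
ss_resid = sum((v - t_hat * z) ** 2 for v, z in zip(y, z1))   # |y - y_hat|^2
ss_total = sum((v - t0 * z) ** 2 for v, z in zip(y, z1))      # |y - eta_0|^2

# Ratio of mean squares: 1 df for the model, 2 df for the residual.
F = (ss_model / 1) / (ss_resid / 2)

print(t_hat, ss_model, ss_resid, ss_total, F)  # 2.0 6.75 6.0 12.75 2.25
```

The two components add up to the total (6.75 + 6.0 = 12.75), which is the Pythagorean relation between the perpendicular vectors $\hat{y} - \eta_0$ and $y - \hat{y}$.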

## Least squares: 2 regressors

Previous model, $y = \beta + \epsilon$, said $y$ was represented by a mean $\beta$ plus an error.

Instead, suppose that there are systematic deviations from the mean, associated with an external variable (e.g. humidity in the lab).

Now the equation is for a straight line: $y = \beta_0 + \beta_1 x + \epsilon$

or, $y = z_1 t_1 + z_2 t_2 + \epsilon$

So now the revised least-squares model is: $\eta = z_1 t_1 + z_2 t_2$

$\eta = E(y)$ - i.e. $\eta$ is in the plane defined by linear combinations of vectors $z_1, z_2$

because $z_1^{\prime} z_2 = \sum z_1 z_2 \neq 0$, these two vectors are NOT at right angles

The least-squares values $\hat{t_1}, \hat{t_2}$ produce a vector $\hat{\hat{y}} = z_1 \hat{t_1} + z_2 \hat{t_2}$

These least-squares values make the squared length $\sum ( y - \hat{\hat{y}} )^2 = \vert y - \hat{\hat{y}} \vert^2$ of the residual vector as small as possible

The normal equations express fact that residual vector must be perpendicular to both $z_1$ and $z_2$:

$$
\begin{array}{rcl}
z_1^{\prime} ( y - \hat{\hat{y}} ) &=& 0 \\
z_2^{\prime} ( y - \hat{\hat{y}} ) &=& 0
\end{array}
$$

also written as:

$$
\begin{array}{rcl}
\sum z_1 ( y - \hat{t_1} z_1 - \hat{t_2} z_2 ) &=& 0 \\
\sum z_2 ( y - \hat{t_1} z_1 - \hat{t_2} z_2 ) &=& 0
\end{array}
$$

also written (in matrix form) as:

$$
\mathbf{Z^{\prime}} ( \mathbf{y - Z \hat{t} } ) = 0
$$



Now suppose the null hypothesis was investigated for $t_1 = t_{10} = 0.5$ and $t_2 = t_{20} = 1.0$

Then the mean observation vector $\eta_0$ is represented as $\eta_0 = t_{10} z_1 + t_{20} z_2$

$$
y - \eta_0 = \left( \hat{\hat{y}} - \eta_0 \right) + \left( y - \hat{\hat{y}} \right)
$$

and so

$$
F_0 = \frac{ \vert \hat{\hat{y}} - \eta_0 \vert^2 / 2 }{ \vert y - \hat{\hat{y}} \vert^2 / 1 } = 2.23
$$
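A sketch of the two-regressor fit and the $F_0$ ratio. Important assumption: these notes never list the values of $z_2$. The vector `z2` below is my own choice, picked because it reproduces the numbers quoted in these notes ($\hat{z_2} = 0.2 z_1$, the fit $1.5 z_1 + 2.5 z_2$, and $F_0 = 2.23$); it may not match the original data exactly.

```python
import numpy as np

y = np.array([4.0, 1.0, 1.0])
z1 = np.array([1.0, 1.0, 1.0])
z2 = np.array([0.8, 0.2, -0.4])   # ASSUMED values, not given in the notes
Z = np.column_stack([z1, z2])

# Least-squares estimates from the normal equations Z'Z t = Z'y.
t_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)
y_fit = Z @ t_hat
resid = y - y_fit

# Normal equations: the residual is perpendicular to both z1 and z2.
perp = Z.T @ resid

# Null hypothesis t1 = 0.5, t2 = 1.0 gives the postulated mean vector eta0.
eta0 = Z @ np.array([0.5, 1.0])
F0 = (np.sum((y_fit - eta0) ** 2) / 2) / (np.sum(resid ** 2) / 1)

print(t_hat)         # approximately [1.5, 2.5]
print(perp)          # approximately [0, 0]
print(round(F0, 2))  # 2.23
```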

## Orthogonalizing second regressor

In the above example, $z_1$ and $z_2$ are not orthogonal

One can find the vectors $z_1$ and $z_{2 \cdot 1}$ that are orthogonal

To do this, use least squares property that residual vector is orthogonal to space in which the predictor variables lie

Regard $z_2$ as "response" vector and $z_1$ as predictor variable

You then obtain $\hat{z_2} = 0.2 z_1$, where the coefficient is the one-regressor least-squares estimate $\sum z_1 z_2 / \sum z_1^2 = 0.2$

so the residual vector is $z_{2 \cdot 1} = z_2 - \hat{z_2} = z_2 - 0.2 z_1$

now the model can be rewritten as $\eta = \left( t_1 + 0.2 t_2 \right) z_1 + t_2 \left( z_2 - 0.2 z_1 \right) = t z_1 + t_2 z_{2 \cdot 1}$

This gives three least-squares equations:

1. $\hat{y} = 2 z_1$
2. $\hat{y} = 1.5 z_1 + 2.5 z_2$
3. $\hat{y} = 2.0 z_1 + 2.5 z_{2 \cdot 1}$
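The orthogonalization step can be sketched numerically. As an assumption, `z2` below is a vector I chose so that the quoted results ($\hat{z_2} = 0.2 z_1$ and the coefficients 2.0 and 2.5) come out; the notes do not list the actual values.

```python
import numpy as np

y = np.array([4.0, 1.0, 1.0])
z1 = np.array([1.0, 1.0, 1.0])
z2 = np.array([0.8, 0.2, -0.4])   # ASSUMED values, not given in the notes

# Regress z2 on z1: the coefficient is sum(z1*z2) / sum(z1^2).
b = (z1 @ z2) / (z1 @ z1)
z2_1 = z2 - b * z1                # residual of z2 after removing z1

# z1 and z2_1 are orthogonal, so each coefficient can be found separately.
coef_z1 = (z1 @ y) / (z1 @ z1)
coef_z2_1 = (z2_1 @ y) / (z2_1 @ z2_1)

print(b)                   # approximately 0.2
print(z1 @ z2_1)           # approximately 0
print(coef_z1, coef_z2_1)  # approximately 2.0 and 2.5
```

Because the regressors are orthogonal, the coefficient of $z_1$ is the same 2.0 obtained when $z_1$ is fitted alone (equation 1), which is what makes the sum of squares split cleanly into separate pieces.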

The analysis of variance becomes:

---

| Source | DoF | Sum of squares (SS) |
|---|---|---|
| Response function with $z_1$ only | 1 | $\vert \hat{y} - \eta_0 \vert^2 = \left( \hat{t} - t_0 \right)^2 \sum z_1^2 = 12.0$ |
| Extra due to $z_2$ (given $z_1$) | 1 | $\vert \hat{\hat{y}} - \hat{y} \vert^2 = \hat{t}_2^2 \sum z_{2 \cdot 1}^2 = 4.5$ |
| Residual | 1 | $\vert y - \hat{\hat{y}} \vert^2 = \sum \left( y - \hat{\hat{y}} \right)^2 = 1.5$ |
| Total | 3 | $\vert y - \eta_0 \vert^2 = \sum \left( y - \eta_0 \right)^2 = 18.0$ |
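A sketch that reproduces this sequential breakdown. Two things are assumed here: the values of `z2` (not given in the notes, chosen to match the quoted results), and $\eta_0 = 0$, which is what the quoted totals imply since $\sum y^2 = 16 + 1 + 1 = 18$.

```python
import numpy as np

y = np.array([4.0, 1.0, 1.0])
z1 = np.array([1.0, 1.0, 1.0])
z2 = np.array([0.8, 0.2, -0.4])   # ASSUMED values, not given in the notes


def fitted(Z, y):
    """Least-squares fitted values Z @ t_hat for regressor matrix Z."""
    t_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return Z @ t_hat


y_hat1 = fitted(z1[:, None], y)                # model with z1 only
y_hat2 = fitted(np.column_stack([z1, z2]), y)  # model with z1 and z2

ss_z1 = np.sum(y_hat1 ** 2)               # |y_hat - eta_0|^2 with eta_0 = 0
ss_extra = np.sum((y_hat2 - y_hat1) ** 2) # extra due to z2, given z1
ss_resid = np.sum((y - y_hat2) ** 2)      # residual
ss_total = np.sum(y ** 2)                 # |y - eta_0|^2 with eta_0 = 0

print(ss_z1, ss_extra, ss_resid, ss_total)  # approximately 12.0 4.5 1.5 18.0
```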


## Generalization to p regressors

With n observations and p parameters:

n relations implicit in response function can be written 

$$
\boldsymbol{\eta} = \mathbf{Z t}
$$

Assuming $Z$ is full rank, and letting $\hat{\mathbf{t}}$ be the vector of estimates given by normal equations

$$
\left( \mathbf{ y - \hat{y} } \right)^{\prime} \mathbf{Z} = \left( y - Z \hat{t} \right)^{\prime} Z = 0
$$

Sum of squares function is $S(t) = (y - \eta)^{\prime} (y - \eta) = (y - \hat{y})^{\prime} (y - \hat{y}) + ( \hat{y} - \eta )^{\prime} (\hat{y} - \eta)$

The cross-product term $2 (y - \hat{y})^{\prime} (\hat{y} - \eta)$ is zero by the normal equations, since $\hat{y} - \eta = Z ( \hat{t} - t )$, so:

$$
S(t) = S(\hat{t}) + (\hat{t} - t)^{\prime} \mathbf{Z^{\prime} Z} ( \hat{t} - t )
$$

Furthermore, because $\mathbf{Z^{\prime} Z}$ is positive definite, $S(t)$ minimized when $t = \hat{t}$

So the solution to the normal equations producing the least squares estimate is the one where $t = \hat{t}$:

$$
\hat{t} = ( \mathbf{Z^{\prime} Z} )^{-1} \mathbf{Z^{\prime} y}
$$
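The decomposition $S(t) = S(\hat{t}) + (\hat{t} - t)^{\prime} \mathbf{Z^{\prime} Z} (\hat{t} - t)$ can be checked numerically. This is a sketch on randomly generated data (the seed, sizes, and data are arbitrary assumptions), not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 8, 3
Z = rng.normal(size=(n, p))   # made-up regressors, full rank with probability 1
y = rng.normal(size=n)        # made-up responses


def S(t):
    """Sum of squares function S(t) = (y - Zt)'(y - Zt)."""
    r = y - Z @ t
    return r @ r


# Least-squares estimate t_hat = (Z'Z)^{-1} Z'y, computed via a linear solve.
t_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)

t = rng.normal(size=p)        # any other parameter vector
d = t_hat - t
lhs = S(t)
rhs = S(t_hat) + d @ (Z.T @ Z) @ d

print(np.isclose(lhs, rhs))   # True
print(S(t) >= S(t_hat))       # True
```

The quadratic form `d @ (Z.T @ Z) @ d` equals `|Z d|^2`, which is never negative; that is why no `t` can give a smaller sum of squares than `t_hat`.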

----

| Source | DoF | Sum of squares (SS) |
|---|---|---|
| Response function | $p$ | $\vert \hat{y} - \eta \vert^2 = (\hat{t} - t)^{\prime} \mathbf{Z^{\prime} Z} ( \hat{t} - t )$ |
| Residual | $n - p$ | $\vert y - \hat{y} \vert^2 = \sum ( y - \hat{y} )^2$ |
| Total | $n$ | $\vert y - \eta \vert^2 = \sum ( y - \eta )^2$ |


## Bias in Least-Squares Estimators if Inadequate Model

Say data was being fit with a model $y = Z_1 t_1 + \epsilon$,

but the true model that should have been used is $y = Z_1 t_1 + Z_2 t_2 + \epsilon$

$t_1$ would be estimated by $\hat{t_1} = (\mathbf{ Z_1^{\prime} Z_1 } )^{-1} \mathbf{ Z_1^{\prime} y }$

but using true model, 

$$
\begin{array}{rcl}
E( \hat{t_1} ) &=& ( \mathbf{Z_1^{\prime} Z_1} )^{-1} \mathbf{Z_1^{\prime}} E(\mathbf{y}) \\
&=& ( \mathbf{ Z_1^{\prime} Z_1 } )^{-1} \mathbf{Z_1^{\prime}} (\mathbf{Z_1 t_1} + \mathbf{Z_2 t_2} ) \\
&=& \mathbf{t_1 + A t_2}
\end{array}
$$

The matrix A is the bias or alias matrix

$$
A = \left( \mathbf{ Z_1^{\prime} Z_1 } \right)^{-1} \mathbf{ Z_1^{\prime} Z_2 }
$$

Unless $A = 0$, $\hat{t_1}$ will represent $t_1$ AND $t_2$, not just $t_1$

$A = 0$ when $\mathbf{Z_1^{\prime} Z_2} = 0$, which happens if regressors in $\mathbf{Z_1}$ are orthogonal to regressors in $\mathbf{Z_2}$
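A small sketch of the alias matrix for a hypothetical case: a straight line is fitted, but the true model also has a quadratic term. The design points `x` and the parameter values are made up for illustration.

```python
import numpy as np

x = np.array([-1.0, 0.0, 1.0])                # made-up design points
Z1 = np.column_stack([np.ones_like(x), x])    # fitted model: intercept and slope
Z2 = (x ** 2)[:, None]                        # omitted regressor: x^2

# Alias matrix A = (Z1'Z1)^{-1} Z1'Z2.
A = np.linalg.solve(Z1.T @ Z1, Z1.T @ Z2)

# Hypothetical true parameters; E(y) is computed with no noise added.
t1 = np.array([1.0, 2.0])
t2 = np.array([3.0])
y_mean = Z1 @ t1 + Z2 @ t2

# Fitting the inadequate model to E(y) gives E(t1_hat) = t1 + A t2.
t1_hat_mean = np.linalg.solve(Z1.T @ Z1, Z1.T @ y_mean)

print(A.ravel())      # approximately [0.667, 0]
print(t1_hat_mean)    # approximately [3, 2]
```

In this example the $x$ column is orthogonal to the $x^2$ column, so the slope is unbiased, while the column of ones is not, so the intercept absorbs $(2/3) t_2$.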