#### <span style="color: yellow; ">Regression by linear combination of basis functions</span>

We are given the data points $({\bf x}_{1}, {\bf y}_{1}), ..., ({\bf x}_{N}, {\bf y}_{N})$ where ${\bf x}_{n} \in \mathbb{R}^{d_{in}}$ and ${\bf y}_{n} \in \mathbb{R}^{d_{out}}$. The task of regression is to estimate a function $f$ that takes ${\bf x}_{n}$ as its argument and returns ${\bf y}_{n}$.

To find $f$, we choose a set of basis functions ${\phi}_{0}, ..., {\phi}_{P-1}$ and model $f$ as a linear combination of them:
$$
{\hat f}({\bf x}_{n}) = {\sum}_{i = 0}^{P-1}{\phi}_{i}({\bf x}_{n})w_{i}
$$
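, where $w_{i}$ is the weight of the basis function ${\phi}_{i}$, to be optimized (for $d_{out} > 1$, each $w_{i}$ is a vector with $d_{out}$ components).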
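As a minimal sketch of the formula above, the prediction is a weighted sum of basis function values. The names `predict`, `basis_functions`, and `weights`, as well as the example values, are illustrative assumptions, and the output is taken to be scalar for simplicity.

```python
def predict(x, basis_functions, weights):
    """Evaluate f_hat(x) = sum_i phi_i(x) * w_i for a single input x."""
    return sum(phi(x) * w for phi, w in zip(basis_functions, weights))

# Hypothetical example: P = 3 basis functions on a 2-dimensional input.
basis_functions = [
    lambda x: 1.0,        # phi_0: constant
    lambda x: x[0],       # phi_1: first component
    lambda x: x[1] ** 2,  # phi_2: square of the second component
]
weights = [0.5, 2.0, -1.0]  # made-up weights w_0, w_1, w_2
x = [1.0, 3.0]
y_hat = predict(x, basis_functions, weights)  # 0.5*1 + 2.0*1 - 1.0*9
```

Only the weights are learned; the basis functions are fixed in advance, which is why the model stays linear in its parameters.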

#### <span style="color: yellow; ">Different Basis Functions</span>

We consider the following three basis functions:
* <span style="color: red; ">**Linear Basis Function**</span>
* <span style="color: red; ">**Polynomial Basis Function**</span>
* <span style="color: red; ">**Gaussian Basis Function**</span>

The simplest one is <span style="color: red; ">**Linear Basis Function**</span>:
$$
{\phi}_{i}({\bf x}_{n}) = \left\{\begin{array}{ll}
                                    1 & i = 0 \\
                                    {[{\bf x}_{n}]}_{i} & {\rm Otherwise}
                                 \end{array}
                           \right. \\
{\hat f}({\bf x}_{n}) = {\sum}^{d_{in}}_{i = 0}{\phi}_{i}({\bf x}_{n})w_{i} = w_{0}+{[{\bf x}_{n}]}_{1}w_{1}+...+{[{\bf x}_{n}]}_{d_{in}}w_{d_{in}}
$$
, where ${[{\bf x}_{n}]}_{i}$ denotes the $i$-th component of ${\bf x}_{n}$.
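The linear basis can be sketched in a few lines of NumPy. The function name `linear_basis` and the numbers below are assumptions made for illustration only.

```python
import numpy as np

def linear_basis(x):
    """Return [phi_0(x), ..., phi_{d_in}(x)] = [1, x_1, ..., x_{d_in}]."""
    x = np.asarray(x, dtype=float)
    return np.concatenate(([1.0], x))

x_n = np.array([2.0, -1.0, 4.0])      # d_in = 3
w = np.array([0.5, 1.0, 2.0, -0.25])  # made-up weights w_0, ..., w_3
phi = linear_basis(x_n)               # [1, 2, -1, 4]
y_hat = phi @ w                       # w_0 + x_1 w_1 + x_2 w_2 + x_3 w_3
```

With this basis, $P = d_{in} + 1$ and the model reduces to ordinary linear regression with a bias term $w_{0}$.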

Another possible choice of basis functions is the <span style="color: red; ">**Polynomial Basis Function**</span>. The multidimensional case is more complicated, because $i$ has to be a multi-index $i = (i_{1}, ..., i_{d_{in}})$ as follows:
$$
{\phi}_{i}({\bf x}_{n}) = {\prod}_{j=1}^{d_{in}} {({[{\bf x}_{n}]}_{j})}^{{i}_{j}} \\
{\hat f}({\bf x}_{n}) = {\sum}_{i=(i_{1}, ..., i_{d_{in}})}{\phi}_{i}({\bf x}_{n})w_{i} \\
=w_{(0, ..., 0)} 
+ {[{\bf x}_{n}]}_{1}w_{(1, 0, ..., 0)}+...+{[{\bf x}_{n}]}_{d_{in}}w_{(0, ..., 0, 1)} 
+ {({[{\bf x}_{n}]}_{1})}^{2}w_{(2, 0, ..., 0)} + {[{\bf x}_{n}]}_{1}{[{\bf x}_{n}]}_{2}w_{(1, 1, 0, ..., 0)} + {[{\bf x}_{n}]}_{1}{[{\bf x}_{n}]}_{3}w_{(1, 0, 1, 0, ..., 0)} + ...
$$

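Here we introduce $d_{p}$ to denote the order of this polynomial, i.e. the largest total degree among its basis functions. The sum above therefore runs over all multi-indices $i$ that satisfy

$$
{\sum}_{j=1}^{d_{in}} {i}_{j} \le d_{p}
$$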
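The multi-index bookkeeping is easier to see in code. The sketch below assumes that $d_{p}$ is the maximum total degree, so that every multi-index whose components sum to at most $d_{p}$ is included (this matches the expansion above, which contains constant, linear, and quadratic terms). The names `multi_indices` and `polynomial_basis` are hypothetical helpers, not part of any library.

```python
from itertools import product
import numpy as np

def multi_indices(d_in, d_p):
    """All multi-indices (i_1, ..., i_{d_in}) with i_1 + ... + i_{d_in} <= d_p."""
    return [idx for idx in product(range(d_p + 1), repeat=d_in) if sum(idx) <= d_p]

def polynomial_basis(x, d_p):
    """Evaluate phi_i(x) = prod_j x_j ** i_j for every multi-index i."""
    x = np.asarray(x, dtype=float)
    return np.array([np.prod(x ** np.array(idx)) for idx in multi_indices(len(x), d_p)])

indices = multi_indices(d_in=2, d_p=2)
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (2, 0)]
features = polynomial_basis([2.0, 3.0], d_p=2)
# one value per multi-index: 1, x_2, x_2^2, x_1, x_1 x_2, x_1^2
```

Enumerating with `itertools.product` and filtering is simple but wasteful for large $d_{in}$ or $d_{p}$; it is used here only to keep the sketch short.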

The last basis we consider is <span style="color: red; ">**Gaussian Basis Function**</span>:
$$
{\phi}_{i}({\bf x}_{n}) = \left\{\begin{array}{ll}
                                    1 & i = 0 \\
                                    e^{-{||{\bf x}_{n}-{\bf x}_{i}||}^{2}/(2{\sigma}^{2})} & {\rm Otherwise}
                                 \end{array}
                           \right.
$$
, where ${\sigma}$ is a pre-set width parameter (${\sigma}^{2}$ is the variance). In general the center of each Gaussian can be anywhere in the input space; here we place the centers at the data points ${\bf x}_{i}$, so that $P = N + 1$. Hence,
$$
{\hat f}({\bf x}_{n}) = {\sum}^{N}_{i = 0}{\phi}_{i}({\bf x}_{n})w_{i} = w_{0}+e^{-{||{\bf x}_{n}-{\bf x}_{1}||}^{2}/(2{\sigma}^{2})}w_{1}+...+e^{-{||{\bf x}_{n}-{\bf x}_{N}||}^{2}/(2{\sigma}^{2})}w_{N}
$$
As before, ${\phi}_{0}$ is still a constant function ${\phi}_{0}({\bf x}_{n}) = 1$.
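A minimal sketch of the Gaussian basis follows, assuming the centers are passed in as rows of an array (here the data points themselves). The name `gaussian_basis`, the argument `centers`, and the sample data are assumptions for illustration.

```python
import numpy as np

def gaussian_basis(x, centers, sigma):
    """Return [1, exp(-||x - c_1||^2 / (2 sigma^2)), ..., exp(-||x - c_K||^2 / (2 sigma^2))]."""
    x = np.asarray(x, dtype=float)
    centers = np.asarray(centers, dtype=float)
    sq_dist = np.sum((centers - x) ** 2, axis=1)  # squared distance to each center
    return np.concatenate(([1.0], np.exp(-sq_dist / (2.0 * sigma ** 2))))

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])  # made-up data, also used as centers
phi = gaussian_basis(X[0], centers=X, sigma=1.0)
# phi[0] is the constant term; phi[1] equals 1 because X[0] coincides with the first center
```

Each basis function responds most strongly near its own center, and ${\sigma}$ controls how quickly that response decays with distance.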

#### <span style="color: yellow; ">Solution</span>

To simplify the loss function, we now introduce the weight vector:
$$
{\bf w} = {[w_{0}, w_{1}, ..., w_{P-1}]}^{T}
$$
and the matrices:
$$
{\bf Y} = {[{\bf y}_{1}, {\bf y}_{2}, ..., {\bf y}_{N}]}^{T}
\\
\\
{\bf \Phi} = 
    \left[\begin{array}{ccccc}
        {\phi}_{0}({\bf x}_{1}) & {\phi}_{1}({\bf x}_{1}) & {\phi}_{2}({\bf x}_{1}) & \cdots & {\phi}_{P-1}({\bf x}_{1}) \\
        {\phi}_{0}({\bf x}_{2}) & {\phi}_{1}({\bf x}_{2}) & {\phi}_{2}({\bf x}_{2}) & \cdots & {\phi}_{P-1}({\bf x}_{2}) \\
        \vdots & \vdots & \vdots & \ddots & \vdots \\
        {\phi}_{0}({\bf x}_{N}) & {\phi}_{1}({\bf x}_{N}) & {\phi}_{2}({\bf x}_{N}) & \cdots & {\phi}_{P-1}({\bf x}_{N}) \\
    \end{array}\right]
$$
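The latter matrix ${\bf \Phi}$ is called the <span style="color: red; ">**Design Matrix**</span>. It has one row per data point and one column per basis function, so its shape is $N \times P$.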
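Building ${\bf \Phi}$ amounts to evaluating every basis function on every data point and stacking the results row by row. The sketch below uses the linear basis as an example; `design_matrix` and `linear_basis` are hypothetical helper names, and the data are made up.

```python
import numpy as np

def linear_basis(x):
    """[1, x_1, ..., x_{d_in}] for a single input vector x."""
    return np.concatenate(([1.0], x))

def design_matrix(X, basis):
    """Stack basis(x_n) row by row, so that Phi[n, i] = phi_i(x_n)."""
    return np.vstack([basis(x) for x in X])

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # N = 3, d_in = 2
Phi = design_matrix(X, linear_basis)                # shape (N, P) = (3, 3)
```

Any function that maps one input vector to a length-$P$ array can be passed as `basis`, so the same helper would work for the polynomial and Gaussian cases.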

We define <span style="color: red; ">**a loss function**</span> as follows:
$$
{\rm Loss}({\bf w}) = \frac{1}{N}({||{\bf Y} - {\bf \Phi w}||}^{2} + \lambda {||{\bf w}||}^{2})
$$
, where $\lambda {||{\bf w}||}^{2}$ is an L2 regularization term.
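The loss can be evaluated directly from its definition. This is a sketch for scalar outputs ($d_{out} = 1$); the function name `ridge_loss` and the toy data are assumptions.

```python
import numpy as np

def ridge_loss(w, Phi, Y, lam):
    """Loss(w) = (||Y - Phi w||^2 + lam * ||w||^2) / N."""
    residual = Y - Phi @ w
    return (np.sum(residual ** 2) + lam * np.sum(w ** 2)) / Phi.shape[0]

# Toy data lying exactly on y = 1 + 2x, with a linear basis [1, x].
Phi = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
Y = np.array([1.0, 3.0, 5.0])
w = np.array([1.0, 2.0])

loss_exact_fit = ridge_loss(w, Phi, Y, lam=0.0)  # residual is zero, no penalty
loss_penalized = ridge_loss(w, Phi, Y, lam=0.3)  # only the penalty 0.3 * 5 / 3 remains
```

Even a perfect fit has non-zero loss once $\lambda > 0$, which is what pushes the optimizer toward smaller weights.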


Setting the gradient of the loss with respect to ${\bf w}$ to zero gives the optimal weight vector ${\bf w}^{*}$ (the one that minimizes the loss) in closed form:
$$
{\bf w}^{*} = {({\bf \Phi}^{T}{\bf \Phi} + \lambda {\bf I})}^{-1}{\bf \Phi}^{T}{\bf Y}
$$
, where ${\bf I}$ is the $P \times P$ identity matrix.
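The closed-form solution translates almost literally into NumPy. The sketch below solves the linear system with `np.linalg.solve` rather than forming the inverse explicitly, which is a design choice made here for numerical stability and not something the formula requires. The function name `fit_ridge` and the toy data are assumptions.

```python
import numpy as np

def fit_ridge(Phi, Y, lam):
    """Solve (Phi^T Phi + lam * I) w = Phi^T Y for w."""
    P = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(P), Phi.T @ Y)

# Toy data lying exactly on y = 1 + 2x, with a linear basis [1, x].
Phi = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
Y = np.array([1.0, 3.0, 5.0])

w_star = fit_ridge(Phi, Y, lam=0.0)  # no regularization: recovers intercept 1 and slope 2
w_reg = fit_ridge(Phi, Y, lam=1.0)   # regularization shrinks the weights
```

Because `Y` may also be an $N \times d_{out}$ matrix, the same call handles multi-dimensional outputs and then returns a $P \times d_{out}$ weight matrix.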