# Extreme Learning Machine

* Huang et al., 2004
* Single Hidden Layer Feedforward Neural Networks

Let the **input data** be $X = [x_1, x_2, \cdots, x_N]^T \in \mathbb{R}^{N\times M}$ with $x_i \in \mathbb{R}^M$,

where $N$ is the number of samples and $M$ is the number of features.

Let $L$ be the number of **hidden nodes**,

$D$ be the number of **outputs**,

and $\beta = [\beta_1, \beta_2, \cdots, \beta_L]^T \in \mathbb{R}^{L\times D}$ be the **weights** between the hidden nodes and the outputs.

In [1]:
import numpy as np
import pandas as pd

Activation function, where $w$ and $c$ are the hidden-node parameters:
$$h(x) = g(x, w, c)$$

* Sigmoid Function
$$g(x, w, c) = \frac{1}{1 + e^{-(wx + c)}}$$

* Gaussian Function 
$$g(x, w, c) = e^{-c\|x-w\|}$$

* Hyperbolic Tangent Function
$$g(x, w, c) = \frac{1 - e^{-(wx+c)}}{1 + e^{-(wx+c)}}$$

In [2]:
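def sigmoid(x, w, c):
    # x: (N, M), w: (M, L), c: (L,) -> output of shape (N, L)
    return 1 / (1 + np.exp(-(np.dot(x, w) + c)))

def gaussian(x, w, c):
    # Distance between every sample x_i and every center w_j (column j of w)
    dist = np.linalg.norm(x[:, :, None] - w[None, :, :], axis=1)
    return np.exp(-c * dist)

def hyperbolic_tangent(x, w, c):
    z = np.dot(x, w) + c
    return (1 - np.exp(-z)) / (1 + np.exp(-z))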
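
The ratio used for the hyperbolic tangent above is algebraically $\tanh\big((wx + c)/2\big)$ rather than $\tanh(wx + c)$. The sketch below checks this numerically; the seed, the array sizes, and the name `matches_half` are assumptions made only for illustration.

```python
import numpy as np

def hyperbolic_tangent(x, w, c):
    # Same expression as the formula above, with z = x @ w + c
    z = np.dot(x, w) + c
    return (1 - np.exp(-z)) / (1 + np.exp(-z))

rng = np.random.default_rng(0)   # seed chosen only for reproducibility
x = rng.random((4, 3))           # 4 samples, 3 features
w = rng.random((3, 2))           # 2 hidden nodes
c = rng.random(2)

z = x @ w + c
# The ratio form agrees with tanh(z / 2), not with tanh(z)
matches_half = np.allclose(hyperbolic_tangent(x, w, c), np.tanh(z / 2))
```

If the standard $\tanh(wx + c)$ is wanted instead, `np.tanh(np.dot(x, w) + c)` is a drop-in alternative.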

### Model
\begin{align*} 
f(x) &=  \sum_{i=1}^L \beta_i h_i(x) \\ 
 &=  h(x)\beta
\end{align*}

where $h(x) = [h_1(x), \cdots, h_L(x)]$ is the row vector of hidden-node outputs for the sample $x$.
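
As a quick sanity check, the sum over hidden nodes and the matrix product give the same output for a single sample. This is a minimal sketch with random stand-in values; `f_sum` and `f_mat` are names introduced here, not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
L, D = 3, 2
h = rng.random(L)           # h(x): hidden-layer output for one sample, shape (L,)
beta = rng.random((L, D))   # output weights, shape (L, D)

f_sum = sum(h[i] * beta[i] for i in range(L))   # sum_i beta_i h_i(x)
f_mat = h @ beta                                # h(x) beta
```

Both results have shape `(D,)`, one value per output.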

Let $H$ be the hidden-layer output matrix, i.e. the activation function evaluated for every sample and every hidden node.

$$H = \begin{bmatrix}
h(x_1) \\
\vdots \\
h(x_N)
\end{bmatrix} = 
\begin{bmatrix}
h_1(x_1) & \cdots & h_L(x_1) \\
\vdots & \ddots & \vdots \\
h_1(x_N) & \cdots & h_L(x_N)
\end{bmatrix}$$

$$H =  
\begin{bmatrix}
g(x_1, w_1, c_1) & \cdots & g(x_1, w_L, c_L) \\
\vdots & \ddots & \vdots \\
g(x_N, w_1, c_1) & \cdots & g(x_N, w_L, c_L)
\end{bmatrix}_{N\times L}$$

Let $Y \in \mathbb{R}^{N\times D}$ be the target output matrix.

$$Y = \begin{bmatrix}
y_1 \\
\vdots \\
y_N
\end{bmatrix}
= 
\begin{bmatrix}
y_{11} & \cdots & y_{1D} \\
\vdots & \ddots & \vdots \\
y_{N1} & \cdots & y_{ND} 
\end{bmatrix}$$

In [4]:
x = np.random.rand(10, 5)

In [5]:
L = 2
M = x.shape[1]
w = np.random.rand(M, L)
c = np.random.rand(L)

In [6]:
x

array([[0.52880439, 0.62104455, 0.49020939, 0.18085367, 0.10973375],
       [0.77526985, 0.98912798, 0.48233457, 0.60777305, 0.67303469],
       [0.68950423, 0.89160714, 0.76218975, 0.63303612, 0.58240395],
       [0.71823948, 0.37377613, 0.08544905, 0.76592703, 0.54669678],
       [0.38025592, 0.63391827, 0.24396701, 0.45224742, 0.70760141],
       [0.51906768, 0.0904439 , 0.33081198, 0.40636845, 0.26857771],
       [0.55449637, 0.64057992, 0.14586423, 0.90333676, 0.2118925 ],
       [0.27423213, 0.81481853, 0.14643424, 0.24187729, 0.41078051],
       [0.17015122, 0.01211769, 0.67614511, 0.28027325, 0.27888769],
       [0.08776666, 0.15650351, 0.82047842, 0.35517723, 0.35338935]])

In [7]:
w

array([[0.19104774, 0.99525208],
       [0.74878526, 0.33605633],
       [0.96573321, 0.8351068 ],
       [0.79187485, 0.14651824],
       [0.48353287, 0.93783397]])

In [8]:
c

array([0.30861778, 0.09730444])

In [9]:
np.dot(x, w) + c

array([[1.54435851, 1.37109167],
       [2.46989688, 2.3243416 ],
       [2.62693708, 2.35862458],
       [1.67909907, 1.63403579],
       [1.79181242, 1.62240084],
       [1.24664277, 1.2319869 ],
       [1.85286245, 1.31732655],
       [1.50271215, 1.18703105],
       [1.35996682, 1.1379889 ],
       [1.68706747, 1.30589598]])

In [10]:
sigmoid(x, w, c)

array([[0.82409743, 0.79755647],
       [0.92200435, 0.91087304],
       [0.93257521, 0.91361732],
       [0.8427852 , 0.83672175],
       [0.85714934, 0.83512597],
       [0.77671817, 0.77416614],
       [0.86446284, 0.78873657],
       [0.81797864, 0.76620965],
       [0.79575431, 0.75731021],
       [0.84383811, 0.78682559]])

In [3]:
def H(x, activate, L):
    M = x.shape[1]
    # Random hidden-node parameters: w has shape (M, L), c has shape (L,)
    w = np.random.normal(size=(M, L))
    c = np.random.rand(L)
    return activate(x, w, c)
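
The hidden-node parameters are drawn once and must be reused when the model is applied to new data. One way to sketch that, assuming a hypothetical helper `hidden_layer` that also returns `w` and `c` and takes a seeded generator for reproducibility:

```python
import numpy as np

def sigmoid(x, w, c):
    return 1 / (1 + np.exp(-(np.dot(x, w) + c)))

def hidden_layer(x, activate, L, rng):
    # Hypothetical helper: draws w (M, L) and c (L,) and returns them with H
    M = x.shape[1]
    w = rng.normal(size=(M, L))
    c = rng.random(L)
    return activate(x, w, c), w, c

rng = np.random.default_rng(42)   # seed is an arbitrary choice
x = rng.random((10, 5))
H_train, w, c = hidden_layer(x, sigmoid, L=2, rng=rng)

x_new = rng.random((3, 5))
H_new = sigmoid(x_new, w, c)      # same w and c, applied to unseen samples
```

Returning `w` and `c` is a design choice for this sketch; storing them on an object would work equally well.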

### Objective
$$\underset{\beta}{\mathrm{min}} \|H\beta - Y\|^2$$

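Setting the gradient with respect to $\beta$ to zero gives the least-squares solution

$$\beta = H^\dagger Y$$

where $H^\dagger$ is the Moore-Penrose pseudoinverse of $H$. When $H$ has full column rank,
$$H^\dagger = (H^TH)^{-1}H^T$$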
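
A minimal numerical check of this solution, using a random stand-in for $H$ (which has full column rank with probability one). It compares `np.linalg.pinv`, the explicit normal-equation formula, and `np.linalg.lstsq`; the variable names are assumptions for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.random((10, 2))   # stand-in hidden-layer output, N = 10, L = 2
Y = rng.random((10, 1))   # stand-in targets, D = 1

beta_pinv = np.linalg.pinv(H) @ Y                     # beta = H^dagger Y
beta_normal = np.linalg.inv(H.T @ H) @ H.T @ Y        # (H^T H)^{-1} H^T Y
beta_lstsq, *_ = np.linalg.lstsq(H, Y, rcond=None)    # least-squares solver
```

All three agree here; in practice `pinv` or `lstsq` is usually preferred because they remain usable when $H^TH$ is close to singular.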

#### Regularized Model

$$\underset{\beta}{\mathrm{min}} \frac{C}{2}\|H\beta - Y\|^2 + \frac{1}{2}\|\beta\|^2$$

where $C$ is a hyperparameter that weights the fitting error against the size of $\beta$.

\begin{align*} 
\nabla_{\beta}\left(\frac{C}{2}\|H\beta - Y\|^2 + \frac{1}{2}\|\beta\|^2\right) & = 0\\\\
CH^T(H\beta - Y) + \beta & = 0\\\\
CH^TH\beta - CH^TY + \beta & = 0\\\\
(CH^TH + I)\beta & = CH^TY \\\\
\beta & = (H^TH + \frac{I}{C})^{-1}H^TY
\end{align*} 
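
The closed form can be checked by plugging it back into the stationarity condition $CH^T(H\beta - Y) + \beta = 0$. The sketch below does this on random stand-in data; `ridge_beta` is a hypothetical helper name, and it uses `np.linalg.solve` instead of an explicit inverse.

```python
import numpy as np

def ridge_beta(H, Y, C):
    # Solves (H^T H + I / C) beta = H^T Y without forming the inverse
    L = H.shape[1]
    return np.linalg.solve(H.T @ H + np.eye(L) / C, H.T @ Y)

rng = np.random.default_rng(0)
H = rng.random((10, 2))   # stand-in hidden-layer output
Y = rng.random((10, 1))   # stand-in targets
C = 1.0

beta = ridge_beta(H, Y, C)
grad = C * H.T @ (H @ beta - Y) + beta   # should vanish at the minimizer

# For very large C the penalty becomes negligible and the
# unregularized least-squares solution is recovered.
beta_large_C = ridge_beta(H, Y, 1e12)
beta_ls = np.linalg.pinv(H) @ Y
```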

In [26]:
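C = 1                        # regularization hyperparameter
I = np.eye(L)                # L x L identity
H = sigmoid(x, w, c)         # hidden-layer output, shape (N, L); rebinds the name H
Y = np.random.rand(10, 1)    # random targets, for illustration only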

In [27]:
I

array([[1., 0.],
       [0., 1.]])

In [28]:
Y

array([[0.90609474],
       [0.212788  ],
       [0.01731089],
       [0.04339955],
       [0.67732475],
       [0.17360557],
       [0.81263833],
       [0.73581482],
       [0.48544909],
       [0.29274417]])

In [29]:
# Solve (H^T H + I/C) Beta = H^T Y directly instead of forming the inverse
Beta = np.linalg.solve(H.T @ H + I/C, H.T @ Y)

In [30]:
Beta

array([[0.26614512],
       [0.21074853]])

In [31]:
H @ Beta

array([[0.38741336],
       [0.43735211],
       [0.44074385],
       [0.40064105],
       [0.40412769],
       [0.36987413],
       [0.39629764],
       [0.37917858],
       [0.37138814],
       [0.39040573]])
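
Putting the pieces together, the whole procedure fits in two small functions. This is a sketch under stated assumptions rather than a reference implementation: `elm_fit` and `elm_predict` are hypothetical names, the sine data is a toy example, and the values of `L` and `C` are arbitrary.

```python
import numpy as np

def sigmoid(x, w, c):
    return 1 / (1 + np.exp(-(np.dot(x, w) + c)))

def elm_fit(x, y, L, C, rng):
    # Draw random hidden-node parameters, then solve for beta in closed form
    M = x.shape[1]
    w = rng.normal(size=(M, L))
    c = rng.random(L)
    H = sigmoid(x, w, c)
    beta = np.linalg.solve(H.T @ H + np.eye(L) / C, H.T @ y)
    return w, c, beta

def elm_predict(x, w, c, beta):
    # Reuse the stored w and c, then apply the learned output weights
    return sigmoid(x, w, c) @ beta

rng = np.random.default_rng(0)
x_train = rng.random((200, 1))            # toy inputs in [0, 1)
y_train = np.sin(2 * np.pi * x_train)     # toy targets

w, c, beta = elm_fit(x_train, y_train, L=20, C=100.0, rng=rng)
y_hat = elm_predict(x_train, w, c, beta)
mse = np.mean((y_hat - y_train) ** 2)     # training error
```

Only `beta` is learned; `w` and `c` stay at their random initial values, which is why training reduces to a single linear solve.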