<center><h1>Extreme Learning Machines (ELM)</h1></center>
<center><h2>Objective: Train a single-hidden-layer MLP using ELM, without backpropagation.</h2></center>


<h2> Introduction </h2>

Extreme Learning Machines (ELMs) are single-hidden-layer feedforward neural networks (SLFNs) that can be trained much faster than networks that rely on gradient-based learning. An ELM looks like a classical one-hidden-layer neural network, but its hidden-layer parameters are assigned randomly and never updated; only the output weights are learned, in a single closed-form least-squares step. Because there is no iterative tuning, training is fast, and ELMs are reported to generalize as well as, and on some tasks better than, networks trained with backpropagation.

ELMs are based on the __Universal Approximation Theorem__, which states that:

“A feed-forward network with a single hidden layer containing a finite number of neurons can approximate continuous functions on compact subsets of $R^n$, under mild assumptions on the activation function.”

This means that an ELM can solve classification and regression tasks with good accuracy, provided it has enough hidden neurons and enough training data to fit the output weights for all of them.

To understand how an ELM works, consider the illustration below and the steps for building the model.

<img src="ELM.png">

So, given the following:
- Training set $$ \{(x_i,t_i) \mid x_i \in R^d,\ t_i \in R^m,\ i=1,2,\ldots,N\}, $$
- Hidden node output function $H(w, b, x)$,
- Number of hidden nodes $L$.


We can implement ELM in three simple steps:
- Randomly assign the parameters of the hidden nodes (__w__, b)
- Compute the hidden layer output matrix __H__
- Compute the output weights __β__.
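
In symbols, the three steps reduce to one linear least-squares problem. The formulation below is the usual way ELM is written, using the notation above; the target matrix $T$ and the Moore-Penrose pseudoinverse $\mathbf{H}^{\dagger}$ are symbols introduced here for illustration.

$$ \mathbf{H} = \begin{bmatrix} H(w_1, b_1, x_1) & \cdots & H(w_L, b_L, x_1) \\ \vdots & \ddots & \vdots \\ H(w_1, b_1, x_N) & \cdots & H(w_L, b_L, x_N) \end{bmatrix} \in R^{N \times L}, \qquad T = \begin{bmatrix} t_1^{\top} \\ \vdots \\ t_N^{\top} \end{bmatrix} \in R^{N \times m} $$

$$ \beta = \mathbf{H}^{\dagger} T \in R^{L \times m}, \qquad \text{the minimum-norm solution of } \min_{\beta} \lVert \mathbf{H}\beta - T \rVert_F $$

Once $(w_j, b_j)$ are fixed at random, $\mathbf{H}$ is a constant matrix, so the only unknown is $\beta$ and no iterative training is needed.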


To initialize our network, we need to identify the following:
- The size of the input layer, which is the number of input features
- Number of hidden neurons
- Input to hidden weights
- Hidden layer activation function
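
Before turning to real data, the three steps and the four ingredients listed above can be sketched end to end on a toy regression problem. This is a minimal sketch, not the notebook's implementation: the helper names `fit_elm` and `predict_elm`, the target function `sin(3x)`, the hidden size of 50, and the fixed seed are all assumptions chosen for illustration.

```python
import numpy as np

def relu(x):
    # Hidden-layer activation function
    return np.maximum(x, 0)

def fit_elm(X, T, hidden_size, rng):
    """Fit a single-hidden-layer ELM and return (W, b, beta)."""
    # Step 1: randomly assign the hidden-node parameters (never updated)
    W = rng.normal(size=(X.shape[1], hidden_size))
    b = rng.normal(size=hidden_size)
    # Step 2: compute the hidden-layer output matrix H, shape (N, L)
    H = relu(X @ W + b)
    # Step 3: compute the output weights with the pseudoinverse of H
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def predict_elm(X, W, b, beta):
    # Forward pass: hidden activations followed by the linear output layer
    return relu(X @ W + b) @ beta

# Toy data (assumed for illustration): 200 points of sin(3x) on [-1, 1]
rng = np.random.default_rng(0)
X = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
T = np.sin(3.0 * X)

W, b, beta = fit_elm(X, T, hidden_size=50, rng=rng)
train_mse = float(np.mean((predict_elm(X, W, b, beta) - T) ** 2))
baseline_mse = float(np.mean((T - T.mean()) ** 2))  # error of predicting the mean
print("ELM MSE:", train_mse, "baseline MSE:", baseline_mse)
```

The only fitted quantity is `beta`; `W` and `b` keep their random initial values, which is what distinguishes an ELM from a network trained with backpropagation.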

In [45]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder,MinMaxScaler

## ELM on MNIST dataset

In [46]:
train = pd.read_csv('mnist_train/mnist_train.csv')
test = pd.read_csv('mnist_test/mnist_test.csv')

In [48]:
train.head(10)

Unnamed: 0,label,1x1,1x2,1x3,1x4,1x5,1x6,1x7,1x8,1x9,...,28x19,28x20,28x21,28x22,28x23,28x24,28x25,28x26,28x27,28x28
0,5,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,4,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,9,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,2,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,3,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,4,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [49]:
X_train = train.values[:,1:]
y_train = train.values[:,0]
X_test = test.values[:,1:]
y_test = test.values[:,0]

In [50]:
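# Reshape the labels into a column vector, the 2-D shape OneHotEncoder expects
# (np.matrix is discouraged by NumPy and rejected by recent scikit-learn releases)
y_train = y_train.reshape(-1, 1)
y_test = y_test.reshape(-1, 1)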

In [51]:
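onehotencoder = OneHotEncoder(categories='auto')
scaler = MinMaxScaler()
# Fit the scaler and the encoder on the training data only, then reuse the
# fitted transforms on the test data so that both splits share one mapping
X_train = scaler.fit_transform(X_train)
y_train = onehotencoder.fit_transform(y_train).toarray()
X_test = scaler.transform(X_test)
y_test = onehotencoder.transform(y_test).toarray()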

### ELM building

In [52]:
input_size = X_train.shape[1]

In [53]:
hidden_size = 1000

In [54]:
input_weights = np.random.normal(size=[input_size,hidden_size])
biases = np.random.normal(size=[hidden_size])

In [55]:
def relu(x):    ### Activation Function
    return np.maximum(x, 0)

In [56]:
def hidden_nodes(X):
    G = np.dot(X, input_weights)
    G = G + biases
    H = relu(G)
    return H

In [57]:
# Solve for the output weights in one step: beta = pinv(H) . T
output_weights = np.dot(np.linalg.pinv(hidden_nodes(X_train)), y_train)
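
Multiplying the targets by the pseudoinverse of the hidden-layer matrix is a least-squares fit. The following check on small random matrices is a sketch of that equivalence; the matrices `H` and `T` are stand-ins with assumed shapes, not the MNIST data.

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(200, 20))  # stand-in for hidden_nodes(X_train)
T = rng.normal(size=(200, 3))   # stand-in for the one-hot y_train

# Output weights via the pseudoinverse, as in the cell above
beta_pinv = np.linalg.pinv(H) @ T

# The same problem handed to NumPy's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(H, T, rcond=None)

# Both routes should agree up to floating-point error
print(np.allclose(beta_pinv, beta_lstsq))

# At a least-squares solution the residual is orthogonal to the columns of H
grad = H.T @ (H @ beta_pinv - T)
print(np.abs(grad).max())
```

For large hidden layers, `np.linalg.lstsq` avoids forming the pseudoinverse explicitly, which can be a reasonable alternative when memory is tight.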

In [58]:
def predict(X):
    out = hidden_nodes(X)
    out = np.dot(out, output_weights)
    return out

In [59]:
prediction = predict(X_test)
correct = 0
total = X_test.shape[0]
for i in range(total):
    predicted = np.argmax(prediction[i])
    actual = np.argmax(y_test[i])
    correct += 1 if predicted == actual else 0
accuracy = correct/total
print('Accuracy for ', hidden_size, ' hidden nodes: ', accuracy)

Accuracy for  1000  hidden nodes:  0.9459
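
The accuracy loop above can also be written without an explicit Python loop. The helper below is a sketch with a hypothetical name, `accuracy_from_scores`, and the small arrays are made-up values used only to exercise it.

```python
import numpy as np

def accuracy_from_scores(scores, onehot_targets):
    """Fraction of rows whose highest score matches the one-hot target."""
    predicted = np.argmax(scores, axis=1)        # predicted class per row
    actual = np.argmax(onehot_targets, axis=1)   # true class per row
    return float(np.mean(predicted == actual))

# Made-up scores for four samples and three classes
scores = np.array([[0.1, 0.9, 0.0],
                   [0.8, 0.1, 0.1],
                   [0.2, 0.3, 0.5],
                   [0.6, 0.3, 0.1]])
targets = np.array([[0, 1, 0],
                    [1, 0, 0],
                    [0, 1, 0],
                    [1, 0, 0]])

acc = accuracy_from_scores(scores, targets)
print(acc)  # three of the four rows match, so this prints 0.75
```

With the notebook's variables, the call would be `accuracy_from_scores(predict(X_test), y_test)`.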


## ELM on Wine dataset

In [60]:
from sklearn.datasets import load_wine
data = load_wine()
X = data.data
y = data.target

In [61]:
print(X.shape)

(178, 13)


In [74]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

In [75]:
# Reshape the labels into a column vector, the 2-D shape OneHotEncoder expects
y_train = y_train.reshape(-1, 1)
y_test = y_test.reshape(-1, 1)

In [76]:
onehotencoder = OneHotEncoder(categories='auto')
scaler = MinMaxScaler()
# Fit on the training split only; apply the fitted transforms to the test split
X_train = scaler.fit_transform(X_train)
y_train = onehotencoder.fit_transform(y_train).toarray()
X_test = scaler.transform(X_test)
y_test = onehotencoder.transform(y_test).toarray()

In [77]:
input_size = X_train.shape[1]
hidden_size = 30
input_weights = np.random.normal(size=[input_size,hidden_size])
biases = np.random.normal(size=[hidden_size])

In [78]:
output_weights = np.dot(np.linalg.pinv(hidden_nodes(X_train)), y_train)
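
Note that `hidden_nodes` and `predict` read the module-level `input_weights`, `biases` and `output_weights`, which is why re-assigning those names is enough to reuse the functions on a new dataset. One way to make that dependency explicit is to bundle the parameters in a small class. The class below is a sketch under assumptions: the name `SimpleELM`, its `seed` argument, and the two-cluster toy data are hypothetical and not part of the original notebook.

```python
import numpy as np

class SimpleELM:
    """Hypothetical wrapper that keeps ELM parameters together."""

    def __init__(self, input_size, hidden_size, seed=None):
        rng = np.random.default_rng(seed)
        # Random hidden-layer parameters, fixed after construction
        self.input_weights = rng.normal(size=(input_size, hidden_size))
        self.biases = rng.normal(size=hidden_size)
        self.output_weights = None

    def hidden_nodes(self, X):
        # ReLU hidden layer, as in the notebook's hidden_nodes function
        return np.maximum(X @ self.input_weights + self.biases, 0)

    def fit(self, X, T):
        # Output weights from the pseudoinverse of the hidden-layer matrix
        self.output_weights = np.linalg.pinv(self.hidden_nodes(X)) @ T
        return self

    def predict(self, X):
        return self.hidden_nodes(X) @ self.output_weights

# Toy two-class data (assumed): two well-separated Gaussian clusters
rng = np.random.default_rng(0)
X_toy = np.vstack([rng.normal(-2.0, 0.5, size=(50, 2)),
                   rng.normal(2.0, 0.5, size=(50, 2))])
labels = np.repeat([0, 1], 50)
T_toy = np.eye(2)[labels]  # one-hot targets

model = SimpleELM(input_size=2, hidden_size=30, seed=0).fit(X_toy, T_toy)
toy_accuracy = float(np.mean(model.predict(X_toy).argmax(axis=1) == labels))
print("Training accuracy on toy data:", toy_accuracy)
```

With such a class, the MNIST and Wine experiments would each create their own instance instead of overwriting shared global variables.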

In [79]:
prediction = predict(X_test)
correct = 0
total = X_test.shape[0]
for i in range(total):
    predicted = np.argmax(prediction[i])
    actual = np.argmax(y_test[i])
    correct += 1 if predicted == actual else 0
accuracy = correct/total
print('Accuracy for ', hidden_size, ' hidden nodes: ', accuracy)

Accuracy for  30  hidden nodes:  0.9166666666666666
