# Radial Basis Networks

What are our learning objectives for this lesson?

* Learn how to set up a Radial Basis Network
* Understand how hyperparameters change learning

Content used in this lesson is based upon information in the following sources:
* Marsland, Stephen. Machine Learning: An Algorithmic Perspective 2nd ed. (2015).

## Lab Tasks

1. Set up a radial basis network to solve the XOR problem
   * declare basic parameters of the network
   * initialize weight vector
   * write a function that computes the outputs of the radial basis layer
2. Train and test the network 
   * write a train function
   * run a test to make sure your network is capable of solving XOR
   * play around with the hyperparameters (learning rate $\eta$, $\sigma$ in the radial basis function, etc.)
3. Create a k-means algorithm to initialize the positions of the RBF centers

### Set up a radial basis network

In this section we will set up a radial basis network with an input layer, a layer of radial basis nodes, and an output layer. This network has the same number of input and output nodes as the Perceptron model from last week, but we will implement an additional hidden layer between the input and output layers.

The exact network we will be implementing for the XOR problem is depicted below.

![radial-basis-for-XOR](https://raw.githubusercontent.com/FifthEpoch/Hosted_Images/main/XOR-RADIAL-BASIS-NET.png)


#### How to pick the centers of the receptive fields for the XOR problem

We will start with a simple strategy for positioning the RBF nodes, and later in this lab work our way up to a strategy that requires more steps, such as k-means clustering.

For solving the XOR problem, we can place the centers of the RBF nodes at the four possible input cases:

$$(0, 0), (0, 1), (1, 0), (1, 1)$$

so that the nodes are representative of typical inputs.

In [1]:
import math
import numpy as np

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
XOR_targets = np.array([[0], [1], [1], [0]])

# TODO: fill in what the center of each RBF node would be.
#       they should reflect all 4 cases of our inputs.
RBF_nodes_centers = inputs    # each input case doubles as an RBF center

#### Setting up the weights vector

Recall that we initialized a weight vector for the Perceptron model last week. That weight vector had a length of $n+1$, with $n$ representing the number of inputs. We repeat the same process for the radial basis network in today's lab; here the inputs to the output node are the outputs of the RBF layer, so $n$ is the number of RBF nodes. Initialize the weight vector $\vec{w}$ such that:

$$ \vec{w} = [w_0, w_1, ..., w_n] $$

with $w_0$ being the weight associated with the bias node.

In [2]:
# PARAM
n_in = np.shape(inputs)[1]   # refers to # of columns/features

# TODO: how many nodes are in the RBF layer? fill in the variable
n_rbf = 4


n_out = 1
eta = 0.2

# TODO: initialize the weight vector
# hint: weights.shape[1] should equal 1
# one weight per RBF node plus one for the bias node, drawn from [-0.05, 0.05)
weights = np.random.rand(n_rbf + 1, n_out) * 0.1 - 0.05


print(f'initialized weights: \n{weights}')


#### Radial Basis

Here we will be using the notation introduced in the illustration of the network above. The radial basis function is represented by $h$ in the illustration. We distinguish between $h_0$, $h_1$, and so on because the center $\vec{c}$ of each RBF node is different, which gives a slightly different equation for each node.

Below we quickly review the formulas needed to compute the output of the radial basis layer. The general equation of a radial basis function is:

$$ h(\vec{x}) = \exp\left(\frac{-|\vec{x} - \vec{c}|^2}{2\sigma^2}\right) = y$$

where $\sigma$ is __set to $0.1$ arbitrarily__ (for now), $\vec{x} = [x_0 \text{ } x_1]^T$ is the input, $\vec{c} = [c_0 \text{ } c_1]^T$ holds the coordinates of the center of a given RBF node, and $d = |\vec{x} - \vec{c}|$ is the distance between them. $M$ denotes the number of RBF nodes in the network.

The distance between two vectors $\vec{x}$ and $\vec{c}$ (with $\vec{x}, \vec{c} \in \mathbb{R}^n$) is given by:

$$ d(\vec{x}, \vec{c}) = |\vec{x} - \vec{c}| = \sqrt{(x_0-c_0)^2 + (x_1 - c_1)^2 + ... + (x_{n-1} - c_{n-1})^2}$$ 
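
As a quick check of the two formulas above, the following sketch evaluates the radial basis response for a single input and a single center. The helper name `radial_basis` is an assumption made for this illustration and is not part of the lab's starter code.

```python
import numpy as np

def radial_basis(x, c, sigma=0.1):
    """Gaussian response of one RBF node centered at c to input x (hypothetical helper)."""
    d = np.linalg.norm(x - c)                  # Euclidean distance |x - c|
    return np.exp(-d ** 2 / (2 * sigma ** 2))

x = np.array([0.0, 1.0])
print(radial_basis(x, np.array([0.0, 1.0])))   # input sits on the center: response is 1.0
print(radial_basis(x, np.array([0.0, 0.0])))   # one unit away with sigma = 0.1: exp(-50), effectively 0
```

With $\sigma = 0.1$ the response falls off very quickly, so a node only "sees" inputs that are close to its center.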

#### Computing the Output for the Radial Basis Layer

Each RBF node takes $x_0$ and $x_1$ and computes the distance from the vector $[x_0 \text{ } x_1]^T$ to its own center $[c_0 \text{ } c_1]^T$. 

We loop through all the RBF nodes and process each case in the input (e.g. $x_0 = 0$ and $x_1 = 1$). The output of our RBF layer should be a matrix with each row representing the RBF node the output is associated with, and each column representing radial basis output $y$ for a particular set of inputs.

In [3]:
# TODO: implement a radial basis function that processes the entire input
def get_radial_basis_outputs(_inputs, _centers, _sigma=0.1):
    # outputs[i, j] is the response of RBF node i to input case j
    outputs = np.zeros((len(_centers), len(_inputs)))
    two_sigma2 = 2 * _sigma ** 2

    for i in range(len(_centers)):
        for j in range(len(_inputs)):
            # squared distance divided by 2 * sigma^2 (note the parentheses)
            dist = np.linalg.norm(_inputs[j] - _centers[i])
            outputs[i, j] = np.exp(-dist ** 2 / two_sigma2)

    return outputs

get_radial_basis_outputs(inputs, RBF_nodes_centers)

array([[1.00000000e+00, 1.92874985e-22, 1.92874985e-22, 3.72007598e-44],
       [1.92874985e-22, 1.00000000e+00, 3.72007598e-44, 1.92874985e-22],
       [1.92874985e-22, 3.72007598e-44, 1.00000000e+00, 1.92874985e-22],
       [3.72007598e-44, 1.92874985e-22, 1.92874985e-22, 1.00000000e+00]])
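
The double loop can also be written with NumPy broadcasting. The sketch below is an optional alternative, not the lab's reference solution; the name `rbf_layer_vectorized` is hypothetical, and it assumes inputs of shape `(N, n)` and centers of shape `(M, n)`.

```python
import numpy as np

def rbf_layer_vectorized(inputs, centers, sigma=0.1):
    """Responses of every RBF node to every input; shape (M, N) for M centers and N inputs."""
    # diff[i, j] = centers[i] - inputs[j], built by broadcasting (M, 1, n) against (1, N, n)
    diff = centers[:, None, :] - inputs[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)            # squared distances, shape (M, N)
    return np.exp(-sq_dist / (2 * sigma ** 2))

xor_inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
layer_out = rbf_layer_vectorized(xor_inputs, xor_inputs)
print(layer_out.shape)        # (4, 4)
print(np.diag(layer_out))     # each node responds with 1 to the input on its own center
```

Rows still correspond to RBF nodes and columns to input cases, so the result can be used wherever the loop version is used.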

#### Other Functions We Need for this Network

After we are finished with the ```get_radial_basis_outputs()``` function, we will need to implement the same 3 functions we implemented for the Perceptron model during last week's lab. 

Since we have already gone over how these functions work, their complete implementations are included below. If you didn't get to finish them in last week's lab, you are encouraged to examine them before moving on.

If the functions you have implemented thus far have different dimensionality requirements for their parameters than the functions implemented below, please feel free to alter or rewrite the 3 functions below to fit your work.  

In [4]:
def calculate_activations(_inputs, _weights, _threshold=0.0):
    activations = np.dot(np.transpose(_inputs), _weights)
    return np.where(activations > _threshold, 1, 0)

def insert_bias(_inputs):
    return np.insert(_inputs, 0, 1.0, axis=0)

def update_weights(_inputs, _targets, _activations, _weights):
    _weights -= eta * np.dot(_inputs, _activations - _targets)
    return _weights

### Train and Test the Network

Lastly, we implement a train function to facilitate training. This function should take our inputs, their targets, the initialized weights, and the number of iterations for the training session, and return an updated weight vector. The updated weight vector contains all of the learning our network has done, so we can consider it a form of "memory" for the network. Since we would like to test the network's learning afterwards, be sure to save the updated weight vector returned by training.

Let's review the flow of data in our radial basis network briefly:
1. the radial basis layer takes $\vec{x}$ and outputs $\vec{y}$
2. $\vec{y}$ is sent to the output node
3. we insert the input associated with the bias node into $\vec{y}$
4. the weighted sum and threshold activation take place, leaving us with a final scalar output $z$
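
The data flow above can be sketched as a single forward pass. This is a self-contained illustration with its own variable names; the weights are hand-picked for the example (an assumption, not values the network has learned), chosen so that only the nodes centered at $(0, 1)$ and $(1, 0)$ push the output above the threshold.

```python
import numpy as np

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
centers = inputs.astype(float)      # one RBF node per input case
sigma = 0.1

# radial basis layer: y[i, j] is the response of node i to input case j
sq_dist = np.sum((centers[:, None, :] - inputs[None, :, :]) ** 2, axis=-1)
y = np.exp(-sq_dist / (2 * sigma ** 2))

# prepend the bias input (a row of ones) before the output node
y_with_bias = np.insert(y, 0, 1.0, axis=0)

# weighted sum followed by a threshold at 0
# hand-picked weights (illustrative assumption): bias weight -0.5,
# weight 1 for the nodes centered at (0, 1) and (1, 0), weight 0 otherwise
w = np.array([[-0.5], [0.0], [1.0], [1.0], [0.0]])
z = np.where(y_with_bias.T @ w > 0, 1, 0)
print(z.ravel())    # [0 1 1 0], the XOR truth table
```

Training replaces the hand-picked `w` with weights found by the update rule.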

In [5]:
# TODO: using all of the functions we have above, complete the training loop
def train_radial_basis_network(_inputs, _targets, _weights, _iterations=5, _sigma=0.1):
    # the centers are fixed, so the RBF layer outputs (plus bias) only need computing once
    rbf_outputs = insert_bias(get_radial_basis_outputs(_inputs, RBF_nodes_centers, _sigma))

    for i in range(_iterations):
        print(f'\niteration: {i + 1}')

        # forward pass through the output node, then one weight update
        activations = calculate_activations(rbf_outputs, _weights)
        _weights = update_weights(rbf_outputs, _targets, activations, _weights)

    return _weights

weights = train_radial_basis_network(inputs, XOR_targets, weights)


iteration: 1

iteration: 2

iteration: 3

iteration: 4

iteration: 5

In [6]:
test = np.array([[1,0],[1,1],[0,1],[0,0]])
test_target = np.array([[1],[0],[1],[0]])

# TODO: implement a test to check if the network solves XOR correctly.
def test_network(_test_inputs, _test_targets, _weights, _sigma=0.1):
    # forward pass: RBF layer, bias node, then the thresholded output node
    rbf_outputs = insert_bias(get_radial_basis_outputs(_test_inputs, RBF_nodes_centers, _sigma))
    activations = calculate_activations(rbf_outputs, _weights)

    # check the accuracy of your trained model
    accuracy = 0.0
    for i in range(len(_test_targets)):
        if _test_targets[i] == activations[i]: accuracy += 1 
    print(f'accuracy: {(accuracy / len(_test_targets))*100.0}%')

test_network(test, test_target, weights)

accuracy: 100.0%

#### Hyperparameters

Now that we have a working network, we can try altering the parameters to see how they change the network's learning. 

For a radial basis network, each RBF node is responsible for learning within its receptive field. What happens if all of the training data falls outside of the receptive fields? To illustrate the effect, we will use a new set of centers for our RBF nodes, shown below. Instead of having the centers represent the typical inputs of our network, we use this set of slightly offset centers.

If the training didn't work with the number of iterations, learning rate $\eta$, and sigma $\sigma$ value we have, try adjusting them one by one and see if learning improves. 
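
To get a feel for how $\sigma$ controls the width of a receptive field, the sketch below evaluates the response of one offset center to its nearest XOR input for a few values of $\sigma$. The particular values are illustrative assumptions, not recommended settings.

```python
import numpy as np

x = np.array([0.0, 0.0])          # one of the XOR inputs
c = np.array([0.2, 0.2])          # its nearest offset center
sq_dist = np.sum((x - c) ** 2)    # squared distance, 0.08

sigmas = [0.05, 0.1, 0.2, 0.5]    # illustrative values only
responses = [np.exp(-sq_dist / (2 * s ** 2)) for s in sigmas]
for s, r in zip(sigmas, responses):
    print(f'sigma = {s:<4}  response = {r:.4f}')
```

The response grows from roughly $10^{-7}$ at $\sigma = 0.05$ to about $0.85$ at $\sigma = 0.5$, so the same input can sit well outside or well inside a node's receptive field depending on $\sigma$ alone.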

In [4]:
RBF_nodes_centers = np.array([[0.2, 0.2], [0.2, 0.8], [0.8, 0.2], [0.8, 0.8]])


# TODO: re-initialize the weight vector here
weights = np.random.rand(n_rbf + 1, n_out) * 0.1 - 0.05


# Train with the new radial basis layer:
# if the current parameters are not working well for this problem
# try changing _iterations, _sigma, and eta values below 

eta = 0.2
weights = train_radial_basis_network(inputs, XOR_targets, weights, _iterations=5, _sigma=0.1)

__🤔 Why does nudging some parameters improve learning while nudging others doesn't?__

## Bonus Task

In most machine learning problems, there aren't clear-cut "typical" inputs around which we can define our receptive fields. In those cases, it is common to use k-means clustering to identify centroids that cover the range of a dataset.

### $k$-means algorithm

1. Set the number $k$ to specify the number of clusters to assign
2. Randomly initialize $k$ centroids
3. Repeat until the centroid positions do not change:
    * assign each data point to its closest centroid
    * compute the new centroid of each cluster
    
Given a set $P$ containing $n$ 2D points $p_0$, $p_1$, ..., $p_{n-1}$, where the coordinates of $p_i$ are ($x_i$, $y_i$), the centroid of $P$, denoted by $c(P)$, is:

$$ c(P) = (\frac{x_0 + x_1 + ... + x_{n-1}}{n}, \frac{y_0 + y_1 + ... + y_{n-1}}{n})$$

In [None]:
# specifying a k
k = 4
# a set of 2D points that lie in the area bounded by (0, 0) and (1, 1)
data = np.random.rand(20, 2)
# randomly initializing k centroids, one 2D position per centroid
centroids = np.random.rand(k, 2)

# TODO: implement k-means clustering
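
One possible way to fill in the steps of the algorithm is sketched below. It is a minimal version under stated assumptions rather than a reference solution: the function name `k_means`, the `max_iters` cap, and the `seed` parameter are all hypothetical, and the centroids are initialized by picking $k$ random data points instead of $k$ uniformly random positions, which avoids starting with a centroid that has no points near it.

```python
import numpy as np

def k_means(data, k, max_iters=100, seed=0):
    """Plain k-means on an (N, d) array; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # assumption: initialize from k distinct data points chosen at random
    centroids = data[rng.choice(len(data), size=k, replace=False)].astype(float)

    for _ in range(max_iters):
        # assign each data point to its closest centroid
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=-1)
        labels = np.argmin(dists, axis=1)

        # compute the new centroid of each cluster as the mean of its members
        new_centroids = centroids.copy()
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:        # keep the old centroid if a cluster is empty
                new_centroids[j] = members.mean(axis=0)

        # stop once the centroid positions no longer change
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    return centroids, labels

data = np.random.default_rng(1).random((20, 2))
centroids, labels = k_means(data, k=4)
print(centroids)
```

The returned `centroids` array has shape `(k, 2)`, the same layout as `RBF_nodes_centers`, so it could be used as the set of RBF centers for data without obvious "typical" inputs.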


