## Radial Basis Function Networks

RBF networks are used for regression and function approximation. They learn to approximate a trend by combining many Gaussian (bell-shaped) curves.

An RBF network is similar to a 2-layer network:

1. **Hidden Layer**: applies a Gaussian RBF as the "activation" function
2. **Output Layer**: computes a weighted sum of the hidden-layer outputs

### Gaussian/Normal Distribution (Bell Curve)

$N(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, where $\mu$ is the **mean** and $\sigma$ is the **standard deviation**

The function above is the **probability density function (pdf)** of the Gaussian, but here we only care about its bell-curve shape, not its interpretation as a probability distribution.
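As a quick illustration of the bell-curve shape, the sketch below evaluates the density with NumPy. The function name `gaussian_pdf` and the sample grid are assumptions made for this example only.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Gaussian density N(x; mu, sigma^2), evaluated elementwise."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

x = np.linspace(-5.0, 5.0, 1001)
y = gaussian_pdf(x, mu=0.0, sigma=1.0)

# The curve peaks at the mean and is symmetric around it.
peak_x = x[np.argmax(y)]

# Dropping the 1/sqrt(2*pi*sigma^2) factor only rescales the curve so that
# its peak is 1; the bell shape itself is unchanged.
bell = y / y.max()
```

Only the shape matters for an RBF network, so the normalizing constant can be dropped without losing anything the network needs.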

#### Gaussian Centers

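We use **K-means clustering** to place the Gaussian centers, so that each Gaussian sits on a cluster of data and the Gaussians together span the input range. With enough Gaussians, their weighted sum can approximate any continuous function on a bounded input range to arbitrary accuracy.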

#### Standard Deviations

There are two options for setting the standard deviations:

1. Set $\sigma_j$ to the standard deviation of the points assigned to cluster $j$
2. Use a single standard deviation for all clusters, $\sigma_j = \sigma \;\; \forall j$, where $\sigma = \frac{d_{max}}{\sqrt{2k}}$, $d_{max}$ is the maximum distance between any two cluster centers, and $k$ is the number of clusters
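The two options can be sketched as follows. The data, centers, and cluster assignments here are hypothetical values chosen for illustration, not the output of the notebook's own `kmeans`.

```python
import numpy as np

# Hypothetical 1-D data, cluster centers, and assignments (illustration only).
X = np.array([0.0, 0.2, 0.4, 2.0, 2.2, 5.0, 5.4])
centers = np.array([0.2, 2.1, 5.2])
labels = np.array([0, 0, 0, 1, 1, 2, 2])  # index of the center each point belongs to
k = len(centers)

# Option 1: one standard deviation per cluster, from the points assigned to it.
stds_per_cluster = np.array([X[labels == j].std() for j in range(k)])

# Option 2: a single shared width from the largest distance between centers.
d_max = np.max(np.abs(centers[:, None] - centers[None, :]))
sigma_shared = d_max / np.sqrt(2 * k)
stds_shared = np.full(k, sigma_shared)
```

Note that Option 1 yields a width of zero for any cluster that contains a single point, so it needs some fallback in that case (for example, the shared width from Option 2).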

### Backpropagation

Given an input $x$, the RBF network outputs a weighted sum of the hidden-layer activations:

$F(x) = \sum_{j=1}^k w_j\phi_j(x, c_j) + b$, where $w_j$ is the weight of hidden unit $j$, $b$ is the bias, $k$ is the number of clusters, $c_j$ is the center of cluster $j$, and $\phi_j$ is the Gaussian RBF:

$\phi_j(x, c_j) = \exp\left(-\frac{||x-c_j||^2}{2\sigma^2_j}\right)$
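A minimal sketch of this forward pass for a scalar input, where the norm reduces to an absolute difference. The helper name `rbf_forward` and all parameter values are assumptions for illustration.

```python
import numpy as np

def rbf_forward(x, centers, stds, w, b):
    """Compute F(x) = sum_j w_j * phi_j(x, c_j) + b for a scalar input x."""
    phi = np.exp(-(x - centers) ** 2 / (2 * stds ** 2))  # hidden-layer activations
    return phi @ w + b, phi

centers = np.array([-1.0, 0.0, 1.0])  # hypothetical centers c_j
stds = np.array([0.5, 0.5, 0.5])      # hypothetical widths sigma_j
w = np.array([1.0, -2.0, 0.5])        # hypothetical weights w_j
b = 0.1                               # hypothetical bias

F, phi = rbf_forward(0.0, centers, stds, w, b)
```

The unit whose center coincides with the input has activation 1, and activations decay toward 0 as the input moves away from a center.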

#### Quadratic Cost Function

$C = \sum_{i=1}^N (y^{(i)} - F(x^{(i)}))^2$

Compute the partial derivative of the cost function with respect to $w_j$ using the chain rule:

$\frac{\partial C}{\partial w_j} = \sum_{i=1}^N \frac{\partial C}{\partial F(x^{(i)})} \frac{\partial F(x^{(i)})}{\partial w_j} = -2 \sum_{i=1}^N (y^{(i)} - F(x^{(i)})) \, \phi_j(x^{(i)}, c_j)$

For online (per-sample) gradient descent, take the term for a single sample $i$ and absorb the constant factor 2 into the learning rate $\eta$:

$w_j \leftarrow w_j + \eta \, (y^{(i)} - F(x^{(i)})) \, \phi_j(x^{(i)}, c_j)$
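A finite-difference check is a cheap way to confirm the sign and form of this gradient. The sketch below does so for the squared error of one sample, whose gradient with respect to $w_j$ is $-2(y - F(x))\phi_j$, so the update above moves along the negative gradient. All parameter values and the helper name `forward` are assumptions for illustration.

```python
import numpy as np

def forward(x, centers, stds, w, b):
    """Return F(x) and the hidden activations phi for a scalar input x."""
    phi = np.exp(-(x - centers) ** 2 / (2 * stds ** 2))
    return phi @ w + b, phi

# Hypothetical parameters and one training sample (x, y).
centers = np.array([-1.0, 0.0, 1.0])
stds = np.array([0.7, 0.7, 0.7])
w = np.array([0.3, -0.8, 0.5])
b = 0.2
x, y = 0.4, 1.5

F, phi = forward(x, centers, stds, w, b)
analytic = -2.0 * (y - F) * phi  # d/dw_j of (y - F(x))^2

# Central finite differences, perturbing one weight at a time.
eps = 1e-6
numeric = np.zeros_like(w)
for j in range(len(w)):
    step = np.zeros_like(w)
    step[j] = eps
    c_plus = (y - forward(x, centers, stds, w + step, b)[0]) ** 2
    c_minus = (y - forward(x, centers, stds, w - step, b)[0]) ** 2
    numeric[j] = (c_plus - c_minus) / (2 * eps)

max_abs_diff = np.max(np.abs(analytic - numeric))
```

If the derivation is right, `max_abs_diff` should be tiny compared with the gradient entries themselves.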


Similarly, compute the partial derivative of the cost function with respect to $b$, using $\frac{\partial F(x^{(i)})}{\partial b} = 1$:

$\frac{\partial C}{\partial b} = \sum_{i=1}^N \frac{\partial C}{\partial F(x^{(i)})} \frac{\partial F(x^{(i)})}{\partial b} = -2 \sum_{i=1}^N (y^{(i)} - F(x^{(i)}))$

For a single sample $i$, with the factor 2 absorbed into the learning rate $\eta$, the update is:

$b \leftarrow b + \eta \, (y^{(i)} - F(x^{(i)}))$
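Put together, the two update rules give a simple online training loop. The following is a sketch under stated assumptions, not the notebook's own implementation: the toy target ($\sin x$), the evenly spaced centers (standing in for K-means output), the zero initialization, the learning rate, and the epoch count are all chosen for illustration.

```python
import numpy as np

# Hypothetical toy problem: fit y = sin(x) on [0, 2*pi].
X = np.linspace(0.0, 2 * np.pi, 50)
Y = np.sin(X)

k = 5
centers = np.linspace(0.0, 2 * np.pi, k)  # stand-in for K-means centers
stds = np.full(k, (centers.max() - centers.min()) / np.sqrt(2 * k))  # shared width

def predict(x, w, b):
    """Return F(x) and the hidden activations phi for a scalar input x."""
    phi = np.exp(-(x - centers) ** 2 / (2 * stds ** 2))
    return phi @ w + b, phi

def total_cost(w, b):
    """Quadratic cost summed over the whole training set."""
    return sum((y - predict(x, w, b)[0]) ** 2 for x, y in zip(X, Y))

w = np.zeros(k)  # assumed initialization
b = 0.0
eta = 0.05       # assumed learning rate

cost_before = total_cost(w, b)
for epoch in range(200):
    for x, y in zip(X, Y):
        F, phi = predict(x, w, b)
        error = y - F
        w = w + eta * error * phi  # w_j <- w_j + eta * (y - F(x)) * phi_j
        b = b + eta * error        # b   <- b   + eta * (y - F(x))
cost_after = total_cost(w, b)
```

Comparing `cost_after` with `cost_before` is a basic sanity check that the updates move the weights in the right direction.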

In [2]:
# Imports
import numpy as np

In [3]:
# Gaussian RBF
def rbf(x, c, s):
    return np.exp(-1 / (2 * s**2) * (x-c)**2)

In [None]:
# K-means clustering to determine the cluster centers
# X {ndarray} -- An Mx1 array of inputs
# k {int} -- Number of clusters
# Returns -- kx1 array of final cluster centers
def kmeans(X, k):
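    # Randomly pick k distinct samples from the input data as the initial
    # cluster centers (replace=False avoids drawing the same sample twice; requires k <= M)
    clusters = np.random.choice(np.squeeze(X), size=k, replace=False)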
    prevClusers = clusters.copy()
    stds = np.zeros(k)
    converged = False
    
    while not converged:
        # Compute distances for each cluster center