# RBF Networks for Clustering


What if we don't have labels, and we just want to uncover structure in the data?

We can of course do that with k-means clustering, but we can also use a biologically inspired method called Hebbian learning, which in some ways resembles RBF networks.

## Acknowledgments

This whole lecture is courtesy of Oliver Layton.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Do Hebbian Learning 

We will do this with a no-hidden-layer network.
* The input layer has one node for each variable (feature) in the input data
* The output layer has one node, which outputs its activation based on the input data and the edge weights
* There is no bias node
* We initialize the edge weights at random
* We update the edge weights $w_j$ using Hebb's rule: $w_j(T) = w_j(T-1) + \eta x_{ij} z_i$, where $z_i$ is the activation output for data point $\vec{x_i}$ and $\eta$ is a small update factor (learning rate)
  * $x_{ij} z_i$ updates the weights based on the correlation between $x_{ij}$ and $z_i$; if they have the same sign (both positive or both negative), then $w_j$ increases from time $T-1$ to time $T$; otherwise it decreases
  * Hebb's rule applied naively leads to weights that increase without bound
  * We use Oja's rule to correct this: $w_j(T) = w_j(T-1) + \eta x_{ij} z_i - \eta {z_i}^2 w_j(T-1) = w_j(T-1) + \eta z_i(x_{ij} - z_i w_j(T-1))$

The goal of Hebbian learning is to learn, from the training samples, weights that represent key patterns in the training data.
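To see why the correction term matters, here is a minimal sketch that trains the same single-output network twice, once with plain Hebb's rule and once with Oja's rule. The toy data, the starting weights, and the helper name `train` are assumptions made for this illustration only; they are not part of the lecture code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 samples from a zero-mean, correlated Gaussian
X = rng.multivariate_normal([0, 0], [[3, 1.5], [1.5, 3]], size=200)
X = X / np.abs(X).max()  # scale all values into [-1, 1] so updates stay small


def train(X, eta=0.01, n_epochs=100, oja=True):
    """Train one linear output unit with Oja's rule (oja=True) or plain Hebb's rule."""
    w = np.array([0.05, -0.02])  # small, arbitrary starting weights
    for _ in range(n_epochs):
        for x in X:
            z = x @ w  # activation for this sample
            if oja:
                w = w + eta * z * (x - z * w)  # Hebbian term minus decay term
            else:
                w = w + eta * z * x            # Hebbian term only
    return w


hebb_norm = np.linalg.norm(train(X, oja=False))
oja_norm = np.linalg.norm(train(X, oja=True))
print('Hebb weight norm:', hebb_norm)
print('Oja weight norm: ', oja_norm)
```

With plain Hebb's rule the weight norm keeps growing with every epoch, while under Oja's rule it settles close to 1, so only the direction of the weight vector carries information.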

In [None]:
def fit(data, eta=0.01, n_epochs=150):
    '''Do Hebbian learning on the data samples (using Oja's Rule) with a learning rate of `eta`.

    Parameters:
    -----------
    data: ndarray. shape=(N, M)
        Data samples
    eta: float.
        Learning rate for weight update
    n_epochs: int.
        Number of epochs to train (i.e. number of passes/presentations of the data to the network)

    Returns:
    -----------
    wts: ndarray. shape=(M,).
        The learned network weights
    '''
    # Number of data pts and features
    N, M = data.shape

    # Initialize weights randomly, centered at 0
    wts = 0.1*np.random.rand(M) - 0.1/2
    prev_wts = wts + 1

    # Do `n_epochs` passes through the data
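    for epoch in range(n_epochs):
        print('weights at epoch', epoch, ':\n', wts)
        # Stop early once a full pass no longer changes the weights (within floating-point tolerance)
        if np.allclose(wts, prev_wts):
            return wts
        prev_wts = wts.copy()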
        # Train by processing each sample
        for i in range(N):
            # Sample x_i
            xi = data[i]

            # Network output
            zi = xi @ wts

            # Update wts via Hebbian Learning
            wts = wts + eta*zi*(xi - zi*wts)
    return wts

# Predict

To predict, we would just multiply new data points by the learned weights to get the activation of this network for those new data points.

In [None]:
def predict(data, weights):
    return data @ weights
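As a small illustration of what these activations mean, the sketch below uses a hypothetical unit-length weight vector (chosen by hand, not learned) and three made-up points. Each activation is the signed length of the point's projection onto the weight direction.

```python
import numpy as np


def predict(data, weights):
    return data @ weights


# Hypothetical unit-length weight vector along the 45-degree diagonal
weights = np.array([1.0, 1.0]) / np.sqrt(2)

new_points = np.array([[1.0, 1.0],     # lies along the weight direction
                       [1.0, -1.0],    # orthogonal to the weight direction
                       [-2.0, -2.0]])  # points the opposite way

activations = predict(new_points, weights)
print(activations)  # approximately [1.414, 0.0, -2.828]
```

Points along the learned direction get large positive or negative activations, and points orthogonal to it get activations near zero.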

Let's do Hebbian learning on a random data set.

In [None]:
def run():
    # Set random seed for reproducibility
    np.random.seed(0)

    # Define data as multivariate Gaussian blob
    mu = [0, 0]
    sigma = np.array([[3, 1.5],
                      [1.5, 3]])
    data = np.random.multivariate_normal(mu, sigma, size=100)
    plt.plot(data[:, 0], data[:, 1], 'o')

    # Normalize globally to range [-0.5, 0.5]
    data = (data - np.min(data)) / (np.max(data) - np.min(data))
    data = data - 0.5

    # Train the Hebbian neural network, get the final weights
    wts = fit(data, n_epochs=150)
    print('Learned wts\n', wts)

    # Draw learned wts vector
    vectorScale = 3
    ax = plt.gca()
    ax.annotate('', vectorScale*wts, [0, 0],
                arrowprops=dict(arrowstyle='->', linewidth=2, shrinkA=0, shrinkB=0))
    plt.show()
    return data, wts
    
data, weights = run()

# Comparison with PCA

Let's compare this weight vector with the first principal component for this data.

In [None]:
def PCA(data):
    covariance_matrix = (data.T @ data) / (len(data) - 1)
    (evals, evectors) = np.linalg.eig(covariance_matrix)
    evals_order = np.argsort(evals)[::-1]
    evals_sorted = evals[evals_order]
    evectors_sorted = evectors[:, evals_order]
    return evals_sorted, evectors_sorted

# Why do we do this?
centered_data = data - np.mean(data, axis=0)

evals, evecs = PCA(centered_data)
# Eigenvectors are the columns of `evecs`, so the first principal component is column 0
print(evecs[:, 0])
print(weights)
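When comparing the two printed vectors by eye, keep in mind that an eigenvector is only defined up to sign and that `np.linalg.eig` and `np.linalg.eigh` return eigenvectors as columns. A sign-invariant cosine similarity makes the comparison a single number. This is a sketch under stated assumptions: `abs_cosine` is a hypothetical helper, the data is a fresh toy sample, and `flipped` merely stands in for a learned weight vector.

```python
import numpy as np


def abs_cosine(u, v):
    """Absolute cosine similarity: 1 means same axis (either sign), 0 means orthogonal."""
    return abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))


rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[3, 1.5], [1.5, 3]], size=100)
Xc = X - X.mean(axis=0)  # center so Xc.T @ Xc / (N - 1) is the covariance matrix

cov = Xc.T @ Xc / (len(Xc) - 1)
evals, evecs = np.linalg.eigh(cov)  # eigh: ascending eigenvalues, eigenvectors in columns
pc1 = evecs[:, -1]                  # column with the largest eigenvalue
pc2 = evecs[:, 0]

# Stand-in for a learned weight vector: same axis as pc1, flipped sign, different length
flipped = -0.5 * pc1

same_axis = abs_cosine(pc1, flipped)
orthogonal = abs_cosine(pc1, pc2)
print(same_axis, orthogonal)
```

A value near 1 for `abs_cosine(evecs[:, 0], weights)` would indicate that the Hebbian weights and the first principal component point along the same axis, even if their signs or lengths differ.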


# Let's use Hebbian learning on a real dataset!

## Load the Data

In [None]:
def type_converter(x):
    values = ['setosa', 'versicolor', 'virginica']
    return float(values.index(x))

def inverse_type_converter(x):
    values = ['setosa', 'versicolor', 'virginica']
    # Class codes are stored as floats, so cast to int before indexing
    return values[int(x)]


columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "class"]
iris = np.array(np.genfromtxt('data/iris.csv', delimiter=',', converters={4: type_converter}, skip_header=2, dtype=float, encoding='utf-8'))
print(iris.shape, iris.dtype)

## Look at the Data

In [None]:
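def get_summary_statistics(data):
    "Get the max, min, mean, var for each variable in the data."
    stats = np.array([data.max(axis=0), data.min(axis=0), data.mean(axis=0), data.var(axis=0)])
    # Label the rows so the printed table is readable
    return pd.DataFrame(stats, index=['max', 'min', 'mean', 'var'])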

print(get_summary_statistics(iris))

df = pd.DataFrame(iris, columns=columns)
sns.pairplot(df, y_vars = ["class"], kind = "scatter")

## Split the Data 

Not for clustering! There are no labels to predict, so we fit on the whole dataset.

## Clean the Data

Nothing to see here for the iris data

## Consider Dimensionality Reduction

Nothing to see here for the iris data

## Consider Transforming/Normalizing the Data

In [None]:
def homogenizeData(data):
    return np.append(data, np.array([np.ones(data.shape[0], dtype=float)]).T, axis=1)
   
def zScore(data, translateTransform=None, scaleTransform=None):
    "Z-score each column of the data using homogeneous translate and scale transforms."
    homogenizedData = homogenizeData(data)
    n_cols = homogenizedData.shape[1]
    if translateTransform is None:
        translateTransform = np.eye(n_cols)
        # Skip the last (homogeneous) column so the transform keeps it equal to 1
        for i in range(n_cols - 1):
            translateTransform[i, n_cols-1] = -homogenizedData[:, i].mean()
    if scaleTransform is None:
        # Leave constant columns (std of 0) unscaled to avoid dividing by zero
        diagonal = [1 / homogenizedData[:, i].std() if homogenizedData[:, i].std() != 0 else 1 for i in range(n_cols)]
        scaleTransform = np.eye(n_cols, dtype=float) * diagonal
    data = (scaleTransform@translateTransform@homogenizedData.T).T
    return translateTransform, scaleTransform, data[:, :data.shape[1]-1]

translateTransform, scaleTransform, data_transformed = zScore(iris)
print("training data", "\n", data_transformed.shape, "\n", get_summary_statistics(data_transformed))
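As a sanity check on the transform-matrix approach, the sketch below z-scores a hypothetical random table (a stand-in, not the iris file) in two ways: directly with NumPy broadcasting, and with homogeneous translate and scale matrices built along the same lines as above. The variable names here are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a numeric data table: 150 samples, 4 features
X = rng.normal(loc=[5.8, 3.0, 3.7, 1.2], scale=[0.8, 0.4, 1.7, 0.7], size=(150, 4))
M = X.shape[1]

# Direct z-score with broadcasting: subtract column means, divide by column stds
z_direct = (X - X.mean(axis=0)) / X.std(axis=0)

# Same result with homogeneous coordinates
Xh = np.hstack([X, np.ones((len(X), 1))])        # append a column of ones
T = np.eye(M + 1)
T[:M, M] = -X.mean(axis=0)                       # translate each feature by -mean
S = np.diag(np.append(1 / X.std(axis=0), 1.0))   # scale each feature by 1/std
z_matrix = (S @ T @ Xh.T).T[:, :M]               # apply, then drop the ones column

print(np.allclose(z_direct, z_matrix))
print(z_direct.mean(axis=0), z_direct.std(axis=0))
```

Both routes give the same array, with each column having mean 0 and standard deviation 1. The matrix form is mainly useful when the same translate and scale transforms need to be reapplied to other data later.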

## Fit the Model

In [None]:
weights = fit(data_transformed)
print(weights)

## Comparison with PCA

In [None]:
evals, evecs = PCA(data_transformed)
# Eigenvectors are the columns of `evecs`, so the first principal component is column 0
print(evecs[:, 0])
print(weights)