# The One Goal for Today

To implement and train the input-to-hidden layers of an RBF network using Python.

# RBF Networks with the car logo data

Review: training an RBF network consists of:
* Finding prototypes
* Selecting the activation function for the hidden nodes
* Selecting the activation function for the output nodes
* Setting (or fitting) the weights for the edges and biases

## Load the Data

We will keep using the car logo dataset!

In [None]:
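import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns  # used by the heatmap plotting helpers below
import scipy
import scipy.linalg  # import the submodule explicitly so scipy.linalg.eigh resolves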

In [None]:
data = np.genfromtxt('data/logos.csv', delimiter=',', dtype=int)

# Look at the Data

In [None]:
def getSummaryStatistics(data):
    "Get the max, min, mean, var for each variable in the data."
    return pd.DataFrame(np.array([data.max(axis=0), data.min(axis=0), data.mean(axis=0), data.var(axis=0)]))

def getShapeType(data):
    "Get the shape and type of the data."
    return (data.shape, data.dtype)

print(getSummaryStatistics(data))
getShapeType(data)

## What kind of analysis are we going to do?

Regression, clustering, classification?

If supervised, which is our dependent variable?

If we have a dependent variable, how many possible values does it have? What will this number correspond to in the RBF network?

In [None]:
# Why are we doing this?
np.random.shuffle(data)

# Why are we doing this?
train_data, dev_data, test_data = np.split(data, [int(.8 * len(data)), int(.9 * len(data))])
print(getSummaryStatistics(train_data))
print(getSummaryStatistics(dev_data))
print(getSummaryStatistics(test_data))

In [None]:
y_train = train_data[:, -1]
x_train = train_data[:, 0:-1]
y_dev = dev_data[:, -1]
x_dev = dev_data[:, 0:-1]
y_test = test_data[:, -1]
x_test = test_data[:, 0:-1]
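
To answer the question above about how many values the dependent variable takes, a quick count of the distinct labels can help. This is a minimal sketch on toy labels; `count_classes` and `toy_labels` are hypothetical names, and it assumes (as the slicing above does) that the last column holds the class label.

```python
import numpy as np

def count_classes(labels):
    """Return the distinct label values and how many rows carry each one."""
    values, counts = np.unique(labels, return_counts=True)
    return values, counts

# Toy labels standing in for y_train (made-up values, not the logo data).
toy_labels = np.array([0, 2, 1, 2, 2, 0])
values, counts = count_classes(toy_labels)
print("classes:", values)
print("rows per class:", counts)
```

On the real data you would call `count_classes(y_train)`; the number of distinct values is the number to keep in mind when thinking about the size of the output layer.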

## Does the data need to be cleaned?

Are there missing or erroneous values? 

Do we need to fix the types of some of the variables?

## Does it need to be normalized?

Is the range of one or more values clearly out of line with the rest?
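
One way to check this is to compare the per-column ranges and, if they differ wildly, z-score every split with statistics taken from the training split only. The sketch below uses toy arrays; `column_ranges`, `standardize`, and the `eps` guard against zero-variance columns are assumptions for illustration, not part of the notebook.

```python
import numpy as np

def column_ranges(x):
    """Range (max - min) of every column."""
    return x.max(axis=0) - x.min(axis=0)

def standardize(x, mean, std, eps=1e-8):
    """Z-score x using statistics computed elsewhere (normally on the training split)."""
    return (x - mean) / (std + eps)

# Toy data: the second column is on a much larger scale than the first.
toy_train = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]])
toy_dev = np.array([[2.0, 300.0]])

print(column_ranges(toy_train))

# Statistics come from the training split and are reused for dev/test.
mean, std = toy_train.mean(axis=0), toy_train.std(axis=0)
scaled_train = standardize(toy_train, mean, std)
scaled_dev = standardize(toy_dev, mean, std)
```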

## Consider transformation

Would PCA help? Yes, probably! We will reuse the code from day 30.

In [None]:
def prep_pca(data):
    # center data
    centered_data = data - np.mean(data, axis=0)
    # covariance matrix
    covariance_matrix = (centered_data.T @ centered_data) / (data.shape[0] - 1)
    # eigendecomposition of the (symmetric) covariance matrix
    evals, evectors = scipy.linalg.eigh(covariance_matrix)
    # sort eigenvals, eigenvecs
    order = np.argsort(evals)[::-1]
    eigenvals_sorted = evals[order]
    eigenvecs_sorted = evectors[:, order]
    return centered_data, covariance_matrix, eigenvals_sorted, eigenvecs_sorted

def plot_covariance_matrix(covariance_matrix):
    fig = plt.figure(figsize=(12,12))
    sns.heatmap(pd.DataFrame(covariance_matrix), annot=False, cmap='PuOr')
    plt.show()

def plot_eigenvectors(eigenvecs_sorted):
    fig = plt.figure(figsize=(14,3))
    sns.heatmap(pd.DataFrame(eigenvecs_sorted[:, 0:21].T), 
                annot=False, cmap='coolwarm',
               vmin=-0.5,vmax=0.5)
    plt.ylabel("Ranked Eigenvalue")
    plt.xlabel("Eigenvector Components")
    plt.show()

def get_proportional_variances(eigenvals_sorted):
    # use a name that does not shadow the built-in sum()
    total_variance = np.sum(eigenvals_sorted)
    proportional_variances = np.array([eigenvalue / total_variance for eigenvalue in eigenvals_sorted])
    cumulative_sum = np.cumsum(proportional_variances)
    return proportional_variances, cumulative_sum

def scree_graph(proportional_variances):
    plt.figure(figsize=(6, 4))
    plt.bar(range(len(proportional_variances)), proportional_variances, alpha=0.5, align='center',
            label='Proportional variance')
    plt.ylabel('Proportional variance ratio')
    plt.xlabel('Ranked Principal Components')
    plt.title("Scree Graph")
    plt.legend(loc='best')
    plt.tight_layout()
    plt.show()

def elbow_plot(cumulative_sum):
    fig = plt.figure(figsize=(6,4))
    ax1 = fig.add_subplot(111)
    ax1.plot(cumulative_sum)
    ax1.set_ylim([0,1.0])
    ax1.set_xlabel('Number of Principal Components')
    ax1.set_ylabel('Cumulative explained variance')
    ax1.set_title('Elbow Plot')
    plt.show()

def fit_pca(centered_data, eigenvecs_sorted, number_to_keep):
    v = eigenvecs_sorted[:, :number_to_keep]
    projected_data = centered_data@v
    return projected_data

As we know from day 30, we will keep 200 dimensions.

In [None]:
centered_train, covariance_matrix, eigenvals_sorted, eigenvecs_sorted = prep_pca(x_train)
projected_train = fit_pca(centered_train, eigenvecs_sorted, 200)
centered_dev = x_dev - np.mean(x_train, axis=0)
projected_dev = fit_pca(centered_dev, eigenvecs_sorted, 200)
centered_test = x_test - np.mean(x_train, axis=0)
projected_test = fit_pca(centered_test, eigenvecs_sorted, 200)
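
If you want to double-check a choice such as 200 dimensions, one option is to ask how many leading components are needed to reach a given share of the total variance. This is a sketch on toy eigenvalues; `components_for_variance` and the 0.9 threshold are hypothetical, not values taken from day 30.

```python
import numpy as np

def components_for_variance(eigenvals_sorted, threshold):
    """Smallest number of leading components whose cumulative variance share reaches threshold."""
    cumulative = np.cumsum(eigenvals_sorted) / np.sum(eigenvals_sorted)
    # searchsorted returns the first index where cumulative >= threshold
    count = int(np.searchsorted(cumulative, threshold)) + 1
    # guard against floating-point round-off pushing the count past the end
    return min(count, len(eigenvals_sorted))

# Toy eigenvalues, already sorted in descending order.
toy_eigenvals = np.array([5.0, 3.0, 1.5, 0.5])
n_components = components_for_variance(toy_eigenvals, 0.9)
print(n_components)
```

With the notebook's variables, the call would be `components_for_variance(eigenvals_sorted, threshold)` for whatever threshold you consider acceptable.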

# Find Prototypes

To do this, we use k-means. I am going to use the scikit-learn implementation; you should use your own for the project.

Why would we not just have the number of prototypes be equal to the number of classes?

In [None]:
from sklearn.cluster import KMeans

inertia_by_k = []

for k in range(2, 20):
    print(k)
    km = KMeans(n_clusters=k, random_state=0, n_init = 10).fit(projected_train)
    inertia_by_k.append([k, km.inertia_])

inertia_by_k = np.array(inertia_by_k)
print(inertia_by_k)
fig = plt.figure(figsize=(10,4))
ax1 = fig.add_subplot(111)
ax1.plot(inertia_by_k[:, 0], inertia_by_k[:, 1])
ax1.set_xlabel('k')
ax1.set_ylabel('Inertia')
ax1.set_title('Elbow Plot')
plt.show()


So, what value will we choose for k? What will this number correspond to in the RBF network?

In [None]:
k = 14

km = KMeans(n_clusters=k, random_state=0, n_init=10).fit(projected_train)

print(km.cluster_centers_)
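
The Gaussian activation in the next section needs a spread for each prototype. One common heuristic, which is an assumption here and not something the notebook prescribes, is the mean distance from each cluster center to the points assigned to it. `cluster_spreads` and the toy arrays are hypothetical names for illustration.

```python
import numpy as np

def cluster_spreads(points, centers, labels):
    """Mean Euclidean distance from each center to the points assigned to it."""
    spreads = np.zeros(len(centers))
    for j in range(len(centers)):
        members = points[labels == j]
        if len(members) > 0:  # leave the spread at 0 for an empty cluster
            spreads[j] = np.linalg.norm(members - centers[j], axis=1).mean()
    return spreads

# Toy clustering: two clusters of two points each.
toy_points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 14.0]])
toy_centers = np.array([[0.0, 1.0], [10.0, 12.0]])
toy_labels = np.array([0, 0, 1, 1])
toy_spreads = cluster_spreads(toy_points, toy_centers, toy_labels)
print(toy_spreads)
```

With the fitted model above, the equivalent call would be `cluster_spreads(projected_train, km.cluster_centers_, km.labels_)`.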

# Define the Activation Function for the Hidden Nodes

Recall that a typical activation function for the hidden nodes is the Gaussian, so something like $\exp \left( - \frac{\lVert\vec{d}-\vec{\mu}_j\rVert^2}{2\delta_j^2 + \epsilon} \right)$, where $\vec{d}$ is the data point, $\vec{\mu}_j$ is the prototype, $\delta_j$ is the hidden unit's standard deviation, $\epsilon$ is a small constant, and $\lVert\cdot\rVert^2$ is the squared Euclidean distance.

Let's take a good look at this activation function. 
* *What is in the numerator? Why look, it's the distance! Why would we not just use the distance itself as the activation function?* 
* *What is the function of $\delta_j$?*
* *Why do we have $\epsilon$?*

In the cell below, you should define a function that will return the activations for the hidden nodes.
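
If you want something to compare your own attempt against, here is one possible sketch of the Gaussian activation described above, computed for all data points and all prototypes at once. The function name `hidden_activations`, its argument names, and the default `epsilon` are assumptions; the toy inputs are made up.

```python
import numpy as np

def hidden_activations(data, prototypes, spreads, epsilon=1e-8):
    """Gaussian activation of every hidden node for every data point.

    data: (n, d) array, prototypes: (k, d) array, spreads: (k,) array.
    Returns an (n, k) array of activations.
    """
    # Broadcasting: (n, 1, d) - (1, k, d) gives all pairwise differences, shape (n, k, d)
    diffs = data[:, np.newaxis, :] - prototypes[np.newaxis, :, :]
    squared_distances = np.sum(diffs ** 2, axis=2)
    # Each column j is scaled by its own spread; epsilon avoids division by zero
    return np.exp(-squared_distances / (2 * spreads ** 2 + epsilon))

# Toy check: each point sits exactly on one prototype.
toy_data = np.array([[0.0, 0.0], [3.0, 4.0]])
toy_prototypes = np.array([[0.0, 0.0], [3.0, 4.0]])
toy_spreads = np.array([1.0, 5.0])
toy_acts = hidden_activations(toy_data, toy_prototypes, toy_spreads)
print(toy_acts)
```

In this notebook the call would look something like `train_calcs = hidden_activations(projected_train, km.cluster_centers_, spreads)`, where `spreads` is whatever per-prototype standard deviation you have chosen.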

# What Will We Do When We Get a New Data Point?

At this point, we have defined:
* The input layer (ish)
* The hidden layer

For a new data point, we will:
1. encode it using the same method we used for train, including any summary statistics computed **on the training data**
2. "send it" to each of the hidden layer nodes (so the weights from the input layer to the hidden layer are all 1) - this is really just a matrix multiply!
3. each hidden layer node will calculate its activation for this data point - more matrix multiply!
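
The three steps above can be sketched for a single new point as follows. Everything here is a toy stand-in: `encode_new_point` is a hypothetical helper, and the mean, eigenvectors, prototypes, and spreads are made-up values in place of the ones computed from the training data.

```python
import numpy as np

def encode_new_point(point, train_mean, eigenvecs, prototypes, spreads, epsilon=1e-8):
    """Center with the training mean, project with PCA, return hidden-node activations."""
    centered = point - train_mean          # step 1: same centering as the training data
    projected = centered @ eigenvecs       # step 1: same PCA projection
    # steps 2 and 3: every hidden node sees the same projected point
    squared_distances = np.sum((prototypes - projected) ** 2, axis=1)
    return np.exp(-squared_distances / (2 * spreads ** 2 + epsilon))

# Toy setup: 3 input dimensions reduced to 2, with 2 prototypes.
toy_mean = np.array([1.0, 1.0, 1.0])
toy_eigenvecs = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
toy_prototypes = np.array([[0.0, 0.0], [1.0, 1.0]])
toy_spreads = np.array([1.0, 1.0])

new_point = np.array([1.0, 1.0, 5.0])
new_acts = encode_new_point(new_point, toy_mean, toy_eigenvecs, toy_prototypes, toy_spreads)
print(new_acts)
```

The key design point is that `train_mean` and `eigenvecs` come from the training split; nothing is re-estimated from the new point.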

Next class session we will define the output layer and explain how it relates to another analysis method we already know well, linear regression. We will then show how we can *also use RBF networks for classification AND regression*!

To facilitate that, let's save our processed data.

In [None]:
import numpy as np
import pickle as pkl

In [None]:
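# train_calcs is assumed to hold the hidden-node activations for the training set,
# i.e. the output of the activation function you defined above
with open("hidden_node_weights_train.pkl", "wb") as f:
    pkl.dump(train_calcs, f)
# y_train holds the training labels split off earlier
with open("labels_train.pkl", "wb") as f:
    pkl.dump(y_train, f)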
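
Next session, the saved arrays can be read back with `pkl.load`. The sketch below is a round trip on toy arrays written to a temporary directory so that it does not touch your real files; the toy values and the temporary paths are assumptions for illustration.

```python
import os
import pickle as pkl
import tempfile

import numpy as np

# Toy stand-ins for the hidden-node activations and the training labels.
toy_activations = np.array([[1.0, 0.2], [0.1, 0.9]])
toy_labels = np.array([0, 1])

with tempfile.TemporaryDirectory() as tmp_dir:
    acts_path = os.path.join(tmp_dir, "hidden_node_weights_train.pkl")
    labels_path = os.path.join(tmp_dir, "labels_train.pkl")

    # Save, mirroring the cell above.
    with open(acts_path, "wb") as f:
        pkl.dump(toy_activations, f)
    with open(labels_path, "wb") as f:
        pkl.dump(toy_labels, f)

    # Load back; note the "rb" (read binary) mode.
    with open(acts_path, "rb") as f:
        loaded_activations = pkl.load(f)
    with open(labels_path, "rb") as f:
        loaded_labels = pkl.load(f)

print(loaded_activations.shape, loaded_labels.shape)
```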