# CompSpec Krotov Hopfield MNIST Benchmark

## Intro
* **Date**: 11/27/2020
* **What**: This experiment benchmarks CompSpec against the unsupervised network in [Krotov and Hopfield's paper](https://www.pnas.org/content/116/16/7723).  So basically, I'm going to train a CompSpec layer, and then build a TensorFlow layer on top of it to do the final classification.  Then, of course, I need to train a full DNN of the same architecture and see how my network measures up.  
* **Why**: CompSpec is crazy fast, and crazy efficient at training.  This we know and love.  But we need to see how good it actually is at the classification task from Hopfield and Krotov's paper.  I'm thinking CompSpec actually trains even faster than backprop, which is absolutely fucking wild.  But we'll see.
* **Hopes**: I hope that training the CompSpec layer on a single epoch of data matches the performance of backprop after several epochs.  I'm fairly certain CompSpec is more efficient than backprop, so if that's the case, I've effectively found an architecture that's better than the standard training algorithm for machine learning.
* **Limitations**: If KH's paper is to be believed, then the non-charged CompSpec might hit an upper limit in performance.  Apparently that's what happened for them when their network only learned image prototypes.  But don't you worry, I'll also try the benchmark out with charged CompSpec and see if that helps things a lot.  I might even throw in a *novel* (ooh hoo hoo!) version of charged CompSpec that I thought of last night.  Here we go!

## Code

In [1]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

from time import time
from tensorflow.keras.datasets import mnist
from tqdm import tqdm

L = 28 * 28   #Size of mnist in pixels
S = 60000     #Size of training set

(train_X, train_y), (test_X, test_y) = mnist.load_data()
train_X = train_X / 255.0
test_X = test_X / 255.0

flat_x = np.reshape(train_X, [-1, L])
flat_test = np.reshape(test_X, [-1, L])

In [2]:
def draw_weights(synapses, Kx, Ky):
    yy=0
    HM=np.zeros((28*Ky,28*Kx))
    for y in range(Ky):
        for x in range(Kx):
            HM[y*28:(y+1)*28,x*28:(x+1)*28]=synapses[yy,:].reshape(28,28)
            yy += 1
    plt.clf()
    nc=np.amax(np.absolute(HM))
    im=plt.imshow(HM,cmap='bwr',vmin=-nc,vmax=nc)
    fig.colorbar(im,ticks=[0, np.amax(HM)])
    plt.axis('off')
    fig.canvas.draw()  

In [3]:
"""
flat_x: training data
S: Size of training set
L: Size of input
Kx: Num cols of neurons
Ky: Num rows of neurons
Nep: Num epochs
T_s: Number of training inputs
xi: Base learning constant
phi: Specialization ema constant
B: Batch size

Returns: (synapse_weights, neuron specialization values)
"""
def comp_spec_mb(flat_x, S, L, Kx, Ky, Nep, T_s, xi, phi, B):
    start = time()
    N = Kx * Ky
    
    w = np.abs(np.random.normal(0, 1, (N, L))) # synapses of each neuron
    w = w / np.array([np.linalg.norm(w, axis=1)]).T #NORMALIZE THE WEIGHTS TO PREVENT EXXXPLOSIONS
    s = np.zeros(N).reshape(-1, 1) # Specialization for each neuron

    for ep in range(Nep):
        # Shuffle the data at the start of each epoch
        inputs = flat_x[np.random.permutation(S), :].reshape(S, L)

        for i in tqdm(range(T_s // B)):
            v = inputs[i * B: (i + 1) * B, :]

            w_mul_v = w @ v.T 
            o = w_mul_v / (np.linalg.norm(w, axis=1).reshape(-1, 1) * np.linalg.norm(v, axis=1))

            c = ((1 - s) ** 2) / (1 - o)

            wins = np.argmax(c, axis=0)     

            win_mask = np.zeros((N, B))
            win_mask[wins, np.arange(B)] = 1
            win_mask = (win_mask / np.maximum(np.sum(win_mask, axis=1), 1).reshape(-1, 1))

            win_avg = (np.sum(w_mul_v * win_mask, axis=1)).reshape(-1, 1)

            v_update = win_mask @ v

            del_syn = (v_update - (win_avg * w)) * (((1 - s) ** 2) + 0.1) * xi

            w += del_syn

            s[wins] *= (1 - phi)
            s += phi * np.sum(o * win_mask, axis=1).reshape(-1, 1)

    print("Max val: ", np.amax(s), "Min value: ", np.amin(s), "Mean val: ", np.mean(s), "Std: ", np.std(s))
    print("Elapsed time: ", time() - start, " seconds")
    
    return (w, s)
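
To make the competition step in `comp_spec_mb` easier to eyeball, here's a minimal sketch that pulls out just the winner selection and runs it on a toy problem.  The helper name `competition_winners` and the toy arrays are made up for illustration; the formulas for `o`, `c`, and `win_mask` are copied from the cell above.

```python
import numpy as np

def competition_winners(w, v, s):
    """Hypothetical helper: winner index per input plus the row-normalized win mask."""
    N, B = w.shape[0], v.shape[0]
    # Cosine similarity between every neuron (rows of w) and every input (rows of v)
    o = (w @ v.T) / (np.linalg.norm(w, axis=1).reshape(-1, 1) * np.linalg.norm(v, axis=1))
    # Competition score: highly specialized neurons (s near 1) are handicapped
    c = ((1 - s) ** 2) / (1 - o)
    wins = np.argmax(c, axis=0)
    win_mask = np.zeros((N, B))
    win_mask[wins, np.arange(B)] = 1
    # Each winning neuron averages over the inputs it won in this batch
    win_mask = win_mask / np.maximum(np.sum(win_mask, axis=1), 1).reshape(-1, 1)
    return wins, win_mask

# Toy setup (illustrative values only): 3 unit-norm neurons, 3 two-pixel inputs
w_toy = np.array([[1.0, 0.0], [0.0, 1.0], [np.sqrt(0.5), np.sqrt(0.5)]])
v_toy = np.array([[2.0, 1.0], [1.0, 2.0], [3.0, 1.0]])

wins_fresh, _ = competition_winners(w_toy, v_toy, np.zeros((3, 1)))
wins_spec, mask_spec = competition_winners(w_toy, v_toy, np.array([[0.0], [0.0], [0.9]]))
print(wins_fresh)  # [2 2 0]
print(wins_spec)   # [0 1 0]
```

With `s` at zero, the diagonal neuron grabs the first two inputs.  Once its specialization is high, `(1 - s) ** 2` shrinks its score and the other two neurons win instead.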

In [4]:
"""
flat_x: training data
S: Size of training set
L: Size of input
Kx: Num cols of neurons
Ky: Num rows of neurons
Nep: Num epochs
T_s: Number of training inputs
xi: Base learning constant
phi: Specialization ema constant
k: Rank of the repelled neuron
delta: local repulsion constant (should be less than 1)
B: Batch size

Returns: (synapse_weights, neuron specialization values)
"""
def charged_comp_spec_mb(flat_x, S, L, Kx, Ky, Nep, T_s, xi, phi, delta, k, B):
    start = time()
    N = Kx * Ky
    
    w = np.abs(np.random.normal(0, 1, (N, L))) # synapses of each neuron
    w = w / np.array([np.linalg.norm(w, axis=1)]).T #NORMALIZE THE WEIGHTS TO PREVENT EXXXPLOSIONS
    s = np.zeros(N).reshape(-1, 1) # Specialization for each neuron

    for ep in range(Nep):
        # Shuffle the data at the start of each epoch
        inputs = flat_x[np.random.permutation(S), :]

        for i in tqdm(range(T_s // B)):
            v = inputs[i * B: (i + 1) * B, :]

            w_mul_v = w @ v.T 
            o = w_mul_v / (np.linalg.norm(w, axis=1).reshape(-1, 1) * np.linalg.norm(v, axis=1))

            c = ((1 - s) ** 2) / (1 - o)
            
            c_sort = np.argsort(c, axis=0)

            wins = c_sort[N - 1]

            win_mask = np.zeros((N, B))
            win_mask[wins, np.arange(B)] = 1
            win_mask = (win_mask / np.maximum(np.sum(win_mask, axis=1), 1).reshape(-1, 1))

            win_avg = (np.sum(w_mul_v * win_mask, axis=1)).reshape(-1, 1)

            v_update = win_mask @ v

            del_syn = (v_update - (win_avg * w)) * (((1 - s) ** 2) + 0.1) * xi
            
            repelled = c_sort[N - 1 - k]
            
            repel_mask = np.zeros((N, B))
            repel_mask[repelled, np.arange(B)] = 1
            repel_mask = (repel_mask / np.maximum(np.sum(repel_mask, axis=1), 1).reshape(-1, 1))
            repel_mask *= -1 * delta
            
            repel_avg = (np.sum(w_mul_v * repel_mask, axis=1)).reshape(-1, 1)
            
            v_repel_update = repel_mask @ v
            
            del_repel_syn = (v_repel_update - (repel_avg * w)) * ((s ** 2)) * xi
            
            w += del_syn + del_repel_syn
            
            if np.amax(np.abs(w)) > 10:
                w /= np.array([np.linalg.norm(w, axis=1)]).T
            
            s[wins] *= (1 - phi)
            s += phi * np.sum(o * win_mask, axis=1).reshape(-1, 1)
            
    print("Max val: ", np.amax(s), "Min value: ", np.amin(s), "Mean val: ", np.mean(s), "Std: ", np.std(s))
    print("Elapsed time: ", time() - start, " seconds")
    
    return (w, s)
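
The only new selection logic in the charged version is picking the repelled neuron by rank.  Here's a small sketch of that step on a toy score matrix; `winner_and_repelled` is a hypothetical helper and the numbers are invented, but the indexing matches `c_sort[N - 1]` and `c_sort[N - 1 - k]` above.

```python
import numpy as np

def winner_and_repelled(c, k):
    """Hypothetical helper: top-scoring neuron and the neuron k ranks below it, per input."""
    N = c.shape[0]
    # argsort is ascending, so the last row holds the winners
    c_sort = np.argsort(c, axis=0)
    return c_sort[N - 1], c_sort[N - 1 - k]

# Toy scores (illustrative values only): 3 neurons (rows) x 2 inputs (columns)
c_toy = np.array([[0.1, 0.9],
                  [0.5, 0.2],
                  [0.3, 0.4]])

wins, repelled = winner_and_repelled(c_toy, k=1)
print(wins)      # [1 0]
print(repelled)  # [2 2]
```

Note that `k = 0` would repel the winner itself, and `k` has to stay below `N` for the index to make sense.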

In [75]:
"""
Generate weights for the final layer based on a quick wta of 
the final synapses.
"""
def gen_final_inits(w, min_val):
    w = w.T
    
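    # NOTE: flat_x, train_y, and T_s are notebook globals defined in other cells
    v = flat_x[:T_s]
    train_lbls = train_y[:T_s]
    
    v = v / np.array([np.linalg.norm(v, axis=1)]).T
    w = w / np.array([np.linalg.norm(w, axis=1)]).T
    
    # Use the normalized inputs computed above (the winner per input is unchanged)
    wins = np.argmax(w @ v.T, axis=0)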
    
    n_wins = np.zeros((w.shape[0], 10))
    
    for (n_i, lbl) in zip(wins, train_lbls):
        n_wins[n_i][lbl] += 1
        
    n_cls = np.argmax(n_wins, axis=1)
    
    c_o = np.ones((w.shape[0], 10)) * min_val
    c_o[np.arange(w.shape[0]), n_cls] = 1
    
    return c_o
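
Since `gen_final_inits` leans on notebook globals, here's a self-contained sketch of the same idea on toy data.  The name `wta_final_inits`, its extra parameters, and the toy arrays are assumptions for illustration; the logic (normalize, pick the winning neuron per input, vote on a class per neuron, fill with `min_val`) follows the function above.

```python
import numpy as np

def wta_final_inits(w, x, labels, min_val, n_classes=10):
    """Hypothetical standalone variant: w is (N, L), x is (T, L), labels is (T,)."""
    w = w / np.linalg.norm(w, axis=1, keepdims=True)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    # Winning neuron for each input
    wins = np.argmax(w @ x.T, axis=0)
    # Count how often each neuron wins for each label
    n_wins = np.zeros((w.shape[0], n_classes))
    np.add.at(n_wins, (wins, labels), 1)
    n_cls = np.argmax(n_wins, axis=1)
    # Weak weight everywhere, strong weight to the neuron's majority class
    c_o = np.full((w.shape[0], n_classes), min_val, dtype=float)
    c_o[np.arange(w.shape[0]), n_cls] = 1
    return c_o

# Toy data (illustrative values only): 3 neurons, 3 two-pixel inputs
w_toy = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
x_toy = np.array([[2.0, 0.1], [0.2, 3.0], [5.0, 1.0]])
labels_toy = np.array([3, 7, 3])

c_o = wta_final_inits(w_toy, x_toy, labels_toy, min_val=0.1)
print(c_o[0, 3], c_o[1, 7], c_o[2, 0])  # 1.0 1.0 1.0
```

One thing this makes visible: a neuron that never wins (the third one here) has an all-zero vote row, so `argmax` hands it class 0 by default.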
    

In [77]:
"""
Network with CompSpec Layer.  Also the final weights are initialized 
using the wta weight generator.
"""
class CompSpecGoodInit(tf.keras.Model):
    # The comp_spec_weights have to be transposed from how they're originally trained
    def __init__(self, comp_spec_weights, final_min_val, **kwargs):
        super().__init__(**kwargs)
        self.cs_w = tf.constant(comp_spec_weights, dtype='float32')
        
        self.w = tf.Variable(gen_final_inits(comp_spec_weights, final_min_val), dtype='float32')
        
    # Keras expects the forward pass in call(); Model.__call__ wraps it
    def call(self, x, **kwargs):
        l1 = tf.matmul(x, self.cs_w)
        return tf.matmul(l1, self.w)

In [92]:
"""
Network with CompSpec Layer.  The last layer is generated with noise
"""
class CompSpecModel(tf.keras.Model):
    # The comp_spec_weights have to be transposed from how they're originally trained
    def __init__(self, comp_spec_weights, **kwargs):
        super().__init__(**kwargs)
        self.cs_w = tf.constant(comp_spec_weights, dtype='float32')
        
        self.w = tf.Variable(tf.random.normal([comp_spec_weights.shape[1], 10]), name='w')
        
    # Keras expects the forward pass in call(); Model.__call__ wraps it
    def call(self, x, **kwargs):
        l1 = tf.matmul(x, self.cs_w)
        return tf.matmul(l1, self.w)

## Analysis Dialog

You want me to say it? I'll say it!  Fuck backprop! Fuck it! Fuck stupid TensorFlow!  Fuck all the fucking companies using these algorithms.  Fuck Gradient Descent!  Fuck it all!  God fucking damn it!

Pardon the excessive profanity.  But I really do hate these algorithms.  That being said, CompSpec does do some interesting stuff inside TensorFlow.  

I'm not about to retrain all my networks, and I might delete them because they look gross.  ...never mind I won't do that.  I'll just show you the results, talk about why everything is the worst, and then probably go back to my RQI lab and cook up some new algos.  

Ok, I'll just copy the code for this first one.  This is trained with 100 CompSpec Neurons, and random last layer initialization.  Here's the code.

In [None]:
%matplotlib inline
fig=plt.figure(figsize=(12,12))

mu = 0
sig = 1
Kx = 10
Ky = 10
Nep = 1
T_s = 60000
xi = 0.1
phi = 2 / 11

B = 100 #Batch size

(w, _) = comp_spec_mb(flat_x, S, L, Kx, Ky, Nep, T_s, xi, phi, B)
print("\n\n")
draw_weights(w, Kx, Ky)

my_model = CompSpecModel(w.T)

my_model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
)

my_model.fit(flat_x, train_y, epochs=20, verbose=2)

The code only shows 20 epochs, but I ran it a bunch of times, so it probably trained for around 100 epochs in total.  Let's see how it does! Yay!

In [93]:
my_model.evaluate(flat_test, test_y, verbose=2)

313/313 - 1s - loss: 0.3701 - accuracy: 0.8916


[0.3700554072856903, 0.8916000127792358]

Wow!  An accuracy of 89%?? That's way better than the wta classification, right?!

You are correct, but that's about where the joy ends.  However, there's one more thing to be joyful about.  With basically every test I've run so far, the test accuracy has been as good as or better than the training accuracy.  So yeah, CompSpec basically is immune to over-fitting (with mnist at least).  That's kinda incredibly dope. 

Ok, let's get to the part where I hate backprop.  

I got ambitious after the success of the 100 CompSpec layer, and I went immediately to 2000 neurons.  Here's the code.  I also probably trained this one on around 50-100 epochs, which literally takes forever.  An epoch takes about 15 seconds for this network so rip me.

In [None]:
%matplotlib inline
fig=plt.figure(figsize=(12,12))

mu = 0
sig = 1
Kx = 40
Ky = 50
Nep = 1
T_s = 60000
xi = 0.1
phi = 2 / 11

B = 100 #Batch size

(w, _) = comp_spec_mb(flat_x, S, L, Kx, Ky, Nep, T_s, xi, phi, B)
print("\n\n")
draw_weights(w, Kx, Ky)

two_thousand = CompSpecModel(w.T)

two_thousand.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
)

two_thousand.fit(flat_x, train_y, epochs=20, verbose=2)

And how does it perform, you ask?  Look for yourself.

In [94]:
two_thousand.evaluate(flat_test, test_y, verbose=2)

313/313 - 2s - loss: 1.0741 - accuracy: 0.8689


[1.0740573406219482, 0.8689000010490417]

Literally worse than the 100 neuron network!  Are you kidding me?? Also the training accuracy just creeps upward.  It basically plateaued around 87% accuracy.  

That makes me angry! And the backprop takes forever!  So I thought this was probably happening because there were simply too many parameters.  So I bumped it down to 400.  Here's the code.  Trained this one for either 100 epochs, or more than 100 epochs.
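
For a sense of scale on the "too many parameters" hunch: in `CompSpecModel` only the final `w` (shape `[N, 10]`, no bias) is a `tf.Variable`, while `cs_w` is a frozen constant.  Here's a quick count for the three layer sizes mentioned in this notebook; `param_counts` is a hypothetical helper, not part of the experiment code.

```python
def param_counts(n_neurons, n_inputs=28 * 28, n_classes=10):
    """Hypothetical helper: (frozen CompSpec weights, trainable final-layer weights)."""
    frozen = n_inputs * n_neurons      # cs_w, held constant during backprop
    trainable = n_neurons * n_classes  # w, the only thing Adam updates
    return frozen, trainable

counts = {n: param_counts(n) for n in (100, 400, 2000)}
for n, (frozen, trainable) in counts.items():
    print(f"{n:>4} neurons: {frozen:>9,} frozen, {trainable:>6,} trainable")
```

So backprop is only ever fitting 1,000, 4,000, or 20,000 weights here; the bulk of each network sits in the frozen CompSpec layer.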

Oh, and I also thought I might try to make this boi's job easier, so I also initialized the final synapses using wta classification.  So basically I gave each neuron a strong weight to its wta class, and a weak weight to everything else (so that it could still train on those synapses; we don't want zero gradients everywhere, do we?).  I put the new keras model up top, but here's the code that uses it.

In [None]:
%matplotlib inline
fig=plt.figure(figsize=(12,12))

mu = 0
sig = 1
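# NOTE: the text above says 400 neurons, but Kx * Ky here is 2000; these values
# look copied from the previous cell, so check the grid size before rerunning
Kx = 40
Ky = 50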
Nep = 1
T_s = 60000
xi = 0.1
phi = 2 / 11

B = 100 #Batch size

(w, _) = comp_spec_mb(flat_x, S, L, Kx, Ky, Nep, T_s, xi, phi, B)
print("\n\n")
draw_weights(w, Kx, Ky)

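# NOTE: CompSpecGoodInit also requires final_min_val (the weak weight for the
# non-wta classes); it is missing from this call and the value used isn't given
fh_good = CompSpecGoodInit(w.T)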

fh_good.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
)

fh_good.fit(flat_x, train_y, epochs=100, verbose=2)

Let's see how it does! (*grimaces*) 

In [95]:
fh_good.evaluate(flat_test, test_y, verbose=2)

313/313 - 1s - loss: 0.4050 - accuracy: 0.8832


[0.4050067663192749, 0.8831999897956848]

Better than 2000 gigantor, but still not as good as fucking 100!  Are you kidding me?? 

I don't know for sure what's happening inside this network, but I'm going to safely blame it on simple accumulation instead of angle between prototypes.  With this setup, you can't really discriminate between things with different structure.  You need to have a way for neurons to express whether something's in line with a particular type of structure.  All of this accumulation business really muddles the details.  
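
One way to read "accumulation" versus "angle" (this reading is an assumption, not a diagnosis): the Keras models feed `tf.matmul(x, self.cs_w)` straight into the classifier, which is a raw dot product, while CompSpec training competes on the cosine similarity `o`.  Here's a toy sketch of how the two can disagree; `raw_and_cosine` and the arrays are made up for illustration.

```python
import numpy as np

def raw_and_cosine(x, prototypes):
    """Hypothetical helper: raw dot products and cosine similarities, both shaped (B, N)."""
    raw = x @ prototypes.T
    norms = np.linalg.norm(x, axis=1, keepdims=True) * np.linalg.norm(prototypes, axis=1)
    return raw, raw / norms

# Toy values (illustrative only): one prototype, two inputs
prototype = np.array([[1.0, 0.0]])
x_toy = np.array([[1.0, 0.0],    # faint input, perfectly aligned with the prototype
                  [3.0, 3.0]])   # bold input, 45 degrees off the prototype

raw, cos = raw_and_cosine(x_toy, prototype)
print(raw[:, 0])  # [1. 3.]
print(cos[:, 0])  # aligned input scores 1.0, off-angle input about 0.707
```

The dot product ranks the bold, off-angle input higher just because it carries more total intensity; the cosine ranks the aligned input higher.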

With that said, I'm going to go to the conclusion. 

## Conclusions

Man I hate the things I don't make.  Also there's no way in hell I'm going to go dancing through TensorFlow's source code and try to figure out what's going on, and how I can make it faster.  No.  I'm not going to deal with that absolute shit-heap unless I have to.  It's unreal that literally the world is using backprop, given its unthinkably shitty nature.  The biggest "win" that I guess can be taken away from this experiment is that CompSpec doesn't overfit.  That's pretty awesome, but I think it's somewhat to be expected.  I'm not trying to minimize some stupid loss function, I'm trying to find structure.  Minimizing a loss function blindly is both where you overfit, and where you build a stupid piece of social media that literally polarizes the world and causes kids to kill themselves.  Fuck minimizing loss functions.  

"What about the benchmarks?" you ask.  Fuck the benchmarks.  I got some thoughts spinning around the ol' noggerino that I want to try out.  I want to build an architecture that renders stupid backprop obsolete.  That absolutely blows it out of the water. Where computations are run in a local, unthinkably parallelized fashion.  My research is basically bankrolled right now, and I don't want to ruin that by bringing other people into the picture.  I want to run lean and mean.  Come August, if I've failed, I can show the rest of the world what I've done so that I can get into grad school and get some grants and continue my research.  

But until then, it's a fucking field day, baby.  I don't want to compare my god-algorithm to Krotov and Hopfield's shitty one.  It's better.  Well, I shouldn't necessarily say that.  It wasn't getting as good a classification accuracy using tensorflow, but something doesn't add up there.  They must've trained the final layer using something other than my setup, because with their full-on digit prototyping network, they were still getting an accuracy of 98%? I think?  So yeah, something's not adding up.

If I need to try to grovel to higher-ups in order to get the resources I need to continue research, so be it.  But I'm not there yet, and I got a lot more time to try to bool.  

Oh, I also forgot to mention that I didn't try charged CompSpec.  Why?  Cause fuck that, and fuck backprop.

## Next Steps

Flee from back propagation!  That's the most important step.  

With regard to benchmarking this algo against KH's paper, I'm going to put that off until it's absolutely necessary.  I don't want to be bogged down with competing with other people.  That's always seemed pretty pointless.  I'd much rather just stay lean and mean, working towards what makes sense to me.  God I'm so glad I'm not in school.  What an unbelievably stupid waste of time.  

Now then, it isn't as related to this experiment, but I've been realizing that I need a neuron architecture that allows neurons to react to a wide variety of structures.  Right now, CompSpec learns a couple prototypes.  Great.  But it would be fantastic if each neuron were able to hold information about a variety of structures.  The "or gate" functionality that I've been journaling about.  That's what I really need.  And that's what I'm doing next.  

God!  Fuck backprop!