# Variable Xi Invariance

## Intro
* **Date**: 1/6/2021
* **What**: I'm doing EMA invariance with reconstruction-based training, but with a smaller learning constant for the sparse neurons that aren't "on".
* **Why**: Invariance hasn't worked so far, and I think this just might.  If I do reconstruction learning with a uniform learning constant, the invariant neurons learn to ignore irrelevant neurons.  The bad thing?  Some sparse features are left out entirely, and that's just no good.  And if I do reconstruction learning only on the neurons that are "on," the invariant layer learns *incredibly slowly*. Ain't nobody got time for learning that slow.  So I'm trying to keep the best parts of full-reconstruction learning, and do it in a way where sparse features aren't disregarded.
* **Hopes**: I really, really hope that all the sparse features will be represented by invariant neurons, and that the sparse features are well-grouped.
* **Limitations**: Invariance hasn't really worked out that well so far.  I'm not sure what'll break, but I'm sure something will find a way.
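
To make the "smaller learning constant for neurons that aren't on" idea concrete, here's a minimal NumPy sketch of a single invariant-layer update. The toy sizes, the random seed, and the choice of which sparse neurons are "on" are made up for illustration; the update rule and the names `leta`, `seta`, and `zeta` follow the training cells in this notebook, which run on CuPy.

```python
import numpy as np

rng = np.random.default_rng(0)

sN, iN = 12, 4              # toy sizes (assumption): sparse and invariant neurons
leta, seta = 0.05, 0.0005   # learning constants for "on" and "off" sparse neurons
prec = 1e-10

iw = rng.uniform(0, 0.1, (iN, sN))   # invariant-layer weights
io = rng.uniform(0, 1.0, (iN, 1))    # invariant outputs (the EMA state)

so = np.zeros((sN, 1))               # sparse outputs; only a few are "on"
so[[2, 7, 9]] = rng.uniform(0.5, 1.0, (3, 1))

r = iw.T @ io                        # reconstruction of the sparse layer
e = so - r                           # reconstruction error
zeta = np.where(so > 0, leta, seta)  # per-sparse-neuron learning constant

# Multiplicative update, scaled per sparse neuron by zeta
delta = iw * io * ((e * zeta) / np.maximum(r, prec)).T
iw_new = iw + delta
```

For an "off" sparse neuron the target is zero, so its incoming weights shrink by a factor of `1 - io * seta` per step. That's slow but not zero, which is the compromise between full reconstruction and on-only reconstruction.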

## Code

In [1]:
import numpy as np
import cupy as cp
import matplotlib.pyplot as plt
import matplotlib.animation as animation

from tensorflow.keras.datasets import mnist
from tqdm import tqdm

(x_tr, _), _ = mnist.load_data()

x_tr = x_tr / 255.0

r_l = 1000 # Length of the ribbon
m_sl = 28 # Side length of each image

r_s = 0 # Starting index of the ribbon

ribbon = np.zeros((m_sl, r_l * m_sl))

for x in range(r_l):
    ribbon[:, m_sl * x : m_sl * (x + 1)] = x_tr[x + r_s]
    
t_sl = 30 # Tapestry side length
m_sl = 28 # Side length of each image

tapestry = np.zeros((t_sl * m_sl, t_sl * m_sl))

x_i = 3000

for x in range(t_sl):
    for y in range(t_sl):

        tapestry[y * m_sl : (y + 1) * m_sl, x * m_sl : (x + 1) * m_sl] = x_tr[x_i]
        x_i += 1
        
tapestry[(t_sl - 1) * m_sl:, :] = tapestry[: m_sl, :]
tapestry[:, (t_sl - 1) * m_sl:] = tapestry[:, : m_sl]

def draw_weights(w, Kx, Ky, s_len, fig):
    tapestry = np.zeros((s_len * Ky, s_len * Kx))
    
    w_i = 0
    for y in range(Ky):
        for x in range(Kx):
            tapestry[y * s_len: (y + 1) * s_len, x * s_len: (x + 1) * s_len] = w[w_i].reshape(s_len, s_len)
            w_i += 1
            
    plt.clf()        
    max_val = np.max(tapestry)
    im = plt.imshow(tapestry, cmap="Greys", vmax=max_val)
    fig.colorbar(im, ticks=[0, max_val])
    plt.axis("off")
    fig.canvas.draw()

In [4]:
x_o = 420
y_o = 420

sl = 20

x = x_o
y = y_o

v_x = 0
v_y = 0

v_max = 3

a_x = np.random.uniform(-1, 1)
a_y = np.random.uniform(-1, 1)

img_count = 100_000
imgs = []

del_t = 1

for i in range(img_count):
    if i % 20 == 0:
        a_x = np.random.uniform(-1, 1)
        a_y = np.random.uniform(-1, 1)
        
    x += v_x * del_t
    y += v_y * del_t
    v_x = np.clip(v_x + (a_x * del_t), -v_max, v_max)
    v_y = np.clip(v_y + (a_y * del_t), -v_max, v_max)
    
    x_f = int(x) % ((t_sl - 1) * m_sl)
    y_f = int(y) % ((t_sl - 1) * m_sl)
    
    imgs.append(tapestry[y_f: y_f + sl, x_f : x_f + sl])
    
img_array = np.array(imgs)
ts_tapestry = img_array.reshape(-1, sl ** 2)
gp_tapestry = cp.asarray(ts_tapestry)

In [72]:
%matplotlib notebook
fig = plt.figure(figsize=(3, 3))

ims = []
for i in range(500):
    im = plt.imshow(imgs[i], cmap="gray_r", animated=True)
    ims.append([im])

ani = animation.ArtistAnimation(fig, ims, interval=100, blit=True,
                                repeat_delay=500)

plt.xticks([])
plt.yticks([])
plt.show()

In [18]:
# Slide a 28x28 window along the ribbon, one pixel at a time
imgs = []

for i in range((r_l - 1) * m_sl):
    imgs.append(ribbon[:, i : i + m_sl])
    
img_array = np.array(imgs)
ts_ribbon = img_array.reshape(-1, m_sl ** 2)
gp_ribbon = cp.asarray(ts_ribbon)

In [12]:
%matplotlib notebook
fig = plt.figure(figsize=(3, 3))

ims = []
for i in range(500):
    im = plt.imshow(imgs[i], cmap="gray_r", animated=True)
    ims.append([im])

ani = animation.ArtistAnimation(fig, ims, interval=100, blit=True,
                                repeat_delay=500)

plt.xticks([])
plt.yticks([])
plt.show()

## Analysis Dialog

Aight fam, who's going to carry the mother-fucking boats?

In [19]:
%matplotlib notebook
fig = plt.figure(figsize=(10, 10))

Nep = 20
T_s = 10000
prec = 1e-10

# Sparse Layer
Kx = 30
Ky = 30
sN = Kx * Ky
m_len = sl ** 2

n_w = 6 #Number of winners

sw = cp.random.uniform(0, 0.2, (sN, m_len))
xi = 0.03

# Invariant Layer
Ix = 10
Iy = 10
iN = Ix * Iy

iw = cp.random.uniform(0, 0.1, (iN, sN))
alpha = 0.5
leta = 0.01
seta = 0.001

for ep in range(Nep):
    inputs = gp_tapestry
    
    io = cp.zeros((iN, 1))
    
    for i in tqdm(range(T_s)):

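        # Handle sparse layer
        v = inputs[i].reshape(-1, 1)
#         print(sw.shape, v.shape)
        p = sw @ v
        # Keep only the n_w most strongly activated sparse neurons
        winners = cp.argsort(p, axis=0)[-n_w:]
        mask = cp.zeros((sN, 1))
        mask[winners] = 1
        so_uw = mask * p
        # Reconstruct the input from the winners and measure the error
        r = sw.T @ so_uw
        mod_r = cp.maximum(r, prec)
        e = v - r

        sw += sw * so_uw * (e / mod_r).T * xi

        # Divide each sparse output by the sum of that neuron's weights
        so = so_uw / cp.sum(sw, axis=1).reshape(-1, 1)

        # Handle invariant layer
        io_pert = iw @ so
        
        # EMA of the invariant activations
        io += (io_pert - io) * alpha

        # Train for reconstruction ability
        r = iw.T @ io
        mod_r = cp.maximum(r, prec)
        e = so - r
        # Larger learning constant for sparse neurons that are "on"
        zeta = np.where(so > 0, leta, seta)

        iw += iw * io * ((e * zeta) / mod_r).T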

    if (ep // 2) % 2:
        draw_weights(iw.get(), Ix, Iy, Kx, fig)
    else:
        draw_weights(sw.get(), Kx, Ky, sl, fig)

100%|██████████| 10000/10000 [00:10<00:00, 939.75it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1069.79it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1069.83it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1076.36it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1070.86it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1072.97it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1066.96it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1076.97it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1066.76it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1069.04it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1067.90it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1061.73it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1065.38it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1065.40it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1064.45it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1064.16it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1062.65it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1064.2

Did it work?  Difficult to tell.

In [21]:
%matplotlib notebook
fig = plt.figure(figsize=(10, 5))

sw_np = sw.get()
iw_np = iw.get()

ims = []

io = np.zeros((iN, 1))


for i in tqdm(range(500)):
    # Handle sparse layer
    v = ts_tapestry[i].reshape(-1, 1)
    p = sw_np @ v
    winners = np.argsort(p, axis=0)[-n_w:]
    mask = np.zeros((sN, 1))
    mask[winners] = 1
    so_uw = mask * p
    r = sw_np.T @ so_uw
    mod_r = np.maximum(r, prec)
    e = v - r

    so = so_uw / np.sum(sw_np, axis=1).reshape(-1, 1)

    # Handle invariant layer
    io_pert = iw_np @ so

    io += (io_pert - io) * alpha
    
    mini_tap = np.zeros((20, 60))
    
    glee = 10
    
    mini_tap[:, :20] = v.reshape(20, 20)
    mini_tap[:glee, 40 - glee:40] = io.reshape(glee, glee)
    mini_tap[:, -20:] = r.reshape(20, 20)
    
    im = plt.imshow(mini_tap, cmap="gray_r", animated=True)
    ims.append([im])

ani = animation.ArtistAnimation(fig, ims, interval=200, blit=True,
                                repeat_delay=500)

plt.xticks([])
plt.yticks([])

plt.show()

100%|██████████| 500/500 [00:01<00:00, 476.01it/s]


Ok, let's check out neuron representation.

In [23]:
fig = plt.figure(figsize=(10, 10))

draw_weights((iw @ sw).get(), Ix, Iy, 20, fig)

In [28]:
fig = plt.figure(figsize=(10,10))
draw_weights((sw * iw[15].reshape(-1, 1)).get(), Kx, Ky, 20, fig)

In [29]:
fig = plt.figure(figsize=(10,10))

draw_weights(sw.get(), Kx, Ky, sl, fig)

In [31]:
plt.figure()
plt.xticks([])
plt.yticks([])
plt.imshow(cp.sum(iw, axis=0).reshape(Kx, Ky).get(), cmap="gray")

That's not great.  I think I'm going to use fewer winners, and see what happens.
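
For reference, the number of winners just controls how many sparse neurons survive the top-k mask on each step. A tiny NumPy sketch with made-up activations (the helper name `top_k_mask` is hypothetical, not something defined in the training cell):

```python
import numpy as np

def top_k_mask(p, n_w):
    """Return a column mask with 1 for the n_w largest entries of p, else 0."""
    winners = np.argsort(p, axis=0)[-n_w:]
    mask = np.zeros_like(p)
    mask[winners] = 1
    return mask

# Made-up activations for six sparse neurons
p = np.array([[0.1], [0.9], [0.4], [0.7], [0.2], [0.8]])

mask6 = top_k_mask(p, 6)   # every neuron wins
mask3 = top_k_mask(p, 3)   # only the three strongest win
so_uw = mask3 * p          # unweighted sparse output, as in the training loop
```

With fewer winners, each input is explained by fewer prototypes, so fewer sparse neurons get the larger "on" learning constant on any given step.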

In [34]:
%matplotlib notebook
fig = plt.figure(figsize=(10, 10))

Nep = 20
T_s = 10000
prec = 1e-10

# Sparse Layer
Kx = 30
Ky = 30
sN = Kx * Ky
m_len = sl ** 2

n_w = 3 #Number of winners

sw = cp.random.uniform(0, 0.2, (sN, m_len))
xi = 0.03

# Invariant Layer
Ix = 10
Iy = 10
iN = Ix * Iy

iw = cp.random.uniform(0, 0.1, (iN, sN))
alpha = 0.5
leta = 0.05
seta = 0.005

for ep in range(Nep):
    inputs = gp_tapestry
    
    io = cp.zeros((iN, 1))
    
    for i in tqdm(range(T_s)):

        # Handle sparse layer
        v = inputs[i].reshape(-1, 1)
#         print(sw.shape, v.shape)
        p = sw @ v
        winners = cp.argsort(p, axis=0)[-n_w:]
        mask = cp.zeros((sN, 1))
        mask[winners] = 1
        so_uw = mask * p
        r = sw.T @ so_uw
        mod_r = cp.maximum(r, prec)
        e = v - r

        sw += sw * so_uw * (e / mod_r).T * xi

        so = so_uw / cp.sum(sw, axis=1).reshape(-1, 1)

        # Handle invariant layer
        io_pert = iw @ so
        
        io += (io_pert - io) * alpha

        # Train for reconstruction ability
        r = iw.T @ io
        mod_r = cp.maximum(r, prec)
        e = so - r
        zeta = np.where(so > 0, leta, seta)

        iw += iw * io * ((e * zeta) / mod_r).T

    if (ep // 4) % 2 == 0:
        draw_weights(iw.get(), Ix, Iy, Kx, fig)
    else:
        draw_weights(sw.get(), Kx, Ky, sl, fig)

100%|██████████| 10000/10000 [00:09<00:00, 1053.01it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1054.78it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1060.12it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1062.77it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1059.77it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1058.15it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1062.42it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1055.47it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1046.28it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1063.11it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1049.96it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1052.55it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1055.65it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1054.79it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1015.63it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1013.89it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1012.94it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1022.

I upped `leta` and `seta`, and I'm pretty happy about the invariant feature prototypes.  Time to see what they're all about.

In [35]:
%matplotlib notebook
fig = plt.figure(figsize=(10, 5))

sw_np = sw.get()
iw_np = iw.get()

ims = []

io = np.zeros((iN, 1))


for i in tqdm(range(500)):
    # Handle sparse layer
    v = ts_tapestry[i].reshape(-1, 1)
    p = sw_np @ v
    winners = np.argsort(p, axis=0)[-n_w:]
    mask = np.zeros((sN, 1))
    mask[winners] = 1
    so_uw = mask * p
    r = sw_np.T @ so_uw
    mod_r = np.maximum(r, prec)
    e = v - r

    so = so_uw / np.sum(sw_np, axis=1).reshape(-1, 1)

    # Handle invariant layer
    io_pert = iw_np @ so

    io += (io_pert - io) * alpha
    
    mini_tap = np.zeros((20, 60))
    
    glee = 10
    
    mini_tap[:, :20] = v.reshape(20, 20)
    mini_tap[:glee, 40 - glee:40] = io.reshape(glee, glee)
    mini_tap[:, -20:] = r.reshape(20, 20)
    
    im = plt.imshow(mini_tap, cmap="gray_r", animated=True)
    ims.append([im])

ani = animation.ArtistAnimation(fig, ims, interval=200, blit=True,
                                repeat_delay=500)

plt.xticks([])
plt.yticks([])

plt.show()

100%|██████████| 500/500 [00:01<00:00, 482.63it/s]


In [36]:
fig = plt.figure(figsize=(10, 10))

draw_weights((iw @ sw).get(), Ix, Iy, 20, fig)

In [50]:
fig = plt.figure(figsize=(10,10))
draw_weights((sw * iw[88].reshape(-1, 1)).get(), Kx, Ky, 20, fig)

In [51]:
fig = plt.figure(figsize=(10,10))

draw_weights(sw.get(), Kx, Ky, sl, fig)

In [52]:
plt.figure()
plt.xticks([])
plt.yticks([])
plt.imshow(cp.sum(iw, axis=0).reshape(Kx, Ky).get(), cmap="gray")

Hmm, I don't know how I feel about that.  

In [56]:
np.argmin(cp.sum(iw, axis=0).reshape(Kx, Ky).get())

229

In [57]:
plt.figure()
plt.imshow(sw.get()[229].reshape(20, 20), cmap='gray_r')

I'm guessing certain prototypes aren't represented as strongly in the invariant prototypes.  Let's see which neuron is represented most strongly.
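
The "how strongly is a sparse neuron represented" measure used here is just the column sum of `iw`. A small NumPy sketch on a toy matrix (the sizes and the two planted columns are made up) shows the idea, and also that `argmin`/`argmax` on the reshaped grid return a flat index, so the result can index `sw` directly:

```python
import numpy as np

rng = np.random.default_rng(1)
iN, Kx, Ky = 5, 3, 4           # toy sizes (assumption)
sN = Kx * Ky

iw = rng.uniform(0, 0.1, (iN, sN))
iw[:, 7] = 0.0                 # pretend sparse neuron 7 is ignored by every invariant neuron
iw[:, 2] = 1.0                 # pretend sparse neuron 2 is strongly represented

col = iw.sum(axis=0)           # total invariant weight per sparse neuron
weakest = int(np.argmin(col))
strongest = int(np.argmax(col))

# argmin on the 2-D grid flattens first, so it gives the same flat index
same = int(np.argmin(col.reshape(Kx, Ky)))

# Grid position, if you want to find the neuron in the weight plot
row, column = np.unravel_index(weakest, (Kx, Ky))
```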

In [58]:
np.argmax(cp.sum(iw, axis=0).reshape(Kx, Ky).get())

840

In [59]:
plt.figure()
plt.imshow(sw.get()[840].reshape(20, 20), cmap='gray_r')

Yeah, I guess I can see how that bad boi would be represented strongly.  

I'm actually pretty happy with this.  Just to see if I can mess with this in a cool way, I'm going to raise the number of winners to 4, and I'm going to lower `seta`.  I think lowering `seta` will make it harder for the network to ignore neurons.  You know, while I'm at it, I think I'll increase `leta`.  Aight, let's see what happens.

In [60]:
%matplotlib notebook
fig = plt.figure(figsize=(10, 10))

Nep = 20
T_s = 10000
prec = 1e-10

# Sparse Layer
Kx = 30
Ky = 30
sN = Kx * Ky
m_len = sl ** 2

n_w = 4 #Number of winners

sw = cp.random.uniform(0, 0.2, (sN, m_len))
xi = 0.03

# Invariant Layer
Ix = 10
Iy = 10
iN = Ix * Iy

iw = cp.random.uniform(0, 0.1, (iN, sN))
alpha = 0.5
leta = 0.1
seta = 0.001

for ep in range(Nep):
    inputs = gp_tapestry
    
    io = cp.zeros((iN, 1))
    
    for i in tqdm(range(T_s)):

        # Handle sparse layer
        v = inputs[i].reshape(-1, 1)
#         print(sw.shape, v.shape)
        p = sw @ v
        winners = cp.argsort(p, axis=0)[-n_w:]
        mask = cp.zeros((sN, 1))
        mask[winners] = 1
        so_uw = mask * p
        r = sw.T @ so_uw
        mod_r = cp.maximum(r, prec)
        e = v - r

        sw += sw * so_uw * (e / mod_r).T * xi

        so = so_uw / cp.sum(sw, axis=1).reshape(-1, 1)

        # Handle invariant layer
        io_pert = iw @ so
        
        io += (io_pert - io) * alpha

        # Train for reconstruction ability
        r = iw.T @ io
        mod_r = cp.maximum(r, prec)
        e = so - r
        zeta = np.where(so > 0, leta, seta)

        iw += iw * io * ((e * zeta) / mod_r).T

    if (ep // 4) % 2 == 0:
        draw_weights(iw.get(), Ix, Iy, Kx, fig)
    else:
        draw_weights(sw.get(), Kx, Ky, sl, fig)

100%|██████████| 10000/10000 [00:09<00:00, 1065.13it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1064.07it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1059.11it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1065.08it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1062.38it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1055.22it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1059.26it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1058.99it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1061.54it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1041.48it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1023.61it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1019.23it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1016.90it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1015.02it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1016.75it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1018.92it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1029.67it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1056.

In [61]:
%matplotlib notebook
fig = plt.figure(figsize=(10, 5))

sw_np = sw.get()
iw_np = iw.get()

ims = []

io = np.zeros((iN, 1))


for i in tqdm(range(500)):
    # Handle sparse layer
    v = ts_tapestry[i].reshape(-1, 1)
    p = sw_np @ v
    winners = np.argsort(p, axis=0)[-n_w:]
    mask = np.zeros((sN, 1))
    mask[winners] = 1
    so_uw = mask * p
    r = sw_np.T @ so_uw
    mod_r = np.maximum(r, prec)
    e = v - r

    so = so_uw / np.sum(sw_np, axis=1).reshape(-1, 1)

    # Handle invariant layer
    io_pert = iw_np @ so

    io += (io_pert - io) * alpha
    
    mini_tap = np.zeros((20, 60))
    
    glee = 10
    
    mini_tap[:, :20] = v.reshape(20, 20)
    mini_tap[:glee, 40 - glee:40] = io.reshape(glee, glee)
    mini_tap[:, -20:] = r.reshape(20, 20)
    
    im = plt.imshow(mini_tap, cmap="gray_r", animated=True)
    ims.append([im])

ani = animation.ArtistAnimation(fig, ims, interval=200, blit=True,
                                repeat_delay=500)

plt.xticks([])
plt.yticks([])

plt.show()

100%|██████████| 500/500 [00:01<00:00, 485.10it/s]


In [65]:
fig = plt.figure(figsize=(10, 10))

w_w = iw @ sw

www = w_w / cp.linalg.norm(w_w, axis=1).reshape(-1, 1)

draw_weights(www.get(), Ix, Iy, 20, fig)

In [106]:
fig = plt.figure(figsize=(10,10))
draw_weights((sw * iw[69].reshape(-1, 1)).get(), Kx, Ky, 20, fig)

In [77]:
fig = plt.figure(figsize=(10,10))

draw_weights(sw.get(), Kx, Ky, sl, fig)

In [78]:
plt.figure()
plt.xticks([])
plt.yticks([])
plt.imshow(cp.sum(iw, axis=0).reshape(Kx, Ky).get(), cmap="gray")

In [79]:
np.argmin(cp.sum(iw, axis=0).reshape(Kx, Ky).get())

873

In [80]:
plt.figure()
plt.imshow(sw.get()[873].reshape(20, 20), cmap='gray_r')

In [81]:
np.argmax(cp.sum(iw, axis=0).reshape(Kx, Ky).get())

483

In [82]:
plt.figure()
plt.imshow(sw.get()[483].reshape(20, 20), cmap='gray_r')

Ok, I'm going to try something new.  I'm going to normalize the invariant layer's weights every so often, and see what that does.  I'm also going to decrease the EMA `alpha`, so that the invariant outputs change more slowly and things are a bit more invariant.

Oh, and also I'm going to decrease the sparse layer's learning constant because I'm already training on a kazillion epochs, so it has time to make its prototypes darn good.
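
Here's a minimal NumPy sketch of those two changes, assuming the normalization rescales each invariant neuron's weights to a fixed sum (the value 9 is the one used in the training cell) and that the EMA is the same one-line update as before. The helper names `normalize_rows` and `ema_step` are hypothetical. If `alpha` is read with the usual span convention `alpha = 2 / (N + 1)`, then `2 / 6` corresponds to a span of 5 steps, though that reading is an assumption.

```python
import numpy as np

def normalize_rows(iw, target_sum):
    """Rescale each invariant neuron's weights so they sum to target_sum."""
    return target_sum * (iw / iw.sum(axis=1, keepdims=True))

def ema_step(io, io_pert, alpha):
    """One EMA update: move io a fraction alpha of the way toward io_pert."""
    return io + (io_pert - io) * alpha

rng = np.random.default_rng(2)
iw = rng.uniform(0, 0.1, (4, 10))   # toy invariant weights (assumption)
iw_n = normalize_rows(iw, 9.0)

# Response to a constant input of 1.0 after three steps, for the old and new alpha
fast = slow = 0.0
for _ in range(3):
    fast = ema_step(fast, 1.0, 0.5)
    slow = ema_step(slow, 1.0, 2 / 6)
```

The smaller `alpha` makes the output lag the input more, so an invariant neuron stays active across more consecutive frames.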

In [109]:
%matplotlib notebook
fig = plt.figure(figsize=(10, 10))

Nep = 20
T_s = 10000
prec = 1e-10

# Sparse Layer
Kx = 30
Ky = 30
sN = Kx * Ky
m_len = sl ** 2

n_w = 4 #Number of winners

sw = cp.random.uniform(0, 0.2, (sN, m_len))
xi = 0.008

# Invariant Layer
Ix = 10
Iy = 10
iN = Ix * Iy

iw = cp.random.uniform(0, 0.1, (iN, sN))
alpha = 2 / 6
leta = 0.1
seta = 0.001

for ep in range(Nep):
    inputs = gp_tapestry
    
    io = cp.zeros((iN, 1))
    
    for i in tqdm(range(T_s)):

        # Handle sparse layer
        v = inputs[i].reshape(-1, 1)
#         print(sw.shape, v.shape)
        p = sw @ v
        winners = cp.argsort(p, axis=0)[-n_w:]
        mask = cp.zeros((sN, 1))
        mask[winners] = 1
        so_uw = mask * p
        r = sw.T @ so_uw
        mod_r = cp.maximum(r, prec)
        e = v - r

        sw += sw * so_uw * (e / mod_r).T * xi

        so = so_uw / cp.sum(sw, axis=1).reshape(-1, 1)

        # Handle invariant layer
        io_pert = iw @ so
        
        io += (io_pert - io) * alpha

        # Train for reconstruction ability
        r = iw.T @ io
        mod_r = cp.maximum(r, prec)
        e = so - r
        zeta = np.where(so > 0, leta, seta)

        iw += iw * io * ((e * zeta) / mod_r).T
        
        if i % 2000 == 0:
            iw = 9 * (iw / cp.sum(iw, axis=1).reshape(-1, 1))

    if (ep // 4) % 2 == 0:
        draw_weights(iw.get(), Ix, Iy, Kx, fig)
    else:
        draw_weights(sw.get(), Kx, Ky, sl, fig)

100%|██████████| 10000/10000 [00:09<00:00, 1060.51it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1057.86it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1058.48it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1057.72it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1059.40it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1056.25it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1058.24it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1063.64it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1061.10it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1063.57it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1061.41it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1059.63it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1058.23it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1061.73it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1061.33it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1064.41it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1064.34it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1058.

Hmmmmmmm.  I think that might be the juice.  Let's do the analysis cell run.

In [110]:
%matplotlib notebook
fig = plt.figure(figsize=(10, 5))

sw_np = sw.get()
iw_np = iw.get()

ims = []

io = np.zeros((iN, 1))


for i in tqdm(range(500)):
    # Handle sparse layer
    v = ts_tapestry[i].reshape(-1, 1)
    p = sw_np @ v
    winners = np.argsort(p, axis=0)[-n_w:]
    mask = np.zeros((sN, 1))
    mask[winners] = 1
    so_uw = mask * p
    r = sw_np.T @ so_uw
    mod_r = np.maximum(r, prec)
    e = v - r

    so = so_uw / np.sum(sw_np, axis=1).reshape(-1, 1)

    # Handle invariant layer
    io_pert = iw_np @ so

    io += (io_pert - io) * alpha
    
    mini_tap = np.zeros((20, 60))
    
    glee = 10
    
    mini_tap[:, :20] = v.reshape(20, 20)
    mini_tap[:glee, 40 - glee:40] = io.reshape(glee, glee)
    mini_tap[:, -20:] = r.reshape(20, 20)
    
    im = plt.imshow(mini_tap, cmap="gray_r", animated=True)
    ims.append([im])

ani = animation.ArtistAnimation(fig, ims, interval=200, blit=True,
                                repeat_delay=500)

plt.xticks([])
plt.yticks([])

plt.show()

100%|██████████| 500/500 [00:01<00:00, 481.87it/s]


In [111]:
fig = plt.figure(figsize=(10, 10))

w_w = iw @ sw

www = w_w / cp.linalg.norm(w_w, axis=1).reshape(-1, 1)

draw_weights(www.get(), Ix, Iy, 20, fig)

In [129]:
fig = plt.figure(figsize=(10,10))
draw_weights((sw * iw[30].reshape(-1, 1)).get(), Kx, Ky, 20, fig)

In [124]:
fig = plt.figure(figsize=(10,10))

draw_weights(sw.get(), Kx, Ky, sl, fig)

In [125]:
plt.figure()
plt.xticks([])
plt.yticks([])
plt.imshow(cp.sum(iw, axis=0).reshape(Kx, Ky).get(), cmap="gray")

In [126]:
plt.figure()
plt.imshow(sw.get()[np.argmin(cp.sum(iw, axis=0).reshape(Kx, Ky).get())].reshape(20, 20), cmap='gray_r')

In [127]:
plt.figure()
plt.imshow(sw.get()[np.argmax(cp.sum(iw, axis=0).reshape(Kx, Ky).get())].reshape(20, 20), cmap='gray_r')

Ok, dinner's ready, but this is actually pretty compelling.

It's now 1/7/2021, and I've got mine eye on the prize.  I'm going to quickly train up this network again, and then I'm going to plot affinity groups for each invariant neuron.  I'm also going to slightly decrease `leta` and `seta` so the learning is more stable.

In [6]:
%matplotlib notebook
fig = plt.figure(figsize=(10, 10))

Nep = 20
T_s = 10000
prec = 1e-10

# Sparse Layer
Kx = 30
Ky = 30
sN = Kx * Ky
m_len = sl ** 2

n_w = 4 #Number of winners

sw = cp.random.uniform(0, 0.2, (sN, m_len))
xi = 0.008

# Invariant Layer
Ix = 10
Iy = 10
iN = Ix * Iy

iw = cp.random.uniform(0, 0.1, (iN, sN))
alpha = 2 / 6
leta = 0.05
seta = 0.0005

for ep in range(Nep):
    inputs = gp_tapestry
    
    io = cp.zeros((iN, 1))
    
    for i in tqdm(range(T_s)):

        # Handle sparse layer
        v = inputs[i].reshape(-1, 1)
#         print(sw.shape, v.shape)
        p = sw @ v
        winners = cp.argsort(p, axis=0)[-n_w:]
        mask = cp.zeros((sN, 1))
        mask[winners] = 1
        so_uw = mask * p
        r = sw.T @ so_uw
        mod_r = cp.maximum(r, prec)
        e = v - r

        sw += sw * so_uw * (e / mod_r).T * xi

        so = so_uw / cp.sum(sw, axis=1).reshape(-1, 1)

        # Handle invariant layer
        io_pert = iw @ so
        
        io += (io_pert - io) * alpha

        # Train for reconstruction ability
        r = iw.T @ io
        mod_r = cp.maximum(r, prec)
        e = so - r
        zeta = np.where(so > 0, leta, seta)

        iw += iw * io * ((e * zeta) / mod_r).T
        
        if i % 2000 == 0:
            iw = 9 * (iw / cp.sum(iw, axis=1).reshape(-1, 1))

    if (ep // 4) % 2 == 0:
        draw_weights(iw.get(), Ix, Iy, Kx, fig)
    else:
        draw_weights(sw.get(), Kx, Ky, sl, fig)

100%|██████████| 10000/10000 [00:09<00:00, 1005.67it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1076.07it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1071.08it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1076.43it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1078.01it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1076.86it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1074.75it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1078.17it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1071.78it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1069.29it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1067.63it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1063.52it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1069.67it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1072.01it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1067.56it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1073.02it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1068.60it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1066.

In [7]:
%matplotlib notebook
fig = plt.figure(figsize=(10, 5))

sw_np = sw.get()
iw_np = iw.get()

ims = []

io = np.zeros((iN, 1))


for i in tqdm(range(500)):
    # Handle sparse layer
    v = ts_tapestry[i].reshape(-1, 1)
    p = sw_np @ v
    winners = np.argsort(p, axis=0)[-n_w:]
    mask = np.zeros((sN, 1))
    mask[winners] = 1
    so_uw = mask * p
    r = sw_np.T @ so_uw
    mod_r = np.maximum(r, prec)
    e = v - r

    so = so_uw / np.sum(sw_np, axis=1).reshape(-1, 1)

    # Handle invariant layer
    io_pert = iw_np @ so

    io += (io_pert - io) * alpha
    
    mini_tap = np.zeros((20, 60))
    
    glee = 10
    
    mini_tap[:, :20] = v.reshape(20, 20)
    mini_tap[:glee, 40 - glee:40] = io.reshape(glee, glee)
    mini_tap[:, -20:] = r.reshape(20, 20)
    
    im = plt.imshow(mini_tap, cmap="gray_r", animated=True)
    ims.append([im])

ani = animation.ArtistAnimation(fig, ims, interval=200, blit=True,
                                repeat_delay=500)

plt.xticks([])
plt.yticks([])

plt.show()

100%|██████████| 500/500 [00:01<00:00, 460.77it/s]


In [8]:
fig = plt.figure(figsize=(10, 10))

w_w = iw @ sw

www = w_w / cp.linalg.norm(w_w, axis=1).reshape(-1, 1)

draw_weights(www.get(), Ix, Iy, 20, fig)

In [19]:
fig = plt.figure(figsize=(10,10))
draw_weights((sw * iw[84].reshape(-1, 1)).get(), Kx, Ky, 20, fig)

In [20]:
fig = plt.figure(figsize=(10,10))

draw_weights(sw.get(), Kx, Ky, sl, fig)

In [21]:
plt.figure()
plt.xticks([])
plt.yticks([])
plt.imshow(cp.sum(iw, axis=0).reshape(Kx, Ky).get(), cmap="gray")

In [22]:
plt.figure()
plt.imshow(sw.get()[np.argmin(cp.sum(iw, axis=0).reshape(Kx, Ky).get())].reshape(20, 20), cmap='gray_r')

In [23]:
plt.figure()
plt.imshow(sw.get()[np.argmax(cp.sum(iw, axis=0).reshape(Kx, Ky).get())].reshape(20, 20), cmap='gray_r')

Ok, now I'm going to try to plot the affinity groups for each invariant neuron.

In [58]:
fig = plt.figure(figsize=(10, 80))

draw_weights(sw[cp.argsort(iw, axis=1)][:, -9:].reshape(900, 400).get(), 9, 100, 20, fig)

Shoot dang!!!  That's incredibly cool!  It's pretty clear that some of the invariant neurons are activated by more than one type of feature, but this is actually sweet!  Especially for the straight lines, some of these groupings are quite good.
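
The affinity-group plot boils down to "for each invariant neuron, grab the sparse prototypes with the largest weights." A minimal NumPy sketch of that selection on toy arrays (sizes, seed, and the two planted weights are made up):

```python
import numpy as np

rng = np.random.default_rng(3)
iN, sN, m_len, k = 3, 8, 4, 2        # toy sizes (assumption); the notebook uses k = 9

sw = rng.uniform(0, 0.2, (sN, m_len))   # sparse prototypes
iw = rng.uniform(0, 0.1, (iN, sN))      # invariant weights
iw[0, [5, 1]] = [0.9, 0.8]              # plant a known top-2 for invariant neuron 0

# Indices of the k largest weights per invariant neuron (ascending by weight)
top_idx = np.argsort(iw, axis=1)[:, -k:]   # shape (iN, k)
groups = sw[top_idx]                       # shape (iN, k, m_len)
flat = groups.reshape(iN * k, m_len)       # one row per prototype, ready to tile

# Same result as indexing with the full argsort and slicing afterwards
same = np.array_equal(sw[np.argsort(iw, axis=1)][:, -k:], groups)
```

Slicing the index array before indexing `sw` gives the same groups while avoiding the large `(iN, sN, m_len)` intermediate, which may matter at the notebook's real sizes.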

Ok, I really want to try mwta on CIFAR-10, but first I'm going to train this with 49 invariant neurons and see what happens.

In [59]:
%matplotlib notebook
fig = plt.figure(figsize=(10, 10))

Nep = 20
T_s = 10000
prec = 1e-10

# Sparse Layer
Kx = 30
Ky = 30
sN = Kx * Ky
m_len = sl ** 2

n_w = 4 #Number of winners

sw = cp.random.uniform(0, 0.2, (sN, m_len))
xi = 0.008

# Invariant Layer
Ix = 7
Iy = 7
iN = Ix * Iy

iw = cp.random.uniform(0, 0.1, (iN, sN))
alpha = 2 / 6
leta = 0.05
seta = 0.0005

for ep in range(Nep):
    inputs = gp_tapestry
    
    io = cp.zeros((iN, 1))
    
    for i in tqdm(range(T_s)):

        # Handle sparse layer
        v = inputs[i].reshape(-1, 1)
        p = sw @ v
        winners = cp.argsort(p, axis=0)[-n_w:]
        mask = cp.zeros((sN, 1))
        mask[winners] = 1
        so_uw = mask * p
        r = sw.T @ so_uw
        mod_r = cp.maximum(r, prec)
        e = v - r

        sw += sw * so_uw * (e / mod_r).T * xi

        so = so_uw / cp.sum(sw, axis=1).reshape(-1, 1)

        # Handle invariant layer
        io_pert = iw @ so
        
        io += (io_pert - io) * alpha

        # Train for reconstruction ability
        r = iw.T @ io
        mod_r = cp.maximum(r, prec)
        e = so - r
        zeta = np.where(so > 0, leta, seta)

        iw += iw * io * ((e * zeta) / mod_r).T
        
        if i % 2000 == 0:
            iw = 18 * (iw / cp.sum(iw, axis=1).reshape(-1, 1))

    if (ep // 4) % 2 == 0:
        draw_weights(iw.get(), Ix, Iy, Kx, fig)
    else:
        draw_weights(sw.get(), Kx, Ky, sl, fig)

100%|██████████| 10000/10000 [00:09<00:00, 1073.71it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1083.19it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1067.77it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1083.44it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1081.31it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1081.86it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1082.17it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1079.19it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1083.25it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1082.17it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1077.36it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1080.33it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1076.87it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1072.87it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1076.24it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1070.89it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1072.22it/s]
100%|██████████| 10000/10000 [00:09<00:00, 1076.

In [73]:
%matplotlib notebook
fig = plt.figure(figsize=(10, 5))

sw_np = sw.get()
iw_np = iw.get()

ims = []

io = np.zeros((iN, 1))


for i in tqdm(range(500)):
    # Handle sparse layer
    v = ts_tapestry[i].reshape(-1, 1)
    p = sw_np @ v
    winners = np.argsort(p, axis=0)[-n_w:]
    mask = np.zeros((sN, 1))
    mask[winners] = 1
    so_uw = mask * p
    r = sw_np.T @ so_uw
    mod_r = np.maximum(r, prec)
    e = v - r

    so = so_uw / np.sum(sw_np, axis=1).reshape(-1, 1)

    # Handle invariant layer
    io_pert = iw_np @ so

    io += (io_pert - io) * alpha
    
    mini_tap = np.zeros((20, 60))
    
    glee = 7
    
    mini_tap[:, :20] = v.reshape(20, 20)
    mini_tap[:glee, 40 - glee:40] = io.reshape(glee, glee)
    mini_tap[:, -20:] = r.reshape(20, 20)
    
    im = plt.imshow(mini_tap, cmap="gray_r", animated=True)
    ims.append([im])

ani = animation.ArtistAnimation(fig, ims, interval=200, blit=True,
                                repeat_delay=500)

plt.xticks([])
plt.yticks([])

plt.show()

<IPython.core.display.Javascript object>

100%|██████████| 500/500 [00:01<00:00, 470.33it/s]


In [74]:
%matplotlib notebook
fig = plt.figure(figsize=(10, 5))

sw_np = sw.get()
iw_np = iw.get()

ims = []

io = np.zeros((iN, 1))


for i in tqdm(range(500)):
    # Handle sparse layer
    v = ts_tapestry[i].reshape(-1, 1)
    p = sw_np @ v
    winners = np.argsort(p, axis=0)[-n_w:]
    mask = np.zeros((sN, 1))
    mask[winners] = 1
    so_uw = mask * p
    r = sw_np.T @ so_uw
    mod_r = np.maximum(r, prec)
    e = v - r

    so = so_uw / np.sum(sw_np, axis=1).reshape(-1, 1)

    # Handle invariant layer
    io_pert = iw_np @ so

    io += (io_pert - io) * alpha
    
    mini_tap = np.zeros((20, 60))
    
    glee = 7
    
    mini_tap[:, :20] = v.reshape(20, 20)
    mini_tap[:glee, 40 - glee:40] = io_pert.reshape(glee, glee)
    mini_tap[:, -20:] = r.reshape(20, 20)
    
    im = plt.imshow(mini_tap, cmap="gray_r", animated=True)
    ims.append([im])

ani = animation.ArtistAnimation(fig, ims, interval=200, blit=True,
                                repeat_delay=500)

plt.xticks([])
plt.yticks([])

plt.show()

<IPython.core.display.Javascript object>

100%|██████████| 500/500 [00:01<00:00, 441.72it/s]


In [62]:
fig = plt.figure(figsize=(10,10))

draw_weights(sw.get(), Kx, Ky, sl, fig)

<IPython.core.display.Javascript object>

In [63]:
plt.figure()
plt.xticks([])
plt.yticks([])
plt.imshow(cp.sum(iw, axis=0).reshape(Kx, Ky).get(), cmap="gray")

<IPython.core.display.Javascript object>

<matplotlib.image.AxesImage at 0x7fc9fd521350>

In [64]:
plt.figure()
print(np.min(cp.sum(iw, axis=0).reshape(Kx, Ky).get()))
plt.imshow(sw.get()[np.argmin(cp.sum(iw, axis=0).reshape(Kx, Ky).get())].reshape(20, 20), cmap='gray_r')

<IPython.core.display.Javascript object>

0.016968385774604


<matplotlib.image.AxesImage at 0x7fc9fd5fe210>

In [65]:
plt.figure()
print(np.max(cp.sum(iw, axis=0).reshape(Kx, Ky).get()))
plt.imshow(sw.get()[np.argmax(cp.sum(iw, axis=0).reshape(Kx, Ky).get())].reshape(20, 20), cmap='gray_r')

<IPython.core.display.Javascript object>

2.0206043019416984


<matplotlib.image.AxesImage at 0x7fc9fc878090>

Ok, now I'm going to try to plot the affinity groups for each invariant neuron.

In [71]:
fig = plt.figure(figsize=(10, 20))

# Slice the top 18 indices per invariant neuron before gathering prototypes
draw_weights(sw[cp.argsort(iw, axis=1)[:, -18:]].reshape(882, 400).get(), 18, 49, 20, fig)

<IPython.core.display.Javascript object>

Yeah this is just incredibly dope.  And bless frikin numpy for making it so incredibly easy to make these affinity groups.  
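
The affinity-group plot above comes down to sorting each invariant neuron's weights and gathering the sparse prototypes it weights most heavily. Here's a minimal NumPy sketch of that idea on toy data. The function name `affinity_groups` and the toy sizes are made up for illustration; the notebook itself uses CuPy arrays and 18 prototypes per group.

```python
import numpy as np

def affinity_groups(iw, sw, k):
    """For each invariant neuron (row of iw), gather the k sparse-layer
    prototypes (rows of sw) with the largest weights, strongest last."""
    # Indices of the k largest weights per row, in ascending weight order
    top_idx = np.argsort(iw, axis=1)[:, -k:]
    # Fancy indexing gathers prototypes: shape (n_invariant, k, n_pixels)
    return top_idx, sw[top_idx]

# Toy sizes (hypothetical, much smaller than the notebook's)
rng = np.random.default_rng(0)
iw_demo = rng.random((4, 10))   # 4 invariant neurons over 10 sparse neurons
sw_demo = rng.random((10, 9))   # 10 sparse prototypes of 3x3 pixels
idx_demo, groups_demo = affinity_groups(iw_demo, sw_demo, k=3)
print(groups_demo.shape)
```

Slicing the sorted indices before indexing into `sw` avoids building the full `(n_invariant, n_sparse, n_pixels)` intermediate, which should matter more as the layers grow.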

## Conclusions

This is *incredibly* cool.  I'm wondering what would happen if I built a classifier just out of these two layers.  I'll probably try that not in the next experiment, but in the one after.  I'm itching to try MWTA on cifar10.

Ok, so in terms of actual conclusions, my network is actually learning invariant features.  And that is deeply *deeply* ***deeply*** dope.

Variable xi invariance worked incredibly well.  Even a super small value of `seta` was still great, because the network keeps learning over time.  The incremental normalization of the weights was also incredibly important.  However, instead of ensuring that each prototype's weights add up to a fixed number, maybe I should initialize them from a normal distribution, and then over time ensure that the maximum synapse weight is 1 for every neuron.  Hmm.  Maybe I'll try that in the future.
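
To make the two ideas in that paragraph concrete, here's a rough NumPy sketch of the variable-rate reconstruction update next to both normalization schemes: the sum-to-18 one used in training and the max-weight-of-1 alternative I haven't tried yet. The function names, the default `prec`, and the toy sizes are assumptions for illustration, not what the notebook actually defines.

```python
import numpy as np

def variable_rate_update(iw, io, so, leta, seta, prec=1e-10):
    """One reconstruction step with a per-input learning rate:
    leta where the sparse neuron is "on", seta everywhere else."""
    r = iw.T @ io                        # Reconstruction of the sparse output
    mod_r = np.maximum(r, prec)          # Guard against division by zero
    e = so - r                           # Reconstruction error
    zeta = np.where(so > 0, leta, seta)  # Variable learning constant
    return iw + iw * io * ((e * zeta) / mod_r).T

def normalize_sum(iw, total=18):
    """Rescale each neuron's weights so they add up to `total`."""
    return total * iw / iw.sum(axis=1, keepdims=True)

def normalize_max(iw):
    """Untested alternative: rescale so each neuron's largest weight is 1."""
    return iw / iw.max(axis=1, keepdims=True)

# Toy example (hypothetical sizes): 3 invariant neurons, 5 sparse neurons
rng = np.random.default_rng(0)
iw_demo = rng.random((3, 5)) + 0.1
io_demo = rng.random((3, 1))
so_demo = np.array([[0.5], [0.0], [0.0], [0.2], [0.0]])

updated = variable_rate_update(iw_demo, io_demo, so_demo, leta=0.1, seta=0.001)
by_sum = normalize_sum(updated)
by_max = normalize_max(updated)
```

Setting `seta` to 0 recovers the "only train on the neurons that are on" variant, and setting `seta` equal to `leta` recovers full-reconstruction learning, so the variable constant sits between the two.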

But fam, we have invariance, and we have it in a big way.  I think it's time to maybe try this with real-world video data.  Maybe I'll take a video of me walking around high valley court.

Shoot dang, this is awesome!!!


## Next steps

First, I'm just going to do vanilla MWTA on cifar10 because I really want to see the features it learns, and then maybe I'll train a classifier on mnist digits using two sparse layers sandwiching an invariant layer.

But yeah, I think stacking sparse and invariant layers is a really, really good idea.
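
As a rough picture of what that sandwich might look like, here's a forward-pass-only NumPy sketch: a k-winners sparse layer, the EMA invariant layer, and a second sparse layer reading the invariant output. The second layer's weights `sw2`, the function names, and every size here are hypothetical; nothing in this sketch is trained.

```python
import numpy as np

def sparse_forward(sw, v, n_w):
    """k-winners sparse layer: keep the n_w strongest responses."""
    p = sw @ v                              # (n_sparse, 1) activations
    winners = np.argsort(p[:, 0])[-n_w:]    # Indices of the n_w largest
    mask = np.zeros_like(p)
    mask[winners] = 1
    # Scale by each prototype's total weight, as in the cells above
    return mask * p / sw.sum(axis=1, keepdims=True)

def invariant_forward(iw, so, io, alpha):
    """EMA invariant layer: move io a fraction alpha toward iw @ so."""
    return io + (iw @ so - io) * alpha

rng = np.random.default_rng(0)
sw1 = rng.random((12, 16))   # First sparse layer over 16 "pixels"
iw1 = rng.random((4, 12))    # Invariant layer over the 12 sparse neurons
sw2 = rng.random((6, 4))     # Hypothetical second sparse layer over io
io = np.zeros((4, 1))

for _ in range(5):
    v = rng.random((16, 1))                      # Stand-in for one input patch
    so1 = sparse_forward(sw1, v, n_w=3)
    io = invariant_forward(iw1, so1, io, alpha=0.3)
    so2 = sparse_forward(sw2, io, n_w=2)         # Top of the sandwich
```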