explanation

dizcza · Nov 27, 2020 · 5f4bc4a · 5f4bc4a
1 parent f1f9c4f
commit 5f4bc4a

Showing 11 changed files with 187 additions and 26 deletions.
diff --git a/doc/explanation.rst b/doc/explanation.rst
@@ -0,0 +1,104 @@
+Explanation
+===========
+
+.. py:currentmodule:: nn
+
+Infographic results are here: http://85.217.171.57:8097. Pick
+*"2020.11.26 AreaSequential assemblies"* experiment from the drop-down list.
+
+A recurrent area cell
+*********************
+
+.. image:: images/area.png
+    :width: 300
+
+The building block of computation with assemblies [1]_ is a recurrent neural
+network cell, called *area*, the forward pass of which is described in
+:ref:`usage`. The output of such an area is a binary sparse vector, formed by
+winner-take-all competition. For example, if an input vector multiplied by a
+weight yields vector :code:`z = [-3.2, 4.6, 0, 0.7, 1.9]`, then
+:code:`kwta(z, k=2) = [0, 1, 0, 0, 1]`.
+
+K-winners-take-all
+------------------
+
+One property of kWTA is that its inverse is also a kWTA operation, even for a
+random projection (the multiplication matrix is random, not learned). In the
+plot below, several images from the MNIST dataset are shown on the left; their
+random projection & cut binary vectors
+:math:`\bm{y} = \text{kWTA}(\bm{Wx}, k_y)`, reshaped as matrices, are in the
+middle; and the restored :math:`\tilde{\bm{x}} = \text{kWTA}(\bm{W^T y}, k_x)`
+are shown on the right. The condition :math:`\text{dim}(\bm{y}) \gg \text{dim}(\bm{x})`
+must hold in order to restore the input signal.
+
+.. image:: images/kwta_inverse.png
+    :width: 300
+
+This example shows that a random projection & cut operation (a multiplication
+by a random matrix followed by kWTA) preserves enough information to
+reconstruct the input signal.
+
+
+How does the association work?
+******************************
+
+How can information from two or more different modalities be associated? For
+example, how can a picture of an elephant be associated with the sound it makes?
+
+Willshaw's model
+----------------
+
+Let's define the task in mathematical terms: let `x` and `y` denote the image
+and the sound representation vectors of a signal, respectively. Then the
+simplest way to associate `x` and `y` is to use a Hebbian-like learning
+rule. Assuming both `x` and `y` are binary sparse vectors, we can construct
+the weight matrix as the outer product of `x` and `y`. This technique is
+described in [2]_ and implemented in :class:`AreaRNNWillshaw`.
+
+The idea behind Willshaw's paper is based on the outer product property:
+
+.. math::
+    (\bm{x} \otimes \bm{y}) \bm{y} = \bm{x} (\bm{y}^T \bm{y}) \propto \bm{x}
+
+which naturally suggests the following update rule:
+
+.. math::
+    \begin{cases}
+    \bm{W} = \bm{W} + \bm{x} \otimes \bm{y} \\
+    \bm{W} = \Theta(\bm{W})
+    \end{cases}
+
+where :math:`\Theta(x) = 1` if :math:`x > 0` and :math:`0` otherwise, applied element-wise.
+
+Papadimitriou's model
+---------------------
+
+Willshaw's update mechanism has a limitation: the initial matrix :math:`\bm{W}`
+must be initialized with zeros, which poses biological plausibility problems.
+To alleviate this, we can use a third layer `C` to indirectly associate the
+parental layers `A` and `B`, as shown below.
+
+.. image:: images/area_sequence.png
+
+Area `A` encodes images, and area `B` encodes sound. The outputs of `A` and `B`
+are projected onto area `C`, which forms a combined image-sound representation.
+After several such projections (forward passes), the assemblies `A-C` and `B-C`
+overlap more and more, significantly more than by chance. This process is
+called `association` and is described in [1]_. Following this example, once
+areas `A` and `B` become associated, the sound an elephant makes will
+reconstruct a memory of elephant pictures stored in `A` (and vice versa),
+assuming, of course, the presence of backward connections from area `C` to the
+incoming areas, which is not covered in this tutorial.
+
+Input areas `A` and `B` can, of course, represent signals of the same modality
+that come from different cortical areas or layers.
+
+References
+----------
+
+.. [1] Papadimitriou, C. H., Vempala, S. S., Mitropolsky, D., Collins, M., &
+   Maass, W. (2020). Brain computation by assemblies of neurons. Proceedings of
+   the National Academy of Sciences.
+
+.. [2] Willshaw, D. J., Buneman, O. P., & Longuet-Higgins, H. C. (1969).
+       Non-holographic associative memory. Nature, 222(5197), 960-962.
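
Note on doc/explanation.rst: the kWTA cut and the Willshaw update described in this file can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the repository's implementation: `kwta`, `willshaw_update` and `recall` are hypothetical helpers, and ties in `kwta` are broken in favour of the lower index, which the source does not specify.

```python
# Illustrative sketch only: kwta, willshaw_update and recall are hypothetical
# helpers, not functions from the nn package.

def kwta(z, k):
    """Return a binary vector with ones at the k largest entries of z."""
    winners = sorted(range(len(z)), key=lambda i: z[i], reverse=True)[:k]
    out = [0] * len(z)
    for i in winners:
        out[i] = 1
    return out


def willshaw_update(W, x, y):
    """W <- Theta(W + x (outer) y): accumulate, then clip to {0, 1}."""
    return [[1 if W[i][j] + x[i] * y[j] > 0 else 0
             for j in range(len(y))]
            for i in range(len(x))]


def recall(W, y, k):
    """Restore x from y: multiply W by y, then keep the k winners."""
    z = [sum(w_ij * y_j for w_ij, y_j in zip(row, y)) for row in W]
    return kwta(z, k)


# The example from the text: the two largest entries, 4.6 and 1.9, win.
assert kwta([-3.2, 4.6, 0, 0.7, 1.9], k=2) == [0, 1, 0, 0, 1]

# Store two (x, y) pairs in a zero-initialized matrix, then recall each x.
pairs = [([1, 1, 0, 0, 0, 0], [1, 0, 1, 0]),
         ([0, 0, 0, 1, 1, 0], [0, 1, 0, 1])]
W = [[0] * 4 for _ in range(6)]
for x, y in pairs:
    W = willshaw_update(W, x, y)
for x, y in pairs:
    assert recall(W, y, k=2) == x
```

The two stored `y` vectors do not overlap here, so recall is exact; with overlapping patterns the recalled vector may contain spurious winners.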
diff --git a/doc/images/area.png b/doc/images/area.png
diff --git a/doc/images/area_sequence.png b/doc/images/area_sequence.png
diff --git a/doc/images/kwta_inverse.png b/doc/images/kwta_inverse.png
diff --git a/doc/index.rst b/doc/index.rst
@@ -1,4 +1,19 @@
 Computation with Assemblies
 ===========================
 
-.. automodule:: nn
+PyTorch implementation of `project` and `associate` operations [1]_.
+
+
+.. toctree::
+    :maxdepth: 1
+
+    explanation
+    usage
+
+
+References
+----------
+
+.. [1] Papadimitriou, C. H., Vempala, S. S., Mitropolsky, D., Collins, M., &
+   Maass, W. (2020). Brain computation by assemblies of neurons. Proceedings of
+   the National Academy of Sciences.
diff --git a/doc/usage.rst b/doc/usage.rst
@@ -0,0 +1,36 @@
+.. _usage:
+
+Usage
+=====
+
+.. automodule:: nn
+
+
+Example
+-------
+
+Associate script:
+
+.. code-block:: python
+
+    from nn.areas import *
+    from nn.samplers import sample_k_active
+    from nn.simulate import Simulator
+
+    N_NEURONS, K_ACTIVE = 1000, 50
+
+    n_stim_a, n_stim_b = N_NEURONS, N_NEURONS // 2
+    na, nb, nc = N_NEURONS * 2, int(N_NEURONS * 1.5), N_NEURONS
+    area_A = AreaRNNHebb(N_NEURONS, out_features=na)
+    area_B = AreaRNNHebb(N_NEURONS // 2, out_features=nb)
+    area_C = AreaRNNHebb(na, nb, out_features=nc)
+    area_AB = AreaStack(area_A, area_B)
+    brain = AreaSequential(area_AB, area_C)
+    stim_a = sample_k_active(n=n_stim_a, k=K_ACTIVE)
+    stim_b = sample_k_active(n=n_stim_b, k=K_ACTIVE)
+    stim_ab = (stim_a, stim_b)
+    simulator = Simulator(model=brain, epoch_size=10)
+    simulator.simulate(x_samples=[stim_ab])
+
+
+More examples are in `nn/simulate.py <https://github.com/dizcza/assemblies/blob/master/nn/simulate.py>`_
diff --git a/nn/__init__.py b/nn/__init__.py
@@ -1,21 +1,18 @@
 r"""
 
-PyTorch implementation of `project` and `associate` operations [1]_.
-
-
 The forward pass is defined in two steps:
 
 .. math::
     \begin{cases}
-    \bold{y} = \sum_l W^{input}_l \bold{x}_l +
-               \alpha W^{recurrent}\bold{y}^{latent}
+    \bm{y} = \sum_l W^{input}_l \bm{x}_l +
+               \alpha W^{recurrent}\bm{y}^{latent}
     \\
-    \bold{y} = \text{kWTA}(\bold{y}, k)
+    \bm{y} = \text{kWTA}(\bm{y}, k)
     \end{cases}
     :label: forward
 
-where :math:`\bold{x}_l` are input stimuli from an incoming layer :math:`l`
-and :math:`\bold{y}^{latent}` are latent activations from a previous step;
+where :math:`\bm{x}_l` are input stimuli from an incoming layer :math:`l`
+and :math:`\bm{y}^{latent}` are latent activations from a previous step;
 :math:`k` is the number of active output neurons (winners).
 
 Assembly areas
@@ -54,13 +51,6 @@
     expected_random_overlap
     pairwise_similarity
 
-
-References
-----------
-.. [1] Papadimitriou, C. H., Vempala, S. S., Mitropolsky, D., Collins, M., &
-   Maass, W. (2020). Brain computation by assemblies of neurons. Proceedings of
-   the National Academy of Sciences.
-
 """
 
 from .monitor import Monitor, expected_random_overlap, pairwise_similarity
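
Note on the `forward` equation in nn/__init__.py: the two steps can be sketched in plain Python as follows. This is a hedged illustration, not the package's code: `matvec`, `kwta` and `forward` are hypothetical names, weights are plain nested lists rather than tensors, and no learning step is included.

```python
# Hypothetical sketch of the two-step forward pass; not the nn package API.

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def kwta(z, k):
    """Binary vector with ones at the k largest entries of z."""
    winners = sorted(range(len(z)), key=lambda i: z[i], reverse=True)[:k]
    return [1 if i in winners else 0 for i in range(len(z))]


def forward(input_weights, inputs, w_recurrent, y_latent, alpha, k):
    """Step 1: y = sum_l W_l x_l + alpha * W_rec y_latent. Step 2: kWTA(y, k)."""
    y = [alpha * v for v in matvec(w_recurrent, y_latent)]
    for W_l, x_l in zip(input_weights, inputs):
        y = [acc + v for acc, v in zip(y, matvec(W_l, x_l))]
    return kwta(y, k)


# One incoming layer with 2 neurons, 3 output neurons, 1 winner.
W_in = [[2, 0], [0, 1], [1, 1]]
W_rec = [[0, 0, 0], [0, 0, 3], [0, 0, 0]]
x = [1, 0]
y_prev = [0, 0, 1]

# Feedforward drive alone is [2, 0, 1], so neuron 0 wins.
assert forward([W_in], [x], W_rec, y_prev, alpha=0, k=1) == [1, 0, 0]
# With recurrence the drive becomes [2, 3, 1], so neuron 1 wins.
assert forward([W_in], [x], W_rec, y_prev, alpha=1, k=1) == [0, 1, 0]
```

The toy weights are chosen so that the recurrent term changes the winner, which is the role of the latent activations from the previous step.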

diff --git a/nn/areas.py b/nn/areas.py
@@ -340,6 +340,10 @@ class AreaRNNWillshaw(AreaRNN):
         W_{ij} = 1
         :label: update-will
 
+    This is the simplest possible update rule: unlike
+    :class:`AreaRNNHebb`, it requires neither a learning rate nor weight
+    normalization.
+
     Parameters
     ----------
     *in_features

diff --git a/nn/graph.py b/nn/graph.py
@@ -12,7 +12,8 @@ class GraphArea:
     def __init__(self, name=None):
         self.graph = graphviz.Digraph(name=name, format='svg',
                                       graph_attr=dict(rankdir='LR',
-                                                      style='invisible'),
+                                                      style='invisible',
+                                                      nodesep='0.5'),
                                       node_attr=dict(shape='box'))
 
     def draw_model(self, model: AreaInterface, sample):
@@ -37,30 +38,40 @@ def draw_model(self, model: AreaInterface, sample):
         clusters = defaultdict(list)
         NamedLayer = namedtuple("NamedLayer", ("name", "layer"))
         for name, layer in find_named_layers(model, layer_class=AreaRNN):
-            name = f"{layer.__class__.__name__} '{name}'".rstrip(" ''")
+            if name == '':
+                # a single area layer
+                name = layer.__class__.__name__
+            else:
+                name = f"{layer.__class__.__name__} '{name}'"
             nl = NamedLayer(name=name, layer=layer)
             clusters[ordered_idx[layer]].append(nl)
         for idx, named_layers in clusters.items():
             with self.graph.subgraph(name=f"cluster_{idx}") as c:
                 for nl in named_layers:
                     c.node(nl.name)
+                    self.graph.edge(nl.name, nl.name,
+                                    tailport='e', headport='s',
+                                    constraint='false')
         with self.graph.subgraph(name="cluster_input") as c:
             for nl in clusters[min(clusters.keys())]:
                 for i, in_feature in enumerate(nl.layer.in_features):
                     stimuli = f"input{i}_{nl.name}"
                     c.node(stimuli, shape='point')
-                    self.graph.edge(stimuli, nl.name, label=str(in_feature))
+                    self.graph.edge(stimuli, nl.name, label=str(in_feature),
+                                    headport='w')
         with self.graph.subgraph(name="cluster_output") as c:
             for nl in clusters[max(clusters.keys())]:
                 c.node(f"output_{nl.name}", shape='point')
                 self.graph.edge(nl.name, f"output_{nl.name}",
-                                label=str(nl.layer.out_features))
+                                label=str(nl.layer.out_features),
+                                tailport='e')
         keys = tuple(clusters.keys())
         for source_id, sink_id in zip(keys[:-1], keys[1:]):
             for tail in clusters[source_id]:
                 for head in clusters[sink_id]:
                     self.graph.edge(tail_name=tail.name,
                                     head_name=head.name,
-                                    label=str(tail.layer.out_features))
+                                    label=str(tail.layer.out_features),
+                                    tailport='e', headport='w')
         svg = self.graph.pipe(format='svg').decode('utf-8')
         return svg
diff --git a/nn/monitor.py b/nn/monitor.py
@@ -151,7 +151,8 @@ def __init__(self, model):
             handle = layer.register_forward_hook(self._forward_hook)
             self.handles.append(handle)
 
-        env_name = f"{time.strftime('%Y.%m.%d')} {model.__class__.__name__}"
+        env_name = f"{time.strftime('%Y.%m.%d')} " \
+                   f"{model.__class__.__name__} assemblies"
         self.viz = VisdomBuffered(legend_labels=self.module_name.values(),
                                   env=env_name)
         self.log_expected_random_overlap()
@@ -310,12 +311,12 @@ def update_assembly_similarity(self, input_similarity=None, log=False):
         Plot the :math:`L_{0/1}` similarity of the projected (learned)
         assemblies.
 
-        The similarity of two binary vectors :math:`\bold{x}` and
-        :math:`\bold{y}` of size `n` that have `k` active neurons is computed
+        The similarity of two binary vectors :math:`\bm{x}` and
+        :math:`\bm{y}` of size `n` that have `k` active neurons is computed
         as their dot product, divided by `k`:
 
         .. math::
-            \frac{\bold{x} \cdot \bold{y}}{k}
+            \frac{\bm{x} \cdot \bm{y}}{k}
 
         Parameters
         ----------
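
Note on the similarity formula in this docstring: for two binary vectors with `k` active neurons each, the dot product divided by `k` is the fraction of shared active neurons. A minimal sketch, assuming plain Python lists; `assembly_similarity` is a hypothetical name, not the method in nn/monitor.py.

```python
# Hypothetical helper illustrating the x . y / k similarity from the docstring.

def assembly_similarity(x, y):
    """Fraction of active neurons shared by two k-active binary vectors."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same size")
    k = sum(x)
    if k == 0 or sum(y) != k:
        raise ValueError("x and y must have the same nonzero number of active neurons")
    return sum(a * b for a, b in zip(x, y)) / k


a = [1, 1, 0, 0, 1, 0]
b = [1, 0, 0, 1, 1, 0]
assert assembly_similarity(a, a) == 1.0                  # identical assemblies
assert abs(assembly_similarity(a, b) - 2 / 3) < 1e-12    # 2 of 3 neurons shared
```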

diff --git a/nn/simulate.py b/nn/simulate.py
@@ -100,7 +100,7 @@ def associate_benchmark(self, x_samples):
         self.model.train(mode_saved)
 
 
-def associate_example(n_samples=10, area_type=AreaRNNHebb):
+def associate_example(n_samples=1, area_type=AreaRNNHebb):
     n_stim_a, n_stim_b = N_NEURONS, N_NEURONS // 2
     na, nb, nc = N_NEURONS * 2, int(N_NEURONS * 1.5), N_NEURONS
     area_type = partial(area_type, p_synapse=0.05, update='multiplicative',