<a href="https://colab.research.google.com/github/JacobFV/AGI/blob/master/PAI_0.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# PAI-0

PAI-0 extracts and integrates principles from a largely forgotten array of AI and neuroscience research, notably including:
- Buzsaki's [*Rhythms of the Brain*](https://psycnet.apa.org/record/2007-01020-000)
- The Paradigm of Allostatic Orchestration ([Lee 2019](https://www.frontiersin.org/articles/10.3389/fnhum.2019.00129/full))
- LGMA architecture ([Qi 2020](https://arxiv.org/abs/2011.11400))
- Hinton's [NSF presentation](https://www.hpcwire.com/off-the-wire/nsf-distinguished-lecture-with-geoffrey-hinton-how-to-represent-part-whole-hierarchies-in-a-neural-net-to-be-held-feb-11/) on GLOM
- Beggs's introduction to the critical neuron hypothesis ([Beggs 2015](https://youtu.be/bE9IKMAr-wg))
- The SORN design ([Lazar et al. 2009](https://www.frontiersin.org/articles/10.3389/neuro.10.023.2009/full))
- Buzsaki's [*The Brain from Inside Out*](https://buzsakilab.com/wp/2019/02/06/the-brain-from-inside-out-by-buzsaki-g/)

In [None]:
#@title imports
%tensorflow_version 2.x

import math
import tqdm
import random

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

!pip install -q tsalib
import tsalib
import networkx
!pip install -q jplotlib
import jplotlib as jpl
!pip install -q livelossplot
from livelossplot import PlotLossesKeras

import tensorflow as tf
keras = tf.keras
tfkl = keras.layers

import tensorflow_probability as tfp
tfd = tfp.distributions
tfpl = tfp.layers
tfb = tfp.bijectors

In [None]:
#@title utils

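def Bernoulli(p, shape=()):
    # float32 tensor of 0s and 1s; each entry is 1 with probability p
    # (shape defaults to a scalar because tf.random.uniform rejects None)
    return tf.cast(tf.random.uniform(shape) < p, tf.float32)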

def clear_diag(A):
    return A * (1 - tf.eye(tf.shape(A)[-2], tf.shape(A)[-1]))

def TODO(reason=""):
    raise NotImplementedError("TODO: " + reason)

## Local Dynamics

## Global Dynamics

I extract a few **hypothetical** principles from the reward system's natural implementation to apply in SSORN:

- **Reward is not a scalar or directly controllable signal.** The reward *system* is not one neuron. The brain develops multiple non-uniformly distributed dopaminergic pathways. Affective psychology commonly divides affective state into arousal (awareness), valency (positive/negative), and motivational intensity (how disposed to act). These psychological variables are not directly reachable over significant timescales; they emerge from the complex and chaotic interactions among neurotransmitters, brain-scale patterns, the body's internal environment, and external stimuli and actions. **Implication**: Use multiple layers and connections to propagate reward. Also divide part of the reward system into separate positive- and negative-valency processing subsystems. Include a joint reward subsystem that integrates with both of these subsystems and the overall network.

- **Reward may be intrinsic, conditioned, or even imagined.** Not just external behavior but even the reward system itself shares the responsive and adaptive mechanisms found in other parts of the brain. **Implication**: The reward system's layers and connections should share similar activity and learning algorithms with most other layers and connections. The reward system should also receive its input from the rest of the brain.

- **Reward biases neural activity nonhomogeneously.** Action pathways

- **The global reward objective generally aligns with local neuronal objectives.** The free energy principle suggests that the brain's objective is to accurately model its external environment. The neuronal energy homeostasis principle suggests that locally excessive or insufficient neuronal energy acts as a stressor that drives adaptations toward network-scale behavior. **Implication**: Supposing neurons represent binary random variables, their average activation should be neither too close to 0 (a negative energy stressor) nor too close to 1 (a positive energy stressor). The reward system's activity will increase as mean neuronal activation moves away from both 0 and 1, which is encouraged by minimizing the objective $D_{KL}(p(x) \,\|\, \mathcal{B}(x; p=0.2))$.

- **The reward system is robust to tampering.** Returning to the neuronal energy homeostasis principle, excessive activity poses an energy stressor to which neurons adapt by decreasing their number of presynaptic inputs, increasing their resistance to signal propagation, and raising the bar for firing action potentials. This makes reward an oscillatory rather than constant experience and often tunes human behavior to a safe optimum of drinking/eating frequency. **Implication**: While the deviation of the network's average activity from 0 and 1 regularizes the reward system as a whole, neurons inside the reward system will also be individually regularized by their own average activity. This should make eliciting reward a critically balanced problem.

- **Anticipated reward is chaotic and discovers complex and diverse behaviors.** The brain's anticipation of affective state is often incorrect over any significant span of time. Following this seemingly chaotic signal helps 'shake' natural agents out of local optima and at times produces behavior that contradicts externally administered reward. **Implication**: The reward system should not be a strong contributor to presynaptic signal accumulation. Its primary influence should be on neuronal activity thresholds and noise, which may be compared to shifting the temperature parameter of an Ising model to be subcritical.

- **The reward system operates over microscopic and macroscopic scales with fast and slow durations.** Principles and tools from relativity apply well to analyzing and modeling the brain's spatiotemporal activity. **Implications**:
    - **Fast and small**: Decrease the temperature coefficient and threshold required for excitatory and inhibitory signal propagation. Make inhibitory signal thresholds slightly lower than excitatory ones so that the total number of signals propagating through the SSORN slightly decreases. Inhibitory thresholds should be even lower when they result from negative rather than positive reward subsystem activity.
    - **Fast and large**: By lowering the neuronal activation threshold, reward is expected to decrease the signal-to-noise ratio when a fast behavioral response is needed. It should also lower the probability of exploratory dynamics in response to intense positive and especially negative situations.
    - **Slow and small**: Learning rates across the network should be proportional to the rolling mean of reward system activity.
    - **Slow and large**: The 'reward system' (used here to describe the layers and connections explicitly labeled as such) makes little distinction between positively and negatively valent network states, which themselves may or may not be correlated with desirable or undesirable behaviors. Actually learning desirable behavior is the unsupervised, free-energy-minimizing objective (e.g., diverse, predictive, predictable, empowering; in biological systems, physiologically optimal), which should implicitly emerge from the integrated dynamics of PAI-0. The agent should eventually use its associated phenomenal experience of reward to form explicit associations with significantly faster semantic learning.
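
The homeostatic objective $D_{KL}(p(x) \,\|\, \mathcal{B}(x; p=0.2))$ from the list above can be sketched as follows. This is a minimal illustration in plain NumPy rather than the notebook's TensorFlow, and it assumes that $p(x)$ is summarized by each neuron's empirical mean activation. The names `bernoulli_kl` and `homeostasis_penalty` are hypothetical and not part of PAI-0.

```python
import numpy as np

TARGET_RATE = 0.2  # the p in B(x; p=0.2), taken from the text above


def bernoulli_kl(q, p=TARGET_RATE, eps=1e-7):
    """KL(B(q) || B(p)) for mean activation(s) q in [0, 1]."""
    # clip so that q = 0 or q = 1 does not produce log(0)
    q = np.clip(np.asarray(q, dtype=np.float64), eps, 1.0 - eps)
    return q * np.log(q / p) + (1.0 - q) * np.log((1.0 - q) / (1.0 - p))


def homeostasis_penalty(activations, p=TARGET_RATE):
    """Mean KL penalty over neurons.

    Assumption: `activations` is a (time, neurons) array of 0/1 values and
    each neuron's p(x) is summarized by its mean activation over time.
    """
    mean_rate = np.mean(activations, axis=0)
    return float(np.mean(bernoulli_kl(mean_rate, p)))


# the penalty vanishes at the target rate and grows toward both extremes
for rate in (0.0, 0.2, 0.5, 1.0):
    print(rate, float(bernoulli_kl(rate)))
```

Under these assumptions the penalty is zero when a neuron fires 20% of the time and rises as its mean activation approaches either 0 or 1, with saturation (near 1) penalized more heavily than silence (near 0).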

The question is still: how do you bias network activity towards 'desirable' states?
 - $\min f(x) = -x(x-1)$ or similar makes the network *able* to learn
 - plasticity should increase approaching either extreme
 - new synapses that were used should be strengthened (weakened) by positive (negative) reward; synapses that were not used should not be modified. Whatever rule to express this should also account for IE/EI/EE/II against +/- reward.
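
One way to read the last point is as a three-factor, reward-modulated Hebbian rule. The sketch below is hypothetical: it uses plain NumPy, treats a synapse as "used" when its presynaptic and postsynaptic neurons were both active, ignores whether the synapse is new, and does not yet distinguish IE/EI/EE/II connections. `reward_modulated_update` and its parameters are illustrative names, not part of PAI-0.

```python
import numpy as np


def reward_modulated_update(W, pre, post, reward, lr=0.01):
    """Hypothetical three-factor update.

    W:      (n_post, n_pre) nonnegative weights; 0 means "no synapse"
    pre:    (n_pre,) binary presynaptic activity
    post:   (n_post,) binary postsynaptic activity
    reward: signed scalar; positive strengthens, negative weakens
    """
    # a synapse counts as used if it exists and both of its ends were active
    used = np.outer(post, pre) * (W > 0)
    # unused synapses get a zero update and therefore stay unchanged
    W_new = W + lr * reward * used
    # assumption: weights are kept nonnegative
    return np.clip(W_new, 0.0, None)


W = np.array([[0.5, 0.5],
              [0.5, 0.0]])
pre = np.array([1.0, 0.0])
post = np.array([1.0, 0.0])
print(reward_modulated_update(W, pre, post, reward=+1.0, lr=0.1))
print(reward_modulated_update(W, pre, post, reward=-1.0, lr=0.1))
```

Only the single synapse whose two ends were both active changes here; the sign of `reward` decides the direction. How the sign should interact with excitatory versus inhibitory connection types is left open, as in the text.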


In [None]:
class Updatable:
    
    def __init__(self, name):
        self.name = name

    def update(self, state): pass # returns updated state

    @property
    def initial_state(self): pass # gets a (optionally nested) tensor
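
To illustrate the functional pattern that `Updatable` suggests (state in, updated state out), here is a minimal sketch of a hypothetical subclass that tracks a rolling mean of reward activity, which the "slow and small" principle above would need. It assumes the state is a plain dict keyed by name; the notebook only says the state is an optionally nested tensor, so `RollingMean`, `src_key`, and `decay` are illustrative.

```python
class Updatable:
    # repeated from the cell above so this sketch runs on its own
    def __init__(self, name):
        self.name = name

    def update(self, state): pass  # returns updated state

    @property
    def initial_state(self): pass  # gets a (optionally nested) tensor


class RollingMean(Updatable):
    """Hypothetical updatable: exponential rolling mean of one state entry."""

    def __init__(self, src_key, decay, name):
        super().__init__(name)
        self.src_key = src_key  # assumed: state is a dict keyed by name
        self.decay = decay

    @property
    def initial_state(self):
        return {self.name: 0.0}

    def update(self, state):
        new_state = dict(state)  # leave the caller's state untouched
        new_state[self.name] = (self.decay * state[self.name]
                                + (1.0 - self.decay) * state[self.src_key])
        return new_state


tracker = RollingMean(src_key="reward", decay=0.5, name="reward_mean")
state = {"reward": 1.0, **tracker.initial_state}
state = tracker.update(state)
state = tracker.update(state)
print(state["reward_mean"])  # 0.75
```

Returning a new state instead of mutating it keeps each `update` call side-effect free, which is one reasonable design if the whole step is later traced with `tf.function`.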

In [None]:
class Layer(Updatable):

    def __init__(self,
        height,
        width,
        depth,
        initial_threshold,
        initial_bucket_val,
        noise_fn,
        activation_penalty,
        target_firing_rate,
        target_firing_rate_lr,
        name,
    ):
        pass
    
    def update(self, state): pass # returns updated state

    @property
    def initial_state(self): pass # gets a (optionally nested) tensor
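
The `Layer` constructor hints at a thresholded unit with an input "bucket", a noise source, and a target firing rate. A minimal sketch of one possible update step follows, in plain NumPy. The interpretation is an assumption: the bucket is treated as an accumulator of presynaptic input that empties when the neuron fires, and the threshold follows the intrinsic-plasticity rule of SORN (Lazar et al. 2009), `T += lr * (x - target)`. `layer_step` and `noise_std` are hypothetical names, and `activation_penalty` is not modeled.

```python
import numpy as np


def layer_step(bucket, threshold, inputs, rng, noise_std=0.05,
               target_firing_rate=0.2, target_firing_rate_lr=0.01):
    """One hypothetical update of a layer's (bucket, threshold) state."""
    # accumulate presynaptic input (assumed meaning of "bucket")
    bucket = bucket + inputs
    # Gaussian noise stands in for the unspecified noise_fn
    noise = rng.normal(0.0, noise_std, size=bucket.shape)
    spikes = (bucket + noise > threshold).astype(np.float64)
    # neurons that fired empty their bucket
    bucket = bucket * (1.0 - spikes)
    # SORN-style intrinsic plasticity: firing raises the threshold,
    # silence lowers it, so the rate drifts toward the target
    threshold = threshold + target_firing_rate_lr * (spikes - target_firing_rate)
    return bucket, threshold, spikes


rng = np.random.default_rng(0)
bucket = np.zeros(4)
threshold = np.full(4, 0.5)
for _ in range(3):
    inputs = rng.uniform(0.0, 0.4, size=4)
    bucket, threshold, spikes = layer_step(bucket, threshold, inputs, rng)
print(spikes, threshold)
```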

In [None]:
class Connection(Updatable):

    def __init__(
        self, 
        src_layer,
        dst_layer,
        src_sparsity,
        dst_sparsity,
        threshold_coef,
        bucket_coef,
        name
    ):
        pass

    def update(self, state):
        state = self._fast_action(state)
        state = self._slow_action(state)
        return state

    def _fast_action(self, state):
        # same for (E/I)-(E/I)Layers
        TODO("fast action shared by all (E/I)-(E/I) connections")

    def _slow_action(self, state):
        raise NotImplementedError("subclasses should implement this")

    @property
    def initial_state(self): pass # gets a (optionally nested) tensor

In [None]:
class LateralConnection(Connection):

    def __init__(
        self,
        connectivity_falloff, # a-value in f(d) = d^a for semi-global connectivity
        threshold_coef,
        bucket_coef,
    ):
        pass

    def _slow_action(self, state):
        raise NotImplementedError("subclasses should implement this")

    @property
    def initial_state(self): pass # gets a (optionally nested) tensor
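
The `connectivity_falloff` comment describes a power law `f(d) = d^a` over distance. As a hedged sketch of how such a rule could produce a sparse lateral connectivity mask, the NumPy code below places neurons on a `height x width` grid, treats `d^a` as a connection probability (which assumes `a < 0` and unit grid spacing), and samples each connection independently. `lateral_connection_mask` is a hypothetical helper and the layer's depth axis is ignored.

```python
import numpy as np


def lateral_connection_mask(height, width, connectivity_falloff, rng):
    """Sample a 0/1 mask of shape (height*width, height*width).

    Assumes connectivity_falloff < 0 so that d ** a is in (0, 1] for d >= 1.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(np.float64)
    # pairwise Euclidean distances between grid positions
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # infinite self-distance gives probability 0, i.e. no self-connections
    np.fill_diagonal(dist, np.inf)
    prob = np.clip(dist ** connectivity_falloff, 0.0, 1.0)
    # independent Bernoulli draw per potential connection
    return (rng.uniform(size=prob.shape) < prob).astype(np.float64)


rng = np.random.default_rng(0)
mask = lateral_connection_mask(4, 4, connectivity_falloff=-2.0, rng=rng)
print(mask.shape, mask.mean())
```

With `a = -2`, immediate grid neighbors are always connected and more distant pairs are connected with rapidly decreasing probability, which matches the "semi-global" description. The zeroed diagonal plays the same role as `clear_diag` in the utilities cell.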

In [None]:
class EEConnection(Connection): pass
class EIConnection(Connection): pass
class IEConnection(Connection): pass
class IIConnection(Connection): pass

In [None]:
class MotorNerve(Updatable): 

    def __init__(self, src, connection_sparsity, dst_key): 
        TODO()

In [None]:
class SensoryNerve(Updatable):

    def __init__(self):
        TODO("constructor parameters not yet specified")

In [None]:
class PAI0:

    def __init__(self, connections=None, updatables=None):
        # please specify at least one param

        # get unique updatables
        self.updatables = list(updatables) if updatables is not None else []

        # add layers and connections
        _layers_and_connections = list()
        for src_layer, connection, dst_layer in (connections or []):
            _layers_and_connections += [src_layer, connection, dst_layer]
        # avoids double counting while keeping a deterministic update order
        for updatable in _layers_and_connections:
            if updatable not in self.updatables:
                self.updatables.append(updatable)


    @tf.function
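    def update(self, prev_state):
        # assumed: each updatable transforms the shared state in turn,
        # mirroring how Connection.update chains its fast and slow actions
        state = prev_state
        for updatable in self.updatables:
            state = updatable.update(state)
        return state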

    @property
    def initial_state(self): pass # gets a (optionally nested) tensor

    """@property
    def get_layers(self):
        return [updatable for updatable
                in self.updatables
                if isinstance(updatable, Layer)]
    @property
    def get_connections(self):
        return [updatable for updatable
                in self.updatables
                if isinstance(updatable, Connection)]"""