#### Variables

\begin{array}{lllll}
\text{STM} & \text{MTM} & \text{LTM} & \text{F}_0 \rightarrow \text{F}_2 \text{ signal} & \text{F}_2 \rightarrow \text{F}_1 \text{ signal} \\
\hline
I_i - \text{F}_0 \text{ (input)} & \Delta_{ij} - \text{Phasic} & T_{ij} - \text{F}_0 \rightarrow \text{F}_2 & S_j - \text{Phasic} & \sigma_i - \text{Total} \\
x_i - \text{F}_1 \text{ (matching)} & \delta_{ij} - \text{Tonic} & T_{ji} - \text{F}_2 \rightarrow \text{F}_1 & \Theta_j - \text{Tonic} & \\
y_j - \text{F}_2 \text{ (coding)} & & & T_j - \text{Total} & \\
\end{array}

* Phasic refers to rapid, temporary responses with a quick onset and quick decay; they relate to immediate changes or short-term activity.
* Tonic refers to sustained, longer-lasting responses with a slower onset and slower decay; they relate to maintained activity or baseline states.
* These terms come from neuroscience, where phasic activity refers to brief bursts of neural firing and tonic activity refers to sustained, regular firing patterns.
* Note that the layered architecture differs from ART models in which inputs must first pass through F₁ to reach F₂. In the dART network, the coding field F₂ receives input directly from F₀; the bottom-up/top-down matching process at F₁ is retained only to determine whether an active code meets the vigilance matching criterion.

#### Parameters

\begin{align*}
&\text{Number of input components}, \quad i = 1 \ldots M \\
&\text{Number of coding nodes}, \quad j = 1 \ldots N \\
&\text{Signal rule}, \quad \alpha \in (0,1) \text{ (choice-by-difference) or } \alpha > 0 \text{ (Weber law)} \\
&\text{CAM rule}, \quad p \text{ (power law) and } Q \text{ (Q-max)}, \text{ with } p \rightarrow \infty \text{ or } Q = 1 \text{ for choice} \\
&\text{Learning rate}, \quad \beta \in [0,1] \\
&\text{Vigilance}, \quad \rho \in [0,1] \\
&\text{A set of small, positive, random numbers, for initial } T_{ij} \text{ values}, \quad \eta_{ij} = 0^+\\
\end{align*}

#### Signal Rule

* The total signal $T_j$ from the dART input field $F_0$ to the $j^{th}$ $F_{2}$ node is a function of the form $T_j = T_j(y_j) = g_j(S_j(y_j), \Theta_j(y_j))$, where $S_j$ is the phasic component and $\Theta_j$ is the tonic component.
    - Choice-by-difference variant: $T_j = \sum_{i=1}^M [I_i \wedge (1 - T_{ij}) - \Delta_{ij}]^+ + (1 - \alpha)\sum_{i=1}^M [T_{ij} - \delta_{ij}]^+, \quad 0 < \alpha < 1$
    - Weber law variant: $T_j = \frac{\sum_{i=1}^M [I_i \wedge (1 - T_{ij}) - \Delta_{ij}]^+}{\alpha + M - \sum_{i=1}^M [T_{ij} - \delta_{ij}]^+}, \quad \alpha > 0$

* For the rest of these notes I write the total signal as $T_j = g_j(S_j, \Theta_j)$ and give $S_j$ and $\Theta_j$ explicitly.
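
As a minimal sketch of the signal rule, the following computes the phasic and tonic components for a single $F_2$ node and combines them with either variant. The function names (`rectify`, `phasic_tonic`, `choice_by_difference`, `weber_law`) and the example values are hypothetical, and the sketch assumes that inputs and thresholds lie in $[0, 1]$ and that the dimension in the Weber law denominator is the number of input components $M$.

```python
def rectify(v):
    """Rectification [v]^+ = max(v, 0)."""
    return max(v, 0.0)


def phasic_tonic(I, T_col, Delta_col, delta_col):
    """Return (S_j, Theta_j) for one F2 node j.

    I         : input components I_i
    T_col     : F0 -> F2 thresholds T_ij for this node
    Delta_col : phasic MTM depletion Delta_ij for this node
    delta_col : tonic MTM depletion delta_ij for this node
    """
    S = sum(rectify(min(i, 1.0 - t) - D) for i, t, D in zip(I, T_col, Delta_col))
    Theta = sum(rectify(t - d) for t, d in zip(T_col, delta_col))
    return S, Theta


def choice_by_difference(S, Theta, alpha):
    """T_j = S_j + (1 - alpha) * Theta_j, with 0 < alpha < 1."""
    return S + (1.0 - alpha) * Theta


def weber_law(S, Theta, alpha, M):
    """T_j = S_j / (alpha + M - Theta_j), with alpha > 0."""
    return S / (alpha + M - Theta)


# Hypothetical two-component input and one coding node with no depletion yet.
I = [1.0, 0.5]
S, Theta = phasic_tonic(I, T_col=[0.0, 0.25], Delta_col=[0.0, 0.0], delta_col=[0.0, 0.0])
T_cbd = choice_by_difference(S, Theta, alpha=0.5)   # 1.5 + 0.5 * 0.25 = 1.625
T_weber = weber_law(S, Theta, alpha=0.5, M=len(I))  # 1.5 / 2.25
```

With zero depletion, $S_j$ grows with the overlap between the input and $1 - T_{ij}$, while $\Theta_j$ grows with the thresholds themselves.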

#### Content-Addressable-Memory (CAM) Rule

* Activity $y \equiv (y_{1} \ldots y_{j} \ldots y_{N})$ at the competitive coding field $F_{2}$ is stored as a content-addressable memory (CAM).
    - Content-addressable memory (CAM) is a special type of computer memory system where data is accessed based on its content rather than its physical location or address.

$$y^{(F2)}_j = \begin{cases} 
\frac{(T_j)^p}{\sum_{\lambda \in \Lambda} (T_\lambda)^p} & \text{if } j \in \Lambda \\
0 & \text{otherwise} 
\end{cases}
\quad \text{such that } \|y^{(F2)}\|_1 = 1 \text{ and } p > 0$$

* From here on I write this as $y_j = f_j(T_1,\ldots,T_N)$
* $\Lambda$ is the set of $F_2$ nodes $j$ whose $T_j$ exceeds a threshold
* If node $j$ is in the set $\Lambda$
    - Calculate $y_j$ using the normalized power rule; otherwise $y_j = 0$
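
The normalized power rule can be sketched as below. The function name `cam_power_rule` is hypothetical, and when no set $\Lambda$ is passed in the sketch assumes that $\Lambda$ contains every node with $T_j > 0$, which is only one possible choice of threshold.

```python
def cam_power_rule(T, p, Lambda=None):
    """y_j = T_j^p / sum_{l in Lambda} T_l^p for j in Lambda, and 0 otherwise."""
    if Lambda is None:
        # Assumption for this sketch: Lambda holds every node with a positive signal.
        Lambda = [j for j, t in enumerate(T) if t > 0.0]
    total = sum(T[j] ** p for j in Lambda)
    y = [0.0] * len(T)
    if total == 0.0:
        return y  # no active node, so the code stays at zero
    for j in Lambda:
        y[j] = T[j] ** p / total
    return y


T = [1.0, 2.0, 1.0]
y_soft = cam_power_rule(T, p=1)    # [0.25, 0.5, 0.25], a distributed code
y_sharp = cam_power_rule(T, p=50)  # nearly all activity on node 1
```

Raising `p` concentrates the code on the node with the largest $T_j$, which is the sense in which $p \rightarrow \infty$ recovers choice (winner-take-all) coding.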


#### First Iteration n=1

$$\begin{align*}
&\text{MTM depletion } \Delta_{ij} = \delta_{ij} = 0 \\
&\text{F}_0 \rightarrow \text{F}_2 \text{ threshold } T_{ij} = \eta_{ij} \\
&\text{F}_2 \rightarrow \text{F}_1 \text{ threshold } T_{ji} = 0 \\
&\text{Input } I_i = I_{i}^{(1)}
\end{align*}$$

#### Reset: new STM steady-state at $F_{2}$ and $F_{1}$

$$\begin{align*}
&\textbf{F}_0 \rightarrow \textbf{F}_2 \text{ signal:} \\
&\text{Phasic: } S_j = \sum_{i=1}^M [I_i \wedge (1 - T_{ij}) - \Delta_{ij}]^+ \\
&\text{Tonic: } \Theta_j = \sum_{i=1}^M [T_{ij} - \delta_{ij}]^+ \\
&\text{Total: } T_j = g_j(S_j, \Theta_j) \\
&\textbf{F}_2 \text{ activation: } y_j = f_j(T_1,\ldots,T_N) \\
&\textbf{F}_2 \rightarrow \textbf{F}_1 \text{ signal: } \sigma_i = \sum_{j=1}^N [y_j - T_{ji}]^+ \\
&\textbf{F}_1 \text{ activation: } x_i = I_i \wedge \sigma_i
\end{align*}$$

* The $y_j$ and $T_j$ calculations may look circular: $y_j$ is computed from the $T_j$ values, while $T_j = T_j(y_j)$ itself depends on $y_j$. The paper discusses this circularity, but I don't completely understand what it accomplishes.
    - Page 1477
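
A minimal sketch of the last two lines of the steady-state computation, the $F_2 \rightarrow F_1$ signal $\sigma_i$ and the $F_1$ activation $x_i$, is given below. The function name `f1_state` and the nested-list layout `T_td[j][i]` for the thresholds $T_{ji}$ are assumptions made for illustration.

```python
def f1_state(I, y, T_td):
    """Return (sigma, x) where sigma_i = sum_j [y_j - T_ji]^+ and x_i = min(I_i, sigma_i).

    I    : input components I_i
    y    : F2 code y_j
    T_td : top-down thresholds, T_td[j][i] = T_ji
    """
    M = len(I)
    sigma = [
        sum(max(y_j - T_td[j][i], 0.0) for j, y_j in enumerate(y))
        for i in range(M)
    ]
    x = [min(I_i, s) for I_i, s in zip(I, sigma)]
    return sigma, x


I = [1.0, 0.5]
y = [0.75, 0.25]

# Untrained network: all top-down thresholds are zero, so sigma_i = sum_j y_j = 1.
sigma0, x0 = f1_state(I, y, T_td=[[0.0, 0.0], [0.0, 0.0]])

# Hypothetical learned thresholds that suppress the second input component.
sigma1, x1 = f1_state(I, y, T_td=[[0.0, 0.5], [0.0, 0.25]])
```

In the untrained case $x = I$, so any code passes the match check; once the top-down thresholds have grown, $x_i$ can fall below $I_i$.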


#### MTM Depletion: $F_2$ sites refractory on the time scale of search
$$\begin{align*}
&\text{Phasic: } \Delta^{old}_{ij} = \Delta_{ij} \\
&\quad\quad\quad\, \Delta_{ij} = \Delta^{old}_{ij} \vee (I_i \wedge [y_j - T_{ij}])^+ \\
&\text{Tonic: } \delta^{old}_{ij} = \delta_{ij} \\
&\quad\quad\quad\, \delta_{ij} = \delta^{old}_{ij} \vee (y_j \wedge T_{ij})
\end{align*}$$

* The MTM depletion formulas are defined to help with the following:
    - Phasic component: tracks mismatches between the input and the category prediction, and accumulates when there are mismatches.
    - Tonic component: tracks recent category activations, which helps prevent overuse of the same categories, and accumulates when there is overuse.
    - Both components can only grow during a search and are reset to 0 once the search is over.
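
The depletion update for a single $F_2$ node can be sketched as follows. The function name `deplete` and the example values are hypothetical; the sketch only transcribes the two maximum ($\vee$) updates above.

```python
def deplete(I, y_j, T_col, Delta_col, delta_col):
    """One MTM depletion step for node j.

    Delta_ij <- max(Delta_ij, [min(I_i, y_j - T_ij)]^+)
    delta_ij <- max(delta_ij, min(y_j, T_ij))
    """
    new_Delta = [
        max(D, max(min(i, y_j - t), 0.0))
        for i, t, D in zip(I, T_col, Delta_col)
    ]
    new_delta = [max(d, min(y_j, t)) for t, d in zip(T_col, delta_col)]
    return new_Delta, new_delta


I = [1.0, 0.5]
T_col = [0.0, 0.25]

# First search step: node j is strongly active.
Delta1, delta1 = deplete(I, 0.75, T_col, [0.0, 0.0], [0.0, 0.0])

# Later step with weaker activity: depletion does not decrease.
Delta2, delta2 = deplete(I, 0.25, T_col, Delta1, delta1)
```

Because each update takes a maximum with the old value, depletion never decreases within a search, which is what keeps a reset node from immediately winning again.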

#### Reset or Resonance (Learning)

$$\begin{align*}
&\text{If } \sum_{i=1}^M x_i < \rho \sum_{i=1}^M I_i, \text{ go to Reset} \\
&\text{If } \sum_{i=1}^M x_i \geq \rho \sum_{i=1}^M I_i, \text{ go to Resonance (Learning)}
\end{align*}$$

* This is how vigilance is defined for the distributed case. It is the same logic as other ART vigilance checks, just written differently because of normalization and input size differences.
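
The match check itself is a one-line comparison. A minimal sketch, with the hypothetical function name `meets_vigilance` and made-up example values:

```python
def meets_vigilance(x, I, rho):
    """True (resonance) if sum_i x_i >= rho * sum_i I_i, False (reset) otherwise."""
    return sum(x) >= rho * sum(I)


I = [1.0, 0.5]
x = [1.0, 0.25]  # F1 activation after matching, sum = 1.25 out of 1.5

resonates_loose = meets_vigilance(x, I, rho=0.8)   # 1.25 >= 1.2
resonates_strict = meets_vigilance(x, I, rho=0.9)  # 1.25 <  1.35
```

The same matched pattern resonates at the lower vigilance and triggers a reset at the higher one.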

#### Learning

$$\begin{align*}
&\text{Old values: } T^{old}_{ij} = T_{ij}, \, T^{old}_{ji} = T_{ji}, \, \sigma^{old}_i = \sigma_i \\
\\
&\text{Increase } F_0 \rightarrow F_2 \text{ threshold:} \\
&T_{ij} = T^{old}_{ij} + \beta[y_j - T^{old}_{ij} - I_i]^+ \\
\\
&\text{Increase } F_2 \rightarrow F_1 \text{ threshold:} \\
&T_{ji} = T^{old}_{ji} + \beta[\sigma^{old}_i - I_i]^+ \frac{[y_j - T^{old}_{ji}]^+}{\sigma^{old}_i} \\
\\
&\text{Decrease } F_2 \rightarrow F_1 \text{ signal: } \\
&\sigma_i = \sigma^{old}_i - \beta[\sigma^{old}_i - I_i]^+ \\
\\
&\text{MTM recovery: } \Delta_{ij} = \delta_{ij} = 0
\end{align*}$$

* Increasing the $T_{ij}$ threshold keeps nodes from responding to inputs that are not similar to what they have learned, so they become more specific to the learned patterns as learning continues.
* Increasing $T_{ji}$ refines the matching process and the ability to reconstruct learned patterns.
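
The learning step can be sketched as below. The function name `learn` and the nested-list layouts are hypothetical. The sketch assumes that each top-down threshold $T_{ji}$ takes the share $[y_j - T_{ji}]^+ / \sigma_i$ of the excess $[\sigma_i - I_i]^+$, which is the share for which recomputing $\sigma_i = \sum_j [y_j - T_{ji}]^+$ reproduces the stated decrease $\sigma_i = \sigma^{old}_i - \beta[\sigma^{old}_i - I_i]^+$.

```python
def learn(I, y, sigma, T_bu, T_td, beta):
    """One learning step. T_bu[i][j] = T_ij, T_td[j][i] = T_ji. Returns new copies."""
    M, N = len(I), len(y)
    # F0 -> F2 thresholds: T_ij += beta * [y_j - T_ij - I_i]^+
    new_bu = [
        [T_bu[i][j] + beta * max(y[j] - T_bu[i][j] - I[i], 0.0) for j in range(N)]
        for i in range(M)
    ]
    new_td = [row[:] for row in T_td]
    new_sigma = list(sigma)
    for i in range(M):
        excess = max(sigma[i] - I[i], 0.0)
        if excess > 0.0:  # implies sigma[i] > 0 for non-negative inputs
            for j in range(N):
                # Assumed share of the excess carried by node j.
                share = max(y[j] - T_td[j][i], 0.0) / sigma[i]
                new_td[j][i] = T_td[j][i] + beta * excess * share
            new_sigma[i] = sigma[i] - beta * excess
    return new_bu, new_td, new_sigma


I = [1.0, 0.5]
y = [0.75, 0.25]
zeros = [[0.0, 0.0], [0.0, 0.0]]
new_bu, new_td, new_sigma = learn(I, y, [1.0, 1.0], zeros, zeros, beta=1.0)

# Recompute sigma from the updated thresholds to check consistency.
recomputed = [sum(max(y[j] - new_td[j][i], 0.0) for j in range(2)) for i in range(2)]
```

With fast learning ($\beta = 1$), the second component of $\sigma$ drops from 1.0 to the input value 0.5, and the recomputed signal agrees with the directly updated one.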

#### Next pattern or n += 1

$$\begin{align*}
&\text{New input: } I_i = I^{(n)}_i \\
&\text{New F}_1 \text{ activation: } x_i = I_i \wedge \sigma_i
\end{align*}$$

#### Overall Takeaways

* Architecture
    - $F_0$ -> $F_2$ <-> $F_1$
    - Direct $F_0$ -> $F_2$ connection differs from traditional ART

* Components
    - $F_0$ receives input patterns $I_i$ and directly connects to $F_2$
    - $F_2$ forms distributed codes $y_j$ and can have multiple nodes be active as a part of the $\Lambda$ set
    - $F_1$ compares input with predictions and computes the match scores
    - Uses choice-by-difference activation function for $T_j$

* Memory
    - STM holds immediate activations in $x_i$ and $y_j$ and the state of the current system
    - MTM holds the depletion parameters $\Delta$ and $\delta$ which:
        - Prevent perseveration
        - Create refractory periods
        - Encourage exploration of different categories
    - LTM holds thresholds $T_{ij}$ and $T_{ji}$ for stable learning

* Processing Cycle
    - Input presentation
    - Category activation
    - Match check using vigilance parameter $\rho$
    - Reset or Resonance decision
    - Learning if resonance achieved

* The learning process as written above is simplified, but the overall steps are the same as in a simple Fuzzy ART system.

* Much more math goes into a full system of this sort, since it has to interact with multiple other systems and the communication between them has to stay balanced.

* There are a couple of pieces whose purpose I still don't completely comprehend. They relate to communication between systems and to making sure the wrong information is not sent when two systems are not fully synced on a time scale.
    - Even without completely comprehending them, I can see that the above process works as intended as long as there are no time scale mismatches.