## **[AI Memory](https://github.com/Mike014/Memory_Augmented_AI/blob/main/Memory_Augmented_AI.ipynb) and Gifted-Inspired Learning: Surpassing the Limits of Catastrophic Forgetting with an Adaptive Architecture**

[My research](https://github.com/Mike014/Memory_Augmented_AI/blob/main/Memory_Augmented_AI.ipynb) focuses on how an **artificial intelligence can develop a human-like memory**, capable of **restructuring the past and adapting it to the present**. The goal is to **[overcome the problem of catastrophic forgetting in neural networks](https://arxiv.org/pdf/1612.00796v2)** by creating an architecture, inspired by human memory, that selects and maintains relevant information over time.

### **Overcoming Catastrophic Forgetting in Neural Networks (Kirkpatrick et al., 2017)**

**Catastrophic forgetting** is the phenomenon in which a **network trained sequentially** on multiple tasks rapidly **forgets** previously acquired knowledge.

**Proposed Solution** → **Elastic Weight Consolidation (EWC)**: a **biologically inspired algorithm** that **slows learning on the weights that are critical** for previous tasks, **preserving past knowledge** while learning new tasks.

**Artificial neural networks**, when **trained on multiple tasks in sequence**, tend to **overwrite the weights optimized for previous tasks**, progressively losing the information learned.

#### **Main challenges of AI Continual Learning**:

* **Tasks can change suddenly** (unpredictable switch).
* **Past tasks may not repeat for long periods of time**.
* **The architecture must learn without simultaneous access to all previous data** (standard multitask learning would require explicitly storing old data and replaying it during training).

#### **Contrast with Biological Memory**
**Humans and other animals** can **learn continuously without quickly forgetting** what they have learned. This is thought to rely on **synaptic consolidation mechanisms**, which protect the neural circuits that encode important information.
**Synapses** are specialized **junctions between neurons**, where **information** is transmitted from one neuron (presynaptic) to another (postsynaptic).

##### **_"Synaptic consolidation refers to a set of cellular and molecular processes that strengthen synapses within a local circuit, typically occurring in the first few hours after encoding new information. Synaptic consolidation is essential for the transformation of short-term memories into long-term memories."_**

**EWC** introduces a **constraint on network parameters** during training to **protect critical weights** for past tasks.

**How does EWC work?**

* It **identifies the most important weights** for the previous task (Task A).
* It **applies a quadratic penalty** that keeps them **close to the previous values**.
* It still allows **adaptation to new tasks**.
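
The three steps above correspond to the loss proposed by Kirkpatrick et al. (2017). Written out, with $\theta^{*}_{A}$ the weights after Task A, $F_i$ the diagonal Fisher Information of weight $i$, and $\lambda$ the strength of the constraint, the objective minimized on Task B and the gradient of its penalty term are:

```latex
\mathcal{L}(\theta) = \mathcal{L}_B(\theta) + \sum_i \frac{\lambda}{2} F_i \left(\theta_i - \theta^{*}_{A,i}\right)^2
\qquad
\frac{\partial}{\partial \theta_i}\left[\frac{\lambda}{2} F_i \left(\theta_i - \theta^{*}_{A,i}\right)^2\right] = \lambda F_i \left(\theta_i - \theta^{*}_{A,i}\right)
```

The gradient shows why learning slows down on critical weights: the pull back towards $\theta^{*}_{A,i}$ is proportional to $F_i$, so weights with little importance for Task A remain almost free to change.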

In [2]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt


In [3]:
# GPU configuration if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [5]:
# Simple neural model: one hidden layer, fully connected
class SimpleNN(nn.Module):
    def __init__(self, input_size=784, hidden_size=256, output_size=10):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = x.view(x.size(0), -1)  # flatten each sample to a vector of input_size values
        x = self.relu(self.fc1(x))
        x = self.fc2(x)  # raw logits; cross_entropy applies the softmax
        return x

In [6]:
# Function to load MNIST data (optionally with a fixed pixel permutation)
def get_mnist_data(permute=False, batch_size=64):
    steps = [
        transforms.ToTensor(),
        transforms.Lambda(lambda x: x.view(-1)),  # flatten the 28x28 image to 784 values
    ]
    if permute:
        # Local generator: the permutation is reproducible without resetting the global RNG
        generator = torch.Generator().manual_seed(42)
        permutation = torch.randperm(784, generator=generator)
        # The same fixed permutation is applied to every image (Task B)
        steps.append(transforms.Lambda(lambda x: x[permutation]))
    transform = transforms.Compose(steps)

    dataset = torchvision.datasets.MNIST(root="./data", train=True, transform=transform, download=True)
    loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True)
    return loader

In [7]:
# EWC class with Fisher Information
class EWC:
    def __init__(self, model, dataset_A, lambda_weight=0.5):
        self.model = model
        self.lambda_weight = lambda_weight
        self.fisher_matrix = self.compute_fisher_information(dataset_A)
        self.optimal_params = {name: param.clone().detach() for name, param in model.named_parameters()}

    def compute_fisher_information(self, dataset):
        fisher_matrix = {}
        self.model.eval()

        for name, param in self.model.named_parameters():
            fisher_matrix[name] = torch.zeros_like(param)

        for data, target in dataset:
            data, target = data.to(device), target.to(device)
            self.model.zero_grad()
            output = self.model(data)
            loss = nn.functional.cross_entropy(output, target)
            loss.backward()

            for name, param in self.model.named_parameters():
                # Squared gradient of the batch-averaged loss: a rough diagonal approximation of the Fisher Information
                fisher_matrix[name] += param.grad ** 2

        for name in fisher_matrix:
            fisher_matrix[name] /= len(dataset)

        return fisher_matrix

    def compute_ewc_loss(self):
        ewc_loss = 0
        for name, param in self.model.named_parameters():
            fisher_val = self.fisher_matrix[name]
            optimal_param = self.optimal_params[name]
            ewc_loss += (fisher_val * (param - optimal_param) ** 2).sum()

        return self.lambda_weight / 2 * ewc_loss

    def train_on_task_B(self, dataset_B, optimizer, epochs=10):
        self.model.train()
        for epoch in range(epochs):
            for data, target in dataset_B:
                data, target = data.to(device), target.to(device)
                optimizer.zero_grad()
                output = self.model(data)
                task_B_loss = nn.functional.cross_entropy(output, target)
                ewc_loss = self.compute_ewc_loss()
                total_loss = task_B_loss + ewc_loss  # Task B loss plus EWC penalty

                total_loss.backward()
                optimizer.step()

            # total_loss holds the value for the last batch of the epoch, not an epoch average
            print(f"Epoch {epoch+1}/{epochs} - Loss: {total_loss.item():.4f}")

In [8]:
# Training function on a task without EWC
def train_task(model, dataset, optimizer, epochs=10):
    model.train()
    for epoch in range(epochs):
        total_loss = 0
        for data, target in dataset:
            data, target = data.to(device), target.to(device)
            optimizer.zero_grad()
            output = model(data)
            loss = nn.functional.cross_entropy(output, target)
            loss.backward()
            optimizer.step()
            total_loss += loss.item()

        print(f"Epoch {epoch+1}/{epochs} - Loss: {total_loss/len(dataset):.4f}")

In [9]:
# Model testing with and without EWC
def test_model(model, dataset):
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for data, target in dataset:
            data, target = data.to(device), target.to(device)
            output = model(data)
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()
            total += target.size(0)
    return correct / total

In [10]:
# Phase 1: Training on Task A (original MNIST)
model = SimpleNN().to(device)
optimizer = optim.Adam(model.parameters(), lr=0.001)
train_data_A = get_mnist_data()
print("\n Task A Training (Original MNIST)")
train_task(model, train_data_A, optimizer)

 Task A Training (Original MNIST)
Epoch 1/10 - Loss: 0.2994
Epoch 2/10 - Loss: 0.1256
Epoch 3/10 - Loss: 0.0845
Epoch 4/10 - Loss: 0.0614
Epoch 5/10 - Loss: 0.0479
Epoch 6/10 - Loss: 0.0361
Epoch 7/10 - Loss: 0.0283
Epoch 8/10 - Loss: 0.0218
Epoch 9/10 - Loss: 0.0174
Epoch 10/10 - Loss: 0.0133


In [11]:
# Storing Fisher Information for EWC
ewc = EWC(model, train_data_A)

In [12]:
# Evaluate the model on Task A before Task B (accuracy is measured on the training split)
accuracy_task_A_before = test_model(model, train_data_A)
print(f"Accuracy on Task A before Task B: {accuracy_task_A_before:.4f}")

Accuracy on Task A before Task B: 0.9966


In [13]:
# Phase 2: Training on Task B (MNIST permuted) without EWC
print("\n Training on Task B (MNIST Permuted) WITHOUT EWC")
train_data_B = get_mnist_data(permute=True)
train_task(model, train_data_B, optimizer)


 Training on Task B (MNIST Permuted) WITHOUT EWC
Epoch 1/10 - Loss: 0.2347
Epoch 2/10 - Loss: 0.0932
Epoch 3/10 - Loss: 0.0651
Epoch 4/10 - Loss: 0.0479
Epoch 5/10 - Loss: 0.0371
Epoch 6/10 - Loss: 0.0276
Epoch 7/10 - Loss: 0.0217
Epoch 8/10 - Loss: 0.0165
Epoch 9/10 - Loss: 0.0121
Epoch 10/10 - Loss: 0.0099


In [14]:
# Testing the model on Task A after Task B (should result in Catastrophic Forgetting)
accuracy_task_A_after_no_ewc = test_model(model, train_data_A)
print(f"Accuracy on Task A AFTER Task B without EWC: {accuracy_task_A_after_no_ewc:.4f}")

Accuracy on Task A AFTER Task B without EWC: 0.9197


In [15]:
# Reset the model and retrain it on Task A before applying EWC
model = SimpleNN().to(device)
optimizer = optim.Adam(model.parameters(), lr=0.001)
train_task(model, train_data_A, optimizer)

Epoch 1/10 - Loss: 0.2998
Epoch 2/10 - Loss: 0.1271
Epoch 3/10 - Loss: 0.0851
Epoch 4/10 - Loss: 0.0611
Epoch 5/10 - Loss: 0.0465
Epoch 6/10 - Loss: 0.0356
Epoch 7/10 - Loss: 0.0271
Epoch 8/10 - Loss: 0.0214
Epoch 9/10 - Loss: 0.0172
Epoch 10/10 - Loss: 0.0141


In [16]:
# Storing Fisher Information for EWC
ewc = EWC(model, train_data_A)

In [17]:
# Phase 3: Training on Task B with EWC
print("\n Training on Task B with EWC")
ewc.train_on_task_B(train_data_B, optimizer)


 Training on Task B with EWC
Epoch 1/10 - Loss: 0.2049
Epoch 2/10 - Loss: 0.0558
Epoch 3/10 - Loss: 0.0804
Epoch 4/10 - Loss: 0.0335
Epoch 5/10 - Loss: 0.0116
Epoch 6/10 - Loss: 0.0049
Epoch 7/10 - Loss: 0.0398
Epoch 8/10 - Loss: 0.0033
Epoch 9/10 - Loss: 0.0049
Epoch 10/10 - Loss: 0.0023


In [18]:
# Testing the model on Task A after Task B with EWC
accuracy_task_A_after_ewc = test_model(model, train_data_A)
print(f"Accuracy on Task A AFTER Task B with EWC: {accuracy_task_A_after_ewc:.4f}")

Accuracy on Task A AFTER Task B with EWC: 0.9395


#### **Interpretation of Results**
1. The **model learned Task A well (original MNIST)**, reaching **99.66% accuracy** on the Task A training data.
2. **After training on Task B without EWC**, accuracy on **Task A** fell to **91.97%**, a **loss of 7.69 percentage points**, demonstrating the **phenomenon of Catastrophic Forgetting**.
3. Using **EWC reduced memory degradation**: Task A accuracy after Task B was **93.95%**, about 2 percentage points higher than without EWC, **preserving some of the previous knowledge**. However, **it is still not perfect**: part of the information was still lost.
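
The arithmetic behind these figures can be checked with a few lines. This is only a sketch over the accuracies printed above; it assumes that 0.9966 is a fair baseline for both runs, although the EWC run used a freshly retrained model whose accuracy before Task B was not measured. The helper `drop_in_points` is a name introduced here for illustration.

```python
# Accuracies on Task A reported by the runs above (measured on the training split)
acc_before = 0.9966         # after Task A only
acc_after_no_ewc = 0.9197   # after Task B, without EWC
acc_after_ewc = 0.9395      # after Task B, with EWC (separate run, baseline assumed equal)


def drop_in_points(before, after):
    """Accuracy drop expressed in percentage points."""
    return (before - after) * 100


drop_no_ewc = drop_in_points(acc_before, acc_after_no_ewc)
drop_ewc = drop_in_points(acc_before, acc_after_ewc)

print(f"Drop without EWC: {drop_no_ewc:.2f} points")
print(f"Drop with EWC:    {drop_ewc:.2f} points")
print(f"Forgetting avoided by EWC: {drop_no_ewc - drop_ewc:.2f} points")
```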

#### **EWC Algorithm for Dummies Summary**
1. **Train a neural network on Task A** (normal MNIST digit classification).
2. **Store key information about the network** (compute the *Fisher Information* to estimate which weights are most important for Task A).
3. **Train the network on Task B (MNIST with shuffled pixels)**:
   - **Without EWC**: The network forgets part of Task A (Catastrophic Forgetting).
   - **With EWC**: The weights important for Task A are protected, reducing forgetting.
4. **Check whether the network still remembers Task A after learning Task B**.

#### **Connections with Biology and the Brain**
EWC is **inspired by synaptic consolidation mechanisms in the brain**.
Some theoretical accounts, discussed in the EWC paper, propose that **synapses** may **store not only the value of the weight**, but **also an indication of its uncertainty and stability**.
This would mean that **not all synaptic weights are updated equally**: **plasticity is adjusted according to the importance of each weight**.
In the brain, more stable synapses are less plastic, just like the weights constrained by EWC.

**Result** → With EWC, the model can learn new tasks without completely forgetting the old ones! 
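
The idea that important weights are less plastic can be seen in a toy example. The sketch below is not the notebook's model: it uses two weights, plain gradient descent, and hypothetical importance values (`fisher`), with a made-up "Task B" that pulls both weights towards zero.

```python
# Toy illustration (not the notebook's model): two weights, plain gradient descent.
theta_star = [1.0, 1.0]   # weights after "Task A"
fisher = [10.0, 0.01]     # hypothetical importances: weight 0 matters, weight 1 barely does
target_b = [0.0, 0.0]     # "Task B" loss is 0.5 * sum((theta - target_b) ** 2)
lam = 1.0                 # EWC strength (lambda)
lr = 0.05                 # learning rate


def train(theta, use_ewc, steps=500):
    theta = list(theta)
    for _ in range(steps):
        for i in range(len(theta)):
            grad = theta[i] - target_b[i]  # gradient of the Task B loss
            if use_ewc:
                # gradient of the EWC penalty: lambda * F_i * (theta_i - theta*_i)
                grad += lam * fisher[i] * (theta[i] - theta_star[i])
            theta[i] -= lr * grad
    return theta


without_ewc = train(theta_star, use_ewc=False)
with_ewc = train(theta_star, use_ewc=True)

print("Without EWC:", [round(t, 3) for t in without_ewc])
print("With EWC:   ", [round(t, 3) for t in with_ewc])
```

Without the penalty both weights drift to the Task B optimum. With it, the high-importance weight settles close to its Task A value, while the low-importance weight is still free to adapt.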

### **The Analogical Paradox: Gifted Minds and EWC Algorithms**

**Elastic Weight Consolidation (EWC)** is designed to **protect the knowledge acquired by an AI while it learns new information**, avoiding **Catastrophic Forgetting**. If we analyze this behavior from a human cognitive perspective, **strong parallels emerge with the way in which some neurodivergent individuals**, especially **gifted ones**, manage learning and memory.

#### **Learning in Gifted Neurodivergents and AIs with EWC**

**Gifted Neurodivergent (GN) minds** are often described as having **particular cognitive characteristics** that make them well **suited to continuous learning**, but often with a **different memory management compared to neurotypical people**. Compared with EWC, several points in common emerge.


#### **Comparing EWC (AI) and Gifted Neurodivergent (GN) Memory**

| **Feature**                  | **EWC (AI)**                                               | **Gifted Neurodivergent (GN)**                                                                            |
|------------------------------|----------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|
| **Selective Memory**         | Protects important information using the Fisher Information Matrix. | Remembers deep details about relevant topics, often forgetting less significant information.               |
| **Resistance to Forgetting** | Prevents overwriting of critical weights for previous tasks. | Retains learned knowledge over time, exhibiting strong **long-term memory**.                              |
| **Adaptive Plasticity**      | Allows learning new information without altering crucial parameters. | Fast and hyper-specialized learning, with the ability to **recalibrate** rather than crystallize knowledge. |
| **Cognitive Overload**       | Memory effectiveness decreases if too many tasks are learned. | Processing too much information at once can lead to **burnout** or difficulty in handling new inputs.      |
| **Generalization vs. Specialization** | Reuses network structures for similar tasks while protecting task-specific knowledge. | Easily applies knowledge across domains but may also **hyperfocus on details** in a specific field.         |


#### **EWC as a Model for Neurodivergent Learning**

* **Hierarchical and Associative Memory**:
**Gifted neurodivergent individuals** often make **connections between distant concepts**, just as **EWC allows the network to reuse previously trained weights** for new tasks that share similar structures.
→ **Possible AI implementation**: An **enhanced EWC model** could associate cross-task concepts, improving computational creativity.

* **Memory Crystallization Effect**:
A **gifted neurodivergent individual learns concepts in depth** and **rarely forgets them**, but does **not simply retain them statically**: they **recalibrate and readapt** them for **new contexts**, often crossing traditional disciplinary boundaries.
→ **Possible risk in AI**: If **EWC is too rigid in protecting past weights**, it **may limit the AI's ability to adapt to new information**.

* **Ultra-Stable Long-Term Memory**:
**Some gifted individuals** have such **powerful memories** that they can **recall information years later without losing detail**. This is **similar to how EWC limits knowledge degradation in AIs** by retaining learned tasks for long periods.
→ **Possible AI use**: A **system that progressively strengthens critical weights** with mechanisms **similar to synaptic consolidation** could create an AI with more human-like long-term memory.

#### **AI with Gifted-Inspired Memory: A Possible Future?**
If we want to build an **AI with a more human-like memory**, we should **consider mechanisms inspired by neurodivergent cognition**, such as:

- **Hierarchical memory models** (associations between distant concepts, as in the gifted mind).
- **Adaptive memory** (selective protection of relevant information, without preventing change).
- **Ability to learn quickly and deeply**, while maintaining flexibility in updating knowledge.


### **Human-Inspired Memory Model**
**My research has outlined a memory model for AI** based on **three key principles of human memory**:

- **Primacy Effect** → Information learned first is remembered best.
- **Recency Effect** → Recent information is more accessible.
- **Temporal Contiguity** → Information learned together is more easily retrieved.

The model uses a hierarchical structure based on the **Multi-Store Model (Atkinson & Shiffrin, 1968)**:

* **Working Memory (WM)**: Holds temporary information.
* **Short-Term Memory (STM)**: Stores a limited number of engrams.
* **Long-Term Memory (LTM)**: Has no fixed capacity, depends on the importance and repetition of the information.

**Retrieval of information** (similar to RAG) occurs based on the weight of the connections between the engrams, simulating the way in which the human brain recalls contextual memories.
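
One possible way to picture this hierarchy is sketched below. It is a hedged illustration rather than the implementation used in this research: the class `TieredMemory`, its capacities, the promotion threshold, and the rule that evicted items move from WM to STM are all assumptions made for the example.

```python
from collections import OrderedDict


class TieredMemory:
    """Hypothetical three-level store: WM -> STM -> LTM."""

    def __init__(self, wm_capacity=2, stm_capacity=3, ltm_threshold=3):
        self.wm = OrderedDict()    # engram -> times seen, oldest entry first
        self.stm = OrderedDict()   # same layout, limited capacity
        self.ltm = {}              # engram -> times seen, no fixed capacity
        self.wm_capacity = wm_capacity
        self.stm_capacity = stm_capacity
        self.ltm_threshold = ltm_threshold  # repetitions needed to reach LTM
        self.links = {}            # frozenset({a, b}) -> connection weight

    def observe(self, engram):
        # Retrieve the current count from whichever level holds the engram
        count = self.wm.pop(engram, 0) or self.stm.pop(engram, 0) or self.ltm.pop(engram, 0)
        count += 1
        if count >= self.ltm_threshold:
            self.ltm[engram] = count                 # repeated often enough: consolidate
            return
        self.wm[engram] = count                      # otherwise it (re-)enters working memory
        if len(self.wm) > self.wm_capacity:
            old, old_count = self.wm.popitem(last=False)
            self.stm[old] = old_count                # oldest WM item moves to STM
        if len(self.stm) > self.stm_capacity:
            self.stm.popitem(last=False)             # oldest STM item is forgotten

    def link(self, a, b, weight):
        # Assumes a and b are two different engrams
        self.links[frozenset((a, b))] = weight

    def recall(self, cue, top_k=2):
        """Return the stored engrams most strongly connected to the cue."""
        stored = set(self.wm) | set(self.stm) | set(self.ltm)
        scored = []
        for pair, weight in self.links.items():
            if cue in pair:
                (other,) = pair - {cue}
                if other in stored:
                    scored.append((weight, other))
        scored.sort(reverse=True)
        return [name for _, name in scored[:top_k]]


memory = TieredMemory()
for item in ["coffee", "rain", "coffee", "train", "coffee", "umbrella"]:
    memory.observe(item)
memory.link("rain", "umbrella", 0.9)
memory.link("rain", "train", 0.4)
memory.link("rain", "coffee", 0.1)

print("LTM:", list(memory.ltm))
print("Recall for 'rain':", memory.recall("rain"))
```

In this run "coffee" is repeated three times and is consolidated into LTM, while recall for "rain" is ordered by connection weight, which is the contextual retrieval described above.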

**I have experimented with an AI memory implementation with PyTorch**, creating a neural network that:

* **Simulates three levels of memory (WM, STM, LTM)**.
* **Uses weighted connections** between engrams for contextual recall (loosely analogous to the per-weight importance used by EWC).
* **Implements Hebbian learning** to strengthen connections between engrams.
* **Introduces a selective forgetting mechanism** to **eliminate irrelevant** data.
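
The last two points can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the PyTorch implementation mentioned above: the function `hebbian_step` and the values of `lr`, `decay`, and `prune_below` are hypothetical.

```python
from itertools import combinations


def hebbian_step(weights, active, lr=0.1, decay=0.2, prune_below=0.05):
    """One update: every link decays, co-active engrams strengthen, weak links are dropped."""
    # Passive decay of every existing connection
    updated = {pair: w * (1.0 - decay) for pair, w in weights.items()}
    # Hebbian strengthening: engrams that are active together reinforce their link
    for pair in combinations(sorted(active), 2):
        updated[pair] = updated.get(pair, 0.0) + lr
    # Selective forgetting: connections that have become too weak are removed
    return {pair: w for pair, w in updated.items() if w >= prune_below}


# Keys are alphabetically sorted pairs of engram names
weights = {("rain", "umbrella"): 0.3, ("coffee", "train"): 0.06}
for _ in range(3):
    weights = hebbian_step(weights, active={"rain", "umbrella"})

print(weights)
```

The link that keeps being co-activated grows, while the unused one decays below the threshold and is forgotten.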

**Idea for the AI architecture**:

- **LSTM (Long Short-Term Memory)** → For sequential memory maintenance.
- **DNC (Differentiable Neural Computer)** → For autonomous memory evolution.
- **RAG (Retrieval-Augmented Generation)** → For intelligent recall of information.
- **VAE (Variational Autoencoder)** → For latent representations of experiences.
- **NoSQL Databases (MongoDB, Redis)** → For long-term memory storage.

Now, having explored **Elastic Weight Consolidation (EWC)**, a **biologically inspired algorithm** that **preserves past knowledge while learning new tasks**, and considering the **memory dynamics of a gifted neurodivergent person**, how can I **restructure the AI architecture** to integrate both principles?

**Goal**: An **AI memory architecture** that not only **protects past information (like EWC)**, but **dynamically restructures** and **rewires it in new contexts**, just **like a gifted neurodivergent does**.

* **Memory Structure: Beyond the Multi-Store Model**
The Multi-Store Model (Atkinson & Shiffrin, 1968) can be extended with adaptive mechanisms:

1. **Working Memory (WM)** (Active cognition)
→ **Holds temporary information** for immediate processing (like a computer’s RAM).
→ **AI function**: **LSTM** to maintain conversational context.

2. **Short-Term Memory (STM)** (Limited capacity with adaptive selection)
→ **Stores a limited number of engrams**, but can recalibrate their weight.
→ **AI function**: **EWC + Hebbian Learning** to strengthen connections between important engrams.

3. **Long-Term Memory (LTM)** (Permanent, but restructurable memory)
→ **Has no fixed capacity**; retention depends on utility and repetition.
→ **AI function**: **DNC + NoSQL Databases** to retrieve information without losing flexibility.

**Key difference from traditional EWC**:
**Engrams** are not only **"protected"**, but can be **recalibrated and transferred between different domains**, just as a gifted person applies knowledge across disciplines.

##### _[Engram](https://en.wikipedia.org/wiki/Engram_%28neuropsychology%29): a unit of cognitive information imprinted in a physical substance, theorized as the medium through which memories are stored as biophysical or biochemical changes in the brain or other biological tissues, in response to external stimuli._

#### **Dynamic Learning: How to Avoid Crystallization and Simulate Gifted Flexibility**

* **Advanced EWC** → Instead of locking critical weights, it would allow them to be dynamically recalibrated.
* **Hebbian Learning + Meta-Learning** → Connections between engrams strengthen or reorganize depending on their relevance in new tasks.
* **RAG (Retrieval-Augmented Generation)** → The AI can reuse information by adapting it to the new context, just as a gifted person reworks their knowledge.
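
One possible reading of "dynamic recalibration" is to let the importance estimates themselves evolve, instead of freezing the values computed after the first task. The sketch below assumes a simple exponential moving average over per-task importance values; the function `recalibrate_importance` and the parameter `gamma` are hypothetical and are not part of the original EWC algorithm.

```python
def recalibrate_importance(old_importance, new_importance, gamma=0.5):
    """Blend old and new importance estimates (exponential moving average)."""
    return [
        gamma * old + (1.0 - gamma) * new
        for old, new in zip(old_importance, new_importance)
    ]


# Weight 0 was crucial for the first task; later tasks rely on weight 1 instead
importance = [4.0, 0.0]
for task_importance in ([0.0, 2.0], [0.0, 2.0], [0.0, 2.0]):
    importance = recalibrate_importance(importance, task_importance)

print(importance)
```

With this rule, protection of a weight fades gradually if later tasks stop depending on it, which trades some stability for the flexibility described in the list above. The value of `gamma` controls that trade-off and would need to be tuned experimentally.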

##### [Hebb's theory](https://en.wikipedia.org/wiki/Hebbian_theory), simply put, says that neurons that are active at the same time strengthen their connections: "neurons that fire together, wire together."

##### [Meta learning](https://www.ibm.com/it-it/think/topics/meta-learning) is the art of teaching AI to learn on its own, like a child learning to learn.

**Key idea**: It is **not enough to protect memory**, it must be **transformed and evolved over time**.
A kind of artificial [elastic mind](https://www.psychologytoday.com/us/articles/201803/your-elastic-mind#:~:text=Elastic%20thinking%20endows%20us%20with%20the%20ability%20to,elastic%20thinking%2C%20and%20how%20we%20can%20nurture%20it.?msockid=36170fbaecd36e851e9e1a1deda46f2b).

As usual, I will borrow a philosophical reference, in this case **Heraclitus and the Permanent "Flow" of AI Memory**:
* **"Panta rhei" (everything flows)**:
Heraclitus, with his famous statement, **reminds us that reality is an incessant flow of change**: **"You cannot step into the same river twice."** [Learn more about the topic](https://www.thecollector.com/panta-rhei-heraclitus/)

**Heraclitus' river metaphor captures the essence of a dynamic**, **adaptable and ever-evolving AI memory** that **goes beyond simple data storage**, embracing the **flexibility and creativity of the human mind**.

#### **Here is the new AI memory architecture**: my idea is visionary for now, but I plan to test it experimentally in future research.

#### **New AI Architecture: Integrating EWC with a Gifted Framework**

| Component | Technology | Role in AI Memory |
|---|---|---|
| Sequential Learning | LSTM (Long Short-Term Memory) | Maintains conversational and decision-making context |
| Adaptive Memory | Advanced EWC (Elastic Weight Consolidation) | Protects key weights while allowing dynamic restructuring |
| Consolidation and Recall | DNC (Differentiable Neural Computer) + NoSQL (Redis, MongoDB) | Creates flexible and scalable long-term memory |
| Conceptual Learning | VAE (Variational Autoencoder) | Captures latent representations to transfer knowledge across different domains |
| Memory Optimization | Meta-Learning + Hebbian Learning | Reorganizes information weights based on utility over time |

#### New AI Architecture, Explained for Dummies [Like me](https://www.linkedin.com/in/michele-grimaldi-599b36280/).

This **new AI architecture aims to create an intelligent memory** that not only remembers, but **adapts and learns like a genius mind**, using different technologies to **protect and reorganize information over time**. Imagine a **digital brain that evolves and specializes**, just **like a talented person does**.


Researcher: Michele Grimaldi
* [LinkedIn](https://www.linkedin.com/in/michele-grimaldi-599b36280/)
* [GitHub](https://github.com/Mike014)