<div style="background-color: #007BFF; height: 4px; width: 100%;"></div>

# **Effortful Retrieval Experiments**

We will run a series of experiments to evaluate memory retrieval performance in CNNs under varying retrieval difficulty. Each experiment manipulates one difficulty variable: interstimulus interval (ISI), criterion level, noise level, or occlusion. To evaluate model performance, each experiment computes classification accuracy, retrieval strength (precision, recall, F1-score), and forgetting rate (change in accuracy over time for previously learned items). We use a standard pre-trained `ResNet-18` CNN and fine-tune it on the CIFAR-10 image dataset (publicly available via `torchvision.datasets`).

## **Table of Contents**

1. Notebook Setup
2. Dataset Preparation
3. Experiment Setup
    a. Defining the CNN model
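    b. Abstracted training function for all experiments
    c. Evaluating model performance functions
4. Experimental Results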


<div style="background-color: #007BFF; height: 4px; width: 100%;"></div>

## **1. Notebook Setup**

In [5]:
# Library imports
import torch
import torchvision
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms as transforms

import numpy as np
import random
import matplotlib.pyplot as plt
import time

import torchvision.models as models
from sklearn.metrics import precision_score, recall_score, f1_score

In [2]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

<div style="background-color: #007BFF; height: 4px; width: 100%;"></div>

## **2. Dataset Preparation**

In [None]:
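# Standard training augmentation (random flip and crop) plus normalization
transform_train = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
    # per-channel mean/std of 0.5 maps RGB pixel values from [0, 1] to [-1, 1]
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

# No augmentation at test time, only the same normalization
transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])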

In [None]:
# Load CIFAR-10 dataset
train_dataset = torchvision.datasets.CIFAR10(root='./data', train=True, transform=transform_train, download=True)
test_dataset = torchvision.datasets.CIFAR10(root='./data', train=False, transform=transform_test, download=True)

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=64, shuffle=False)

<div style="background-color: #007BFF; height: 4px; width: 100%;"></div>

## **3. Experiment Setup**

#### **Defining the CNN Model**

In [7]:
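def get_pretrained_cnn():
    # Pretrained ResNet-18 (ImageNet weights).
    # `weights=` replaces the deprecated `pretrained=True` in torchvision >= 0.13;
    # on older versions, use models.resnet18(pretrained=True) instead.
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    # Replace the 1000-class ImageNet head with a 10-class head for CIFAR-10
    model.fc = nn.Linear(model.fc.in_features, 10)
    model = model.to(device)

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)

    return model, criterion, optimizer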

_____

#### **Abstracted training function for all experiments**

The following can be varied:
- Retrieval condition (massed or spaced)
- Dataset noise level
- Occlusion percentage

Returns accuracy and loss across epochs.

In [None]:
def train_model(model, train_loader, criterion, optimizer, epochs=10, retrieval_condition="massed", noise_level=0, occlusion_percent=0):
    model.train()
    history = {"accuracy": [], "loss": []}

    for epoch in range(epochs):
        correct, total, running_loss, num_batches = 0, 0, 0.0, 0

        for images, labels in train_loader:
            # spaced retrieval: skip each batch with probability 0.5
            if retrieval_condition == "spaced" and random.random() < 0.5:
                continue

            images, labels = images.to(device), labels.to(device)

            # dataset noise level: additive Gaussian noise with std = noise_level
            if noise_level > 0:
                noise = torch.randn_like(images) * noise_level
                # inputs are normalized to [-1, 1], so clamp to that range
                images = torch.clamp(images + noise, -1, 1)

            # occlusion: zero out one random patch, shared by the whole batch;
            # occlusion_percent is a fraction (0-1) of the image side length
            if occlusion_percent > 0:
                height, width = images.shape[-2:]
                mask_h = int(height * occlusion_percent)
                mask_w = int(width * occlusion_percent)
                y_start = random.randint(0, height - mask_h)
                x_start = random.randint(0, width - mask_w)
                images[:, :, y_start:y_start+mask_h, x_start:x_start+mask_w] = 0

            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            num_batches += 1
            _, predicted = outputs.max(1)
            total += labels.size(0)
            correct += predicted.eq(labels).sum().item()

        # average over the batches actually trained on (the spaced condition skips some)
        acc = 100. * correct / max(total, 1)
        avg_loss = running_loss / max(num_batches, 1)
        history["accuracy"].append(acc)
        history["loss"].append(avg_loss)
        print(f"Epoch {epoch+1}: Loss: {avg_loss:.4f}, Accuracy: {acc:.2f}%")

    return history

_____

#### **Evaluating model performance functions**

In [None]:
def evaluate_model(model, test_loader):
    model.eval()
    correct, total = 0, 0
    all_preds, all_labels = [], []

    with torch.no_grad():
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)

            correct += (predicted == labels).sum().item()
            total += labels.size(0)

            all_preds.extend(predicted.cpu().numpy())
            all_labels.extend(labels.cpu().numpy())

    accuracy = 100. * correct / total
    precision = precision_score(all_labels, all_preds, average="macro")
    recall = recall_score(all_labels, all_preds, average="macro")
    f1 = f1_score(all_labels, all_preds, average="macro")

    print(f"Test Accuracy: {accuracy:.2f}%")
    print(f"Precision: {precision:.4f}, Recall: {recall:.4f}, F1-score: {f1:.4f}")

    return accuracy, precision, recall, f1
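
As a sanity check on the retrieval-strength metrics, the macro-averaged precision, recall, and F1-score can be computed by hand on a toy example. This is a minimal sketch in plain Python; `macro_scores` and the toy labels are hypothetical and not part of the notebook. It is meant to mirror what `average="macro"` does (an unweighted mean of per-class scores), not to replace the `sklearn` calls.

```python
def macro_scores(y_true, y_pred):
    """Macro-averaged precision, recall and F1 over all classes seen in y_true or y_pred."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)  # true positives
        fp = sum(1 for t, p in pairs if t != c and p == c)  # false positives
        fn = sum(1 for t, p in pairs if t == c and p != c)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    # Macro average: every class counts equally, regardless of its size
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical toy labels for three classes
toy_labels = [0, 0, 1, 1, 2, 2]
toy_preds = [0, 1, 1, 1, 2, 0]
macro_p, macro_r, macro_f1 = macro_scores(toy_labels, toy_preds)
print(f"Precision: {macro_p:.4f}, Recall: {macro_r:.4f}, F1-score: {macro_f1:.4f}")
```

Because CIFAR-10 has balanced classes in its test split, macro averaging and accuracy should tell a similar story on clean data; they may diverge once noise or occlusion hurts some classes more than others.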

In [None]:
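def measure_forgetting_rate(model, test_loader, delay=10):
    # NOTE: the model's weights do not change during time.sleep(), so in eval mode
    # the second evaluation should reproduce the first and the forgetting rate
    # will be ~0. For a non-trivial measurement, the model has to be trained on
    # other items between the two evaluations.
    initial_acc, _, _, _ = evaluate_model(model, test_loader)
    print(f"Initial Accuracy: {initial_acc:.2f}%")

    # simulate "forgetting" by adding a delay before retesting
    print(f"Waiting {delay} minutes before retesting...")
    time.sleep(delay * 60)

    final_acc, _, _, _ = evaluate_model(model, test_loader)
    # relative drop in accuracy, in percent (guard against division by zero)
    forgetting_rate = (initial_acc - final_acc) / initial_acc * 100 if initial_acc > 0 else 0.0
    print(f"Forgetting Rate: {forgetting_rate:.2f}%")

    return forgetting_rate

# Requires a trained `model` (e.g., the spaced-learning model from Experiment 1)
print("\nMeasuring Forgetting Rate for Spaced Learning...")
forgetting_spaced = measure_forgetting_rate(model, test_loader, delay=10)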
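
Since a fixed network does not change while the process sleeps, one way to obtain a measurable forgetting rate is to train on interfering items between the two evaluations. The following is a minimal sketch of that idea on synthetic data, not the notebook's method: the toy sets A and B, the linear model, and the helpers `accuracy`, `fit`, and `forgetting_rate` are all hypothetical. Only the forgetting-rate formula is taken from `measure_forgetting_rate`.

```python
import torch
import torch.nn as nn


def accuracy(model, x, y):
    """Percentage of samples in (x, y) that the model classifies correctly."""
    model.eval()
    with torch.no_grad():
        return 100.0 * (model(x).argmax(dim=1) == y).float().mean().item()


def fit(model, x, y, steps=200, lr=0.1):
    """Full-batch gradient descent on (x, y)."""
    model.train()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()


def forgetting_rate(initial_acc, final_acc):
    """Relative drop in accuracy, in percent."""
    return (initial_acc - final_acc) / initial_acc * 100 if initial_acc > 0 else 0.0


torch.manual_seed(0)
# Hypothetical toy data: set A is "previously learned", set B follows a conflicting rule
x_a = torch.randn(200, 8)
y_a = (x_a[:, 0] > 0).long()
x_b = torch.randn(200, 8)
y_b = (x_b[:, 0] < 0).long()

toy_model = nn.Linear(8, 2)
fit(toy_model, x_a, y_a)                   # learn set A
acc_before = accuracy(toy_model, x_a, y_a)
fit(toy_model, x_b, y_b)                   # interfering training replaces the time delay
acc_after = accuracy(toy_model, x_a, y_a)  # retest on set A
print(f"Accuracy on A before: {acc_before:.2f}%, after: {acc_after:.2f}%")
print(f"Forgetting Rate: {forgetting_rate(acc_before, acc_after):.2f}%")
```

In the CIFAR-10 setting, the same pattern would evaluate on a held-out set of previously trained items, continue training on different items, and then re-evaluate; how to split the items is a design decision this sketch leaves open.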

<div style="background-color: #007BFF; height: 4px; width: 100%;"></div>

## **Experimental Results**

_______

### **Experiment 1: Spaced vs. Massed Learning (ISI)**

**Objective**: Test whether spaced learning (inserting delays between learning and retrieval) enhances memory retention in CNNs compared to massed learning (immediate retrieval).

**Training Protocol**

1. Massed Learning: Train CNN on a batch of images and immediately test on the same batch.
2. Spaced Learning: Train CNN on multiple image sets (thereby introducing a delay) before testing each set in order.

**Benchmark(s)**

1. Compare against standard CNN training accuracy.
2. Measure memory degradation over multiple test intervals.

**Interpretation**

- Spaced retrieval outperforms massed retrieval over time → supports effortful retrieval hypothesis.
- There is no significant difference → spaced learning may not benefit CNNs as it does in human learning.



In [None]:
# Train under massed retrieval condition (easy)
model, criterion, optimizer = get_pretrained_cnn()
print("Training with Massed Learning...")
history_massed = train_model(model, train_loader, criterion, optimizer, epochs=10, retrieval_condition="massed")

# Evaluate
print("\nEvaluating Massed Learning Model...")
evaluate_model(model, test_loader)

# Train under spaced retrieval condition (harder)
# Start from a fresh pretrained model so the spaced run does not inherit the massed training
model, criterion, optimizer = get_pretrained_cnn()
print("\nTraining with Spaced Learning...")
history_spaced = train_model(model, train_loader, criterion, optimizer, epochs=10, retrieval_condition="spaced")

# Evaluate
print("\nEvaluating Spaced Learning Model...")
evaluate_model(model, test_loader)


_______

### **Experiment 2: Criterion Level (Number of Retrievals Before Dropping)**

**Objective**: Examine whether requiring multiple successful retrievals before dropping an item from training strengthens CNN memory retention.

**Training Protocol**
1. Image must be classified correctly 1 time before dropping.
2. Image must be classified correctly 3 times before dropping.
3. Image must be classified correctly 5 times before dropping.

**Benchmark(s)**: Compare against standard single-pass training.

**Interpretation**
- Higher retrieval criteria improve retention → CNNs exhibit similar memory effects as humans.
- No significant improvement → retrieval-based learning might not transfer well to CNNs.


In [6]:
# TODO
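
One possible building block for this experiment is a small bookkeeping class that counts correct classifications per training item and reports which items are still below the criterion. This is a sketch under assumptions, not the planned implementation: `CriterionTracker` is a hypothetical name, and it counts cumulative (not consecutive) correct classifications, which is only one reading of the protocol above.

```python
class CriterionTracker:
    """Counts correct classifications per item and drops items that reach the criterion."""

    def __init__(self, num_items, criterion_level):
        self.criterion_level = criterion_level
        self.correct_counts = [0] * num_items

    def update(self, indices, was_correct):
        """Record one retrieval attempt for each item index in the batch."""
        for idx, ok in zip(indices, was_correct):
            if ok:
                self.correct_counts[idx] += 1

    def active_indices(self):
        """Items that have not yet been classified correctly `criterion_level` times."""
        return [i for i, count in enumerate(self.correct_counts)
                if count < self.criterion_level]


# Toy run with 4 items and a criterion of 3 correct classifications
tracker = CriterionTracker(num_items=4, criterion_level=3)
tracker.update([0, 1, 2, 3], [True, True, False, True])
tracker.update([0, 1, 2, 3], [True, False, False, True])
tracker.update([0, 1, 2, 3], [True, True, False, False])
remaining = tracker.active_indices()
print(remaining)  # only item 0 reached the criterion
```

To use this with the training loop, the dataset would need to return each sample's index alongside the image and label, and the list from `active_indices()` could be passed to `torch.utils.data.Subset` to rebuild the loader after each epoch. Both steps are assumptions about how the experiment might be wired up.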

_______

### **Experiment 3: Noise Levels in Images**

**Objective**: Investigate how introducing noise into images affects retrieval difficulty and model retention over time.

**Training Protocol**
1. Train CNN on clean images
2. Test under different noise conditions:
    a. Low noise (Gaussian noise, σ=0.1)
    b. Medium noise (σ=0.3)
    c. High noise (σ=0.5)

*Note*: To test under different noise conditions, we will modify the image dataset using `torchvision.transforms`. σ parameters are subject to change.

**Benchmark(s)**
1. Compare against standard CNN performance on clean images.
2. Track degradation trends as noise increases.

**Interpretation**
- Retrieval performance drops significantly with noise → retrieval difficulty negatively affects CNN retention.
- Model adapts well to noise → CNNs may be robust to effortful retrieval.

In [None]:
# TODO
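
The test-time noise conditions could be sketched as a helper applied to each batch before it is passed to the model. This is a hedged illustration rather than the planned `torchvision.transforms` pipeline: `add_gaussian_noise` is a hypothetical function, and the clamp range of [-1, 1] assumes inputs normalized with mean 0.5 and standard deviation 0.5, as in Dataset Preparation.

```python
import torch


def add_gaussian_noise(images, sigma, low=-1.0, high=1.0):
    """Return a noisy copy of `images`, clamped to the valid input range [low, high]."""
    if sigma <= 0:
        return images.clone()
    noise = torch.randn_like(images) * sigma  # zero-mean Gaussian noise with std sigma
    return torch.clamp(images + noise, low, high)


torch.manual_seed(0)
# Stand-in for one normalized CIFAR-10 batch: 4 RGB images of 32x32 in [-1, 1]
batch = torch.rand(4, 3, 32, 32) * 2 - 1

# The sigma values follow the low / medium / high conditions listed above
for sigma in (0.1, 0.3, 0.5):
    noisy = add_gaussian_noise(batch, sigma)
    change = (noisy - batch).abs().mean().item()
    print(f"sigma={sigma}: mean absolute pixel change = {change:.3f}")
```

Inside an evaluation loop, the call would sit between moving the batch to the device and the forward pass, so the same trained model can be scored once per noise level.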

_______

### **Experiment 4: Occluding Parts of Images**

**Objective**: Test whether CNNs can learn to retrieve images even when parts of the input are missing, mimicking retrieval with incomplete cues in human memory.

**Training Protocol**
1. Train on full images
2. Test under occlusion conditions:
    a. 25% occlusion (randomly block part of the image)
    b. 50% occlusion
    c. 75% occlusion

**Benchmark(s)**: Compare against standard CNN performance on full images.

**Interpretation**
- CNNs struggle with occlusion → retrieval difficulty negatively affects performance.
- CNNs maintain accuracy → effortful retrieval mechanisms might apply.

In [None]:
# TODO
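
A test-time occlusion helper might look like the following sketch. It is an assumption-laden illustration: `occlude` is a hypothetical function, it reads the percentages above as fractions of image area (so the square patch has side length proportional to the square root of the fraction), and it fills the patch with 0, which corresponds to mid-gray for inputs normalized to [-1, 1].

```python
import math

import torch


def occlude(images, area_fraction, fill=0.0):
    """Return a copy of `images` (N, C, H, W) with one random patch per image set to `fill`.

    The patch covers approximately `area_fraction` of each image.
    """
    occluded = images.clone()
    n, _, height, width = images.shape
    # Side lengths chosen so that patch area is close to area_fraction * image area
    patch_h = min(height, round(height * math.sqrt(area_fraction)))
    patch_w = min(width, round(width * math.sqrt(area_fraction)))
    for i in range(n):
        # Random top-left corner such that the patch stays inside the image
        top = torch.randint(0, height - patch_h + 1, (1,)).item()
        left = torch.randint(0, width - patch_w + 1, (1,)).item()
        occluded[i, :, top:top + patch_h, left:left + patch_w] = fill
    return occluded


torch.manual_seed(0)
# Stand-in batch of all-ones images so occluded pixels are easy to count
ones_batch = torch.ones(4, 3, 32, 32)

for fraction in (0.25, 0.50, 0.75):
    result = occlude(ones_batch, fraction)
    covered = (result == 0).float().mean().item()
    print(f"requested {fraction:.2f}, actually occluded {covered:.3f}")
```

Because patch sides are rounded to whole pixels, the occluded area only approximates the requested fraction for 50% and 75%. Drawing a separate patch position per image avoids every image in a batch losing the same region.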

<div style="background-color: #007BFF; height: 4px; width: 100%;"></div>