# Neural Plasticity Demo: Dynamic Pruning & Regrowth (v0.0.63 2025-04-20 20:30:00)

This notebook demonstrates Sentinel AI's neural plasticity system, which allows transformer models to dynamically prune and regrow attention heads during training based on utility metrics. [ID: 2a9d6687]

### Changes in v0.0.63:
- Implemented fully modular architecture via NeuralPlasticityExperiment class
- Simplified workflow with high-level experiment API
- Added one-shot experiment functionality via run_full_experiment()
- Enhanced Apple Silicon compatibility with improved tensor handling
- Added cross-platform visualization with device-aware tensor conversion
- Added workarounds for PyTorch/BLAS crashes on M1/M2/M3 chips
- Improved environment detection for Colab/local execution

## What is Neural Plasticity?

Neural plasticity is the ability of neural networks to adapt their structure over time through pruning (removing unused connections) and regrowth (restoring useful connections). This mimics how biological brains form efficient neural pathways.

In this demo, we:
1. Track the entropy and gradient patterns of each attention head
2. Dynamically prune high-entropy, low-gradient heads (unfocused, less useful)
3. Selectively revive low-entropy, higher-gradient heads (potentially useful)
4. Visualize the "brain dynamics" over time

This allows models to form more efficient neural structures during training.
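Steps 2 and 3 amount to a two-signal decision rule. The sketch below is a minimal illustration, assuming per-head entropy and gradient-norm scores are already available as dicts keyed by `(layer, head)`. The function name `plasticity_decisions` and the use of the median as the threshold for both signals are hypothetical choices; Sentinel AI's actual thresholds and scoring may differ.

```python
from statistics import median


def plasticity_decisions(entropy, grad_norm, active):
    """Split heads into prune and revive candidates (hypothetical helper).

    entropy, grad_norm: dicts mapping (layer, head) -> float
    active: set of (layer, head) pairs that are currently unpruned
    """
    # Assumption: median thresholds; the real system may use other cut-offs.
    ent_thr = median(entropy.values())
    grad_thr = median(grad_norm.values())

    prune, revive = [], []
    for head in entropy:
        unfocused = entropy[head] > ent_thr
        learning = grad_norm[head] > grad_thr
        if head in active and unfocused and not learning:
            prune.append(head)   # high entropy, low gradient
        elif head not in active and not unfocused and learning:
            revive.append(head)  # low entropy, higher gradient
    return sorted(prune), sorted(revive)


# Toy scores for a 2-layer, 2-head model (illustrative, not measured)
entropy = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 1.8, (1, 1): 0.4}
grad_norm = {(0, 0): 0.1, (0, 1): 0.9, (1, 0): 0.2, (1, 1): 0.8}
active = {(0, 0), (0, 1), (1, 0)}  # head (1, 1) is currently pruned
prune, revive = plasticity_decisions(entropy, grad_norm, active)
print(prune)   # [(0, 0), (1, 0)]
print(revive)  # [(1, 1)]
```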

## Environment Compatibility

This notebook automatically detects your execution environment and applies the appropriate optimizations:

- **Colab:** Uses GPU acceleration when available for maximum performance
- **Apple Silicon:** Applies safeguards against BLAS/libtorch crashes that commonly occur on M1/M2/M3 Macs
- **Standard Hardware:** Operates normally with GPU acceleration when available

No manual configuration is required: run the cells and the notebook adapts to your environment.
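The kind of checks behind this detection can be sketched with the standard library alone. `detect_environment` is a hypothetical helper, not Sentinel AI's code (the notebook queries `NeuralPlasticity.get_environment_info()` further down); it relies on the common convention that Colab preloads the `google.colab` module and that native Apple Silicon Python reports `Darwin`/`arm64`.

```python
import platform
import sys


def detect_environment():
    """Return a coarse description of where the notebook is running.

    Hypothetical helper: illustrates the kind of checks involved,
    not how Sentinel AI implements them.
    """
    # Colab imports google.colab into the kernel at startup.
    in_colab = "google.colab" in sys.modules
    # Native arm64 Python on an M-series Mac reports Darwin + arm64.
    # An x86_64 interpreter running under Rosetta reports x86_64 instead.
    apple_silicon = platform.system() == "Darwin" and platform.machine() == "arm64"
    return {
        "in_colab": in_colab,
        "apple_silicon": apple_silicon,
        "platform": platform.platform(),
    }


print(detect_environment())
```

GPU availability would additionally come from PyTorch, for example `torch.cuda.is_available()` for CUDA and `torch.backends.mps.is_available()` for Apple's Metal backend.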

In [None]:
# Install system dependencies (apt-based systems such as Colab; skip this cell on macOS)
!apt-get update -qq > /dev/null
!apt-get install -qq libopenblas-dev > /dev/null  # For better performance

In [None]:
# Install required packages
!pip install -q torch transformers datasets matplotlib seaborn

# Clone the Sentinel AI repository
!git clone -b feature/implement-adaptive-plasticity https://github.com/CambrianTech/sentinel-ai.git
%cd sentinel-ai

# Add repository to path
import sys
sys.path.append('.')

# Configure the Experiment

Let's set up our configuration for the neural plasticity experiment using the new modular API.

In [None]:
# Import the needed modules
%matplotlib inline
import os
import torch
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime

# Import the Neural Plasticity Experiment class
from utils.neural_plasticity.experiment import NeuralPlasticityExperiment

# Import neural plasticity utilities
from utils.neural_plasticity import PruningStrategy, PruningMode

# Configuration for the experiment
MODEL_NAME = "distilgpt2"  # Small GPT-2 model for faster demonstration
DATASET = "wikitext"
DATASET_CONFIG = "wikitext-2-raw-v1"
MAX_LENGTH = 128
BATCH_SIZE = 4
LEARNING_RATE = 5e-5
PRUNING_LEVEL = 0.1      # Target to prune approximately 10% of heads in each step
PRUNING_STRATEGY = PruningStrategy.COMBINED  # Use both entropy and gradient information

# Set to True for longer runs (more pruning cycles and more training steps per cycle)
ENABLE_LONG_TRAINING = False  # Keep False for the demo to limit memory use and runtime

# If ENABLE_LONG_TRAINING is False, use these reduced settings
if not ENABLE_LONG_TRAINING:
    NUM_WARMUP_EPOCHS = 1        # Limit warmup epochs
    NUM_PRUNING_CYCLES = 3       # Run 3 pruning cycles
    TRAINING_STEPS_PER_CYCLE = 100  # Limit steps per cycle for demo
else:
    NUM_WARMUP_EPOCHS = 1        # Run a full epoch of warmup
    NUM_PRUNING_CYCLES = 5       # Run 5 pruning cycles
    TRAINING_STEPS_PER_CYCLE = 500  # More training steps per cycle

# Create output directory with timestamp
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
OUTPUT_DIR = f"neural_plasticity_output/run_{timestamp}"
os.makedirs(OUTPUT_DIR, exist_ok=True)

# Define unique ID for cache busting
unique_id = "2a9d6687"
print(f"Running neural plasticity experiment with modular API [ID: {unique_id}]")
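To see what `PRUNING_LEVEL = 0.1` means for `distilgpt2`, which has 6 layers of 12 attention heads (72 heads in total), here is the head budget worked out in code. This assumes the level is applied as a fraction of all heads, rounded down, and that no heads are revived between cycles; the experiment class may instead use the remaining active heads or a different rounding rule. Both function names are hypothetical.

```python
import math

# distilgpt2 has 6 transformer layers with 12 attention heads each.
NUM_LAYERS, NUM_HEADS = 6, 12
TOTAL_HEADS = NUM_LAYERS * NUM_HEADS  # 72


def heads_per_step(pruning_level, total_heads=TOTAL_HEADS):
    # Assumption: the level is a fraction of *all* heads, rounded down.
    return math.floor(pruning_level * total_heads)


def max_pruned_after(cycles, pruning_level, total_heads=TOTAL_HEADS):
    # Upper bound on pruned heads if nothing is ever revived.
    return min(total_heads, cycles * heads_per_step(pruning_level, total_heads))


print(heads_per_step(0.1))       # 7 heads per cycle
print(max_pruned_after(3, 0.1))  # at most 21 of 72 heads after 3 cycles
```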

# Initialize the NeuralPlasticityExperiment

The `NeuralPlasticityExperiment` class handles all the setup, data loading, and configuration automatically.

In [None]:
# Initialize the experiment
experiment = NeuralPlasticityExperiment(
    model_name=MODEL_NAME,
    dataset=DATASET,
    dataset_config=DATASET_CONFIG,
    output_dir=OUTPUT_DIR,
    batch_size=BATCH_SIZE,
    max_length=MAX_LENGTH,
    pruning_level=PRUNING_LEVEL,
    pruning_strategy=PRUNING_STRATEGY,
    learning_rate=LEARNING_RATE,
    verbose=True,  # Print detailed information
    save_results=True  # Save results to disk
)

# Let's check the environment information reported by NeuralPlasticity API
from utils.neural_plasticity import NeuralPlasticity

env_info = NeuralPlasticity.get_environment_info()
print(f"\nEnvironment information:")
for key, value in env_info.items():
    print(f"  {key}: {value}")

# Set Up the Experiment

First, we need to set up the experiment by loading the model, tokenizer, and datasets.

In [None]:
# Set up the experiment (load model, tokenizer, and datasets)
experiment.setup()

# Verify we have access to the model and dataloaders
print(f"\nModel: {experiment.model_name}")
print(f"Device: {experiment.device}")

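# Compare with None: truth-testing a DataLoader calls len(), which can fail for iterable datasets
if experiment.train_dataloader is not None and experiment.validation_dataloader is not None: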
    print(f"Training examples: {len(experiment.train_dataloader.dataset)}")
    print(f"Validation examples: {len(experiment.validation_dataloader.dataset)}")

# Run Model Warm-up

Before measuring baseline performance and applying neural plasticity, we'll run a brief warm-up phase to get initial attention patterns and stabilize metrics.

In [None]:
# Run warmup training until loss stabilizes
warmup_results = experiment.run_warmup(
    max_epochs=NUM_WARMUP_EPOCHS,  # Maximum number of warmup epochs
    patience=15,  # Number of steps with no decrease to consider stabilized
    min_steps=50,  # Minimum number of warm-up steps
    max_steps=150  # Maximum number of warm-up steps per epoch
)

# Plot the warmup losses
plt.figure(figsize=(10, 5))
plt.plot(warmup_results["losses"], label="Loss")
if len(warmup_results["smoothed_losses"]) == len(warmup_results["losses"]):
    plt.plot(warmup_results["smoothed_losses"], label="Smoothed Loss", linestyle="--")
plt.title("Warmup Training Loss")
plt.xlabel("Step")
plt.ylabel("Loss")
plt.legend()
plt.grid(alpha=0.3)
plt.show()

# Display baseline evaluation metrics
print(f"\nBaseline evaluation after warm-up:")
print(f"  Loss: {experiment.baseline_loss:.4f}")
print(f"  Perplexity: {experiment.baseline_perplexity:.2f}")
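`run_warmup` trains until the loss stabilizes, controlled by `patience`, `min_steps`, and `max_steps`. A minimal patience-based stopping rule in that spirit is sketched below; the function name `warmup_stop_step`, the 5-step moving-average window, and the exact comparison are assumptions, not the library's implementation.

```python
def warmup_stop_step(losses, patience=15, min_steps=50, max_steps=150, window=5):
    """Return the step count at which a patience-based warmup would stop.

    Hypothetical sketch of the idea behind run_warmup's patience,
    min_steps and max_steps arguments; the real method may smooth or
    compare losses differently.
    """
    best = float("inf")
    steps_without_improvement = 0
    for step in range(min(len(losses), max_steps)):
        # Trailing moving average to damp per-batch noise.
        start = max(0, step - window + 1)
        smoothed = sum(losses[start:step + 1]) / (step + 1 - start)
        if smoothed < best:
            best = smoothed
            steps_without_improvement = 0
        else:
            steps_without_improvement += 1
        # Never stop before min_steps, even if the loss has plateaued.
        if step + 1 >= min_steps and steps_without_improvement >= patience:
            return step + 1
    return min(len(losses), max_steps)


# Synthetic curve: falls for 60 steps, then stays flat.
losses = [5.0 - 0.05 * i for i in range(60)] + [2.0] * 100
print(warmup_stop_step(losses))  # 80
```

On this synthetic curve the smoothed loss bottoms out at step 65 and then fails to improve for 15 consecutive steps, so the rule stops at 80, well before `max_steps`.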

# Analyze Attention Patterns

Now let's analyze the model's attention patterns to compute the entropy and gradient norm of each attention head.

In [None]:
# Analyze attention patterns
attention_analysis = experiment.analyze_attention()

# Get the model structure
num_layers, num_heads = attention_analysis["model_structure"]
print(f"Model has {num_layers} layers with {num_heads} heads each")
print(f"Total number of attention heads: {num_layers * num_heads}")

# The entropy and gradient values are stored in the experiment object
entropy_values = experiment.entropy_values
grad_norm_values = experiment.grad_norm_values

# Visualize the entropy and gradient heat maps side by side using subplots
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Plot entropy values
im1 = ax1.imshow(entropy_values.detach().cpu().numpy(), cmap="viridis", aspect="auto")
fig.colorbar(im1, ax=ax1, label="Entropy")
ax1.set_title("Attention Head Entropy (Higher = Less Focused)")
ax1.set_xlabel("Head Index")
ax1.set_ylabel("Layer Index")

# Plot gradient values
im2 = ax2.imshow(grad_norm_values.detach().cpu().numpy(), cmap="plasma", aspect="auto")
fig.colorbar(im2, ax=ax2, label="Gradient Norm")
ax2.set_title("Attention Head Gradient Norms (Higher = More Learning)")
ax2.set_xlabel("Head Index")
ax2.set_ylabel("Layer Index")

plt.tight_layout()
plt.show()
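The entropy heat map reports one number per head. A common way to get it, and the assumption made here, is to take the Shannon entropy of each query's attention distribution and average over query positions; the real analysis may also average over batches or mask padding tokens. The `eps` clamp is one simple way to keep `log` away from zero, in the spirit of the NaN/Inf safeguards mentioned in the release notes. `head_entropy` is a hypothetical name.

```python
import math


def head_entropy(attn, eps=1e-12):
    """Mean Shannon entropy (in nats) of one head's attention rows.

    attn: list of rows; attn[i][j] is how much query i attends to key j,
    and each row sums to 1 (a softmax output).
    """
    total = 0.0
    for row in attn:
        # Clamp tiny probabilities so log() never sees 0 (avoids NaN/-inf).
        total -= sum(p * math.log(max(p, eps)) for p in row)
    return total / len(attn)


uniform = [[0.25] * 4]          # attends equally to 4 keys
peaked = [[1.0, 0.0, 0.0, 0.0]]  # attends to a single key
print(head_entropy(uniform))  # about 1.386 (= ln 4): unfocused
print(head_entropy(peaked))   # 0.0: fully focused
```

Uniform attention over `n` keys gives the maximum entropy `ln n`, which is why the heat map labels higher values as less focused.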

# Run Pruning Cycles

Now we'll run multiple pruning cycles. Each cycle consists of:
1. Analyzing attention head importance
2. Pruning the least important heads
3. Fine-tuning the model to recover performance
4. Evaluating the resulting model
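With `PruningStrategy.COMBINED`, both signals feed a single ranking. The sketch below assumes one plausible scoring rule (min-max normalized entropy minus min-max normalized gradient norm) and a budget of `int(pruning_level * total_heads)` heads per cycle. The actual formula used by Sentinel AI is not shown in this notebook, and `combined_prune_candidates` is a hypothetical name.

```python
def combined_prune_candidates(entropy, grad_norm, active, pruning_level):
    """Pick active heads to prune using a combined entropy/gradient score.

    entropy, grad_norm: dicts mapping (layer, head) -> float.
    Assumption: score = normalized entropy minus normalized gradient norm,
    so unfocused heads that barely learn rank first. This is one plausible
    reading of PruningStrategy.COMBINED, not its actual formula.
    """
    def normalize(values):
        lo, hi = min(values.values()), max(values.values())
        span = (hi - lo) or 1.0  # avoid division by zero when all equal
        return {k: (v - lo) / span for k, v in values.items()}

    ent, grad = normalize(entropy), normalize(grad_norm)
    scores = {h: ent[h] - grad[h] for h in active}
    k = int(pruning_level * len(entropy))  # budget is a fraction of all heads
    # Highest score first; ties broken by (layer, head) for determinism.
    ranked = sorted(scores, key=lambda h: (-scores[h], h))
    return ranked[:k]


entropy = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 1.8, (1, 1): 0.4}
grad_norm = {(0, 0): 0.1, (0, 1): 0.9, (1, 0): 0.2, (1, 1): 0.8}
print(combined_prune_candidates(entropy, grad_norm, set(entropy), 0.5))
# [(0, 0), (1, 0)]
```

Min-max normalization puts entropy (in nats) and gradient norms (unbounded) on the same 0 to 1 scale before subtracting them, so neither signal dominates just because of its units.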

In [None]:
# Run multiple pruning cycles
for cycle in range(NUM_PRUNING_CYCLES):
    print(f"\n=== Pruning Cycle {cycle+1}/{NUM_PRUNING_CYCLES} ===")
    
    # Run a single pruning cycle
    pruning_results = experiment.run_pruning_cycle(
        training_steps=TRAINING_STEPS_PER_CYCLE
    )
    
    # Print cycle metrics
    print(f"Cycle {cycle+1} results:")
    print(f"  Pruned heads: {len(pruning_results['pruned_heads'])}")
    print(f"  Loss before pruning: {pruning_results['baseline_metrics']['loss']:.4f}")
    print(f"  Loss after pruning: {pruning_results['pruned_metrics']['loss']:.4f}")
    print(f"  Loss after training: {pruning_results['final_metrics']['loss']:.4f}")
    print(f"  Perplexity before: {pruning_results['baseline_metrics']['perplexity']:.2f}")
    print(f"  Perplexity after: {pruning_results['final_metrics']['perplexity']:.2f}")
    
    # Create a visualization dashboard after every cycle except the first
    if cycle > 0:
        dashboard_path = os.path.join(OUTPUT_DIR, f"dashboard_cycle{cycle+1}.png")
        dashboard_fig = experiment.visualize_metrics_dashboard(save_path=dashboard_path)
        plt.figure(dashboard_fig.number)
        plt.show()

# Evaluate the Final Model

Let's evaluate our final model after all pruning cycles.

In [None]:
# Run final evaluation
eval_metrics = experiment.evaluate()

print(f"=== Final Evaluation ===")
print(f"Baseline Perplexity: {experiment.baseline_perplexity:.2f}")
print(f"Final Perplexity: {experiment.final_perplexity:.2f}")
print(f"Improvement: {eval_metrics['improvement_percent']:.2f}%")
print(f"\nPruned {len(experiment.pruned_heads)} of {num_layers * num_heads} heads")
print(f"Head Sparsity: {len(experiment.pruned_heads) / (num_layers * num_heads) * 100:.1f}%")

# Create and show a final metrics dashboard
dashboard_path = os.path.join(OUTPUT_DIR, "final_metrics_dashboard.png")
dashboard_fig = experiment.visualize_metrics_dashboard(save_path=dashboard_path)
plt.figure(dashboard_fig.number)
plt.show()
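The summary lines above rest on simple formulas. Perplexity is the exponential of the mean per-token cross-entropy loss (in nats). How the experiment computes `improvement_percent` is an assumption here: the relative perplexity reduction, positive when the pruned model does better. The sparsity figure counts pruned attention heads, not zeroed weights. All three function names are hypothetical, and the inputs below are toy values, not results from this notebook.

```python
import math


def perplexity(mean_loss):
    # Perplexity is the exponential of the mean per-token cross-entropy (nats).
    return math.exp(mean_loss)


def improvement_percent(baseline_ppl, final_ppl):
    # Assumption: positive means the final model has *lower* perplexity.
    return (baseline_ppl - final_ppl) / baseline_ppl * 100


def head_sparsity_percent(num_pruned, num_layers, num_heads):
    # Fraction of attention heads pruned, not fraction of zero weights.
    return num_pruned / (num_layers * num_heads) * 100


print(perplexity(0.0))                   # 1.0
print(improvement_percent(50.0, 40.0))   # 20.0
print(head_sparsity_percent(18, 6, 12))  # 25.0
```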

# Generate Text with the Pruned Model

Let's generate some text with our pruned model to see the results.

In [None]:
# Generate text with various prompts
prompts = {
    "story": "Once upon a time",
    "ai": "The future of artificial intelligence",
    "space": "In a distant galaxy",
    "science": "Scientists recently discovered"
}

generated_texts = experiment.generate_examples(prompts=prompts, max_length=100)

# Print the generated text for each prompt
for prompt_name, text in generated_texts.items():
    print(f"\n=== {prompt_name.upper()} ===")
    print(f"Prompt: {prompts[prompt_name]}")
    print(f"Generated:\n{text}")

# Save the Pruned Model

Let's save our pruned model for future use.

In [None]:
# Save the pruned model
save_paths = experiment.save_model()

if save_paths:
    print(f"Model saved to {save_paths['model_dir']}")
    print(f"\nThe following files were saved:")
    for key, path in save_paths.items():
        print(f"  {key}: {path}")

# Simplified One-Shot Experiment

The `NeuralPlasticityExperiment` class also provides a convenient `run_full_experiment()` method to run everything in one go. This is useful for quick experiments or when you don't need fine-grained control over each step.

In [None]:
# Create a new experiment with a different output directory
one_shot_output_dir = os.path.join(OUTPUT_DIR, "one_shot")
one_shot_experiment = NeuralPlasticityExperiment(
    model_name=MODEL_NAME,
    dataset=DATASET,
    dataset_config=DATASET_CONFIG,
    output_dir=one_shot_output_dir,
    batch_size=BATCH_SIZE,
    max_length=MAX_LENGTH,
    pruning_level=PRUNING_LEVEL,
    pruning_strategy=PRUNING_STRATEGY,
    learning_rate=LEARNING_RATE,
    verbose=True,
    save_results=True
)

# Run the full experiment
print("\n=== Running One-Shot Experiment ===\n")
results = one_shot_experiment.run_full_experiment(
    warmup_epochs=1,
    pruning_cycles=2,  # Using fewer cycles for demo purposes
    training_steps=50  # Fewer steps per cycle for brevity
)

# Print the results
print(f"\n=== One-Shot Experiment Results ===")
print(f"Baseline Perplexity: {results['baseline_metrics']['perplexity']:.2f}")
print(f"Final Perplexity: {results['final_metrics']['perplexity']:.2f}")
print(f"Improvement: {results['improvement_percent']:.2f}%")
print(f"Pruned Heads: {len(results['pruned_heads'])}")
print(f"Execution Time: {results['execution_time'] / 60:.1f} minutes")

# We can also easily generate text with the one-shot experiment model
one_shot_texts = one_shot_experiment.generate_examples(
    prompts={"story": "Once upon a time"},
    max_length=50
)

print(f"\nGenerated text from one-shot experiment:\n{one_shot_texts['story']}")

# Conclusion

In this notebook, we demonstrated Sentinel AI's neural plasticity system, which enables transformer models to dynamically prune and revive attention heads during training based on their utility.

Key findings:
1. The plasticity system successfully pruned high-entropy, low-gradient heads
2. Some heads were revived when they showed potential for useful learning
3. The final model achieved comparable quality with fewer active heads
4. The metrics dashboards show how the model evolves across pruning cycles

## Benefits of the Modular Architecture

The modular architecture in v0.0.63 provides several advantages:

1. **Experiment Class API**: The `NeuralPlasticityExperiment` class provides a clean, high-level interface for running experiments
2. **Cross-Platform Compatibility**: The same code works reliably across standard CPUs, GPUs, and Apple Silicon
3. **Simplified Workflow**: The step-by-step methods make the process clear and easy to understand
4. **One-Shot Execution**: The `run_full_experiment()` method enables quick experimentation with minimal code
5. **Robust Tensor Handling**: Device-aware tensor conversion keeps visualization working across CPU, GPU, and Apple Silicon devices
6. **Improved Numerical Stability**: Enhanced entropy calculations prevent NaN/Inf values
7. **Performance Optimizations**: Environment-specific optimizations for maximum efficiency

This approach mimics biological neural plasticity, where brains form efficient neural pathways by pruning unused connections and strengthening useful ones.