<a href="https://colab.research.google.com/github/CogNetSys/MetaStrata/blob/main/Meta_Adaptive_Neural_Engine_06.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# %% [markdown]
"""
# MANE (Meta-Adaptive Neural Engine)
### A Self-Evolving, Model-Agnostic Platform for AGI
MANE is designed to continuously evolve its architecture and learning strategy through adaptive clustering, evolutionary optimization, and maximum dissimilarity learning. It leverages self-organizing clustering mechanisms, hyperbolic-inspired distance computations, and a reinforcement learning-based hyperparameter feedback loop. This system is built to be model-agnostic, enabling it to modify its own structure dynamically and to eventually decide the optimal learning framework for its tasks.
"""

# %% [markdown]
"""
## Cell 1: Install Dependencies
Install the required packages: PyTorch and torchvision, scikit-learn, TensorBoard, FAISS (CPU build) for similarity search and clustering, and scikit-optimize for Bayesian hyperparameter search. Colab already provides pandas, seaborn, matplotlib and tqdm.
"""

# %%
!pip install torch torchvision scikit-learn tensorboard faiss-cpu scikit-optimize

# %% [markdown]
"""
## Cell 2: Imports & Logging Configuration
Import libraries, configure console logging, and create a TensorBoard `SummaryWriter` (logging to `runs/Combined_Experiment`). `tqdm` provides progress bars, and `BayesSearchCV` from scikit-optimize is imported for Bayesian hyperparameter search.
"""
# %%
#######################
# Imports & Logging   #
#######################
import os
import time
import logging
import json
import datetime
import numpy as np
import matplotlib.pyplot as plt
from collections import deque

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.tensorboard import SummaryWriter

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split  # Make sure this is imported!

# For Bayesian optimization and progress indication
from sklearn.base import BaseEstimator
from skopt import BayesSearchCV
from skopt.space import Real
from tqdm import tqdm

logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
writer = SummaryWriter('runs/Combined_Experiment')


# %% [markdown]
"""
# Cell 3: Data Preprocessing & Augmentation (Phase 2.5 - Dataset Diversification)
This cell loads one of several datasets. You can choose from:
  - Standard datasets: "Iris", "Wine", "Titanic", "Digits", "Fashion-MNIST"
  - Multimodal datasets: "MELD", "IEMOCAP", "LUMA", "M3AV"
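For multimodal datasets, we assume a CSV file with at least `text` and `label` columns. Only the text is real data (TF-IDF features); the other modalities (audio, video, image, speech) are simulated with seeded Gaussian noise.
For standard datasets, we use scikit-learn, seaborn (Titanic), or torchvision (Fashion-MNIST). Every dataset also receives a few simulated Gaussian "extra" feature columns.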
"""
# %%
import torch
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import numpy as np
import os
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Set your desired dataset name here:
# Options for standard datasets: "Iris", "Wine", "Titanic", "Digits", "Fashion-MNIST"
# Options for multimodal datasets: "MELD", "IEMOCAP", "LUMA", "M3AV"
dataset_name = "Iris"  # <-- change this value to test different datasets

multimodal_datasets = {"MELD", "IEMOCAP", "LUMA", "M3AV"}
enable_multimodal = dataset_name in multimodal_datasets

# CSV file and widths of the simulated (Gaussian) non-text modalities for each multimodal
# dataset. Only the 'text' column is real; the other modalities are random placeholders.
multimodal_specs = {
    "MELD": ("meld_data.csv", [3, 5]),        # audio, video
    "IEMOCAP": ("iemocap_data.csv", [3, 5]),  # audio, video
    "LUMA": ("luma_data.csv", [4, 3]),        # image, audio
    "M3AV": ("m3av_data.csv", [3, 5]),        # speech, visual
}

def append_simulated_features(X, widths):
    """Append one block of seeded standard-normal columns per entry in `widths`."""
    np.random.seed(42)
    extras = [np.random.normal(0, 1, size=(X.shape[0], w)) for w in widths]
    return np.concatenate([X, *extras], axis=1)

def load_iris_with_extras():
    """Standardized Iris features plus 3 simulated columns (also the fallback dataset)."""
    from sklearn import datasets
    iris = datasets.load_iris()
    X = StandardScaler().fit_transform(iris.data)
    print("Using Iris dataset with simulated extra features.")
    return append_simulated_features(X, [3]), iris.target

if dataset_name in multimodal_datasets:
    csv_path, simulated_widths = multimodal_specs[dataset_name]
    try:
        import pandas as pd
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.preprocessing import LabelEncoder
        df = pd.read_csv(csv_path)
        text_features = TfidfVectorizer(max_features=10).fit_transform(df['text']).toarray()
        X = append_simulated_features(text_features, simulated_widths)
        # Encode labels (which may be strings such as emotion names) as integers 0..K-1.
        y = LabelEncoder().fit_transform(df['label'])
        print(f"Loaded {dataset_name} dataset.")
    except Exception as e:
        print(f"Could not load {dataset_name} dataset ({e}); falling back to Iris.")
        dataset_name, enable_multimodal = "Iris", False
        X, y = load_iris_with_extras()

elif dataset_name == "Iris":
    X, y = load_iris_with_extras()

elif dataset_name == "Wine":
    from sklearn import datasets
    wine = datasets.load_wine()
    X = append_simulated_features(StandardScaler().fit_transform(wine.data), [2])
    y = wine.target
    print("Using Wine dataset with simulated extra features.")

elif dataset_name == "Titanic":
    try:
        import seaborn as sns
        feature_cols = ['pclass', 'age', 'sibsp', 'parch', 'fare']
        # Drop rows with missing values only in the columns actually used.
        df = sns.load_dataset("titanic").dropna(subset=feature_cols + ["survived"])
        X = append_simulated_features(StandardScaler().fit_transform(df[feature_cols].values), [2])
        y = df["survived"].values
        print("Using Titanic dataset with simulated extra features.")
    except Exception as e:
        print(f"Could not load Titanic dataset ({e}); falling back to Iris.")
        dataset_name = "Iris"
        X, y = load_iris_with_extras()

elif dataset_name == "Digits":
    from sklearn import datasets
    digits = datasets.load_digits()
    X = append_simulated_features(StandardScaler().fit_transform(digits.data), [2])
    y = digits.target
    print("Using Digits dataset with simulated extra features.")

elif dataset_name == "Fashion-MNIST":
    from torchvision.datasets import FashionMNIST
    fashion = FashionMNIST(root='./data', train=True, download=True)
    # Use the raw uint8 image tensors and scale them to [0, 1] exactly once.
    X = fashion.data.numpy().reshape(len(fashion), -1) / 255.0
    y = fashion.targets.numpy()
    X = append_simulated_features(X, [2])
    print("Using Fashion-MNIST dataset with simulated extra features.")

else:
    raise ValueError(f"Dataset '{dataset_name}' not recognized. Please choose from Iris, Wine, Titanic, Digits, Fashion-MNIST, MELD, IEMOCAP, LUMA, or M3AV.")

X = np.array(X)
y = np.array(y)
X_tensor = torch.tensor(X, dtype=torch.float32)
y_tensor = torch.tensor(y, dtype=torch.long)
X_train, X_val, y_train, y_val = train_test_split(X_tensor, y_tensor, test_size=0.2, random_state=42, stratify=y_tensor)
X_train = X_train.to(device)
X_val = X_val.to(device)
y_train = y_train.to(device)
y_val = y_val.to(device)

input_size = X_train.shape[1]
print(f"Dataset: {dataset_name} | Multi‐Modal Enabled: {enable_multimodal}")
print(f"Training data shape: {X_train.shape}, Validation data shape: {X_val.shape}")
print(f"Unique classes: {torch.unique(y_tensor)}")

# Compute class weights for bias mitigation
class_counts = torch.bincount(y_train)
class_weights = 1.0 / (class_counts.float() + 1e-8)
class_weights = class_weights / class_weights.sum()
class_weights = class_weights.to(device)
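# %% [markdown]
"""
The class weights above are inverse class frequencies normalised to sum to one, so rarer classes receive proportionally larger weights in the weighted cross-entropy. A small NumPy sketch with hypothetical counts of 50, 30 and 20 samples shows the resulting 6 : 10 : 15 ratio:

```python
import numpy as np

# Hypothetical class counts for a 3-class training split (illustration only).
class_counts = np.array([50, 30, 20], dtype=float)

# Inverse-frequency weights, normalised to sum to 1 (mirrors the torch code above).
class_weights = 1.0 / (class_counts + 1e-8)
class_weights = class_weights / class_weights.sum()

print(np.round(class_weights, 4))  # [0.1935 0.3226 0.4839]
```
"""
# %%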

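def mixup(x, y, alpha=0.2, num_classes=None):
    """Mixup: convex combinations of shuffled sample pairs and of their one-hot labels."""
    lam = np.random.beta(alpha, alpha)
    index = torch.randperm(x.size(0), device=x.device)
    mixed_x = lam * x + (1 - lam) * x[index, :]
    # Infer num_classes from the batch only as a fallback: a batch may miss the highest
    # class, which would give one-hot vectors of inconsistent width across batches.
    if num_classes is None:
        num_classes = int(torch.max(y)) + 1
    y_onehot = torch.nn.functional.one_hot(y, num_classes=num_classes).float()
    mixed_y = lam * y_onehot + (1 - lam) * y_onehot[index, :]
    return mixed_x, mixed_y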
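# %% [markdown]
"""
Mixup trains on convex combinations of sample pairs and their one-hot labels, with a mixing coefficient drawn from Beta(alpha, alpha). The NumPy sketch below (the name `mixup_np` is hypothetical) fixes the one-hot width with an explicit `num_classes`, since inferring it from a single batch can produce inconsistent shapes when the batch lacks the highest class; every mixed label row still sums to one.

```python
import numpy as np

def mixup_np(x, y, num_classes, alpha=0.2, rng=None):
    # Blend each sample with a randomly chosen partner and blend their one-hot
    # labels with the same coefficient lam ~ Beta(alpha, alpha).
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    index = rng.permutation(len(x))
    mixed_x = lam * x + (1 - lam) * x[index]
    y_onehot = np.eye(num_classes)[y]
    mixed_y = lam * y_onehot + (1 - lam) * y_onehot[index]
    return mixed_x, mixed_y, lam

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
y = np.array([0, 1, 2, 0, 1, 2, 0, 1])
mixed_x, mixed_y, lam = mixup_np(x, y, num_classes=3, rng=rng)
print(mixed_x.shape, mixed_y.shape)  # (8, 4) (8, 3)
```
"""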

# %% [markdown]
"""
# Cell 4: Utility Functions (Phase 2.5 - Next Iteration)
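Defines helper functions for centroid geometry (hyperbolic distance, centroid entropy, centroid separation), Monte Carlo dropout uncertainty, fairness and meta-cognition metrics, saliency, a gradient-penalized robust loss, zero-shot evaluation, training-failure detection, experience replay, adaptive pruning, sensor-noise simulation, and prediction distributions. `apply_bias_correction` and `explain_with_shap` are placeholders.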
"""
# %%
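def hyperbolic_distance(x, y):
    """Poincare-ball distance; both points must lie strictly inside the unit ball (norm < 1)."""
    # as_tensor avoids the copy warning torch.tensor() emits when given an existing tensor.
    x = torch.as_tensor(x, dtype=torch.float32, device=device)
    y = torch.as_tensor(y, dtype=torch.float32, device=device)
    norm_x = torch.norm(x)
    norm_y = torch.norm(y)
    return torch.acosh(1 + 2 * torch.sum((x - y)**2) / ((1 - norm_x**2) * (1 - norm_y**2)))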
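# %% [markdown]
"""
`hyperbolic_distance` is the distance in the Poincare ball model, d(x, y) = arcosh(1 + 2 ||x - y||^2 / ((1 - ||x||^2)(1 - ||y||^2))), which is only defined for points with norm below 1; distances blow up near the boundary of the ball. A NumPy sketch (`poincare_distance` is an illustrative name) checks the formula against the closed form d(0, p) = 2 artanh(||p||):

```python
import numpy as np

def poincare_distance(x, y):
    # Distance in the Poincare ball model; both points must satisfy norm < 1.
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    sq_diff = np.sum((x - y) ** 2)
    denom = (1 - np.sum(x ** 2)) * (1 - np.sum(y ** 2))
    return np.arccosh(1 + 2 * sq_diff / denom)

origin = np.zeros(2)
p = np.array([0.6, 0.0])
# From the origin the distance reduces to 2 * artanh(0.6) = ln 4 (about 1.3863).
print(poincare_distance(origin, p), 2 * np.arctanh(0.6))
```
"""
# %%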

def compute_centroid_entropy(centroids):
    with torch.no_grad():
        D = torch.cdist(centroids, centroids, p=2)
        D.fill_diagonal_(float('inf'))
        p = 1 / (D + 1e-8)
        p = p / p.sum(dim=1, keepdim=True)
        p = torch.clamp(p, min=1e-8)
        entropy_per_centroid = -torch.sum(p * torch.log(p), dim=1)
        return torch.mean(entropy_per_centroid)
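# %% [markdown]
"""
`compute_centroid_entropy` converts inverse distances into a probability distribution over the other centroids and averages its Shannon entropy: evenly spread centroids give high entropy (approximately ln(k - 1) at most for k centroids), while a centroid with one very close neighbour gives low entropy. The NumPy sketch below (`centroid_entropy_np` is illustrative) reproduces the computation on an equilateral triangle, where the answer is ln 2:

```python
import numpy as np

def centroid_entropy_np(centroids):
    # Inverse-distance "attention" over the other centroids, then the mean
    # Shannon entropy (nats) of those distributions.
    diffs = centroids[:, None, :] - centroids[None, :, :]
    D = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(D, np.inf)  # ignore self-distances
    p = 1.0 / (D + 1e-8)
    p = p / p.sum(axis=1, keepdims=True)
    p = np.clip(p, 1e-8, None)
    return np.mean(-np.sum(p * np.log(p), axis=1))

# Every vertex of an equilateral triangle is equidistant from the other two,
# so each row is [0, 0.5, 0.5] and the entropy is ln 2.
triangle = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]])
print(centroid_entropy_np(triangle), np.log(2))
```
"""
# %%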

def enforce_centroid_separation(centroids, min_sep=1.0):
    with torch.no_grad():
        num_centroids = centroids.size(0)
        for i in range(num_centroids):
            for j in range(i+1, num_centroids):
                diff = centroids[i] - centroids[j]
                dist = torch.norm(diff)
                if dist < min_sep:
                    correction = (min_sep - dist) / 2
                    direction = diff / (dist + 1e-8)
                    centroids[i].add_(correction * direction)
                    centroids[j].add_(-correction * direction)
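# %% [markdown]
"""
`enforce_centroid_separation` pushes each too-close pair apart symmetrically along the line joining them, by half the shortfall each, so the pair ends up `min_sep` apart and its midpoint is unchanged. A NumPy sketch of one pass (`enforce_separation_np` is an illustrative name):

```python
import numpy as np

def enforce_separation_np(centroids, min_sep=1.0):
    # One pass over all pairs; each too-close pair is pushed apart symmetrically.
    c = centroids.astype(float).copy()
    for i in range(len(c)):
        for j in range(i + 1, len(c)):
            diff = c[i] - c[j]
            dist = np.linalg.norm(diff)
            if dist < min_sep:
                correction = (min_sep - dist) / 2
                direction = diff / (dist + 1e-8)
                c[i] += correction * direction
                c[j] -= correction * direction
    return c

pair = np.array([[0.0, 0.0], [0.4, 0.0]])
separated = enforce_separation_np(pair)  # each point moves 0.3 outward
print(separated)
```

With more than two centroids a single pass is not guaranteed to separate every pair, because moving one point can bring it closer to a third; repeating the pass until no correction fires is one possible remedy.
"""
# %%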

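def monte_carlo_dropout(model, x, epoch, num_samples=10):
    """Mean softmax prediction and mean predictive variance over dropout-enabled passes."""
    model.train()  # keep dropout active so each pass samples a different sub-network
    preds = []
    with torch.no_grad():  # inference only; no autograd graph is needed
        for _ in range(num_samples):
            with torch.amp.autocast(device.type, enabled=(device.type == "cuda")):
                outputs = model(x, epoch)
            preds.append(torch.softmax(outputs.float(), dim=1).unsqueeze(0))
    preds = torch.cat(preds, dim=0)  # (num_samples, batch, num_classes)
    mean_preds = preds.mean(dim=0)
    variance = preds.var(dim=0).mean().item()  # averaged over samples and classes
    return mean_preds, variance

def monte_carlo_dropout_per_sample(model, x, epoch, num_samples=10):
    """Per-sample predictive variance (averaged over classes) over dropout-enabled passes."""
    model.train()
    preds = []
    with torch.no_grad():
        for _ in range(num_samples):
            with torch.amp.autocast(device.type, enabled=(device.type == "cuda")):
                outputs = model(x, epoch)
            preds.append(torch.softmax(outputs.float(), dim=1).unsqueeze(0))
    preds = torch.cat(preds, dim=0)
    sample_variances = preds.var(dim=0).mean(dim=1)
    return sample_variances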
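# %% [markdown]
"""
Monte Carlo dropout keeps dropout active at inference (`model.train()`) and treats the spread of the softmax outputs across repeated passes as an uncertainty estimate. The sketch below uses NumPy and a hypothetical linear classifier `W` with (inverted) dropout on its inputs; with the dropout rate set to zero all passes are identical and the variance collapses to numerically zero.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mc_dropout_np(x, W, p=0.3, num_samples=50, rng=None):
    # Each pass samples a fresh inverted-dropout mask on the classifier inputs.
    rng = np.random.default_rng() if rng is None else rng
    probs = []
    for _ in range(num_samples):
        mask = (rng.random(x.shape) >= p) / (1 - p)
        probs.append(softmax((x * mask) @ W))
    probs = np.stack(probs)  # (num_samples, batch, num_classes)
    return probs.mean(axis=0), probs.var(axis=0).mean()

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6))
W = rng.normal(size=(6, 3))
mean_pred, var_with_dropout = mc_dropout_np(x, W, p=0.3, rng=rng)
_, var_without_dropout = mc_dropout_np(x, W, p=0.0, rng=rng)
print(var_with_dropout, var_without_dropout)
```
"""
# %%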

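def fairness_metric(pred_probs, y_true):
    """L2 distance between the predicted-class distribution and a uniform distribution.

    `y_true` is accepted for interface compatibility but is not used.
    """
    num_classes = pred_probs.size(1)
    preds = torch.argmax(pred_probs, dim=1)
    # bincount with minlength also counts classes that are never predicted.
    actual = torch.bincount(preds, minlength=num_classes).float()
    actual = actual / actual.sum()
    ideal = torch.full_like(actual, 1.0 / num_classes)
    return torch.norm(actual - ideal).item()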
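# %% [markdown]
"""
A fairness score of this kind compares a class-frequency distribution against the uniform distribution using an L2 norm. The NumPy sketch below applies the idea to predicted labels (`prediction_balance_gap` is a hypothetical name): perfectly balanced predictions score 0, and a model that always predicts one of three classes scores sqrt(2/3), about 0.816. The agent in Cell 5 reacts when this value exceeds 0.1.

```python
import numpy as np

def prediction_balance_gap(pred_probs):
    # L2 distance between predicted-class frequencies and a uniform distribution.
    num_classes = pred_probs.shape[1]
    preds = pred_probs.argmax(axis=1)
    actual = np.bincount(preds, minlength=num_classes) / len(preds)
    ideal = np.full(num_classes, 1.0 / num_classes)
    return float(np.linalg.norm(actual - ideal))

balanced = np.eye(3)[[0, 1, 2, 0, 1, 2]]   # one-hot "probabilities", 2 per class
collapsed = np.eye(3)[[0, 0, 0, 0, 0, 0]]  # always predicts class 0
print(prediction_balance_gap(balanced), prediction_balance_gap(collapsed))
```
"""
# %%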

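def log_energy_usage():
    """GPU memory currently allocated by tensors, in MB (used as a proxy for energy use)."""
    if device.type != "cuda":
        return 0.0  # torch.cuda.memory_allocated() requires a CUDA device
    return torch.cuda.memory_allocated(device) / 1e6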

def compute_modal_consistency(x):
    return 1.0  # For fallback datasets, we return 1.0.

def meta_cognition_metric(pred_probs):
    entropy = -torch.sum(pred_probs * torch.log(pred_probs + 1e-8), dim=1)
    return entropy.mean().item()
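# %% [markdown]
"""
`meta_cognition_metric` is the mean Shannon entropy (in nats) of the predicted distributions, so it ranges from about 0 for fully confident predictions to ln(C) for uniform predictions over C classes (ln 3, about 1.10, for Iris). That range is useful context for the `meta_cognition < 0.8` rule in Cell 5. A NumPy sketch (`mean_prediction_entropy` is illustrative):

```python
import numpy as np

def mean_prediction_entropy(pred_probs):
    # Average Shannon entropy (nats) per row, with the same 1e-8 guard against log(0).
    return float(np.mean(-np.sum(pred_probs * np.log(pred_probs + 1e-8), axis=1)))

uniform = np.full((5, 4), 0.25)
confident = np.eye(4)[[0, 1, 2, 3, 0]]
print(mean_prediction_entropy(uniform), np.log(4))  # both about 1.3863
print(mean_prediction_entropy(confident))           # about 0.0
```
"""
# %%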

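def compute_saliency_map(model, x, target_class=None):
    """Mean absolute input gradient of the cross-entropy loss (a scalar saliency summary)."""
    model.eval()
    x = x.clone().detach().requires_grad_(True)  # work on a copy; leave the caller's tensor untouched
    outputs = model(x, 0)
    if target_class is None:
        target_class = outputs.argmax(dim=1)
    loss = nn.CrossEntropyLoss()(outputs, target_class)
    # autograd.grad returns only the input gradient and, unlike loss.backward(),
    # does not accumulate gradients into the model parameters.
    (input_grad,) = torch.autograd.grad(loss, x)
    return input_grad.abs().mean().item()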

weighted_loss_fn = nn.CrossEntropyLoss(weight=class_weights)

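def robust_loss(x, outputs, targets, model, lambda_reg=0.01):
    """Weighted cross-entropy plus lambda_reg * ||dL/dx||_2 (an input-gradient penalty)."""
    ce_loss = weighted_loss_fn(outputs, targets)
    x_adv = x.clone().detach().requires_grad_(True)
    outputs_adv = model(x_adv, 0)
    loss_for_grad = weighted_loss_fn(outputs_adv, targets)
    # create_graph=True keeps the input gradient differentiable, so the penalty contributes
    # to the parameter gradients; autograd.grad (unlike .backward()) also leaves the
    # parameters' .grad fields untouched.
    (input_grad,) = torch.autograd.grad(loss_for_grad, x_adv, create_graph=True)
    penalty = lambda_reg * input_grad.norm(2)
    return ce_loss + penalty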
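# %% [markdown]
"""
`robust_loss` adds lambda * ||dL/dx||_2, an input-gradient penalty that discourages predictions from changing sharply under small input perturbations. For the penalty to influence the parameter update, the input gradient itself has to stay differentiable (in PyTorch, `torch.autograd.grad(..., create_graph=True)`). The NumPy sketch below uses a hypothetical linear softmax classifier, where the input gradient has the closed form W (p - onehot(target)), and checks it against central finite differences:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(x, W, target):
    return -np.log(softmax(x @ W)[target])

def input_gradient(x, W, target):
    # For logits z = x @ W: dCE/dz = p - onehot(target), hence dCE/dx = W @ (p - onehot).
    p = softmax(x @ W)
    p[target] -= 1.0
    return W @ p

def penalized_loss(x, W, target, lambda_reg=0.01):
    # Cross-entropy plus lambda * ||dCE/dx||_2.
    return cross_entropy(x, W, target) + lambda_reg * np.linalg.norm(input_gradient(x, W, target))

rng = np.random.default_rng(0)
x, W, target = rng.normal(size=5), rng.normal(size=(5, 3)), 1

eps = 1e-6
numeric = np.array([
    (cross_entropy(x + eps * e, W, target) - cross_entropy(x - eps * e, W, target)) / (2 * eps)
    for e in np.eye(5)
])
print(np.allclose(numeric, input_gradient(x, W, target), atol=1e-6))  # True
```
"""
# %%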

def zero_shot_evaluation(model, num_samples=20):
    synthetic_data = generate_zero_shot_data(num_samples, input_size)
    model.eval()
    with torch.no_grad():
        outputs = model(synthetic_data, 0)
    predictions = torch.argmax(outputs, dim=1)
    unique, counts = torch.unique(predictions, return_counts=True)
    return dict(zip(unique.cpu().numpy(), counts.cpu().numpy()))

def detect_training_failure(loss_history, patience=5, min_improvement=0.01):
    if len(loss_history) < patience:
        return False
    recent = loss_history[-patience:]
    if max(recent) - min(recent) < min_improvement:
        return True
    return False
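# %% [markdown]
"""
`detect_training_failure` flags a plateau when the last `patience` losses span less than `min_improvement`. Because only the spread is checked, a loss that oscillates without improving is not flagged. The sketch repeats the rule in plain Python so the three cases can be run in isolation:

```python
def detect_training_failure(loss_history, patience=5, min_improvement=0.01):
    # Flag a plateau when the last `patience` losses span less than `min_improvement`.
    if len(loss_history) < patience:
        return False
    recent = loss_history[-patience:]
    return max(recent) - min(recent) < min_improvement

plateau = [0.90, 0.70, 0.5012, 0.5010, 0.5008, 0.5011, 0.5009]
improving = [0.90, 0.80, 0.70, 0.60, 0.50]
oscillating = [0.50, 0.60, 0.50, 0.60, 0.50]
print(detect_training_failure(plateau), detect_training_failure(improving),
      detect_training_failure(oscillating))  # True False False
```
"""
# %%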

def update_experience_replay(buffer, new_batch, max_size=100):
    buffer.extend(new_batch)
    if len(buffer) > max_size:
        buffer = buffer[-max_size:]
    return buffer
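# %% [markdown]
"""
`update_experience_replay` keeps only the newest `max_size` entries. For reference, `collections.deque` (already imported in Cell 2) gives the same keep-the-newest behaviour without slicing, as a minimal sketch:

```python
from collections import deque

replay_buffer = deque(maxlen=3)  # the oldest entries are discarded automatically
for batch_id in range(5):
    replay_buffer.extend([("batch", batch_id)])
print(list(replay_buffer))  # [('batch', 2), ('batch', 3), ('batch', 4)]
```
"""
# %%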

def apply_adaptive_pruning(model, threshold=0.05):
    with torch.no_grad():
        for module in model.main_head:
            if isinstance(module, nn.Linear):
                module.weight.data = module.weight.data.masked_fill(module.weight.data.abs() < threshold, 0.0)
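# %% [markdown]
"""
`apply_adaptive_pruning` is unstructured magnitude pruning: weights whose absolute value is below the threshold are set to zero, in the `Linear` layers inside `main_head` only (the final `classifier` is not touched). A NumPy sketch on a hypothetical 2x3 weight matrix:

```python
import numpy as np

def magnitude_prune(weight, threshold=0.05):
    # Zero every weight whose absolute value is below the threshold.
    return np.where(np.abs(weight) < threshold, 0.0, weight)

W = np.array([[0.30, -0.02, 0.04],
              [-0.10, 0.01, -0.60]])
W_pruned = magnitude_prune(W)
sparsity = np.mean(W_pruned == 0.0)
print(W_pruned)
print(f"sparsity = {sparsity:.2f}")  # sparsity = 0.50
```

Zeroed weights are not frozen: later optimizer steps can make them non-zero again unless the mask is reapplied after each update.
"""
# %%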

def simulate_sensor_noise(x, noise_std=0.1):
    noise = noise_std * torch.randn_like(x)
    return x + noise

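def apply_bias_correction(pred_probs, bias_threshold=0.1):
    # Placeholder: returns the predictions unchanged (no correction is applied yet).
    return pred_probs

def explain_with_shap(model, x):
    # Placeholder: returns a random score in [0, 1); no SHAP values are computed.
    return np.random.uniform(0.0, 1.0)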

def compute_prediction_distribution(model, x, epoch=0):
    model.eval()
    with torch.no_grad():
        outputs = model(x, epoch)
        preds = torch.argmax(outputs, dim=1).cpu().numpy()
    import numpy as np
    unique, counts = np.unique(preds, return_counts=True)
    return dict(zip(unique, counts))

# %% [markdown]
"""
# Cell 5: RL Hyperparameter Agent (Phase 2.5 - Next Iteration)
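A rule-based controller for `alpha` and the learning rate. On each call, the first condition that holds (in priority order: fairness gap, modal consistency, meta-cognition, GPU memory use, centroid separation) scales `alpha` and/or `lr` multiplicatively within fixed bounds (`alpha` in [0.01, 0.5], `lr` in [1e-4, 0.05]); if none holds, both decay by 1%.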
"""
# %%
class RLHyperparameterAgent:
    def __init__(self, init_alpha=0.1, init_lr=0.01, energy_threshold=25.0):
        self.alpha = init_alpha
        self.lr = init_lr
        self.max_centroid_sep = 20
        self.energy_threshold = energy_threshold
    def update(self, centroid_separation, fairness=None, modal_consistency=None, meta_cognition=None, energy_usage=None):
        if fairness is not None and fairness > 0.1:
            self.alpha = min(0.5, self.alpha * 1.06)
            self.lr = min(0.05, self.lr * 1.06)
        elif modal_consistency is not None and modal_consistency < 0.5:
            self.alpha = min(0.5, self.alpha * 1.04)
            self.lr = min(0.05, self.lr * 1.04)
        elif meta_cognition is not None and meta_cognition < 0.8:
            self.alpha = min(0.5, self.alpha * 1.05)
            self.lr = min(0.05, self.lr * 1.05)
        elif energy_usage is not None and energy_usage > self.energy_threshold:
            self.lr = max(1e-4, self.lr * 0.95)
        elif centroid_separation < 1.0:
            self.alpha = min(0.5, self.alpha * 1.03)
            self.lr = min(0.05, self.lr * 1.01)
        else:
            self.alpha = max(0.01, self.alpha * 0.99)
            self.lr = max(1e-4, self.lr * 0.99)
        centroid_separation = min(self.max_centroid_sep, centroid_separation)
        return self.alpha, self.lr, centroid_separation
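# %% [markdown]
"""
`RLHyperparameterAgent.update` applies a priority-ordered set of multiplicative rules; only the first matching condition fires on each call, so repeated triggers grow `alpha` geometrically until it reaches its cap of 0.5. A worked example in plain Python for the fairness rule (`alpha *= 1.06`), starting from the default `init_alpha=0.1`:

```python
import math

alpha, steps = 0.1, 0
while alpha < 0.5:
    alpha = min(0.5, alpha * 1.06)
    steps += 1
# Closed form: smallest n with 0.1 * 1.06**n >= 0.5.
print(steps, math.ceil(math.log(0.5 / 0.1) / math.log(1.06)))  # 28 28
```
"""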

# %% [markdown]
"""
# Cell 6: Evolvable Model Architectures (NAS Module) (Phase 2.5 - Next Iteration)
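Builds an evolvable MLP from a candidate dictionary, with an optional few-shot input projection, a meta-cognition (reflection) head, and an `adaptive_pruning` flag (pruning itself is applied by `apply_adaptive_pruning` in Cell 4).
`mutate_architecture` currently perturbs the rate of the first dropout layer by up to 0.05 in either direction, clamped to [0, 0.5].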
"""
# %%
import torch.nn as nn
import torch.nn.functional as F
import random

def build_evolvable_model(candidate, input_size, num_classes, total_epochs=20):
    arch = candidate["architecture"]
    few_shot = candidate.get("few_shot", False)
    adaptive_pruning = candidate.get("adaptive_pruning", False)
    meta_cognition_enabled = candidate.get("meta_cognition", False)
    if arch == "MLP":
        class EvolvableMLP(nn.Module):
            def __init__(self, candidate, input_size, num_classes, total_epochs):
                super(EvolvableMLP, self).__init__()
                self.candidate = candidate  # store candidate for cloning
                self.adaptive_pruning = adaptive_pruning
                self.meta_cognition_enabled = meta_cognition_enabled
                layers = []
                in_features = input_size
                if few_shot:
                    meta_hidden = max(16, candidate["hidden_size"] // 2)
                    layers.append(nn.Linear(in_features, meta_hidden))
                    layers.append(nn.ReLU())
                    in_features = meta_hidden
                num_layers = candidate["num_layers"]
                hidden_size = candidate["hidden_size"]
                dropout_rate = candidate["dropout"]
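                # Choose one activation per hidden layer once and record the names in the
                # candidate, so models rebuilt from the same candidate (e.g. by clone_fn in
                # Cell 7) get identical activations and loaded weights behave the same.
                activation_types = {"ReLU": nn.ReLU, "SiLU": nn.SiLU, "LeakyReLU": nn.LeakyReLU}
                if len(candidate.get("activations", [])) != num_layers:
                    candidate["activations"] = [random.choice(list(activation_types)) for _ in range(num_layers)]
                for act_name in candidate["activations"]:
                    layers.append(nn.Linear(in_features, hidden_size))
                    layers.append(activation_types[act_name]())
                    layers.append(nn.Dropout(dropout_rate))
                    in_features = hidden_size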
                self.main_head = nn.Sequential(*layers)
                self.classifier = nn.Linear(in_features, num_classes)
                if self.meta_cognition_enabled:
                    self.reflection_head = nn.Linear(in_features, 1)
                self.total_epochs = total_epochs

            def forward(self, x, epoch=None):
                features = self.main_head(x)
                out = self.classifier(features)
                if self.meta_cognition_enabled:
                    self.reflection_output = self.reflection_head(features)
                self.latent_features = features
                return out

            def mutate_architecture(self):
                for module in self.main_head:
                    if isinstance(module, nn.Dropout):
                        new_rate = module.p + random.uniform(-0.05, 0.05)
                        module.p = max(0.0, min(new_rate, 0.5))
                        break

        return EvolvableMLP(candidate, input_size, num_classes, total_epochs)
    else:
        raise ValueError(f"Unknown architecture in candidate: {arch}")

# %% [markdown]
"""
# Cell 7: Custom Estimator for Hyperparameter Optimization & Automated Retraining (Phase 2.5 - Next Iteration)
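This estimator combines a short competition between re-initialized model clones, curriculum and adversarial data augmentation, a gradient-penalized loss, cosine learning-rate restarts, and early stopping.
Models are duplicated with a custom `clone_fn` that rebuilds a fresh model from the stored candidate and loads its `state_dict`, rather than `copy.deepcopy`: the forward pass caches tensors (`latent_features`, `reflection_output`) on the module, and PyTorch refuses to deep-copy such non-leaf tensors once a forward pass has run with autograd enabled.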
"""
# %%
from sklearn.base import BaseEstimator
import torch.optim as optim
import math
from tqdm import tqdm


def adaptive_alpha(epoch, base_alpha=0.01, increase_factor=1.05, max_alpha=0.1):
    return min(base_alpha * (increase_factor ** epoch), max_alpha)
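# %% [markdown]
"""
`adaptive_alpha` grows geometrically, 0.01 * 1.05**epoch, until it reaches 0.1. With the defaults the cap is first reached at epoch 48, so in a 35-epoch run alpha ends near 0.0525. (In `fit()` the value is only written to `self.model.base_alpha` if that attribute exists, and `EvolvableMLP` does not define one.) A worked check in plain Python:

```python
import math

def adaptive_alpha(epoch, base_alpha=0.01, increase_factor=1.05, max_alpha=0.1):
    return min(base_alpha * (increase_factor ** epoch), max_alpha)

# Smallest epoch e with 0.01 * 1.05**e >= 0.1.
first_capped = math.ceil(math.log(0.1 / 0.01) / math.log(1.05))
print(first_capped)                  # 48
print(round(adaptive_alpha(34), 4))  # 0.0525 (last epoch of a 35-epoch run)
```
"""
# %%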

def neural_network_competition(X, y, population, num_epochs_comp=3, batch_size=16):
    best_model = None
    best_loss = float('inf')
    for model in population:
        model.train()
        optimizer = torch.optim.Adam(model.parameters(), lr=0.005)
        total_loss = 0.0
        for comp_epoch in range(num_epochs_comp):
            permutation = torch.randperm(X.size(0))
            for i in range(0, X.size(0), batch_size):
                indices = permutation[i:i+batch_size]
                batch = X[indices]
                batch_labels = y[indices]
                optimizer.zero_grad()
                with torch.amp.autocast('cuda', enabled=True):
                    outputs = model(batch, comp_epoch)
                    loss = nn.CrossEntropyLoss(weight=class_weights)(outputs, batch_labels)
                loss.backward()
                optimizer.step()
                total_loss += loss.item()
        if total_loss < best_loss:
            best_loss = total_loss
            best_model = model.state_dict()  # store state dict of best model
    return best_model

class MANEEstimator(BaseEstimator):
    def __init__(self, input_size, hidden_size, latent_size, num_classes,
                 entropy_weight=0.1, min_sep=1.0, base_alpha=0.01, total_epochs=35,
                 confidence_threshold=0.05, use_curriculum=True, self_directed=False):
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.latent_size = latent_size
        self.num_classes = num_classes
        self.entropy_weight = entropy_weight
        self.min_sep = min_sep
        self.base_alpha = base_alpha
        self.total_epochs = total_epochs
        self.confidence_threshold = confidence_threshold
        self.use_curriculum = use_curriculum
        self.self_directed = self_directed
        # Build model with candidate dictionary
        self.candidate = {
            "architecture": "MLP",
            "num_layers": 3,
            "hidden_size": hidden_size,
            "dropout": 0.35,
            "few_shot": False,
            "adaptive_pruning": False,
            "meta_cognition": True
        }
        self.model = build_evolvable_model(self.candidate, input_size, num_classes, total_epochs=self.total_epochs)
        self.model.to(device)
        self.optimizer = None
        self.criterion = nn.CrossEntropyLoss(weight=class_weights)
        self.best_model = None
        self.best_val_loss = float('inf')
        self.no_improve_epochs = 0
        self.experience_replay_buffer = []
        self.loss_history = []

    def fit(self, X, y, learning_rate=0.005, num_epochs=35, batch_size=64, weight_decay=5e-3,
            patience=5, auto_grok_min_epochs=10, auto_grok_threshold=0.04):
        self.optimizer = optim.Adam(self.model.parameters(), lr=learning_rate, weight_decay=weight_decay)
        self.no_improve_epochs = 0
        scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(self.optimizer, T_0=2, T_mult=1, eta_min=5e-6)

        # Cloning function to safely duplicate model without deepcopy()
        def clone_fn(model):
            new_model = build_evolvable_model(model.candidate, self.input_size, self.num_classes, self.total_epochs)
            new_model.load_state_dict(model.state_dict())
            new_model.to(torch.device("cuda" if torch.cuda.is_available() else "cpu"))
            return new_model

        # Instead of deepcopy, safely clone model using clone_fn()
        population = [clone_fn(self.model) for _ in range(3)]

        # Reset parameters if necessary
        for clone in population:
            try:
                clone.apply(lambda m: m.reset_parameters() if hasattr(m, 'reset_parameters') else None)
            except Exception:
                pass

        # Run neural network competition
        best_population_state = neural_network_competition(X, y, population, num_epochs_comp=3, batch_size=batch_size)
        if best_population_state is not None:
            self.model.load_state_dict(best_population_state)

        self.loss_history = []
        exp_buffer = []  # Local experience replay buffer
        for epoch in tqdm(range(num_epochs), desc="Retraining Epochs"):
            current_alpha = adaptive_alpha(epoch, base_alpha=0.01)
            if hasattr(self.model, 'base_alpha'):
                self.model.base_alpha = current_alpha

            if self.use_curriculum:
                curriculum_X, curriculum_y = generate_curriculum(X, y, self.model, num_samples=int(0.1 * X.size(0)))
                X_epoch = torch.cat([X, curriculum_X])
                y_epoch = torch.cat([y, curriculum_y])
            else:
                X_epoch, y_epoch = X, y

            if epoch % 3 == 0:
                self.model.mutate_architecture()
                adv_examples = generate_adversarial_examples(X_epoch, y_epoch, self.model, epsilon=0.05)
                X_epoch = torch.cat([X_epoch, adv_examples])
                y_epoch = torch.cat([y_epoch, y_epoch])

            self.model.train()
            permutation = torch.randperm(X_epoch.size(0))
            epoch_loss = 0.0
            for i in range(0, X_epoch.size(0), batch_size):
                indices = permutation[i:i+batch_size]
                batch = X_epoch[indices]
                batch_labels = y_epoch[indices]
                self.optimizer.zero_grad()
                with torch.amp.autocast('cuda', enabled=True):
                    outputs = self.model(batch, epoch)
                    loss = robust_loss(batch, outputs, batch_labels, self.model, lambda_reg=0.01)
                loss.backward()
                torch.nn.utils.clip_grad_norm_(self.model.parameters(), max_norm=1.0)
                self.optimizer.step()
                epoch_loss += loss.item()
                exp_buffer = update_experience_replay(exp_buffer, [(batch, batch_labels)])

            scheduler.step()
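            self.model.eval()
            with torch.no_grad():
                # fit() receives no held-out split, so this "validation" loss is measured on
                # the training data X, y (in eval mode, i.e. without dropout).
                outputs = self.model(X, epoch)
                val_loss = self.criterion(outputs, y)  # CrossEntropyLoss already returns a scalar
            self.loss_history.append(val_loss.item())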

            print(f"Epoch {epoch+1}: Train Loss={epoch_loss:.4f}, Val Loss={val_loss.item():.4f}")

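            if self.best_val_loss - val_loss.item() > 0.01:
                self.no_improve_epochs = 0
                self.best_val_loss = val_loss.item()
                # state_dict() returns references to the live parameters; clone them so later
                # updates do not overwrite the saved "best" weights.
                self.best_model = {k: v.detach().clone() for k, v in self.model.state_dict().items()}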
            else:
                self.no_improve_epochs += 1
                if self.no_improve_epochs >= patience:
                    print("Early stopping triggered.")
                    break

        if self.best_model is not None:
            self.model.load_state_dict(self.best_model)

        stats_report = {"retraining_runs": []}
        stats_report["retraining_runs"].append({
            "run": epoch+1,
            "best_val_loss": self.best_val_loss,
            "total_epochs": num_epochs
        })

        print(f"Retraining run complete.")
        return self.model, stats_report

# Instantiate and run training
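# NOTE: automated_retraining is not defined in the cells shown here (the numbering jumps
# from Cell 7 to Cell 10), nor are generate_curriculum, generate_adversarial_examples and
# generate_zero_shot_data; they must be defined before this line runs.
mane_model, retraining_stats = automated_retraining(X_train, y_train, X_val, y_val)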

# Simple wrapper for AutoML evaluation
from sklearn.base import BaseEstimator
class DummyEstimator(BaseEstimator):
    def __init__(self, model):
        self.model = model
    def score(self, X, y):
        self.model.eval()
        with torch.no_grad():
            outputs = self.model(X, 0)
            preds = torch.argmax(outputs, dim=1)
            return (preds == y).float().mean().item()

estimator_wrapper = DummyEstimator(mane_model)
automl_accuracy = estimator_wrapper.score(X_val, y_val)
print(f"AutoML Selected Model Accuracy: {automl_accuracy*100:.2f}%")

# Log to the SummaryWriter created in Cell 2 (mane_writer is not defined in the cells shown).
writer.add_scalar("MANE/Final_Val_Accuracy", automl_accuracy)
writer.flush()

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, log_loss
X_baseline = X_train.cpu().numpy()
y_baseline = y_train.cpu().numpy()
X_val_baseline = X_val.cpu().numpy()
y_val_baseline = y_val.cpu().numpy()
baseline_model = LogisticRegression(max_iter=400)
baseline_model.fit(X_baseline, y_baseline)
baseline_preds = baseline_model.predict(X_val_baseline)
baseline_probs = baseline_model.predict_proba(X_val_baseline)
baseline_acc = accuracy_score(y_val_baseline, baseline_preds)
baseline_logloss = log_loss(y_val_baseline, baseline_probs)
print(f"Baseline Accuracy: {baseline_acc*100:.2f}%")
print(f"Baseline Log Loss: {baseline_logloss:.4f}")
# Log to the SummaryWriter created in Cell 2 (baseline_writer is not defined in the cells shown).
writer.add_scalar("Baseline/Final_Val_Accuracy", baseline_acc)
writer.add_scalar("Baseline/Final_Log_Loss", baseline_logloss)
writer.close()

# %% [markdown]
"""
# Cell 10: Evaluate and Visualize the Learned Latent Space (Phase 2.5 - Next Iteration)
Visualizes the latent space (the output of `main_head`) with PCA and annotates the plot with allocated GPU memory (reported as "Energy"), modal consistency, meta-cognition (mean prediction entropy), and saliency.
It also approximates edge deployment by reporting the model's parameter count and single-input CPU inference time, and includes the prediction distribution (for bias analysis) in the annotation.
"""
# %%
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
import gc
import time

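def simulate_edge_deployment(model):
    """Parameter count and single-input CPU latency (seconds) as a rough edge-deployment proxy."""
    original_device = next(model.parameters()).device
    model_cpu = model.to("cpu")  # nn.Module.to() moves the model in place
    model_cpu.eval()
    param_count = sum(p.numel() for p in model_cpu.parameters())
    with torch.no_grad():
        dummy_input = torch.randn(1, input_size)
        start = time.perf_counter()
        _ = model_cpu(dummy_input, 0)
        inference_time = time.perf_counter() - start
    model.to(original_device)  # restore, so later calls with GPU tensors still work
    return param_count, inference_time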
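# %% [markdown]
"""
Timing a single forward pass once is noisy and includes one-off costs such as lazy initialisation. A common alternative is to time several calls with `time.perf_counter()` after a few untimed warm-up calls and report the median. A minimal sketch (`time_inference` is a hypothetical helper, shown with a stand-in workload; for the model the callable would run `model_cpu(dummy_input, 0)` under `torch.no_grad()`):

```python
import statistics
import time

def time_inference(fn, warmup=3, repeats=20):
    # Median wall-clock latency of fn() in seconds, after untimed warm-up calls.
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)

latency = time_inference(lambda: sum(i * i for i in range(10_000)))
print(f"median latency: {latency * 1000:.3f} ms")
```
"""
# %%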

def reset_vram():
    print("🔄 Resetting VRAM...")
    torch.cuda.empty_cache()
    import gc
    gc.collect()
    print("✅ VRAM has been cleared.")

if hasattr(mane_model, 'latent_features') and mane_model.latent_features is not None:
    mane_model.eval()
    with torch.no_grad():
        # The forward pass populates mane_model.latent_features; reuse its logits below
        logits = mane_model(X_train, 0)
        latent_reps = mane_model.latent_features.cpu().numpy()
        meta_score = meta_cognition_metric(torch.softmax(logits, dim=1))
    pca = PCA(n_components=2)
    latent_2d = pca.fit_transform(latent_reps)
    y_train_np = y_train.cpu().numpy()
    plt.figure(figsize=(8, 6))
    plt.scatter(latent_2d[:, 0], latent_2d[:, 1], c=y_train_np, cmap='viridis', alpha=0.7)
    if hasattr(mane_model, 'centroids'):
        # Project centroids with the PCA fitted on the latent features so both share one 2-D space
        centroids_np = mane_model.centroids.detach().cpu().numpy()
        centroids_2d = pca.transform(centroids_np)
        plt.scatter(centroids_2d[:, 0], centroids_2d[:, 1], marker='x', s=200, c='red', label='Centroids')
        plt.legend()  # only labeled artist is the centroid marker
    energy_usage = log_energy_usage()
    modal_consistency = compute_modal_consistency(X_train)
    # Saliency needs gradients, so it runs outside torch.no_grad()
    sample_saliency = compute_saliency_map(mane_model, X_val[:10], target_class=y_val[:10])
    pred_dist = compute_prediction_distribution(mane_model, X_train, epoch=0)
    # Run last: simulate_edge_deployment moves the model to the CPU for timing
    param_count, inference_time = simulate_edge_deployment(mane_model)
    stats_text = (
        f"Energy: {energy_usage:.2f} MB\n"
        f"Modal Consistency: {modal_consistency:.2f}\n"
        f"Meta-Cognition: {meta_score:.2f}\n"
        f"Saliency: {sample_saliency:.2f}\n"
        f"Params: {param_count}\n"
        f"Inference: {inference_time * 1000:.2f} ms\n"
        f"Pred Dist: {pred_dist}"
    )
    plt.figtext(0.15, 0.85, stats_text, fontsize=10)
    plt.title(f"Latent Space Visualization for {dataset_name}")
    plt.xlabel("PCA Component 1")
    plt.ylabel("PCA Component 2")
    plt.show()
else:
    print("The selected architecture does not provide latent features for visualization.")


Using Iris dataset with simulated extra features.
Dataset: Iris | Multi‐Modal Enabled: False
Training data shape: torch.Size([120, 7]), Validation data shape: torch.Size([30, 7])
Unique classes: tensor([0, 1, 2])

Epoch 1: Train Loss=3.3771, Val Loss=0.2207
Epoch 2: Train Loss=1.4303, Val Loss=0.1859
Epoch 3: Train Loss=1.5000, Val Loss=0.1416
Epoch 4: Train Loss=2.5788, Val Loss=0.1047
Epoch 5: Train Loss=1.0207, Val Loss=0.0873
Epoch 6: Train Loss=0.9102, Val Loss=0.0753
Epoch 7: Train Loss=1.8647, Val Loss=0.1583
Epoch 8: Train Loss=1.3028, Val Loss=0.0846
Epoch 9: Train Loss=0.9511, Val Loss=0.1064
Epoch 10: Train Loss=2.9530, Val Loss=0.0557
Epoch 11: Train Loss=0.9651, Val Loss=0.1287
Epoch 12: Train Loss=1.7086, Val Loss=0.0769
Epoch 13: Train Loss=1.9709, Val Loss=0.0519
Epoch 14: Train Loss=0.7127, Val Loss=0.0446
Epoch 15: Train Loss=0.9196, Val Loss=0.0337
Epoch 16: Train Loss=1.3489, Val Loss=0.0278
Epoch 17: Train Loss=0.5920, Val Loss=0.0265
Epoch 18: Train Loss=0.3766, Val Loss=0.0216
Epoch 19: Train Loss=1.6262, Val Loss=0.0138
Epoch 20: Train Loss=0.4714, Val Loss=0.0132
Retraining Epochs: 100%|██████████| 20/20 [00:01<00:00, 16.48it/s]
Retraining run complete.
Retraining Run 1

AttributeError: 'MANEEstimator' object has no attribute 'state_dict'
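The latent-space plot in Cell 10 is only meaningful if the centroids are projected with the PCA fitted on the latent features (same mean, same axes), which is why the code calls `pca.transform` on the centroids instead of fitting a second PCA. Below is a minimal NumPy sketch of that fit/transform split, assuming random stand-in arrays in place of `mane_model.latent_features` and `mane_model.centroids`; the helpers `pca_fit` and `pca_transform` are hypothetical names. For exact solvers it should agree with scikit-learn's `PCA` up to the sign of each axis.

```python
import numpy as np

def pca_fit(X, n_components=2):
    """Fit PCA via SVD; return the training mean and the top principal axes."""
    mean = X.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_transform(X, mean, components):
    """Project points using the mean and axes learned from the fitted data."""
    return (X - mean) @ components.T

rng = np.random.default_rng(0)
latent_reps = rng.normal(size=(120, 16))  # stand-in for latent features (hypothetical)
centroids = latent_reps[:3] + 0.1         # stand-in for learned centroids (hypothetical)

mean, components = pca_fit(latent_reps)
latent_2d = pca_transform(latent_reps, mean, components)
centroids_2d = pca_transform(centroids, mean, components)  # same mean and axes as the data
print(latent_2d.shape, centroids_2d.shape)
```

Fitting a separate PCA on the centroids would center and rotate them independently, so their 2-D coordinates would not be comparable with the data points.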

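The `Inference` figure reported by `simulate_edge_deployment` comes from a single timed call, which can include one-time warm-up costs. A common refinement, sketched here as an assumption rather than what the notebook does, is to run a few warm-up calls and report the median of repeated timings. `measure_latency` and `dummy_forward` are hypothetical names, and only the standard library is used.

```python
import statistics
import time

def measure_latency(fn, *args, warmup=3, repeats=20):
    """Median wall-clock latency of fn(*args) in seconds, measured after warm-up calls."""
    for _ in range(warmup):
        fn(*args)  # early calls may pay one-time costs (allocation, caching, lazy init)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)

def dummy_forward(x):
    # Stand-in for model_cpu(dummy_input, 0) (hypothetical workload)
    return sum(v * v for v in x)

latency = measure_latency(dummy_forward, list(range(1000)))
print(f"Median latency: {latency * 1000:.3f} ms")
```

In the notebook this would be called as `measure_latency(lambda: model_cpu(dummy_input, 0))` inside `torch.no_grad()`. The median is less sensitive than the mean to occasional scheduler or garbage-collection pauses.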
In [None]:
%load_ext tensorboard
%tensorboard --logdir=runs

In [None]:
%load_ext tensorboard
%tensorboard --logdir=runs/iris/

In [None]:
%load_ext tensorboard
%tensorboard --logdir=runs/Wine/

In [None]:
%load_ext tensorboard
%tensorboard --logdir=runs/breast_cancer/

In [None]:
%load_ext tensorboard
%tensorboard --logdir=runs/mnist/

In [None]:
%load_ext tensorboard
%tensorboard --logdir=runs/fashion_mnist/

In [None]:

# %% [markdown]
"""
## Cell 11: VRAM Reset Utility
This cell defines `reset_vram()`, which runs Python garbage collection and then empties the CUDA cache, returning cached GPU memory that no tensor still occupies to the driver.
Call reset_vram() after each candidate evaluation and retraining run.
"""

# %%
import torch
import gc

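def reset_vram():
    """Collect unreferenced objects, then return unused cached CUDA memory to the driver."""
    print("🔄 Resetting VRAM...")
    gc.collect()  # collect first so tensors held only by reference cycles are released
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # release cached blocks no longer occupied by tensors
    print("✅ VRAM has been cleared.")

# Example call; the candidate loop and retraining runs already call reset_vram() after each iteration.
reset_vram()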



🔄 Resetting VRAM...
✅ VRAM has been cleared.
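The order inside `reset_vram()` matters: in CPython, an object caught in a reference cycle is not freed by reference counting alone, and `torch.cuda.empty_cache()` can only release cached blocks that no live tensor occupies. The standard-library sketch below demonstrates the first half of that argument with a plain object; the `Holder` class is a hypothetical stand-in for a module or activation cache that owns a large buffer.

```python
import gc
import weakref

class Holder:
    """Hypothetical stand-in for an object that owns a large buffer."""
    def __init__(self, size):
        self.buffer = bytearray(size)
        self.self_ref = self  # reference cycle: the refcount never reaches zero on its own

def make_cyclic_holder(size=1_000_000):
    obj = Holder(size)
    return weakref.ref(obj)  # a weak reference lets us observe when obj is actually freed

gc.disable()  # keep the automatic collector from running during the demo
try:
    ref = make_cyclic_holder()
    alive_before = ref() is not None  # the cycle keeps the object (and its buffer) alive
    gc.collect()                      # explicit collection finds and frees the unreachable cycle
    alive_after = ref() is not None
finally:
    gc.enable()

print(alive_before, alive_after)  # True False
```

With CUDA tensors in place of the `bytearray`, their memory stays allocated until `gc.collect()` breaks the cycle, so calling `empty_cache()` first would leave those blocks in use.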
