# Solving Supervised Learning Tasks with Neuroevolution in EvoX 

EvoX provides neuroevolution-based solutions for supervised learning tasks, with key modules including [`SupervisedLearningProblem`](#evox.problems.neuroevolution.SupervisedLearningProblem) and [`ParamsAndVector`](#evox.utils.parameters_and_vector.ParamsAndVector). Using MNIST classification as an example, this section walks through the neuroevolution workflow for supervised learning with these modules.

## Basic Setup

The process starts with importing the required components and configuring the device.

A random seed can optionally be set to make the results reproducible.

In [1]:
import torch
import torch.nn as nn

from evox.utils import ParamsAndVector
from evox.core import Algorithm, Mutable, Parameter, jit_class
from evox.problems.neuroevolution.supervised_learning import SupervisedLearningProblem
from evox.algorithms import PSO
from evox.workflows import EvalMonitor, StdWorkflow


# Set device
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Set random seed
seed = 0
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True
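
As a quick check of what seeding provides, the sketch below re-seeds the CPU generator and confirms that the same random tensor is drawn twice. It only exercises `torch.manual_seed` on the CPU; GPU kernels may additionally depend on the cuDNN flags set above to behave deterministically.

```python
import torch

torch.manual_seed(0)
first = torch.rand(3)

torch.manual_seed(0)  # re-seeding restarts the random stream
second = torch.rand(3)

print(torch.equal(first, second))  # True
```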

In this step, a sample convolutional neural network (CNN) model is defined directly in PyTorch and then moved to the device.

In [2]:
class SampleCNN(nn.Module):
    def __init__(self):
        super(SampleCNN, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 3, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(3, 3, kernel_size=3),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(3, 3, kernel_size=3),
            nn.ReLU(),
            nn.Conv2d(3, 3, kernel_size=3),
            nn.ReLU(),
        )
        self.classifier = nn.Sequential(nn.Flatten(), nn.Linear(12, 10))

    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return x

model = SampleCNN().to(device)

total_params = sum(p.numel() for p in model.parameters())
print(f"Total number of model parameters: {total_params}")

Total number of model parameters: 412
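
The reported total of 412 and the input width of `nn.Linear(12, 10)` can be checked by hand. The sketch below is plain Python arithmetic, assuming 28x28 single-channel MNIST inputs and the usual output-size rule for convolution and pooling layers; the helper names `conv_out` and `conv_params` are illustrative and not part of PyTorch or EvoX.

```python
def conv_out(size: int, kernel: int, padding: int = 0, stride: int = 1) -> int:
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_params(in_ch: int, out_ch: int, kernel: int) -> int:
    """Weights plus biases of a square-kernel Conv2d layer."""
    return out_ch * in_ch * kernel * kernel + out_ch


size = 28  # MNIST images are 28x28
size = conv_out(size, kernel=3, padding=1)  # Conv2d(1, 3, 3, padding=1) -> 28
size = conv_out(size, kernel=2, stride=2)   # MaxPool2d(2, 2)            -> 14
size = conv_out(size, kernel=3)             # Conv2d(3, 3, 3)            -> 12
size = conv_out(size, kernel=2, stride=2)   # MaxPool2d(2, 2)            -> 6
size = conv_out(size, kernel=3)             # Conv2d(3, 3, 3)            -> 4
size = conv_out(size, kernel=3)             # Conv2d(3, 3, 3)            -> 2

flat_features = 3 * size * size  # 3 channels x 2 x 2 = 12
total = (
    conv_params(1, 3, 3)         # first convolution: 30
    + 3 * conv_params(3, 3, 3)   # three 3-to-3 convolutions: 3 x 84 = 252
    + flat_features * 10 + 10    # Linear(12, 10): 130
)
print(flat_features, total)  # 12 412
```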


Choosing the dataset determines the task. The data loaders are initialized with PyTorch's built-in utilities.
If the `torchvision` package is not already available, install a release that matches your PyTorch version beforehand.

If the MNIST dataset is not already present in the `data_root` directory, the `download=True` flag ensures that it is downloaded automatically, so the first run may take some time.

In [3]:
import os
import torchvision


data_root = "./data" # Choose a path to save dataset
os.makedirs(data_root, exist_ok=True)
train_dataset = torchvision.datasets.MNIST(
    root=data_root,
    train=True,
    download=True,
    transform=torchvision.transforms.ToTensor(),
)
test_dataset = torchvision.datasets.MNIST(
    root=data_root,
    train=False,
    download=True,
    transform=torchvision.transforms.ToTensor(),
)


BATCH_SIZE = 100
train_loader = torch.utils.data.DataLoader(
    train_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True,
    collate_fn=None,
)
test_loader = torch.utils.data.DataLoader(
    test_dataset,
    batch_size=BATCH_SIZE,
    shuffle=False,
    collate_fn=None,
)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz to ./data/MNIST/raw/train-images-idx3-ubyte.gz


100%|██████████| 9.91M/9.91M [00:02<00:00, 3.37MB/s]


Extracting ./data/MNIST/raw/train-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz to ./data/MNIST/raw/train-labels-idx1-ubyte.gz


100%|██████████| 28.9k/28.9k [00:00<00:00, 127kB/s]


Extracting ./data/MNIST/raw/train-labels-idx1-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw/t10k-images-idx3-ubyte.gz


100%|██████████| 1.65M/1.65M [00:06<00:00, 243kB/s] 


Extracting ./data/MNIST/raw/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz


100%|██████████| 4.54k/4.54k [00:00<00:00, 4.62MB/s]

Extracting ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw






To accelerate subsequent processes, all MNIST data are pre-loaded onto the device. Below, three datasets are pre-loaded for different stages: gradient descent training, neuroevolution fine-tuning, and model testing.

Note that this optional step trades memory for time. Whether to use it depends on your GPU capacity, and the pre-loading itself takes some time.

In [4]:
# Used for gradient descent training process
pre_gd_train_loader = tuple([
    (inputs.to(device), labels.to(device)) for inputs, labels in train_loader
])

# Used for neuroevolution fine-tuning process
pre_ne_train_loader = tuple([
    (
        inputs.to(device),
        labels.type(torch.float).unsqueeze(1).repeat(1, 10).to(device),
    )
    for inputs, labels in train_loader
])

# Used for model testing process
pre_test_loader = tuple([
    (inputs.to(device), labels.to(device)) for inputs, labels in test_loader
])
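
To make the label transformation in `pre_ne_train_loader` concrete, the sketch below applies the same chain of calls to a small made-up label batch on the CPU. Each integer class label becomes a float row of ten identical values, presumably so that the labels share the `(batch, 10)` shape of the logits; the original class index can still be read from column 0.

```python
import torch

labels = torch.tensor([3, 0, 7])  # made-up class labels for three samples
expanded = labels.type(torch.float).unsqueeze(1).repeat(1, 10)

print(expanded.shape)  # torch.Size([3, 10])
print(expanded[:, 0])  # tensor([3., 0., 7.])
```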

Here, a `model_test` function is pre-defined to simplify the evaluation of the model's prediction accuracy on the test dataset during subsequent stages.

In [5]:
def model_test(model: nn.Module, data_loader: torch.utils.data.DataLoader, device: torch.device) -> float:
    model.eval()
    with torch.no_grad():
        total = 0
        correct = 0
        for inputs, labels in data_loader:
            inputs: torch.Tensor = inputs.to(device=device, non_blocking=True)
            labels: torch.Tensor = labels.to(device=device, non_blocking=True)

            logits = model(inputs)
            _, predicted = torch.max(logits.data, dim=1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
        acc = 100 * correct / total
    return acc

## Gradient Descent Training (Optional)

Gradient-descent-based model training is performed first. In this example, it initializes the model in preparation for the subsequent neuroevolution process.

The model training process in PyTorch is compatible with neuroevolution in EvoX, making it convenient to reuse the same model implementation for further steps.

In [6]:
def model_train(
    model: nn.Module,
    data_loader: torch.utils.data.DataLoader,
    criterion: nn.Module,
    optimizer: torch.optim.Optimizer,
    max_epoch: int,
    device: torch.device,
    print_frequent: int = -1,
) -> nn.Module:
    model.train()
    for epoch in range(max_epoch):
        running_loss = 0.0
        for step, (inputs, labels) in enumerate(data_loader, start=1):
            inputs: torch.Tensor = inputs.to(device=device, non_blocking=True)
            labels: torch.Tensor = labels.to(device=device, non_blocking=True)

            optimizer.zero_grad()
            logits = model(inputs)
            loss = criterion(logits, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            if print_frequent > 0 and step % print_frequent == 0:
                print(f"[Epoch {epoch:2d}, step {step:4d}] " f"running loss: {running_loss:.4f} ")
                running_loss = 0.0
    return model

In [7]:
model_train(
    model,
    data_loader=pre_gd_train_loader,
    criterion=nn.CrossEntropyLoss(),
    optimizer=torch.optim.Adam(model.parameters(), lr=1e-2),
    max_epoch=3,
    device=device,
    print_frequent=500,
)

gd_acc = model_test(model, pre_test_loader, device)
print(f"Accuracy after gradient descent training: {gd_acc:.4f} %.")

[Epoch  0, step  500] running loss: 395.5136 
[Epoch  1, step  500] running loss: 230.1449 
[Epoch  2, step  500] running loss: 208.1124 
Accuracy after gradient descent training: 89.4100 %.


## Neuroevolution Fine-Tuning

Based on the pre-trained model from the previous gradient descent process, neuroevolution is progressively applied to fine-tune the model.

First, the [`ParamsAndVector`](#evox.utils.parameters_and_vector.ParamsAndVector) component is used to flatten the weights of the pre-trained model into a vector, which serves as the initial center individual for the subsequent neuroevolution process.

In [8]:
adapter = ParamsAndVector(dummy_model=model)
model_params = dict(model.named_parameters())
pop_center = adapter.to_vector(model_params)
lower_bound = pop_center - 0.01
upper_bound = pop_center + 0.01
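
Conceptually, the adapter maps between a dictionary of parameter tensors and one flat vector. The sketch below illustrates that idea with PyTorch's own `parameters_to_vector` and `vector_to_parameters` utilities on a tiny stand-in model. It is not EvoX's implementation of `ParamsAndVector`, and the model `tiny` is hypothetical.

```python
import torch
import torch.nn as nn
from torch.nn.utils import parameters_to_vector, vector_to_parameters

tiny = nn.Linear(3, 2)  # hypothetical stand-in model: 3*2 weights + 2 biases

# Flatten all parameters into a single 1-D vector.
flat = parameters_to_vector(tiny.parameters()).detach().clone()
print(flat.shape)  # torch.Size([8])

# Search bounds centered on the current weights, as in the cell above.
lb, ub = flat - 0.01, flat + 0.01

# Write a candidate vector (here the upper bound) back into the model.
vector_to_parameters(ub, tiny.parameters())
restored = parameters_to_vector(tiny.parameters()).detach()
print(torch.allclose(restored, ub))  # True
```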

> For algorithms designed specifically for neuroevolution that directly accept a dictionary of batched parameters as input, [`ParamsAndVector`](#evox.utils.parameters_and_vector.ParamsAndVector) may be unnecessary.

Additionally, a sample criterion is defined. Here, the loss and the accuracy of an individual model are weighted and combined into the fitness function for the neuroevolution process. This step can be customized to suit the optimization objective.

In [9]:
class AccuracyCriterion(nn.Module):
    def __init__(self, data_loader):
        super().__init__()
        # NOTE: `data_loader` is accepted but not used; `forward` only needs
        # the logits and labels it receives.

    def forward(self, logits, labels):
        _, predicted = torch.max(logits, dim=1)
        correct = (predicted == labels[:, 0]).sum()
        fitness = -correct
        return fitness

acc_criterion = AccuracyCriterion(pre_ne_train_loader)
loss_criterion = nn.MSELoss()


class WeightedCriterion(nn.Module):
    def __init__(self, loss_weight, loss_criterion, acc_weight, acc_criterion):
        super().__init__()
        self.loss_weight = loss_weight
        self.loss_criterion = loss_criterion
        self.acc_weight = acc_weight
        self.acc_criterion = acc_criterion

    def forward(self, logits, labels):
        weighted_loss = self.loss_weight * self.loss_criterion(logits, labels)
        weighted_acc = self.acc_weight * self.acc_criterion(logits, labels)
        return weighted_loss + weighted_acc


weighted_criterion = WeightedCriterion(
    loss_weight=0.5,
    loss_criterion=loss_criterion,
    acc_weight=0.5,
    acc_criterion=acc_criterion,
)
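
To see how the two terms combine, the sketch below evaluates the same formula by hand on a made-up batch of two samples. It is a plain PyTorch calculation that mirrors `WeightedCriterion` with both weights at 0.5; the tensors are illustrative and not taken from MNIST. Because the accuracy term is negated, each additional correct prediction lowers the fitness value.

```python
import torch
import torch.nn as nn

# Two samples, ten classes; the true labels are 1 and 2.
logits = torch.zeros(2, 10)
logits[0, 1] = 1.0  # sample 0 predicts class 1 (correct)
logits[1, 5] = 2.0  # sample 1 predicts class 5 (wrong)
labels = torch.tensor([1.0, 2.0]).unsqueeze(1).repeat(1, 10)

# MSE between logits and repeated labels: (9 * 1 + 9 * 4) / 20 = 2.25
mse = nn.MSELoss()(logits, labels)
# One of the two predictions matches the label stored in column 0.
correct = (logits.argmax(dim=1) == labels[:, 0]).sum()
fitness = 0.5 * mse + 0.5 * (-correct)

print(mse.item(), correct.item(), fitness.item())  # 2.25 1 0.625
```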

At the same time, similar to the gradient descent training and model testing processes, the neuroevolution fine-tuning process is also encapsulated into a function for convenient use in subsequent stages.

In [10]:
import time


def neuroevolution_process(
    workflow: StdWorkflow, 
    adapter: ParamsAndVector, 
    model: nn.Module, 
    test_loader: torch.utils.data.DataLoader, 
    device: torch.device, 
    best_acc: float, 
    max_generation: int = 2,
) -> None:
    for index in range(max_generation):
        print(f"In generation {index}:")
        t = time.time()
        workflow.step()
        print(f"\tTime elapsed: {time.time() - t: .4f}(s).")

        monitor = workflow.get_submodule("monitor")
        print(f"\tTop fitness: {monitor.topk_fitness}")
        best_params = adapter.to_params(monitor.topk_solutions[0])
        model.load_state_dict(best_params)
        acc = model_test(model, test_loader, device)
        if acc > best_acc:
            best_acc = acc
        print(f"\tBest accuracy: {best_acc:.4f} %.")

### Population-Based Neuroevolution Test

In this example, a population-based algorithm for neuroevolution is tested first, using Particle Swarm Optimization ([PSO](#evox.algorithms.pso_variants.PSO)) as a representative. The configuration for neuroevolution is similar to that of other optimization tasks: we need to define the problem, algorithm, monitor, and workflow, and call their respective `setup()` functions to complete the initialization.

A key point to note here is that the population size (`POP_SIZE` in this case) needs to be initialized in **both the problem and the algorithm** to avoid potential errors.

In [11]:
POP_SIZE = 100
vmapped_problem = SupervisedLearningProblem(
    model=model,
    data_loader=pre_ne_train_loader,
    criterion=weighted_criterion,
    pop_size=POP_SIZE,
    device=device,
)
vmapped_problem.setup()

pop_algorithm = PSO(
    pop_size=POP_SIZE,
    lb=lower_bound,
    ub=upper_bound,
    device=device,
)
pop_algorithm.setup()

pop_monitor = EvalMonitor(
    topk=3,
    device=device,
)
pop_monitor.setup()

pop_workflow = StdWorkflow()
pop_workflow.setup(
    algorithm=pop_algorithm,
    problem=vmapped_problem,
    solution_transform=adapter,
    monitor=pop_monitor,
    device=device,
)
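
The name `vmapped_problem` hints at how a whole population can be evaluated in one batched forward pass. The sketch below shows that general idea with `torch.func.functional_call` and `torch.func.vmap` on a tiny stand-in model. It assumes PyTorch 2.x and is an illustration only, not a description of how `SupervisedLearningProblem` works internally; the model `tiny` and the helper `forward_one` are hypothetical.

```python
import torch
import torch.nn as nn
from torch.func import functional_call, vmap

torch.manual_seed(0)
tiny = nn.Linear(4, 2)  # hypothetical stand-in model
pop_size = 5

# Build a population: each parameter gains a leading `pop_size` dimension
# and a small random perturbation around the current weights.
pop_params = {
    name: p.detach().unsqueeze(0) + 0.01 * torch.randn(pop_size, *p.shape)
    for name, p in tiny.named_parameters()
}

inputs = torch.randn(8, 4)  # one shared batch of 8 samples


def forward_one(params, x):
    # Run the model with an explicit set of parameters instead of its own.
    return functional_call(tiny, params, (x,))


# Map over the population dimension of the parameters; share the inputs.
pop_logits = vmap(forward_one, in_dims=(0, None))(pop_params, inputs)
print(pop_logits.shape)  # torch.Size([5, 8, 2])
```

Each slice `pop_logits[i]` holds the outputs of the `i`-th individual, so a criterion can then be applied per individual to obtain one fitness value each.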

In [12]:
print(
    "Upon gradient descent, "
    "the population-based neuroevolution process start. "
)
neuroevolution_process(
    workflow=pop_workflow,
    adapter=adapter,
    model=model,
    test_loader=pre_test_loader,
    device=device,
    best_acc=gd_acc,
    max_generation=3,
)

Upon gradient descent, the population-based neuroevolution process start. 
In generation 0:
	Time elapsed:  0.9102(s).
	Top fitness: tensor([4.1156, 5.0516, 5.2372], device='cuda:0')
	Best accuracy: 89.8800 %.
In generation 1:
	Time elapsed:  1.3163(s).
	Top fitness: tensor([3.6362, 3.6640, 3.6765], device='cuda:0')
	Best accuracy: 89.8800 %.
In generation 2:
	Time elapsed:  1.0904(s).
	Top fitness: tensor([3.6362, 3.6640, 3.6765], device='cuda:0')
	Best accuracy: 89.8800 %.


### Single-Individual Neuroevolution Test

Next, neuroevolution with a single-individual algorithm is tested. As in the population-based case, we need to define the problem, algorithm, monitor, and workflow, and call their respective `setup()` functions during initialization. Here, a random search strategy is selected as the algorithm.

A key point to note here is that [`SupervisedLearningProblem`](#evox.problems.neuroevolution.SupervisedLearningProblem) should be set with `pop_size=None`, and [`EvalMonitor`](#evox.workflows.EvalMonitor) should have `topk=1`, as only a single individual is being searched. A careful hyper-parameter setup helps avoid unnecessary issues.

In [13]:
single_problem = SupervisedLearningProblem(
    model=model,
    data_loader=pre_ne_train_loader,
    criterion=weighted_criterion,
    pop_size=None,
    device=device,
)
single_problem.setup()

@jit_class
class RandAlgorithm(Algorithm):
    def __init__(self, lb, ub):
        super().__init__()
        assert lb.ndim == 1 and ub.ndim == 1, (
            f"Lower and upper bounds shall have ndim of 1, " f"got {lb.ndim} and {ub.ndim}. "
        )
        assert lb.shape == ub.shape, f"Lower and upper bounds shall have same shape, " f"got {lb.shape} and {ub.shape}. "
        self.hp = Parameter([1.0, 2.0])
        self.lb = lb
        self.ub = ub
        self.dim = lb.shape[0]
        self.pop = Mutable(torch.empty(1, lb.shape[0], dtype=lb.dtype, device=lb.device))
        self.fit = Mutable(torch.empty(1, dtype=lb.dtype, device=lb.device))

    def step(self):
        pop = torch.rand(
            self.dim,
            dtype=self.lb.dtype,
            device=self.lb.device,
        )
        pop = pop * (self.ub - self.lb)[None, :] + self.lb[None, :]
        pop = pop * self.hp[0]
        self.pop.copy_(pop)
        self.fit.copy_(self.evaluate(pop))


single_algorithm = RandAlgorithm(lb=lower_bound, ub=upper_bound)

single_monitor = EvalMonitor(
    topk=1,
    device=device,
)
single_monitor.setup()

single_workflow = StdWorkflow()
single_workflow.setup(
    algorithm=single_algorithm,
    problem=single_problem,
    solution_transform=adapter,
    monitor=single_monitor,
    device=device,
)
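
The sampling rule inside `RandAlgorithm.step` can be checked in isolation. The sketch below draws one candidate with the same scale-and-shift expression on made-up bounds and confirms that it has shape `(1, dim)` and lies within `[lb, ub]`; the values in `center` are illustrative stand-ins for `pop_center`.

```python
import torch

torch.manual_seed(0)
center = torch.tensor([0.5, -1.0, 2.0])  # made-up stand-in for `pop_center`
lb, ub = center - 0.01, center + 0.01

# Uniform sample in [0, 1), rescaled to the box [lb, ub].
sample = torch.rand(lb.shape[0])
candidate = sample * (ub - lb)[None, :] + lb[None, :]

print(candidate.shape)  # torch.Size([1, 3])
print(bool(((candidate >= lb) & (candidate <= ub)).all()))  # True
```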

In [14]:
print(
    "Upon gradient descent, "
    "the single-individual neuroevolution process start. "
)
neuroevolution_process(
    workflow=single_workflow,
    adapter=adapter,
    model=model,
    test_loader=pre_test_loader,
    device=device,
    best_acc=gd_acc,
    max_generation=12,
)

Upon gradient descent, the single-individual neuroevolution process start. 
In generation 0:
	Time elapsed:  0.5737(s).
	Top fitness: tensor([5.8806], device='cuda:0')
	Best accuracy: 89.4600 %.
In generation 1:
	Time elapsed:  0.5409(s).
	Top fitness: tensor([5.8806], device='cuda:0')
	Best accuracy: 89.4600 %.
In generation 2:
	Time elapsed:  0.4984(s).
	Top fitness: tensor([5.8806], device='cuda:0')
	Best accuracy: 89.4600 %.
In generation 3:
	Time elapsed:  0.3844(s).
	Top fitness: tensor([5.8806], device='cuda:0')
	Best accuracy: 89.4600 %.
In generation 4:
	Time elapsed:  0.3857(s).
	Top fitness: tensor([5.8806], device='cuda:0')
	Best accuracy: 89.4600 %.
In generation 5:
	Time elapsed:  0.3844(s).
	Top fitness: tensor([5.8806], device='cuda:0')
	Best accuracy: 89.4600 %.
In generation 6:
	Time elapsed:  0.3862(s).
	Top fitness: tensor([5.8806], device='cuda:0')
	Best accuracy: 89.4600 %.
In generation 7:
	Time elapsed:  0.3851(s).
	Top fitness: tensor([5.8806], device='cuda:0')