# Efficient HPO with Evox

 In this chapter, we will discuss how to use Evox for hyperparameter optimization (HPO).
 
 HPO is an essential step in many machine learning tasks, yet it is often neglected. This is mainly due to its heavy computational demands, which can require days of processing, and to the difficulty of deployment.
 
 In Evox, the `HPOProblemWrapper` makes HPO easy to deploy, and the `vmap` functionality together with GPU acceleration keeps the computation efficient.

## Transforming Workflow into Problem

```{image} /_static/HPO_structure.svg
:alt: HPO structure
:width: 700px
:align: center
```

The key to deploying HPO with Evox is to transform the `workflow` into a `problem` using the `HPOProblemWrapper`. The wrapped `workflow` can then be treated like any other `problem`: the input of the 'HPO problem' is a set of hyperparameters, and the output is the evaluation metric.

To enable `HPOProblemWrapper` to recognize the hyperparameters, we need to wrap them with `Parameter`. Once wrapped, they are recognized automatically.

```python
from evox.core import Algorithm, Parameter


class ExampleAlgorithm(Algorithm):
    def __init__(self):
        super().__init__()
        self.omega = Parameter([1.0, 2.0])  # wrap the hyperparameters with `Parameter`
        self.beta = Parameter(0.1)

    def step(self):
        pass  # the update logic of the algorithm goes here
```

The `HPOProblemWrapper` takes 4 arguments:

1. `iterations`: The number of iterations to execute in the optimization process.
2. `num_instances`: The number of instances to execute in parallel in the optimization process.
3. `workflow`: The workflow to use in the optimization process. It must be wrapped by `core.jit_class`.
4. `copy_init_state`: Whether to copy the initial state of the workflow for each evaluation. Defaults to `True`. If your workflow contains operations that modify the tensor(s) in the initial state in place, keep this set to `True`. Otherwise, you can set it to `False` to save memory.
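
As a rough mental model of these arguments, the plain-Python sketch below mimics what a wrapper of this kind does. `ToyWorkflow` and `evaluate_hyperparameters` are hypothetical names invented for illustration, not part of Evox, and the loop over instances only stands in for the parallel execution that Evox obtains with `vmap`.

```python
import copy


class ToyWorkflow:
    """Hypothetical inner workflow: shrinks a point toward the origin."""

    def __init__(self, start):
        self.init_state = {"x": list(start), "best": float("inf")}

    def step(self, state, shrink):
        # One inner iteration: scale the point, then record the sphere value.
        state["x"] = [v * shrink for v in state["x"]]  # modifies the given state
        state["best"] = min(state["best"], sum(v * v for v in state["x"]))
        return state


def evaluate_hyperparameters(workflow, hyperparams, iterations, copy_init_state=True):
    """Return one metric per hyperparameter set (one set per instance)."""
    metrics = []
    for shrink in hyperparams:  # sequential stand-in for parallel instances
        if copy_init_state:
            state = copy.deepcopy(workflow.init_state)
        else:
            state = workflow.init_state
        for _ in range(iterations):
            state = workflow.step(state, shrink)
        metrics.append(state["best"])
    return metrics


workflow = ToyWorkflow(start=[1.0, -2.0])
metrics = evaluate_hyperparameters(workflow, hyperparams=[0.9, 0.5, 0.1], iterations=9)
print(metrics)  # one metric per instance
```

With `copy_init_state=False`, every instance in this sketch would start from the state left behind by the previous one, which is why copying is the safe default whenever a step modifies the state in place.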

## Making Algorithms Parallelizable

In order to make the 'inner algorithm' parallelizable, we may need to make some modifications to the algorithm. We have to make sure that the algorithm satisfies the following conditions:
1. The algorithm should have no function that performs in-place operations on the attributes of the algorithm itself.
```python
import torch


class ExampleAlgorithm:
    def __init__(self):
        self.pop = torch.rand(10, 10)  # attribute of the algorithm itself

    def step_in_place(self, pop):  # function with in-place operations
        self.pop.copy_(pop)

    def step_not_in_place(self, pop):  # function without in-place operations
        self.pop = pop
```

2. The code logic does not rely on conditional control structures.
```python
import torch
from evox.utils import TracingCond


class ExampleAlgorithm:
    def __init__(self):
        self.pop = torch.rand(10, 10)  # attribute of the algorithm itself

    def plus(self, y):
        self.pop = self.pop + y

    def minus(self, y):
        self.pop = self.pop - y

    def step_with_conditional_control(self, y):  # function with conditional control
        x = torch.rand(())
        if x > 0.5:
            self.plus(y)
        else:
            self.minus(y)

    def step_without_conditional_control(self, y):  # function without conditional control
        x = torch.rand(())
        cond = x > 0.5
        _if_else_ = TracingCond(self.plus, self.minus)
        _if_else_.cond(cond, y)
```
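
To see both conditions at work outside Evox, the sketch below uses plain PyTorch. It swaps in `torch.where` as a generic branch-free substitute, which is not the same mechanism as `TracingCond`, and maps the function over several instances with `torch.func.vmap`. The function name `update` and the tensor values are made up for illustration.

```python
import torch
from torch.func import vmap


def update(pop, y, x):
    # Compute both branches, then select with `torch.where`
    # instead of using a Python `if` on the tensor `x`.
    plus = pop + y
    minus = pop - y
    return torch.where(x > 0.5, plus, minus)  # new tensor; `pop` is not modified


pop = torch.ones(4, 3)                  # 4 instances, each holding 3 values
y = torch.full((4, 3), 0.25)
x = torch.tensor([0.1, 0.9, 0.6, 0.3])  # one control number per instance

batched = vmap(update)(pop, y, x)       # all instances in one call
looped = torch.stack([update(pop[i], y[i], x[i]) for i in range(4)])
print(torch.equal(batched, looped))
```

Because `update` neither branches in Python on a tensor value nor writes into its inputs, the batched call gives the same result as evaluating each instance separately.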

In Evox, the `@trace_impl` decorator makes it easy to turn an algorithm into a parallelizable one.

The decorator takes the original, non-parallelizable function as its argument. The decorated function is a rewrite of that function and must support `vmap`.

With this mechanism, the original function remains available outside HPO tasks, while the rewritten version enables efficient computation within them.
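
The idea of keeping two implementations of one function can be sketched in a few lines of plain Python. This is only an illustration of the pattern, not how Evox implements `@trace_impl`; `trace_variant`, `call`, and the `traced` attribute are hypothetical names.

```python
def trace_variant(original):
    """Hypothetical decorator: attach the decorated function to `original`
    as its replacement for traced execution."""
    def register(replacement):
        original.traced = replacement
        return replacement
    return register


def call(func, *args, tracing=False):
    """Use the replacement while tracing, otherwise the original."""
    if tracing and hasattr(func, "traced"):
        return func.traced(*args)
    return func(*args)


def clip(x, limit):
    # Original version: Python branching on the value.
    if x > limit:
        return limit
    return x


@trace_variant(clip)
def clip_branchless(x, limit):
    # Rewrite without an `if` statement; same result.
    return min(x, limit)


print(call(clip, 5, 3))                # runs the original
print(call(clip, 5, 3, tracing=True))  # runs the rewrite
```

Both calls return the same value; only the code path differs, so the original function is still available wherever tracing is not involved.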


## Utilizing the HPOMonitor

We should use `HPOMonitor` in HPO tasks to monitor the metric of the algorithm. Compared with a regular monitor, `HPOMonitor` adds only one method, `tell_fitness`. This keeps the evaluation of a set of hyperparameters flexible, since metrics in HPO tasks are often multi-dimensional and complex.

Users only need to create a subclass of `HPOMonitor` and override the `tell_fitness` method to define their own evaluation metrics.

We also provide a simple `HPOFitnessMonitor`, which supports calculating the 'IGD' and 'HV' metrics for both single-objective and multi-objective problems and always uses the minimum value as the evaluation metric for the hyperparameters.
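
The pattern of reducing a whole run to a single score can be sketched without Evox. The classes below are hypothetical stand-ins: `ToyMonitor` plays the role of a regular monitor, and `MinFitnessMonitor` adds a `tell_fitness` method that reports the minimum fitness seen so far. The method signatures are assumptions and may differ from the real `HPOMonitor`.

```python
class ToyMonitor:
    """Stand-in for a regular monitor: stores the latest fitness values."""

    def __init__(self):
        self.latest = None

    def record(self, fitness):
        self.latest = list(fitness)


class MinFitnessMonitor(ToyMonitor):
    """Hypothetical HPO-style monitor: tracks the minimum fitness seen so far."""

    def __init__(self):
        super().__init__()
        self.best = float("inf")

    def record(self, fitness):
        super().record(fitness)
        self.best = min(self.best, min(self.latest))

    def tell_fitness(self):
        # The single number used to score one set of hyperparameters.
        return self.best


monitor = MinFitnessMonitor()
for fitness in ([3.0, 2.5], [4.0, 1.0], [2.0, 2.0]):  # three inner iterations
    monitor.record(fitness)
print(monitor.tell_fitness())  # minimum over all recorded iterations
```

A different metric, such as the fitness of the final iteration only, would need nothing more than a different `tell_fitness` body.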

## A Simple Example

Here is a simple example of how to use HPO with Evox. We will use the PSO algorithm to search for the best hyperparameters of a simple algorithm that solves the sphere problem.

First, we need to import the necessary modules.

```python
import torch

from evox.algorithms.pso_variants.pso import PSO
from evox.core import Algorithm, Mutable, Parameter, Problem, jit_class, trace_impl
from evox.problems.hpo_wrapper import HPOFitnessMonitor, HPOProblemWrapper
from evox.utils import TracingCond
from evox.workflows import EvalMonitor, StdWorkflow
```

Next, we define a simple problem, the sphere problem. Note that it is no different from any other `problem`.

```python
@jit_class
class Sphere(Problem):
    def __init__(self):
        super().__init__()

    def evaluate(self, x: torch.Tensor):
        return (x * x).sum(-1)
```

Then we can define the algorithm. The original `step` function is not parallelizable, so we rewrite it as `trace_step`, replacing the in-place operations and the conditional control.

```python
@jit_class
class ExampleAlgorithm(Algorithm):
    def __init__(self, pop_size: int, lb: torch.Tensor, ub: torch.Tensor):
        super().__init__()
        assert lb.ndim == 1 and ub.ndim == 1, f"Lower and upper bounds shall have ndim of 1, got {lb.ndim} and {ub.ndim}"
        assert lb.shape == ub.shape, f"Lower and upper bounds shall have same shape, got {lb.shape} and {ub.shape}"
        self.pop_size = pop_size
        self.hp = Parameter([1.0, 2.0, 3.0, 4.0])  # the hyperparameters to be optimized
        self.lb = lb
        self.ub = ub
        self.dim = lb.shape[0]
        self.pop = Mutable(torch.empty(self.pop_size, lb.shape[0], dtype=lb.dtype, device=lb.device))
        self.fit = Mutable(torch.empty(self.pop_size, dtype=lb.dtype, device=lb.device))

    def strategy_1(self, pop):  # one update strategy
        return pop * (self.hp[0] + self.hp[1])

    def strategy_2(self, pop):  # the other update strategy
        return pop * (self.hp[2] + self.hp[3])

    def step(self):
        pop = torch.rand(self.pop_size, self.dim, dtype=self.lb.dtype, device=self.lb.device)  # simply random sampling
        pop = pop * (self.ub - self.lb)[None, :] + self.lb[None, :]
        control_number = torch.rand(())  # scalar random number
        if control_number < 0.5:  # conditional control
            pop = self.strategy_1(pop)
        else:
            pop = self.strategy_2(pop)
        self.pop.copy_(pop)  # in-place update
        self.fit.copy_(self.evaluate(pop))

    @trace_impl(step)  # rewrite the step function to support vmap
    def trace_step(self):
        pop = torch.rand(self.pop_size, self.dim, dtype=self.lb.dtype, device=self.lb.device)
        pop = pop * (self.ub - self.lb)[None, :] + self.lb[None, :]
        pop = pop * self.hp[0]
        control_number = torch.rand(())  # scalar random number
        cond = control_number < 0.5
        _if_else_ = TracingCond(self.strategy_1, self.strategy_2)
        _if_else_.cond(cond, pop)
        self.pop = pop
        self.fit = self.evaluate(pop)
```

Next, we use the `workflow` to wrap the problem, algorithm, and monitor. Then we use the `HPOProblemWrapper` to transform the workflow into an HPO problem.

```python
torch.set_default_device("cuda" if torch.cuda.is_available() else "cpu")
inner_algo = ExampleAlgorithm(10, -10 * torch.ones(8), 10 * torch.ones(8))
inner_prob = Sphere()
inner_monitor = HPOFitnessMonitor()
inner_monitor.setup()
inner_workflow = StdWorkflow()
inner_workflow.setup(inner_algo, inner_prob, monitor=inner_monitor)
# Transform the inner workflow into an HPO problem
hpo_prob = HPOProblemWrapper(iterations=9, num_instances=7, workflow=inner_workflow, copy_init_state=True)
```

We can test whether the `HPOProblemWrapper` recognizes the hyperparameters we define. Since we make no modification to the hyperparameters for the 7 instances, the hyperparameters should be the same for all instances.

```python
params = hpo_prob.get_init_params()
print("init params:\n", params)
```

```text
init params:
 {'self.algorithm.hp': Parameter containing:
tensor([[1., 2., 3., 4.],
        [1., 2., 3., 4.],
        [1., 2., 3., 4.],
        [1., 2., 3., 4.],
        [1., 2., 3., 4.],
        [1., 2., 3., 4.],
        [1., 2., 3., 4.]], device='cuda:0')}
```


We can also specify a set of hyperparameter values ourselves. Note that the number of hyperparameter sets must match the number of instances in the `HPOProblemWrapper`, and that custom hyperparameters must be passed as a dictionary whose values are wrapped as a `Parameter` (here, `torch.nn.Parameter`).

```python
params = hpo_prob.get_init_params()
# since we have 7 instances, we need to pass 7 sets of hyperparameters
params["self.algorithm.hp"] = torch.nn.Parameter(torch.rand(7, 4), requires_grad=False)
result = hpo_prob.evaluate(params)
print("params:\n", params, "\n")
print("result:\n", result)
```

```text
params:
 {'self.algorithm.hp': Parameter containing:
tensor([[0.6151, 0.4238, 0.3006, 0.0424],
        [0.4175, 0.8058, 0.8549, 0.5446],
        [0.2170, 0.1184, 0.5396, 0.5979],
        [0.5893, 0.6488, 0.3352, 0.4294],
        [0.0289, 0.7860, 0.1178, 0.5868],
        [0.8950, 0.0965, 0.4080, 0.5824],
        [0.9049, 0.5265, 0.1892, 0.5064]], device='cuda:0')}

result:
 tensor([32.5368, 13.1591,  2.6436, 24.7213,  0.0959, 38.7169, 77.4396],
       device='cuda:0')
```


Now, we use the PSO algorithm to optimize the hyperparameters of `ExampleAlgorithm`. Note that the population size of the PSO must match the number of instances; otherwise, unexpected errors may occur. We also need to transform the solution in the outer workflow, because the `HPOProblemWrapper` only accepts a dictionary as input.

```python
class solution_transform(torch.nn.Module):
    def forward(self, x: torch.Tensor):
        return {"self.algorithm.hp": x}


outer_algo = PSO(7, -3 * torch.ones(4), 3 * torch.ones(4))
monitor = EvalMonitor(full_sol_history=False)
outer_workflow = StdWorkflow()
outer_workflow.setup(outer_algo, hpo_prob, monitor=monitor, solution_transform=solution_transform())
outer_workflow.init_step()
for _ in range(20):
    outer_workflow.step()
monitor = outer_workflow.get_submodule("monitor")
print("params:\n", monitor.topk_solutions, "\n")
print("result:\n", monitor.topk_fitness)
```

```text
params:
 tensor([[-3.1316e-04, -4.2392e-01, -2.0663e+00,  2.8299e+00]], device='cuda:0')

result:
 tensor([9.0117e-06], device='cuda:0')
```
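
When the inner algorithm has more than one hyperparameter, the solution transform also has to split each flat candidate into named pieces. The plain-Python sketch below illustrates that bookkeeping with lists. `split_solution` and the two keys are hypothetical; in a real workflow the values would be tensors and the transform would be a `torch.nn.Module` like the one above.

```python
def split_solution(rows, layout):
    """Split each flat candidate (one row) into named hyperparameter blocks.

    `layout` maps a hyperparameter name to the number of values it takes.
    """
    width = sum(layout.values())
    params = {name: [] for name in layout}
    for row in rows:
        if len(row) != width:
            raise ValueError(f"expected {width} values per candidate, got {len(row)}")
        start = 0
        for name, size in layout.items():
            params[name].append(row[start:start + size])
            start += size
    return params


# Hypothetical keys that follow the "self.algorithm.<name>" pattern.
layout = {"self.algorithm.omega": 2, "self.algorithm.beta": 1}
candidates = [[1.0, 2.0, 0.1], [0.5, 1.5, 0.2]]  # one row per instance
params = split_solution(candidates, layout)
print(params)
```

Each dictionary entry keeps one row per candidate, which mirrors the requirement that the number of hyperparameter sets match the number of instances.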


The search finds a very good hyperparameter setting for the problem within 1 second.