# Evaluation and SecureEvaluator

> This tutorial demonstrates how to evaluate a function using a user-specified evaluator. The evaluation runs inside a `SecureEvaluator`, which guards against "very bad code" (e.g., code that loops forever, raises unexpected exceptions, consumes too much memory, or leaves an unkilled subprocess behind).

## Evaluation class
The `Evaluation` class (an abstract class) is the user-facing interface. The user should define a child class that extends `Evaluation`.

### Initialization of the Evaluation class
By passing the respective arguments to the `Evaluation` constructor, the user can specify whether to use numba acceleration, whether to use protected division, and the timeout (in seconds) for code execution. Details about all arguments can be found in the base_package/evaluate section of this documentation.

### Implementation of the evaluate_program function
The user should override the `evaluate_program` function of the `Evaluation` class (it is left unimplemented in the base class). The `evaluate_program` function evaluates the algorithm and returns its score. If the user considers the algorithm infeasible/invalid/illegal, the function should return `None`. Otherwise, it should return an int/float value, or any value that is comparable (i.e., one that implements the `>` operator between scores).
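As an illustration of a "comparable" score that is neither an int nor a float, here is a minimal sketch using a standard-library dataclass. The names `Score` and `to_score` are hypothetical and not part of llm4ad; the sketch only assumes that scores are compared with `>`.

```python
import math
from dataclasses import dataclass


@dataclass(order=True)
class Score:
    # Hypothetical composite score: fields are compared in order,
    # so accuracy dominates and neg_runtime only breaks ties.
    accuracy: float
    neg_runtime: float  # negated so that a faster run compares as larger


def to_score(accuracy, runtime_seconds):
    # Return None for an invalid result, mirroring the convention
    # described above for infeasible/invalid/illegal algorithms.
    if not (math.isfinite(accuracy) and math.isfinite(runtime_seconds)):
        return None
    return Score(accuracy=accuracy, neg_runtime=-runtime_seconds)


fast = to_score(0.9, 2.0)
slow = to_score(0.9, 3.0)
print(fast > slow)
```

`dataclass(order=True)` generates the comparison operators, so `fast > slow` works without writing `__gt__` by hand.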

The first argument of the function is `program_str`, the algorithm to be evaluated, given as a `str`. If you set `use_numba_accelerate` or similar settings to `True` at initialization, you will receive the modified version of the function as a `str`. This `str` is provided to let you:

- Compile and execute the code according to your own requirements.
- Take the length or other features of the code into consideration.
- Support other uses, such as calculating the "novelty" of the code or checking whether the code has been evaluated before.

The second argument of the function is `callable_func`, an executable object. You can simply call (invoke) it by passing arguments to it, e.g. `callable_func(arg0, arg1)`.
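To make the calling convention concrete, here is a minimal sketch of scoring a callable on a few input/output pairs. The helper `evaluate_on_cases` and the `test_cases` layout are assumptions made for this example; inside a real evaluator the same loop would sit in `evaluate_program`.

```python
def evaluate_on_cases(callable_func, test_cases):
    """Return the fraction of test cases passed, or None if evaluation fails."""
    if not test_cases:
        return None
    passed = 0
    for args, expected in test_cases:
        try:
            # Invoke the candidate function with positional arguments.
            result = callable_func(*args)
        except Exception:
            # Treat a crashing candidate as invalid.
            return None
        if result == expected:
            passed += 1
    return passed / len(test_cases)


def add(a, b):
    # Stand-in for a candidate function produced elsewhere.
    return a + b


# The last case is deliberately wrong, so only two of three cases pass.
cases = [((1, 2), 3), ((0, 0), 0), ((2, 2), 5)]
print(evaluate_on_cases(add, cases))
```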

## SecureEvaluator class
This class performs secure evaluation based on the user-specified `Evaluation` instance. This tutorial shows a few examples of its features.

## Tutorials
Below are examples of how to use these classes.

In [1]:
from __future__ import annotations

from typing import Any
from llm4ad.base import Evaluation, SecureEvaluator

The user should subclass `llm4ad.base.Evaluation` and override the `evaluate_program` function.

In [2]:
class MyEvaluator(Evaluation):
    def __init__(self):
        super().__init__(
            use_numba_accelerate=True,  # try setting this to False and re-running
            use_protected_div=True,  # avoid division by zero
            timeout_seconds=5,
            template_program=''
        )
    
    # the user should override this function.
    def evaluate_program(self, program_str: str, callable_func: callable, **kwargs) -> Any | None:
        # we consider a "dummy evaluation" for the function:
        # we call (invoke) the function and get its return value as the score of this function
        score = callable_func()
        return score

We create an evaluator instance and wrap it in a `SecureEvaluator` so that we can perform a secure evaluation. We also set the evaluator to debug mode to display the function being evaluated.

In [3]:
evaluator = SecureEvaluator(evaluator=MyEvaluator(), debug_mode=True)

Here we prepare a simple demo algorithm to evaluate (as a `str`).

In [4]:
program = """
import random

def f():
    return random.random() / random.random()
"""

Invoke the `evaluate_program` function to evaluate the program. Note that since `use_numba_accelerate=True` is set in `MyEvaluator`, the evaluated program is wrapped with a `@numba.jit(nopython=True)` decorator.

In [5]:
# Note that the following code should be placed under if __name__ == '__main__'
if __name__ == '__main__':
    score = evaluator.evaluate_program(program)
    print(score)

DEBUG: evaluated program:
import numba
import random

@numba.jit(nopython=True)
def f():
    return _protected_div(random.random(), random.random())

@numba.jit(nopython=True)
def _protected_div(x, y, delta=1e-05):
    return x / (y + delta)

0.755131510901752
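
To see what the protected division in the debug output does, here is a plain-Python copy of the printed helper (without numba, and renamed `protected_div` for this sketch). It is only meant to illustrate the formula `x / (y + delta)` shown above, not the library's internals.

```python
def protected_div(x, y, delta=1e-05):
    # Same formula as the helper printed in the debug output.
    return x / (y + delta)


# A zero denominator no longer raises; the result is large but finite.
print(protected_div(1.0, 0.0))

# Ordinary divisions are perturbed slightly by delta.
print(protected_div(6.0, 3.0))
```

With this formula, a denominator of exactly `-delta` still leads to a division by zero, so the protection covers `y == 0` rather than every possible input.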


Suppose we have obtained a program that contains a `while True` loop. Let's see whether the secure evaluator terminates the evaluation after the `timeout_seconds` specified by the user in the `MyEvaluator` class.

In [6]:
program = """
import random

def f():
    while True:
        pass
"""

Evaluate the program. The debug information shows that the evaluation exceeded 5 seconds and was therefore terminated.

In [7]:
# Note that the following code should be placed under if __name__ == '__main__'
if __name__ == '__main__':
    score = evaluator.evaluate_program(program)
    print(score)

DEBUG: evaluated program:
import numba
import random

@numba.jit(nopython=True)
def f():
    while True:
        pass

@numba.jit(nopython=True)
def _protected_div(x, y, delta=1e-05):
    return x / (y + delta)

DEBUG: the evaluation time exceeds 5s.
None
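
The general idea behind terminating a runaway program can be sketched with the standard library alone. This is not how `SecureEvaluator` is implemented. It is a minimal sketch that assumes the program is run in a separate Python interpreter via `subprocess`, and `run_with_timeout` is a hypothetical helper name.

```python
import subprocess
import sys


def run_with_timeout(program_str, func_name, timeout_seconds):
    """Run `func_name()` from `program_str` in a child interpreter.

    Returns the printed repr of the result, or None on timeout or error.
    """
    # Append a driver line that calls the function and prints its result.
    driver = program_str + f"\nprint(repr({func_name}()))\n"
    try:
        result = subprocess.run(
            [sys.executable, "-c", driver],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        # The program raised an exception or exited abnormally.
        return None
    return result.stdout.strip()


endless = "def f():\n    while True:\n        pass\n"
print(run_with_timeout(endless, "f", timeout_seconds=1))
```

Running the candidate in its own process means an endless loop or a crash cannot take down the evaluating process; the parent simply reports `None`, as in the tutorial output above.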
