# Function Naming Convention Options


## Introduction

AutoRA is a framework for model discovery, which can propose and run experiments and analyse the resulting data,
fully autonomously. This process runs cyclically and we call the complete process the "cycle" and the individual
steps "tasks".

Our original object-oriented approach to defining the cycle turned out to be too complicated for people to understand,
so we have been building a simpler functional interface for defining cycles.

But we have a problem – the naming convention for the functions is difficult to agree on. The AER group (which is
developing AutoRA and related tools) has asked us to look over the current options and give some feedback.

## The functional interface
A **state** is a description of all the data and metadata known about a particular phenomenon:

- the domain of the variables,
- experimental conditions to be investigated,
- the experimental data,
- the newest model, and
- any other data the cycle might need.

We define a state as follows:

In [None]:
from autora.state.bundled import StandardState
from autora.variable import VariableCollection, Variable

s_0 = StandardState(
    variables=VariableCollection(
        independent_variables=[Variable("x", value_range=(-10, 10))],
        dependent_variables=[Variable("y")]
    )
)

`s_0` doesn't have anything other than the metadata we gave it:

In [None]:
s_0

StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions=None, experiment_data=None, models=[])

The functional interface sees the tasks as functions $f$ on state $S$ which return a new state.
A single task looks like:
$$ f(S_{i}) \rightarrow S_{i+1} ,$$

and a pipeline of such operations looks like:
$$S_n = f_n^\prime(...f_2^\prime(f_1^\prime(S_0))) .$$
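
The pipeline above can be sketched in a few lines, assuming a plain dictionary as a stand-in for the state rather than AutoRA's `StandardState`. The task functions `add_conditions` and `add_observations` and the helper `run_pipeline` are hypothetical names used only for illustration:

```python
from functools import reduce
from typing import Callable, Dict, List

# Toy stand-in for a state: a plain dict. This is NOT AutoRA's StandardState.
ToyState = Dict[str, List[float]]
Task = Callable[[ToyState], ToyState]


def add_conditions(s: ToyState) -> ToyState:
    """Hypothetical task: return a new state with conditions filled in."""
    return {**s, "conditions": [-1.0, 0.0, 1.0]}


def add_observations(s: ToyState) -> ToyState:
    """Hypothetical task: return a new state with one observation per condition."""
    return {**s, "observations": [2.0 * x for x in s["conditions"]]}


def run_pipeline(start: ToyState, tasks: List[Task]) -> ToyState:
    """Apply the tasks in order: S_n = f_n(...f_2(f_1(S_0)))."""
    return reduce(lambda s, f: f(s), tasks, start)


toy_start = {"conditions": [], "observations": []}
toy_final = run_pipeline(toy_start, [add_conditions, add_observations])
print(toy_final)
```

Each task returns a new dictionary instead of modifying its input, so `toy_start` is unchanged after the pipeline has run.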

One task we define is the experimentalist, which proposes new experimental conditions.
One experimentalist is `random_pool`, which takes the variables and returns a set of randomly sampled conditions.
We define it as a plain function:

In [None]:
from typing import Optional
import numpy as np
import pandas as pd

def random_pool(
    variables: VariableCollection,
    num_samples: int = 5,
    random_state: Optional[int] = None,
) -> pd.DataFrame:
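    # Seeded generator, so that runs are reproducible when random_state is given
    rng = np.random.default_rng(random_state)

    # Draw num_samples values uniformly from each independent variable's value_range
    raw_conditions = {}
    for iv in variables.independent_variables:
        raw_conditions[iv.name] = rng.uniform(*iv.value_range, size=num_samples)

    # One column per independent variable, one row per sampled condition
    return pd.DataFrame(raw_conditions)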

Running it on the variables of `s_0` gives a set of conditions sampled uniformly between -10 and +10:

In [None]:
random_pool(s_0.variables)

We still need to do some work so that it can run directly on $S$.

$S$ is defined such that it can be added to: $$S_{i+1} = S_i + \Delta S_{i+1}$$
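
As a rough illustration of this addition, here is a minimal sketch using frozen dataclasses. `ToyState` and `ToyDelta` are hypothetical stand-ins, not AutoRA's `State` and `Delta`, and the sketch assumes that every field named in the delta simply replaces the old value; how the real classes merge each field is not shown here.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class ToyDelta:
    """Hypothetical container holding only the fields which should change."""

    updates: Dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class ToyState:
    """Hypothetical state with two fields; NOT AutoRA's State."""

    conditions: Optional[List[float]] = None
    models: List[str] = field(default_factory=list)

    def __add__(self, delta: ToyDelta) -> "ToyState":
        # Assumption: each field named in the delta replaces the old value.
        # A new state is returned and the original is left untouched.
        return replace(self, **delta.updates)


toy_s_i = ToyState()
toy_s_next = toy_s_i + ToyDelta({"conditions": [0.5, -0.5]})
print(toy_s_next)
```

Because the state is never mutated, `toy_s_i` can still be inspected or reused after `toy_s_next` has been computed.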

To package the `random_pool` function, we return its output as a `Delta`:

In [None]:
from autora.state.delta import Delta


def random_pool_delta(
    variables: VariableCollection,
    num_samples: int = 5,
    random_state: Optional[int] = None,
) -> Delta:
    """
    Create a sequence of conditions randomly sampled from independent variables.

    Args:
        variables: the description of all the variables in the AER experiment.
        num_samples: the number of conditions to produce
        random_state: the seed value for the random number generator

    Returns: a Delta object with the conditions as a pd.DataFrame in the `conditions` field

    """
    conditions = random_pool(
        variables=variables,
        num_samples=num_samples,
        random_state=random_state,
    )
    return Delta(conditions=conditions)

This can be run on the same inputs but produces a differently packaged output:

In [None]:
random_pool_delta(s_0.variables)

{'conditions':           x
0 -6.168705
1  1.143822
2  4.432569
3  9.079492
4 -8.734145}

Finally, we define a wrapper which combines this with $S$, using the `wrap_to_use_state` utility function offered by AutoRA.

In [None]:
from autora.state.delta import State, wrap_to_use_state


def random_pool_state(
    s: State,
    num_samples: int = 5,
    random_state: Optional[int] = None,
    **kwargs,
) -> State:

    return wrap_to_use_state(random_pool_delta)(
        s, num_samples=num_samples, random_state=random_state, **kwargs
    )

Now we can run the function directly on $S$, returning a new state with our conditions included:

In [None]:
random_pool_state(s_0)

StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions=          x
0  0.434308
1 -6.668831
2 -4.216564
3 -1.528779
4 -3.591671, experiment_data=None, models=[])
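
To make the wrapping step less opaque, here is a toy sketch of what such a wrapper might do. It is an assumption for illustration only and not the implementation of `wrap_to_use_state`: the hypothetical `toy_wrap` looks up the state fields whose names match the parameters of the wrapped function, calls the function, and writes the returned fields into a copy of the state. `SketchState` and `toy_pool_delta` are likewise hypothetical.

```python
import inspect
from dataclasses import dataclass, field, fields, replace
from typing import Any, Callable, Dict, List, Optional


@dataclass(frozen=True)
class SketchState:
    """Hypothetical state; NOT AutoRA's StandardState."""

    variables: List[str] = field(default_factory=list)
    conditions: Optional[List[float]] = None


def toy_wrap(f: Callable[..., Dict[str, Any]]) -> Callable[..., SketchState]:
    """Hypothetical wrapper: turn f(fields...) -> dict into g(state) -> state."""
    wanted = inspect.signature(f).parameters

    def on_state(s: SketchState, **kwargs: Any) -> SketchState:
        # Pass every state field whose name matches a parameter of f.
        from_state = {
            fl.name: getattr(s, fl.name) for fl in fields(s) if fl.name in wanted
        }
        # Explicit keyword arguments take precedence over state fields.
        delta = f(**{**from_state, **kwargs})
        # Assumption: returned fields replace the existing ones in a new state.
        return replace(s, **delta)

    return on_state


def toy_pool_delta(variables: List[str], num_samples: int = 3) -> Dict[str, Any]:
    """Hypothetical deterministic pooler returning a dict of changed fields."""
    return {"conditions": [float(i) for i in range(num_samples)]}


toy_pool_state = toy_wrap(toy_pool_delta)
sketch_s_1 = toy_pool_state(SketchState(variables=["x"]), num_samples=2)
print(sketch_s_1)
```

The wrapped function itself never needs to know about the state object: it takes ordinary arguments and returns only the fields it changes.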

The question is: what naming convention should these functions have, given that usually the `_state` version will be
used and that every contribution will need to follow the same convention? There might be multiple poolers and
samplers offered by an AutoRA module.

## Option 1: simple function names with conventional suffixes (or prefixes)

In [None]:
from autora.experimentalist.random_ import random_pool_state
random_pool_state(s_0)

StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions=          x
0  6.183674
1  2.837212
2 -3.392042
3  1.720430
4 -9.221208, experiment_data=None, models=[])

... or ...

In [None]:
from autora.experimentalist.random_ import random_pool_s
random_pool_s(s_0)

StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions=          x
0 -2.414414
1  0.188652
2 -2.501508
3 -0.528629
4 -5.542678, experiment_data=None, models=[])

## Option 2: one state function per module

This option is inspired by the scikit-learn `Regressor().fit(X, y)` syntax, but note that `pooler` in this case is a
module rather than a traditional object, and it shouldn't have any internal state which affects the fitting.

In [None]:
import autora.experimentalist.random_.pool as pooler
pooler.on_state(s_0)

StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions=          x
0  0.445504
1  0.106361
2  5.592574
3  6.849602
4  7.662081, experiment_data=None, models=[])

## Option 3: `run` functions

In [None]:
pooler.run(s_0)

StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions=          x
0  2.373031
1  9.041770
2 -8.355166
3  0.447124
4  7.788639, experiment_data=None, models=[])

... or ...

In [None]:
pooler.run_on_state(s_0)

StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions=          x
0 -6.426886
1  6.937435
2  6.747884
3  3.964029
4 -4.822142, experiment_data=None, models=[])

## Option 4...n: your suggestion?