# Function Naming Convention Options


In [None]:
!pip install "git+https://github.com/autoresearch/autora-core.git@feat/function-naming-options"



## Introduction

AutoRA is a framework for model discovery, which can propose and run experiments and analyse the resulting data,
fully autonomously. This process runs cyclically and we call the complete process the "cycle" and the individual
steps "tasks".

Our original object-oriented approach to defining the cycle turned out to be too complicated for people to understand.
We've been building a simpler functional interface for defining cycles.

But we have a problem – the naming convention for the functions is difficult to agree on. The AER group (which is
developing AutoRA and related tools) has asked us to look over the current options and give some feedback.

## The functional interface
A **state** is a description of all the data and metadata known about a particular phenomenon:

- the domain of the variables,
- the experimental conditions to be investigated,
- the experimental data,
- the newest model, and
- any other data the cycle might need.

We define a state as follows:

In [None]:
from autora.state.bundled import StandardState
from autora.variable import VariableCollection, Variable

s_0 = StandardState(
    variables=VariableCollection(
        independent_variables=[Variable("x", value_range=(-10, 10))],
        dependent_variables=[Variable("y")]
    )
)

`s_0` doesn't have anything other than the metadata we gave it:

In [None]:
s_0

StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions=None, experiment_data=None, models=[])

The functional interface sees the tasks as functions $f$ on state $S$ which return a new state.
A single task looks like:
$$ f(S_{i}) \rightarrow S_{i+1} ,$$

and a pipeline of such operations looks like:
$$S_n = f_n(\dots f_2(f_1(S_0))) .$$
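
As a rough illustration of the composition above, the following sketch chains two state-to-state functions with `functools.reduce`. Everything in it is an assumption made for illustration: `PipelineState`, `propose`, `observe` and `pipeline` are hypothetical names and are not part of AutoRA.

```python
from dataclasses import dataclass, replace
from functools import reduce
from typing import Callable, Tuple


@dataclass(frozen=True)
class PipelineState:
    """Hypothetical stand-in for a state; not an AutoRA class."""
    conditions: Tuple[float, ...] = ()
    observations: Tuple[float, ...] = ()


def propose(s: PipelineState) -> PipelineState:
    # First task: add three fixed experimental conditions.
    return replace(s, conditions=(1.0, 2.0, 3.0))


def observe(s: PipelineState) -> PipelineState:
    # Second task: "run" the experiment by doubling each condition.
    return replace(s, observations=tuple(2 * x for x in s.conditions))


def pipeline(
    *tasks: Callable[[PipelineState], PipelineState]
) -> Callable[[PipelineState], PipelineState]:
    # Apply the tasks left to right, so pipeline(a, b)(s) == b(a(s)).
    return lambda s: reduce(lambda acc, task: task(acc), tasks, s)


s_start = PipelineState()
s_end = pipeline(propose, observe)(s_start)
print(s_end)
```

Because each task returns a new state rather than mutating its input, `s_start` is unchanged after the pipeline has run.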

One task we define is the experimentalist, which proposes new experimental conditions.
One experimentalist is `random_pool`, which takes the variables and returns a set of randomly sampled conditions.
We define it as a plain function of those variables:

In [None]:
from typing import Optional
import numpy as np
import pandas as pd

def random_pool(
    variables: VariableCollection,
    num_samples: int = 5,
    random_state: Optional[int] = None,
) -> pd.DataFrame:
    rng = np.random.default_rng(random_state)

    raw_conditions = {}
    for iv in variables.independent_variables:
        raw_conditions[iv.name] = rng.uniform(*iv.value_range, size=num_samples)

    return pd.DataFrame(raw_conditions)

And running it on the variables results in a series of conditions sampled uniformly between -10 and +10:

In [None]:
random_pool(s_0.variables)

          x
0 -2.291109
1 -6.603135
2 -8.877414
3  3.108730
4  8.169106


We still need to do some work so that it can run directly on $S$.

$S$ is defined such that it can be added to: $$S_{i+1} = S_i + \Delta S_{i+1}$$
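
One way to picture this addition is a frozen dataclass whose `__add__` returns a copy with the changed fields applied. The sketch below is only a toy under that assumption: `AddableState` and `ToyDelta` are hypothetical names, and the real AutoRA classes may combine fields differently (for instance by extending existing data rather than replacing it).

```python
from dataclasses import dataclass, fields, replace
from typing import Optional, Tuple


class ToyDelta(dict):
    """Hypothetical record of changed fields; just a dict subclass."""


@dataclass(frozen=True)
class AddableState:
    """Hypothetical state with two fields; not an AutoRA class."""
    conditions: Optional[Tuple[float, ...]] = None
    experiment_data: Optional[Tuple[float, ...]] = None

    def __add__(self, delta: ToyDelta) -> "AddableState":
        # Only keys that name a field of the state are applied; others are ignored.
        known = {f.name for f in fields(self)}
        updates = {k: v for k, v in delta.items() if k in known}
        return replace(self, **updates)


s_i = AddableState()
s_next = s_i + ToyDelta(conditions=(0.5, -1.5))
print(s_next)
```

The original `s_i` is left untouched, which is what allows each step of a cycle to be kept and inspected later.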

To package the `random_pool` function this way, we wrap its output in a `Delta`:

In [None]:
from autora.state.delta import Delta


def random_pool_delta(
    variables: VariableCollection,
    num_samples: int = 5,
    random_state: Optional[int] = None,
):
    """
    Create a sequence of conditions randomly sampled from independent variables.

    Args:
        variables: the description of all the variables in the AER experiment.
        num_samples: the number of conditions to produce
        random_state: the seed value for the random number generator

    Returns: a Result / Delta object with the conditions as a pd.DataFrame in the `conditions` field

    """
    conditions = random_pool(
        variables=variables,
        num_samples=num_samples,
        random_state=random_state,
    )
    return Delta(conditions=conditions)

This can be run on the same inputs, but it produces a differently packaged output:

In [None]:
random_pool_delta(s_0.variables)

{'conditions':           x
0 -1.235222
1 -6.908781
2 -2.617692
3 -4.960670
4  0.743513}

Finally, we define a wrapper which applies this function to $S$, using the `wrap_to_use_state` utility function offered by AutoRA.

In [None]:
from autora.state.delta import State, wrap_to_use_state


def random_pool_state(
    s: State,
    num_samples: int = 5,
    random_state: Optional[int] = None,
    **kwargs,
) -> State:

    return wrap_to_use_state(random_pool_delta)(
        s, num_samples=num_samples, random_state=random_state, **kwargs
    )

Now we can run the function directly on $S$, returning a new state with our conditions included:

In [None]:
random_pool_state(s_0)

StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions=          x
0  5.510190
1 -9.734381
2 -9.247260
3 -3.880819
4 -7.846659, experiment_data=None, models=[])
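
To give an intuition for what such a wrapper has to do, here is a minimal sketch that inspects the wrapped function's parameter names, fills them from matching state fields, and applies the returned changes to the state. It is not the AutoRA implementation, and the real `wrap_to_use_state` may work differently; `SketchState`, `toy_wrap_to_use_state` and `constant_pool_delta` are hypothetical names.

```python
import inspect
from dataclasses import dataclass, fields, replace
from functools import wraps
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class SketchState:
    """Hypothetical state with two fields; not an AutoRA class."""
    variables: Tuple[str, ...] = ()
    conditions: Optional[Tuple[float, ...]] = None


def toy_wrap_to_use_state(f: Callable[..., dict]) -> Callable[..., SketchState]:
    """Adapt f (fields in, dict of changes out) into a state-to-state function."""
    wanted = set(inspect.signature(f).parameters)

    @wraps(f)
    def on_state(s: SketchState, **kwargs) -> SketchState:
        # Pull every state field that f names as a parameter ...
        from_state = {
            fl.name: getattr(s, fl.name) for fl in fields(s) if fl.name in wanted
        }
        # ... let explicit keyword arguments take precedence, then apply the changes.
        delta = f(**{**from_state, **kwargs})
        return replace(s, **delta)

    return on_state


def constant_pool_delta(variables: Tuple[str, ...], num_samples: int = 3) -> dict:
    # One condition per sample; the value is just the number of variables.
    return {"conditions": tuple(float(len(variables)) for _ in range(num_samples))}


constant_pool_state = toy_wrap_to_use_state(constant_pool_delta)
result = constant_pool_state(SketchState(variables=("x",)), num_samples=2)
print(result)
```

The point of the design is that the inner function never needs to know about the state type: it only declares which fields it reads and which it changes.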

The question is: what naming convention should these functions have, given that usually the `_state` version will be
used and that every contribution will need to follow the same convention? There might be multiple poolers and
samplers offered by an AutoRA module.

## The problem and the options in the simplest case

### Option 1: simple function names with conventional suffixes (or prefixes)

In [None]:
from autora.experimentalist.random_ import random_pool_state
random_pool_state(s_0)

StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions=          x
0  1.802195
1 -6.681581
2 -5.298816
3 -8.727805
4  9.767906, experiment_data=None, models=[])

... or ...

In [None]:
from autora.experimentalist.random_ import random_pool_s
random_pool_s(s_0)

StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions=          x
0 -0.084047
1  6.874185
2 -6.176624
3 -5.670282
4 -6.865156, experiment_data=None, models=[])

### Option 2: one state function per module

This option is inspired by the scikit-learn `Regressor().fit(X, y)` syntax, but note that `pool` in this case is a
module rather than a traditional object, and it shouldn't have any internal state which affects the result of the call.

In [None]:
import autora.experimentalist.random_.pool as pool
pool.on_state(s_0)

StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions=          x
0  1.929206
1  5.592331
2  6.558775
3 -8.900612
4  6.128046, experiment_data=None, models=[])
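
One consequence of this option is that every module exposes a function with the same name, so the alias chosen at import time is what tells the calls apart. The sketch below mimics that with `types.SimpleNamespace` objects standing in for modules; the namespaces, aliases and placeholder functions are all hypothetical and do not exist in AutoRA.

```python
from types import SimpleNamespace


def _random_pool_on_state(state: dict) -> dict:
    # Placeholder behaviour: record which task ran.
    return {**state, "last_task": "random pool"}


def _grid_pool_on_state(state: dict) -> dict:
    # Placeholder behaviour: record which task ran.
    return {**state, "last_task": "grid pool"}


# Stand-ins for two modules that each expose a function called `on_state`.
toy_pool = SimpleNamespace(on_state=_random_pool_on_state)
toy_grid = SimpleNamespace(on_state=_grid_pool_on_state)

# The call sites differ only in the alias, not in the function name.
print(toy_pool.on_state({})["last_task"])
print(toy_grid.on_state({})["last_task"])
```

With suffix-based names (Option 1) the same information would instead live in the function name itself, which survives a `from ... import ...` statement.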

### Option 3: `run` functions

In [None]:
pool.run(s_0)

StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions=          x
0 -0.114128
1  2.292411
2 -5.499759
3  7.032079
4 -8.296732, experiment_data=None, models=[])

In [None]:
pool.run_on_state(s_0)

StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions=          x
0 -5.583620
1 -1.866155
2 -5.859761
3  0.061423
4 -1.798987, experiment_data=None, models=[])

## More examples: using a grid and sample

We can also construct a processing pipeline using multiple functions. In the following example, we have a state which
has a grid of allowable variable values:

In [None]:
s_0 = StandardState(
    variables=VariableCollection(independent_variables=[
        Variable(name="x", allowed_values=np.linspace(-10, 10, 101)),
        Variable(name="y", allowed_values=[3, 4]),
        Variable(name="z", allowed_values=np.linspace(20, 30, 11))]
    )
)

In this case, we generate the full list of possible conditions using the `grid` functions:

In [None]:
from autora.experimentalist.grid_ import grid_pool_state
grid_pool_state(s_0)


StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=None, allowed_values=array([-10. ,  -9.8,  -9.6,  -9.4,  -9.2,  -9. ,  -8.8,  -8.6,  -8.4,
        -8.2,  -8. ,  -7.8,  -7.6,  -7.4,  -7.2,  -7. ,  -6.8,  -6.6,
        -6.4,  -6.2,  -6. ,  -5.8,  -5.6,  -5.4,  -5.2,  -5. ,  -4.8,
        -4.6,  -4.4,  -4.2,  -4. ,  -3.8,  -3.6,  -3.4,  -3.2,  -3. ,
        -2.8,  -2.6,  -2.4,  -2.2,  -2. ,  -1.8,  -1.6,  -1.4,  -1.2,
        -1. ,  -0.8,  -0.6,  -0.4,  -0.2,   0. ,   0.2,   0.4,   0.6,
         0.8,   1. ,   1.2,   1.4,   1.6,   1.8,   2. ,   2.2,   2.4,
         2.6,   2.8,   3. ,   3.2,   3.4,   3.6,   3.8,   4. ,   4.2,
         4.4,   4.6,   4.8,   5. ,   5.2,   5.4,   5.6,   5.8,   6. ,
         6.2,   6.4,   6.6,   6.8,   7. ,   7.2,   7.4,   7.6,   7.8,
         8. ,   8.2,   8.4,   8.6,   8.8,   9. ,   9.2,   9.4,   9.6,
         9.8,  10. ]), units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False

We have the same options as before: shorter suffixes such as `_s` or `_wf`, a per-module `on_state` function, or a `run` function:

In [None]:
from autora.experimentalist.grid_ import grid_pool_s
grid_pool_s(s_0)

StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=None, allowed_values=array([-10. ,  -9.8,  -9.6,  -9.4,  -9.2,  -9. ,  -8.8,  -8.6,  -8.4,
        -8.2,  -8. ,  -7.8,  -7.6,  -7.4,  -7.2,  -7. ,  -6.8,  -6.6,
        -6.4,  -6.2,  -6. ,  -5.8,  -5.6,  -5.4,  -5.2,  -5. ,  -4.8,
        -4.6,  -4.4,  -4.2,  -4. ,  -3.8,  -3.6,  -3.4,  -3.2,  -3. ,
        -2.8,  -2.6,  -2.4,  -2.2,  -2. ,  -1.8,  -1.6,  -1.4,  -1.2,
        -1. ,  -0.8,  -0.6,  -0.4,  -0.2,   0. ,   0.2,   0.4,   0.6,
         0.8,   1. ,   1.2,   1.4,   1.6,   1.8,   2. ,   2.2,   2.4,
         2.6,   2.8,   3. ,   3.2,   3.4,   3.6,   3.8,   4. ,   4.2,
         4.4,   4.6,   4.8,   5. ,   5.2,   5.4,   5.6,   5.8,   6. ,
         6.2,   6.4,   6.6,   6.8,   7. ,   7.2,   7.4,   7.6,   7.8,
         8. ,   8.2,   8.4,   8.6,   8.8,   9. ,   9.2,   9.4,   9.6,
         9.8,  10. ]), units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False

In [None]:
from autora.experimentalist.grid_ import grid_pool_wf
grid_pool_wf(s_0)


StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=None, allowed_values=array([-10. ,  -9.8,  -9.6,  -9.4,  -9.2,  -9. ,  -8.8,  -8.6,  -8.4,
        -8.2,  -8. ,  -7.8,  -7.6,  -7.4,  -7.2,  -7. ,  -6.8,  -6.6,
        -6.4,  -6.2,  -6. ,  -5.8,  -5.6,  -5.4,  -5.2,  -5. ,  -4.8,
        -4.6,  -4.4,  -4.2,  -4. ,  -3.8,  -3.6,  -3.4,  -3.2,  -3. ,
        -2.8,  -2.6,  -2.4,  -2.2,  -2. ,  -1.8,  -1.6,  -1.4,  -1.2,
        -1. ,  -0.8,  -0.6,  -0.4,  -0.2,   0. ,   0.2,   0.4,   0.6,
         0.8,   1. ,   1.2,   1.4,   1.6,   1.8,   2. ,   2.2,   2.4,
         2.6,   2.8,   3. ,   3.2,   3.4,   3.6,   3.8,   4. ,   4.2,
         4.4,   4.6,   4.8,   5. ,   5.2,   5.4,   5.6,   5.8,   6. ,
         6.2,   6.4,   6.6,   6.8,   7. ,   7.2,   7.4,   7.6,   7.8,
         8. ,   8.2,   8.4,   8.6,   8.8,   9. ,   9.2,   9.4,   9.6,
         9.8,  10. ]), units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False

In [None]:
import autora.experimentalist.grid_ as grid
grid.on_state(s_0)

StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=None, allowed_values=array([-10. ,  -9.8,  -9.6,  -9.4,  -9.2,  -9. ,  -8.8,  -8.6,  -8.4,
        -8.2,  -8. ,  -7.8,  -7.6,  -7.4,  -7.2,  -7. ,  -6.8,  -6.6,
        -6.4,  -6.2,  -6. ,  -5.8,  -5.6,  -5.4,  -5.2,  -5. ,  -4.8,
        -4.6,  -4.4,  -4.2,  -4. ,  -3.8,  -3.6,  -3.4,  -3.2,  -3. ,
        -2.8,  -2.6,  -2.4,  -2.2,  -2. ,  -1.8,  -1.6,  -1.4,  -1.2,
        -1. ,  -0.8,  -0.6,  -0.4,  -0.2,   0. ,   0.2,   0.4,   0.6,
         0.8,   1. ,   1.2,   1.4,   1.6,   1.8,   2. ,   2.2,   2.4,
         2.6,   2.8,   3. ,   3.2,   3.4,   3.6,   3.8,   4. ,   4.2,
         4.4,   4.6,   4.8,   5. ,   5.2,   5.4,   5.6,   5.8,   6. ,
         6.2,   6.4,   6.6,   6.8,   7. ,   7.2,   7.4,   7.6,   7.8,
         8. ,   8.2,   8.4,   8.6,   8.8,   9. ,   9.2,   9.4,   9.6,
         9.8,  10. ]), units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False

In [None]:
grid.run(s_0)

StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=None, allowed_values=array([-10. ,  -9.8,  -9.6,  -9.4,  -9.2,  -9. ,  -8.8,  -8.6,  -8.4,
        -8.2,  -8. ,  -7.8,  -7.6,  -7.4,  -7.2,  -7. ,  -6.8,  -6.6,
        -6.4,  -6.2,  -6. ,  -5.8,  -5.6,  -5.4,  -5.2,  -5. ,  -4.8,
        -4.6,  -4.4,  -4.2,  -4. ,  -3.8,  -3.6,  -3.4,  -3.2,  -3. ,
        -2.8,  -2.6,  -2.4,  -2.2,  -2. ,  -1.8,  -1.6,  -1.4,  -1.2,
        -1. ,  -0.8,  -0.6,  -0.4,  -0.2,   0. ,   0.2,   0.4,   0.6,
         0.8,   1. ,   1.2,   1.4,   1.6,   1.8,   2. ,   2.2,   2.4,
         2.6,   2.8,   3. ,   3.2,   3.4,   3.6,   3.8,   4. ,   4.2,
         4.4,   4.6,   4.8,   5. ,   5.2,   5.4,   5.6,   5.8,   6. ,
         6.2,   6.4,   6.6,   6.8,   7. ,   7.2,   7.4,   7.6,   7.8,
         8. ,   8.2,   8.4,   8.6,   8.8,   9. ,   9.2,   9.4,   9.6,
         9.8,  10. ]), units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False

We can also chain the grid pool with a random sampling function, so that only a few of the grid conditions are kept:

In [None]:
from autora.experimentalist.random_ import random_sample_state
random_sample_state(grid_pool_state(s_0), num_samples=5)


StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=None, allowed_values=array([-10. ,  -9.8,  -9.6,  -9.4,  -9.2,  -9. ,  -8.8,  -8.6,  -8.4,
        -8.2,  -8. ,  -7.8,  -7.6,  -7.4,  -7.2,  -7. ,  -6.8,  -6.6,
        -6.4,  -6.2,  -6. ,  -5.8,  -5.6,  -5.4,  -5.2,  -5. ,  -4.8,
        -4.6,  -4.4,  -4.2,  -4. ,  -3.8,  -3.6,  -3.4,  -3.2,  -3. ,
        -2.8,  -2.6,  -2.4,  -2.2,  -2. ,  -1.8,  -1.6,  -1.4,  -1.2,
        -1. ,  -0.8,  -0.6,  -0.4,  -0.2,   0. ,   0.2,   0.4,   0.6,
         0.8,   1. ,   1.2,   1.4,   1.6,   1.8,   2. ,   2.2,   2.4,
         2.6,   2.8,   3. ,   3.2,   3.4,   3.6,   3.8,   4. ,   4.2,
         4.4,   4.6,   4.8,   5. ,   5.2,   5.4,   5.6,   5.8,   6. ,
         6.2,   6.4,   6.6,   6.8,   7. ,   7.2,   7.4,   7.6,   7.8,
         8. ,   8.2,   8.4,   8.6,   8.8,   9. ,   9.2,   9.4,   9.6,
         9.8,  10. ]), units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False
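
To make the grid-then-sample idea concrete without the AutoRA types, the following sketch builds the full Cartesian product of allowed values with `itertools.product` and then draws a few conditions from it. The helper names `toy_grid_pool` and `toy_random_sample` are hypothetical, and the real `grid_pool_state` and `random_sample_state` operate on states rather than plain dictionaries.

```python
import itertools
import random
from typing import Dict, List, Optional, Sequence

Condition = Dict[str, float]


def toy_grid_pool(allowed: Dict[str, Sequence[float]]) -> List[Condition]:
    # Cartesian product of the allowed values, one dict per condition.
    names = list(allowed)
    return [dict(zip(names, combo)) for combo in itertools.product(*allowed.values())]


def toy_random_sample(
    conditions: List[Condition], num_samples: int, seed: Optional[int] = None
) -> List[Condition]:
    # Draw without replacement; a seed makes the draw repeatable.
    return random.Random(seed).sample(conditions, num_samples)


allowed_values = {
    "x": [i / 5 for i in range(-50, 51)],    # 101 values from -10 to 10
    "y": [3, 4],
    "z": [float(v) for v in range(20, 31)],  # 11 values from 20 to 30
}
pool_of_conditions = toy_grid_pool(allowed_values)
print(len(pool_of_conditions))  # 101 * 2 * 11 = 2222
print(toy_random_sample(pool_of_conditions, num_samples=5, seed=0))
```

The full grid here has 2222 conditions, which is why a sampling step is a natural follow-up before running any experiment.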