# Basic Usage

Aim: Use the Controller to recover a simple ground truth theory from noisy data.

In [50]:
import numpy as np
from autora.experimentalist.pipeline import make_pipeline
from autora.variable import VariableCollection, Variable
from sklearn.linear_model import LinearRegression
from autora_workflow import Cycle
from itertools import takewhile

In [51]:
def ground_truth(x):
    return x + 1

The allowed x values are the integers from 0 to 10 inclusive. We also record the allowed range of output values.

In [52]:
metadata_0 = VariableCollection(
    independent_variables=[Variable(name="x1", allowed_values=range(11))],
    dependent_variables=[Variable(name="y", value_range=(-20, 20))],
)

The experimentalist is used to propose experiments.
Since the space of values is so restricted, we can just sample them all each time.

In [53]:
example_experimentalist = make_pipeline(
    [metadata_0.independent_variables[0].allowed_values])

When we run a synthetic experiment, we get a reproducible noisy result:

In [55]:
def get_example_synthetic_experiment_runner():
    rng = np.random.default_rng(seed=180)
    def runner(x):
        return ground_truth(x) + rng.normal(0, 0.1, x.shape)
    return runner
example_synthetic_experiment_runner = get_example_synthetic_experiment_runner()
example_synthetic_experiment_runner(np.array([1]))

array([2.04339546])
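The reproducibility comes from creating the seeded random generator once, inside the factory function, and sharing it across calls through the closure. The following is a minimal sketch of that pattern. It swaps in the standard-library `random.Random` for NumPy's generator so that it has no dependencies, and `make_runner` is a hypothetical name used only for illustration.

```python
import random


def ground_truth(x):
    return x + 1


def make_runner(seed=180):
    # The generator is created once per runner and captured by the closure,
    # so successive calls continue the same noise sequence.
    rng = random.Random(seed)

    def runner(xs):
        return [ground_truth(x) + rng.gauss(0, 0.1) for x in xs]

    return runner


runner_a = make_runner()
runner_b = make_runner()

first_a = runner_a([1, 2, 3])
first_b = runner_b([1, 2, 3])   # same seed, same call order: identical values
second_a = runner_a([1, 2, 3])  # same runner, later call: fresh noise

print(first_a == first_b)
print(first_a == second_a)
```

Two runners built from the same seed agree call for call, while repeated calls to one runner draw new noise each time. This matches the way the observations differ from cycle to cycle in this notebook.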

The theorist "tries" to work out the best theory. We use a trivial scikit-learn regressor.

In [56]:
example_theorist = LinearRegression()

We initialize the Controller (here, a Cycle object) with the metadata describing the domain of the theory,
the theorist, the experimentalist and the experiment runner,
as well as a monitor which reports how many theories have been generated after each cycle.

In [37]:
cycle = Cycle(
    metadata=metadata_0,
    theorist=example_theorist,
    experimentalist=example_experimentalist,
    experiment_runner=example_synthetic_experiment_runner,
    monitor=lambda state: print(f"Generated {len(state.theories)} theories"),
)
cycle # doctest: +ELLIPSIS

<autora_workflow.cycle.Cycle at 0x146f1f8b0>

We can run the cycle by calling the run method:

In [57]:
cycle.run(num_cycles=3)  # doctest: +ELLIPSIS

Generated 112 theories
Generated 113 theories
Generated 114 theories


<autora_workflow.cycle.Cycle at 0x146f1f8b0>
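Conceptually, each cycle proposes conditions, runs the experiment on them, and refits the theory. The sketch below illustrates that loop in plain Python. It is not the library's implementation: `run_cycles` and `fit_line` are hypothetical helpers, the noise comes from the standard-library `random` module rather than NumPy, and it assumes that each refit uses all data collected so far.

```python
import random


def ground_truth(x):
    return x + 1


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def run_cycles(num_cycles, seed=180):
    rng = random.Random(seed)
    all_x, all_y, theories = [], [], []
    for _ in range(num_cycles):
        # Experimentalist: propose every allowed value of x.
        conditions = list(range(11))
        # Experiment runner: noisy evaluation of the ground truth.
        results = [ground_truth(x) + rng.gauss(0, 0.1) for x in conditions]
        all_x.extend(conditions)
        all_y.extend(results)
        # Theorist: refit on all data collected so far (an assumption).
        theories.append(fit_line(all_x, all_y))
    return theories


theories = run_cycles(3)
for slope, intercept in theories:
    print(f"y = {slope:.4f} x + {intercept:.4f}")
```

Each pass through the loop adds one entry to `theories`, which is why a monitor that prints the number of theories also serves as a cycle counter.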

We can now interrogate the results. The first set of conditions which went into the
experiment runner were:

In [58]:
cycle.data.conditions[0]

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

The observations include the conditions and the results:

In [59]:
cycle.data.observations[0]

array([[ 0.        ,  0.92675345],
       [ 1.        ,  1.89519928],
       [ 2.        ,  3.08746571],
       [ 3.        ,  3.93023943],
       [ 4.        ,  4.95429102],
       [ 5.        ,  6.04763988],
       [ 6.        ,  7.20770574],
       [ 7.        ,  7.85681519],
       [ 8.        ,  9.05735823],
       [ 9.        , 10.18713406],
       [10.        , 10.88517906]])

In the third cycle (index = 2) the first and last values are different again:

In [60]:
cycle.data.observations[2][[0,-1]]

array([[ 0.        ,  1.08559827],
       [10.        , 11.08179553]])

The best fit theory after the first cycle is:

In [61]:
cycle.data.theories[0]

In [62]:
def report_linear_fit(m: LinearRegression, precision=4):
    # Format the fitted slope and intercept, both rounded to `precision` decimals.
    s = f"y = {np.round(m.coef_[0].item(), precision)} x " \
        f"+ {np.round(m.intercept_.item(), precision)}"
    return s
report_linear_fit(cycle.data.theories[0])

'y = 1.0089 x + 0.9589'
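As a cross-check, the same coefficients can be recovered by hand from the first cycle's observations using the closed-form least-squares formulas. This sketch assumes the first theory was fitted to exactly the eleven observations printed above. It uses plain Python rather than scikit-learn.

```python
# First-cycle observations (x, y), copied from cycle.data.observations[0] above.
observations = [
    (0.0, 0.92675345),
    (1.0, 1.89519928),
    (2.0, 3.08746571),
    (3.0, 3.93023943),
    (4.0, 4.95429102),
    (5.0, 6.04763988),
    (6.0, 7.20770574),
    (7.0, 7.85681519),
    (8.0, 9.05735823),
    (9.0, 10.18713406),
    (10.0, 10.88517906),
]

n = len(observations)
mean_x = sum(x for x, _ in observations) / n
mean_y = sum(y for _, y in observations) / n

# slope = S_xy / S_xx, intercept = mean_y - slope * mean_x
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in observations)
s_xx = sum((x - mean_x) ** 2 for x, _ in observations)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

summary = f"y = {round(slope, 4)} x + {round(intercept, 4)}"
print(summary)  # y = 1.0089 x + 0.9589
```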

The best fit theory after all the cycles, including all the data, is:

In [63]:
report_linear_fit(cycle.data.theories[-1])

'y = 1.0005 x + 0.9986'

This is close to the ground truth theory of x -> (x + 1).

We can also run the cycle with more control over the execution flow:

In [64]:
next(cycle) # doctest: +ELLIPSIS

Generated 115 theories


<autora_workflow.cycle.Cycle at 0x146f1f8b0>

In [65]:
next(cycle) # doctest: +ELLIPSIS

Generated 116 theories


<autora_workflow.cycle.Cycle at 0x146f1f8b0>

We can continue to run the cycle as long as we like,
with a simple arbitrary stopping condition like the number of theories generated:

In [66]:
_ = list(takewhile(lambda c: len(c.data.theories) < 9, cycle))

Generated 117 theories


or the precision (here we keep iterating while the difference between the gradients
of the second-last and last cycle is larger than 1x10^-3).

In [67]:
_ = list(
        takewhile(
            lambda c: np.abs(c.data.theories[-1].coef_.item() -
                           c.data.theories[-2].coef_.item()) > 1e-3,
            cycle
        )
    )


Generated 118 theories
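One detail of the `takewhile` pattern is worth spelling out: `takewhile` first advances the iterator and only then evaluates the predicate, so the step that fails the condition has already been executed. The sketch below shows this with `ToyCycle`, a hypothetical stand-in for the real cycle object. Each step appends a made-up slope estimate that converges towards 1.

```python
from itertools import takewhile


class ToyCycle:
    """Hypothetical stand-in: every step appends one slope estimate."""

    def __init__(self):
        self.theories = []

    def __iter__(self):
        return self

    def __next__(self):
        n = len(self.theories) + 1
        self.theories.append(1.0 + 0.5 ** n)  # made-up estimate approaching 1
        return self


toy = ToyCycle()
next(toy)
next(toy)

# Stop on a count: the step that reaches 5 theories is executed before
# the predicate rejects it, so we end with exactly 5 theories.
_ = list(takewhile(lambda c: len(c.theories) < 5, toy))
count_after_limit = len(toy.theories)

# Stop on precision: iterate while consecutive estimates differ by more than 1e-3.
_ = list(
    takewhile(
        lambda c: abs(c.theories[-1] - c.theories[-2]) > 1e-3,
        toy,
    )
)
count_after_precision = len(toy.theories)

print(count_after_limit, count_after_precision)  # 5 10
```

Because the rejected step still runs, a count-based condition such as `< 5` leaves exactly 5 theories, and a precision-based condition stops on the first step whose change is within tolerance.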


or continue to run as long as we like:

In [69]:
_ = cycle.run(num_cycles=100) # doctest: +ELLIPSIS

Generated 219 theories
Generated 220 theories
Generated 221 theories
Generated 222 theories
Generated 223 theories
Generated 224 theories
Generated 225 theories
Generated 226 theories
Generated 227 theories
Generated 228 theories
Generated 229 theories
Generated 230 theories
Generated 231 theories
Generated 232 theories
Generated 233 theories
Generated 234 theories
Generated 235 theories
Generated 236 theories
Generated 237 theories
Generated 238 theories
Generated 239 theories
Generated 240 theories
Generated 241 theories
Generated 242 theories
Generated 243 theories
Generated 244 theories
Generated 245 theories
Generated 246 theories
Generated 247 theories
Generated 248 theories
Generated 249 theories
Generated 250 theories
Generated 251 theories
Generated 252 theories
Generated 253 theories
Generated 254 theories
Generated 255 theories
Generated 256 theories
Generated 257 theories
Generated 258 theories
Generated 259 theories
Generated 260 theories
Generated 261 theories
Generated 2