# Introduction
## Basic Tutorial III: Workflow Logic

**[AutoRA](https://pypi.org/project/autora/)** (**Au**tomated **R**esearch **A**ssistant) is an open-source framework designed to automate various stages of empirical research, including model discovery, experimental design, and data collection.

This notebook is the third of four notebooks within the basic tutorials of ``autora``. We suggest that you go through these notebooks in order, as each builds upon the last. However, each notebook is self-contained, so there is no need to *run* the previous notebooks before this one. Links to all four notebooks are listed below, and each notebook ends with a link to the next one.

[AutoRA Basic Tutorial I: Components](www.addlink.com) <br>
[AutoRA Basic Tutorial II: Loop Constructs](www.addlink.com) <br>
[AutoRA Basic Tutorial III: Workflow Logic](www.addlink.com) <br>
[AutoRA Basic Tutorial IV: Customization](www.addlink.com) <br>

These notebooks provide a comprehensive introduction to the capabilities of ``autora``. **They demonstrate the fundamental components of ``autora``, and how they can be combined to facilitate automated (closed-loop) empirical research through synthetic experiments.**

**How to use this notebook** *You can progress through the notebook section by section or directly navigate to specific sections. If you choose the latter, it is recommended to execute all cells in the notebook initially, allowing you to easily rerun the cells in each section later without issues.*

## Tutorial Setup
This tutorial is self-contained, so you do not need to run the previous notebooks to begin. However, the four notebooks build on one another, so objects defined in earlier notebooks are expected to exist here. We therefore re-run the relevant code from the previous tutorials below. We will not walk through this code again; if you need a reminder of what it does, see the descriptions in the previous notebooks.

In [28]:
#### Installation ####
!pip install -q "autora[experimentalist-falsification]"
!pip install -q "autora[experimentalist-sampler-novelty]"
!pip install -q "autora[theorist-bms]"

#### Import modules ####
import numpy as np
import torch
from autora.variable import DV, IV, ValueType, VariableCollection
from autora.experimentalist.pooler.grid import grid_pool
from autora.experimentalist.sampler.falsification import falsification_sample
from autora.experimentalist.sampler.novelty import novelty_sample
from autora.theorist.bms import BMSRegressor

#### Set seeds ####
np.random.seed(42)
torch.manual_seed(42)

#### Define ground truth and experiment runner ####
ground_truth = lambda x: np.sin(x)
run_experiment = lambda x: ground_truth(x) + np.random.normal(0, 0.1, size=x.shape)

#### Define condition pool ####
condition_pool = np.linspace(0, 2 * np.pi, 10)
condition_pool = condition_pool.reshape((len(condition_pool), 1))

#### Define metadata ####
iv = IV(name="x", value_range=(0, 2 * np.pi), allowed_values=np.linspace(0, 2 * np.pi, 10))
dv = DV(name="y", type=ValueType.REAL)
metadata = VariableCollection(independent_variables=[iv], dependent_variables=[dv])

#### Define theorists ####
theorist_bms = BMSRegressor(epochs=100)

# Workflow Logic

Workflows in ``autora`` implement the *autonomous empirical research paradigm*. This paradigm centers around the dynamic interplay between automated theorists and automated experimentalists. As outlined above, theorists rely–among other things–on existing data to construct computational models by linking experimental conditions to dependent measures. Experimentalists design follow-up experiments to refine and validate models generated by the theorist. Together, these agents enable a closed-loop scientific discovery process.
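
The closed loop described above can be sketched in plain Python before turning to the ``autora`` tools. The block below is only an illustration under stated assumptions: ``propose_conditions``, ``fit_theorist``, and the one-parameter model ``y = a * sin(x)`` are hypothetical stand-ins and are not part of ``autora``.

```python
import math
import random

random.seed(0)  # make the noisy measurements reproducible


def run_experiment(conditions):
    """Hypothetical experiment runner: noisy samples of sin(x)."""
    return [math.sin(x) + random.gauss(0, 0.1) for x in conditions]


def fit_theorist(conditions, observations):
    """Hypothetical theorist: least-squares fit of y = a * sin(x)."""
    numerator = sum(math.sin(x) * y for x, y in zip(conditions, observations))
    denominator = sum(math.sin(x) ** 2 for x in conditions)
    return numerator / denominator


def propose_conditions(num_samples):
    """Hypothetical experimentalist: random conditions in [0, 2*pi]."""
    return [random.uniform(0, 2 * math.pi) for _ in range(num_samples)]


all_conditions, all_observations, models = [], [], []
conditions = propose_conditions(3)  # seed conditions

for cycle in range(5):
    # experiment runner: collect observations for the current conditions
    all_conditions += conditions
    all_observations += run_experiment(conditions)
    # theorist: refit the model on all data collected so far
    models.append(fit_theorist(all_conditions, all_observations))
    # experimentalist: design the next experiment
    conditions = propose_conditions(3)

print(f"cycles: {len(models)}, data points: {len(all_observations)}")
```

Each pass through the loop adds three data points and one model, so five cycles yield five models fitted on up to fifteen observations.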

The following sections introduce ways of specifying workflows directly in ``autora``. For more information on workflows, please refer to the [corresponding documentation](https://autoresearch.github.io/autora/user-guide/workflow/).

## Basic Workflows

This section provides an introduction to handling workflows with the controller object. Here, we focus on workflows implementing the **default execution order**: (1) generate experiment conditions using the ``experimentalist``, (2) collect observations using the ``experiment_runner``, and (3) generate a model that links experiment conditions to observations using the ``theorist``.

We begin with implementing the following workflow:
1. Generate seed experimental conditions
2. Iterate 5 times through the following steps
   - Collect observations using ``run_experiment``
   - Identify a model relating conditions to observations using ``theorist_bms``
   - Identify 3 new experimental conditions using ``falsification_sample``

### Declaration

We begin by defining a simple workflow. Workflows can be encapsulated in a ``Controller`` object. For instance, the following code block sets up a closed-loop cycle between (1) a grid pooler for sampling experimental conditions, (2) an experiment runner for obtaining the respective observations, and (3) a BMS theorist for discovering an equation relating experimental conditions to observations.

As with pipelines, we can pass the ``Controller`` object static parameters for each component. In this case, we provide the grid experimentalist with information about the independent variables to sample.

**Note**: *We haven't included the ``falsification_sample`` experimentalist into our workflow yet because it requires us to specify state-dependent input arguments (e.g., the model generated by the theorist), which we will cover at the end of this section.*

In [29]:
from autora.workflow import Controller

controller = Controller(
    variables=metadata,
    experimentalist=grid_pool,
    experiment_runner=run_experiment,
    theorist=theorist_bms,
    params={
        "experimentalist":
                {"ivs": metadata.independent_variables}
    }
)

In the declaration of the ``params`` parameter, we first specify the type of the component we seek to parameterize as a dictionary key, e.g., ``"experimentalist"``. Then we nest within it another dictionary with the input arguments to the respective component as keys (e.g., ``"ivs"`` is an input argument to the ``grid_pool`` experimentalist) along with their values (e.g., ``metadata.independent_variables``).
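
One way to picture the nested ``params`` dictionary is as a lookup from component type to keyword arguments. The sketch below is a conceptual illustration only; ``grid_pool_stub`` and ``run_component`` are hypothetical names, and the actual ``Controller`` may handle parameters differently.

```python
def grid_pool_stub(ivs):
    """Hypothetical stand-in for an experimentalist taking an ``ivs`` argument."""
    return [f"condition for {iv}" for iv in ivs]


def run_component(kind, components, params):
    """Call the component registered under ``kind`` with its own parameters."""
    kwargs = params.get(kind, {})  # components without an entry get no arguments
    return components[kind](**kwargs)


components = {"experimentalist": grid_pool_stub}
params = {"experimentalist": {"ivs": ["x"]}}

result = run_component("experimentalist", components, params)
print(result)  # ['condition for x']
```

The outer key selects the component, and the inner dictionary is unpacked into that component's call.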

### Monitoring

Before we execute the controller, let's also add a **monitor function**, which is executed with every autonomous empirical research step. The following code block prints the kind of the last result generated by the workflow defined by the controller. All workflow results are stored in the ``state.history`` object. We can access the kind of the latest result using ``state.history[-1].kind``.

In [30]:
# define monitor function
def monitor(state):
    print(f"MONITOR: Generated new {state.history[-1].kind}")

# add monitor function to controller
controller.monitor = monitor

### Execution

The controller is defined as an iterator. We can execute a single step in the workflow by passing the ``controller`` object to Python's built-in ``next()`` function. The following code block executes three steps of the default research cycle.

In [31]:
next(controller)
next(controller)
next(controller)

MONITOR: Generated new CONDITION
MONITOR: Generated new OBSERVATION


100%|██████████| 100/100 [00:21<00:00,  4.66it/s]


MONITOR: Generated new MODEL


<autora.workflow.controller.Controller at 0x7f1dc043bb20>

As indicated by the monitor, the **default execution order** is as follows: (1) generate experiment conditions, (2) collect observations, and (3) generate a model. After executing step (3), the controller then continues with step (1):

In [32]:
next(controller)

MONITOR: Generated new CONDITION


<autora.workflow.controller.Controller at 0x7f1dc043bb20>
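
The iterator behavior seen above, where each call to ``next()`` runs one step and returns the controller itself, can be mimicked with a few lines of Python. This is a minimal sketch and not the ``Controller`` implementation; ``CyclingController`` and its ``history`` list of strings are assumptions made for illustration.

```python
class CyclingController:
    """Hypothetical controller that cycles through three step kinds."""

    ORDER = ("CONDITION", "OBSERVATION", "MODEL")

    def __init__(self):
        self.history = []

    def __iter__(self):
        return self

    def __next__(self):
        # The next step is determined by how many steps have run so far.
        kind = self.ORDER[len(self.history) % len(self.ORDER)]
        self.history.append(kind)
        return self  # returning the controller mirrors the outputs above


controller_sketch = CyclingController()
for _ in range(4):
    next(controller_sketch)

print(controller_sketch.history)
# ['CONDITION', 'OBSERVATION', 'MODEL', 'CONDITION']
```

Because ``__next__`` never raises ``StopIteration``, such an iterator is unbounded, which is why a stopping rule is needed when looping over it.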

Since ``controller`` is an iterator, we can use [itertools](https://docs.python.org/3/library/itertools.html) for efficient looping. The following example uses ``takewhile`` to define a loop that stops as soon as we have obtained five models from the theorist.

We begin by defining a lambda function that returns ``True`` as long as the controller has fewer than 5 models. As explained in the next subsection, we can obtain a list of generated models by accessing the controller's state via ``controller.state.models``.

In [33]:
continue_criterion = lambda controller: len(controller.state.models) < 5

Now we can run a for-loop using the ``controller`` as an iterator, and ``takewhile`` as iterator logic that continues to execute steps of the controller as long as  ``continue_criterion`` returns ``True``. In this way, we can execute 5 research cycles.

In [34]:
from itertools import takewhile

for step in takewhile(continue_criterion, controller):
    print(f"Number of models: {len(step.state.models)}")

MONITOR: Generated new OBSERVATION
Number of models: 1


100%|██████████| 100/100 [00:20<00:00,  5.00it/s]


MONITOR: Generated new MODEL
Number of models: 2
MONITOR: Generated new CONDITION
Number of models: 2
MONITOR: Generated new OBSERVATION
Number of models: 2


100%|██████████| 100/100 [00:11<00:00,  9.04it/s]


MONITOR: Generated new MODEL
Number of models: 3
MONITOR: Generated new CONDITION
Number of models: 3
MONITOR: Generated new OBSERVATION
Number of models: 3


100%|██████████| 100/100 [00:11<00:00,  8.92it/s]


MONITOR: Generated new MODEL
Number of models: 4
MONITOR: Generated new CONDITION
Number of models: 4
MONITOR: Generated new OBSERVATION
Number of models: 4


100%|██████████| 100/100 [00:11<00:00,  8.92it/s]

MONITOR: Generated new MODEL
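
Note that the final ``MODEL`` line in the output above is not followed by a ``Number of models`` line. This follows from how ``takewhile`` works: it first advances the iterator and only then checks the criterion, so the step that makes the criterion fail is still executed, but the loop body is skipped for it. The sketch below demonstrates this with a hypothetical ``Counter`` iterator that stands in for the controller.

```python
from itertools import takewhile


class Counter:
    """Hypothetical iterator that records every step it executes."""

    def __init__(self):
        self.executed = 0

    def __iter__(self):
        return self

    def __next__(self):
        self.executed += 1
        return self


counter = Counter()
yielded = 0
for step in takewhile(lambda c: c.executed < 5, counter):
    yielded += 1

print(f"steps executed: {counter.executed}, loop bodies run: {yielded}")
# steps executed: 5, loop bodies run: 4
```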





### Result Inspection

After each executed step, we can observe the result generated by the ``controller``. All results are stored in ``controller.state.history``. Each result is composed of a value specifying its ``kind`` (``CONDITION``, ``OBSERVATION``, or ``MODEL``) and the respective ``data``.

We can obtain the result generated in the last step of the workflow as follows:

In [35]:
result = controller.state.history[-1]

print(result.kind)
print(result.data)

ResultKind.MODEL
BMSRegressor(epochs=100)


We can also specify the kind of result we are looking for directly. For instance, we can obtain all models generated by the theorist using ``controller.state.models``. The following code block prints the last model discovered by the BMS theorist (note that ``repr()`` is a function specific to the BMS theorist which returns its model as a string).

In [36]:
print(controller.state.models[-1].repr())

sin(X0)


Alternatively, we can access probed experimental conditions via ``controller.state.conditions`` and observations via ``controller.state.observations``, respectively. The following code block requests the latest experimental conditions identified by the experimentalist and the corresponding observations collected by the experiment runner.

In [37]:
print(f"Conditions:\n{controller.state.conditions[-1]}")
print(f"Observations:\n{controller.state.observations[-1]}")

Conditions:
[[0.        ]
 [0.6981317 ]
 [1.3962634 ]
 [2.0943951 ]
 [2.7925268 ]
 [3.4906585 ]
 [4.1887902 ]
 [4.88692191]
 [5.58505361]
 [6.28318531]]
Observations:
[[ 0.          0.07384666]
 [ 0.6981317   0.65992444]
 [ 1.3962634   0.97324292]
 [ 2.0943951   0.83591503]
 [ 2.7925268   0.19416794]
 [ 3.4906585  -0.41400456]
 [ 4.1887902  -0.91208928]
 [ 4.88692191 -0.87909553]
 [ 5.58505361 -0.60842578]
 [ 6.28318531 -0.17630402]]
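
The relationship between the full ``history`` and the filtered views (``models``, ``conditions``, ``observations``) can be pictured as filtering a list of tagged records by their kind. The following is a minimal sketch under that assumption; ``Kind``, ``Result``, and ``filter_by_kind`` are hypothetical names and do not reproduce the ``autora`` state classes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any


class Kind(Enum):
    CONDITION = "CONDITION"
    OBSERVATION = "OBSERVATION"
    MODEL = "MODEL"


@dataclass
class Result:
    data: Any
    kind: Kind


def filter_by_kind(history, kind):
    """Return the data of all results of the given kind, in insertion order."""
    return [result.data for result in history if result.kind is kind]


history = [
    Result([0.0, 3.14], Kind.CONDITION),
    Result([[0.0, 0.07], [3.14, -0.02]], Kind.OBSERVATION),
    Result("sin(X0)", Kind.MODEL),
]

print(history[-1].kind)
print(filter_by_kind(history, Kind.MODEL)[-1])
```

Indexing a filtered view with ``[-1]`` then gives the most recent result of that kind, regardless of what was appended to the history afterwards.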


### Seeding

The default execution order always begins with an experimentalist. This is problematic if we want to use an experimentalist that depends on prior steps (e.g., the falsification experimentalist requires a model generated by the theorist). We can circumvent this problem by seeding the controller with experiment conditions.

The following code block seeds the controller with 3 experiment conditions. We first generate the ``seed_conditions``, and then pass them, encapsulated in a list, to the ``seed`` function of the ``controller`` object.

In [38]:
# generate initial pool of 3 experimental conditions
seed_conditions = np.linspace(0,2*np.pi,3)

# define controller
controller = Controller(
    monitor=monitor,
    variables=metadata,
    experimentalist=grid_pool,
    experiment_runner=run_experiment,
    theorist=theorist_bms,
    params={
        "experimentalist":
                {"ivs": metadata.independent_variables}
    }
)

# seed controller
controller.seed(conditions=[seed_conditions])

next(controller)

MONITOR: Generated new OBSERVATION


<autora.workflow.controller.Controller at 0x7f1dbe8150c0>

Note that, since we seeded the controller with initial experimental conditions, the next step is to execute the ``experiment_runner``. This is why the first step reported by the monitor involves the generation of observations (based on the seed experimental conditions).
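
The effect of seeding can be understood by assuming that the next step is chosen from the kind of the most recent result. The sketch below illustrates that idea only; ``NEXT_STEP`` and ``plan_next_step`` are hypothetical, and the history is simplified to a list of kind names.

```python
NEXT_STEP = {
    None: "experimentalist",  # empty history: start by proposing conditions
    "CONDITION": "experiment_runner",
    "OBSERVATION": "theorist",
    "MODEL": "experimentalist",
}


def plan_next_step(history):
    """Pick the next step from the kind of the most recent result."""
    last_kind = history[-1] if history else None
    return NEXT_STEP[last_kind]


unseeded_history = []
seeded_history = ["CONDITION"]  # seeding adds conditions before any step runs

print(plan_next_step(unseeded_history))  # experimentalist
print(plan_next_step(seeded_history))    # experiment_runner
```

With seed conditions already in the history, the experimentalist is skipped in the first cycle, which is what allows a model-dependent experimentalist to be used later.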



### Accessing State-Dependent Properties

Some automated empirical research components require input arguments that depend on the result of the last step in the workflow. For instance, the ``falsification_sample`` experimentalist depends on the previously collected experimental conditions, observations, and the fitted model. For such cases, it is possible to use "state-dependent properties" in the ``params`` dictionary. These are the following strings, which will be replaced during execution by their respective current values:

- ``"%observations.ivs[-1]%"``: the last observed independent variables <br>
- ``"%observations.dvs[-1]%"``: the last observed dependent variables <br>
- ``"%observations.ivs%"``: all the observed independent variables (experimental conditions), concatenated into a single array <br>
- ``"%observations.dvs%"``: all the observed dependent variables (observations), concatenated into a single array <br>
- ``"%models[-1]%"``: the last fitted theorist <br>
- ``"%models%"``: all the fitted theorists <br>
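
Conceptually, such placeholders act like late-bound references: the string is stored in ``params`` and swapped for the current value just before a component runs. The sketch below illustrates this substitution under simplifying assumptions; ``resolve_params`` and the dictionary-based ``example_state`` are hypothetical and are not how ``autora`` represents its state.

```python
def resolve_params(params, state):
    """Replace "%...%" placeholder strings with values looked up in ``state``."""
    # Lookups are wrapped in lambdas so that, e.g., "%models[-1]%" is only
    # evaluated when it is actually requested.
    lookup = {
        "%observations.ivs%": lambda: [x for batch in state["ivs"] for x in batch],
        "%observations.dvs%": lambda: [y for batch in state["dvs"] for y in batch],
        "%observations.ivs[-1]%": lambda: state["ivs"][-1],
        "%observations.dvs[-1]%": lambda: state["dvs"][-1],
        "%models%": lambda: state["models"],
        "%models[-1]%": lambda: state["models"][-1],
    }
    resolved = {}
    for key, value in params.items():
        if isinstance(value, dict):
            resolved[key] = resolve_params(value, state)  # recurse into nesting
        elif isinstance(value, str) and value in lookup:
            resolved[key] = lookup[value]()
        else:
            resolved[key] = value  # static parameters pass through unchanged
    return resolved


example_state = {
    "ivs": [[0.0, 3.1], [1.5]],
    "dvs": [[0.1, 0.0], [1.0]],
    "models": ["model_a", "model_b"],
}
example_params = {
    "experimentalist": {
        "model": "%models[-1]%",
        "reference_conditions": "%observations.ivs%",
        "num_samples": 3,
    }
}

resolved = resolve_params(example_params, example_state)
print(resolved["experimentalist"])
# {'model': 'model_b', 'reference_conditions': [0.0, 3.1, 1.5], 'num_samples': 3}
```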

In the following example, we use the ``"%observations.ivs%"``, ``"%observations.dvs%"``, and ``"%models[-1]%"`` properties for the ``falsification_sample`` experimentalist, which seeks to identify experimental conditions that are predicted to maximize the loss of the fitted model.

The code block below implements the following workflow:
1. Generate 3 seed experimental conditions
2. Iterate 5 times through the following steps
   - Collect observations using ``run_experiment``
   - Identify a model relating conditions to observations using ``theorist_bms``
   - Identify 3 new experimental conditions using ``falsification_sample``


In [39]:
# generate initial pool of 3 experimental conditions
seed_conditions = np.linspace(0,2*np.pi,3)

# define controller
controller = Controller(
    monitor=monitor,
    variables=metadata,
    experimentalist=falsification_sample,
    experiment_runner=run_experiment,
    theorist=theorist_bms,
    params={
        "experimentalist":
                {"condition_pool": condition_pool,
                 "model": "%models[-1]%", # access last model generated by theorist
                 "reference_conditions": "%observations.ivs%", # access all conditions probed so far
                 "reference_observations": "%observations.dvs%", # access all observations collected so far
                 "metadata": metadata,
                 "num_samples": 3}
    }
)

# seed controller
controller.seed(conditions=[seed_conditions])

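Using ``takewhile``, we can now specify a workflow logic that executes the automated research process 5 times. Because the seed conditions take the place of the first experimentalist step, ``falsification_sample`` runs for the fifth time only after the fifth model has been fitted. We therefore let the ``controller`` continue as long as it holds fewer than 6 models; execution stops once the sixth model has been generated.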

In [40]:
from itertools import takewhile

continue_criterion = lambda controller: len(controller.state.models) < 6

for step in takewhile(continue_criterion, controller):
    print(f"Number of models: {len(step.state.models)}")

MONITOR: Generated new OBSERVATION
Number of models: 0


100%|██████████| 100/100 [00:10<00:00,  9.51it/s]


MONITOR: Generated new MODEL
Number of models: 1
MONITOR: Generated new CONDITION
Number of models: 1
MONITOR: Generated new OBSERVATION
Number of models: 1


100%|██████████| 100/100 [00:11<00:00,  8.80it/s]


MONITOR: Generated new MODEL
Number of models: 2
MONITOR: Generated new CONDITION
Number of models: 2
MONITOR: Generated new OBSERVATION
Number of models: 2


100%|██████████| 100/100 [00:11<00:00,  8.70it/s]


MONITOR: Generated new MODEL
Number of models: 3
MONITOR: Generated new CONDITION
Number of models: 3
MONITOR: Generated new OBSERVATION
Number of models: 3


100%|██████████| 100/100 [00:11<00:00,  8.98it/s]


MONITOR: Generated new MODEL
Number of models: 4
MONITOR: Generated new CONDITION
Number of models: 4
MONITOR: Generated new OBSERVATION
Number of models: 4


100%|██████████| 100/100 [00:10<00:00,  9.47it/s]


MONITOR: Generated new MODEL
Number of models: 5
MONITOR: Generated new CONDITION
Number of models: 5
MONITOR: Generated new OBSERVATION
Number of models: 5


100%|██████████| 100/100 [00:09<00:00, 10.86it/s]

MONITOR: Generated new MODEL





## Advanced Workflows

In some cases, we may want to condition the sequence of steps taken in the empirical research process on the current state of the process. For instance, one might want to switch from a novelty sampling strategy to a falsification sampling strategy as soon as one has probed enough novel experiment conditions. This section provides a basic introduction to the ``BaseController``, which enables the implementation of such arbitrary execution orders.

In this section, we consider a scenario in which we switch experimentalists depending on the number of observations collected:
- If no observations have been collected, we sample some seed experimental conditions
- If fewer than 7 observations have been collected, we sample experimental conditions with ``novelty_sample``
- If 7 or more observations have been collected, we sample experimental conditions with ``falsification_sample``

### Planner Declaration

We begin by defining an ``experimentalist_planner`` function. Such a planner function is provided as input to the ``BaseController`` and determines the next step of the workflow, depending on the current state. The code block below implements a planner that selects the experimentalist to be executed depending on the number of observations collected:

In [41]:
from autora.workflow.planner import last_result_kind_planner

def experimentalist_planner(state):
    # We're going to reuse the "last_result_kind_planner" planner, and modify its output.
    proposed_next_step = last_result_kind_planner(state)

    # Obtain a list of all observations collected so far
    all_observations = [item for sublist in state.observations for item in sublist]
    num_observations = len(all_observations)

    # Determine next experimentalist
    if proposed_next_step == "experimentalist":
        if num_observations < 1:
            next_step = "seed_experimentalist"
        elif num_observations < 7:
            next_step = "novelty_experimentalist"
        else:
            next_step = "falsification_experimentalist"
    else:
        next_step = proposed_next_step

    print("PLANNER: Next step: " + next_step)
    return next_step

The ``experimentalist_planner`` function accepts a ``controller``'s state as input and returns the next step to be executed. Here, we call the ``last_result_kind_planner`` to obtain the default next step. For instance, according to the autonomous empirical research paradigm, if the last step involved executing the ``"theorist"``, the next step would be executing the ``experimentalist``.

If the next default step is the ``experimentalist``, the  ``experimentalist_planner`` will select the type of experimentalist based on the total number of collected observations.
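
To see which experimentalist these thresholds select over successive cycles, the selection rule can be isolated and run without a controller. The sketch below assumes that every cycle contributes 3 observations; ``choose_experimentalist`` and ``count_observations`` are hypothetical helpers that mirror the logic of ``experimentalist_planner``.

```python
def choose_experimentalist(num_observations):
    """Mirror the thresholds used in ``experimentalist_planner``."""
    if num_observations < 1:
        return "seed_experimentalist"
    if num_observations < 7:
        return "novelty_experimentalist"
    return "falsification_experimentalist"


def count_observations(observation_batches):
    """Count rows across all batches, as the planner's flattening does."""
    return len([row for batch in observation_batches for row in batch])


batches = []
choices = []
for cycle in range(5):
    choices.append(choose_experimentalist(count_observations(batches)))
    batches.append([[0.0, 0.0]] * 3)  # each cycle adds 3 hypothetical (x, y) rows

for choice in choices:
    print(choice)
```

With 0, 3, 6, 9, and 12 observations at the five decision points, the rule selects the seed experimentalist once, the novelty experimentalist twice, and the falsification experimentalist twice.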

### Executor Collection Declaration

In order for the ``BaseController`` to work with the ``experimentalist_planner``, we need to specify the experimentalists that it selects to be executed. In the next code block, we define all experimentalists by wrapping each of them into a ``Pipeline``. However, at this point, we don't need to provide the respective parameters for each experimentalist–we will provide these later, directly to the ``BaseController`` object.


In [42]:
from autora.experimentalist.pipeline import make_pipeline

seed_pipeline = make_pipeline([np.linspace(0, 2*np.pi, 3)])
novelty_pipeline = make_pipeline([novelty_sample])
falsification_pipeline = make_pipeline([falsification_sample])

We can now wrap all elements of our research process–this includes all experimentalists as well as the theorist and experiment runner–into a collection of executors. The following code block defines this collection using ``ChainedFunctionMapping``.

In [43]:
from autora.workflow.executor import (ChainedFunctionMapping, from_experimentalist_pipeline,
    from_experiment_runner_callable, from_theorist_estimator)

executor_collection = ChainedFunctionMapping(
    seed_experimentalist=
        [from_experimentalist_pipeline, seed_pipeline],
    novelty_experimentalist=
        [from_experimentalist_pipeline, novelty_pipeline],
    falsification_experimentalist=
        [from_experimentalist_pipeline, falsification_pipeline],
    experiment_runner=[from_experiment_runner_callable, run_experiment],
    theorist=[from_theorist_estimator, theorist_bms],
)

In the ``ChainedFunctionMapping``, we specify each element by its type, followed by its function. For instance, the ``seed_experimentalist`` is defined as an experimentalist pipeline. Thus, we specify it as ``from_experimentalist_pipeline``, and chain it with its respective pipeline ``seed_pipeline`` defined above.
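
The idea of pairing a type-specific adapter with a component can be illustrated with a small registry. The sketch below is purely conceptual: ``AdapterRegistry`` and ``from_callable_stub`` are hypothetical, the state is simplified to a list, and the assumption that the adapter is applied to the component on lookup may not match how ``ChainedFunctionMapping`` actually works.

```python
def from_callable_stub(component):
    """Hypothetical adapter: wrap a plain callable into a state-updating executor."""

    def executor(state):
        return state + [component()]

    return executor


class AdapterRegistry:
    """Hypothetical mapping from step names to [adapter, component] pairs."""

    def __init__(self, **entries):
        self._entries = entries

    def __getitem__(self, name):
        adapter, component = self._entries[name]
        return adapter(component)  # build the executor on lookup


registry = AdapterRegistry(
    seed_experimentalist=[from_callable_stub, lambda: "seed conditions"],
    experiment_runner=[from_callable_stub, lambda: "observations"],
)

sketch_state = []
for name in ("seed_experimentalist", "experiment_runner"):
    sketch_state = registry[name](sketch_state)

print(sketch_state)  # ['seed conditions', 'observations']
```

The benefit of such a design is that a planner only needs to return a step name, while the registry knows how to turn that name into something executable.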

### Base Controller Declaration

So far, we have defined an ``experimentalist_planner`` function, which determines the next step in our workflow. We have also defined an ``executor_collection`` defining each step of the workflow. Both will be provided to a special ``Controller`` called ``BaseController``. The ``BaseController`` does not require us to specify a ``theorist``, ``experimentalist``, or ``experiment_runner``. Instead, we can provide it with an ``executor_collection`` specifying all the elements of the workflow we require.

The ``BaseController`` also requires us to specify an initial ``state``. Here, we can instantiate a state as a ``History`` object, which holds all variables of the experiment (as declared in ``metadata``) along with the parameters provided to each element in the ``executor_collection``. Let's begin by defining the parameters for all elements in the ``executor_collection``. Here, only two of the elements (``novelty_experimentalist`` and ``falsification_experimentalist``) require us to specify additional parameters.


In [44]:
params = {
    "novelty_experimentalist": {
        "novelty_sample": {
            "condition_pool": condition_pool,
            "reference_conditions": "%observations.ivs%",  # access all conditions probed so far
            "num_samples": 3,
        },
    },
    "falsification_experimentalist": {
        "falsification_sample": {
            "condition_pool": condition_pool,
            "model": "%models[-1]%",  # access last model generated by theorist
            "reference_conditions": "%observations.ivs%",  # access all conditions probed so far
            "reference_observations": "%observations.dvs%",  # access all observations collected so far
            "metadata": metadata,
            "num_samples": 3,
        },
    },
}

Using the ``metadata`` and ``params``, we can instantiate an initial ``state`` for the workflow.

In [45]:
from autora.workflow.state import History

state = History(variables=metadata, params=params)

For convenience, let us also define a monitor function which can print the current total number of observations. We will provide this monitor to the ``BaseController``.

In [46]:
def monitor(state):
    all_observations = [item for sublist in state.observations for item in sublist]
    num_observations = len(all_observations)
    print(f"MONITOR: Number of observations {num_observations}")

We now have all the required input arguments for the ``BaseController``.

In [47]:
from autora.workflow.base import BaseController

# define controller
controller = BaseController(
    state=state,
    monitor=monitor,
    planner=experimentalist_planner,
    executor_collection=executor_collection,
)


Finally, let's execute the controller for 5 research cycles, measured in terms of the number of generated models.

In [48]:
from itertools import takewhile

continue_criterion = lambda controller: len(controller.state.models) < 5

for step in takewhile(continue_criterion, controller):
    print(f"MONITOR: Number of models: {len(step.state.models)}")

PLANNER: Next step: seed_experimentalist
MONITOR: Number of observations 0
MONITOR: Number of models: 0
PLANNER: Next step: experiment_runner
MONITOR: Number of observations 3
MONITOR: Number of models: 0
PLANNER: Next step: theorist


100%|██████████| 100/100 [00:09<00:00, 10.08it/s]


MONITOR: Number of observations 3
MONITOR: Number of models: 1
PLANNER: Next step: novelty_experimentalist
MONITOR: Number of observations 3
MONITOR: Number of models: 1
PLANNER: Next step: experiment_runner
MONITOR: Number of observations 6
MONITOR: Number of models: 1
PLANNER: Next step: theorist


100%|██████████| 100/100 [00:10<00:00,  9.26it/s]


MONITOR: Number of observations 6
MONITOR: Number of models: 2
PLANNER: Next step: novelty_experimentalist
MONITOR: Number of observations 6
MONITOR: Number of models: 2
PLANNER: Next step: experiment_runner
MONITOR: Number of observations 9
MONITOR: Number of models: 2
PLANNER: Next step: theorist


100%|██████████| 100/100 [00:12<00:00,  7.91it/s]


MONITOR: Number of observations 9
MONITOR: Number of models: 3
PLANNER: Next step: falsification_experimentalist
MONITOR: Number of observations 9
MONITOR: Number of models: 3
PLANNER: Next step: experiment_runner
MONITOR: Number of observations 12
MONITOR: Number of models: 3
PLANNER: Next step: theorist


100%|██████████| 100/100 [00:10<00:00,  9.73it/s]


MONITOR: Number of observations 12
MONITOR: Number of models: 4
PLANNER: Next step: falsification_experimentalist
MONITOR: Number of observations 12
MONITOR: Number of models: 4
PLANNER: Next step: experiment_runner
MONITOR: Number of observations 15
MONITOR: Number of models: 4
PLANNER: Next step: theorist


100%|██████████| 100/100 [00:10<00:00,  9.46it/s]


MONITOR: Number of observations 15


We can observe that the controller begins by sampling experiment conditions using the ``seed_experimentalist``. It then proceeds to sample conditions using the ``novelty_experimentalist`` until it has collected 7 or more observations, at which point it switches to the ``falsification_experimentalist``.

# Next Notebook
This concludes the tutorial on ``autora`` functionality. However, ``autora`` is a flexible framework in which users can integrate their own theorists, experimentalists, and experiment_runners in an automated empirical research workflow. The next notebook illustrates how to add your own custom theorists and experimentalists to use with ``autora``.

Follow this link for the next notebook tutorial:
[AutoRA Basic Tutorial IV: Customization](www.addlink.com) <br>