# Introduction
## Basic Tutorial III: Functional Workflow

**[AutoRA](https://pypi.org/project/autora/)** (**Au**tomated **R**esearch **A**ssistant) is an open-source framework designed to automate various stages of empirical research, including model discovery, experimental design, and data collection.

This notebook is the third of four in the basic tutorials of ``autora``. We suggest working through them in order, as each builds upon the last. However, each notebook is self-contained, so there is no need to *run* the previous notebook before starting this one. Links to all four notebooks are listed below, and a link to the next notebook appears at the end of each one.

[AutoRA Basic Tutorial I: Components](https://autoresearch.github.io/autora/tutorials/basic/Tutorial-I-Components/) <br>
[AutoRA Basic Tutorial II: Loop Constructs](https://autoresearch.github.io/autora/tutorials/basic/Tutorial-II-Loop-Constructs/) <br>
[AutoRA Basic Tutorial III: Functional Workflow](https://autoresearch.github.io/autora/tutorials/basic/Tutorial-III-Functional-Workflow/) <br>
[AutoRA Basic Tutorial IV: Customization](https://autoresearch.github.io/autora/tutorials/basic/Tutorial-IV-Customization/) <br>

These notebooks provide a comprehensive introduction to the capabilities of ``autora``. **They demonstrate the fundamental components of ``autora`` and how they can be combined to facilitate automated (closed-loop) empirical research through synthetic experiments.**

**How to use this notebook** *You can progress through the notebook section by section or directly navigate to specific sections. If you choose the latter, it is recommended to execute all cells in the notebook initially, allowing you to easily rerun the cells in each section later without issues.*

## Tutorial Setup

In [None]:
#### Installation ####
!pip install -q "autora[theorist-bms]"

#### Import modules ####
import numpy as np
import pandas as pd
import torch

#### Set seeds ####
np.random.seed(42)
torch.manual_seed(42)




<torch._C.Generator at 0x1f3b98c81f0>

## States

Using the functions and objects in `autora.state`, we can build flexible pipelines and cycles that operate on state objects. State objects are containers with specialized functionality that allow you to build processing pipelines containing experimentalists, experiment runners, and theorists.

In Tutorial I, we had experimentalists define new conditions, experiment runners collect new observations, and theorists model the data. To do this, we used the output of one component as the input of the next, such as:

`conditions = experimentalist(...)` $\rightarrow$ <br>
`observations = experiment_runner(conditions,...)` $\rightarrow$ <br>
`model = theorist(conditions, observations)` <br>

This chaining is embedded within the `State` functionality. To use a state, we must wrap our experimentalist, experiment runner, and theorist individually, so that each becomes a function that:
- operates on the `State`, and
- returns a modified object of the **same type** `State`.
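
These two requirements can be illustrated without `autora` at all. The following is a minimal sketch using a plain frozen dataclass; `ToyState` and `add_conditions` are hypothetical names made up for this illustration and are not part of the `autora` API.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class ToyState:
    """Hypothetical stand-in for a state object (not autora's State)."""
    conditions: Tuple[float, ...] = ()
    observations: Tuple[float, ...] = ()


def add_conditions(state: ToyState, new_conditions: Tuple[float, ...]) -> ToyState:
    """Operate on a state and return a new object of the same type."""
    return replace(state, conditions=new_conditions)


s0 = ToyState()
s1 = add_conditions(s0, (0.0, 1.5, 3.0))

print(type(s1) is type(s0))          # the output has the same type as the input
print(s0.conditions, s1.conditions)  # the original state is left unchanged
```

Because every such function takes a state and returns a state of the same type, the output of one can be passed directly to the next.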

### Defining The State

We use the `StandardState` object bundled with `autora`. Let's begin by initializing the state, providing only *variable information* (`variables`), *seed condition data* (`conditions`), and a *dataframe* (`pd.DataFrame(columns=["x","y"])`) that will hold our conditions (`x`) and observations (`y`).

In [None]:
from autora.variable import Variable, ValueType, VariableCollection
from autora.experimentalist.random_ import random_pool
from autora.state.bundled import StandardState

#### Define variable data ####
iv = Variable(name="x", value_range=(0, 2 * np.pi), allowed_values=np.linspace(0, 2 * np.pi, 10))
dv = Variable(name="y", type=ValueType.REAL)
variables = VariableCollection(independent_variables=[iv],dependent_variables=[dv])

#### Define seed condition data ####
conditions = random_pool(variables, num_samples=10)

#### Initialize State ####
s = StandardState(
    variables = variables,
    conditions=conditions,
    experiment_data = pd.DataFrame(columns=["x","y"])
)

### Viewing the State

Now, let's view the contents of the state we just initialized.

In [None]:
print(s)

StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(0, 6.283185307179586), allowed_values=array([0.        , 0.6981317 , 1.3962634 , 2.0943951 , 2.7925268 ,
       3.4906585 , 4.1887902 , 4.88692191, 5.58505361, 6.28318531]), units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions=          x
0  0.000000
1  0.000000
2  4.188790
3  2.792527
4  2.094395
5  6.283185
6  5.585054
7  2.094395
8  4.188790
9  0.000000, experiment_data=Empty DataFrame
Columns: [x, y]
Index: [], models=[])


We can also access each piece of content we provided to the state directly.

In [None]:
print("\033[1mThe variables we provided:\033[0m")
print(s.variables)

print("\033[1mThe conditions we provided:\033[0m")
print(s.conditions)

print("\n\033[1mThe dataframe we provided:\033[0m")
print(s.experiment_data)

The variables we provided:
VariableCollection(independent_variables=[Variable(name='x', value_range=(0, 6.283185307179586), allowed_values=array([0.        , 0.6981317 , 1.3962634 , 2.0943951 , 2.7925268 ,
       3.4906585 , 4.1887902 , 4.88692191, 5.58505361, 6.28318531]), units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[])
The conditions we provided:
          x
0  0.000000
1  0.000000
2  4.188790
3  2.792527
4  2.094395
5  6.283185
6  5.585054
7  2.094395
8  4.188790
9  0.000000

The dataframe we provided:
Empty DataFrame
Columns: [x, y]
Index: []


## AutoRA Components and the State

Now that we have initialized the state, we need to start adding components of `AutoRA` to the state - namely, experiment runners, experimentalists, and theorists. 

These components are defined in the same way as in the previous tutorials. To make them work with the state, we only need to wrap them in specialized state functions. The wrappers are:
- `on_state()` for experiment runners and experimentalists
- `state_fn_from_estimator()` for theorists

The first input for each wrapper is the corresponding function (i.e., the experiment runner, the experimentalist, or the theorist). The `on_state` wrapper takes a second input, `output`, which names the part of the state that the component writes to. For the experimentalist this is `output=["conditions"]`, and for the experiment runner it is `output=["experiment_data"]`.

Once the components are wrapped, they act on the state: they expect a state as the first input and return a modified version of that state.
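
To make the idea of such a wrapper concrete, here is a minimal sketch of how a wrapper of this kind could work in principle. It is a toy written for this explanation, not `autora`'s implementation: `ToyState`, `toy_on_state`, and `take_first` are hypothetical names, and the sketch assumes that function arguments are looked up in the state by name and that the result always replaces the named output field.

```python
import inspect
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class ToyState:
    """Hypothetical stand-in for a state object (not autora's State)."""
    conditions: Tuple[float, ...] = ()
    experiment_data: Tuple[Tuple[float, float], ...] = ()


def toy_on_state(function: Callable, output: List[str]) -> Callable:
    """Wrap `function` so it takes a ToyState and returns an updated ToyState.

    Arguments of `function` whose names match state fields are read from the
    state; the return value is written to the single field named in `output`.
    """
    parameter_names = inspect.signature(function).parameters

    def wrapped(state: ToyState, **kwargs) -> ToyState:
        from_state = {
            name: getattr(state, name)
            for name in parameter_names
            if hasattr(state, name) and name not in kwargs
        }
        result = function(**from_state, **kwargs)
        return replace(state, **{output[0]: result})

    return wrapped


def take_first(conditions, num_samples):
    """Toy experimentalist: keep the first `num_samples` conditions."""
    return conditions[:num_samples]


experimentalist = toy_on_state(take_first, output=["conditions"])

s = ToyState(conditions=(0.0, 1.0, 2.0, 3.0))
s = experimentalist(s, num_samples=2)
print(s.conditions)
```

The point of the sketch is only the calling convention: the unwrapped function knows nothing about states, while the wrapped version takes a state plus any extra keyword arguments and hands back a state.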

## Wrapping Components to Work with State

### Experimentalist Defined and Wrapped with State

We will use autora's `random_sample` sampler for our experimentalist. We import this and then wrap it so that it functions with the state.

In [None]:
from autora.experimentalist.random_ import random_sample
from autora.state.delta import on_state

experimentalist = on_state(random_sample, output=["conditions"])

### Experiment Runner Defined and Wrapped with State
We define the same experiment runner from the first two tutorials and then wrap it so that it functions with the state.

In [None]:
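def experiment_runner(conditions: pd.DataFrame):
    # Ground truth: y = sin(x), observed with Gaussian noise (sd = 0.5)
    x = conditions["x"]
    y = np.sin(x) + np.random.normal(0, 0.5, size=x.shape)
    observations = conditions.assign(y = y)
    print(observations)  # show the newly collected observations
    return observations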

experiment_runner = on_state(experiment_runner, output=["experiment_data"])

### Theorist Defined and Wrapped with State

We will use autora's `BMSRegressor` theorist. We import this and then wrap it so that it functions with the state.

In [None]:
from autora.theorist.bms import BMSRegressor
from autora.state.wrapper import state_fn_from_estimator

theorist = state_fn_from_estimator(BMSRegressor(epochs=100))

## Running Each Component Within the State

### Run the Experimentalist

Let's run the experimentalist within the state and see how the state changes.

In [None]:
print('\033[1mPrevious Conditions:\033[0m')
print(s.conditions)

s = experimentalist(s, num_samples=5)

print('\n\033[1mUpdated Conditions:\033[0m')
print(s.conditions)

Previous Conditions:
          x
0  0.000000
1  0.000000
2  4.188790
3  2.792527
4  2.094395
5  6.283185
6  5.585054
7  2.094395
8  4.188790
9  0.000000

Updated Conditions:
          x
8  4.188790
1  0.000000
5  6.283185
0  0.000000
7  2.094395


### Run the Experiment Runner

Let's run the experiment runner and see how the state changes.

In [None]:
print("\033[1mPrevious Data:\033[0m")
print(s.experiment_data)

s = experiment_runner(s)  # the new observations are printed here because experiment_runner itself calls print(observations)

print("\n\033[1mUpdated Data:\033[0m")
print(s.experiment_data)

Previous Data:
Empty DataFrame
Columns: [x, y]
Index: []
          x         y
8  4.188790 -0.726505
1  0.000000  0.505258
5  6.283185 -0.290439
0  0.000000 -0.262585
7  2.094395  0.580335

Updated Data:
          x         y
0  4.188790 -0.726505
1  0.000000  0.505258
2  6.283185 -0.290439
3  0.000000 -0.262585
4  2.094395  0.580335


### Run the Theorist

Let's run the theorist and see how the state changes.

In [None]:
print("\033[1mPrevious Model:\033[0m")
print(f"{s.model}\n")

s = theorist(s)

print("\n\033[1mUpdated Model:\033[0m")
print(s.model)

INFO:autora.theorist.bms.regressor:BMS fitting started


Previous Model:
None



100%|██████████| 100/100 [00:03<00:00, 26.73it/s]
INFO:autora.theorist.bms.regressor:BMS fitting finished



Updated Model:
-0.04


## Component Chaining and Looping

We now have our `AutoRA` components wrapped to work with the state. Remember, this means that they take the state as an input and return the updated state as an output.

### Component Chaining

As the components all act on the state, they can be chained in a nested fashion.

In [None]:
s = theorist(experiment_runner(experimentalist(s, num_samples=5)))
print(s)

INFO:autora.theorist.bms.regressor:BMS fitting started


          x         y
1  0.000000 -0.462041
5  6.283185 -1.219553
8  4.188790 -0.564305
0  0.000000 -0.125522
7  2.094395  0.784092


100%|██████████| 100/100 [00:03<00:00, 26.29it/s]
INFO:autora.theorist.bms.regressor:BMS fitting finished


StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(0, 6.283185307179586), allowed_values=array([0.        , 0.6981317 , 1.3962634 , 2.0943951 , 2.7925268 ,
       3.4906585 , 4.1887902 , 4.88692191, 5.58505361, 6.28318531]), units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions=          x
1  0.000000
5  6.283185
8  4.188790
0  0.000000
7  2.094395, experiment_data=          x         y
0  4.188790 -0.726505
1  0.000000  0.505258
2  6.283185 -0.290439
3  0.000000 -0.262585
4  2.094395  0.580335
5  0.000000 -0.462041
6  6.283185 -1.219553
7  4.188790 -0.564305
8  0.000000 -0.125522
9  2.094395  0.784092, models=[-0.18, -0.18])
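
The nested call above reads inside-out, which can become hard to follow as more steps are added. The sketch below shows one way to express the same chaining left to right. It uses toy functions on a plain dictionary rather than `autora` components: `pipe` and the three `toy_*` functions are hypothetical helpers written for this illustration, and the "model" is a hard-coded placeholder string.

```python
from functools import reduce
from typing import Callable


def pipe(*functions: Callable) -> Callable:
    """Compose state functions so they run left to right."""
    def run(state):
        return reduce(lambda current, function: function(current), functions, state)
    return run


# Toy state functions: each takes a dict and returns a new dict.
def toy_experimentalist(state: dict) -> dict:
    return {**state, "conditions": [1, 2, 3]}


def toy_experiment_runner(state: dict) -> dict:
    new_data = [(x, 2 * x) for x in state["conditions"]]
    return {**state, "experiment_data": state["experiment_data"] + new_data}


def toy_theorist(state: dict) -> dict:
    # Placeholder "model"; a real theorist would fit the data.
    return {**state, "models": state["models"] + ["y = 2x"]}


cycle = pipe(toy_experimentalist, toy_experiment_runner, toy_theorist)

initial = {"conditions": [], "experiment_data": [], "models": []}
nested = toy_theorist(toy_experiment_runner(toy_experimentalist(initial)))
piped = cycle(initial)
print(piped == nested)
```

A component that needs extra keyword arguments, such as `num_samples` for the experimentalist, could be prepared for a helper like this with `functools.partial` before being passed in.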


### Chain Looping with Number of Cycles

Moreover, we can use these chained components within a loop to run multiple cycles.

In [None]:
#### First, let's reinitialize the state object to get a clean state ####
s = StandardState(
    variables = variables,
    conditions = conditions,
    experiment_data = pd.DataFrame(columns=["x","y"])
)

#### Then we cycle through the pipeline we built five times ####
num_cycles = 5 # number of empirical research cycles
for cycle in range(num_cycles):
    print(f"\n\033[1mRunning Cycle {cycle+1}:\033[0m")
    s = theorist(experiment_runner(experimentalist(s, num_samples=5)))
    print(f"\033[1mCycle {cycle+1} model: {s.model}\033[0m")

INFO:autora.theorist.bms.regressor:BMS fitting started



Running Cycle 1:
          x         y
1  0.000000 -0.146700
8  4.188790 -0.880945
7  2.094395  0.913588
5  6.283185  0.332327
4  2.094395  0.795916


100%|██████████| 100/100 [00:03<00:00, 25.24it/s]
INFO:autora.theorist.bms.regressor:BMS fitting finished
INFO:autora.theorist.bms.regressor:BMS fitting started


Cycle 1 model: sin(x)

Running Cycle 2:
          x         y
1  0.000000 -0.016597
4  2.094395  0.859277
7  2.094395  0.337170
5  6.283185  0.411272
8  4.188790 -1.476447


100%|██████████| 100/100 [00:03<00:00, 25.75it/s]
INFO:autora.theorist.bms.regressor:BMS fitting finished
INFO:autora.theorist.bms.regressor:BMS fitting started


Cycle 2 model: 0.11

Running Cycle 3:
          x         y
1  0.000000 -0.664093
7  2.094395  0.964456
5  6.283185  0.369233
8  4.188790 -0.780341
4  2.094395  0.808201


100%|██████████| 100/100 [00:03<00:00, 25.62it/s]
INFO:autora.theorist.bms.regressor:BMS fitting finished
INFO:autora.theorist.bms.regressor:BMS fitting started


Cycle 3 model: sin(x)

Running Cycle 4:
          x         y
8  4.188790 -1.016577
1  0.000000  0.230408
5  6.283185  0.042962
7  2.094395  0.111047
4  2.094395  1.226777


100%|██████████| 100/100 [00:03<00:00, 25.24it/s]
INFO:autora.theorist.bms.regressor:BMS fitting finished
INFO:autora.theorist.bms.regressor:BMS fitting started


Cycle 4 model: sin(x)

Running Cycle 5:
          x         y
1  0.000000 -0.021768
7  2.094395  0.728375
8  4.188790 -1.647559
5  6.283185 -0.397815
4  2.094395  1.331318


100%|██████████| 100/100 [00:03<00:00, 25.29it/s]
INFO:autora.theorist.bms.regressor:BMS fitting finished


Cycle 5 model: sin(x)


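If the theorist performed well, it should have recovered our ground-truth model `sin(x)`. Note that the result can vary from cycle to cycle: in the run above, cycle 2 returned a constant (`0.11`) before later cycles settled on `sin(x)`.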

In [None]:
print(s.model)

sin(x)


### Chain Looping with Stopping Criterion

Alternatively, we can run the chain until we reach a stopping criterion. For example, here we will loop until we get 30 datapoints.

In [None]:
#### First, let's reinitialize the state object to get a clean state ####
s = StandardState(
    variables = variables,
    conditions = conditions,
    experiment_data = pd.DataFrame(columns=["x","y"])
)

#### Then we cycle through the pipeline we built until we reach our stopping criterion ####
cycle = 0
while len(s.experiment_data) < 30:
    print(f"\n\033[1mRunning Cycle {cycle+1}, number of datapoints: {len(s.experiment_data)}\033[0m")
    s = theorist(experiment_runner(experimentalist(s, num_samples=5)))
    print(f"\033[1mCycle {cycle+1} model: {s.model}\033[0m")
    cycle += 1

print(f"\n\033[1mNumber of datapoints: {len(s.experiment_data)}\033[0m")
print(f"\033[1mDetermined Model: {s.model}\033[0m")


INFO:autora.theorist.bms.regressor:BMS fitting started



Running Cycle 1, number of datapoints: 0
          x         y
2  4.188790 -0.527142
9  0.000000 -0.088866
7  2.094395  0.660834
5  6.283185  0.589858
8  4.188790 -1.315129


100%|██████████| 100/100 [00:03<00:00, 27.83it/s]
INFO:autora.theorist.bms.regressor:BMS fitting finished
INFO:autora.theorist.bms.regressor:BMS fitting started


Cycle 1 model: -0.14

Running Cycle 2, number of datapoints: 5
          x         y
9  0.000000  0.180818
5  6.283185 -0.322560
2  4.188790 -0.685328
8  4.188790 -0.097007
7  2.094395  0.848112


100%|██████████| 100/100 [00:04<00:00, 24.68it/s]
INFO:autora.theorist.bms.regressor:BMS fitting finished
INFO:autora.theorist.bms.regressor:BMS fitting started


Cycle 2 model: sin(x)

Running Cycle 3, number of datapoints: 10
          x         y
7  2.094395  1.648347
5  6.283185  0.043524
8  4.188790 -1.015529
2  4.188790 -0.820145
9  0.000000 -0.993784


100%|██████████| 100/100 [00:03<00:00, 27.33it/s]
INFO:autora.theorist.bms.regressor:BMS fitting finished
INFO:autora.theorist.bms.regressor:BMS fitting started


Cycle 3 model: -0.13

Running Cycle 4, number of datapoints: 15
          x         y
5  6.283185 -0.709298
7  2.094395  0.961194
2  4.188790 -0.798148
8  4.188790 -0.561981
9  0.000000  0.352491


100%|██████████| 100/100 [00:03<00:00, 25.85it/s]
INFO:autora.theorist.bms.regressor:BMS fitting finished
INFO:autora.theorist.bms.regressor:BMS fitting started


Cycle 4 model: -0.13

Running Cycle 5, number of datapoints: 20
          x         y
2  4.188790 -0.685564
8  4.188790 -0.918654
9  0.000000 -0.477673
5  6.283185 -0.207382
7  2.094395  0.166655


100%|██████████| 100/100 [00:03<00:00, 26.22it/s]
INFO:autora.theorist.bms.regressor:BMS fitting finished
INFO:autora.theorist.bms.regressor:BMS fitting started


Cycle 5 model: sin(x)

Running Cycle 6, number of datapoints: 25
          x         y
7  2.094395  0.711381
2  4.188790 -0.529463
8  4.188790 -0.994340
5  6.283185 -0.183913
9  0.000000  0.636867


100%|██████████| 100/100 [00:04<00:00, 24.70it/s]
INFO:autora.theorist.bms.regressor:BMS fitting finished


Cycle 6 model: sin(x)

Number of datapoints: 30
Determined Model: sin(x)
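
The loop above stops after six cycles because each cycle adds five datapoints. When a stopping criterion depends on something less predictable (for example, on the model that is found), it can be worth adding a cap on the number of cycles so the loop is guaranteed to end. The following sketch shows that pattern with a toy stand-in for one research cycle; `MAX_CYCLES` and `toy_cycle` are hypothetical names chosen for this illustration and are not part of `autora`.

```python
from typing import List

MAX_CYCLES = 10         # hypothetical safety cap on the number of cycles
TARGET_DATAPOINTS = 30  # stopping criterion, as in the loop above
SAMPLES_PER_CYCLE = 5


def toy_cycle(experiment_data: List[int]) -> List[int]:
    """Toy stand-in for one research cycle: adds SAMPLES_PER_CYCLE datapoints."""
    start = len(experiment_data)
    return experiment_data + list(range(start, start + SAMPLES_PER_CYCLE))


experiment_data: List[int] = []
cycle = 0
# Stop when enough data has been collected OR the cap has been reached.
while len(experiment_data) < TARGET_DATAPOINTS and cycle < MAX_CYCLES:
    experiment_data = toy_cycle(experiment_data)
    cycle += 1

print(cycle, len(experiment_data))
```

With the wrapped components from this tutorial, the body of the loop would be the chained call on the state instead of `toy_cycle`.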


# Next Notebook
This concludes the tutorial on ``autora`` functionality. However, ``autora`` is a flexible framework in which users can integrate their own theorists, experimentalists, and experiment runners into an automated empirical research workflow. The next notebook illustrates how to add your own custom theorists and experimentalists to use with ``autora``.

Follow this link for the next notebook tutorial:
[AutoRA Basic Tutorial IV: Customization](https://autoresearch.github.io/autora/tutorials/basic/Tutorial-IV-Customization/) <br>