# Introduction

**[AutoRA](https://pypi.org/project/autora/)** (**Au**tomated **R**esearch **A**ssistant) is an open-source framework designed to automate various stages of empirical research, including model discovery, experimental design, and data collection.

This notebook is the fourth of four notebooks in the basic ``autora`` tutorials. We suggest that you work through these notebooks in order, as each builds upon the last. However, each notebook is self-contained, so there is no need to *run* the previous notebook before starting the current one. Links to all four notebooks are listed below, and each notebook ends with a link to the next one.

[AutoRA Basic Tutorial I: Components](https://autoresearch.github.io/autora/tutorials/basic/Tutorial-I-Components/) <br>
[AutoRA Basic Tutorial II: Loop Constructs](https://autoresearch.github.io/autora/tutorials/basic/Tutorial-II-Loop-Constructs/) <br>
[AutoRA Basic Tutorial III: Functional Workflow](https://autoresearch.github.io/autora/tutorials/basic/Tutorial-III-Functional-Workflow/) <br>
[AutoRA Basic Tutorial IV: Customization](https://autoresearch.github.io/autora/tutorials/basic/Tutorial-IV-Customization/) <br>

These notebooks provide a comprehensive introduction to the capabilities of ``autora``. **They demonstrate the fundamental components of ``autora``, and how they can be combined to facilitate automated (closed-loop) empirical research through synthetic experiments.**

**How to use this notebook** *You can progress through the notebook section by section or directly navigate to specific sections. If you choose the latter, it is recommended to execute all cells in the notebook initially, allowing you to easily rerun the cells in each section later without issues.*

## Tutorial Setup

In [None]:
#### Installation ####
!pip install -q "autora[theorist-bms]"

#### Import modules ####
import numpy as np
import pandas as pd
import torch
from autora.variable import DV, IV, ValueType, VariableCollection
from autora.state.bundled import StandardState
from autora.state.delta import on_state
from autora.state.wrapper import state_fn_from_estimator
from autora.experimentalist.random_ import random_pool
from autora.theorist.bms import BMSRegressor

#### Set seeds ####
np.random.seed(42)
torch.manual_seed(42)

# Customizing Automated Empirical Research Components

``autora`` is a flexible framework in which users can integrate their own theorists, experimentalists, and experiment runners into an automated empirical research workflow. This section illustrates the integration of custom theorists, experimentalists, and experiment runners. For more information on how to contribute your own modules to the ``autora`` ecosystem, please refer to the [Contributor Documentation](https://autoresearch.github.io/autora/contribute/modules/).

To illustrate the use of custom theorists and experimentalists, we consider a simple workflow:
1. Generate 10 seed experimental conditions using `random_pool`
2. Iterate through the following steps
   - Collect observations using the ``experiment_runner``
   - Identify a model relating conditions to observations using a ``theorist``
   - Identify 3 new experimental conditions using an ``experimentalist``

Once this workflow is set up, we will replace each component with a custom function.
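
Before wiring the workflow up with ``autora``, its control flow can be sketched in plain Python. The sketch below is only an illustration under stated assumptions: `sample_conditions`, `collect_observations`, and `fit_model` are hypothetical stand-ins (not part of ``autora``), and the stand-in theorist simply fits a straight line.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
allowed_values = np.linspace(0, 2 * np.pi, 10)

def sample_conditions(num_samples: int) -> pd.DataFrame:
    """Hypothetical experimentalist: draw conditions at random from the allowed values."""
    return pd.DataFrame({"x": rng.choice(allowed_values, size=num_samples)})

def collect_observations(conditions: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical experiment runner: noisy sine of each condition."""
    noise = rng.normal(0, 0.5, size=len(conditions))
    return conditions.assign(y=np.sin(conditions["x"].to_numpy()) + noise)

def fit_model(data: pd.DataFrame) -> np.poly1d:
    """Hypothetical theorist: least-squares straight line through all data so far."""
    return np.poly1d(np.polyfit(data["x"].to_numpy(), data["y"].to_numpy(), deg=1))

# Step 1: generate 10 seed conditions
conditions = sample_conditions(10)
batches = []   # one DataFrame of observations per cycle
models = []    # one fitted model per cycle

# Step 2: iterate (3 cycles chosen arbitrarily for this sketch)
for cycle in range(3):
    batches.append(collect_observations(conditions))        # collect observations
    experiment_data = pd.concat(batches, ignore_index=True)
    models.append(fit_model(experiment_data))                # identify a model
    conditions = sample_conditions(3)                        # identify 3 new conditions

print(len(experiment_data), len(models))
```

In the ``autora`` version that follows, the same three roles are played by state-aware functions, so the bookkeeping done here by hand (`batches`, `models`, `conditions`) lives in a single state object instead.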

In [None]:
#### Define metadata ####
iv = IV(name="x", value_range=(0, 2 * np.pi), allowed_values=np.linspace(0, 2 * np.pi, 10))
dv = DV(name="y", type=ValueType.REAL)
variables = VariableCollection(independent_variables=[iv],dependent_variables=[dv])

#### Define condition pool ####
conditions = random_pool(variables, num_samples=10)

#### Define state ####
s = StandardState(
    variables = variables,
    conditions = conditions,
    experiment_data = pd.DataFrame(columns=["x","y"])
)

#### Define experiment runner and wrap with state functionality ####
def run_experiment(conditions: pd.DataFrame):
    x = conditions["x"]
    y = np.sin(x) + np.random.normal(0, 0.5, size=x.shape)
    observations = conditions.assign(y = y)
    return observations

experiment_runner = on_state(run_experiment, output=["experiment_data"])

#### Define theorist and wrap with state functionality ####
theorist = state_fn_from_estimator(BMSRegressor(epochs=100))

#### Define experimentalist and wrap with state functionality ####
experimentalist = on_state(random_pool, output=["conditions"])
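
To make "wrapping with state functionality" more concrete, here is a deliberately simplified toy version of the idea, using only the standard library. It is not ``autora``'s implementation: `ToyState` and `toy_on_state` are hypothetical names, the toy takes a single output field name, and it always replaces the output field, whereas the real state object may merge results differently (for example, by appending new data).

```python
from dataclasses import dataclass, field, replace
import inspect

@dataclass(frozen=True)
class ToyState:
    """Hypothetical, minimal stand-in for a state object."""
    conditions: list = field(default_factory=list)
    experiment_data: list = field(default_factory=list)

def toy_on_state(function, output: str):
    """Wrap `function` so it reads its arguments from a state and writes its result back."""
    parameter_names = inspect.signature(function).parameters

    def wrapped(state, **kwargs):
        # Pull every state field whose name matches a parameter of `function`
        from_state = {
            name: getattr(state, name)
            for name in parameter_names
            if hasattr(state, name)
        }
        # Explicit keyword arguments take precedence over state fields
        result = function(**{**from_state, **kwargs})
        # Return a new state in which the output field holds the result
        return replace(state, **{output: result})

    return wrapped

def double_conditions(conditions):
    """Toy experiment runner: pair each condition with twice its value."""
    return [(x, 2 * x) for x in conditions]

toy_runner = toy_on_state(double_conditions, output="experiment_data")
new_state = toy_runner(ToyState(conditions=[1, 2, 3]))
print(new_state.experiment_data)  # [(1, 2), (2, 4), (3, 6)]
```

Because every wrapped function takes a state and returns a state, wrapped functions can be chained, which is the pattern used in the test cell below.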

We should quickly test to make sure everything works as expected.

In [None]:
print('\033[1mPrevious State:\033[0m')
print(s)

for cycle in range(2):
    s = theorist(experiment_runner(experimentalist(s)))

print('\n\033[1mUpdated State:\033[0m')
print(s)

## Custom Theorists

What if we wanted to replace the ``theorist`` with a custom theorist?

We can implement our theorist as a class that inherits from `sklearn.base.BaseEstimator`. The class must implement the following methods:

- `fit(self, conditions, observations)`
- `predict(self, conditions)`

The following code block implements such a theorist that fits a polynomial of a specified degree.

In [None]:
import numpy as np
from sklearn.base import BaseEstimator

class PolynomialRegressor(BaseEstimator):
    """
    This theorist fits a polynomial function to the data.
    """

    def __init__(self, degree: int = 3):
        self.degree = degree

    def fit(self, conditions, observations):

        # polyfit expects 1D float arrays; np.asarray also handles pandas
        # objects, which have no .flatten() method
        conditions = np.asarray(conditions, dtype=float).ravel()
        observations = np.asarray(observations, dtype=float).ravel()

        # fit polynomial
        self.coeff = np.polyfit(conditions, observations, self.degree)
        self.polynomial = np.poly1d(self.coeff)

        # scikit-learn convention: fit returns the fitted estimator
        return self

    def predict(self, conditions):
        return self.polynomial(np.asarray(conditions, dtype=float))
    
custom_theorist = state_fn_from_estimator(PolynomialRegressor())
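
As a quick sanity check outside the ``autora`` state machinery, the following sketch reproduces what such a polynomial theorist computes, using the same `np.polyfit` and `np.poly1d` calls directly. The sample size, noise level, and seed are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of the sine ground truth used in this tutorial
x = np.linspace(0, 2 * np.pi, 50)
y = np.sin(x) + rng.normal(0, 0.1, size=x.shape)

# Fit a cubic, the default degree of the theorist above
coefficients = np.polyfit(x, y, deg=3)   # highest power first
polynomial = np.poly1d(coefficients)

# Compare the fitted polynomial against the noise-free ground truth
max_error = np.max(np.abs(polynomial(x) - np.sin(x)))
print(f"largest deviation from sin(x): {max_error:.3f}")
```

A cubic can follow one full period of the sine reasonably well; lower degrees cannot, which is one way to see why the `degree` parameter matters.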

Let's run the workflow with the new theorist for 5 research cycles, each of which generates one model.

In [None]:
#### First, let's reinitialize the state object to get a clean state ####
s = StandardState(variables = variables, conditions = conditions, experiment_data = pd.DataFrame(columns=["x","y"]))

print('\033[1mPrevious State:\033[0m')
print(s)

for cycle in range(5):
    s = custom_theorist(experiment_runner(experimentalist(s)))

print('\n\033[1mUpdated State:\033[0m')
print(s)


## Custom Experimentalists

We can also implement custom experimentalists. Experimentalists are generally implemented as functions that can be integrated into an
[Experimentalist Pipeline](https://autoresearch.github.io/autora/core/docs/pipeline/Experimentalist%20Pipeline%20Examples/). For instance, an experimentalist sampler function expects a pool of experimental conditions, typically passed as a 2D numpy array named ``condition_pool``, and returns a modified set of experimental conditions.

The following code block implements a basic experimentalist that counts how often each allowed value of the independent variable ``x`` already appears in the current conditions and returns the ``num_samples`` least represented values. Unlike the [Model Disagreement Sampler](https://autoresearch.github.io/autora/user-guide/experimentalists/samplers/model-disagreement/), which identifies experimental conditions for which two models differ most in their predictions, this experimentalist does not consult any model.

In [None]:
def uniform_experimentalist(variables: VariableCollection, conditions: pd.DataFrame, num_samples = 1):

    """
    An experimentalist that selects the least represented datapoints
    """

    #Retrieve the possible values
    allowed_values = variables.independent_variables[0].allowed_values
    
    #Determine the representation of each value
    conditions_count = np.array([conditions["x"].isin([value]).sum(axis=0) for value in allowed_values])
    
    #Sort to determine the least represented values
    conditions_sort = conditions_count.argsort()
    values_count = allowed_values[conditions_sort]
    
    return pd.DataFrame({"x": values_count[:num_samples]})

custom_experimentalist = on_state(uniform_experimentalist, output=["conditions"])
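
The selection logic can be tried on a small hand-made example. The sketch below is a variant rather than the function above: `least_represented` is a hypothetical name, it takes the allowed values directly instead of a `VariableCollection`, and it matches values with `np.isclose` instead of exact equality, on the assumption that floating-point conditions may not always compare exactly equal.

```python
import numpy as np
import pandas as pd

def least_represented(allowed_values: np.ndarray,
                      conditions: pd.DataFrame,
                      num_samples: int = 1) -> pd.DataFrame:
    """Return the `num_samples` allowed values that occur least often in `conditions`."""
    observed = conditions["x"].to_numpy(dtype=float)
    # counts[i] = number of observed conditions approximately equal to allowed_values[i]
    counts = np.isclose(observed[None, :], allowed_values[:, None]).sum(axis=1)
    # Stable sort: ties keep the original order of the allowed values
    order = np.argsort(counts, kind="stable")
    return pd.DataFrame({"x": allowed_values[order][:num_samples]})

allowed = np.array([0.0, 1.0, 2.0, 3.0])
previous = pd.DataFrame({"x": [0.0, 0.0, 1.0, 3.0, 3.0, 3.0]})

# 2.0 never occurs and 1.0 occurs once, so these two are selected
selected = least_represented(allowed, previous, num_samples=2)
print(selected["x"].tolist())  # [2.0, 1.0]
```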

In [None]:
#### First, let's reinitialize the state object to get a clean state ####
s = StandardState(variables = variables, conditions = conditions, experiment_data = pd.DataFrame(columns=["x","y"]))

print('\033[1mPrevious State:\033[0m')
print(s)

for cycle in range(5):
    s = theorist(experiment_runner(custom_experimentalist(s, num_samples = 5)))

print('\n\033[1mUpdated State:\033[0m')
print(s)

Previous State:
StandardState(variables=VariableCollection(independent_variables=[IV(name='x', value_range=(0, 6.283185307179586), allowed_values=array([0.        , 0.6981317 , 1.3962634 , 2.0943951 , 2.7925268 ,
       3.4906585 , 4.1887902 , 4.88692191, 5.58505361, 6.28318531]), units='', type=<ValueType.REAL: 'real'>, variable_label='Independent Variable', rescale=1, is_covariate=False)], dependent_variables=[DV(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='Dependent Variable', rescale=1, is_covariate=False)], covariates=[]), conditions=          x
0  4.886922
1  4.886922
2  0.698132
3  2.792527
4  0.698132
5  4.886922
6  0.000000
7  4.886922
8  0.698132
9  1.396263, experiment_data=Empty DataFrame
Columns: [x, y]
Index: [], models=[])


ValueError: Length of values (5) does not match length of index (10)

## Custom Experiment Runner

In [None]:
def sine_experiment_runner(conditions: pd.DataFrame, added_noise: float = 0.5):
    x = conditions["x"]
    y = np.sin(x) + np.random.normal(0, added_noise, size=x.shape)
    observations = conditions.assign(y = y)
    return observations

custom_experiment_runner = on_state(sine_experiment_runner, output=["experiment_data"])
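
The runner above exposes the noise level through the `added_noise` parameter. One way to check such a runner before wrapping it with `on_state` is to call it directly with the noise switched off, in which case the observations should equal the sine of the conditions. The block repeats the runner definition so that it can be run on its own; the five test conditions are an arbitrary choice.

```python
import numpy as np
import pandas as pd

def sine_experiment_runner(conditions: pd.DataFrame, added_noise: float = 0.5) -> pd.DataFrame:
    x = conditions["x"]
    y = np.sin(x) + np.random.normal(0, added_noise, size=x.shape)
    return conditions.assign(y=y)

test_conditions = pd.DataFrame({"x": np.linspace(0, 2 * np.pi, 5)})

# With zero noise, the runner should reproduce the ground truth
noiseless = sine_experiment_runner(test_conditions, added_noise=0.0)
assert np.allclose(noiseless["y"], np.sin(test_conditions["x"]))

# assign returns a new DataFrame, so the input conditions are left untouched
assert "y" not in test_conditions.columns

print(noiseless.shape)  # (5, 2)
```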

In [None]:
#### First, let's reinitialize the state object to get a clean state ####
s = StandardState(variables = variables, conditions = conditions, experiment_data = pd.DataFrame(columns=["x","y"]))

print('\033[1mPrevious State:\033[0m')
print(s)

for cycle in range(5):
    s = theorist(custom_experiment_runner(experimentalist(s, num_samples = 5)))

print('\n\033[1mUpdated State:\033[0m')
print(s)

## Altogether Now

In [None]:
#### First, let's reinitialize the state object to get a clean state ####
s = StandardState(variables = variables, conditions = conditions, experiment_data = pd.DataFrame(columns=["x","y"]))

print('\033[1mPrevious State:\033[0m')
print(s)

for cycle in range(5):
    s = custom_theorist(custom_experiment_runner(custom_experimentalist(s, num_samples = 5)))

print('\n\033[1mUpdated State:\033[0m')
print(s)

# Help
We hope that this tutorial helped demonstrate the fundamental components of ``autora``, and how they can be combined to facilitate automated (closed-loop) empirical research through synthetic experiments. We encourage you to explore other [tutorials](https://autoresearch.github.io/autora/tutorials/) and check out the [documentation](https://autoresearch.github.io/).

If you encounter any issues or have questions, please reach out to us through the [AutoRA Forum](https://github.com/orgs/AutoResearch/discussions). Feel free to report any bugs by [creating an issue in the AutoRA repository](https://github.com/AutoResearch/autora/issues).

You may also post questions directly into the [User Q&A Section](https://github.com/orgs/AutoResearch/discussions/categories/using-autora).
