# Bayesian Symbolic Regression

## Example

In [None]:
# Uncomment the following line when running on Google Colab
# !pip install autora

Let's generate a simple data set with two features $x_1, x_2 \in [0, 1]$ and a target $y$, using the following generative model:
$y = 2 x_1 + e^{5 x_2}$

In [None]:
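import numpy as np

# 10 evenly spaced values per feature on [0, 1]
x_1 = np.linspace(0, 1, num=10)
x_2 = np.linspace(0, 1, num=10)
# All 100 combinations of (x_1, x_2), one combination per row
X = np.array(np.meshgrid(x_1, x_2)).T.reshape(-1, 2)

# Target values computed from the two feature columns
y = 2 * X[:, 0] + np.exp(5 * X[:, 1])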

Recall that the following built-in operators constitute the search space, that is, the set of operations considered when searching over computation graphs:

- **\+**: The output of the computation $x_j$ is the sum of its two inputs $x_i, x_{ii}$: $x_j = x_i + x_{ii}$.
- **\-**: The output of the computation $x_j$ is the difference between its two inputs $x_i, x_{ii}$: $x_j = x_i - x_{ii}$.
- __\*__: The output of the computation $x_j$ is the product of its two inputs $x_i, x_{ii}$: $x_j = x_i * x_{ii}$.
- **exp**: The output of the computation $x_j$ is the natural exponential function applied to its input $x_i$: $x_j = \exp(x_i)$.
- **pow2**: The output of the computation $x_j$ is the square function applied to its input $x_i$: $x_j = x_i^2$.
- **pow3**: The output of the computation $x_j$ is the cube function applied to its input $x_i$: $x_j = x_i^3$.
- **sin**: The output of the computation $x_j$ is the sine function applied to its input $x_i$: $x_j = \sin(x_i)$.
- **cos**: The output of the computation $x_j$ is the cosine function applied to its input $x_i$: $x_j = \cos(x_i)$.
- **ln**: The output of the computation $x_j$ is a linear transformation of its input $x_i$: $x_j = a * x_i + b$, where $a$ and $b$ are slope and intercept parameters. Despite its name, this operator is not the natural logarithm.
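
To make the operator list concrete, here is a minimal sketch that writes each operator as a plain Python callable and composes two of them by hand. The `OPERATORS` dictionary and its layout are hypothetical and only illustrate the definitions above; they are not AutoRA's internal representation.

```python
import numpy as np

# Hypothetical lookup table mirroring the operator list above
# (illustrative only, not AutoRA's internal data structure).
OPERATORS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "exp": np.exp,
    "pow2": lambda x: x ** 2,
    "pow3": lambda x: x ** 3,
    "sin": np.sin,
    "cos": np.cos,
    # "ln" is the linear transformation a * x + b with parameters a and b
    "ln": lambda x, a=1.0, b=0.0: a * x + b,
}

# Compose two operators by hand: exp(ln(x)) with a = 5, b = 0 gives exp(5 * x)
x = np.array([0.0, 0.5, 1.0])
term = OPERATORS["exp"](OPERATORS["ln"](x, a=5.0, b=0.0))
```

BSR searches over compositions like this one automatically; the manual composition only shows how a single node output $x_j$ is computed from its inputs.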

## Set up the BSR Regressor

We will use the BSR Regressor to predict the outcomes. There are a number of parameters that determine how search is performed. The most important ones are listed below:

- `tree_num`: the number of expression trees to use in the linear mixture (final prediction model); also denoted by `K` in BSR.
- `itr_num`: the number of RJ-MCMC steps to execute (note: this can also be understood as the number of `K`-samples to take in the fitting process).
- `val`: the number of validation steps to execute following each iteration.
- `beta`: the hyperparameter that controls the growth of a new expression tree. It must be negative (`beta < 0`), and in general, smaller values of `beta` correspond to deeper expression trees.
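
To illustrate what "linear mixture" means for `tree_num`, the sketch below combines the outputs of `K = 2` fixed expressions with an intercept and recovers the mixture weights by ordinary least squares. The two expressions, the synthetic weights, the intercept term, and the use of `np.linalg.lstsq` are all assumptions made for illustration; BSR samples the trees themselves with RJ-MCMC, and its internal fitting procedure may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(50, 2))

# Outputs of K = 2 hypothetical expression trees evaluated on X
tree_outputs = np.column_stack([
    X[:, 0],              # tree 1: x_1
    np.exp(5 * X[:, 1]),  # tree 2: exp(5 * x_2)
])

# Design matrix: a column of ones (intercept) followed by the tree outputs
design = np.column_stack([np.ones(len(X)), tree_outputs])

# Synthetic target built from made-up weights [intercept, weight 1, weight 2]
true_weights = np.array([0.5, 2.0, 1.5])
y = design @ true_weights

# Ordinary least squares recovers the mixture weights from (design, y)
weights, *_ = np.linalg.lstsq(design, y, rcond=None)
y_pred = design @ weights
```

With more trees, the design matrix simply gains one column per tree, which is why a larger `tree_num` yields a more flexible final model.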

Let's set up the BSR regressor with some default parameters.

In [None]:
from autora.skl.bsr import BSRRegressor

bsr_estimator = BSRRegressor(tree_num=3, itr_num=5000, val=100, beta=-1)

Now we have everything to run Bayesian Symbolic Regression and visualize the fitted model.

In [None]:
bsr_estimator.fit(X, y)
y_pred = bsr_estimator.predict(X)
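
A numeric summary of the fit can complement a visual check. The helper below is a hypothetical convenience function, not part of AutoRA; it is shown on toy arrays here and could be applied to `y` and `y_pred` in the same way.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-squared error between targets and predictions."""
    # Flatten both inputs so that (n,) and (n, 1) arrays compare elementwise
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Toy values standing in for targets and predictions
toy_true = np.array([1.0, 2.0, 3.0])
toy_pred = np.array([1.0, 2.0, 5.0])
toy_error = rmse(toy_true, toy_pred)
```

Because BSR is a sampling-based method, the error on the real data can vary from run to run, so it is worth comparing a few fits rather than relying on a single value.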

In [None]:
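import matplotlib.pyplot as plt

plt.figure()
# X has two columns, so each call draws one series per feature
plt.plot(X, y, "o")       # observed targets
plt.plot(X, y_pred, "-")  # model predictions
plt.xlabel("feature value")
plt.ylabel("y")
plt.show()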
