# Basic Usage
Here, we demonstrate the core functionality of the Equation Tree package:
- Basic functionality for sampling and processing equations
- Advanced settings for sampling equations

## Installation

In [1]:
import random
!pip install equation_tree


## Basic Functionality

### Sampling With Default Settings
First, we need to import the functionality. Here we also set a seed to ensure reproducible results.

In [2]:
from equation_tree import sample

# To obtain reproducible results, we set a seed for the following section
import numpy as np
np.random.seed(42)

We use this to sample an equation:

In [3]:
equation = sample()

Processing: 100%|██████████| 1/1 [00:00<00:00,  9.77iteration/s]


### Equation Representations And Features

First, let's look at the type of the equation:

In [4]:
type(equation)

list

It is a list! This is because we can sample multiple equations in one go:

In [5]:
equations = sample(n = 100)

Processing: 100%|██████████| 100/100 [00:01<00:00, 85.06iteration/s]


This returns 100 equations:

In [6]:
len(equations)

100

In [7]:
equations[0]

-sin(x_1 - exp(exp(x_1)))

In [8]:
equations[42]

-c_2 + tan(c_1*x_1)

They are displayed as strings, but we can look at other representations as well. For example, prefix notation (for more details on the different representations of equations, see the respective section of the documentation):

In [9]:
equations[42].prefix

['-', 'tan', '*', 'c_1', 'x_1', 'c_2']
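
To make the prefix list easier to read, here is a minimal sketch that turns it back into a parenthesized infix string. It is plain Python and does not use the package; the `ARITY` table and the helper `prefix_to_infix` are assumptions made for this illustration only.

```python
# Number of arguments per symbol (assumed for this example; the package
# has its own symbol definitions). Anything not listed is treated as a leaf.
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "^": 2,
         "sin": 1, "cos": 1, "tan": 1, "exp": 1}


def prefix_to_infix(tokens):
    """Convert a prefix token list into a fully parenthesized infix string."""

    def build(pos):
        token = tokens[pos]
        arity = ARITY.get(token, 0)
        if arity == 0:  # variable or constant
            return token, pos + 1
        if arity == 1:  # function with one argument
            arg, nxt = build(pos + 1)
            return f"{token}({arg})", nxt
        left, nxt = build(pos + 1)  # binary operator
        right, nxt = build(nxt)
        return f"({left} {token} {right})", nxt

    expr, end = build(0)
    if end != len(tokens):
        raise ValueError("trailing tokens: not a single well-formed expression")
    return expr


infix = prefix_to_infix(["-", "tan", "*", "c_1", "x_1", "c_2"])
print(infix)  # (tan((c_1 * x_1)) - c_2)
```

Reading the list from left to right, each operator or function is followed by its arguments, so `-` takes `tan(c_1*x_1)` and `c_2`, which is the same expression as `-c_2 + tan(c_1*x_1)`.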

We can also look at features of the equation, for example at the number of constants, the tree depth of the underlying tree, the number of nodes or the tree structure (for more details on these features, see the respective section of the documentation):

In [10]:
equations[42].n_constants

2

In [11]:
equations[42].depth

3

In [12]:
equations[42].n_nodes

6

In [13]:
equations[42].structure

[0, 1, 2, 3, 3, 1]
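
One reading that is consistent with the outputs above is that `structure` lists the depth of every node when the tree is visited in prefix order. The sketch below checks this interpretation for `equations[42]` in plain Python. The `ARITY` table and the helper `depth_sequence` are assumptions for illustration and are not taken from the package.

```python
# Number of arguments per symbol (assumed for this example).
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "^": 2,
         "sin": 1, "cos": 1, "tan": 1}


def depth_sequence(tokens):
    """Return the depth of every node, visiting the tree in prefix order."""
    depths = []
    pending = [0]  # stack of depths at which the next nodes are expected
    for token in tokens:
        depth = pending.pop()
        depths.append(depth)
        # Every child of this node sits one level deeper.
        pending.extend([depth + 1] * ARITY.get(token, 0))
    return depths


prefix = ["-", "tan", "*", "c_1", "x_1", "c_2"]
structure = depth_sequence(prefix)
print(structure)       # [0, 1, 2, 3, 3, 1]
print(max(structure))  # 3, the tree depth
print(len(structure))  # 6, the number of nodes
```

Under this reading, the depth and the number of nodes reported above follow directly from the structure list.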

### Instantiate Equations

Note: the sampled equations are abstract: symbols such as c_1 and c_2 are placeholders for constants. We can instantiate the constants as numbers:

In [14]:
# first we need to import the functionality
from equation_tree import instantiate_constants
import random

# then we can use a function to instantiate the constants. For example for random constants between 0 and 1:
instantiated_equation = instantiate_constants(equations[42], lambda : random.random())
print(f'abstract: {equations[42]}', f', instantiated: {instantiated_equation}')

abstract: -c_2 + tan(c_1*x_1) , instantiated: tan(0.2720878822071884*x_1) - 0.4936315339297549


In [15]:
# we can also use other functions (for example, one that sets all constants to 1)
instantiated_equation_ = instantiate_constants(equations[41], lambda : 1)
print(f'abstract: {equations[41]}', f', instantiated: {instantiated_equation_}')

abstract: c_2*x_1**c_1 , instantiated: x_1


**Note:** We can use arbitrary functions to instantiate the constants.
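
The two calls above suggest that the function takes no arguments and is called once per constant, since c_1 and c_2 received different random values. The following sketch mimics that behaviour on a prefix token list. It is plain Python, the helper `instantiate_prefix` is hypothetical, and it is not how the package implements `instantiate_constants`.

```python
import random


def instantiate_prefix(tokens, make_constant):
    """Replace every abstract constant (token starting with 'c_') by a fresh value.

    make_constant is a zero-argument callable; it is called once per constant.
    """
    return [str(make_constant()) if t.startswith("c_") else t for t in tokens]


abstract = ["-", "tan", "*", "c_1", "x_1", "c_2"]

# Random constants between 0 and 1, from a seeded generator for reproducibility.
rng = random.Random(0)
print(instantiate_prefix(abstract, rng.random))

# Every constant set to 1.
all_ones = instantiate_prefix(abstract, lambda: 1)
print(all_ones)  # ['-', 'tan', '*', '1', 'x_1', '1']
```

Any callable works here, for example one that draws from a normal distribution or one that returns values from a predefined list.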

### Evaluating Equations

After instantiating an equation, we can evaluate it on arbitrary input:

In [16]:
# evaluate the instantiated equation on a list of input values
values = instantiated_equation.evaluate({'x_1': [1, 2, 3, 4]})
values

array([-0.21462429,  0.11148855,  0.57008635,  1.41577756])

In [17]:
# We can also use pandas dataframes as inputs:

# import functionality
import pandas as pd

# define the input and get the values
input_df = pd.DataFrame({'x_1': [1, 2, 3, 4]})
instantiated_equation.evaluate(input_df)

array([-0.21462429,  0.11148855,  0.57008635,  1.41577756])
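
As a cross-check of what evaluation means, the sketch below evaluates the instantiated equation from above, written as a prefix token list, using only the standard library. The evaluator `evaluate_prefix` and its operator tables are assumptions for illustration; the package's own `evaluate` method is what should be used in practice.

```python
import math

# Supported symbols for this sketch (assumed subset).
UNARY = {"sin": math.sin, "cos": math.cos, "tan": math.tan, "exp": math.exp}
BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}


def evaluate_prefix(tokens, variables):
    """Evaluate a prefix token list for one assignment of the variables."""

    def walk(pos):
        token = tokens[pos]
        if token in UNARY:
            value, nxt = walk(pos + 1)
            return UNARY[token](value), nxt
        if token in BINARY:
            left, nxt = walk(pos + 1)
            right, nxt = walk(nxt)
            return BINARY[token](left, right), nxt
        if token in variables:
            return variables[token], pos + 1
        return float(token), pos + 1  # numeric constant

    value, _ = walk(0)
    return value


# tan(0.2720878822071884*x_1) - 0.4936315339297549 in prefix notation
instantiated = ["-", "tan", "*", "0.2720878822071884", "x_1", "0.4936315339297549"]
values = [evaluate_prefix(instantiated, {"x_1": x}) for x in [1, 2, 3, 4]]
print(values)
```

The printed values should agree with the array returned by `instantiated_equation.evaluate` above, up to rounding.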


## Sample Settings

When sampling equations, we can control for a variety of features of the underlying distribution.

### Input Dimensions

We can manipulate the space on which the equations are defined. For example, if we want equations that are defined on a 2-dimensional input space, we can write:

In [25]:
equations_2d = sample(n=5, max_num_variables=2)

Processing: 100%|██████████| 5/5 [00:00<00:00, 112.51iteration/s]


In [27]:
equations_2d

[acos(Abs(acos(x_1))),
 -Abs(x_2) + asin(log(x_1)),
 Max(c_1, x_1),
 Max(x_1, Abs(sqrt(sin(x_1)))),
 cos(x_1*(c_1 + x_2))]

**Note:** Not all of the equations have exactly 2 input variables; some have only one. This is because an equation with only one input variable is still defined on 2 (or more) dimensions.
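
To see which of the sampled equations actually use both variables, we can count the distinct variable names in their string form. This is a small sketch with the standard library only; the equation strings are copied from the output above and the helper `variables_used` is hypothetical.

```python
import re

# String forms of the five equations shown above.
equations_2d_str = [
    "acos(Abs(acos(x_1)))",
    "-Abs(x_2) + asin(log(x_1))",
    "Max(c_1, x_1)",
    "Max(x_1, Abs(sqrt(sin(x_1))))",
    "cos(x_1*(c_1 + x_2))",
]


def variables_used(expression):
    """Return the sorted distinct variable names (x_1, x_2, ...) in a string."""
    return sorted(set(re.findall(r"x_\d+", expression)))


for eq in equations_2d_str:
    print(f"{eq}  ->  {variables_used(eq)}")

counts = [len(variables_used(eq)) for eq in equations_2d_str]
print(counts)  # [1, 2, 1, 1, 2]
```

Two of the five equations use both x_1 and x_2; the other three depend on x_1 only.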

### Equation Complexity

We can also manipulate the equation complexity (controlled here through the depth argument of sample):

In [28]:
equations_simple = sample(n=5, depth=3)
equations_complex = sample(n=5, depth=8)

Processing: 100%|██████████| 5/5 [00:00<00:00, 312.27iteration/s]
Processing: 100%|██████████| 5/5 [00:00<00:00, 21.56iteration/s]


In [29]:
print('*** simple equations ***\n', equations_simple, '\n')
print('*** complex equations ***\n', equations_complex)

*** simple equations ***
 [acos(exp(x_1)), Max(c_1, x_1), x_1/c_1, asin(exp(x_1)), x_1**c_1] 

*** complex equations ***
 [sin(exp(sqrt(1 - sin(x_1)**2)/sin(x_1))**x_1), (x_1 + Min(x_1, sin(x_1)))/sqrt(1 - (x_1 + Min(x_1, sin(x_1)))**2), sqrt(Min(c_1**sin(sin(x_1)), x_1)), -asin(x_1)**x_1 + Min(c_1, x_1), Max(-x_1 + log(x_1), sqrt(Abs(x_1)))]


### Using Priors
We can also make use of priors to fully customize the sampling. Here, the entries for the structures, features, functions and operators represent the probability of the respective attribute being sampled.

In [34]:
p = {
    'structures': {'[0, 1, 1]': .3, '[0, 1, 2]': .3, '[0, 1, 2, 3, 2, 3, 1]': .4},
    'features': {'constants': .2, 'variables': .8},
    'functions': {'sin': .5, 'cos': .5},
    'operators': {'+': .8, '-': .2}
}
equations_with_prior = sample(n=10, prior=p, max_num_variables=10)
equations_with_prior

Processing: 100%|██████████| 100/100 [00:00<00:00, 190.81iteration/s]


[cos(cos(x_1)),
 sin(sin(x_1)),
 -x_3 + sin(x_1) + sin(x_2),
 -x_3 + sin(x_2) + cos(x_1),
 cos(cos(x_1)),
 cos(sin(x_1)),
 sin(cos(x_1)),
 x_1 + x_2,
 sin(cos(x_1)),
 cos(sin(x_1)),
 x_1 + x_2,
 cos(sin(x_1)),
 x_1 - x_2,
 x_1 + x_2,
 sin(sin(x_1)),
 x_1 + x_2,
 sin(cos(x_1)),
 -c_1 + x_1,
 sin(cos(x_1)),
 x_1 + x_2,
 x_1 + x_2,
 x_1 + x_2,
 x_1 + x_2,
 sin(cos(x_1)),
 c_1 + x_1,
 x_1 - x_2,
 x_1 - x_2,
 x_1 + x_2,
 sin(sin(x_1)),
 c_1 + x_1,
 cos(sin(x_1)),
 cos(sin(x_1)),
 cos(sin(x_1)),
 cos(cos(x_1)),
 c_1 + x_1,
 sin(cos(x_1)),
 sin(cos(x_1)),
 x_1 - x_2,
 cos(sin(x_1)),
 cos(sin(x_1)),
 x_1 - x_2,
 cos(cos(x_1)),
 x_1 + x_2,
 x_1 + x_2,
 sin(sin(x_1)),
 c_1 + x_1,
 c_1 + x_1,
 x_1 + x_2,
 x_1 - x_2,
 sin(sin(x_1)),
 cos(cos(x_1)),
 cos(cos(x_1)),
 cos(cos(x_1)),
 sin(cos(x_1)),
 x_1 - x_2,
 sin(sin(x_1)),
 cos(cos(x_1)),
 c_1 + x_1,
 sin(sin(x_1)),
 cos(sin(x_1)),
 cos(sin(x_1)),
 x_1 + x_2,
 cos(sin(x_1)),
 sin(cos(x_1)),
 c_1 + x_1,
 -x_2,
 x_1 - x_2,
 -x_3 + sin(x_1) + sin(x_2
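
To illustrate what the entries of such a prior mean, the sketch below treats each sub-dictionary as a discrete probability distribution, checks that it sums to 1, and draws from it with the standard library. The helpers `check_prior` and `draw` are assumptions for illustration; the package may validate and sample priors differently.

```python
import math
import random

p = {
    'structures': {'[0, 1, 1]': .3, '[0, 1, 2]': .3, '[0, 1, 2, 3, 2, 3, 1]': .4},
    'features': {'constants': .2, 'variables': .8},
    'functions': {'sin': .5, 'cos': .5},
    'operators': {'+': .8, '-': .2}
}


def check_prior(prior):
    """Raise if any sub-dictionary is not a probability distribution."""
    for name, dist in prior.items():
        if any(value < 0 for value in dist.values()):
            raise ValueError(f"{name}: negative probability")
        if not math.isclose(sum(dist.values()), 1.0):
            raise ValueError(f"{name}: probabilities do not sum to 1")


def draw(dist, rng):
    """Draw one key of dist with probability given by its value."""
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]


check_prior(p)

rng = random.Random(0)
draws = [draw(p['operators'], rng) for _ in range(10_000)]
share_plus = draws.count('+') / len(draws)
print(f"share of '+': {share_plus:.2f}")  # close to 0.8
```

With this prior, about 80% of the sampled operators are `+`, before any simplification of the resulting equations.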

We can also use conditional priors, which depend on the parent node:

In [38]:
p_ = {
    'structures': {'[0, 1, 1]': .3, '[0, 1, 2]': .3, '[0, 1, 2, 3, 2, 3, 1]': .4},
    'features': {'constants': .2, 'variables': .8},
    'functions': {'sin': .5, 'cos': .5},
    'operators': {'+': .5, '-': .5},
    'function_conditionals': {
        'sin': {
            'features': {'constants': 0., 'variables': 1.},
            'functions': {'sin': 0., 'cos': 1.},
            'operators': {'+': .5, '-': .5}
        },
        'cos': {
            'features': {'constants': 0., 'variables': 1.},
            'functions': {'cos': 1., 'sin': 0.},
            'operators': {'+': 0., '-': 1.}
        }
    },
    'operator_conditionals': {
        '+': {
            'features': {'constants': .5, 'variables': .5},
            'functions': {'sin': 1., 'cos': 0.},
            'operators': {'+': 1., '-': 0.}
        },
        '-': {
            'features': {'constants': .3, 'variables': .7},
            'functions': {'cos': .5, 'sin': .5},
            'operators': {'+': .9, '-': .1}
        }
    },
}
equations_with_conditional_prior = sample(n=10, prior=p_, max_num_variables=10)
equations_with_conditional_prior

Processing: 100%|██████████| 100/100 [00:00<00:00, 128.08iteration/s]


[cos(cos(x_1)),
 x_1 - x_2,
 -c_1 + x_1,
 sin(cos(x_1)),
 sin(cos(x_1)),
 c_1 + x_1,
 -c_1 + sin(x_1) + sin(x_2),
 cos(cos(x_1)),
 cos(cos(x_1)),
 -x_3 + sin(x_1) + sin(x_2),
 sin(cos(x_1)),
 c_1 - x_1,
 -x_3 + sin(x_1) + sin(x_2),
 cos(cos(x_1)),
 -x_3 + sin(x_1) + sin(x_2),
 cos(cos(x_1)),
 c_1 - x_1,
 sin(cos(x_1)),
 cos(cos(x_1)),
 x_1 - x_2,
 -c_1 + sin(x_1) + sin(x_2),
 -x_3 + sin(x_1) + sin(x_2),
 cos(cos(x_1)),
 c_1 - x_1,
 sin(cos(x_1)),
 x_1 - x_2,
 -x_3 - sin(x_2) + cos(x_1),
 -c_1 + x_1,
 sin(cos(x_1)),
 cos(cos(x_1)),
 sin(cos(x_1)),
 x_1 + x_2,
 cos(cos(x_1)),
 -x_3 + sin(x_1) - sin(x_2),
 c_1 + x_1,
 cos(cos(x_1)),
 cos(cos(x_1)),
 sin(cos(x_1)),
 x_1 - x_2,
 -x_3 + sin(x_1) + sin(x_2),
 -c_1 + x_1,
 c_1 - x_1,
 c_1 + x_1,
 c_1 + x_1,
 cos(cos(x_1)),
 -c_1 + sin(x_1) + sin(x_2),
 c_1 + x_1,
 sin(cos(x_1)),
 sin(cos(x_1)),
 -x_3 + sin(x_1) + sin(x_2),
 -x_3 + sin(x_1) - cos(x_2),
 sin(cos(x_1)),
 x_1 - x_2,
 -x_3 + cos(x_1) - cos(x_2),
 x_1 - x_2,
 cos(cos(x_1)),
 sin(cos
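
The sketch below shows one way to read the conditional entries: when a node's parent is a function listed under 'function_conditionals', the child is drawn from that conditional distribution, otherwise from the unconditional one. This is plain Python on a reduced copy of the prior; the helpers `child_function_distribution` and `draw` are assumptions and not the package's implementation.

```python
import random

# Reduced copy of the prior above: only the entries for functions.
p_functions = {
    'functions': {'sin': .5, 'cos': .5},
    'function_conditionals': {
        'sin': {'functions': {'sin': 0., 'cos': 1.}},
        'cos': {'functions': {'cos': 1., 'sin': 0.}},
    },
}


def child_function_distribution(prior, parent=None):
    """Pick the distribution for a child function given its parent function."""
    conditionals = prior.get('function_conditionals', {})
    if parent in conditionals:
        return conditionals[parent]['functions']
    return prior['functions']  # no parent, or no conditional defined


def draw(dist, rng):
    """Draw one key of dist with probability given by its value."""
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]


rng = random.Random(0)
after_sin = {draw(child_function_distribution(p_functions, 'sin'), rng) for _ in range(1000)}
after_cos = {draw(child_function_distribution(p_functions, 'cos'), rng) for _ in range(1000)}
print(after_sin, after_cos)  # {'cos'} {'cos'}
```

This matches the sampled output above, where the nested expressions are `sin(cos(x_1))` and `cos(cos(x_1))`: the inner function is always `cos`.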

**WARNING**
If your application depends on these priors, you should "burn" samples before you start sampling.
During the sampling process, equations are simplified and invalid equations are discarded. This is likely to lead to disparities between the priors and the sampled frequencies.
To counteract this, the package offers functionality to "burn" samples and adjust the priors so that the outcome frequencies match them more closely. To burn samples, use the following code (we don't run it in the notebook since the adjusted priors are saved to disk for future use):
```
burn(
    prior,
    max_number_variables,
    path_to_file,
    number_of_burned_samples,
    learning_rate
    )
```
*This function should be run multiple times. The learning rate defines how strongly the adjusted priors from previous runs are adjusted in each run.
After burning, you can load the adjusted priors via:
```
    sample(..., file=path_to_file)
```
*multiple adjusted priors can be stored in the same file.
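
To give an intuition for the role of the learning rate, here is a generic sketch of such an adjustment step. The update rule of the package is not described in this notebook, so the rule below, the helper `adjust_prior`, and the observed frequencies are all assumptions for illustration; this is not the implementation of `burn`.

```python
def adjust_prior(working, target, observed, learning_rate):
    """Nudge the working prior so that sampled frequencies move toward the target.

    working:  the prior currently used for sampling
    target:   the frequencies we would like to observe
    observed: the frequencies measured in the burned samples
    """
    raw = {
        key: max(working[key] + learning_rate * (target[key] - observed.get(key, 0.0)), 0.0)
        for key in target
    }
    total = sum(raw.values())
    if total == 0:
        raise ValueError("all adjusted weights are zero")
    return {key: value / total for key, value in raw.items()}


target = {'+': 0.8, '-': 0.2}
observed = {'+': 0.6, '-': 0.4}  # hypothetical frequencies after simplification
working = adjust_prior(dict(target), target, observed, learning_rate=0.5)
print(working)  # roughly {'+': 0.9, '-': 0.1}
```

In this toy rule, `+` was sampled less often than desired, so its weight is raised. A smaller learning rate makes each step more cautious, which is one reason to repeat the procedure several times.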