## How to generate a univariate dataset

In [1]:
%load_ext autoreload
%autoreload 

from eqlearner.dataset.univariate.datasetcreator import DatasetCreator
from sympy import sin, Symbol, log, exp 
import numpy as np

The order of the basis functions **is important**. Note that each basis function must be passed **without** the independent variable (i.e. `sin`, not `sin(x)`).

In [2]:
x = Symbol('x')
basis_functions = [x,sin,log,exp]
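
To illustrate why the functions are passed uncalled, here is a minimal SymPy-only sketch. The helper `apply_basis` is hypothetical and is not part of `eqlearner`. It only shows that an uncalled function such as `sin` can later be applied to any argument, which is what makes compositions possible, while `x` is already an expression.

```python
from sympy import Symbol, sin, log, exp

x = Symbol('x')
basis_functions = [x, sin, log, exp]

def apply_basis(f, arg):
    """Hypothetical helper: turn a basis entry into an expression in `arg`."""
    # A Symbol is already an expression; sin, log and exp are function
    # classes that still need an argument.
    return f if isinstance(f, Symbol) else f(arg)

# Apply every basis entry to x itself
terms = [apply_basis(f, x) for f in basis_functions]

# Because sin is uncalled, it can also wrap another expression
composition = apply_basis(sin, exp(x) + 1)
print(terms)
print(composition)
```

Had `sin(x)` been passed instead, the argument would already be fixed and the function could not be reused inside a composition.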

Create a function generator object. Pass the list of basis functions as the first argument, followed by keyword arguments that control how many terms of each class are generated.

In [3]:
fun_generator = DatasetCreator(basis_functions, constants_enabled=False)

Call `generate_fun()` with no arguments to get the equation string and two dictionaries that list the terms added for each class.

In [4]:
string, dictionary, dictionary_clean =  fun_generator.generate_fun()
print("string: \n  {} \n".format(string))
print("dictionary \n {} \n".format(dictionary))
print("dictionary clean \n {} \n".format(dictionary_clean))

string: 
  exp(6*x) + exp(4*x) + exp(3*x) + exp(x) + sin(exp(3*x) + exp(2*x) + exp(x) + 1) + 1 

dictionary 
 {'Single': [exp(6*x) + exp(4*x) + exp(3*x) + exp(x) + 1], 'binomial': [], 'N_terms': [], 'compositions': [sin(exp(3*x) + exp(2*x) + exp(x) + 1)]} 

dictionary clean 
 {'Single': [exp(6*x) + exp(4*x) + exp(3*x) + exp(x) + 1], 'binomial': [], 'N_terms': [], 'compositions': [sin(exp(3*x) + exp(2*x) + exp(x) + 1)]} 



To evaluate the equation on a set of points, use `evaluate_function`.

In [5]:
support = np.arange(1,20)
y = fun_generator.evaluate_function(support,string)
support,y

(array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
        18, 19]),
 array([4.81609556e+02, 1.66148561e+05, 6.58308491e+07, 2.64981711e+10,
        1.06869630e+13, 4.31125810e+15, 1.73927639e+18, 7.01673670e+20,
        2.83075335e+23, 1.14200739e+26, 4.60718664e+28, 1.85867175e+31,
        7.49841700e+33, 3.02507732e+36, 1.22040329e+39, 4.92345829e+41,
        1.98626484e+44, 8.01316426e+46, 3.23274119e+49]))
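
For readers who want to check these values independently, the same equation string can be evaluated with plain SymPy and NumPy. This is a sketch under the assumption that the string parses with `sympify`. It is not necessarily how `evaluate_function` is implemented.

```python
import numpy as np
from sympy import Symbol, lambdify, sympify

x = Symbol('x')

# Equation string copied from the generate_fun() output shown earlier
equation = ("exp(6*x) + exp(4*x) + exp(3*x) + exp(x) "
            "+ sin(exp(3*x) + exp(2*x) + exp(x) + 1) + 1")

# Parse the string into a SymPy expression, then compile it to a NumPy function
expr = sympify(equation)
f = lambdify(x, expr, "numpy")

support = np.arange(1, 20)
y = f(support)
print(y[:3])
```

The values grow very quickly because of the `exp(6*x)` term, so a narrower support may be preferable if the targets are fed to a model without rescaling.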

You can also generate a batch of equations in a single step with `generate_batch`.

In [6]:
from eqlearner.dataset.processing import tokenization
support = np.arange(1,20)
number_to_generate = 5
inp, out = fun_generator.generate_batch(support, number_to_generate)
t = tokenization.pipeline(out)

assert len(inp) == len(out)
print("Number of equations", len(inp))
print("Example of input", inp[0])
print("Example of out before tokenization", out[0])
print("Example of out after tokenization", t[0])

Number of equations 5
Example of input [[1.00000000e+00 2.00000000e+00 3.00000000e+00 4.00000000e+00
  5.00000000e+00 6.00000000e+00 7.00000000e+00 8.00000000e+00
  9.00000000e+00 1.00000000e+01 1.10000000e+01 1.20000000e+01
  1.30000000e+01 1.40000000e+01 1.50000000e+01 1.60000000e+01
  1.70000000e+01 1.80000000e+01 1.90000000e+01]
 [6.43383594e+00 1.77950984e+01 1.05815443e+02 1.06975456e+03
  1.17191417e+04 1.27321981e+05 1.33014768e+06 1.31928456e+07
  1.23626092e+08 1.09339652e+09 9.13573253e+09 7.22481474e+10
  5.42051657e+11 3.86785749e+12 2.63152422e+13 1.71124876e+14
  1.06610898e+15 6.37720133e+15 3.67026019e+16]]
Example of out before tokenization {'Single': [sin(x)**6 + sin(x)**5 + sin(x)**4 + sin(x)**3 + sin(x)], 'binomial': [], 'N_terms': [], 'compositions': [log(exp(x)), x*exp(log(x)**3 + log(x)**2 + 1)]}
Example of out after tokenization [12  2  5  1  6  7 21  9  2  5  1  6  7 20  9  2  5  1  6  7 19  9  2  5
  1  6  7 18  9  2  5  1  6  9  4  5  3  5  1  6  6  9  1  8 
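
The token ids above can be matched against the printed equation. The sketch below rebuilds the first tokens with a vocabulary that is only read off this output. The ids, the start token `12`, and the digit offset are assumptions inferred from the example, not the actual table used by `tokenization.pipeline`.

```python
import re

# Assumed vocabulary, inferred from the printed example only
VOCAB = {"x": 1, "sin": 2, "exp": 3, "log": 4,
         "(": 5, ")": 6, "**": 7, "*": 8, "+": 9}
START = 12          # assumed start-of-sequence id
DIGIT_OFFSET = 15   # assumed: the digits 3..6 appear as ids 18..21

# "**" must come before "*" and "exp" before "x" so the longer token wins
TOKEN_RE = re.compile(r"\*\*|sin|exp|log|x|\d|[()*+]")

def tokenize(expression):
    """Map an equation string to a list of integer ids."""
    ids = [START]
    for tok in TOKEN_RE.findall(expression.replace(" ", "")):
        ids.append(DIGIT_OFFSET + int(tok) if tok.isdigit() else VOCAB[tok])
    return ids

print(tokenize("sin(x)**6 + sin(x)**5"))
```

Under these assumptions the result starts with `12 2 5 1 6 7 21 9`, the same prefix as the tokenized output printed above.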

## Call `generate_set` to generate a training set and a test set

In [7]:
train_dataset, info_training = fun_generator.generate_set(support,25,isTraining=True)

In [8]:
test_dataset, info_testing = fun_generator.generate_set(support,5,isTraining=False)

## We can then pass the datasets directly to the `dataset_loader` function to create iterators

In [9]:
from eqlearner.architectures.utils import dataset_loader

In [10]:
train_loader, valid_loader, test_loader, valid_idx, train_idx = dataset_loader(train_dataset,test_dataset)

## Saving the data

In [12]:
from eqlearner.dataset.utils import save_dataset, load_dataset

In [None]:
save_dataset(train_dataset, info_training, 
             test_dataset, info_testing, path="./dataset.npy")
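
The `.npy` extension suggests a NumPy file. As a generic illustration of saving and restoring Python objects in that format, the sketch below stores a dictionary with `np.save` and reads it back. The `payload` dictionary is a hypothetical stand-in, and nothing here is taken from the implementation of `save_dataset` or `load_dataset`.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the objects passed to save_dataset
payload = {
    "support": np.arange(1, 20),
    "equations": ["sin(x)", "exp(x) + 1"],
}

# Write to a temporary directory so the example leaves no files behind
path = os.path.join(tempfile.mkdtemp(), "dataset.npy")
np.save(path, payload, allow_pickle=True)

# np.load returns a 0-d object array; .item() recovers the dictionary
restored = np.load(path, allow_pickle=True).item()
print(restored["equations"])
```

Loading with `allow_pickle=True` executes pickled code, so it should only be used on files from a trusted source.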