## How to generate a univariate dataset

In [1]:
%load_ext autoreload
%autoreload 

from eqlearner.dataset.univariate.datasetcreator import DatasetCreator
from sympy import sin, Symbol, log, exp 
import numpy as np

The order of the basis functions **is important**. Note that the basis functions should be passed **without** the independent variable (i.e. sin, not sin(x)).

In [2]:
x = Symbol('x')
basis_functions = [x,sin,log,exp]
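
One plausible reason for passing bare functions is that a generator can then nest them into compositions such as log(sin(x)). The sketch below illustrates this with the standard math module instead of sympy. The names identity and compose are hypothetical helpers for this illustration, not part of eqlearner.

```python
import math

def identity(value):
    """Stand-in for the bare symbol x in the basis list."""
    return value

# Bare callables, analogous to [x, sin, log, exp] in the cell above.
basis = [identity, math.sin, math.log, math.exp]

def compose(outer, inner):
    """Return the function value -> outer(inner(value))."""
    return lambda value: outer(inner(value))

# A callable can still be nested into log(sin(x)).
# math.sin(1.0), in contrast, is already a number and cannot be reused this way.
log_of_sin = compose(math.log, math.sin)
result = log_of_sin(1.0)
```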

Create a function generator object. Pass it the basis functions, plus keyword arguments that control how many terms of each class you want to have.

In [4]:
fun_generator = DatasetCreator(basis_functions, constants_enabled=False)

Call generate_fun with no arguments to get the equation string, a dictionary listing the terms added for each class, and a cleaned version of that dictionary.

In [48]:
string, dictionary, dictionary_clean =  fun_generator.generate_fun()
print("string: \n  {} \n".format(string))
print("dictionary \n {} \n".format(dictionary))
print("dictionary clean \n {} \n".format(dictionary_clean))

string: 
  0 

dictionary 
 {'Single': [], 'binomial': [], 'N_terms': [], 'compositions': []} 

dictionary clean 
 {'Single': [], 'binomial': [], 'N_terms': [], 'compositions': []} 



To evaluate the generated function on a set of points, use evaluate_function.

In [49]:
support = np.arange(1,20)
y = fun_generator.evaluate_function(support,string)
support,y

(array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
        18, 19]),
 array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0.]))
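
To make the evaluation step concrete, here is a minimal sketch of evaluating an expression string on a support using only the standard library. It is an assumption about what such a step could look like, not eqlearner's implementation. The name evaluate_expression and the ALLOWED table are hypothetical.

```python
import math

# Functions the expression string is allowed to use (hypothetical table).
ALLOWED = {"sin": math.sin, "log": math.log, "exp": math.exp}

def evaluate_expression(expression, support):
    """Evaluate a string in the variable x at every point of the support."""
    code = compile(expression, "<expression>", "eval")
    values = []
    for point in support:
        namespace = dict(ALLOWED, x=point)
        # Builtins are emptied so only the names in the namespace resolve.
        values.append(float(eval(code, {"__builtins__": {}}, namespace)))
    return values

support = range(1, 20)
y = evaluate_expression("0", support)  # the constant function from the cell above
```

As in the output above, the constant string 0 gives one zero per support point. Since this sketch relies on eval, it is only suitable for strings you generated yourself.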

You can also generate a batch of equations in a single step with generate_batch.

In [62]:
from eqlearner.dataset.processing import tokenization
support = np.arange(1,20)
number_to_generate = 5
inp, out = fun_generator.generate_batch(support, number_to_generate)
t = tokenization.pipeline(out)

assert len(inp) == len(out)
print("Number of equations", len(inp))
print("Example of input", inp[0])
print("Example of out before tokenization", out[0])
print("Example of out after tokenization", t[0])

Number of equations 5
Example of input [[  1.           2.           3.           4.           5.
    6.           7.           8.           9.          10.
   11.          12.          13.          14.          15.
   16.          17.          18.          19.        ]
 [  2.48876973   2.84224481   0.293246    -1.38981561  -2.65363284
   -0.52583748   1.70710753   3.31778363   0.94292411  -0.98248723
  -10.84068883  -0.96976556   0.96468624   3.32575727   1.68273918
   -0.54094259  -2.70697796  -1.37592786   0.31225156]]
Example of out before tokenization {'Single': [sin(x)**4 + sin(x)], 'binomial': [], 'N_terms': [], 'compositions': [log(sin(x)**3 + sin(x)**2 + sin(x) + 1)]}
Example of out after tokenization [12  2  5  1  6  7 17  9  2  5  1  6  9  4  5  2  5  1  6  7 16  9  2  5
  1  6  7 15  9  2  5  1  6  9 14  6 13]
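
The token ids printed above can be read back against the expression. The sketch below uses a vocabulary inferred from this single example only, and assumes the dictionary terms are joined with + in the order shown. It is not the implementation of tokenization.pipeline. Ids that do not occur in the example (for instance the one for exp) are unknown and left out.

```python
import re

# Vocabulary inferred from the printed example; incomplete by construction.
INFERRED_VOCAB = {
    "x": 1, "sin": 2, "log": 4, "(": 5, ")": 6, "**": 7, "+": 9,
    "<start>": 12, "<end>": 13, "1": 14, "2": 15, "3": 16, "4": 17,
}

# "**" must come before single characters so it is matched as one symbol.
TOKEN_PATTERN = re.compile(r"\*\*|[a-z]+|\d|[()+]")

def tokenize(expression):
    """Map an expression string to ids, wrapped in start and end markers."""
    cleaned = expression.replace(" ", "")
    symbols = TOKEN_PATTERN.findall(cleaned)
    if "".join(symbols) != cleaned:
        raise ValueError("expression contains symbols outside the sketch")
    ids = [INFERRED_VOCAB[symbol] for symbol in symbols]
    return [INFERRED_VOCAB["<start>"]] + ids + [INFERRED_VOCAB["<end>"]]

expression = "sin(x)**4 + sin(x) + log(sin(x)**3 + sin(x)**2 + sin(x) + 1)"
tokens = tokenize(expression)
```

With this vocabulary, tokens reproduces the 37 ids printed after tokenization.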


## Call generate_set to generate a training set and a test set

In [64]:
train_dataset, info_training = fun_generator.generate_set(support,25,isTraining=True)

In [66]:
test_dataset, info_testing = fun_generator.generate_set(support,5,isTraining=False)

## We can then use the dataset_loader function directly to create iterators

In [67]:
from eqlearner.architectures.utils import dataset_loader

In [68]:
train_loader, valid_loader, test_loader, valid_idx, train_idx = dataset_loader(train_dataset,test_dataset)
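
The returned valid_idx and train_idx suggest that the training set is divided into training and validation subsets by index. The sketch below shows one common way to do such a split. The function split_indices, the 20% validation fraction, and the fixed seed are all assumptions for illustration, not dataset_loader's actual behaviour.

```python
import random

def split_indices(n_samples, valid_fraction=0.2, seed=0):
    """Shuffle sample indices and split them into training and validation parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_valid = int(round(n_samples * valid_fraction))
    return indices[n_valid:], indices[:n_valid]

# 25 matches the number of training equations generated above.
train_idx, valid_idx = split_indices(25)
```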

## Saving the data

In [69]:
from eqlearner.dataset.univariate.utils import save_dataset, load_dataset


In [70]:
save_dataset(train_dataset, info_training, 
             test_dataset, info_testing, path="./dataset.npy")

AssertionError: Torch not compiled with CUDA enabled