# SeTGAP: **S**ymbolic R**e**gression using **T**ransformers, **G**enetic **A**lgorithms, and Genetic **P**rogramming

## Installation

Run `!pip install git+https://github.com/...` in a notebook cell (the repository URL is hidden for double-blind review).

```python
!pip install -q git+https://github.com/MultiSetSR
import warnings
warnings.filterwarnings("ignore")  # silence library warnings in the notebook output
```

```
Found existing installation: MultiSetSR 0.0.1
Uninstalling MultiSetSR-0.0.1:
  Successfully uninstalled MultiSetSR-0.0.1
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
  Building wheel for MultiSetSR (pyproject.toml) ... done
```


## Example using pre-determined datasets

In this example, we will estimate the multivariate symbolic expression of a system whose underlying equation is one of the following:


| Eq. | Underlying equation |
|-----|---------------------|
| E1  | $(3.0375 x_1 x_2 + 5.5 \sin(9/4 (x_1 - 2/3)(x_2 - 2/3)))/5$ |
| E2  | $5.5 + (1 - x_1/4)^2 + \sqrt{x_2 + 10} \sin(x_3/5)$ |
| E3  | $(1.5 e^{1.5 x_1} + 5 \cos(3 x_2))/10$ |
| E4  | $((1 - x_1)^2 + (1 - x_3)^2 + 100 (x_2 - x_1^2)^2 + 100 (x_4 - x_3^2)^2)/10000$ |
| E5  | $\sin(x_1 + x_2 x_3) + \exp(1.2 x_4)$ |
| E6  | $\tanh(x_1/2) + \lvert x_2 \rvert \cos(x_3^2/5)$ |
| E7  | $(1 - x_2^2)/(\sin(2 \pi x_1) + 1.5)$ |
| E8  | $x_1^4/(x_1^4 + 1) + x_2^4/(x_2^4 + 1)$ |
| E9  | $\log(2 x_2 + 1) - \log(4 x_1^2 + 1)$ |
| E10 | $\sin(x_1 e^{x_2})$ |
| E11 | $x_1 \log(x_2^4)$ |
| E12 | $1 + x_1 \sin(1/x_2)$ |
| E13 | $\sqrt{x_1} \log(x_2^2)$ |
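For reference, E6 (the equation used in the rest of this example) can be written as a NumPy function. This is a minimal sketch: the sampling domain $[-5, 5]$ and the sample size are assumptions chosen for illustration, not the settings used by `DataLoader`. Columns are indexed from zero, matching the `x0`, `x1`, `x2` names that appear in the regressor's output.

```python
import numpy as np

def e6(X: np.ndarray) -> np.ndarray:
    """E6 from the table: columns X[:, 0], X[:, 1], X[:, 2] play the roles of x_1, x_2, x_3."""
    x1, x2, x3 = X[:, 0], X[:, 1], X[:, 2]
    return np.tanh(x1 / 2) + np.abs(x2) * np.cos(x3 ** 2 / 5)

# Hypothetical sampling domain: uniform on [-5, 5]^3 (not necessarily what DataLoader uses).
rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=(1000, 3))
y = e6(X)
print(X.shape, y.shape)
```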

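```python
# The wildcard import also provides DataLoader, NNModel, get_project_root, torch and os,
# which are used in the cells below.
from EquationLearning.SymbolicRegressor.MSSP import *
from EquationLearning.SymbolicRegressor.SetGAP import SetGAP

datasetName = 'E6'  # select equation E6 from the table above
data_loader = DataLoader(name=datasetName)
data = data_loader.dataset
```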

**Define the NN and load its weights**

For this example, a feedforward neural network has already been trained on the generated dataset, so we only need to load its weights.

```python
# Use the GPU if one is available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
root = get_project_root()
# Pre-trained weights live under EquationLearning/saved_models/saved_NNs/<datasetName>
folder = os.path.join(root, "EquationLearning", "saved_models", "saved_NNs", datasetName)
filepath = os.path.join(folder, "weights-NN-" + datasetName)
nn_model = NNModel(device=device, n_features=data.n_features, NNtype=data_loader.modelType)
nn_model.loadModel(filepath)
```
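The log further down analyzes one variable at a time and reports a correlation for each candidate skeleton. The sketch below illustrates that idea under our own assumptions; it is not SetGAP's actual procedure. A hypothetical stand-in function replaces `nn_model` (whose prediction method is not shown in this notebook), `x1` and `x2` are held at arbitrary fixed values, and the inner constant of the skeleton `c*tanh(c*x0) + c` is chosen by maximizing the absolute Pearson correlation over a grid. With the other variables frozen, E6 reduces to $\tanh(x_0/2)$ plus a constant, so the search should land near 0.5.

```python
import numpy as np

def black_box(X: np.ndarray) -> np.ndarray:
    # Hypothetical stand-in for the trained network: here simply E6 itself.
    return np.tanh(X[:, 0] / 2) + np.abs(X[:, 1]) * np.cos(X[:, 2] ** 2 / 5)

# Sweep x0 while holding x1 and x2 fixed at arbitrary values.
x0 = np.linspace(-5, 5, 200)
X = np.column_stack([x0, np.full_like(x0, 1.3), np.full_like(x0, -0.7)])
y_slice = black_box(X)

# Pearson correlation ignores the outer scale and offset of c*tanh(c*x0) + c,
# so only the inner constant has to be searched.
grid = np.linspace(0.05, 2.0, 391)
corrs = np.array([abs(np.corrcoef(np.tanh(c * x0), y_slice)[0, 1]) for c in grid])
best_c = grid[np.argmax(corrs)]

# With the inner constant fixed, recover the outer constants by least squares.
A = np.column_stack([np.tanh(best_c * x0), np.ones_like(x0)])
(scale, offset), *_ = np.linalg.lstsq(A, y_slice, rcond=None)
print(f"{scale:.3f}*tanh({best_c:.3f}*x0) + {offset:.3f}")
```

Separating the search this way keeps the nonlinear part one-dimensional; the linear coefficients then follow in closed form.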

**Get Estimated Multivariate Expressions**

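The following cell generates candidate multivariate expressions and selects the most appropriate one for the given dataset. The pre-trained network is passed as `bb_model`; judging from the log below, `n_candidates` caps how many candidate skeletons are predicted for each variable.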

```python
regressor = SetGAP(dataset=data, bb_model=nn_model, n_candidates=2)
results = regressor.run()
```

```
********************************
Analyzing variable x0
********************************
Predicted skeleton 1 for variable x0: c*tanh(c*x0) + c
Predicted skeleton 2 for variable x0: c*sqrt(c*tanh(c*x0) + c) + c

Choosing the best skeleton... (skeletons ordered based on number of nodes)
    Skeleton: c*tanh(c*x0). Correlation: 0.9996484870765862. Expr: -tanh(0.535674*x0)
    Skeleton: c*sqrt(c*tanh(c*x0) + c). Correlation: 0.9996505762354936. Expr: 4.2461216421577*sqrt(0.0253371839750401*tanh(0.535718*x0) + 1)
********************************
Analyzing variable x1
********************************
Predicted skeleton 1 for variable x1: c*Abs(x1) + c

Choosing the best skeleton... (skeletons ordered based on number of nodes)
    Skeleton: c*Abs(x1). Correlation: 0.9990508604702034. Expr: 14.998537*Abs(x1)
********************************
Analyzing variable x2
********************************
Predicted skeleton 1 for variable x2: c*cos(c*x2**2 + c*x2 + c) + c
Predicted skeleton 2 for variabl
...
```
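In the log, SetGAP keeps `c*tanh(c*x0)` for `x0` (the final expression contains a plain `tanh` term rather than a `sqrt`) even though the second skeleton's correlation is marginally higher. One rule consistent with that behavior, offered only as a hypothetical illustration, is to pick the skeleton with the fewest nodes among those whose correlation lies within a small tolerance of the best. The tolerance `tol` and the node counts below are our assumptions; only the correlations are copied from the log.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    skeleton: str
    n_nodes: int        # rough expression-tree size, illustrative only
    correlation: float

def pick_simplest(cands, tol=1e-3):
    """Return the simplest candidate whose correlation is within `tol` of the best one.

    `tol` is a hypothetical threshold chosen for illustration."""
    best = max(c.correlation for c in cands)
    eligible = [c for c in cands if c.correlation >= best - tol]
    return min(eligible, key=lambda c: c.n_nodes)

# Correlations copied from the log for variable x0; node counts are illustrative.
cands = [
    Candidate("c*tanh(c*x0)", n_nodes=6, correlation=0.9996484870765862),
    Candidate("c*sqrt(c*tanh(c*x0) + c)", n_nodes=11, correlation=0.9996505762354936),
]
print(pick_simplest(cands).skeleton)  # c*tanh(c*x0)
```

With `tol=0.0` the rule degenerates to "highest correlation wins" and would return the `sqrt` skeleton instead, which is why some tolerance favoring parsimony is needed to reproduce the choice seen in the log.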

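The final expression reported for E6 is shown below. Variables are indexed from zero, so `x0`, `x1`, `x2` correspond to $x_1$, $x_2$, $x_3$ in the table.

```
[1.000611*cos(0.199503240596298*x2**2 + 0.00476047459374139)*Abs(x1) + 1.00051094890511*tanh(0.501189912723317*x0)]
```

Up to small constant errors (0.5012 ≈ 1/2, 0.1995 ≈ 1/5, a phase offset of 0.0048 ≈ 0 inside the cosine, and outer coefficients ≈ 1), this recovers E6: $\tanh(x_1/2) + \lvert x_2 \rvert \cos(x_3^2/5)$.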
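To check the result numerically, the returned expression can be compared against the ground-truth E6 on random inputs. This is a minimal sketch: the evaluation domain $[-5, 5]$ is an assumption (the range used to generate the dataset is not stated here), and the coefficients are copied from the output above.

```python
import numpy as np

def e6(x0, x1, x2):
    # Ground-truth E6, with the table's x_1, x_2, x_3 renamed to x0, x1, x2.
    return np.tanh(x0 / 2) + np.abs(x1) * np.cos(x2 ** 2 / 5)

def recovered(x0, x1, x2):
    # Expression reported by regressor.run() above.
    return (1.000611 * np.cos(0.199503240596298 * x2 ** 2 + 0.00476047459374139) * np.abs(x1)
            + 1.00051094890511 * np.tanh(0.501189912723317 * x0))

# Hypothetical evaluation domain: uniform on [-5, 5]^3.
rng = np.random.default_rng(0)
x0, x1, x2 = rng.uniform(-5, 5, size=(3, 10_000))
err = recovered(x0, x1, x2) - e6(x0, x1, x2)
rmse = np.sqrt(np.mean(err ** 2))
print(f"RMSE on the sampled points: {rmse:.4f}")
```

Because the cosine argument grows with $x_3^2$, the small error in its coefficient matters more far from the origin, so the RMSE will depend on the chosen domain.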