# Regression Task
 
**SSDE** is built on the source code of **DSO** (Deep Symbolic Optimization), available at [DSO GitHub Repository](https://github.com/dso-org/deep-symbolic-optimization/tree/master), and retains the regression task implemented in DSO. To perform regression and recover the true mathematical expression, you can use the same two methods that DSO provides:

1. **`Sklearn`-like regressor interface**: makes it easy to try out deep symbolic regression on your own data from Python.
2. **Command-line method**: runs regression tasks directly through the command-line interface provided by DSO.



## Sklearn-like regressor interface 

You can use the `DSR` (deep symbolic regression) method directly, as follows:

```python
from ssde import DeepSymbolicRegressor
import numpy as np

# Generate some data
np.random.seed(0)
X = np.random.random((10, 2))

# Target expression (alternative targets are left commented out)
# y = np.sin(X[:,0]) + X[:,1] ** 2
# y = X[:,0]**4 - X[:,0]**3 + 0.5 * X[:,1]**2 - X[:,1]
y = X[:,0]**4 + X[:,0]**3 + X[:,0]**2 + X[:,0]

# Create the model (alternatively, pass in your own config JSON path)
model = DeepSymbolicRegressor("./config/config_regression.json")

# Fit the model
model.fit(X, y) # Should solve in ~10 seconds

# View the best expression
print(model.program_.pretty())

# Make predictions
# model.predict(2 * X)
```

```text
-- BUILDING PRIOR START -------------
LengthConstraint: Sequences have minimum length 4.
                  Sequences have maximum length 64.
TrigConstraint: [sin, cos] cannot be a descendant of [sin, cos].
UniformArityPrior: Activated.
SoftLengthPrior: No description available.
-- BUILDING PRIOR END ---------------

-- RUNNING ITERATIONS START -------------
[00:00:00:01.80] Training iteration 1, current best R: 0.7562

	** New best
	Reward: 0.7561798430745647
	Count Off-policy: 0
	Count On-policy: 1
	Originally on Policy: True
	Invalid: False
	Traversal: div,x1,mul,x2,mul,add,x1,sub,cos,x1,x2,cos,mul,div,x1,x1,x1
	Expression:
	                x₁              
	  ──────────────────────────────
	  x₂⋅(x₁ - x₂ + cos(x₁))⋅cos(x₁)

-- RUNNING ITERATIONS START -------------
-- RUNNING ITERATIONS START -------------
[00:00:00:00.12] Training iteration 3, current best R: 0.7895

	** New best
	Reward: 0.7894646592987298
	Count Off-policy: 0
	Count On-policy: 1
	Originally on Policy: True
	Inval...
```
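The `Traversal` line in the log appears to list the expression's tokens in prefix (pre-order) notation. That reading is an inference from the printed output, not something this document states. The sketch below checks it for the first traversal by evaluating the token list recursively and comparing the result with the pretty-printed expression. The names `OPERATORS` and `evaluate_prefix` are illustrative and are not part of `ssde`. The operators use plain Python arithmetic, so the library's own implementations may differ, for example in how they handle division by zero.

```python
import math

# Arity and implementation of each operator token seen in the log.
# Plain Python arithmetic is an assumption; the library may behave differently.
OPERATORS = {
    "add": (2, lambda a, b: a + b),
    "sub": (2, lambda a, b: a - b),
    "mul": (2, lambda a, b: a * b),
    "div": (2, lambda a, b: a / b),
    "sin": (1, math.sin),
    "cos": (1, math.cos),
}


def evaluate_prefix(tokens, variables):
    """Evaluate a prefix-ordered token list; `variables` maps names such as 'x1' to floats."""

    def parse(pos):
        token = tokens[pos]
        if token in OPERATORS:
            arity, func = OPERATORS[token]
            pos += 1
            args = []
            for _ in range(arity):
                value, pos = parse(pos)  # each argument is itself a sub-expression
                args.append(value)
            return func(*args), pos
        return variables[token], pos + 1  # leaf: an input variable

    value, end = parse(0)
    if end != len(tokens):
        raise ValueError("traversal has trailing tokens")
    return value


traversal = "div,x1,mul,x2,mul,add,x1,sub,cos,x1,x2,cos,mul,div,x1,x1,x1".split(",")
point = {"x1": 0.3, "x2": 0.7}

from_traversal = evaluate_prefix(traversal, point)

# The same expression as pretty-printed in the log.
x1, x2 = point["x1"], point["x2"]
from_printed = x1 / (x2 * (x1 - x2 + math.cos(x1)) * math.cos(x1))

print(math.isclose(from_traversal, from_printed))
```

Reading the traversal this way also shows why the printed form is shorter than the token list: sub-expressions such as `div,x1,x1` simplify to 1 before printing.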

Unlike the source code in the [DSO GitHub Repository](https://github.com/dso-org/deep-symbolic-optimization/tree/master), SSDE also provides a way to run `DSO` through the same `Sklearn`-like regressor interface:

```python
from ssde import DeepSymbolicRegressor
import numpy as np
import warnings
warnings.filterwarnings("ignore")

# Generate some data
np.random.seed(0)
X = np.random.random((10, 2))

# Target expression (alternative targets are left commented out)
# y = np.sin(X[:,0]) + X[:,1] ** 2
# y = X[:,0]**4 - X[:,0]**3 + 0.5 * X[:,1]**2 - X[:,1]
y = X[:,0]**4 + X[:,0]**3 + X[:,0]**2 + X[:,0]

# Create the model (alternatively, pass in your own config JSON path)
model = DeepSymbolicRegressor("./config/config_regression_gp.json")

# Fit the model
model.fit(X, y) # Should solve in ~10 seconds

# View the best expression
print(model.program_.pretty())
```

```text
-- BUILDING PRIOR START -------------
LengthConstraint: Sequences have minimum length 4.
                  Sequences have maximum length 32.
RelationalConstraint: [exp] cannot be a child of [log].
InverseUnaryConstraint: RelationalConstraint: [log] cannot be a child of [exp].
TrigConstraint: [sin, cos] cannot be a descendant of [sin, cos].
UniformArityPrior: Activated.
SoftLengthPrior: No description available.
-- BUILDING PRIOR END ---------------

GP Controller using parallel evaluation
	>>> Using 64 processes
-- RUNNING ITERATIONS START -------------
[00:00:00:09.86] Training iteration 1, current best R: 1.0000

	** New best
	Reward: 0.9999999999999998
	Count Off-policy: 1
	Count On-policy: 0
	Originally on Policy: False
	Invalid: False
	Traversal: add,x1,mul,x1,add,mul,div,x1,x1,x1,add,mul,mul,x1,x1,x1,mul,x1,x1
	Expression:
	     ⎛  3     2     ⎞     
	  x₁⋅⎝x₁  + x₁  + x₁⎠ + x₁

[00:00:00:10.04] Early stopping criteria met; breaking early.
Invalid expressions: 235 of 2408 (9.8%).
```
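The expression printed in this run, `x1*(x1**3 + x1**2 + x1) + x1`, is a factored form of the target `x1**4 + x1**3 + x1**2 + x1`. This is a small sketch that confirms the two agree, exactly on integers and numerically on random points. It also computes a reward under the assumption that the metric is the inverse normalized RMSE, `1 / (1 + RMSE / std(y))`. That metric is an assumption, since the config file is not shown here. The function names are illustrative only.

```python
import math
import random
import statistics


def target(x):
    """Ground-truth polynomial used to generate y in the example."""
    return x**4 + x**3 + x**2 + x


def recovered(x):
    """Expression printed in the log: x1*(x1**3 + x1**2 + x1) + x1."""
    return x * (x**3 + x**2 + x) + x


def inv_nrmse(y_true, y_pred):
    """ASSUMED reward metric: 1 / (1 + RMSE / std(y_true)); not confirmed by this document."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(squared_errors) / len(squared_errors))
    return 1.0 / (1.0 + rmse / statistics.pstdev(y_true))


# Exact check with integer arithmetic (no rounding involved).
exact_match = all(target(n) == recovered(n) for n in range(-5, 6))

# Numerical check on random points in [0, 1), similar in spirit to the example data.
random.seed(0)
xs = [random.random() for _ in range(10)]
y_true = [target(x) for x in xs]
y_pred = [recovered(x) for x in xs]
reward = inv_nrmse(y_true, y_pred)

print(exact_match)
print(round(reward, 4))
```

Under that assumed metric, a reward just below 1, such as the `0.9999999999999998` in the log, is consistent with an exact symbolic match that differs from the target only by floating-point rounding.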

## Command Line Method


You can test symbolic regression out of the box with a default configuration, after running setup, with a command such as:
```bash
python -m ssde.run ssde/config/config_regression.json --b Nguyen-7
```
This will run DSO on the regression task with benchmark Nguyen-7.

If you want to include optimized floating-point constants in the search space, simply include `"const"` in the `function_set` list. Note that constant optimization uses an inner optimization loop, which leads to much longer runtimes (hours instead of minutes).
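As a minimal sketch of that change, the snippet below adds `"const"` to a config held as a Python dictionary. The key layout (`task` containing `function_set`, `task_type`, and `dataset`) is an assumption modeled on DSO-style configs, and `base_config` is a hypothetical stand-in. Check both against your own `config_regression.json` before relying on them.

```python
import copy
import json


def with_constants(config):
    """Return a copy of `config` whose task.function_set includes "const"."""
    updated = copy.deepcopy(config)  # leave the caller's config untouched
    function_set = updated["task"]["function_set"]
    if "const" not in function_set:
        function_set.append("const")
    return updated


# Hypothetical, minimal stand-in for the contents of a regression config file.
base_config = {
    "task": {
        "task_type": "regression",
        "dataset": "Nguyen-7",
        "function_set": ["add", "sub", "mul", "div", "sin", "cos", "exp", "log"],
    }
}

config_with_const = with_constants(base_config)
print(json.dumps(config_with_const, indent=2))
```

You could write the result to a new JSON file with `json.dump` and pass that path to `ssde.run` or to `DeepSymbolicRegressor` in place of the default config.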

The following script generates the ground-truth datasets for the Poisson, heat, and wave examples. Each dataset is saved as a CSV file with the input columns first and the target value in the last column:

```python
import numpy as np

# Generate some data
np.random.seed(0)

# Poisson 2d: columns are x1, x2, y
X = np.random.uniform(-1, 1, (500, 2))
y = 2.5 * X[:,0]**4 - 1.3 * X[:,0]**3 + 0.5 * X[:,1]**2 - 1.7 * X[:,1]
data = np.hstack((X, y.reshape(-1, 1)))
np.savetxt("../../data/poisson2d_truth.csv", data, delimiter=",", comments="", fmt="%.6f")

# Poisson 3d: columns are x1, x2, x3, y
X = np.random.uniform(-1, 1, (500, 3))
y = 2.5 * X[:,0]**4 - 1.3 * X[:,1]**3 + 0.5 * X[:,2]**2
data = np.hstack((X, y.reshape(-1, 1)))
np.savetxt("../../data/poisson3d_truth.csv", data, delimiter=",", comments="", fmt="%.6f")

# Heat 2d: columns are t, x1, x2, y
X_t = np.random.uniform(0, 1, (500, 1))
X_x = np.random.uniform(-1, 1, (500, 2))
y = 2.5 * X_x[:,0]**4 - 1.3 * X_x[:,1]**3 + 0.5 * X_t[:,0]**2
data = np.hstack((X_t, X_x, y.reshape(-1, 1)))
np.savetxt("../../data/heat2d_truth.csv", data, delimiter=",", comments="", fmt="%.6f")

# Heat 3d: columns are t, x1, x2, x3, y
X_t = np.random.uniform(0, 1, (500, 1))
X_x = np.random.uniform(-1, 1, (500, 3))
y = 2.5 * X_x[:,0]**4 - 1.3 * X_x[:,1]**3 + 0.5 * X_x[:,2]**2 - 1.7 * X_t[:,0]
data = np.hstack((X_t, X_x, y.reshape(-1, 1)))
np.savetxt("../../data/heat3d_truth.csv", data, delimiter=",", comments="", fmt="%.6f")

# Wave 2d: columns are t, x1, x2, y
X_t = np.random.uniform(0, 1, (500, 1))
X_x = np.random.uniform(-1, 1, (500, 2))
y = np.exp(X_x[:,0]**2 - 0.5 * X_t[:,0]) * np.sin(X_x[:,1])
data = np.hstack((X_t, X_x, y.reshape(-1, 1)))
np.savetxt("../../data/wave2d_truth.csv", data, delimiter=",", comments="", fmt="%.6f")

# Wave 3d: columns are t, x1, x2, x3, y
X_t = np.random.uniform(0, 1, (500, 1))
X_x = np.random.uniform(-1, 1, (500, 3))
y = np.exp(X_x[:,0]**2 + X_x[:,2]**2 - 0.5 * X_t[:,0]) * np.cos(X_x[:,1])
data = np.hstack((X_t, X_x, y.reshape(-1, 1)))
np.savetxt("../../data/wave3d_truth.csv", data, delimiter=",", comments="", fmt="%.6f")
```
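To reuse one of these CSV files with the `Sklearn`-like interface, the features and target have to be split apart again. The sketch below assumes the layout written by the script above (no header row, inputs first, target in the last column) and uses only the standard library. `load_dataset` and the demo file name are hypothetical. With NumPy available, `np.loadtxt(path, delimiter=",")` followed by column slicing would do the same job.

```python
import csv
import os
import tempfile


def load_dataset(path):
    """Read a headerless CSV and split each row into features and a last-column target."""
    X, y = [], []
    with open(path, newline="") as handle:
        for row in csv.reader(handle):
            if not row:  # skip blank lines
                continue
            values = [float(v) for v in row]
            X.append(values[:-1])
            y.append(values[-1])
    return X, y


# Tiny hypothetical file in the same layout as the generated datasets: x1, x2, y.
rows = [[0.5, -0.25, 1.0], [0.0, 1.0, -1.2]]
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_truth.csv")
    with open(demo_path, "w", newline="") as handle:
        csv.writer(handle).writerows([[f"{v:.6f}" for v in row] for row in rows])
    X, y = load_dataset(demo_path)

print(X)
print(y)
```

The resulting `X` and `y` can be converted with `np.asarray` and passed to `model.fit(X, y)` as in the earlier examples.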

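To run regression on a CSV dataset instead of a built-in benchmark, pass the file path to `--b`. Each command below runs a single task in the background and redirects all of its output to a log file:

```bash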
python3 -m ssde.run ssde/config/config_regression.json --b=data/fdm_vanderpol_sol.csv --runs=1 \
--n_cores_task=1 \
--seed=500 \
> experiments/regression/log/vanderpol_sol.log 2>&1 &
```
```bash
python3 -m ssde.run ssde/config/config_regression.json --b=data/heat2d_truth.csv --runs=1 \
--n_cores_task=1 \
--seed=500 \
> experiments/regression/log/heat2d_sol.log 2>&1 &
```
```bash
python3 -m ssde.run ssde/config/config_regression.json --b=data/heat3d_truth.csv --runs=1 \
--n_cores_task=1 \
--seed=500 \
> experiments/regression/log/heat3d_sol.log 2>&1 &
```

```bash
python3 -m ssde.run ssde/config/config_regression.json --b=data/wave2d_truth.csv --runs=1 \
--n_cores_task=1 \
--seed=500 \
> experiments/regression/log/wave2d_sol.log 2>&1 &
```
```bash
python3 -m ssde.run ssde/config/config_regression.json --b=data/wave3d_truth.csv --runs=1 \
--n_cores_task=1 \
--seed=500 \
> experiments/regression/log/wave3d_sol.log 2>&1 &
```

```bash
python3 -m ssde.run ssde/config/config_regression.json --b=data/poisson2d_truth.csv --runs=1 \
--n_cores_task=1 \
--seed=500 \
> experiments/regression/log/poisson2d_sol.log 2>&1 &
```
```bash
python3 -m ssde.run ssde/config/config_regression.json --b=data/poisson3d_truth.csv --runs=1 \
--n_cores_task=1 \
--seed=500 \
> experiments/regression/log/poisson3d_sol.log 2>&1 &
```
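The commands above differ only in the dataset name, so they can be generated in a loop. This is a small sketch that builds the same argument lists and log paths for the six `*_truth.csv` datasets. It only prints the commands and does not launch anything. `build_command` and `log_path` are hypothetical helpers, and the flags are copied from the commands above rather than from a verified CLI reference.

```python
import shlex

# Dataset name stems taken from the *_truth.csv commands above.
DATASETS = ["poisson2d", "poisson3d", "heat2d", "heat3d", "wave2d", "wave3d"]


def build_command(name, seed=500):
    """Return the argument list for one regression run on data/<name>_truth.csv."""
    return [
        "python3", "-m", "ssde.run", "ssde/config/config_regression.json",
        f"--b=data/{name}_truth.csv",
        "--runs=1",
        "--n_cores_task=1",
        f"--seed={seed}",
    ]


def log_path(name):
    """Return the log file path used for the given dataset."""
    return f"experiments/regression/log/{name}_sol.log"


for name in DATASETS:
    # Print a shell-ready line; redirect stdout and stderr to the log file.
    print(shlex.join(build_command(name)), ">", log_path(name), "2>&1")
```

To actually launch a run from Python, the argument list can be passed to `subprocess.run` with `stdout` set to an open log file and `stderr=subprocess.STDOUT`.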