# 1. Building a Symbolic Regressor


In this short notebook, we detail how to use the *symbolic-pursuit* package that we developed to build a concise symbolic model for a black-box model. Here, the black-box model we shall use is an *MLP* regressor trained on the UCI dataset *wine quality red* <cite data-cite="2480681/TI5B4V8W"></cite>.
Note that our implementation of the Meijer G-functions relies on the *pysymbolic* package <cite data-cite="2480681/IH83ZXGR"></cite>.
Let us start by importing the packages we are going to use.

In [None]:
from datasets.data_loader_UCI import data_loader, mixup  # dataset loader for the UCI dataset
from symbolic_pursuit.models import SymbolicRegressor  # our symbolic model is an instance of this class 
from sklearn.neural_network import MLPRegressor  # we use an MLP regressor as the black-box model
from sklearn.metrics import mean_squared_error # we are going to assess the quality of the model based on the generalization MSE
from sympy import init_printing  # we use sympy to display mathematical expressions
import numpy as np # we use numpy to deal with arrays
init_printing()

We now split the dataset into a training and a test subset. All the features are normalized to the range $[0,1]$ and the labels are divided by the average of their absolute values.

In [None]:
X_train, y_train, X_test, y_test = data_loader("wine-quality-red")
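To make the preprocessing described above concrete, here is a minimal NumPy sketch of the two rescaling steps. The helper names `min_max_scale` and `scale_labels` are hypothetical and are not part of `data_loader`; whether the loader computes its statistics on the full dataset or on the training subset only is not stated here, so this sketch simply rescales whatever array it receives.

```python
import numpy as np


def min_max_scale(X):
    """Rescale each column of X to the range [0, 1] (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # Guard against constant columns to avoid a division by zero.
    col_range[col_range == 0] = 1.0
    return (X - col_min) / col_range


def scale_labels(y):
    """Divide the labels by the mean of their absolute values (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    return y / np.mean(np.abs(y))


# Toy data, used only to illustrate the two transformations.
X_demo = np.array([[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]])
y_demo = np.array([5.0, 6.0, 7.0])

X_scaled = min_max_scale(X_demo)
y_scaled = scale_labels(y_demo)
```

After this rescaling, every feature column spans $[0,1]$ and the labels have a mean absolute value of one, which keeps the MSE values reported later on a comparable scale across datasets.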

An MLP regressor is fitted to the training subset:

In [None]:
model = MLPRegressor()
model.fit(X_train, y_train)

Now, we shall build the training set for the *symbolic model*. To capture the peculiarities of our black-box, this is done by using a mixup strategy on the original training set <cite data-cite="2480681/H82VI2CA"></cite>. 

In [None]:
X_random = mixup(X_train)
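For intuition, a generic mixup step can be sketched as follows. This is an illustration under our own assumptions, not the implementation of the `mixup` function imported above: the function name `mixup_sketch`, the Beta-distributed mixing coefficient, and the parameters `alpha` and `seed` are all hypothetical choices made for this example.

```python
import numpy as np


def mixup_sketch(X, alpha=1.0, seed=0):
    """Return convex combinations of randomly paired rows of X.

    Assumption: the mixing coefficient is drawn from Beta(alpha, alpha);
    the packaged mixup function may use a different scheme.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n_samples = X.shape[0]
    # Pair every row with a randomly chosen partner row.
    partner = rng.permutation(n_samples)
    # One mixing coefficient per row, broadcast over the features.
    lam = rng.beta(alpha, alpha, size=(n_samples, 1))
    return lam * X + (1.0 - lam) * X[partner]


# Toy features in [0, 1], standing in for a normalized training set.
X_demo = np.random.default_rng(1).uniform(size=(50, 4))
X_mixed = mixup_sketch(X_demo)
```

Because each new point is a convex combination of two training points, the generated inputs stay within the range covered by the original features while filling in regions between observed samples.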

We use these as training points to fit a *symbolic model* to the black-box MLP regressor. 
This model is built by using a projection pursuit strategy <cite data-cite="2480681/AD298KCW"></cite>. Note that the evaluation of Meijer G-functions is slow in the current Python implementations, so this step might take a while.

In [None]:
symbolic_model = SymbolicRegressor()
symbolic_model.fit(model.predict, X_random)
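As a rough picture of what a projection pursuit model looks like, the sketch below evaluates a generic additive model of the form $f(x) = \sum_k w_k \, g_k(v_k^\top x)$, where each term applies a univariate function to a linear combination of the features. The function `projection_pursuit_predict`, the directions, the weights, and the stand-in functions `np.tanh` and `np.square` are hypothetical placeholders chosen for illustration; they do not reproduce the internals of `SymbolicRegressor`, which uses Meijer G-functions for the univariate terms.

```python
import numpy as np


def projection_pursuit_predict(X, directions, ridge_functions, weights):
    """Evaluate sum_k w_k * g_k(X @ v_k) for a generic projection pursuit model."""
    X = np.asarray(X, dtype=float)
    prediction = np.zeros(X.shape[0])
    for v, g, w in zip(directions, ridge_functions, weights):
        # Project the features onto the direction v, then apply the univariate g.
        prediction += w * g(X @ v)
    return prediction


# Hypothetical two-term model on two features.
directions = [np.array([0.6, 0.8]), np.array([1.0, -1.0])]
ridge_functions = [np.tanh, np.square]  # stand-ins for Meijer G-functions
weights = [2.0, 0.5]

X_demo = np.array([[0.0, 0.0], [1.0, 1.0]])
y_demo = projection_pursuit_predict(X_demo, directions, ridge_functions, weights)
```

In a pursuit procedure, terms of this kind are typically added one at a time, each new term being fitted to the residual left by the previous ones.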

We can now compare the performance of the two models in terms of their MSE evaluated on the test set.

In [None]:
print("MSE score for the MLP Regressor: ", mean_squared_error(y_test, model.predict(X_test)))
print("MSE score for the Symbolic Regressor: ", mean_squared_error(y_test, symbolic_model.predict(X_test)))
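The two scores above measure each model against the true labels. When the goal is to explain the black-box, it can also be informative to measure how closely the symbolic model reproduces the black-box predictions themselves. The sketch below computes all three quantities; the helper `surrogate_report` and its dictionary keys are hypothetical names introduced for this example, and the arrays are toy values rather than results from this notebook.

```python
import numpy as np


def mse(a, b):
    """Mean squared error between two arrays of equal length."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.mean((a - b) ** 2))


def surrogate_report(y_true, black_box_pred, surrogate_pred):
    """Collect accuracy and fidelity scores for a surrogate model (hypothetical helper)."""
    return {
        "black_box_vs_labels": mse(y_true, black_box_pred),
        "surrogate_vs_labels": mse(y_true, surrogate_pred),
        # Fidelity: how well the surrogate mimics the black-box.
        "surrogate_vs_black_box": mse(black_box_pred, surrogate_pred),
    }


# Toy values, used only to illustrate the computation.
y_true = np.array([1.0, 2.0, 3.0])
black_box_pred = np.array([1.0, 2.0, 4.0])
surrogate_pred = np.array([1.0, 3.0, 4.0])

report = surrogate_report(y_true, black_box_pred, surrogate_pred)
```

In this notebook, the same helper could be called with `y_test`, `model.predict(X_test)` and `symbolic_model.predict(X_test)`.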

As we can see, the performance of both models is comparable. The difference between the two models is that the symbolic model is expressed in terms of analytic *Meijer G-functions*, so its expression is short and concise. Let us display the expression for the faithful model we just obtained.

In [None]:
symbolic_model.get_expression()

As we can see, this model only involves one Bessel function. It is expressed in terms of the following linear combinations of the features:

In [None]:
symbolic_model.print_projections()
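To see why a Meijer G-function can reduce to a Bessel function, recall the standard identity $J_\nu(x) = G^{1,0}_{0,2}\left(\tfrac{x^2}{4} \,\middle|\, \begin{matrix} - \\ \tfrac{\nu}{2}, -\tfrac{\nu}{2} \end{matrix}\right)$ for $x > 0$. The sketch below checks it numerically with *mpmath* (assumed to be available, since it is a dependency of sympy). The helper `bessel_j_via_meijerg` and the values of `nu` and `x` are hypothetical choices for illustration and are unrelated to the parameters fitted above.

```python
import mpmath


def bessel_j_via_meijerg(nu, x):
    """Evaluate J_nu(x) for x > 0 through its Meijer G-function representation."""
    # No upper parameters; lower parameters are nu/2 (first group) and -nu/2 (second group).
    return mpmath.meijerg([[], []], [[nu / 2], [-nu / 2]], (x / 2) ** 2)


# Arbitrary illustrative values.
nu, x = 1, 1.3
g_value = bessel_j_via_meijerg(nu, x)
j_value = mpmath.besselj(nu, x)

# The two evaluations should agree up to numerical precision.
difference = abs(g_value - j_value)
```

Other choices of the Meijer G parameters recover exponentials, logarithms and trigonometric functions, which is what allows a single parametric family to produce compact closed-form expressions.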

## References

<div class="cite2c-biblio"></div>