# Symbolic regression 

- A type of regression analysis where the goal is to *discover a mathematical expression that best fits a given dataset*. 
- Unlike traditional regression, which assumes a specific model form (e.g., linear or polynomial), symbolic regression searches a space of candidate equations built from a chosen set of operators and functions, looking for one that is both accurate and interpretable (i.e., as simple as possible).

Features and use cases:

- **Model Discovery**: Learns both the structure and the parameters of the model. In typical regression the model form is fixed and only its parameters are estimated from data, for example the slope and intercept in linear regression.
- **No Prior Assumptions**: Does not require a predefined model form.
- **Interpretability**: Produces human-readable equations (which can still be quite complex).

- **For example**: given data relating variables $x_1$ and $x_2$ to $y$, symbolic regression might find an equation like $y = 3.2 \cdot \sin(x_1) + \log(x_2)$
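
To make the contrast between parameter fitting and structure search concrete, here is a minimal sketch in plain Python. It is not how PySR works internally. It assumes noiseless synthetic data generated from the example equation above and a hypothetical, hand-picked family of candidates $y = c \cdot f(x_1) + g(x_2)$; the names `unary`, `positive`, and `fit` are illustrative only. For each candidate structure the coefficient $c$ is fitted by least squares, and the structure with the lowest error is kept.

```python
import math
import random

# Synthetic, noiseless data from y = 3.2*sin(x1) + log(x2) (assumed for illustration)
rng = random.Random(0)
x1 = [rng.uniform(-3.0, 3.0) for _ in range(50)]
x2 = [rng.uniform(0.5, 5.0) for _ in range(50)]  # kept positive so log/sqrt are defined
y = [3.2 * math.sin(a) + math.log(b) for a, b in zip(x1, x2)]

# Hypothetical candidate building blocks for f(x1) and g(x2)
unary = {"id": lambda v: v, "sin": math.sin, "cos": math.cos}
positive = {"id": lambda v: v, "log": math.log, "sqrt": math.sqrt}


def fit(f, g):
    """Least-squares coefficient c for y ~ c*f(x1) + g(x2), and the resulting MSE."""
    fx = [f(a) for a in x1]
    resid = [t - g(b) for t, b in zip(y, x2)]
    c = sum(u * r for u, r in zip(fx, resid)) / sum(u * u for u in fx)
    mse = sum((r - c * u) ** 2 for u, r in zip(fx, resid)) / len(y)
    return c, mse


# Structure search: try every (f, g) pair and fit its single parameter
results = []
for fname, f in unary.items():
    for gname, g in positive.items():
        c, mse = fit(f, g)
        results.append((mse, fname, gname, c))

best = min(results)  # lowest MSE wins
print(f"best structure: y = {best[3]:.3f} * {best[1]}(x1) + {best[2]}(x2)")
```

Fixing `f` and `g` in advance and fitting only `c` corresponds to traditional regression; looping over the structures is the extra step that symbolic regression adds. Real tools search a far larger space than this nine-candidate grid.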


We will discuss one package, [PySR](https://github.com/MilesCranmer/PySR); many others exist (for example [SISSO](https://github.com/rouyang2017/SISSO)).

# PySR 

- PySR (Python Symbolic Regression) is a powerful open-source library for symbolic regression.
- It finds simple, interpretable mathematical expressions that describe data.
- It uses genetic programming to evolve equations over time.
    - See the paper for more details: Cranmer, M. (2023). *Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl*. [arXiv:2305.01582](https://doi.org/10.48550/arXiv.2305.01582)
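
The idea of evolving equations can be illustrated with a toy, mutation-only evolutionary search over expression trees. This is a sketch under simplifying assumptions, not PySR's algorithm, which the paper cited above describes and which is considerably more elaborate. The tuple-based tree encoding, the helper names (`random_expr`, `evaluate`, `mutate`, `size`), and the size cap of 15 nodes are all hypothetical choices made for this example.

```python
import math
import random

rng = random.Random(0)
xs = [rng.uniform(-2.0, 2.0) for _ in range(20)]
ys = [math.sin(x) + x for x in xs]  # toy target: y = sin(x) + x


def random_expr(depth=2):
    """Build a random expression tree encoded as nested tuples."""
    if depth == 0 or rng.random() < 0.3:
        return ("x",) if rng.random() < 0.7 else ("const", rng.uniform(-2.0, 2.0))
    op = rng.choice(["add", "mul", "sin"])
    if op == "sin":
        return ("sin", random_expr(depth - 1))
    return (op, random_expr(depth - 1), random_expr(depth - 1))


def evaluate(expr, x):
    """Evaluate an expression tree at the point x."""
    kind = expr[0]
    if kind == "x":
        return x
    if kind == "const":
        return expr[1]
    if kind == "sin":
        return math.sin(evaluate(expr[1], x))
    a, b = evaluate(expr[1], x), evaluate(expr[2], x)
    return a + b if kind == "add" else a * b


def size(expr):
    """Number of nodes in the tree, used here as a crude complexity measure."""
    return 1 + sum(size(c) for c in expr[1:] if isinstance(c, tuple))


def loss(expr):
    """Mean squared error of the expression on the toy data."""
    return sum((evaluate(expr, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mutate(expr):
    """Replace one randomly chosen subtree with a fresh random subtree."""
    if expr[0] in ("x", "const") or rng.random() < 0.3:
        return random_expr(2)
    i = rng.randrange(1, len(expr))
    return expr[:i] + (mutate(expr[i]),) + expr[i + 1:]


population = [random_expr() for _ in range(50)]
history = []  # best loss after each generation
for generation in range(200):
    # Tournament selection: the fitter of two random individuals is the parent.
    parent = min(rng.sample(population, 2), key=loss)
    child = mutate(parent)
    worst = max(range(len(population)), key=lambda i: loss(population[i]))
    # Keep the child only if it is small enough and beats the current worst.
    if size(child) <= 15 and loss(child) < loss(population[worst]):
        population[worst] = child
    history.append(min(loss(e) for e in population))

best = min(population, key=loss)
print("best expression:", best, "loss:", loss(best))
```

Because a child only ever replaces the worst individual, the best loss in `history` never increases. Whether this toy loop recovers the exact target depends on the seed and the number of generations.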

In [1]:
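import numpy as np
from pysr import PySRRegressor

# Synthetic data: 10 samples of a single feature, with target y = sin(x) + x
np.random.seed(0)
X = np.random.randn(10).reshape(-1, 1)  # shape (10, 1): samples x features
y = np.sin(X) + X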

Detected IPython. Loading juliacall extension. See https://juliapy.github.io/PythonCall.jl/stable/compat/#IPython


In [2]:
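# Learn equations
model = PySRRegressor(
    niterations=30,  # number of search iterations
    binary_operators=["+", "*"],
    unary_operators=["cos", "exp", "sin"],
    populations=10,  # number of populations evolved
    model_selection="best",  # criterion for choosing the final equation
)

model.fit(X, y)
model.latex()  # LaTeX string of the selected equation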

Compiling Julia backend...
[ Info: Started!
[ Info: Final population:

───────────────────────────────────────────────────────────────────────────────────────────────────
Complexity  Loss       Score      Equation
1           4.874e-01  1.594e+01  y = x₀
3           5.276e-02  1.112e+00  y = 1.5536 * x₀
4           7.461e-15  1.594e+01  y = x₀ + sin(x₀)
6           1.776e-15  7.175e-01  y = x₀ + sin(x₀ + 4.6156e-09)
10          0.000e+00  3.986e+00  y = ((sin(x₀ + 4.9319e-08) + -0.038612) + x₀) + 0.038612
───────────────────────────────────────────────────────────────────────────────────────────────────

[ Info: Results saved to:
  - outputs/20250417_143521_J6731c/hall_of_fame.csv

'x_{0} + \\sin{\\left(x_{0} + 4.93 \\cdot 10^{-8} \\right)} - 0.0386 + 0.0386'


In [3]:
model.sympy()


x0 + sin(x0 + 4.931907e-8) - 0.03861208 + 0.038612034
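
The extra constants in this expression nearly cancel, so it should behave like the simpler $x_0 + \sin(x_0)$ from the table. A quick numerical check, sketched here with the standard library only, compares the two on a grid. The constants are copied from the printed output above; the grid and the function names `found` and `target` are assumptions made for this example.

```python
import math


def found(x0):
    """Expression returned by model.sympy(), with constants copied from the output."""
    return x0 + math.sin(x0 + 4.931907e-8) - 0.03861208 + 0.038612034


def target(x0):
    """The simpler expression used to generate the data."""
    return x0 + math.sin(x0)


# Evaluate both on an evenly spaced grid over [-3, 3]
grid = [i / 10 for i in range(-30, 31)]
max_gap = max(abs(found(x) - target(x)) for x in grid)
print(f"largest absolute difference on the grid: {max_gap:.2e}")
```

A gap this small suggests the additional constants are numerical noise picked up during the search rather than meaningful structure, which is one reason to weigh complexity alongside loss when choosing an equation.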