# Lecture 25: Symbolic Regression

So far, we have explored linear regression, logistic regression, and their applications in modeling relationships between variables. However, these methods assume a predefined model form (linear or logistic). Symbolic regression instead discovers both the structure and the parameters of the relationship from the data, without imposing a fixed functional form.



---

## What is Symbolic Regression?

Symbolic regression is a form of regression analysis that searches the space of mathematical expressions to find the model that best fits a given dataset. Unlike linear or logistic regression, symbolic regression does not start with a predetermined equation structure. Instead, it typically uses evolutionary algorithms, such as genetic programming, to evolve a population of candidate equations that model the data.

It combines principles from:
- **Machine Learning**: For automated model discovery.
- **Genetic Programming**: To evolve symbolic representations.
- **Classical Regression**: To fit parameters within the discovered structure.
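Genetic programming usually represents each candidate equation as an expression tree. Operators sit at the internal nodes, and variables and constants sit at the leaves. The sketch below is a minimal illustration using a hypothetical tuple-based encoding, which is not PySR's internal representation. It shows how such a tree is evaluated, printed, and measured. The helper names `evaluate`, `to_infix`, and `size` are our own.

```python
import math

# Hypothetical minimal encoding (not PySR's internal format): a leaf is a
# variable name such as "x0" or a numeric constant; an internal node is a tuple
# (operator, child) for unary operators or (operator, left, right) for binary ones.
BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}
UNARY = {"sin": math.sin, "cos": math.cos, "exp": math.exp}


def evaluate(expr, env):
    """Recursively evaluate an expression tree; env maps variable names to values."""
    if isinstance(expr, str):
        return env[expr]
    if isinstance(expr, (int, float)):
        return float(expr)
    op, *children = expr
    args = [evaluate(child, env) for child in children]
    return BINARY[op](*args) if op in BINARY else UNARY[op](*args)


def to_infix(expr):
    """Render an expression tree as a human-readable infix string."""
    if not isinstance(expr, tuple):
        return str(expr)
    op, *children = expr
    if op in UNARY:
        return f"{op}({to_infix(children[0])})"
    left, right = children
    return f"({to_infix(left)} {op} {to_infix(right)})"


def size(expr):
    """Count the nodes in the tree, a simple proxy for expression complexity."""
    if not isinstance(expr, tuple):
        return 1
    return 1 + sum(size(child) for child in expr[1:])


# The relationship used in the PySR example later in this lecture: y = x0^2 + sin(x1).
target = ("+", ("*", "x0", "x0"), ("sin", "x1"))
print(to_infix(target))                          # ((x0 * x0) + sin(x1))
print(evaluate(target, {"x0": 2.0, "x1": 0.0}))  # 4.0
print(size(target))                              # 6
```

An evolutionary search creates new candidates by editing such trees. It can swap an operator, replace a subtree, or exchange subtrees between two parents (crossover). It then keeps the candidates that fit the data best.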

Symbolic regression is particularly useful when:
- The underlying relationship between variables is unknown.
- We suspect the relationship is non-linear or involves interactions not easily captured by standard models.
- Interpretability of the resulting model is important.



## Endogenous and Exogenous Variables

Like other regression methods, symbolic regression distinguishes between:
- **Endogenous Variable**: The response or dependent variable we aim to predict or explain.
- **Exogenous Variables**: The independent variables used to explain or predict the endogenous variable.

However, the functional form relating these variables is not predefined but discovered.



## Example Use Cases

- **Physical Sciences**: Discovering governing equations from experimental data.
- **Economics**: Uncovering non-linear dependencies in economic indicators.
- **Transportation**: Modeling complex interactions between traffic flow, speed, and emissions.



## Mathematical Formulation

Instead of assuming a model like:
$$
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon
$$
symbolic regression searches for:
$$
Y = f(X_1, X_2, \ldots, X_n) + \epsilon
$$
where $f$ is an expression composed of:
- Arithmetic operators (+, -, *, /)
- Analytical functions (sin, cos, log, exp, etc.)
- Constants and parameters to be optimized
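To separate the two tasks, choosing a structure $f$ and fitting its constants, here is a deliberately simplified sketch. It does not evolve structures. Instead, it enumerates three hand-written candidate structures, which are our hypothetical choices, and fits the constants of each by ordinary least squares. The first candidate is the fixed linear form above. The data mirror the synthetic example used later with PySR.

```python
import numpy as np

# Same synthetic relationship as the PySR example later on: y = x0^2 + sin(x1) + noise.
np.random.seed(0)
X = np.random.randn(1000, 2)
y = X[:, 0] ** 2 + np.sin(X[:, 1]) + 0.1 * np.random.randn(1000)

# Each hypothetical candidate structure is a list of basis terms; the constants
# c0, c1, ... multiply those terms and are fitted by ordinary least squares.
candidates = {
    "c0 + c1*x0 + c2*x1": [X[:, 0], X[:, 1]],
    "c0 + c1*x0^2 + c2*x1": [X[:, 0] ** 2, X[:, 1]],
    "c0 + c1*x0^2 + c2*sin(x1)": [X[:, 0] ** 2, np.sin(X[:, 1])],
}


def fit_structure(terms, y):
    """Fit the constants of a fixed structure; return (coefficients, MSE)."""
    design = np.column_stack([np.ones_like(y)] + terms)  # column of ones -> c0
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    mse = float(np.mean((design @ coef - y) ** 2))
    return coef, mse


results = {name: fit_structure(terms, y) for name, terms in candidates.items()}
for name, (coef, mse) in results.items():
    print(f"{name:28s} MSE={mse:.4f} coef={np.round(coef, 3)}")

best = min(results, key=lambda name: results[name][1])
print("lowest MSE:", best)
```

Least squares works here only because every candidate is linear in its constants. Constants inside nonlinear functions, such as $\sin(c_1 X_2)$, need a nonlinear optimizer, and a real symbolic regression engine also has to search over structures rather than take them from a hand-written list.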



## PySR: Symbolic Regression in Python

[PySR](https://github.com/MilesCranmer/PySR) is a high-performance symbolic regression library that combines evolutionary search with a focus on simplicity and interpretability of the discovered models.



### Installation

```bash
pip install pysr
```



### Basic Example

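Let us explore symbolic regression with PySR on synthetic data whose true relationship is known, $Y = X_1^2 + \sin(X_2) + \epsilon$ with $\epsilon \sim \mathcal{N}(0, 0.1^2)$, so we can check whether the search recovers it.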


```python
# Step 1: Import Libraries
import numpy as np
from pysr import PySRRegressor
```

The first time `pysr` is imported, PySR uses `juliapkg` to locate a compatible Julia installation. If none is found, it downloads Julia automatically, so this one-time setup can take a while.

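```python
# Step 2: Generate Synthetic Data
np.random.seed(0)
X = np.random.randn(1000, 2)  # 1000 samples of two standard-normal features
# True relationship: y = x0^2 + sin(x1), plus Gaussian noise with standard deviation 0.1
y = X[:, 0] ** 2 + np.sin(X[:, 1]) + 0.1 * np.random.randn(1000)
```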
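Because the noise has standard deviation 0.1, even the exact generating function leaves a mean squared error of about $0.1^2 = 0.01$. No honest equation should do much better than this on new data. The sketch below, which uses NumPy only, computes that noise floor and carves out a held-out split for checking discovered equations. The 80/20 split and the variable names are our own assumptions.

```python
import numpy as np

# Regenerate the data from the cell above so this snippet runs on its own.
np.random.seed(0)
X = np.random.randn(1000, 2)
y = X[:, 0] ** 2 + np.sin(X[:, 1]) + 0.1 * np.random.randn(1000)

# MSE of the true generating function: what remains is pure noise,
# so its expected value is the noise variance 0.1**2 = 0.01.
y_true_fn = X[:, 0] ** 2 + np.sin(X[:, 1])
noise_floor = float(np.mean((y - y_true_fn) ** 2))

# Hold out 20% of the rows so discovered equations can be scored on unseen data.
idx = np.random.permutation(len(y))
n_test = len(y) // 5
test_idx, train_idx = idx[:n_test], idx[n_test:]
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

print(f"noise floor MSE ~ {noise_floor:.4f}")
print(X_train.shape, X_test.shape)  # (800, 2) (200, 2)
```

With this split, Step 3 could call `model.fit(X_train, y_train)` and then compare `model.predict(X_test)` with `y_test`. A training loss well below the noise floor, combined with a clearly higher test loss, suggests overfitting.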

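```python
# Step 3: Model Fitting with PySR
model = PySRRegressor(
    niterations=40,                                       # number of search iterations
    binary_operators=["+", "-", "*", "/"],                # operators combining two subexpressions
    unary_operators=["sin", "cos", "exp", "log", "abs"],  # functions applied to one subexpression
    populations=5,                                        # number of independently evolving populations
    progress=True,                                        # show a progress bar during the search
)

model.fit(X, y)
```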

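```python
# Step 4: Viewing Discovered Equations
print(model)          # table of candidate equations found by the search
print(model.sympy())  # the selected equation as a SymPy expression
```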

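```python
# Step 5: Making Predictions
y_pred = model.predict(X)  # evaluates the selected equation on X
mse = np.mean((y_pred - y) ** 2)
print(f"Training MSE: {mse:.4f}")
```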


## Strengths of Symbolic Regression

- **Interpretability**: Yields human-readable equations.
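- **Flexibility**: Does not assume a fixed functional form; the search space is limited only by the chosen operators and the maximum allowed expression size.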
- **Exploration of Complex Relationships**: Captures non-linearities, interactions, and transformations.



## Limitations

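- **Computational Intensity**: The number of possible expressions grows combinatorially with the number of operators, variables, and allowed expression size, so the evolutionary search is computationally demanding.
- **Overfitting Risk**: Especially with noisy data or when very complex expressions are allowed, because long expressions can fit the noise.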
- **Parameter Sensitivity**: The choice of operators and evolutionary settings can influence results.
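A common way to manage overfitting is to trade accuracy against complexity. PySR lists the best equation it found at each complexity level (its `equations_` table) together with a `score` column. The NumPy-only sketch below mimics that idea rather than reproducing PySR's exact rule. The candidate expressions and their node-count complexities are hypothetical and hand-written, not produced by a search. The score is the drop in log-MSE per added node relative to the previous point on the accuracy/complexity Pareto front.

```python
import numpy as np

# Same synthetic data as the PySR example.
np.random.seed(0)
X = np.random.randn(1000, 2)
y = X[:, 0] ** 2 + np.sin(X[:, 1]) + 0.1 * np.random.randn(1000)
x0, x1 = X[:, 0], X[:, 1]

# Hypothetical hand-written candidates: (expression, node-count complexity, predictions).
candidates = [
    ("mean(y)", 1, np.full_like(y, y.mean())),
    ("x0 * x0", 3, x0 * x0),
    ("cos(x0) + x1", 5, np.cos(x0) + x1),
    ("x0 * x0 + x1", 5, x0 * x0 + x1),
    ("x0 * x0 + sin(x1)", 6, x0 * x0 + np.sin(x1)),
]
scored = [(name, c, float(np.mean((pred - y) ** 2))) for name, c, pred in candidates]


def pareto_front(items):
    """Keep candidates whose loss beats every simpler one (at most one per complexity)."""
    front, best_loss = [], np.inf
    for name, complexity, loss in sorted(items, key=lambda t: (t[1], t[2])):
        if loss < best_loss:
            front.append((name, complexity, loss))
            best_loss = loss
    return front


def score_front(front):
    """Drop in log-loss per unit of added complexity; the simplest member gets 0."""
    scores = [0.0]
    for (_, c_prev, l_prev), (_, c, l) in zip(front, front[1:]):
        scores.append(float(-(np.log(l) - np.log(l_prev)) / (c - c_prev)))
    return scores


front = pareto_front(scored)
scores = score_front(front)
for (name, c, loss), s in zip(front, scores):
    print(f"complexity={c}  loss={loss:.4f}  score={s:.3f}  {name}")
best = front[int(np.argmax(scores))][0]
print("best trade-off:", best)
```

The dominated candidate `cos(x0) + x1` never reaches the front. The largest score goes to the step that adds `sin(x1)`, which is where the loss drops to roughly the noise level.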



## Applications in Transportation and Logistics

Symbolic regression can help:
- Model fuel consumption as a function of speed, load, and terrain.
- Discover non-linear dependencies in traffic flow models.
- Derive empirical formulas from simulation or sensor data in smart logistics.



## Summary

Symbolic Regression bridges the gap between black-box machine learning models and interpretable regression by automatically discovering mathematical models from data. With tools like PySR, researchers can not only model complex phenomena but also understand them through concise, symbolic expressions.



---
In the next lecture, we will explore **Generalized Linear Models (GLM)**, which extend linear models to accommodate different types of response variables and link functions.
