# Symbolic Regression

## What is symbolic regression?

> **Symbolic regression is a type of regression analysis that searches the space of mathematical expressions to find the model that best fits a given dataset, both in terms of accuracy and simplicity. No particular model is provided as a starting point to the algorithm.** Instead, initial expressions are formed by randomly combining mathematical building blocks such as mathematical operators, analytic functions, constants, and state variables. (Usually, a subset of these primitives will be specified by the person operating it, but that's not a requirement of the technique.) New equations are then formed by recombining previous equations, using genetic programming.

> By not requiring a specific model to be specified, symbolic regression isn't affected by human bias, or unknown gaps in domain knowledge. It attempts to uncover the intrinsic relationships of the dataset, by letting the patterns in the data itself reveal the appropriate models, rather than imposing a model structure that is deemed mathematically tractable from a human perspective. The fitness function that drives the evolution of the models takes into account not only error metrics (to ensure the models accurately predict the data), but also special complexity measures, thus ensuring that the resulting models reveal the data's underlying structure in a way that's understandable from a human perspective. This facilitates reasoning and favors the odds of getting insights about the data-generating system.

> **While conventional regression techniques seek to optimize the parameters for a pre-specified model structure, symbolic regression avoids imposing prior assumptions, and instead infers the model from the data. In other words, it attempts to discover both model structures and model parameters.**

>> [Symbolic regression. (2017, August 18). In Wikipedia, The Free Encyclopedia. Retrieved 09:24, October 11, 2017](https://en.wikipedia.org/w/index.php?title=Symbolic_regression&oldid=796051029)
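
The quoted description has three moving parts: random initial expressions built from primitives, recombination by genetic programming, and a fitness that weighs error against complexity. They can be sketched in a few dozen lines of Python. This is a minimal, hypothetical illustration, not the method of any particular tool. The primitive set (`add`, `sub`, `mul`, the variable `x`, constants 1 and 2), node count as the complexity measure, the `parsimony` weight, truncation selection, the `MAX_SIZE` cap and every function name are assumptions chosen for brevity.

```python
import math
import random

# Hypothetical primitive set: three binary operators, the variable x, two constants.
BINARY_OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}
TERMINALS = ["x", 1.0, 2.0]
MAX_SIZE = 40  # assumed cap on node count to keep trees from bloating

# An expression is "x", a float constant, or a tuple (op_name, left, right).
def random_tree(rng, depth):
    """Build a random expression by combining primitives."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(list(BINARY_OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, x):
    """Evaluate an expression tree at a single input x."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return BINARY_OPS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree

def size(tree):
    """Complexity measure (assumed): number of nodes in the tree."""
    if isinstance(tree, tuple):
        return 1 + size(tree[1]) + size(tree[2])
    return 1

def fitness(tree, xs, ys, parsimony=0.01):
    """Lower is better: mean squared error plus a penalty per node."""
    total = 0.0
    for x, y in zip(xs, ys):
        d = evaluate(tree, x) - y
        total += d * d
    mse = total / len(xs)
    return mse + parsimony * size(tree) if math.isfinite(mse) else math.inf

def paths(tree, path=()):
    """Yield the index path to every subtree."""
    yield path
    if isinstance(tree, tuple):
        yield from paths(tree[1], path + (1,))
        yield from paths(tree[2], path + (2,))

def get_at(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace_at(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace_at(tree[path[0]], path[1:], new)
    return tuple(parts)

def crossover(rng, a, b):
    """Subtree crossover: graft a random subtree of b into a random point of a."""
    donor = get_at(b, rng.choice(list(paths(b))))
    return replace_at(a, rng.choice(list(paths(a))), donor)

def mutate(rng, tree):
    """Replace a random subtree with a fresh random one."""
    return replace_at(tree, rng.choice(list(paths(tree))), random_tree(rng, 2))

def evolve(xs, ys, pop_size=100, generations=25, seed=0):
    rng = random.Random(seed)
    pop = [random_tree(rng, 3) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda t: fitness(t, xs, ys))
        parents = ranked[: pop_size // 5]  # truncation selection (an assumption)
        children = [ranked[0]]  # elitism: the best expression survives unchanged
        while len(children) < pop_size:
            a, b = rng.choice(parents), rng.choice(parents)
            child = crossover(rng, a, b)
            if rng.random() < 0.2:
                child = mutate(rng, child)
            children.append(child if size(child) <= MAX_SIZE else a)
        pop = children
    return min(pop, key=lambda t: fitness(t, xs, ys))

if __name__ == "__main__":
    xs = [i / 10 for i in range(-10, 11)]
    ys = [x * x + x for x in xs]  # synthetic data from a hidden target, x^2 + x
    best = evolve(xs, ys)
    print(best, fitness(best, xs, ys))
```

The parsimony term is what pushes the search toward readable expressions. With `parsimony=0.01`, an exact five-node fit such as `("add", ("mul", "x", "x"), "x")` scores 0.05, so a larger tree is only preferred if each extra node buys at least 0.01 of mean-squared-error reduction. A practical system would likely need a richer primitive set and stronger bloat control than this sketch.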

> Many people are familiar with the notion of regression. **Regression means finding the coefficients of a predefined function such that the function best fits some data. A problem with regression analysis is that, if the fit is not good, the experimenter has to keep trying different functions** by hand until a good model for the data is found. Not only is this laborious, but also the results of the analysis depend very much on the skills and inventiveness of the experimenter. Furthermore, even expert users tend to have strong mental biases when choosing functions to fit. For example, in many application areas there is a considerable tradition of using only linear or quadratic models, even when the data might be better fit by a more complex model.

> Symbolic regression attempts to go beyond this. **It consists of finding a function that fits the given data points without making any assumptions about the structure of that function.** Since GP [genetic programming] makes no such assumption, it is well suited to this sort of discovery task.
>> [Poli, R., Langdon, W. B., & McPhee, N. F. (2008). A field guide to genetic programming.](https://books.google.com/books?isbn=1409200736)
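
The conventional regression described in this quote fixes the form of the function first and then solves for its coefficients. It can be sketched as an ordinary least-squares fit over a hand-chosen list of basis functions. This is a minimal pure-Python illustration under assumptions: the function names, the normal-equations solver and the synthetic `sin(3x)` data are chosen for the example and are not taken from the quoted book. The structure (which basis functions to include) is supplied by the experimenter; only the coefficients come out of the fit.

```python
import math

def fit_least_squares(basis, xs, ys):
    """Find coefficients c minimizing sum (y - sum_j c[j] * basis[j](x))^2.

    Solves the normal equations (A^T A) c = A^T y by Gaussian elimination
    with partial pivoting; adequate for a handful of well-scaled basis functions.
    """
    n = len(basis)
    rows = [[f(x) for f in basis] for x in xs]  # design matrix A
    m = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, ys))]
        for i in range(n)
    ]  # augmented matrix [A^T A | A^T y]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    coeffs = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (m[i][n] - tail) / m[i][i]
    return coeffs

def mse(basis, coeffs, xs, ys):
    """Mean squared error of the fitted model on the data."""
    errs = [y - sum(c * f(x) for c, f in zip(coeffs, basis)) for x, y in zip(xs, ys)]
    return sum(e * e for e in errs) / len(errs)

# Candidate structures an experimenter might try by hand, one after another.
CANDIDATES = {
    "linear": [lambda x: 1.0, lambda x: x],
    "quadratic": [lambda x: 1.0, lambda x: x, lambda x: x * x],
    "cubic": [lambda x: 1.0, lambda x: x, lambda x: x * x, lambda x: x * x * x],
}

if __name__ == "__main__":
    xs = [i / 10 for i in range(-10, 11)]
    ys = [math.sin(3 * x) for x in xs]  # synthetic data; the fitter is not told its form
    for name, basis in CANDIDATES.items():
        coeffs = fit_least_squares(basis, xs, ys)
        print(name, [round(c, 3) for c in coeffs], mse(basis, coeffs, xs, ys))
```

Because the sample is symmetric about zero and `sin(3x)` is odd, the quadratic candidate fits no better than the linear one (its constant and `x * x` coefficients come out essentially zero), so the experimenter has to guess again; adding a cubic term does reduce the error. Each guess is the manual trial and error that symbolic regression tries to automate.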
