Genetic Programming with Rademacher Complexity for Symbolic Regression
Genetic Programming (GP) for symbolic regression is often prone to overfitting the training data, which leads to poor performance on unseen data. A number of recent works have addressed this problem by investigating both the structural and the functional complexity of GP individuals during the evolutionary process. This work incorporates the Rademacher complexity into the fitness function of GP as a means of controlling the functional complexity of GP individuals. The experimental results confirm that the new GP method achieves a notable generalization gain over standard GP and Support Vector Regression (SVR) on most of the considered problems. Further investigations show that the symbolic regression models generated by the new GP method not only reduce the overfitting trend seen in standard GP but are also significantly smaller than their counterparts in standard GP.
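
To make the idea of a complexity-aware fitness concrete, the following is a minimal illustrative sketch and not the method evaluated in this work. It estimates the empirical Rademacher complexity of a finite set of candidate functions by Monte Carlo sampling of random signs, and adds it to the training error as a penalty. The function names `empirical_rademacher` and `penalised_fitness`, the additive form of the penalty, the `weight` parameter, and the use of a finite candidate set are all assumptions made for illustration. Outputs are assumed to be bounded, for example scaled to [-1, 1].

```python
import random


def empirical_rademacher(outputs, n_draws=200, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity.

    outputs[k][i] is the value of candidate function k on training sample i.
    The supremum over the function class is taken over this finite list
    (an assumption of this sketch).
    """
    rng = random.Random(seed)
    n = len(outputs[0])
    total = 0.0
    for _ in range(n_draws):
        # Draw one Rademacher vector: independent signs, each +1 or -1.
        sigma = [rng.choice((-1, 1)) for _ in range(n)]
        # Largest correlation between the random signs and any candidate.
        total += max(
            sum(s * v for s, v in zip(sigma, out)) / n for out in outputs
        )
    return total / n_draws


def penalised_fitness(training_error, complexity, weight=1.0):
    """Hypothetical fitness to minimise: training error plus a penalty."""
    return training_error + weight * complexity


# Usage: a richer candidate set can fit random signs at least as well.
small = [[0.0, 0.0, 0.0, 0.0]]
large = small + [[1.0, -1.0, 1.0, -1.0], [-1.0, 1.0, -1.0, 1.0]]
print(empirical_rademacher(small), empirical_rademacher(large))
```

Under this sketch, a function class that can correlate well with random sign patterns receives a larger penalty, which is the intuition behind using the Rademacher complexity to discourage overly flexible models.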