This repository implements genetic programming (GP) for symbolic regression. The code evolves a population of trees, each representing a candidate expression, to find the expression that best fits a given dataset. Our research paper, "Using Genetic Programming (GP) to Implement Symbolic Regression.pdf", explains the implementation and presents our results.
The genetic programming process involves the following steps:
1. Initializing the population: The code seeds the initial population of trees.
2. Fitness evaluation: The fitness of each tree in the population is calculated using a fitness function.
3. Tournament selection: Trees are selected by tournament, based on their fitness values, to create the next generation.
4. Reproduction, crossover, and mutation: The selected trees undergo reproduction, crossover, or mutation operations to create the next generation of trees.
5. New generation evaluation: The fitness of the new generation of trees is evaluated.
6. Iteration: Steps 3 to 5 are repeated until a termination condition is met (e.g., a maximum number of generations or a fitness threshold).
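The loop described above can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the repository's implementation: the tuple-based tree encoding, the operator set, the probabilities, the node cap, and every function name here (`random_tree`, `evaluate`, `fitness`, `tournament`, `vary`, `evolve`) are hypothetical choices made for the example.

```python
import operator
import random

# Assumed primitive set: three binary operators and a single variable "x".
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def random_tree(depth, rng):
    """Grow a random tree: a leaf ("x" or a float) or a tuple (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.5 else float(rng.randint(-2, 2))
    return (rng.choice(list(OPS)), random_tree(depth - 1, rng), random_tree(depth - 1, rng))

def evaluate(tree, x):
    """Compute the value of the expression tree at the point x."""
    if tree == "x":
        return x
    if isinstance(tree, float):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def fitness(tree, data):
    """Mean squared error over (x, y) pairs; lower is better."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in data) / len(data)

def tournament(scored, k, rng):
    """Draw k (fitness, tree) pairs at random and return the fittest tree."""
    return min(rng.sample(scored, k), key=lambda pair: pair[0])[1]

def subtrees(tree, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, tree
    if isinstance(tree, tuple):
        yield from subtrees(tree[1], path + (1,))
        yield from subtrees(tree[2], path + (2,))

def replace(tree, path, new):
    """Return a copy of tree with the node at path swapped for new."""
    if not path:
        return new
    children = list(tree)
    children[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(children)

def vary(parent, other, rng, max_nodes=40):
    """Apply reproduction, crossover, or mutation, chosen at random."""
    roll = rng.random()
    if roll < 0.1:
        return parent  # reproduction: the parent is copied unchanged
    path, _ = rng.choice(list(subtrees(parent)))
    if roll < 0.8:
        _, donor = rng.choice(list(subtrees(other)))  # crossover: graft from the other parent
    else:
        donor = random_tree(2, rng)  # mutation: graft a fresh random subtree
    child = replace(parent, path, donor)
    # Reject oversized offspring so trees (and their outputs) stay bounded.
    return child if sum(1 for _ in subtrees(child)) <= max_nodes else parent

def evolve(data, pop_size=60, generations=30, threshold=1e-9, seed=0):
    """Run the loop and return the best (fitness, tree) pair seen."""
    rng = random.Random(seed)
    population = [random_tree(3, rng) for _ in range(pop_size)]  # initialize
    scored = [(fitness(t, data), t) for t in population]         # evaluate
    best = min(scored, key=lambda pair: pair[0])
    for _ in range(generations):                                 # iterate
        if best[0] <= threshold:                                 # fitness threshold reached
            break
        population = [
            vary(tournament(scored, 3, rng), tournament(scored, 3, rng), rng)
            for _ in range(pop_size)
        ]                                                        # select and vary
        scored = [(fitness(t, data), t) for t in population]     # evaluate the new generation
        best = min(scored + [best], key=lambda pair: pair[0])
    return best

# Toy target y = x^2 + x sampled on a small grid.
data = [(x / 2, (x / 2) ** 2 + x / 2) for x in range(-4, 5)]
best_error, best_tree = evolve(data)
print(best_error, best_tree)
```

The node cap in `vary` is one simple way to keep trees from growing without bound; other bloat-control strategies would fit the same loop.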
 
The code requires the following dependencies:
- pandas: A library for data manipulation and analysis.

You can install the required dependencies using the following command:

    pip install pandas

To use the code, follow these steps:
- Prepare your dataset: The code assumes that the dataset is in CSV format. Provide the path to your dataset in the `bloatControl` function.
- Define the fitness function: Define a fitness function specific to your problem. It evaluates how well a tree model performs on the dataset.
- Run the evolutionary tree modeling process: Call the `bloatControl` function to start the evolution process. This function runs the evolution multiple times and selects the best tree model based on fitness.
- Interpret the results: The best tree model and its fitness value are printed at the end of the execution.
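As an illustration of the "define the fitness function" step, the sketch below scores a candidate model against a small CSV dataset using mean absolute error. It rests on several assumptions: the column names `x` and `y`, the helper names `load_dataset` and `mean_absolute_error`, and the choice of error measure are all hypothetical, and the candidate model is a plain Python callable rather than one of the repository's trees. The repository depends on pandas for data handling; to stay dependency-free, this sketch reads the CSV with the standard library's `csv` module instead.

```python
import csv
import io
import math

def load_dataset(csv_file):
    """Read (x, y) float pairs from a CSV whose header names columns 'x' and 'y' (assumed)."""
    reader = csv.DictReader(csv_file)
    return [(float(row["x"]), float(row["y"])) for row in reader]

def mean_absolute_error(model, data):
    """Fitness of a candidate model; lower is better.

    A model that raises an arithmetic error or returns a non-finite value
    on any row is scored as infinitely bad instead of crashing the run.
    """
    total = 0.0
    for x, y in data:
        try:
            prediction = model(x)
        except (ZeroDivisionError, OverflowError, ValueError):
            return math.inf
        if not math.isfinite(prediction):
            return math.inf
        total += abs(prediction - y)
    return total / len(data)

# In-memory stand-in for a CSV file on disk; the rows follow y = 2x + 1.
sample = io.StringIO("x,y\n0,1\n1,3\n2,5\n")
data = load_dataset(sample)

exact = mean_absolute_error(lambda x: 2 * x + 1, data)  # matches every row
off_by_one = mean_absolute_error(lambda x: 2 * x, data)  # misses each row by 1
broken = mean_absolute_error(lambda x: 1 / x, data)      # divides by zero at x = 0
print(exact, off_by_one, broken)
```

Treating invalid outputs as the worst possible fitness is one common design choice: it lets selection discard such trees naturally instead of requiring every operator to be protected in advance.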