### Introduction
This example demonstrates how to define and use custom primitives in the Evolutionary Forest regression model.

Evolutionary Forest is a machine learning algorithm that combines genetic programming with decision trees to create an ensemble model for regression tasks. It uses genetic programming to evolve a set of complex features, with each feature represented as a computer program. Each program contains primitives, which represent functions or operators that can be used to build complex features.
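To make the idea of "programs built from primitives" concrete, here is a minimal sketch that evaluates such a program as an expression tree. The primitive table (name mapped to a function and its arity), the nested-tuple tree encoding, and the `evaluate` helper are illustrative assumptions, not Evolutionary Forest's internal representation.

```python
import numpy as np

# Hypothetical primitive set: name -> (function, arity).
# Illustrates the concept only; not the library's internal data structure.
PRIMITIVES = {
    "Add": (np.add, 2),
    "Mul": (np.multiply, 2),
    "Sin": (np.sin, 1),
}


def evaluate(node, X):
    """Evaluate a program encoded as nested tuples, e.g. ("Add", "x0", ("Sin", "x1")).

    Leaves are either strings "x<i>" (column i of X) or numeric constants.
    """
    if isinstance(node, str):
        return X[:, int(node[1:])]
    if isinstance(node, (int, float)):
        return np.full(X.shape[0], float(node))
    name, *children = node
    func, arity = PRIMITIVES[name]
    if len(children) != arity:
        raise ValueError(f"{name} expects {arity} argument(s), got {len(children)}")
    # Evaluate children first, then apply the primitive element-wise
    return func(*(evaluate(child, X) for child in children))


X = np.array([[0.0, 1.0], [1.0, 2.0]])
# One evolved feature: x0 * sin(x1 + 1)
program = ("Mul", "x0", ("Sin", ("Add", "x1", 1.0)))
# Several programs together give the constructed feature matrix for a base learner
programs = [program, ("Add", "x0", "x1")]
features = np.column_stack([evaluate(p, X) for p in programs])
print(features)
```

In a genetic programming search, variation operators rearrange such trees, and only names present in the primitive table can appear as internal nodes, which is why adding a primitive widens the space of features that can be found.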

By default, Evolutionary Forest comes with a set of built-in primitives, such as addition, subtraction, multiplication, and division. However, users can define their own custom primitives to build more complex features.

The example first trains an instance of the Evolutionary Forest regression model with the default set of primitives. It then defines a custom primitive, `Sin`, which computes the sine of its input, and passes it to the `EvolutionaryForestRegressor` constructor through the `custom_primitive` argument, which accepts a dictionary of custom primitives.
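`Sin` is a natural candidate for this dataset. The Friedman #1 target produced by `make_friedman1` is $y = 10\sin(\pi x_1 x_2) + 20(x_3 - 0.5)^2 + 10x_4 + 5x_5$ plus optional Gaussian noise (zero by default), so a sine of a product of the first two inputs is exactly the kind of feature the search has to discover. The sketch below uses only scikit-learn and hand-written features, not Evolutionary Forest, to compare in-sample linear fits with and without that term.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Same data generation as the notebook (noise defaults to 0.0)
X, y = make_friedman1(n_samples=100, n_features=5, random_state=0)

# Linear model on the raw inputs only
r2_raw = r2_score(y, LinearRegression().fit(X, y).predict(X))

# Hand-built features matching the Friedman #1 formula (0-based columns)
F = np.column_stack([
    np.sin(np.pi * X[:, 0] * X[:, 1]),  # the term a Sin primitive can express
    (X[:, 2] - 0.5) ** 2,
    X[:, 3],
    X[:, 4],
])
r2_feat = r2_score(y, LinearRegression().fit(F, y).predict(F))

print(f"raw inputs: {r2_raw:.3f}, with sine feature: {r2_feat:.3f}")
```

Because these are in-sample fits on a noise-free target, the near-perfect second score only shows that the sine term is informative; it says nothing about how easily genetic programming finds it.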

The baseline run below uses the default set of primitives. In our run, it reached an $R^2$ score of 0.72 on the test set.
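The $R^2$ score (coefficient of determination) compares the model's squared error with that of always predicting the mean: $R^2 = 1 - \sum_i (y_i - \hat y_i)^2 / \sum_i (y_i - \bar y)^2$. A score of 0.72 therefore means the model removes about 72% of the variance of the 20 test targets around their mean. The short sketch below implements the formula directly, on toy values, and checks it against `sklearn.metrics.r2_score`.

```python
import numpy as np
from sklearn.metrics import r2_score


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
    return 1.0 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]   # mean 6, SS_tot = 20
y_pred = [2.5, 5.5, 7.0, 8.0]   # SS_res = 1.5
print(r2(y_true, y_pred), r2_score(y_true, y_pred))  # both ≈ 0.925
```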

```python
import sys
# Make the local evolutionary_forest package importable (assumes it sits one directory up)
sys.path.insert(0, '../')

import random
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

from evolutionary_forest.forest import EvolutionaryForestRegressor

random.seed(0)
np.random.seed(0)

# Generate dataset
X, y = make_friedman1(n_samples=100, n_features=5, random_state=0)
# Split dataset (80 training samples, 20 test samples)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Train Evolutionary Forest with the default primitive set
r = EvolutionaryForestRegressor(max_height=5, normalize=True, select='AutomaticLexicase',
                                gene_num=10, boost_size=100, n_gen=20, n_pop=200, cross_pb=1,
                                base_learner='Random-DT', verbose=True, n_process=1)
r.fit(x_train, y_train)
print(r2_score(y_test, r.predict(x_test)))
```

| gen | nevals | fitness avg | fitness max | fitness min | fitness std | size avg | size max | size min | size std |
|-----|--------|-------------|-------------|-------------|-------------|----------|----------|----------|----------|
| 0 | 200 | -0.277553 | 0.750226 | -0.634022 | 0.233268 | 43.98 | 60 | 30 | 5.83435 |
| 1 | 200 | -0.358072 | 0.425801 | -0.64822 | 0.185575 | 44.19 | 60 | 30 | 6.85448 |
| 2 | 200 | -0.396818 | 0.14375 | -0.680851 | 0.159129 | 44.27 | 66 | 30 | 7.944 |
| 3 | 200 | -0.407203 | 0.551556 | -0.700657 | 0.162068 | 44.07 | 60 | 30 | 7.15368 |
| 4 | 200 | -0.438888 | 0.0888584 | -0.674901 | 0.148646 | 42.11 | 60 | 30 | 6.10638 |
| 5 | 200 | -0.424578 | 0.312934 | -0.673571 | 0.154473 | 41.5 | 56 | 32 | 5.87112 |
| 6 | 200 | -0.442436 | 0.358423 | -0.673963 | 0.138987 | 41.76 | 58 | 30 | |

### Custom Defined Primitives

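Now, we use the `custom_primitive` argument of the `EvolutionaryForestRegressor` constructor to pass a dictionary that maps each custom primitive name to a tuple containing the corresponding function and the number of arguments it takes (its arity). Here the dictionary has a single entry, `'Sin': (np.sin, 1)`. We also set `basic_primitive='Add,Mul,Div'`, which selects the built-in primitives available to the search.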
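Any callable can presumably serve as a custom primitive as long as it accepts the declared number of arguments. The sketch below extends the same dictionary format with two hypothetical entries, `Cos` and `AnalyticQuotient` (the analytic quotient $a / \sqrt{1 + b^2}$, a smooth division-like operator used in genetic programming), and adds a hypothetical `check_primitives` helper that calls each function on random inputs before training. It assumes primitives are applied element-wise to NumPy arrays of feature values; that assumption and the extra entries are not taken from the library's documentation.

```python
import numpy as np


def analytic_quotient(a, b):
    # a / sqrt(1 + b^2): behaves like division but never divides by zero
    return a / np.sqrt(1.0 + b ** 2)


# Same shape as the notebook's custom_primitive argument: name -> (function, arity).
# 'Cos' and 'AnalyticQuotient' are illustrative additions.
custom_primitive = {
    "Sin": (np.sin, 1),
    "Cos": (np.cos, 1),
    "AnalyticQuotient": (analytic_quotient, 2),
}


def check_primitives(primitives, n_samples=8, seed=0):
    """Hypothetical pre-flight check: call each function with `arity` random
    arrays and verify it returns a finite array of the same length."""
    rng = np.random.default_rng(seed)
    for name, (func, arity) in primitives.items():
        args = [rng.normal(size=n_samples) for _ in range(arity)]
        out = np.asarray(func(*args))
        if out.shape != (n_samples,):
            raise ValueError(f"{name}: expected shape ({n_samples},), got {out.shape}")
        if not np.all(np.isfinite(out)):
            raise ValueError(f"{name}: produced non-finite values")
    return True


print(check_primitives(custom_primitive))
```

Catching a shape or overflow problem this way is cheaper than discovering it partway through a 20-generation run.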

Next, we train a new instance of the Evolutionary Forest regression model with the custom primitive and evaluate it on the test set using the $R^2$ score. In this run, the $R^2$ score rises slightly, from 0.72 to 0.73. This suggests that adding `Sin` can help on this dataset, but the evidence is limited: it comes from a single run with one random seed and a 20-sample test set, and the second configuration also changes the built-in primitive set (subtraction is not included), so the gain cannot be attributed to `Sin` alone.
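A difference of 0.01 in $R^2$ from one run on 20 test samples can easily fall within run-to-run variation. One way to check is to repeat each configuration over several seeds and compare the spread of scores. The hypothetical `repeated_r2` helper below does this, re-seeding `random` and `np.random` as the notebook does and regenerating the data and split for each seed. A small `DecisionTreeRegressor` stands in for Evolutionary Forest purely so the sketch runs in seconds.

```python
import random

import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def repeated_r2(make_model, seeds, n_samples=100):
    """Fit a fresh model per seed and return the test R^2 of each run."""
    scores = []
    for seed in seeds:
        # Global seeding, mirroring the notebook's random.seed / np.random.seed calls
        random.seed(seed)
        np.random.seed(seed)
        X, y = make_friedman1(n_samples=n_samples, n_features=5, random_state=seed)
        x_tr, x_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
        model = make_model(seed)
        model.fit(x_tr, y_tr)
        scores.append(r2_score(y_te, model.predict(x_te)))
    return np.array(scores)


# Cheap stand-in model; swap in the real configurations to compare them
scores = repeated_r2(lambda s: DecisionTreeRegressor(max_depth=4, random_state=s), seeds=range(5))
print(f"R^2 mean={scores.mean():.3f}, std={scores.std(ddof=1):.3f}")
```

To compare the two configurations in this example, pass factories that build `EvolutionaryForestRegressor` with the same arguments used in the notebook (with and without `custom_primitive`); expect each seed to take as long as one of the fits shown here. If the difference between the mean scores is small relative to their standard deviations, a single-run gap of 0.01 is not meaningful.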

```python
# Define custom primitives: each entry is name -> (function, number of arguments)
# basic_primitive selects the built-in primitives (here Add, Mul and Div; no Sub)
r = EvolutionaryForestRegressor(max_height=5, normalize=True, select='AutomaticLexicase',
                                gene_num=10, boost_size=100, n_gen=20, n_pop=200, cross_pb=1,
                                base_learner='Random-DT', verbose=True, n_process=1,
                                basic_primitive='Add,Mul,Div',
                                custom_primitive={
                                    'Sin': (np.sin, 1)  # unary sine primitive
                                })
r.fit(x_train, y_train)
print(r2_score(y_test, r.predict(x_test)))
```

| gen | nevals | fitness avg | fitness max | fitness min | fitness std | size avg | size max | size min | size std |
|-----|--------|-------------|-------------|-------------|-------------|----------|----------|----------|----------|
| 0 | 200 | -0.287499 | 0.726149 | -0.645282 | 0.236723 | 44.57 | 68 | 32 | 6.16564 |
| 1 | 200 | -0.358474 | 0.312388 | -0.681907 | 0.170254 | 44.57 | 66 | 32 | 5.7875 |
| 2 | 200 | -0.386404 | 0.510312 | -0.685204 | 0.157304 | 45.62 | 72 | 34 | 6.22379 |
| 3 | 200 | -0.403728 | 0.304373 | -0.663364 | 0.156506 | 45.17 | 66 | 34 | 5.76984 |
| 4 | 200 | -0.430501 | 0.20794 | -0.694122 | 0.135206 | 45.17 | 58 | 34 | 5.06469 |
| 5 | 200 | -0.4453 | 0.0619265 | -0.6594 | 0.12675 | 44.86 | 60 | 34 | 4.8146 |
| 6 | 200 | -0.436184 | 0.287351 | -0.688235 | 0.134184 | 44.94 | 60 | 32 | |

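In summary, this example shows how to extend the Evolutionary Forest regression model by defining custom primitives and passing them through the `custom_primitive` argument. Custom primitives let you inject domain knowledge, such as a periodic component expressed with `Sin`, into the feature search, which can help on regression problems whose structure the default primitives express poorly.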