# Learning constants in a symbolic regression task

One of the long-standing "skeletons in the closet" of genetic programming (GP) techniques is the constant-finding problem. Here we provide a novel solution in which such constants are learned during evolution by a second-order back-propagation algorithm.

Let's first import dcgpy and pyaudi and set things up to use dCGP on gduals defined over vectorized floats.

In [151]:
from dcgpy import expression_gdual_vdouble as expression
from dcgpy import kernel_set_gdual_vdouble as kernel_set
from pyaudi import gdual_vdouble as gdual
import pyaudi
from matplotlib import pyplot as plt
import numpy as np
from random import randint
%matplotlib inline

# 1 - We define the set of kernel functions we will be using

In [152]:
kernels = kernel_set(["sum", "mul", "diff","pdiv"])() # note the call operator (returns the list of kernels)
dCGP = expression(inputs=2, outputs=1, rows=1, cols=15, levels_back=16, arity=2, kernels=kernels, seed = 1)
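As a rough picture of what such an expression encodes, the sketch below evaluates a hand-written list of nodes over the same four kernel names. It is a plain-Python illustration, not dcgpy's internal representation: the `KERNELS` table, the `evaluate` helper and the node layout are all hypothetical, and `pdiv` is assumed here to fall back to 1 when the denominator is zero, which may differ from the library's own kernel.

```python
# Hypothetical kernel table mirroring the kernel names used above.
# pdiv is ASSUMED to return 1 when the denominator is zero.
KERNELS = {
    "sum": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "diff": lambda a, b: a - b,
    "pdiv": lambda a, b: a / b if b != 0 else 1.0,
}

def evaluate(nodes, inputs, output_index):
    """Evaluate a feed-forward graph of binary nodes.

    Each node is (kernel_name, i, j), where i and j index either an input
    or a previously computed node. Values are stored in one flat list.
    """
    values = list(inputs)
    for name, i, j in nodes:
        values.append(KERNELS[name](values[i], values[j]))
    return values[output_index]

# Inputs are [x, c] (indices 0 and 1); the nodes encode x**5 - c*x**3 + x.
nodes = [
    ("mul", 0, 0),   # index 2: x**2
    ("mul", 2, 0),   # index 3: x**3
    ("mul", 3, 2),   # index 4: x**5
    ("mul", 1, 3),   # index 5: c*x**3
    ("diff", 4, 5),  # index 6: x**5 - c*x**3
    ("sum", 6, 0),   # index 7: x**5 - c*x**3 + x
]

value = evaluate(nodes, [2.0, 3.0], 7)
print(value)  # 10.0
```

The constant `c` enters the graph as an ordinary second input, which is what allows it to be differentiated and updated alongside the evolution of the graph itself.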

# 2 - We define the target functions we will use
As target functions, we will try to learn some variations of the Koza quintic polynomial, one of the basic tests devised for symbolic regression:

P1: $x^5 - \pi x^3 + x$

P2: $x^5 - \pi x^3 + \frac{2\pi}x$

P3: $\frac{e x^5 + x^3}{x + 1}$

where $\pi$ and $e$ are present in the expressions and thus have to be learned.

In [153]:
def targetP1(x):
    return x**5 - np.pi*x**3 + x
def targetP2(x):
    return x**5 - np.pi*x**3 + 2*np.pi / x
def targetP3(x):
    return (np.e*x**5 + x**3)/(x + 1)


In [154]:
# This is the quadratic error of the expression when the constant value is fixed to cin
# The target values are contained in yt
def err(dCGP, yt, cin):
    c = gdual([cin], "c", 2)
    y = dCGP([x,c])[0]
    return (y-gdual(yt))**2 / 10

# This is the quadratic error of the expression when the constant is learned using a one-step, second-order (Newton) method.
# The target values are contained in yt; x is read from the enclosing (global) scope.
# Returns the error and the value of the learned constant.
def err2(dCGP, yt,cin):
    c = gdual([cin], "c", 2)
    y = dCGP([x,c])[0]
    # Evaluate the error once and reuse it for both derivatives
    e = err(dCGP,yt,cin)
    dc =  sum(e.get_derivative({"dc":1}))
    dc2 = sum(e.get_derivative({"dc":2}))
    if dc2 != 0:
        learned_constant = c - dc/dc2
        y = dCGP([x, learned_constant])[0]
    else:
        learned_constant = c
    return (y-gdual(yt))**2 / 10, learned_constant.constant_cf[0]
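To see what a single second-order step achieves, consider the case where the expression already has the structure of P1 and only the constant is wrong. The error is then quadratic in $c$, so one Newton update $c \leftarrow c - E'(c)/E''(c)$ lands on the minimum. The sketch below is an illustration only: it uses hand-written derivatives with NumPy rather than gduals, and the function name `newton_step_constant` is hypothetical.

```python
import numpy as np

def newton_step_constant(x, yt, c0):
    """One Newton step on E(c) = sum((x**5 - c*x**3 + x - yt)**2) / 10."""
    residual = x**5 - c0 * x**3 + x - yt
    dE = np.sum(2.0 * residual * (-x**3)) / 10.0   # first derivative dE/dc at c0
    d2E = np.sum(2.0 * x**6) / 10.0                # second derivative, independent of c
    # Same guard as in err2: skip the update if the curvature vanishes
    return c0 - dE / d2E if d2E != 0 else c0

x_pts = np.linspace(1, 3, 10)
yt_pts = x_pts**5 - np.pi * x_pts**3 + x_pts       # P1 sampled on the grid

c1 = newton_step_constant(x_pts, yt_pts, 1.0)
print(c1, abs(c1 - np.pi))
```

When the constant enters the expression nonlinearly, the error is no longer quadratic in $c$ and a single step only approximates the minimizer.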

# 3 - Problem P1

In [129]:
x = np.linspace(1,3,10)
yt = targetP1(x)
x = gdual(x)

In [132]:
offsprings = 4
max_gen=10000
constant = 1.
chromosome = [1] * offsprings
fitness = [1] *offsprings
cout_off = [1]*offsprings
best_chromosome = dCGP.get()
fit, cout = err2(dCGP,yt,constant)
best_fitness = sum(fit.constant_cf)
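for g in range(max_gen):
    # Generate the offspring: restart from the current best and mutate active genes
    for i in range(offsprings):
        dCGP.set(best_chromosome)
        dCGP.mutate_active(i)
        # Fitness is the summed quadratic error after the constant has been learned
        fit, cout = err2(dCGP,yt,constant)
        fitness[i] = sum(fit.constant_cf )
        chromosome[i] = dCGP.get()
        cout_off[i] = cout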
    for i in range(offsprings):
        if fitness[i] <= best_fitness:
            if (fitness[i] != best_fitness):
                best_chromosome = chromosome[i]
                best_fitness = fitness[i]
                dCGP.set(best_chromosome)
                print("New best found: gen: ", g, " value: ", fitness[i],  dCGP.simplify(["x","c"]), cout_off[i])

    if best_fitness < 1e-7:
        break

New best found: gen:  0  value:  4085.678641291084 [c**2*x] -0.09905548743227777
New best found: gen:  1  value:  3937.7316332690107 [c**5*x**3 - c**4*x**2] 0.7147855554869127
New best found: gen:  1  value:  67.33252622575444 [c*x**4 - 2*c*x**2] 2.346929735085598
New best found: gen:  18  value:  65.80173002473113 [2*c*x**5] 0.3085477207392846
New best found: gen:  32  value:  53.30829799719378 [2*c*x**2 + x**5] -3.868800493696562
New best found: gen:  38  value:  10.206419789384034 [-c*x**2 + x**5 - x**4] 0.8447992392876195
New best found: gen:  44  value:  0.6248353291985256 [-c*x**3 + x**5] 2.9965137674496463
New best found: gen:  72  value:  0.6248353291985245 [-c*x**3 + x**5] 2.9965137674496463
New best found: gen:  3696  value:  2.9635362575055256e-29 [-c*x**3 + x**5 + x] 3.1415926535897927


# 4 - Problem P2


In [159]:
x = np.linspace(0.1,3,10)
yt = targetP2(x)
x = gdual(x)

In [162]:
offsprings = 4
max_gen=10000
constant = 1.
chromosome = [1] * offsprings
fitness = [1] *offsprings
cout_off = [1]*offsprings
best_chromosome = dCGP.get()
fit, cout = err2(dCGP,yt,constant)
best_fitness = sum(fit.constant_cf)
for g in range(max_gen):
    for i in range(offsprings):
        dCGP.set(best_chromosome)
        dCGP.mutate_active(i)
        fit, cout = err2(dCGP,yt,constant)
        fitness[i] = sum(fit.constant_cf )
        chromosome[i] = dCGP.get()
        cout_off[i] = cout
    for i in range(offsprings):
        if fitness[i] <= best_fitness:
            if (fitness[i] != best_fitness):
                best_chromosome = chromosome[i]
                best_fitness = fitness[i]
                dCGP.set(best_chromosome)
                print("New best found: gen: ", g, " value: ", fitness[i],  dCGP.simplify(["x","c"]), cout_off[i])

    if best_fitness < 1e-7:
        break

New best found: gen:  0  value:  3209.5894376102124 [c**2*x**4/(c - x) - 2*c**2*x**2/(c - x) + c**2/(c - x) + c*x**4 - 2*c*x**4/(c - x) + 2*c*x**3/(c - x) + 2*c*x**2/(c - x) - 2*c*x/(c - x) - x**4 + x**4/(c - x) - 2*x**3/(c - x) + x**2/(c - x)] 1.5449450361025139
New best found: gen:  2  value:  1075.6984537636474 [-c**2*x**2 + c**2 + c*x**4 + c*x**2 - c*x - x**4] 2.398806307587214
New best found: gen:  3  value:  380.6023634263559 [c**2*x + c**2 - 2*c*x**3 - c*x + x**5] 1.5990573327001094
New best found: gen:  6  value:  376.32362054974664 [c**2*x + c**2 - 2*c*x**3 + c*x + x**5] 1.810095032408087
New best found: gen:  9  value:  50.18451575258108 [c**2*x + c**2/x - 2*c*x**3 + c*x + x**5] 2.231656721128762
New best found: gen:  71  value:  34.26078387890098 [c**2 + c**2/x**2 - c*x**3 + 2*c + x**5 - x**4 + x**2] 0.834974028417865
New best found: gen:  80  value:  34.26078387890097 [c**2 + c**2/x**2 - c*x**3 + 2*c + x**5 - x**4 + x**2] 0.834974028417865
New best found: gen:  245  value: 

# 5 - Problem P3


In [168]:
x = np.linspace(-0.9,1.1,10)
yt = targetP3(x)
x = gdual(x)

In [169]:
offsprings = 4
max_gen=20000
constant = 1.
chromosome = [1] * offsprings
fitness = [1] *offsprings
cout_off = [1]*offsprings
best_chromosome = dCGP.get()
fit, cout = err2(dCGP,yt,constant)
best_fitness = sum(fit.constant_cf)
for g in range(max_gen):
    for i in range(offsprings):
        dCGP.set(best_chromosome)
        dCGP.mutate_active(i)
        fit, cout = err2(dCGP,yt,constant)
        fitness[i] = sum(fit.constant_cf )
        chromosome[i] = dCGP.get()
        cout_off[i] = cout
    for i in range(offsprings):
        if fitness[i] <= best_fitness:
            if (fitness[i] != best_fitness):
                best_chromosome = chromosome[i]
                best_fitness = fitness[i]
                dCGP.set(best_chromosome)
                print("New best found: gen: ", g, " value: ", fitness[i],  dCGP.simplify(["x","c"]), cout_off[i])

    if best_fitness < 1e-7:
        break

New best found: gen:  0  value:  73.72081061104146 [x**8 + 2*x**6 + 2*x**4 + x + x**4/c] 1.4773885669889213
New best found: gen:  0  value:  66.66824160764467 [c/(x**2 + x) + x**6 - 2*x**5 + x**4] 0.035581376437456624
New best found: gen:  1  value:  63.22109986023807 [x**6 - 2*x**5 + x**4 + x**3] 1.0
New best found: gen:  1  value:  55.838240314736574 [0] 1.0
New best found: gen:  2  value:  54.546686605835184 [c**2*x**4 - 2*c**2*x**3 + c**2*x**2 + 2*c*x**3 - 3*c*x**2 + c*x + x**2 - x] 0.8308912587171273
New best found: gen:  4  value:  53.712638748254754 [c**2*x**3 + c**2*x**2 - c*x**3 + c*x**2 + c*x - x**2] 0.6673396524674569
New best found: gen:  5  value:  50.87787218996387 [x**4 + x**3 - x**2] 1.0
New best found: gen:  8  value:  50.24116350150763 [x**2 + x - 1] 1.0
New best found: gen:  11  value:  35.20774238042942 [2*c*x - c - 2*x**3 + x**2] 3.8437963745175816
New best found: gen:  14  value:  34.107940186050364 [0] -9.245398900435778e+16
New best found: gen:  14  value:  19.0

In [212]:
dCGP.set(best_chromosome)
print(dCGP.simplify(["x","c"]))

[-c*x**2/(c**2 + c*x - c*x/(c + x) - x**2/(c + x)) + 3*x**3/(c**3 + 2*c**2*x - c**2*x/(c + x) + c*x**2 - 2*c*x**2/(c + x) - x**3/(c + x)) + x**2/(c**4 + 2*c**3*x - 2*c**3*x/(c + x) + c**2*x**2 + c**2*x**2/(c**2 + 2*c*x + x**2) - 4*c**2*x**2/(c + x) + 2*c*x**3/(c**2 + 2*c*x + x**2) - 2*c*x**3/(c + x) + x**4/(c**2 + 2*c*x + x**2))]


In [213]:
costante = 0.9930759331251494
out = err2(dCGP, yt, costante)[0]
e = sum(out.constant_cf)
a = sum(out.get_derivative({"dc":1}))
b = sum(out.get_derivative({"dc":2}))
print("error: ", e)
print("dc: ", a)
print("dc2: ", b)

error:  0.12925393681619088
dc:  -0.6602711548105689
dc2:  14849.838034660796


In [214]:
-a/b

4.4463188976838624e-05
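The last value can be read as the size of a further Newton update, $-E'/E''$, computed from the derivatives printed above. The sketch below simply redoes that arithmetic with the printed numbers and compares the step with the magnitude of the starting constant; the variable names are illustrative and not part of the notebook.

```python
# Values copied from the output printed above
starting_constant = 0.9930759331251494
dc = -0.6602711548105689
dc2 = 14849.838034660796
error = 0.12925393681619088

# Size of one more Newton update on the constant
step = -dc / dc2

# How large the update is relative to the starting constant
relative_change = abs(step) / abs(starting_constant)

print(step, relative_change, error)
```

A step this small next to a non-negligible error suggests that, for this chromosome, further refinement of the constant alone would change the fit very little.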