# FUNCTION GENERATOR using Policy Gradient

Useful links:
Policy Gradient Explanation: http://karpathy.github.io/2016/05/31/rl/ <br>
Example of Policy Gradient: https://github.com/keon/policy-gradient
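
The core update described in the first link can be sketched on a toy problem before applying it to expression generation. The block below is a minimal REINFORCE illustration and is not taken from `PolicyGradientModel`: the three-armed bandit, the function names, and the hyperparameters are all assumptions chosen for the example.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

def reinforce_bandit(rewards, steps=2000, learning_rate=0.1, seed=0):
    """Train a softmax policy on a bandit with one fixed reward per action."""
    rng = np.random.default_rng(seed)
    logits = np.zeros(len(rewards))
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choice(len(rewards), p=probs)
        # Gradient of log pi(action) w.r.t. the logits: one_hot(action) - probs
        grad_log_prob = -probs
        grad_log_prob[action] += 1.0
        # Gradient ascent on reward-weighted log-probability
        logits += learning_rate * rewards[action] * grad_log_prob
    return softmax(logits)

# Hypothetical rewards: only the last action is "correct".
final_probs = reinforce_bandit(rewards=[-1.0, -1.0, 1.0])
print(final_probs)
```

Actions that earn a positive reward become more likely and actions that earn a negative reward become less likely, which is the same signal the notebook relies on when it rewards generated expressions.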

In [1]:
import numpy as np
from keras.models import Model
from keras.layers import Dense, GRU, Input
from keras.optimizers import Adam
from PolicyGradientModel import PolicyGradientModel
from RewardCalculator import RewardCalculator

Using TensorFlow backend.


In [2]:
ALLOWED_PARAMETERS = list('XY')
ALLOWED_SYMBOLS = ALLOWED_PARAMETERS + list('0123456789+-*/#')
NUM_SYMBOLS = len(ALLOWED_SYMBOLS)
MAX_LENGTH = 30 # Max length of the output expression
CORRECT_EXPRESSION = "3*X+2*Y"
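
To make the symbol table concrete, here is a small sketch of how an expression could be mapped to symbol indices and back. It assumes that `#` acts as an end-of-expression marker; the notebook does not state this, and the helper names `encode` and `decode` are hypothetical.

```python
ALLOWED_SYMBOLS = list('XY') + list('0123456789+-*/#')
END_SYMBOL = '#'  # assumption: '#' terminates an expression

# Map each symbol to its position in ALLOWED_SYMBOLS
SYMBOL_TO_INDEX = {symbol: index for index, symbol in enumerate(ALLOWED_SYMBOLS)}

def encode(expression):
    """Turn an expression string into a list of symbol indices."""
    return [SYMBOL_TO_INDEX[symbol] for symbol in expression]

def decode(indices):
    """Turn symbol indices back into a string, stopping at END_SYMBOL."""
    symbols = []
    for index in indices:
        symbol = ALLOWED_SYMBOLS[index]
        if symbol == END_SYMBOL:
            break
        symbols.append(symbol)
    return ''.join(symbols)

print(encode("3*X"))                 # [5, 14, 0]
print(decode(encode("3*X+2*Y#")))    # 3*X+2*Y
```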

## DEFINE MODEL

In [3]:
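def getModel():
    """Build a small GRU network that scores every allowed symbol."""
    # The input is a constant dummy tensor; the output should not depend on it.
    input1 = Input(shape=(1, 1))
    # TODO: Add a noise layer so that the output varies between calls.
    x = GRU(32)(input1)
    # One output unit per symbol. Softmax is the more common pairing with
    # categorical_crossentropy; sigmoid is kept to match the recorded run below.
    out = Dense(NUM_SYMBOLS, activation='sigmoid')(x)
    model = Model(inputs=input1, outputs=out)
    model.compile(optimizer=Adam(),
                  loss='categorical_crossentropy')
    return model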
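
The model returns one score per symbol, and those scores still have to be turned into an expression. How `PolicyGradientModel` does this is not shown here, so the following is only a sketch under stated assumptions: the scores are normalized into a probability distribution, every position is drawn from that same distribution, and `#` ends the expression early. The function name `sample_expression` is hypothetical.

```python
import numpy as np

ALLOWED_SYMBOLS = list('XY') + list('0123456789+-*/#')

def sample_expression(symbol_scores, max_length=30, rng=None):
    """Sample symbols until '#' is drawn or max_length is reached."""
    rng = rng or np.random.default_rng()
    scores = np.asarray(symbol_scores, dtype=float)
    # Sigmoid outputs do not sum to 1, so normalize before sampling.
    probs = scores / scores.sum()
    symbols = []
    for _ in range(max_length):
        index = rng.choice(len(ALLOWED_SYMBOLS), p=probs)
        symbol = ALLOWED_SYMBOLS[index]
        if symbol == '#':  # assumption: '#' marks the end of the expression
            break
        symbols.append(symbol)
    return ''.join(symbols)

# Uniform scores stand in for an untrained model's output.
uniform_scores = np.full(len(ALLOWED_SYMBOLS), 0.5)
print(sample_expression(uniform_scores, rng=np.random.default_rng(0)))
```

With near-uniform scores the samples are random symbol strings of varying length, capped at `max_length`.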

In [4]:
rewardCalculator = RewardCalculator(correctExpression=CORRECT_EXPRESSION,
                                    parameters=ALLOWED_PARAMETERS,
                                    usingFunctionDifferenceReward=False,
                                    usingCompilableReward=True,
                                    usingLengthReward=True,
                                    usingFile=False)
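
The training log further down reports rewards such as 0.92 for `6/90`, 0.98 for `3`, and -1 for `X60Y`. That is consistent with a reward of 1 - 0.02 × length for expressions that compile and use only the allowed parameters, and -1 otherwise. The source of `RewardCalculator` is not shown, so the sketch below is a guess that reproduces those logged values, not the actual implementation; the function name and the `length_penalty` parameter are hypothetical.

```python
def sketch_reward(expression, parameters=('X', 'Y'), length_penalty=0.02):
    """Guess at the combined compilable + length reward (not the real code)."""
    if not expression:
        return -1.0
    try:
        # Only compile the expression; it is never evaluated.
        code = compile(expression, '<expression>', 'eval')
    except (SyntaxError, ValueError):
        return -1.0
    # Reject names such as 'X60Y' that compile but are not allowed parameters.
    if not set(code.co_names) <= set(parameters):
        return -1.0
    # Shorter valid expressions score higher.
    return 1.0 - length_penalty * len(expression)

for example in ['6/90', '3', '9+4Y', 'X60Y', '3*X+2*Y']:
    print(example, round(sketch_reward(example), 2))
```

With both reward terms enabled and `usingFunctionDifferenceReward=False`, a rule like this favors short compilable strings and gives no credit for matching `CORRECT_EXPRESSION`.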

In [5]:
model = getModel()
pgModel = PolicyGradientModel(model=model,
                              allowedSymbol=ALLOWED_SYMBOLS,
                              numSymbol=NUM_SYMBOLS,
                              maxLength=MAX_LENGTH,
                              rewardCalculator=rewardCalculator,
                              learningRate=0.0001,
                              fileName="Model1.hdf5")

## TRAINING

In [6]:
pgModel.train(input=np.ones((1,1,1)))

Epoch: 0	Loss: 23.8920276642	Example Output: 6/90	Example Reward:  0.92
Saving Weight
Epoch: 1	Loss: 23.9066110611	Example Output: 	Example Reward:  -1.0
Epoch: 2	Loss: 23.9249277115	Example Output: 054719Y4*1X1566	Example Reward:  -1
Epoch: 3	Loss: 23.9386295319	Example Output: 5Y90	Example Reward:  -1
Epoch: 4	Loss: 23.9565574646	Example Output: -13+36+35+7Y/+0091/171	Example Reward:  -1
Epoch: 5	Loss: 23.9669418335	Example Output: 7X+0609*16+3X9X79+80*1	Example Reward:  -1
Epoch: 6	Loss: 23.9653806686	Example Output: 496-504+//16/-88914591/4-1**6+	Example Reward:  -1
Epoch: 7	Loss: 23.9670284271	Example Output: 74X-22/+1/18*+91441687570*X39Y	Example Reward:  -1
Epoch: 8	Loss: 23.9701984406	Example Output: 9+4Y	Example Reward:  -1
Epoch: 9	Loss: 23.9725664139	Example Output: 5-	Example Reward:  -1
Epoch: 10	Loss: 23.9733650208	Example Output: X60Y	Example Reward:  -1
Saving Weight
Epoch: 11	Loss: 23.975005722	Example Output: 5/99133	Example Reward:  0.86
Epoch: 12	Loss: 23.9751838684

Epoch: 96	Loss: 24.5377794266	Example Output: 54*Y572520+Y--8828085+403772+-	Example Reward:  -1
Epoch: 97	Loss: 24.5438961029	Example Output: 30845Y-9+1X629//1-15+2*8-17971	Example Reward:  -1
Epoch: 98	Loss: 24.5364681244	Example Output: 94/943+70+961Y1+-340-6403Y/294	Example Reward:  -1
Epoch: 99	Loss: 24.5192964554	Example Output: 9X20	Example Reward:  -1
Epoch: 100	Loss: 24.5197889328	Example Output: 18/+8X6-8945628X5Y7221/1	Example Reward:  -1
Saving Weight
Epoch: 101	Loss: 24.5256254196	Example Output: X89/39Y1X9046Y5+692*7+-4520790	Example Reward:  -1
Epoch: 102	Loss: 24.5189512253	Example Output: 3	Example Reward:  0.98
Epoch: 103	Loss: 24.5030447006	Example Output: 50726X/062421024*32473X17/2929	Example Reward:  -1
Epoch: 104	Loss: 24.4647474289	Example Output: 820799/+306-4X950+37+331/95147	Example Reward:  -1
Epoch: 105	Loss: 24.4399023056	Example Output: 425417254113+15+7+4X42-93YYX06	Example Reward:  -1
Epoch: 106	Loss: 24.4329978943	Example Output: 23++*05Y-58+9-7+7782/5

Epoch: 184	Loss: 24.2885471344	Example Output: 94635X96-++3894/965371259/2971	Example Reward:  -1
Epoch: 185	Loss: 24.2953014374	Example Output: 746/7+8+41+53007Y7202851Y32*96	Example Reward:  -1
Epoch: 186	Loss: 24.2979301453	Example Output: 87*13X*91/*2+359+366-71/891414	Example Reward:  -1
Epoch: 187	Loss: 24.2930192947	Example Output: 19439/*48413822243787139-316+8	Example Reward:  -1
Epoch: 188	Loss: 24.2944231033	Example Output: 85844/1*77--*10857259068+822+4	Example Reward:  -1
Epoch: 189	Loss: 24.3034187317	Example Output: /141072311*-29113680-228211499	Example Reward:  -1
Epoch: 190	Loss: 24.2951791763	Example Output: 6940286	Example Reward:  0.86
Saving Weight
Epoch: 191	Loss: 24.2975269318	Example Output: 95*71+210X3	Example Reward:  -1
Epoch: 192	Loss: 24.3186954498	Example Output: 7+1-778352881194+3747738375695	Example Reward:  0.4
Epoch: 193	Loss: 24.3150762558	Example Output: 679109Y88X3+1618+45X167+0//515	Example Reward:  -1
Epoch: 194	Loss: 24.2842258453	Example Output

KeyboardInterrupt: 