# FUNCTION GENERATOR using Policy Gradient

Useful links:
Policy Gradient Explanation: http://karpathy.github.io/2016/05/31/rl/ <br>
Example of Policy Gradient: https://github.com/keon/policy-gradient

In [1]:
import numpy as np
from keras.models import Model
from keras.layers import Dense, GRU, Input
from keras.optimizers import Adam
from PolicyGradientModel import PolicyGradientModel
from RewardCalculator import RewardCalculator

Using TensorFlow backend.


In [2]:
ALLOWED_PARAMETERS = list('XY')
ALLOWED_SYMBOLS = ALLOWED_PARAMETERS + list('0123456789+-*/#')
NUM_SYMBOLS = len(ALLOWED_SYMBOLS)
MAX_LENGTH = 30 # Max length of the output expression
CORRECT_EXPRESSION = "3*X+2*Y"
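
The constants above define the alphabet the generator draws from. As a rough illustration of how a sequence of symbol indices could be turned back into an expression string, here is a minimal sketch. It assumes that `#` acts as an end-of-expression marker, which the notebook does not state; `END_SYMBOL`, `one_hot`, and `decode_expression` are hypothetical helpers and not part of `PolicyGradientModel`.

```python
import numpy as np

ALLOWED_PARAMETERS = list('XY')
ALLOWED_SYMBOLS = ALLOWED_PARAMETERS + list('0123456789+-*/#')
NUM_SYMBOLS = len(ALLOWED_SYMBOLS)
MAX_LENGTH = 30

# Assumption: '#' marks the end of an expression (not confirmed by the notebook).
END_SYMBOL = '#'


def one_hot(index):
    """Return a one-hot vector of length NUM_SYMBOLS for a symbol index."""
    vector = np.zeros(NUM_SYMBOLS)
    vector[index] = 1.0
    return vector


def decode_expression(indices):
    """Map symbol indices to a string, stopping at END_SYMBOL or MAX_LENGTH."""
    symbols = []
    for index in indices[:MAX_LENGTH]:
        symbol = ALLOWED_SYMBOLS[index]
        if symbol == END_SYMBOL:
            break
        symbols.append(symbol)
    return ''.join(symbols)


# Usage: encode a string as indices, then decode it again.
example_indices = [ALLOWED_SYMBOLS.index(c) for c in "3*X+2*Y#99"]
print(decode_expression(example_indices))
```

Everything after the first `#` is discarded, so the generator can emit expressions shorter than `MAX_LENGTH` under this assumption.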

## DEFINE MODEL

In [3]:
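def getModel():
    # The input is a constant dummy tensor (see the training cell); the network
    # is meant to ignore it and learn an input-independent symbol distribution.
    input1 = Input(shape=(1, 1))
    # TODO: Add a noise layer to make the output vary
    x = GRU(32)(input1)
    # One score per allowed symbol. Note: 'sigmoid' scores are independent and
    # do not sum to 1; 'softmax' is the usual choice for a categorical policy.
    out = Dense(NUM_SYMBOLS, activation='sigmoid')(x)
    model = Model(inputs=input1, outputs=out)
    model.compile(optimizer=Adam(),
                  loss='categorical_crossentropy')
    return model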
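
For context on what a policy-gradient update does with the model's outputs, the following is a minimal NumPy sketch of the REINFORCE-style computation described in the Karpathy post linked above. It assumes a softmax policy and a per-step reward list; `discount_rewards` and `policy_gradient_targets` are hypothetical names, and the actual `PolicyGradientModel` may compute its update differently.

```python
import numpy as np


def discount_rewards(rewards, gamma=0.99):
    """Return the discounted return G_t = r_t + gamma * G_{t+1} for each step."""
    discounted = np.zeros(len(rewards), dtype=float)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        discounted[t] = running
    return discounted


def policy_gradient_targets(probabilities, actions, rewards, gamma=0.99):
    """Gradient of sum_t G_t * log pi(a_t) with respect to the softmax logits.

    Assumes `probabilities` has shape (steps, num_symbols) and comes from a
    softmax layer, so d log pi(a) / d logits = one_hot(a) - probabilities.
    """
    probabilities = np.asarray(probabilities, dtype=float)
    returns = discount_rewards(rewards, gamma)
    one_hot_actions = np.eye(probabilities.shape[1])[actions]
    return (one_hot_actions - probabilities) * returns[:, None]


# Usage: two symbols, one sampled step with reward 2.0.
print(policy_gradient_targets([[0.5, 0.5]], [0], [2.0]))
```

A positive return pushes probability toward the sampled symbol, and a negative return pushes it away.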

In [4]:
rewardCalculator = RewardCalculator(correctExpression=CORRECT_EXPRESSION,
                                    parameters=ALLOWED_PARAMETERS,
                                    usingFunctionDifferenceReward=False,
                                    usingCompilableReward=True,
                                    usingLengthReward=True,
                                    usingFoundSymbolReward=True,
                                    usingFile=False)
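
The flags above select which reward terms are active. The internals of `RewardCalculator` are not shown in this notebook, so the sketch below only illustrates what a "compilable" check and a "function difference" measure could look like. `is_compilable` and `mean_abs_difference` are hypothetical helpers, and the reward values in the training log are not derived from them.

```python
def is_compilable(expression):
    """Return True if the string parses as a Python expression."""
    if not expression:
        return False
    try:
        compile(expression, "<expression>", "eval")
    except (SyntaxError, ValueError):
        return False
    return True


def mean_abs_difference(expression, correct_expression, points):
    """Mean |f(X, Y) - g(X, Y)| over sample points, or None if evaluation fails."""
    total = 0.0
    for x, y in points:
        # Only X and Y are visible to the evaluated expressions.
        scope = {"__builtins__": {}, "X": x, "Y": y}
        try:
            total += abs(eval(expression, scope) - eval(correct_expression, scope))
        except (SyntaxError, ZeroDivisionError, TypeError,
                ValueError, OverflowError):
            return None
    return total / len(points)


# Usage with the target expression from this notebook.
print(is_compilable("3*X+2*Y"), is_compilable("/*4**"))
print(mean_abs_difference("X+Y", "3*X+2*Y", [(1, 1)]))
```

Using `eval` is tolerable here only because the generated strings are restricted to the symbols in `ALLOWED_SYMBOLS`; it should not be applied to untrusted input.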

In [5]:
model = getModel()
pgModel = PolicyGradientModel(model=model,
                              allowedSymbol=ALLOWED_SYMBOLS,
                              numSymbol=NUM_SYMBOLS,
                              maxLength=MAX_LENGTH,
                              rewardCalculator=rewardCalculator,
                              learningRate=0.0001,
                              fileName="Model1.hdf5")

## TRAINING

In [6]:
# Constant dummy input of shape (batch, timesteps, features) = (1, 1, 1)
pgModel.train(input=np.ones((1, 1, 1)))

Epoch: 0	Loss: 23.7324594498	Example Output: 52Y1	Example Reward:  -1
Saving Weight
Epoch: 1	Loss: 23.7449872971	Example Output: 786/4X	Example Reward:  -1.0
Epoch: 2	Loss: 23.7600557327	Example Output: +7/X7*857069-/+79-18+	Example Reward:  -0.8799999999999999
Epoch: 3	Loss: 23.7723045349	Example Output: /*4**	Example Reward:  -0.8600000000000001
Epoch: 4	Loss: 23.7927639008	Example Output: 0Y+3Y	Example Reward:  -0.9200000000000002
Epoch: 5	Loss: 23.8141292572	Example Output: 51XX086391X2095YX47	Example Reward:  -1
Epoch: 6	Loss: 23.8192802429	Example Output: *	Example Reward:  -0.96
Epoch: 7	Loss: 23.8280546188	Example Output: 94495Y5646/	Example Reward:  -1
Epoch: 8	Loss: 23.8391956329	Example Output: Y715	Example Reward:  -1
Epoch: 9	Loss: 23.8451564789	Example Output: 6113*6*YX4	Example Reward:  -0.96
Epoch: 10	Loss: 23.8513959885	Example Output: 	Example Reward:  -1.0
Saving Weight
Epoch: 11	Loss: 23.8640470505	Example Output: 	Example Reward:  -1.0
Epoch: 12	Loss: 23.8886007309

KeyboardInterrupt: 