# FUNCTION GENERATOR using Policy Gradient

Useful links:
- Policy Gradient explanation: http://karpathy.github.io/2016/05/31/rl/
- Example of Policy Gradient: https://github.com/keon/policy-gradient
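The Karpathy post weights each action by a discounted return. This means symbols emitted earlier in an episode share credit for a reward that only arrives at the end. Below is a minimal NumPy sketch, assuming one generated expression is one episode. `discount_rewards`, `gamma=0.9` and the standardization step are illustrative choices, not code taken from PolicyGradientModel:

```python
import numpy as np

def discount_rewards(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step t of one episode."""
    rewards = np.asarray(rewards, dtype=float)
    returns = np.zeros_like(rewards)
    running = 0.0
    # Walk backwards so each step accumulates all future (discounted) rewards.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Hypothetical episode: only the final symbol receives a reward.
returns = discount_rewards([0.0, 0.0, 0.0, 1.0], gamma=0.9)

# Standardizing the returns acts as a simple baseline and reduces variance.
advantages = (returns - returns.mean()) / (returns.std() + 1e-8)
print(returns)
print(advantages)
```

The Pong code in the post also resets the running sum at every game boundary. With one expression per episode, that reset is not needed.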

In [1]:
import numpy as np
from keras.models import Model
from keras.layers import Dense, GRU, Input
from keras.optimizers import Adam
from PolicyGradientModel import PolicyGradientModel  # project-local helper
from RewardCalculator import RewardCalculator  # project-local helper

Using TensorFlow backend.


In [2]:
ALLOWED_PARAMETERS = list('XY')
ALLOWED_SYMBOLS = ALLOWED_PARAMETERS + list('0123456789+-*/#')
NUM_SYMBOLS = len(ALLOWED_SYMBOLS)
MAX_LENGTH = 10 # Max length of the output expression
CORRECT_EXPRESSION = "3*X+2*Y"
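`ALLOWED_SYMBOLS` fixes a vocabulary of 17 tokens: 2 variables, 10 digits and 5 operator or marker characters. Every generated expression is a sequence of indices into this vocabulary. The sketch below shows the index/one-hot round trip. `encode`, `decode` and `SYMBOL_TO_INDEX` are hypothetical helpers. Treating `#` as an end-of-expression marker is an assumption that the notebook does not confirm:

```python
import numpy as np

ALLOWED_PARAMETERS = list('XY')
ALLOWED_SYMBOLS = ALLOWED_PARAMETERS + list('0123456789+-*/#')
NUM_SYMBOLS = len(ALLOWED_SYMBOLS)  # 2 variables + 10 digits + 5 others = 17
SYMBOL_TO_INDEX = {s: i for i, s in enumerate(ALLOWED_SYMBOLS)}

END_SYMBOL = '#'  # ASSUMPTION: '#' is treated as an end-of-expression marker

def encode(expression):
    """Map an expression string to a (len, NUM_SYMBOLS) one-hot matrix."""
    one_hot = np.zeros((len(expression), NUM_SYMBOLS))
    for t, ch in enumerate(expression):
        one_hot[t, SYMBOL_TO_INDEX[ch]] = 1.0
    return one_hot

def decode(indices):
    """Map symbol indices back to a string, stopping at END_SYMBOL."""
    chars = []
    for i in indices:
        ch = ALLOWED_SYMBOLS[i]
        if ch == END_SYMBOL:
            break
        chars.append(ch)
    return ''.join(chars)

x = encode("3*X+2*Y")
print(NUM_SYMBOLS, x.shape)
print(decode(x.argmax(axis=1)))
```

If `#` means something else to RewardCalculator, drop the early `break` in `decode`. Note that in Python source, `#` starts a comment.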

## DEFINE MODEL

In [3]:
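def getModel():
    # The input is a constant dummy (np.ones((1,1,1)) at training time), so the network is meant to ignore it
    input1 = Input(shape=(1,1))  # (timesteps, features)
    # TODO: Add noise layer to make output vary
    x = GRU(32)(input1)
    # NOTE: sigmoid outputs do not sum to 1; softmax is the usual pairing with categorical_crossentropy
    out = Dense(NUM_SYMBOLS, activation='sigmoid')(x)
    model = Model(inputs=input1, outputs=out)
    model.compile(optimizer=Adam(),
                  loss='categorical_crossentropy')
    return model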
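The input is constant, so the network produces the same distribution over `NUM_SYMBOLS` symbols on every call. One plausible way to turn that distribution into an expression is to sample `MAX_LENGTH` symbols from it. The sketch below assumes this approach. `sample_expression` is hypothetical, and a constant vector stands in for `model.predict(np.ones((1, 1, 1)))[0]`. Sigmoid outputs do not sum to 1, so they are renormalized before `choice`, which requires a proper probability vector. `np.random.default_rng` needs NumPy 1.17 or later:

```python
import numpy as np

ALLOWED_SYMBOLS = list('XY') + list('0123456789+-*/#')
MAX_LENGTH = 10

def sample_expression(probs, max_length=MAX_LENGTH, rng=None):
    """Sample max_length symbols independently from one per-step distribution."""
    rng = np.random.default_rng() if rng is None else rng
    probs = np.asarray(probs, dtype=float)
    # Renormalize: independent sigmoid outputs are not a probability distribution.
    probs = probs / probs.sum()
    indices = rng.choice(len(probs), size=max_length, p=probs)
    return ''.join(ALLOWED_SYMBOLS[i] for i in indices), indices

# Stand-in for a sigmoid output vector (every symbol scored 0.5).
fake_sigmoid_output = np.full(len(ALLOWED_SYMBOLS), 0.5)
expr, idx = sample_expression(fake_sigmoid_output, rng=np.random.default_rng(0))
print(expr, idx.shape)
```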

In [4]:
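setting = []
# Same values, in order, as the functionDifference, compilable, length, foundMathSymbol and offset weights passed to RewardCalculator below
setting.append([0.0,0.6,-0.0,0.05,-0.7]) # Converges to numbers + math symbols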

In [5]:
rewardCalculator = RewardCalculator(correctExpression=CORRECT_EXPRESSION,
                                    parameters=ALLOWED_PARAMETERS,
                                    functionDifferenceRewardWeight=0.0,
                                    compilableRewardWeight=0.60, 
                                    lengthRewardWeight=-0.00,
                                    foundMathSymbolWeight=0.05,
                                    foundVariableWeight=0.00,
                                    rewardOffset=-0.7,
                                    usingFile=False)
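The functionDifference, length and foundVariable weights are all zero. In that case, the rewards in the training log below are consistent with a simple rule: -1 for anything that does not parse and evaluate, and otherwise `rewardOffset + compilableRewardWeight + 0.05` per operator. For example, `8` gives -0.7 + 0.6 = -0.1, `91973-9785` gives -0.05, and `0*9+6` gives roughly 0. The logged 2.78e-17 is floating-point residue. `toy_reward` below is a hypothetical reconstruction of that rule, not the RewardCalculator implementation. The "evaluates with X and Y bound to sample values" test, the flat -1 penalty and the `+-*/` operator set are all assumptions:

```python
def toy_reward(expression,
               compilable_weight=0.60,
               math_symbol_weight=0.05,
               reward_offset=-0.7,
               invalid_reward=-1.0):
    """Hypothetical reward rule; NOT the actual RewardCalculator code."""
    try:
        # ASSUMPTION: "compilable" = parses as a Python expression and evaluates
        # with X and Y bound. Beware: inputs like "9**9**9" can take very long
        # to evaluate, so a real implementation should guard against that.
        code = compile(expression, "<expr>", "eval")
        eval(code, {"__builtins__": {}}, {"X": 1.0, "Y": 2.0})
    except Exception:
        return invalid_reward
    reward = reward_offset + compilable_weight
    # One bonus per operator character found in the expression.
    for ch in expression:
        if ch in "+-*/":
            reward += math_symbol_weight
    return reward

for expr in ["8", "91973-9785", "0*9+6", "81+", "4+82-Y1733", "3*X+2*Y"]:
    print(repr(expr), toy_reward(expr))
```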

In [6]:
model = getModel()
pgModel = PolicyGradientModel(model=model,
                              allowedSymbol=ALLOWED_SYMBOLS,
                              numSymbol=NUM_SYMBOLS,
                              maxLength=MAX_LENGTH,
                              rewardCalculator=rewardCalculator,
                              learningRate=0.0001,
                              fileName="Model1.hdf5")
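The model is compiled with `categorical_crossentropy`, so one way to apply a policy-gradient update through Keras is to fit it to nudged targets. The trick is to add `learning_rate * advantage * (one_hot(action) - probs)` to the current probabilities. For softmax with cross-entropy, the loss gradient with respect to the logits is `probs - targets`. One descent step on that loss is therefore a scaled ascent step on `advantage * log pi(action)`. Karpathy-style Keras ports such as the linked keon/policy-gradient example use a similar trick. Whether PolicyGradientModel (with its `learningRate`) works this way is an assumption, and `reinforce_targets` is a hypothetical helper that assumes a softmax policy:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def reinforce_targets(probs, actions, advantages, learning_rate=1e-4):
    """Targets that move each step's distribution toward (or away from) its action.

    probs:      (T, N) action probabilities the policy produced at each step
    actions:    (T,)   indices of the sampled symbols
    advantages: (T,)   standardized return for each step
    """
    probs = np.asarray(probs, dtype=float)
    one_hot = np.eye(probs.shape[1])[actions]
    # For a softmax policy, d log pi(a) / d logits = one_hot(a) - probs.
    grad_log_pi = one_hot - probs
    return probs + learning_rate * np.asarray(advantages)[:, None] * grad_log_pi

rng = np.random.default_rng(0)
probs = np.tile(softmax(rng.normal(size=17)), (10, 1))  # same distribution at all 10 steps
actions = rng.choice(17, size=10, p=probs[0])
advantages = np.ones(10)  # e.g. a positive standardized return
targets = reinforce_targets(probs, actions, advantages)
# Hypothetically: model.train_on_batch(np.ones((10, 1, 1)), targets)
print(targets.shape)
```

Each target row still sums to 1, because every row of `one_hot - probs` sums to 0. A small learning rate keeps the targets inside [0, 1].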

In [7]:
# pgModel.loadWeight()  # uncomment to resume from previously saved weights (presumably Model1.hdf5)

## TRAINING

In [8]:
pgModel.train(input=np.ones((1,1,1)))

Epoch: 0	Loss: 24.2600702286	Example Output: XX5660Y11/	Example Reward:  -1
Saving Weight
Epoch: 1	Loss: 24.3220811844	Example Output: 81+	Example Reward:  -1
Epoch: 2	Loss: 24.3805141449	Example Output: *Y04643726	Example Reward:  -1
Epoch: 3	Loss: 24.417432785	Example Output: 	Example Reward:  -1
Epoch: 4	Loss: 24.4529138565	Example Output: Y8+4Y/4*74	Example Reward:  -1
Epoch: 5	Loss: 24.4757192612	Example Output: 7*89199Y+2	Example Reward:  -1
Epoch: 6	Loss: 24.4907264709	Example Output: 	Example Reward:  -1
Epoch: 7	Loss: 24.507711792	Example Output: 8	Example Reward:  -0.1
Epoch: 8	Loss: 24.5395874023	Example Output: 	Example Reward:  -1
Epoch: 9	Loss: 24.5744539261	Example Output: 	Example Reward:  -1
Epoch: 10	Loss: 24.6164491653	Example Output: 926Y+8/X9/	Example Reward:  -1
Saving Weight
Epoch: 11	Loss: 24.6509269714	Example Output: 3+/60XX312	Example Reward:  -1
Epoch: 12	Loss: 24.6674991608	Example Output: 0*9+6	Example Reward:  2.77555756156e-17
Epoch: 13	Loss: 24.69137840

Epoch: 106	Loss: 26.6040924072	Example Output: 339X05*786	Example Reward:  -1
Epoch: 107	Loss: 26.6745632172	Example Output: 0645*43*70	Example Reward:  -1
Epoch: 108	Loss: 26.7210302353	Example Output: 482XX0/994	Example Reward:  -1
Epoch: 109	Loss: 26.7357982635	Example Output: 2+1-0230+X	Example Reward:  -1
Epoch: 110	Loss: 26.7651390076	Example Output: -	Example Reward:  -1
Saving Weight
Epoch: 111	Loss: 26.8303514481	Example Output: 8682338+98	Example Reward:  -0.05
Epoch: 112	Loss: 26.8614048004	Example Output: 4+82-Y1733	Example Reward:  -1
Epoch: 113	Loss: 26.930431366	Example Output: ++*83--643	Example Reward:  -1
Epoch: 114	Loss: 26.9592653275	Example Output: 91973-9785	Example Reward:  -0.05
Epoch: 115	Loss: 26.9680830002	Example Output: 4666Y721-2	Example Reward:  -1
Epoch: 116	Loss: 26.9877897263	Example Output: 08824Y1453	Example Reward:  -1
Epoch: 117	Loss: 27.0121234894	Example Output: -*/894-411	Example Reward:  -1
Epoch: 118	Loss: 27.0353897095	Example Output: 34/*7YY

KeyboardInterrupt: 