# FUNCTION GENERATOR using Policy Gradient

Useful links:
- Policy gradient explanation: http://karpathy.github.io/2016/05/31/rl/
- Example of policy gradient: https://github.com/keon/policy-gradient

In [1]:
import numpy as np
from keras.models import Model
from keras.layers import Dense, Flatten, Input
from keras.optimizers import Adam
# Project-local modules (not part of Keras)
from PolicyGradientModel import PolicyGradientModel
from RewardCalculator import RewardCalculator

Using TensorFlow backend.


In [2]:
ALLOWED_PARAMETERS = list('XY')
ALLOWED_SYMBOLS = ALLOWED_PARAMETERS + list('0123456789+-*/#')
NUM_SYMBOLS = len(ALLOWED_SYMBOLS)
MAX_LENGTH = 10 # Max length of the output expression
CORRECT_EXPRESSION = "3*X+2*Y"
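
The model input below has shape `(MAX_LENGTH, NUM_SYMBOLS)`, which suggests that expressions are represented as one-hot rows over `ALLOWED_SYMBOLS`. The following is a minimal sketch of such an encoding, not the code used by `PolicyGradientModel`. The helper names `encode` and `decode` are hypothetical, and treating `#` as an end-of-expression/padding symbol is an assumption.

```python
import numpy as np

ALLOWED_PARAMETERS = list('XY')
ALLOWED_SYMBOLS = ALLOWED_PARAMETERS + list('0123456789+-*/#')
MAX_LENGTH = 10

def encode(expression, pad='#'):
    """One-hot encode an expression, padded with `pad` (assumed convention)."""
    padded = expression.ljust(MAX_LENGTH, pad)[:MAX_LENGTH]
    onehot = np.zeros((MAX_LENGTH, len(ALLOWED_SYMBOLS)))
    for position, symbol in enumerate(padded):
        onehot[position, ALLOWED_SYMBOLS.index(symbol)] = 1.0
    return onehot

def decode(onehot, pad='#'):
    """Invert `encode`: take the argmax per row and cut at the first pad symbol."""
    symbols = [ALLOWED_SYMBOLS[i] for i in onehot.argmax(axis=1)]
    return ''.join(symbols).split(pad)[0]

encoded = encode("3*X+2*Y")
print(encoded.shape)
print(decode(encoded))
```

With 2 parameters, 10 digits and 5 operator/marker symbols, each row has 17 entries, which matches the length of the probability vectors printed in the training output.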

## DEFINE MODEL

In [3]:
def getModel():
    """Build the policy network: one-hot input -> softmax over NUM_SYMBOLS symbols."""
    # Trying to neglect the input
    input1 = Input(shape=(MAX_LENGTH, NUM_SYMBOLS))
    x = Flatten()(input1)
    x = Dense(5, activation='relu')(x)
    x = Dense(100, activation='relu')(x)
    x = Dense(100, activation='relu')(x)
    x = Dense(100, activation='relu')(x)
    out = Dense(NUM_SYMBOLS, activation='softmax')(x)

    model = Model(inputs=input1, outputs=out)
    # Newer Keras releases name this argument `learning_rate` instead of `lr`.
    model.compile(optimizer=Adam(lr=0.0001),
                  loss='categorical_crossentropy')
    return model

In [4]:
rewardCalculator = RewardCalculator(correctExpression=CORRECT_EXPRESSION,
                                    parameters=ALLOWED_PARAMETERS,
                                    functionDifferenceRewardWeight=1,
                                    compilableRewardWeight=1,
                                    lengthRewardWeight=-0.001,
                                    rewardOffset=+0.2,
                                    usingFile=False)
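
The source of `RewardCalculator` is not shown in this notebook. As a rough illustration of what its arguments suggest (a function-difference term, a compilability term, a length penalty and an offset), here is a hypothetical sketch. The function `sketch_reward`, its formula, the sample grid and the flat `-1` failure reward are all assumptions; it will not reproduce the reward values in the training log.

```python
def sketch_reward(expression, correct="3*X+2*Y", diff_weight=1.0,
                  length_weight=-0.001, offset=0.2, failure_reward=-1.0):
    """Hypothetical reward: closeness to `correct` on a small grid of (X, Y) points."""
    points = [(x, y) for x in range(1, 4) for y in range(1, 4)]
    try:
        # Compilability check: malformed candidates raise SyntaxError here.
        candidate = compile(expression, "<candidate>", "eval")
        target = compile(correct, "<target>", "eval")
        errors = []
        for x, y in points:
            # No builtins: only the names X and Y are available.
            env = {"__builtins__": {}, "X": x, "Y": y}
            errors.append(abs(eval(candidate, env) - eval(target, dict(env))))
    except (SyntaxError, NameError, ZeroDivisionError, OverflowError):
        return failure_reward
    mean_error = sum(errors) / len(errors)
    closeness = 1.0 / (1.0 + mean_error)  # 1.0 when the functions agree on the grid
    return diff_weight * closeness + length_weight * len(expression) + offset

for candidate in ["3*X+2*Y", "X", "X+", "9/0", "X++XY+X+X"]:
    print(candidate, sketch_reward(candidate))
```

Note that `eval` is only acceptable here because the candidate strings are built from the fixed `ALLOWED_SYMBOLS` alphabet; even so, repeated `**` can produce very expensive computations, so a real implementation would need a guard against that.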

In [5]:
model = getModel()
pgModel = PolicyGradientModel(model=model,
                              allowedSymbol=ALLOWED_SYMBOLS,
                              numSymbol=NUM_SYMBOLS,
                              maxLength=MAX_LENGTH,
                              rewardCalculator=rewardCalculator,
                              learningRate=0.0000001,
                              fileName="Model1.hdf5")
pgModel.loadWeight("CorrectSyntaxModel.hdf5")
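
`PolicyGradientModel` is a project-local class whose source is not included here. To illustrate the kind of update the linked posts describe (REINFORCE: raise the log-probability of sampled actions in proportion to their reward), here is a NumPy-only sketch on a toy problem. It uses a single softmax over the symbols instead of the Keras network, and the reward (1 for sampling `X`, 0 otherwise) is invented purely for the demonstration.

```python
import numpy as np

ALLOWED_SYMBOLS = list('XY') + list('0123456789+-*/#')

def softmax(logits):
    """Numerically stable softmax."""
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def reinforce_step(logits, action, reward, learning_rate):
    """One REINFORCE update for a softmax policy parameterised directly by logits."""
    probs = softmax(logits)
    # Gradient of log pi(action) with respect to the logits: onehot(action) - probs
    grad_log_prob = -probs
    grad_log_prob[action] += 1.0
    return logits + learning_rate * reward * grad_log_prob

rng = np.random.default_rng(0)
logits = np.zeros(len(ALLOWED_SYMBOLS))  # start from a uniform policy
target = ALLOWED_SYMBOLS.index('X')      # toy goal: learn to emit 'X'

for _ in range(500):
    action = rng.choice(len(ALLOWED_SYMBOLS), p=softmax(logits))
    reward = 1.0 if action == target else 0.0
    logits = reinforce_step(logits, action, reward, learning_rate=0.5)

best = ALLOWED_SYMBOLS[int(np.argmax(softmax(logits)))]
print(best)
```

In a Keras setting, the same idea is commonly implemented by fitting with `categorical_crossentropy` on the sampled symbols and scaling each sample by its reward, which is the approach taken in the linked example repository.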

## TRAINING

In [6]:
pgModel.train(input=np.zeros((1, MAX_LENGTH, NUM_SYMBOLS)))  # all-zero input tensor

Epoch: 0	Loss: 2.07394477129	Example Output: 5+6/2340/0	Example Reward:  -1
Prob
[[  2.50208229e-01   2.49753371e-01   5.00139855e-02   5.02526797e-02
    4.99045700e-02   4.99861017e-02   5.00249900e-02   4.99344878e-02
    5.00656217e-02   5.00519685e-02   4.99017239e-02   4.99020442e-02
    4.13315249e-08   4.03202129e-08   4.03750064e-08   4.00067464e-08
    5.53263675e-08]]
[[  2.69773310e-12   2.64875513e-12   2.85591055e-02   2.85984818e-02
    2.85785329e-02   2.85597462e-02   2.85707079e-02   2.85640322e-02
    2.85611581e-02   2.85863969e-02   2.85697747e-02   2.85592675e-02
    1.42778993e-01   1.42771855e-01   1.42809406e-01   1.42733306e-01
    1.43199280e-01]]
[[  2.50209570e-01   2.50142783e-01   4.99962978e-02   5.00308424e-02
    4.99569550e-02   4.99982648e-02   5.00218496e-02   4.99261506e-02
    4.99532484e-02   4.99709956e-02   4.99048680e-02   4.98882048e-02
    1.04887423e-08   1.00719291e-08   9.95821470e-09   9.74854064e-09
    1.42383083e-08]]
[[  2.70130499e-

Epoch: 27	Loss: 2.04422874451	Example Output: 5	Example Reward:  0.130906469353
Epoch: 28	Loss: 2.00343642235	Example Output: Y+Y/91/X*6	Example Reward:  -0.506735132838
Epoch: 29	Loss: 2.04183911085	Example Output: 8*6	Example Reward:  0.152542813432
Epoch: 30	Loss: 2.0788974762	Example Output: Y*1+08*698	Example Reward:  -1
Saving Weight
Epoch: 31	Loss: 2.06115843058	Example Output: 2*X	Example Reward:  0.176764921791
Epoch: 32	Loss: 2.00726610422	Example Output: 9/0	Example Reward:  -1
Epoch: 33	Loss: 2.08195979595	Example Output: 08-278*Y-Y	Example Reward:  -1
Epoch: 34	Loss: 2.13381563425	Example Output: 4/X/Y+X+0	Example Reward:  -0.574763284411
Epoch: 35	Loss: 2.09641858339	Example Output: 0*Y*85+Y+X	Example Reward:  0.176764921791
Epoch: 36	Loss: 2.05160061121	Example Output: 593/Y	Example Reward:  -0.511389180588
Epoch: 37	Loss: 2.06611613035	Example Output: 0+0/Y-X+X	Example Reward:  0.128320278918
Epoch: 38	Loss: 2.0955735445	Example Output: Y	Example Reward:  0.149482313769

Epoch: 53	Loss: 1.81823843718	Example Output: 7*X+X+X/69	Example Reward:  0.143161937875
Epoch: 54	Loss: 1.79564113617	Example Output: Y/X-X-6+Y	Example Reward:  -0.506758990447
Epoch: 55	Loss: 1.90861591101	Example Output: 1+X-Y/5	Example Reward:  0.145470266334
Epoch: 56	Loss: 1.80442231894	Example Output: Y+Y-X*Y+86	Example Reward:  -0.0639531907304
Epoch: 57	Loss: 1.76225117445	Example Output: 4*X	Example Reward:  0.230337444003
Epoch: 58	Loss: 1.78812409639	Example Output: 6*Y+X	Example Reward:  0.164602159674
Epoch: 59	Loss: 1.80372980833	Example Output: 0+X-X/515	Example Reward:  0.149436288061
Epoch: 60	Loss: 1.83244863749	Example Output: X-4	Example Reward:  0.146930519405
Saving Weight
Epoch: 61	Loss: 1.86244678497	Example Output: 6*5/0-9*43	Example Reward:  -1
Epoch: 62	Loss: 1.77156885862	Example Output: X+Y*2/Y/X	Example Reward:  -0.51135677062
Epoch: 63	Loss: 1.81821761131	Example Output: X+X-1	Example Reward:  0.175906671873
Epoch: 64	Loss: 1.76338034868	Example Output: 

Epoch: 119	Loss: 1.79488400221	Example Output: X+Y/Y+Y	Example Reward:  0.177627063972
Epoch: 120	Loss: 1.8086166501	Example Output: X	Example Reward:  0.149482313769
Saving Weight
Epoch: 121	Loss: 1.78798279762	Example Output: 4*X+Y-Y	Example Reward:  0.230337444003
Epoch: 122	Loss: 1.75759891272	Example Output: 0+X+2	Example Reward:  0.150778484076
Epoch: 123	Loss: 1.84049180746	Example Output: 1/X*Y	Example Reward:  -0.506754218146
Epoch: 124	Loss: 1.79443860054	Example Output: X-3+X+X	Example Reward:  0.211415201409
Epoch: 125	Loss: 1.81533818245	Example Output: 7	Example Reward:  0.131931772361
Epoch: 126	Loss: 1.77201683521	Example Output: Y+X+7+Y+X	Example Reward:  0.293722088999
Epoch: 127	Loss: 1.86319936514	Example Output: 8-Y+1+X+11	Example Reward:  0.138608561216
Epoch: 128	Loss: 1.8607446909	Example Output: 6+2	Example Reward:  0.132448610884
Epoch: 129	Loss: 1.78421438932	Example Output: X-Y-4-X/X	Example Reward:  0.125768484554
Epoch: 130	Loss: 1.75458760262	Example Outp

Epoch: 151	Loss: 1.84299371243	Example Output: X/6+Y+4	Example Reward:  0.156215597
Epoch: 152	Loss: 1.81436990499	Example Output: Y-Y+X	Example Reward:  0.149482313769
Epoch: 153	Loss: 1.84855707884	Example Output: 6+X+170/X	Example Reward:  -0.511351599614
Epoch: 154	Loss: 1.81340811253	Example Output: Y-3-Y/Y+Y	Example Reward:  0.173377562827
Epoch: 155	Loss: 1.89597376585	Example Output: 1+Y+444*7	Example Reward:  -0.135553836951
Epoch: 156	Loss: 1.81174322367	Example Output: X*4+8+X	Example Reward:  0.214784583112
Epoch: 157	Loss: 1.79860098362	Example Output: Y-2	Example Reward:  0.14819783406
Epoch: 158	Loss: 1.81524407864	Example Output: Y*6/Y*X*Y	Example Reward:  -0.228586591754
Epoch: 159	Loss: 1.83565555811	Example Output: X*Y-Y/X*30	Example Reward:  -0.507461087321
Epoch: 160	Loss: 1.7722065568	Example Output: 0-X*Y	Example Reward:  -0.0727569674652
Saving Weight
Epoch: 161	Loss: 1.78481132984	Example Output: 3	Example Reward:  0.129873053254
Epoch: 162	Loss: 1.78828359842	

Epoch: 201	Loss: 1.85902562141	Example Output: X*X+080/39	Example Reward:  -1
Epoch: 202	Loss: 1.78215178251	Example Output: Y	Example Reward:  0.149482313769
Epoch: 203	Loss: 1.80484772921	Example Output: X*X*Y+4-Y	Example Reward:  -0.457480638685
Epoch: 204	Loss: 1.7791719079	Example Output: X	Example Reward:  0.149482313769
Epoch: 205	Loss: 1.78926357031	Example Output: 2	Example Reward:  0.129355794056
Epoch: 206	Loss: 1.78304877281	Example Output: 6+9+9*Y	Example Reward:  0.10915297778
Epoch: 207	Loss: 1.77167239189	Example Output: 4	Example Reward:  0.130388353684
Epoch: 208	Loss: 1.85244517326	Example Output: 7	Example Reward:  0.131931772361
Epoch: 209	Loss: 1.76064816713	Example Output: Y/4-Y*X+X	Example Reward:  -0.0698683390171
Epoch: 210	Loss: 1.7981818676	Example Output: 2-1*X+Y+X	Example Reward:  0.15077253862
Saving Weight
Epoch: 211	Loss: 1.81794563532	Example Output: X-0	Example Reward:  0.149482313769
Epoch: 212	Loss: 1.80456039906	Example Output: 5-X*Y-5*Y	Example Re

Epoch: 255	Loss: 1.86321768761	Example Output: 1	Example Reward:  0.128836623144
Epoch: 256	Loss: 1.78237059116	Example Output: X+5*X	Example Reward:  0.189241565828
Epoch: 257	Loss: 1.80340846777	Example Output: 7	Example Reward:  0.131931772361
Epoch: 258	Loss: 1.86329418421	Example Output: 7	Example Reward:  0.131931772361
Epoch: 259	Loss: 1.86315348148	Example Output: 6	Example Reward:  0.131417735253
Epoch: 260	Loss: 1.9124286294	Example Output: Y+X+Y	Example Reward:  0.215217598098
Saving Weight
Epoch: 261	Loss: 1.84589930773	Example Output: 9+Y+3	Example Reward:  0.157175933398
Epoch: 262	Loss: 1.79687358141	Example Output: X/9/8*Y/2	Example Reward:  0.133277205972
Epoch: 263	Loss: 1.74783591032	Example Output: Y	Example Reward:  0.149482313769
Epoch: 264	Loss: 1.7722853303	Example Output: 3	Example Reward:  0.129873053254
Epoch: 265	Loss: 1.76456288099	Example Output: X	Example Reward:  0.149482313769
Epoch: 266	Loss: 1.80636063814	Example Output: Y*Y-X-Y+Y	Example Reward:  -0.

Epoch: 301	Loss: 1.78449071646	Example Output: Y/Y+0	Example Reward:  0.128836623144
Epoch: 302	Loss: 1.8136901021	Example Output: 3+X/66*9*X	Example Reward:  0.104952394717
Epoch: 303	Loss: 1.79320065975	Example Output: X-Y	Example Reward:  0.128320278918
Epoch: 304	Loss: 1.85949729681	Example Output: 2*X	Example Reward:  0.176764921791
Epoch: 305	Loss: 1.85331244469	Example Output: Y+Y/Y/7-Y	Example Reward:  0.128393870482
Epoch: 306	Loss: 1.78416475058	Example Output: Y	Example Reward:  0.149482313769
Epoch: 307	Loss: 1.77698692083	Example Output: Y+2+X+3+85	Example Reward:  0.232881984793
Epoch: 308	Loss: 1.85395942926	Example Output: 0/Y-8	Example Reward:  0.124269728659
Epoch: 309	Loss: 1.7438914299	Example Output: 5*3/Y*X+X	Example Reward:  -0.506747146191
Epoch: 310	Loss: 1.79421198368	Example Output: X+Y+9*57*Y	Example Reward:  -0.310328911086
Saving Weight
Epoch: 311	Loss: 1.79030436277	Example Output: Y+0-69-Y*Y	Example Reward:  -0.124208864711
Epoch: 312	Loss: 1.84352000952

Epoch: 351	Loss: 0.760293543339	Example Output: X+Y+X+Y+Y	Example Reward:  0.281600229437
Epoch: 352	Loss: 0.724077379704	Example Output: Y+Y+Y+X+X	Example Reward:  0.281600229437
Epoch: 353	Loss: 0.726206958294	Example Output: X+	Example Reward:  -1
Epoch: 354	Loss: 0.634841519594	Example Output: X+	Example Reward:  -1
Epoch: 355	Loss: 0.658085721731	Example Output: Y+	Example Reward:  -1
Epoch: 356	Loss: 0.682382678986	Example Output: Y+	Example Reward:  -1
Epoch: 357	Loss: 0.627230530977	Example Output: X++XY+X+X	Example Reward:  -1
Epoch: 358	Loss: 0.618194073439	Example Output: Y	Example Reward:  0.149482313769
Epoch: 359	Loss: 0.614266520739	Example Output: X++XX+X+X	Example Reward:  -1
Epoch: 360	Loss: 0.630412179232	Example Output: X++Y+X+X+	Example Reward:  -1
Saving Weight
Epoch: 361	Loss: 0.572537982464	Example Output: Y+	Example Reward:  -1
Epoch: 362	Loss: 0.514842665195	Example Output: X++XY+Y+X	Example Reward:  -1
Epoch: 363	Loss: 0.613489124179	Example Output: X+	Exampl

Epoch: 401	Loss: 0.346378409863	Example Output: Y++XY+Y+Y	Example Reward:  -1
Epoch: 402	Loss: 0.345278310776	Example Output: X++Y+X+X+Y	Example Reward:  1000
Epoch: 403	Loss: 0.339184015989	Example Output: Y++XY+Y+Y	Example Reward:  -1
Epoch: 404	Loss: 0.326997876167	Example Output: Y++Y+Y+X+Y	Example Reward:  0.215864945108
Epoch: 405	Loss: 0.313720458746	Example Output: X++Y+X+Y+X	Example Reward:  1000
Epoch: 406	Loss: 0.278652656078	Example Output: Y++Y+X+X+X	Example Reward:  1000
Epoch: 407	Loss: 0.262896199524	Example Output: Y++Y+X+Y+X	Example Reward:  0.281600229437
Epoch: 408	Loss: 0.232673215866	Example Output: Y++Y+X+X+Y	Example Reward:  0.281600229437
Epoch: 409	Loss: 0.206952801347	Example Output: Y++Y+X+X+X	Example Reward:  1000
Epoch: 410	Loss: 0.200538127124	Example Output: Y++Y+X+X+X	Example Reward:  1000
Saving Weight
Epoch: 411	Loss: 0.198989243805	Example Output: Y++X+X+X+X	Example Reward:  0.281600229437
Epoch: 412	Loss: 0.173636458814	Example Output: Y++Y+X+Y+X	Ex

Epoch: 451	Loss: 0.0131021853536	Example Output: Y++Y+X+X+X	Example Reward:  1000
Epoch: 452	Loss: 0.0126957366243	Example Output: Y++Y+X+X+X	Example Reward:  1000
Epoch: 453	Loss: 0.0123082286678	Example Output: Y++Y+X+X+X	Example Reward:  1000
Epoch: 454	Loss: 0.0119420282543	Example Output: Y++Y+X+X+X	Example Reward:  1000
Epoch: 455	Loss: 0.0115486006252	Example Output: Y++Y+X+X+X	Example Reward:  1000
Epoch: 456	Loss: 0.0112829408608	Example Output: Y++Y+X+X+X	Example Reward:  1000
Epoch: 457	Loss: 0.0109765338711	Example Output: Y++Y+X+X+X	Example Reward:  1000
Epoch: 458	Loss: 0.0106791164726	Example Output: Y++Y+X+X+X	Example Reward:  1000
Epoch: 459	Loss: 0.0103943655267	Example Output: Y++Y+X+X+X	Example Reward:  1000
Epoch: 460	Loss: 0.0101226639003	Example Output: Y++Y+X+X+X	Example Reward:  1000
Saving Weight
Epoch: 461	Loss: 0.00986314220354	Example Output: Y++Y+X+X+X	Example Reward:  1000
Epoch: 462	Loss: 0.0096150111407	Example Output: Y++Y+X+X+X	Example Reward:  1000

KeyboardInterrupt: 
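
The expression the model settles on, `Y++Y+X+X+X`, looks malformed, but Python parses `++Y` as `+ (+Y)` (a unary plus), so it evaluates to `2*Y + 3*X`. That is presumably why it receives the same reward of 1000 as other samples equivalent to the target. The sketch below checks the equivalence numerically on a small integer grid; the helper `same_function` is hypothetical and not part of `RewardCalculator`.

```python
def same_function(candidate, reference, points):
    """Return True if both expressions agree at every (X, Y) point."""
    return all(
        eval(candidate, {"__builtins__": {}}, {"X": x, "Y": y})
        == eval(reference, {"__builtins__": {}}, {"X": x, "Y": y})
        for x, y in points
    )

grid = [(x, y) for x in range(-3, 4) for y in range(-3, 4)]

print(same_function("Y++Y+X+X+X", "3*X+2*Y", grid))  # prints True
print(same_function("Y++Y+Y+X+Y", "3*X+2*Y", grid))  # prints False
```

The second expression, which scored about 0.216 at epoch 404, is `4*Y + X` and therefore differs from the target.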