# Bit flipping game with DQN solver

This notebook implements a DQN solver for the bit-flipping game described in [**Hindsight Experience Replay**](https://arxiv.org/abs/1707.01495).

**Reference**:

1. Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, Wojciech Zaremba. Hindsight Experience Replay. arXiv:1707.01495.


In [1]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from bitflipping import bitflipping as bf
from DQN import DQN

# Activate the inline backend first; activating it later can reset rcParams.
%matplotlib inline
plt.rcParams['figure.figsize'] = [15, 20]



## Set up the bit flipping game environment

In [2]:
init_state = np.array([0,1])
goal = np.ones((2,))
n = 3
bf_env = bf(init_state, goal, n)
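The `bitflipping` module itself is not shown in this notebook. As a rough illustration of the game from the paper, here is a minimal sketch in which each action flips one bit and the reward is 0 at the goal and -1 otherwise. `BitFlipEnv` is a hypothetical stand-in: its method names mirror the `update_state` and `reward` calls used in the test cell, but the body is an assumption and not the module's actual code.

```python
import numpy as np


class BitFlipEnv:
    """Hypothetical stand-in for the bit-flipping game (not the `bitflipping` module)."""

    def __init__(self, init_state, goal):
        self.state = np.array(init_state, dtype=np.int64)
        self.goal = np.array(goal, dtype=np.int64)
        assert self.state.shape == self.goal.shape

    def update_state(self, action):
        # An action is the index of the bit to flip.
        self.state[action] = 1 - self.state[action]
        return self.state

    def reward(self, state):
        # Sparse reward as in the HER paper: 0 at the goal, -1 otherwise.
        return 0.0 if np.array_equal(state, self.goal) else -1.0


env_sketch = BitFlipEnv([0, 1, 0], [1, 1, 1])
env_sketch.update_state(0)   # state becomes [1, 1, 0]
env_sketch.update_state(2)   # state becomes [1, 1, 1], which equals the goal
```

Because the reward carries no information until the goal is hit exactly, exploration gets harder as the number of bits grows, which is the motivation the paper gives for this benchmark.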

## Build the DQN neural network

In [3]:
tf.reset_default_graph()

x = tf.placeholder(tf.float32, shape=(None, 2*n+1))  # network input
y = tf.placeholder(tf.float32, shape=(None, 1))      # regression target

hid = [256]  # hidden layer sizes
agent = DQN(x, hid, n, discount=0.99, eps=0.9, annealing=0.9, replay_buffer_size=1e4, batch_size=128)
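The input width `2*n+1` and the scalar output suggest that the network scores one (state, goal, action) triple at a time, although the `DQN` module is not shown here, so this is a guess. Under that assumption, the sketch below builds one input row per candidate action and picks an action epsilon-greedily. The helper names `candidate_inputs` and `epsilon_greedy` are hypothetical, and the Q-values are placeholders rather than network output.

```python
import numpy as np


def candidate_inputs(state, goal):
    """Stack one (state, goal, action) row per action; each row has 2*n + 1 entries."""
    n = len(state)
    state_goal = np.concatenate([state, goal]).astype(np.float32)
    rows = np.tile(state_goal, (n, 1))                 # shape (n, 2*n)
    actions = np.arange(n, dtype=np.float32)[:, None]  # shape (n, 1)
    return np.hstack([rows, actions])                  # shape (n, 2*n + 1)


def epsilon_greedy(q_values, eps, rng):
    """Pick a random action with probability eps, otherwise the argmax."""
    if rng.random() < eps:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


rng = np.random.default_rng(0)
batch = candidate_inputs(np.array([0, 1, 0]), np.array([1, 1, 1]))
fake_q = np.array([0.2, -0.5, 0.1])   # placeholder values, not network output
greedy = epsilon_greedy(fake_q, eps=0.0, rng=rng)
```

One common way to use an `annealing` factor such as 0.9 is to multiply `eps` by it after each episode, so that exploration decays geometrically. Whether `DQN` does exactly that is not visible from this notebook.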



In [None]:
losses = agent.train_Q(x, y, episode=64, T=n, update_step=32, iteration=50, epoch=5)

Episode 0: loss is 0.00252
Episode 0: loss is 0.00208
Episode 0: loss is 0.000891
Episode 0: loss is 0.000319
Episode 0: loss is 0.000531
Episode 0: loss is 0.000399
Episode 0: loss is 5.72e-05
Episode 0: loss is 0.000163
Episode 0: loss is 0.000467
Episode 0: loss is 0.000488
Episode 0: loss is 0.00029
Episode 0: loss is 0.000245
Episode 0: loss is 0.000463
Episode 0: loss is 0.000358
Episode 0: loss is 0.00016
Episode 0: loss is 7.7e-05
Episode 0: loss is 9e-05
Episode 0: loss is 0.000134
Episode 0: loss is 7.86e-05
Episode 0: loss is 1.15e-05
Episode 0: loss is 2.96e-05
Episode 0: loss is 8.96e-05
Episode 0: loss is 0.000105
Episode 0: loss is 6.48e-05
Episode 0: loss is 4.98e-05
Episode 0: loss is 9.07e-05
Episode 0: loss is 0.0001
Episode 0: loss is 5.45e-05
Episode 0: loss is 2.21e-05
Episode 0: loss is 2.72e-05
Episode 0: loss is 3.41e-05
Episode 0: loss is 1.52e-05
Episode 0: loss is 6.27e-07
Episode 0: loss is 9.12e-06
Episode 0: loss is 2.2e-05
Episode 0: loss is 2.68e-05
...

Episode 6: loss is 0.0427
Episode 6: loss is 0.0345
Episode 6: loss is 0.0198
Episode 6: loss is 0.02
Episode 6: loss is 0.0202
Episode 6: loss is 0.0151
Episode 6: loss is 0.0171
Episode 6: loss is 0.0125
Episode 6: loss is 0.0112
Episode 6: loss is 0.00842
Episode 6: loss is 0.00845
Episode 6: loss is 0.00811
Episode 6: loss is 0.00752
Episode 6: loss is 0.00784
Episode 6: loss is 0.00545
Episode 6: loss is 0.00487
Episode 6: loss is 0.00355
Episode 6: loss is 0.00334
Episode 6: loss is 0.00326
Episode 6: loss is 0.00375
Episode 6: loss is 0.00239
Episode 6: loss is 0.0018
Episode 6: loss is 0.00163
Episode 6: loss is 0.00224
Episode 6: loss is 0.00191
Episode 6: loss is 0.00214
Episode 6: loss is 0.00237
Episode 6: loss is 0.00225
Episode 6: loss is 0.00156
Episode 6: loss is 0.00157
Episode 6: loss is 0.0014
Episode 6: loss is 0.00135
Episode 6: loss is 0.0016
Episode 6: loss is 0.00166
Episode 6: loss is 0.000781
Episode 6: loss is 0.000821
Episode 6: loss is 0.000925
...

Episode 11: loss is 0.000389
Episode 12: loss is 0.00646
Episode 12: loss is 0.00637
Episode 12: loss is 0.00796
Episode 12: loss is 0.00573
Episode 12: loss is 0.00582
Episode 12: loss is 0.00605
Episode 12: loss is 0.00501
Episode 12: loss is 0.00377
Episode 12: loss is 0.00478
Episode 12: loss is 0.00299
Episode 12: loss is 0.0042
Episode 12: loss is 0.00267
Episode 12: loss is 0.0037
Episode 12: loss is 0.00267
Episode 12: loss is 0.00252
Episode 12: loss is 0.00234
Episode 12: loss is 0.00189
Episode 12: loss is 0.00225
Episode 12: loss is 0.00262
Episode 12: loss is 0.00227
Episode 12: loss is 0.00204
Episode 12: loss is 0.00145
Episode 12: loss is 0.00139
Episode 12: loss is 0.00172
Episode 12: loss is 0.00155
Episode 12: loss is 0.00157
Episode 12: loss is 0.00117
Episode 12: loss is 0.000756
Episode 12: loss is 0.000998
Episode 12: loss is 0.00118
Episode 12: loss is 0.000764
Episode 12: loss is 0.00115
Episode 12: loss is 0.000618
Episode 12: loss is 0.000802
...

Episode 17: loss is 0.0021
Episode 17: loss is 0.00237
Episode 17: loss is 0.00321
Episode 17: loss is 0.00288
Episode 17: loss is 0.00272
Episode 17: loss is 0.00226
Episode 17: loss is 0.00232
Episode 18: loss is 0.0164
Episode 18: loss is 0.0183
Episode 18: loss is 0.0133
Episode 18: loss is 0.00983
Episode 18: loss is 0.0123
Episode 18: loss is 0.0103
Episode 18: loss is 0.0119
Episode 18: loss is 0.00923
Episode 18: loss is 0.00658
Episode 18: loss is 0.00767
Episode 18: loss is 0.00788
Episode 18: loss is 0.00944
Episode 18: loss is 0.00925
Episode 18: loss is 0.00811
Episode 18: loss is 0.00565
Episode 18: loss is 0.00665
Episode 18: loss is 0.00872
Episode 18: loss is 0.00806
Episode 18: loss is 0.00678
Episode 18: loss is 0.00484
Episode 18: loss is 0.00602
Episode 18: loss is 0.00564
Episode 18: loss is 0.00616
Episode 18: loss is 0.0049
Episode 18: loss is 0.00524
Episode 18: loss is 0.00459
Episode 18: loss is 0.00366
Episode 18: loss is 0.00479
Episode 18: loss is 0.00458


Episode 23: loss is 0.00239
Episode 23: loss is 0.0027
Episode 23: loss is 0.0023
Episode 23: loss is 0.00179
Episode 23: loss is 0.00198
Episode 23: loss is 0.00267
Episode 23: loss is 0.00266
Episode 23: loss is 0.00234
Episode 23: loss is 0.00241
Episode 23: loss is 0.00183
Episode 23: loss is 0.00145
Episode 24: loss is 0.00742
Episode 24: loss is 0.00665
Episode 24: loss is 0.00703
Episode 24: loss is 0.00556
Episode 24: loss is 0.00516
Episode 24: loss is 0.00559
Episode 24: loss is 0.0049
Episode 24: loss is 0.00436
Episode 24: loss is 0.0032
Episode 24: loss is 0.00365
Episode 24: loss is 0.00273
Episode 24: loss is 0.00223
Episode 24: loss is 0.00346
Episode 24: loss is 0.002
Episode 24: loss is 0.00211
Episode 24: loss is 0.00233
Episode 24: loss is 0.0026
Episode 24: loss is 0.00232
Episode 24: loss is 0.00201
Episode 24: loss is 0.00217
Episode 24: loss is 0.0016
Episode 24: loss is 0.00254
Episode 24: loss is 0.00127
Episode 24: loss is 0.0018
Episode 24: loss is 0.00152
...

Episode 29: loss is 0.00263
Episode 29: loss is 0.00154
Episode 29: loss is 0.00145
Episode 29: loss is 0.00214
Episode 29: loss is 0.00219
Episode 29: loss is 0.00126
Episode 29: loss is 0.00154
Episode 29: loss is 0.00151
Episode 29: loss is 0.00235
Episode 29: loss is 0.00156
Episode 29: loss is 0.00231
Episode 29: loss is 0.0013
Episode 29: loss is 0.00157
Episode 29: loss is 0.00136
Episode 29: loss is 0.00203
Episode 29: loss is 0.0016
Episode 29: loss is 0.00116
Episode 29: loss is 0.00085
Episode 29: loss is 0.00172
Episode 30: loss is 0.0137
Episode 30: loss is 0.0154
Episode 30: loss is 0.00945
Episode 30: loss is 0.0105
Episode 30: loss is 0.0119
Episode 30: loss is 0.00929
Episode 30: loss is 0.00873
Episode 30: loss is 0.0116
Episode 30: loss is 0.00584
Episode 30: loss is 0.00683
Episode 30: loss is 0.00647
Episode 30: loss is 0.00658
Episode 30: loss is 0.00712
Episode 30: loss is 0.007
Episode 30: loss is 0.00578
Episode 30: loss is 0.00568
Episode 30: loss is 0.00471
...

Episode 35: loss is 0.000981
Episode 35: loss is 0.00111
Episode 35: loss is 0.00224
Episode 35: loss is 0.00119
Episode 35: loss is 0.00156
Episode 35: loss is 0.000913
Episode 35: loss is 0.00113
Episode 35: loss is 0.00102
Episode 35: loss is 0.00181
Episode 35: loss is 0.000757
Episode 35: loss is 0.00118
Episode 35: loss is 0.00139
Episode 35: loss is 0.000655
Episode 35: loss is 0.00108
Episode 35: loss is 0.00101
Episode 35: loss is 0.00199
Episode 35: loss is 0.00124
Episode 35: loss is 0.000738
Episode 35: loss is 0.00199
Episode 35: loss is 0.00128
Episode 35: loss is 0.00136
Episode 35: loss is 0.00133
Episode 35: loss is 0.00116
Episode 35: loss is 0.000843
Episode 36: loss is 0.0235
Episode 36: loss is 0.0164
Episode 36: loss is 0.00699
Episode 36: loss is 0.00314
Episode 36: loss is 0.00393
Episode 36: loss is 0.0092
Episode 36: loss is 0.0112
Episode 36: loss is 0.0103
Episode 36: loss is 0.00581
Episode 36: loss is 0.00298
Episode 36: loss is 0.00159
...

Episode 41: loss is 0.00322
Episode 41: loss is 0.00343
Episode 41: loss is 0.0035
Episode 41: loss is 0.00339
Episode 41: loss is 0.00317
Episode 41: loss is 0.00247
Episode 41: loss is 0.00184
Episode 41: loss is 0.00152
Episode 41: loss is 0.00359
Episode 41: loss is 0.00264
Episode 41: loss is 0.00257
Episode 41: loss is 0.00326
Episode 41: loss is 0.00182
Episode 41: loss is 0.0022
Episode 41: loss is 0.00233
Episode 41: loss is 0.00204
Episode 41: loss is 0.00172
Episode 41: loss is 0.0013
Episode 41: loss is 0.00221
Episode 41: loss is 0.00262
Episode 41: loss is 0.0019
Episode 41: loss is 0.00172
Episode 41: loss is 0.00184
Episode 41: loss is 0.0013
Episode 41: loss is 0.000899
Episode 41: loss is 0.00111
Episode 41: loss is 0.00186
Episode 41: loss is 0.0021
Episode 41: loss is 0.000812
Episode 41: loss is 0.00107
Episode 41: loss is 0.00165
Episode 41: loss is 0.00121
Episode 42: loss is 0.0173
Episode 42: loss is 0.0127
Episode 42: loss is 0.00567
Episode 42: loss is 0.0019

Episode 47: loss is 0.00229
Episode 47: loss is 0.00225
Episode 47: loss is 0.00203
Episode 47: loss is 0.00218
Episode 47: loss is 0.00214
Episode 47: loss is 0.00217
Episode 47: loss is 0.00199
Episode 47: loss is 0.00211
Episode 47: loss is 0.0021
Episode 47: loss is 0.0019
Episode 47: loss is 0.00205
Episode 47: loss is 0.00201
Episode 47: loss is 0.00202
Episode 47: loss is 0.00196
Episode 47: loss is 0.00197
Episode 47: loss is 0.00195
Episode 47: loss is 0.00181
Episode 47: loss is 0.00191
Episode 47: loss is 0.0019
Episode 47: loss is 0.00187
Episode 47: loss is 0.00186
Episode 47: loss is 0.00182
Episode 47: loss is 0.00186
Episode 47: loss is 0.00165
Episode 47: loss is 0.00184
Episode 47: loss is 0.00176
Episode 47: loss is 0.00164
Episode 47: loss is 0.00174
Episode 47: loss is 0.00178
Episode 47: loss is 0.00173
Episode 47: loss is 0.00172
Episode 47: loss is 0.00173
Episode 47: loss is 0.00169
Episode 47: loss is 0.00164
Episode 47: loss is 0.00167
...

Episode 53: loss is 0.0223
Episode 53: loss is 0.0187
Episode 53: loss is 0.0144
Episode 53: loss is 0.0114
Episode 53: loss is 0.0121
Episode 53: loss is 0.0122
Episode 53: loss is 0.0117
Episode 53: loss is 0.0117
Episode 53: loss is 0.0088
Episode 53: loss is 0.00755
Episode 53: loss is 0.00616
Episode 53: loss is 0.00519
Episode 53: loss is 0.00605
Episode 53: loss is 0.00598
Episode 53: loss is 0.00616
Episode 53: loss is 0.00591
Episode 53: loss is 0.00463
Episode 53: loss is 0.00402
Episode 53: loss is 0.0041
Episode 53: loss is 0.00429
Episode 53: loss is 0.00417
Episode 53: loss is 0.00443
Episode 53: loss is 0.00413
Episode 53: loss is 0.00376
Episode 53: loss is 0.00357
Episode 53: loss is 0.00347
Episode 53: loss is 0.00335
Episode 53: loss is 0.00349
Episode 53: loss is 0.00336
Episode 53: loss is 0.00324
Episode 53: loss is 0.00285
Episode 53: loss is 0.00225
Episode 53: loss is 0.00278
Episode 53: loss is 0.00223
Episode 53: loss is 0.00279
Episode 53: loss is 0.0026
...

Episode 58: loss is 0.0019
Episode 58: loss is 0.00199
Episode 58: loss is 0.00197
Episode 58: loss is 0.00204
Episode 58: loss is 0.00177
Episode 58: loss is 0.00194
Episode 58: loss is 0.00187
Episode 58: loss is 0.00167
Episode 59: loss is 0.00153
Episode 59: loss is 0.00159
Episode 59: loss is 0.00163
Episode 59: loss is 0.00128
Episode 59: loss is 0.00183
Episode 59: loss is 0.00164
Episode 59: loss is 0.00172
Episode 59: loss is 0.00154
Episode 59: loss is 0.00176
Episode 59: loss is 0.00148
Episode 59: loss is 0.00153
Episode 59: loss is 0.00126
Episode 59: loss is 0.00161
Episode 59: loss is 0.00146
Episode 59: loss is 0.00155
Episode 59: loss is 0.0014
Episode 59: loss is 0.0016
Episode 59: loss is 0.00154
Episode 59: loss is 0.00122
Episode 59: loss is 0.00115
Episode 59: loss is 0.00147
Episode 59: loss is 0.00124
Episode 59: loss is 0.00118
Episode 59: loss is 0.00103
Episode 59: loss is 0.00134
Episode 59: loss is 0.00106
Episode 59: loss is 0.00131
...

Episode 0: loss is 0.0392
Episode 0: loss is 0.0417
Episode 0: loss is 0.0338
Episode 0: loss is 0.0399
Episode 0: loss is 0.0393
Episode 0: loss is 0.0227
Episode 0: loss is 0.0319
Episode 0: loss is 0.0404
Episode 0: loss is 0.0306
Episode 1: loss is 1.04
Episode 1: loss is 0.886
Episode 1: loss is 0.702
Episode 1: loss is 0.509
Episode 1: loss is 0.321
Episode 1: loss is 0.197
Episode 1: loss is 0.111
Episode 1: loss is 0.0463
Episode 1: loss is 0.0392
Episode 1: loss is 0.0406
Episode 1: loss is 0.0608
Episode 1: loss is 0.0834
Episode 1: loss is 0.1
Episode 1: loss is 0.116
Episode 1: loss is 0.139
Episode 1: loss is 0.158
Episode 1: loss is 0.156
Episode 1: loss is 0.163
Episode 1: loss is 0.152
Episode 1: loss is 0.145
Episode 1: loss is 0.139
Episode 1: loss is 0.119
Episode 1: loss is 0.112
Episode 1: loss is 0.0884
Episode 1: loss is 0.0857
Episode 1: loss is 0.0673
Episode 1: loss is 0.054
Episode 1: loss is 0.0445
Episode 1: loss is 0.0349
Episode 1: loss is 0.0295
...

Episode 7: loss is 0.0275
Episode 7: loss is 0.0316
Episode 7: loss is 0.0402
Episode 7: loss is 0.0198
Episode 7: loss is 0.0344
Episode 7: loss is 0.0237
Episode 7: loss is 0.0204
Episode 7: loss is 0.0272
Episode 7: loss is 0.0193
Episode 7: loss is 0.0201
Episode 7: loss is 0.0289
Episode 7: loss is 0.0352
Episode 7: loss is 0.0295
Episode 7: loss is 0.0159
Episode 7: loss is 0.0151
Episode 7: loss is 0.0272
Episode 7: loss is 0.0221
Episode 7: loss is 0.0328
Episode 7: loss is 0.0293
Episode 7: loss is 0.025
Episode 7: loss is 0.025
Episode 7: loss is 0.0277
Episode 7: loss is 0.0237
Episode 7: loss is 0.0131
Episode 7: loss is 0.0207
Episode 7: loss is 0.0205
Episode 7: loss is 0.0288
Episode 7: loss is 0.0194
Episode 7: loss is 0.031
Episode 7: loss is 0.0247
Episode 7: loss is 0.0104
Episode 7: loss is 0.0208
Episode 7: loss is 0.0243
Episode 7: loss is 0.0203
Episode 7: loss is 0.0183
Episode 7: loss is 0.0266
Episode 7: loss is 0.0205
Episode 8: loss is 0.0191
...

Episode 13: loss is 0.00198
Episode 13: loss is 0.00163
Episode 13: loss is 0.0023
Episode 13: loss is 0.00222
Episode 13: loss is 0.00229
Episode 13: loss is 0.00196
Episode 13: loss is 0.00176
Episode 13: loss is 0.00172
Episode 13: loss is 0.00188
Episode 13: loss is 0.00188
Episode 13: loss is 0.00197
Episode 13: loss is 0.00202
Episode 13: loss is 0.00192
Episode 13: loss is 0.00164
Episode 13: loss is 0.00171
Episode 13: loss is 0.00169
Episode 13: loss is 0.00145
Episode 13: loss is 0.00178
Episode 13: loss is 0.00167
Episode 13: loss is 0.00153
Episode 13: loss is 0.00173
Episode 13: loss is 0.00173
Episode 13: loss is 0.00164
Episode 13: loss is 0.00152
Episode 13: loss is 0.00178
Episode 13: loss is 0.00141
Episode 13: loss is 0.00194
Episode 13: loss is 0.0016
Episode 13: loss is 0.00122
Episode 13: loss is 0.00164
Episode 14: loss is 0.00608
Episode 14: loss is 0.00608
Episode 14: loss is 0.00526
Episode 14: loss is 0.00568
Episode 14: loss is 0.00466
...

Episode 19: loss is 0.00272
Episode 19: loss is 0.00251
Episode 19: loss is 0.00224
Episode 19: loss is 0.00231
Episode 19: loss is 0.00277
Episode 19: loss is 0.00226
Episode 19: loss is 0.00188
Episode 19: loss is 0.00208
Episode 19: loss is 0.00269
Episode 19: loss is 0.00204
Episode 19: loss is 0.00196
Episode 19: loss is 0.002
Episode 19: loss is 0.00237
Episode 19: loss is 0.00196
Episode 19: loss is 0.00203
Episode 19: loss is 0.00199
Episode 19: loss is 0.00178
Episode 19: loss is 0.00209
Episode 19: loss is 0.00197
Episode 19: loss is 0.00195
Episode 19: loss is 0.00195
Episode 19: loss is 0.00208
Episode 19: loss is 0.00199
Episode 19: loss is 0.00179
Episode 19: loss is 0.00169
Episode 19: loss is 0.00183
Episode 19: loss is 0.00175
Episode 19: loss is 0.0018
Episode 19: loss is 0.00152
Episode 19: loss is 0.00159
Episode 19: loss is 0.00188
Episode 20: loss is 0.00595
Episode 20: loss is 0.00526
Episode 20: loss is 0.00475
Episode 20: loss is 0.0051
...

Episode 25: loss is 0.042
Episode 25: loss is 0.0385
Episode 25: loss is 0.0445
Episode 25: loss is 0.0405
Episode 25: loss is 0.0438
Episode 25: loss is 0.0382
Episode 25: loss is 0.0425
Episode 25: loss is 0.0433
Episode 25: loss is 0.0383
Episode 25: loss is 0.0367
Episode 25: loss is 0.0378
Episode 25: loss is 0.0341
Episode 25: loss is 0.033
Episode 25: loss is 0.0352
Episode 25: loss is 0.0302
Episode 25: loss is 0.0294
Episode 25: loss is 0.0264
Episode 25: loss is 0.0262
Episode 25: loss is 0.0216
Episode 25: loss is 0.0228
Episode 25: loss is 0.0207
Episode 25: loss is 0.0192
Episode 25: loss is 0.0167
Episode 26: loss is 1.02
Episode 26: loss is 1.02
Episode 26: loss is 0.955
Episode 26: loss is 0.936
Episode 26: loss is 0.856
Episode 26: loss is 0.779
Episode 26: loss is 0.713
Episode 26: loss is 0.634
Episode 26: loss is 0.57
Episode 26: loss is 0.488
Episode 26: loss is 0.434
Episode 26: loss is 0.36
Episode 26: loss is 0.32
Episode 26: loss is 0.259
...

Episode 31: loss is 0.0616
Episode 31: loss is 0.0453
Episode 31: loss is 0.0571
Episode 31: loss is 0.0633
Episode 31: loss is 0.0736
Episode 31: loss is 0.0684
Episode 31: loss is 0.0723
Episode 31: loss is 0.0537
Episode 31: loss is 0.0249
Episode 31: loss is 0.0406
Episode 31: loss is 0.0564
Episode 31: loss is 0.0365
Episode 32: loss is 1.06
Episode 32: loss is 0.997
Episode 32: loss is 0.958
Episode 32: loss is 0.883
Episode 32: loss is 0.807
Episode 32: loss is 0.697
Episode 32: loss is 0.589
Episode 32: loss is 0.534
Episode 32: loss is 0.459
Episode 32: loss is 0.396
Episode 32: loss is 0.314
Episode 32: loss is 0.265
Episode 32: loss is 0.219
Episode 32: loss is 0.18
Episode 32: loss is 0.158
Episode 32: loss is 0.13
Episode 32: loss is 0.106
Episode 32: loss is 0.0869
Episode 32: loss is 0.0639
Episode 32: loss is 0.059
Episode 32: loss is 0.0464
Episode 32: loss is 0.0462
Episode 32: loss is 0.0342
Episode 32: loss is 0.0283
Episode 32: loss is 0.0312
...

Episode 38: loss is 0.172
Episode 38: loss is 0.196
Episode 38: loss is 0.197
Episode 38: loss is 0.188
Episode 38: loss is 0.23
Episode 38: loss is 0.188
Episode 38: loss is 0.201
Episode 38: loss is 0.13
Episode 38: loss is 0.149
Episode 38: loss is 0.204
Episode 38: loss is 0.16
Episode 38: loss is 0.196
Episode 38: loss is 0.143
Episode 38: loss is 0.143
Episode 38: loss is 0.151
Episode 38: loss is 0.16
Episode 38: loss is 0.146
Episode 38: loss is 0.146
Episode 38: loss is 0.166
Episode 38: loss is 0.148
Episode 38: loss is 0.149
Episode 38: loss is 0.153
Episode 38: loss is 0.149
Episode 38: loss is 0.141
Episode 38: loss is 0.138
Episode 38: loss is 0.165
Episode 38: loss is 0.148
Episode 38: loss is 0.13
Episode 38: loss is 0.146
Episode 38: loss is 0.174
Episode 38: loss is 0.173
Episode 38: loss is 0.137
Episode 38: loss is 0.148
Episode 38: loss is 0.139
Episode 38: loss is 0.143
Episode 38: loss is 0.123
Episode 38: loss is 0.112
Episode 38: loss is 0.138
...

Episode 44: loss is 0.187
Episode 44: loss is 0.164
Episode 44: loss is 0.129
Episode 44: loss is 0.11
Episode 44: loss is 0.11
Episode 44: loss is 0.0797
Episode 44: loss is 0.0792
Episode 44: loss is 0.0675
Episode 44: loss is 0.0564
Episode 44: loss is 0.0493
Episode 44: loss is 0.046
Episode 44: loss is 0.0427
Episode 44: loss is 0.0376
Episode 44: loss is 0.0332
Episode 44: loss is 0.0367
Episode 44: loss is 0.0334
Episode 44: loss is 0.0273
Episode 44: loss is 0.0325
Episode 44: loss is 0.027
Episode 44: loss is 0.0252
Episode 44: loss is 0.0242
Episode 44: loss is 0.0253
Episode 44: loss is 0.0261
Episode 44: loss is 0.0194
Episode 44: loss is 0.0259
Episode 44: loss is 0.0259
Episode 44: loss is 0.0262
Episode 44: loss is 0.0265
Episode 44: loss is 0.0241
Episode 44: loss is 0.0252
Episode 44: loss is 0.0269
Episode 44: loss is 0.0256
Episode 44: loss is 0.0247
Episode 44: loss is 0.0232
Episode 44: loss is 0.0224
Episode 45: loss is 0.0212
Episode 45: loss is 0.0209
...

Episode 50: loss is 0.0685
Episode 50: loss is 0.0618
Episode 50: loss is 0.0492
Episode 50: loss is 0.0457
Episode 50: loss is 0.0385
Episode 50: loss is 0.0349
Episode 50: loss is 0.0376
Episode 50: loss is 0.0349
Episode 50: loss is 0.0262
Episode 50: loss is 0.0306
Episode 50: loss is 0.0314
Episode 50: loss is 0.0309
Episode 50: loss is 0.0319
Episode 50: loss is 0.0256
Episode 50: loss is 0.0259
Episode 50: loss is 0.0238
Episode 50: loss is 0.0237
Episode 50: loss is 0.0242
Episode 50: loss is 0.0262
Episode 50: loss is 0.0257
Episode 50: loss is 0.0237
Episode 50: loss is 0.0227
Episode 50: loss is 0.0234
Episode 51: loss is 1.07
Episode 51: loss is 1.03
Episode 51: loss is 1.05
Episode 51: loss is 1.03
Episode 51: loss is 1
Episode 51: loss is 0.985
Episode 51: loss is 0.942
Episode 51: loss is 0.965
Episode 51: loss is 0.895
Episode 51: loss is 0.877
Episode 51: loss is 0.81
Episode 51: loss is 0.776
Episode 51: loss is 0.763
Episode 51: loss is 0.688
Episode 51: loss is 0.66

Episode 56: loss is 0.124
Episode 56: loss is 0.113
Episode 56: loss is 0.0992
Episode 56: loss is 0.128
Episode 56: loss is 0.151
Episode 56: loss is 0.15
Episode 56: loss is 0.129
Episode 56: loss is 0.0941
Episode 57: loss is 0.101
Episode 57: loss is 0.13
Episode 57: loss is 0.0982
Episode 57: loss is 0.112
Episode 57: loss is 0.0896
Episode 57: loss is 0.102
Episode 57: loss is 0.101
Episode 57: loss is 0.0999
Episode 57: loss is 0.106
Episode 57: loss is 0.0923
Episode 57: loss is 0.0835
Episode 57: loss is 0.0982
Episode 57: loss is 0.0907
Episode 57: loss is 0.0883
Episode 57: loss is 0.105
Episode 57: loss is 0.0714
Episode 57: loss is 0.0844
Episode 57: loss is 0.0818
Episode 57: loss is 0.0938
Episode 57: loss is 0.1
Episode 57: loss is 0.102
Episode 57: loss is 0.101
Episode 57: loss is 0.0773
Episode 57: loss is 0.0959
Episode 57: loss is 0.0874
Episode 57: loss is 0.0939
Episode 57: loss is 0.0897
Episode 57: loss is 0.0734
Episode 57: loss is 0.104
...

Episode 63: loss is 0.127
Episode 63: loss is 0.144
Episode 63: loss is 0.166
Episode 63: loss is 0.126
Episode 63: loss is 0.108
Episode 63: loss is 0.194
Episode 63: loss is 0.166
Episode 63: loss is 0.194
Episode 63: loss is 0.112
Episode 63: loss is 0.144
Episode 63: loss is 0.165
Episode 63: loss is 0.16
Episode 63: loss is 0.129
Episode 63: loss is 0.0819
Episode 63: loss is 0.148
Episode 63: loss is 0.133
Episode 63: loss is 0.132
Episode 63: loss is 0.143
Episode 63: loss is 0.136
Episode 63: loss is 0.112
Episode 63: loss is 0.109
Episode 63: loss is 0.124
Episode 63: loss is 0.132
Episode 63: loss is 0.141
Episode 63: loss is 0.146
Episode 63: loss is 0.131
Episode 63: loss is 0.129
Episode 63: loss is 0.127
Episode 63: loss is 0.142
Episode 63: loss is 0.104
Episode 63: loss is 0.121
Episode 63: loss is 0.109
Episode 63: loss is 0.0897
Episode 63: loss is 0.138
Episode 63: loss is 0.155
Episode 63: loss is 0.134
Episode 63: loss is 0.117
Episode 63: loss is 0.12
...

Episode 5: loss is 0.171
Episode 5: loss is 0.19
Episode 5: loss is 0.145
Episode 5: loss is 0.0559
Episode 5: loss is 0.118
Episode 5: loss is 0.178
Episode 5: loss is 0.234
Episode 5: loss is 0.0964
Episode 5: loss is 0.205
Episode 5: loss is 0.13
Episode 5: loss is 0.174
Episode 5: loss is 0.112
Episode 5: loss is 0.146
Episode 5: loss is 0.131
Episode 5: loss is 0.162
Episode 6: loss is 0.463
Episode 6: loss is 0.518
Episode 6: loss is 0.368
Episode 6: loss is 0.524
Episode 6: loss is 0.43
Episode 6: loss is 0.4
Episode 6: loss is 0.545
Episode 6: loss is 0.43
Episode 6: loss is 0.41
Episode 6: loss is 0.447
Episode 6: loss is 0.438
Episode 6: loss is 0.326
Episode 6: loss is 0.347
Episode 6: loss is 0.29
Episode 6: loss is 0.423
Episode 6: loss is 0.43
Episode 6: loss is 0.4
Episode 6: loss is 0.387
Episode 6: loss is 0.312
Episode 6: loss is 0.376
Episode 6: loss is 0.452
Episode 6: loss is 0.461
Episode 6: loss is 0.285
Episode 6: loss is 0.438
Episode 6: loss is 0.393
...

Episode 12: loss is 0.165
Episode 12: loss is 0.0764
Episode 12: loss is 0.143
Episode 12: loss is 0.132
Episode 12: loss is 0.14
Episode 12: loss is 0.157
Episode 12: loss is 0.208
Episode 12: loss is 0.169
Episode 12: loss is 0.125
Episode 12: loss is 0.0913
Episode 12: loss is 0.0738
Episode 12: loss is 0.0703
Episode 12: loss is 0.126
Episode 12: loss is 0.149
Episode 12: loss is 0.0878
Episode 12: loss is 0.173
Episode 12: loss is 0.118
Episode 12: loss is 0.125
Episode 12: loss is 0.183
Episode 12: loss is 0.0767
Episode 12: loss is 0.105
Episode 12: loss is 0.179
Episode 12: loss is 0.1
Episode 12: loss is 0.103
Episode 12: loss is 0.0468
Episode 12: loss is 0.107
Episode 12: loss is 0.0692
Episode 12: loss is 0.109
Episode 12: loss is 0.165
Episode 12: loss is 0.103
Episode 12: loss is 0.157
Episode 12: loss is 0.0886
Episode 12: loss is 0.129
Episode 12: loss is 0.0923
Episode 12: loss is 0.0958
Episode 12: loss is 0.0788
Episode 12: loss is 0.193
Episode 12: loss is 0.106
...

Episode 18: loss is 0.159
Episode 18: loss is 0.181
Episode 18: loss is 0.188
Episode 18: loss is 0.171
Episode 18: loss is 0.175
Episode 18: loss is 0.173
Episode 18: loss is 0.161
Episode 18: loss is 0.132
Episode 18: loss is 0.0923
Episode 18: loss is 0.0914
Episode 18: loss is 0.156
Episode 18: loss is 0.177
Episode 18: loss is 0.128
Episode 18: loss is 0.112
Episode 18: loss is 0.154
Episode 18: loss is 0.181
Episode 18: loss is 0.197
Episode 18: loss is 0.0563
Episode 18: loss is 0.186
Episode 18: loss is 0.226
Episode 19: loss is 0.275
Episode 19: loss is 0.301
Episode 19: loss is 0.294
Episode 19: loss is 0.161
Episode 19: loss is 0.214
Episode 19: loss is 0.225
Episode 19: loss is 0.296
Episode 19: loss is 0.332
Episode 19: loss is 0.205
Episode 19: loss is 0.172
Episode 19: loss is 0.243
Episode 19: loss is 0.247
Episode 19: loss is 0.212
Episode 19: loss is 0.212
Episode 19: loss is 0.248
Episode 19: loss is 0.253
Episode 19: loss is 0.321
Episode 19: loss is 0.286
...

Episode 24: loss is 0.271
Episode 24: loss is 0.21
Episode 25: loss is 0.184
Episode 25: loss is 0.201
Episode 25: loss is 0.179
Episode 25: loss is 0.169
Episode 25: loss is 0.202
Episode 25: loss is 0.156
Episode 25: loss is 0.197
Episode 25: loss is 0.202
Episode 25: loss is 0.207
Episode 25: loss is 0.174
Episode 25: loss is 0.193
Episode 25: loss is 0.158
Episode 25: loss is 0.201
Episode 25: loss is 0.19
Episode 25: loss is 0.173
Episode 25: loss is 0.164
Episode 25: loss is 0.16
Episode 25: loss is 0.153
Episode 25: loss is 0.211
Episode 25: loss is 0.152
Episode 25: loss is 0.176
Episode 25: loss is 0.175
Episode 25: loss is 0.181
Episode 25: loss is 0.173
Episode 25: loss is 0.187
Episode 25: loss is 0.168
Episode 25: loss is 0.187
Episode 25: loss is 0.151
Episode 25: loss is 0.162
Episode 25: loss is 0.161
Episode 25: loss is 0.185
Episode 25: loss is 0.174
Episode 25: loss is 0.167
Episode 25: loss is 0.174
Episode 25: loss is 0.169
Episode 25: loss is 0.201
...

Episode 31: loss is 0.838
Episode 31: loss is 0.819
Episode 31: loss is 0.744
Episode 31: loss is 0.698
Episode 31: loss is 0.71
Episode 31: loss is 0.64
Episode 31: loss is 0.602
Episode 31: loss is 0.528
Episode 31: loss is 0.528
Episode 31: loss is 0.515
Episode 31: loss is 0.497
Episode 31: loss is 0.418
Episode 31: loss is 0.408
Episode 31: loss is 0.459
Episode 31: loss is 0.353
Episode 31: loss is 0.393
Episode 31: loss is 0.374
Episode 31: loss is 0.408
Episode 31: loss is 0.365
Episode 31: loss is 0.244
Episode 31: loss is 0.311
Episode 31: loss is 0.308
Episode 31: loss is 0.324
Episode 31: loss is 0.32
Episode 31: loss is 0.325
Episode 31: loss is 0.343
Episode 31: loss is 0.373
Episode 31: loss is 0.348
Episode 31: loss is 0.324
Episode 31: loss is 0.237
Episode 31: loss is 0.352
Episode 32: loss is 1.21
Episode 32: loss is 1.3
Episode 32: loss is 1.19
Episode 32: loss is 1.27
Episode 32: loss is 1.19
Episode 32: loss is 1.18
Episode 32: loss is 1.25
Episode 32: loss is 1.1

Episode 37: loss is 0.18
Episode 37: loss is 0.173
Episode 37: loss is 0.126
Episode 37: loss is 0.125
Episode 37: loss is 0.176
Episode 37: loss is 0.193
Episode 37: loss is 0.161
Episode 37: loss is 0.139
Episode 37: loss is 0.163
Episode 37: loss is 0.193
Episode 37: loss is 0.197
Episode 38: loss is 0.116
Episode 38: loss is 0.157
Episode 38: loss is 0.12
Episode 38: loss is 0.122
Episode 38: loss is 0.165
Episode 38: loss is 0.131
Episode 38: loss is 0.118
Episode 38: loss is 0.135
Episode 38: loss is 0.133
Episode 38: loss is 0.135
Episode 38: loss is 0.115
Episode 38: loss is 0.117
Episode 38: loss is 0.133
Episode 38: loss is 0.179
Episode 38: loss is 0.132
Episode 38: loss is 0.129
Episode 38: loss is 0.113
Episode 38: loss is 0.13
Episode 38: loss is 0.14
Episode 38: loss is 0.15
Episode 38: loss is 0.124
Episode 38: loss is 0.138
Episode 38: loss is 0.15
Episode 38: loss is 0.158
Episode 38: loss is 0.164
Episode 38: loss is 0.137
Episode 38: loss is 0.127
...

Episode 44: loss is 0.173
Episode 44: loss is 0.175
Episode 44: loss is 0.192
Episode 44: loss is 0.176
Episode 44: loss is 0.143
Episode 44: loss is 0.184
Episode 44: loss is 0.193
Episode 44: loss is 0.169
Episode 44: loss is 0.194
Episode 44: loss is 0.191
Episode 44: loss is 0.187
Episode 44: loss is 0.146
Episode 44: loss is 0.177
Episode 44: loss is 0.191
Episode 44: loss is 0.186
Episode 44: loss is 0.167
Episode 44: loss is 0.135
Episode 44: loss is 0.146
Episode 44: loss is 0.161
Episode 44: loss is 0.118
Episode 44: loss is 0.147
Episode 44: loss is 0.17
Episode 44: loss is 0.158
Episode 44: loss is 0.128
Episode 44: loss is 0.187
Episode 44: loss is 0.204
Episode 44: loss is 0.181
Episode 44: loss is 0.187
Episode 44: loss is 0.167
Episode 44: loss is 0.169
Episode 44: loss is 0.109
Episode 44: loss is 0.153
Episode 44: loss is 0.158
Episode 44: loss is 0.2
Episode 44: loss is 0.153
Episode 44: loss is 0.173
Episode 44: loss is 0.173
Episode 44: loss is 0.157
...

Episode 50: loss is 0.192
Episode 50: loss is 0.205
Episode 50: loss is 0.196
Episode 50: loss is 0.121
Episode 50: loss is 0.107
Episode 50: loss is 0.19
Episode 50: loss is 0.132
Episode 50: loss is 0.136
Episode 50: loss is 0.0986
Episode 50: loss is 0.148
Episode 50: loss is 0.165
Episode 50: loss is 0.143
Episode 50: loss is 0.211
Episode 50: loss is 0.147
Episode 50: loss is 0.16
Episode 50: loss is 0.13
Episode 50: loss is 0.166
Episode 50: loss is 0.16
Episode 50: loss is 0.114
Episode 50: loss is 0.151
Episode 51: loss is 0.133
Episode 51: loss is 0.141
Episode 51: loss is 0.139
Episode 51: loss is 0.131
Episode 51: loss is 0.134
Episode 51: loss is 0.161
Episode 51: loss is 0.158
Episode 51: loss is 0.157
Episode 51: loss is 0.145
Episode 51: loss is 0.136
Episode 51: loss is 0.16
Episode 51: loss is 0.155
Episode 51: loss is 0.177
Episode 51: loss is 0.141
Episode 51: loss is 0.137
Episode 51: loss is 0.149
Episode 51: loss is 0.142
Episode 51: loss is 0.114
...

Episode 57: loss is 1.26
Episode 57: loss is 1.22
Episode 57: loss is 1.21
Episode 57: loss is 1.19
Episode 57: loss is 1.22
Episode 57: loss is 1.08
Episode 57: loss is 1.09
Episode 57: loss is 0.958
Episode 57: loss is 1.07
Episode 57: loss is 0.947
Episode 57: loss is 0.848
Episode 57: loss is 0.815
Episode 57: loss is 0.825
Episode 57: loss is 0.808
Episode 57: loss is 0.705
Episode 57: loss is 0.685
Episode 57: loss is 0.617
Episode 57: loss is 0.511
Episode 57: loss is 0.466
Episode 57: loss is 0.465
Episode 57: loss is 0.415
Episode 57: loss is 0.384
Episode 57: loss is 0.337
Episode 57: loss is 0.36
Episode 57: loss is 0.334
Episode 57: loss is 0.264
Episode 57: loss is 0.244
Episode 57: loss is 0.244
Episode 57: loss is 0.261
Episode 57: loss is 0.213
Episode 57: loss is 0.25
Episode 57: loss is 0.237
Episode 57: loss is 0.134
Episode 57: loss is 0.146
Episode 57: loss is 0.179
Episode 57: loss is 0.177
Episode 57: loss is 0.18
Episode 57: loss is 0.207
Episode 57: loss is 0.1

Episode 63: loss is 0.628
Episode 63: loss is 0.433
Episode 63: loss is 0.381
Episode 63: loss is 0.494
Episode 63: loss is 0.491
Episode 63: loss is 0.362
Episode 63: loss is 0.279
Episode 63: loss is 0.349
Episode 63: loss is 0.322
Episode 63: loss is 0.247
Episode 63: loss is 0.28
Episode 63: loss is 0.297
Episode 63: loss is 0.309
Episode 63: loss is 0.291
Episode 63: loss is 0.311
Episode 63: loss is 0.247
Episode 63: loss is 0.232
Episode 63: loss is 0.215
Episode 63: loss is 0.27
Episode 63: loss is 0.227
Episode 63: loss is 0.223
Episode 63: loss is 0.177
Episode 63: loss is 0.22
Episode 63: loss is 0.161
Episode 63: loss is 0.201
Episode 63: loss is 0.224
Episode 63: loss is 0.195
Episode 63: loss is 0.229
Episode 63: loss is 0.178
Episode 63: loss is 0.182
Episode 63: loss is 0.205
Episode 0: loss is 0.957
Episode 0: loss is 0.942
Episode 0: loss is 0.917
Episode 0: loss is 0.922
Episode 0: loss is 0.933
Episode 0: loss is 0.915
Episode 0: loss is 0.894
Episode 0: loss is 0.8
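
The loss logged above is presumably a regression loss between the network output and the targets fed through `y`. As a sketch of how such targets are usually formed in DQN, and not necessarily what `train_Q` does internally, the one-step target is the reward plus the discounted best next-state value, with no bootstrapping on terminal transitions. The function `td_targets` and the sample numbers are hypothetical.

```python
import numpy as np


def td_targets(rewards, next_q, done, discount=0.99):
    """One-step targets: r + discount * max_a' Q(s', a'), no bootstrap at terminal steps."""
    rewards = np.asarray(rewards, dtype=np.float32)
    done = np.asarray(done, dtype=np.float32)
    best_next = np.max(next_q, axis=1)
    return (rewards + discount * (1.0 - done) * best_next).reshape(-1, 1)


rewards = [-1.0, 0.0]                                  # made-up transitions
next_q = np.array([[-1.0, -2.0, -0.5],
                   [0.3, 0.1, 0.2]], dtype=np.float32)
done = [False, True]
targets = td_targets(rewards, next_q, done)            # shape (2, 1), like `y`
```

The reshape to a column matches the `(None, 1)` shape of the `y` placeholder defined earlier.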

In [None]:
plt.figure()
plt.plot(losses)
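
The raw loss values fluctuate a lot from step to step, so a smoothed curve can be easier to read. The helper below is a small sketch that assumes `losses` is a flat sequence of floats, which this notebook does not confirm. `moving_average` is a hypothetical name and the demo numbers are made up.

```python
import numpy as np


def moving_average(values, window=50):
    """Smooth a 1-D sequence with a sliding mean of `window` points."""
    values = np.asarray(values, dtype=np.float64)
    window = max(1, min(window, len(values)))
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode='valid')


demo_losses = [0.4, 0.2, 0.3, 0.1, 0.2, 0.0]   # made-up numbers for illustration
smoothed = moving_average(demo_losses, window=3)
```

If the assumption about `losses` holds, `plt.plot(moving_average(losses))` followed by `plt.yscale('log')` would show the trend across several orders of magnitude.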

## Test the DQN

In [None]:
s_0 = agent._sample_state()
goal = agent._sample_state()
print('Initial state: {0}\nGoal state: {1}'.format(s_0, goal))

env = bf(s_0, goal, n)

In [None]:
with tf.Session() as sess:
    # Restore the checkpoint stored at /tmp/model.ckpt.
    saver = tf.train.Saver()
    saver.restore(sess, '/tmp/model.ckpt')

    # Query the agent for n steps, printing the state before and after each action.
    for i in range(n):
        print(env.state)
        _, action = agent.V_value(sess, env.state, goal, x)
        env.update_state(action)
        print(env.state)
        print('reward:{}'.format(env.reward(env.state)))
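
A single rollout says little about how well the agent generalizes, so it can help to measure the success rate over many random start and goal pairs. The sketch below is self-contained and uses a hand-written `oracle_policy` as a stand-in for the trained agent. All names here are hypothetical, and the bit-flip dynamics are re-implemented inline rather than taken from the `bitflipping` module.

```python
import numpy as np


def run_episode(policy, start, goal, max_steps):
    """Return True if `policy` reaches `goal` from `start` within `max_steps` flips."""
    state = np.array(start, dtype=np.int64)
    goal = np.array(goal, dtype=np.int64)
    for _ in range(max_steps):
        if np.array_equal(state, goal):
            return True
        action = policy(state, goal)
        state[action] = 1 - state[action]
    return np.array_equal(state, goal)


def success_rate(policy, n, trials=200, seed=0):
    """Fraction of random (start, goal) pairs solved within n steps."""
    rng = np.random.default_rng(seed)
    solved = 0
    for _ in range(trials):
        start = rng.integers(0, 2, size=n)
        goal = rng.integers(0, 2, size=n)
        solved += run_episode(policy, start, goal, max_steps=n)
    return solved / trials


def oracle_policy(state, goal):
    # Hand-written baseline: flip the first bit that differs from the goal.
    return int(np.argmax(state != goal))


rate = success_rate(oracle_policy, n=3)
```

To evaluate the trained network instead, the policy could wrap the call from the cell above, for example `lambda s, g: agent.V_value(sess, s, g, x)[1]`, assuming the second return value is the action index as that cell suggests.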
        
        