# Bit flipping game with DQN solver

This notebook implements a DQN solver for the bit flipping game described in [**Hindsight Experience Replay**](https://arxiv.org/abs/1707.01495).

**Reference**:

1. Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, Wojciech Zaremba. Hindsight Experience Replay. arXiv:1707.01495.


In [1]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from bitflipping import bitflipping as bf
from DQN import DQN

plt.rcParams['figure.figsize'] = [15, 20]
%matplotlib inline

## Set up the bit flipping game environment

In [2]:
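# Example 2-bit state and goal vectors; the cells below do not use them.
init_state = np.array([0,1])
goal = np.ones((2,))
# Number of bits in the game (also the number of actions, one per bit).
n = 4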
bf_env = bf(n)
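
The `bitflipping` module is not shown in this notebook. As a rough guide to what the environment does, here is a minimal sketch that mirrors the attributes and methods used in the test cell (`state`, `goal`, `reset`, `update_state`, `reward`). The class name `BitFlipEnv`, the `seed` argument, and the internals are assumptions for illustration; the real module may differ. The reward follows the sparse scheme from the paper: 0 when the state equals the goal, -1 otherwise.

```python
import numpy as np


class BitFlipEnv:
    """Hypothetical stand-in for the imported `bitflipping` class."""

    def __init__(self, n, seed=None):
        self.n = n
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        # Sample a random start state and a random goal, each with n bits.
        self.state = self.rng.integers(0, 2, size=self.n)
        self.goal = self.rng.integers(0, 2, size=self.n)
        return self.state

    def update_state(self, action):
        # Action k flips bit k of the current state.
        self.state = self.state.copy()
        self.state[action] = 1 - self.state[action]
        return self.state

    def reward(self, state):
        # Sparse reward: 0 when the state matches the goal, -1 otherwise.
        return 0 if np.array_equal(state, self.goal) else -1
```

With `n` bits there are `2**n` possible states, and a random policy almost never sees a reward of 0 once `n` grows, which is the difficulty hindsight replay is meant to address.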

## Build up the DQN neural network

In [3]:
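tf.reset_default_graph()

# Network input: the current state concatenated with the goal (2*n values).
x = tf.placeholder(tf.float32, shape=(None, 2*n))
# Placeholder for the training targets passed to train_Q.
y = tf.placeholder(tf.float32, shape=(None, 1))

# Hidden layer configuration passed to the DQN constructor.
hid = [256]
agent = DQN(x, hid, n, discount=0.98, eps=1, tau=0.95, replay_buffer_size=1e5, batch_size=32)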
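
The `DQN` class is imported from a module that is not shown here, so how it fills its replay buffer is not visible. The sketch below illustrates the hindsight relabelling idea from the paper using the "final" strategy: each transition is stored once with the original goal and once with the goal replaced by the state the episode actually ended in. The function names `reward_fn` and `her_relabel` and the tuple layout are assumptions for illustration, not the API of the `DQN` class.

```python
import numpy as np


def reward_fn(state, goal):
    # Sparse reward: 0 on reaching the goal, -1 otherwise.
    return 0.0 if np.array_equal(state, goal) else -1.0


def her_relabel(episode, goal):
    """Return replay transitions for one episode, with hindsight copies.

    episode: list of (state, action, next_state) tuples (hypothetical layout).
    goal: the goal the agent was originally trying to reach.
    """
    achieved = episode[-1][2]  # final state reached, used as a substitute goal
    transitions = []
    for state, action, next_state in episode:
        for g in (goal, achieved):
            # Network inputs are state and goal concatenated (length 2*n).
            x = np.concatenate([state, g])
            x_next = np.concatenate([next_state, g])
            r = reward_fn(next_state, g)
            done = (r == 0.0)
            transitions.append((x, action, r, x_next, done))
    return transitions
```

The relabelled copy of the last transition always has reward 0, so every episode contributes at least one successful example even when the original goal was never reached.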

In [None]:
losses, success_all = agent.train_Q(x, y, epoch=20, cycles=50, episode=16, iteration=50)

Epoch 0 Cycle 0: loss is 0.0358
Epoch 0 Cycle 1: loss is 0.0301
Epoch 0 Cycle 2: loss is 0.0571
Epoch 0 Cycle 3: loss is 0.0441
Epoch 0 Cycle 4: loss is 0.00443
Epoch 0 Cycle 5: loss is 0.0644
Epoch 0 Cycle 6: loss is 0.052
Epoch 0 Cycle 7: loss is 0.0547
Epoch 0 Cycle 8: loss is 0.0744
Epoch 0 Cycle 9: loss is 0.0406
Epoch 0 Cycle 10: loss is 0.0431
Epoch 0 Cycle 11: loss is 0.0661
Epoch 0 Cycle 12: loss is 0.0679
Epoch 0 Cycle 13: loss is 0.0959
Epoch 0 Cycle 14: loss is 0.00222
Epoch 0 Cycle 15: loss is 0.127
Epoch 0 Cycle 16: loss is 0.0563
Epoch 0 Cycle 17: loss is 0.0816
Epoch 0 Cycle 18: loss is 0.0299
Epoch 0 Cycle 19: loss is 0.0865
Epoch 0 Cycle 20: loss is 0.0628
Epoch 0 Cycle 21: loss is 0.0325
Epoch 0 Cycle 22: loss is 0.0955
Epoch 0 Cycle 23: loss is 0.0986
Epoch 0 Cycle 24: loss is 0.104
Epoch 0 Cycle 25: loss is 0.0729
Epoch 0 Cycle 26: loss is 0.0766
Epoch 0 Cycle 27: loss is 0.147
Epoch 0 Cycle 28: loss is 0.0785
Epoch 0 Cycle 29: loss is 0.0802
Epoch 0 Cycle 30: loss

Epoch 5 Cycle 9: loss is 0.0125
Epoch 5 Cycle 10: loss is 2.3
Epoch 5 Cycle 11: loss is 2.02
Epoch 5 Cycle 12: loss is 0.6
Epoch 5 Cycle 13: loss is 1.17
Epoch 5 Cycle 14: loss is 1.18
Epoch 5 Cycle 15: loss is 1.77
Epoch 5 Cycle 16: loss is 0.595
Epoch 5 Cycle 17: loss is 0.603
Epoch 5 Cycle 18: loss is 1.49
Epoch 5 Cycle 19: loss is 1.2
Epoch 5 Cycle 20: loss is 1.5
Epoch 5 Cycle 21: loss is 1.2
Epoch 5 Cycle 22: loss is 2.1
Epoch 5 Cycle 23: loss is 1.21
Epoch 5 Cycle 24: loss is 0.618
Epoch 5 Cycle 25: loss is 0.613
Epoch 5 Cycle 26: loss is 1.23
Epoch 5 Cycle 27: loss is 0.31
Epoch 5 Cycle 28: loss is 2.44
Epoch 5 Cycle 29: loss is 2.15
Epoch 5 Cycle 30: loss is 1.85
Epoch 5 Cycle 31: loss is 1.86
Epoch 5 Cycle 32: loss is 1.86
Epoch 5 Cycle 33: loss is 0.942
Epoch 5 Cycle 34: loss is 1.87
Epoch 5 Cycle 35: loss is 1.26
Epoch 5 Cycle 36: loss is 0.947
Epoch 5 Cycle 37: loss is 3.14
Epoch 5 Cycle 38: loss is 2.52
Epoch 5 Cycle 39: loss is 2.21
Epoch 5 Cycle 40: loss is 1.28
Epoch 5

Epoch 10 Cycle 25: loss is 1.52
Epoch 10 Cycle 26: loss is 3.51
Epoch 10 Cycle 27: loss is 3.53
Epoch 10 Cycle 28: loss is 3.53
Epoch 10 Cycle 29: loss is 3.03
Epoch 10 Cycle 30: loss is 2.04
Epoch 10 Cycle 31: loss is 3.54
Epoch 10 Cycle 32: loss is 2.03
Epoch 10 Cycle 33: loss is 5.58
Epoch 10 Cycle 34: loss is 2.54
Epoch 10 Cycle 35: loss is 2.07
Epoch 10 Cycle 36: loss is 3.06
Epoch 10 Cycle 37: loss is 3.59
Epoch 10 Cycle 38: loss is 3.57
Epoch 10 Cycle 39: loss is 3.07
Epoch 10 Cycle 40: loss is 0.533
Epoch 10 Cycle 41: loss is 1.54
Epoch 10 Cycle 42: loss is 4.1
Epoch 10 Cycle 43: loss is 2.59
Epoch 10 Cycle 44: loss is 4.62
Epoch 10 Cycle 45: loss is 3.6
Epoch 10 Cycle 46: loss is 2.07
Epoch 10 Cycle 47: loss is 1.57
Epoch 10 Cycle 48: loss is 4.63
Epoch 10 Cycle 49: loss is 3.62
Epoch 11 Cycle 0: loss is 2.1
Epoch 11 Cycle 1: loss is 3.11
Epoch 11 Cycle 2: loss is 2.09
Epoch 11 Cycle 3: loss is 3.12
Epoch 11 Cycle 4: loss is 3.13
Epoch 11 Cycle 5: loss is 1.05
Epoch 11 Cycle 6

Epoch 15 Cycle 34: loss is 2.62
Epoch 15 Cycle 35: loss is 2.62
Epoch 15 Cycle 36: loss is 4.57
Epoch 15 Cycle 37: loss is 3.92
Epoch 15 Cycle 38: loss is 3.93
Epoch 15 Cycle 39: loss is 3.28


In [None]:
plt.figure()
plt.plot(losses)
plt.show()
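
The per-cycle losses printed above fluctuate a lot, so the raw curve can be hard to read. One option is to plot a moving average alongside it. The helper below is a small sketch; the name `moving_average` and the default window of 50 are arbitrary choices, and it assumes `losses` is a non-empty flat sequence of scalar values, which this notebook does not confirm.

```python
import numpy as np


def moving_average(values, window=50):
    """Smooth a non-empty 1-D sequence with a simple box filter."""
    values = np.asarray(values, dtype=float)
    # Clamp the window so it is at least 1 and no longer than the data.
    window = max(1, min(window, len(values)))
    kernel = np.ones(window) / window
    # 'valid' keeps only positions where the window fully overlaps the data.
    return np.convolve(values, kernel, mode='valid')
```

If the assumption about `losses` holds, `plt.plot(moving_average(losses))` would draw the smoothed curve.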

## Test DQN

In [None]:
with tf.Session() as sess:
    # Restore the saved weights from the checkpoint.
    saver = tf.train.Saver()
    saver.restore(sess, '/tmp/model.ckpt')

    n_test = 100
    success = 0
    for episode in range(n_test):
        bf_env.reset()

        # Allow at most n greedy steps per episode.
        for step in range(n):
            # Network input: state and goal concatenated into one row.
            X = np.concatenate((bf_env.state.reshape((1,-1)), bf_env.goal.reshape((1,-1))), axis=1)
            Q = sess.run(agent.targetModel, feed_dict={x: X})
            action = np.argmax(Q)
            bf_env.update_state(action)
            if bf_env.reward(bf_env.state) == 0:
                print('Success! state:{0}\t Goal state:{1}'.format(bf_env.state, bf_env.goal))
                success += 1
                break
            elif step == n-1:
                print('Fail! state:{0}\t Goal state:{1}'.format(bf_env.state, bf_env.goal))

    print('Success rate {}%'.format(100 * success / n_test))

In [None]:
a=np.array([[1,2,3,2,1,3]])

In [None]:
a.shape

In [None]:
s=np.argmax(a)