# Bit flipping game with DQN solver

This notebook implements a DQN solver for the bit-flipping game described in [**Hindsight Experience Replay**](https://arxiv.org/abs/1707.01495) [1].

**Reference**:

1. Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, Wojciech Zaremba. *Hindsight Experience Replay*. arXiv:1707.01495, 2017.
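For reference, the paper defines the game for an integer $n$ as follows. A state is an $n$-bit string, executing action $i$ flips the $i$-th bit, and the agent receives a reward of $-1$ as long as it is not in the goal state $g$ ($[\cdot]$ is the Iverson bracket):

$$
\mathcal{S} = \{0,1\}^n, \qquad \mathcal{A} = \{0, 1, \dots, n-1\}, \qquad r_g(s, a) = -[\,s \neq g\,]
$$

The paper samples the initial state and the goal uniformly for every episode; this notebook instead fixes both in the environment setup.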


```python
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from bitflipping import bitflipping as bf  # bit-flipping game environment
from DQN import DQN                        # DQN agent

%matplotlib inline
# Set the figure size after enabling the inline backend so the backend cannot override it
plt.rcParams['figure.figsize'] = [10, 12]
```


## Set up the bit flipping game environment

```python
n = 2                          # number of bits
init_state = np.array([0, 1])  # starting bit string
goal = np.ones((n,))           # target bit string: all ones
bf_env = bf(init_state, goal, n)
```
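The source of the `bitflipping` module is not shown here, so the following is only a minimal sketch of what such an environment might look like, assuming the paper's rules: action `i` flips bit `i`, and the reward is 0 when the state *after* the flip equals the goal and -1 otherwise (evaluating the reward on the post-flip state is an assumption). `BitFlipEnv`, `reset` and `step` are hypothetical names, not the API of `bf`.

```python
import numpy as np


class BitFlipEnv:
    """Hypothetical minimal bit-flipping environment (not the notebook's `bitflipping` module).

    State and goal are length-n binary vectors; action i flips bit i.
    Assumption: reward is 0 when the post-flip state equals the goal, otherwise -1.
    """

    def __init__(self, init_state, goal):
        self.init_state = np.asarray(init_state, dtype=np.int64).copy()
        self.goal = np.asarray(goal, dtype=np.int64).copy()
        if self.init_state.shape != self.goal.shape:
            raise ValueError("state and goal must have the same length")
        self.n = self.goal.size
        self.reset()

    def reset(self):
        """Restore the initial state and return a copy of it."""
        self.state = self.init_state.copy()
        return self.state.copy()

    def step(self, action):
        """Flip bit `action`; return (next_state, reward, done)."""
        if not 0 <= action < self.n:
            raise ValueError("action must be a bit index in [0, n)")
        self.state[action] ^= 1  # flip the chosen bit
        done = bool(np.array_equal(self.state, self.goal))
        reward = 0.0 if done else -1.0
        return self.state.copy(), reward, done


# Same instance as above: start at [0, 1], goal [1, 1]
env = BitFlipEnv(init_state=[0, 1], goal=np.ones((2,)))
next_state, reward, done = env.step(0)  # flip bit 0: [0, 1] -> [1, 1]
print(next_state, reward, done)  # [1 1] 0.0 True
```

Since the goal is never more than $n$ flips away (the Hamming distance between two $n$-bit strings is at most $n$), a horizon of `T = n` steps per episode is enough for an optimal policy to reach it.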

## Build the DQN network

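```python
tf.reset_default_graph()  # start from a fresh TF 1.x graph

# Placeholders: a batch of (n + 1)-dimensional inputs and one scalar target per row
x = tf.placeholder(tf.float32, shape=(None, n + 1))
y = tf.placeholder(tf.float32, shape=(None, 1))

hid = [256]  # hidden-layer sizes: a single layer of 256 units
agent = DQN(x, hid, n, discount=0.99, eps=0.1, annealing=0.9, replay_buffer_size=100)
```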
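The `DQN` class comes from a module whose internals are not shown. As a rough, hypothetical sketch of what its constructor arguments typically control in DQN: `replay_buffer_size` bounds a FIFO buffer of transitions, `discount` is the $\gamma$ in the one-step target $y = r + \gamma \max_{a'} Q(s', a')$ (with no bootstrapping at terminal steps), and `eps` is the exploration rate of an ε-greedy policy. Treating `annealing` as a multiplicative decay of `eps` once per episode is an assumption. `ReplayBuffer`, `td_targets` and `epsilon_greedy` are illustrative names, not the notebook's API.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Hypothetical FIFO replay buffer: once full, the oldest transition is dropped."""

    def __init__(self, capacity=100):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement, capped at the current buffer size
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)


def td_targets(rewards, next_q_values, dones, discount=0.99):
    """One-step targets y = r + discount * max_a' Q(s', a'), with no bootstrap at terminal steps."""
    rewards = np.asarray(rewards, dtype=np.float64)
    next_q_values = np.asarray(next_q_values, dtype=np.float64)  # shape (batch, n_actions)
    not_done = 1.0 - np.asarray(dones, dtype=np.float64)
    return rewards + discount * not_done * next_q_values.max(axis=1)


def epsilon_greedy(q_values, eps, rng):
    """With probability eps pick a uniformly random action, otherwise the greedy one."""
    if rng.random() < eps:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


# Usage with the notebook's hyperparameter values
buffer = ReplayBuffer(capacity=100)
for t in range(150):
    buffer.add(np.array([0, 1]), 0, 0.0, np.array([1, 1]), True)
print(len(buffer))  # 100: the 50 oldest transitions were evicted

targets = td_targets(rewards=[-1.0, 0.0],
                     next_q_values=[[-0.5, -2.0], [3.0, 1.0]],
                     dones=[False, True],
                     discount=0.99)
# First row: -1.0 + 0.99 * max(-0.5, -2.0) = -1.495; the second row is terminal, so 0.0

rng = np.random.default_rng(0)
eps, annealing = 0.1, 0.9
for episode in range(5):
    action = epsilon_greedy(np.array([-1.0, 0.0]), eps, rng)
    eps *= annealing  # assumption: multiplicative decay once per episode
```

A bounded buffer keeps only recent experience, which matters here because `replay_buffer_size=100` is small relative to a long training run.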

```python
# Train for 5 episodes with a horizon of T = n steps per episode
losses = agent.train_Q(x, y, episode=5, T=n)
```

[[ 1.  1.  1.  1.  0. -1.]]
Episode 0: loss is 1.29
[[ 1.  0.  1.  1.  1.  0.]
 [ 1.  1.  1.  1.  0. -1.]]
Episode 0: loss is 0.345
[[ 1.  1.  1.  1.  0. -1.]
 [ 0.  1.  1.  0.  0. -1.]
 [ 1.  0.  1.  1.  1.  0.]]
Episode 1: loss is 0.442
[[ 1.  0.  1.  1.  1.  0.]
 [ 0.  1.  1.  0.  0. -1.]
 [ 0.  0.  0.  1.  0.  0.]
 [ 1.  1.  1.  1.  0. -1.]]
Episode 1: loss is 0.279
[[ 1.  0.  1.  1.  1.  0.]
 [ 0.  0.  0.  1.  0.  0.]
 [ 0.  0.  0.  1.  0. -1.]
 [ 0.  1.  1.  0.  0. -1.]
 [ 1.  1.  1.  1.  0. -1.]]
Episode 2: loss is 0.494
[[ 1.  1.  1.  1.  0. -1.]
 [ 0.  1.  1.  0.  0. -1.]
 [ 0.  0.  0.  1.  0.  0.]
 [ 0.  0.  0.  1.  0. -1.]
 [ 1.  0.  0.  0.  0. -1.]
 [ 1.  0.  1.  1.  1.  0.]]
Episode 2: loss is 0.524
[[ 0.  1.  1.  0.  0. -1.]
 [ 1.  0.  1.  1.  1.  0.]
 [ 1.  0.  0.  0.  0. -1.]
 [ 0.  0.  0.  1.  0. -1.]
 [ 0.  0.  0.  1.  0.  0.]
 [ 1.  1.  1.  1.  0. -1.]
 [ 1.  0.  0.  0.  0. -1.]]
Episode 3: loss is 0.407
[[ 0.  0.  0.  1.  0. -1.]
 [ 0.  0.  0.  1.  0.  0.]
 [ 1.  0.