Name	Name	Last commit message	Last commit date
parent directory ..
README.md	README.md
network.py	network.py
nn-gridworld.py	nn-gridworld.py

NN-Gridworld Experiment

In comparison to the Q-Gridworld experiment, we're now going train an agent to play Gridworld by utilizing a Neural Network (NN) instead of storing all Q-values in a table. By sending the state of the game as an input, the network will be trained to estimate the Q-values of that state.

The state of the game is defined as three zero-filled matricies of the field (3 x [field_size, field_size]) where the coordinate of the actor, pit and the goal is marked by setting that element of the matrix to 1.

At each step, the flattened state s (defined as a single row of size 3 * field_size * field_size) is fed into the Neural Network. The network then outputs a row of Q-values, each representing the values of the actions in the state s. The agent then performs an action according to its policy π(s) = max(Q(s,a)) or another random action given the exploration rate ε.

Once the agent has taken its action, the Q-values of the next state s' are estimated, and the max(Q(s',a')) is then used to update the previous Q-value Q(s,a) as stated in Q-Gridworld.

Get Started

To get started, use the terminal to navigate to ml-in-tf/experiments/nn-gridworld/and run python nn-gridworld.py.

To see the graph and plots using tensorboard, use the terminal to navigate to ml-in-tf/ and run tensorboard --logdir logs/. Wait for the following message:

Starting TensorBoard on port <port>

And then open up a browser and go to localhost:<port>.

Network

The network in this experiment has two fixed hidden layers, but with customizable number of neurons in each hidden layers. Given the variables input_size and action_size, the network structure can be described as:

input_size = 3 * field_size * field_size
action_size = 4 (up, down, left, right)

Input	Hidden L1	Hidden L2	Output
`input_size`	80 [c]	80 [c]	`action_size`

[c] - Customizable

Parameters

The customizable parameters of this experiment - and their default values - are as follows:

Q Learning settings

episodes - 100 - Number of minibatches to run the training on.
gamma - 0.99- Discount (ɣ) to use when Q-value is updated.
initial_epsilon - 1.0 - Initial epsilon value that epsilon will be annealed from.
final_epsilon - 0.1 - Final epsilon value that epsilon will be annealed to.

Network settings

hidden_l1 - 80 - Number of neurons in hidden layer 1.
hidden_l2 - 80 - Number of neurons in hidden layer 2.

Training settings

learning_rate - 0.001 - Learning rate of the optimizer.
optimizer - rmsprop - If another optimizer should be used [adam, gradientdescent, rmsprop]. Defaults to gradient descent.
train_step_limit - 300 - Limits the number of steps in training to avoid badly performing agents running forever.

General settings

field_size - 4 - Determines width and height of the Gridworld field.
status_update - 10 - How often to print an status update.
use_gpu - False - If TensorFlow operations should run on GPU rather than CPU.
random_seed - 123 - Number of minibatches to run the training on.

Testing settings

run_test - True - If the final model should be tested.
test_runs - 100 - Number of times to run the test.
test_epsilon - 0.1 - Epsilon to use on test run.
test_step_limit - 1000 - Limits the number of steps in test to avoid badly performing agents running forever.

Experiment results

The plots above show the agents training progress running with all parameters set to their default values.

As you can see in the plot, something happend to the performance of the agent just after 600 episodes. From that point and beyond, it really got the gist of it and performed really well thereafter. The result of running 100 test runs with the fully trained agent can be seen in the table below.

	Average	Max	Min
Steps	2.66	9	1
Rewards	0.48	1	0.1

Next step

If you have played around anythin with the field_size, you'll quickly notice that the network runs into trouble as you are approaching 10. This is partly due to the exponentially larger state we send in to the network, but also because a single row of states actually isn't the best possible representation of the game.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

nn-gridworld

nn-gridworld

README.md

README.md

network.py

network.py

nn-gridworld.py

nn-gridworld.py

README.md

NN-Gridworld Experiment

Get Started

Network

Parameters

Q Learning settings

Network settings

Training settings

General settings

Testing settings

Experiment results

Next step

Files

nn-gridworld

Directory actions

More options

Directory actions

More options

Latest commit

History

nn-gridworld

Folders and files

parent directory

README.md

README.md

network.py

network.py

nn-gridworld.py

nn-gridworld.py

README.md

NN-Gridworld Experiment

Get Started

Network

Parameters

Q Learning settings

Network settings

Training settings

General settings

Testing settings

Experiment results

Next step