Evolved Neural Network solution to RubyWarrior

This is an attempt at solving Ryan Bates' RubyWarrior using an evolved Neural Network. My aim was to build a solution that was not a 'state machine' which used designed 'if then else' logic to respond to situations but rather an Artificial Intelligence which had figured out the correct responses to situations on its own (with some guided prodding).

I used Darwin's 'LAW* of selection' to evolve the genomes of a population of Neural Networks (NN's) from initially random genomes and eventually arrived at a genome able to get a grade A score on the beginner tower in epic mode. I have not yet tried this approach on the intermediate tower; one day…

Thanks to Ryan for writing RubyWarrior; it was great fun writing this solution.

The Solution

The solution uses a 'genome' file to define the 'weights' of a neural network which responds to inputs on each turn. The behaviour of the warrior is entirely dependent on the genome and the solution's code does not contain any human designed responses. The code for the solution (player.rb and brains.rb) takes inputs from the level, passes them to the NN and then translates its output into a command for the warrior.

Running the solution

To run the solution copy the files 'brains.rb' and 'genome' along with my player.rb into the warrior directory.

The genome in that file will solve rubywarrior in normal mode but can't pass level 8. Once at level 8 use the genome in the file 'genome_epic' (renaming it to 'genome'). That genome will complete normal mode levels 8 (badly) and 9 (S grade) and then all of the levels on epic. It's the current best. I hope to get it a bit better and to get one which can do all levels in normal mode.

The code in brains.rb has extra code for using NN's with different numbers of layers. There is a cut down version without all the extra code/comments, just the bare essentials to run the solution (in 100 lines :) ) and the two genomes (gist.github.com/2795962). The other files 'darwin.rb' and 'bootcamp.rb' are not needed to run the solution. They were used to evolve the solution, see 'Evolving the Solution' for more.

Inside a newly created rubywarrior folder:

git clone git@github.com:Sujimichi/ruby_warrior_NN_solution.git && sh ruby_warrior_NN_solution/init.sh
rubywarrior #repeat till level 7
rm genome && cp genome_epic genome
rubywarrior #repeat till epic mode and then again to run all of epic

How the Warrior works

Player#initialize loads the 'genome' file and uses this to initialize its Brain (neural network).

Player#play_turn

  • On each call it first 'senses' the world as an 'input array'.

  • The input array is passed to the :act_on method of the brain which calls the neural network to calculate its response to the inputs. The network's response is interpreted and returned as an 'action' and an 'impulse', e.g. [:walk, :forward] or [:attack, :backward].

  • Finally the action and impulse are passed to the warrior.

Inputs

The input array has a mapping of warrior.feel and warrior.look (when available) in all directions as an Array of values. The first four values are for 'walls' in each direction: left, forward, right, backward respectively. The value is 1 if the warrior can feel a wall in the given direction, otherwise 0. The next four values represent 'enemies' of all kinds and the next four are for 'captives' (all have the same order for directions).

For example, if the warrior can just :feel the first 12 values might be;

 -----------
|  @a     >|   =>   [1,0,1,0, 0,1,0,0, 0,0,0,0]
 -----------
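As a rough illustration of the mapping above, the 12 'feel' values could be built along these lines. This is only a sketch, not the code from player.rb: FakeSpace, FEEL_DIRECTIONS and sense_feel are hypothetical names, and FakeSpace merely imitates the wall?, enemy? and captive? queries of the space that warrior.feel returns.

```ruby
# Hypothetical stand-in for the space object returned by warrior.feel(direction).
FakeSpace = Struct.new(:kind) do
  def wall?;    kind == :wall;    end
  def enemy?;   kind == :enemy;   end
  def captive?; kind == :captive; end
end

# Direction order used throughout the input array.
FEEL_DIRECTIONS = [:left, :forward, :right, :backward].freeze

# surroundings maps each direction to the space felt in that direction.
# Returns 4 wall values, then 4 enemy values, then 4 captive values.
def sense_feel(surroundings)
  [:wall?, :enemy?, :captive?].flat_map do |query|
    FEEL_DIRECTIONS.map { |dir| surroundings[dir].public_send(query) ? 1 : 0 }
  end
end

# The corridor from the example: walls to the left and right, enemy ahead.
surroundings = {
  left:     FakeSpace.new(:wall),
  forward:  FakeSpace.new(:enemy),
  right:    FakeSpace.new(:wall),
  backward: FakeSpace.new(:empty)
}
p sense_feel(surroundings) # => [1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```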

When the :look ability is added the senses for each type (walls, enemies, captives) provide fainter values for these objects if they are in the distance. For example;

 -----------
|  @ a    >|   =>   [1,0,1,0.3, 0,0.6,0,0, 0,0,0,0]
 -----------

The input array also has values for armed, health and health reducing.

  • armed gives the neural net a sense of whether it can shoot. 1 for can shoot, otherwise 0.

  • health is the current health, scaled between 0 and 1 with 0 for full, 1 for dead. If the warrior does not yet respond to health it returns 0.

  • 'health reducing' returns 1 if the previous health is less than current, otherwise 0.

There is one final input which is a constant 1 in all cases (representational bias). The full set of 16 inputs might be something like;

 -----------
|  @a     >|   =>   [1,0,1,0, 0,1,0,0, 0,0,0,0, 0, 0.8, 1, 1]
 -----------
 -----------
| C@  w   >|   =>   [1,0,1,0, 0,0,0,0.3, 0,0,0,1, 1, 0.8, 1, 1]
 -----------

The Brain

The Brain class has a method :act_on which takes the input array and passes it to an instance of NeuralNetwork. The network calculates its response which is returned as an array of 8 values. These eight values or 'nodes' represent different actions on the warrior.

The first 2 nodes select an 'impulse' while the other 6 select an 'action'. The first node defines the forward and backward impulse: positive for forward and negative for backward. The second node defines the impulse for left and right: positive for left, negative for right. If the absolute value for the first node is greater than the absolute value for the second, it takes the forward/backward impulse, otherwise it takes the left/right impulse.

The other 6 nodes each represent an action; :walk, :attack, :rest, :rescue, :pivot, :shoot respectively (the order was not important). Whichever of these nodes has the strongest stimulation (highest value) will be the action taken.

I defined which node is connected to which action, but the evolution of the neural network has to work out which node results in what outcome. It also has to 'realise' which actions it can use, as all nodes are present throughout all the levels. It's therefore possible for the brain to request actions which are not yet available; it has to evolve not to, as this results in no activity.

The resultant output is an array of an action and an impulse, e.g. [:walk, :forward] or [:attack, :backward].
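The selection rules described above can be sketched roughly as follows. This is an illustration rather than the code in brains.rb: OUTPUT_ACTIONS and interpret are hypothetical names, and treating a value of exactly 0 as backward/right is an assumption the text does not settle.

```ruby
# Action order of the last 6 output nodes, as listed in the text.
OUTPUT_ACTIONS = [:walk, :attack, :rest, :rescue, :pivot, :shoot].freeze

# Hypothetical helper: turns the 8 output values into [action, impulse].
def interpret(outputs)
  fwd_bkwd, left_right, *action_nodes = outputs

  impulse =
    if fwd_bkwd.abs > left_right.abs
      fwd_bkwd > 0 ? :forward : :backward   # assumption: 0 counts as backward
    else
      left_right > 0 ? :left : :right       # assumption: 0 counts as right
    end

  # The action node with the highest value wins.
  action = OUTPUT_ACTIONS[action_nodes.index(action_nodes.max)]
  [action, impulse]
end

p interpret([0.9, -0.2, 0.1, 0.7, -0.3, 0.0, 0.2, -0.5]) # => [:attack, :forward]
p interpret([-0.1, -0.6, 0.8, 0.1, 0.2, 0.0, 0.3, 0.4])  # => [:walk, :right]
```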

The NeuralNetwork

I tried the solution with 1, 2 and 3 layered neural nets with different numbers of nodes. A 1 layer can perform an AND or an OR function, a 2 layer can perform XOR and a 3 can approximate any mathematical function. brains.rb has a separate class for a 1, 2 and 3 layer NN; Brains::Neanderthal, Brains::R2D2 and Brains::RiverTam, all of which inherit from the Brain class. Brains::Neanderthal was too dumb and River was… problematic. In the end the brain used was R2D2, the 2 layer NN, set to have 6 inner nodes (16 in, 8 out). It might look something like this;

[w<] [w/\] [w>] [w\/] [<e] [e/\] [e>] [e\/] [c<] [c/\] [c>] [c\/] [Armed?] [cur_health] [health_reducing?] [1]

                                [?]     [?]     [?]     [?]     [?]     [?]

                   [fwd/bkwd]  [<-/->]  [walk]  [attack]  [rest]  [rescue]  [pivot]  [shoot]

Each 'node' is connected to each 'node' in the subsequent layer, resulting in 144 connections (16 x 6 + 6 x 8). The connections (which I could not draw in text!) are weighted according to the (144-gene) genome used. The value on an inner node is the value given to each input node multiplied by the weight of the connection to that inner node, all summed together and passed through a sin function (Math.sin(summed_value)). The same thing happens for the output nodes; the value is the weighted sum (with sin func) of the values from the inner layer.
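The calculation can be sketched in a few lines. This is a minimal illustration and not the code in brains.rb: the names NN_INPUTS, NN_HIDDEN, NN_OUTPUTS, layer and respond are hypothetical, and the order in which genes map onto connections (the first 96 for input-to-inner weights grouped per inner node, the last 48 for inner-to-output weights) is an assumption.

```ruby
NN_INPUTS, NN_HIDDEN, NN_OUTPUTS = 16, 6, 8

# Each slice of weights feeds one node: weighted sum of the incoming
# values, passed through sin.
def layer(values, weights)
  weights.each_slice(values.size).map do |node_weights|
    Math.sin(values.zip(node_weights).sum { |v, w| v * w })
  end
end

def respond(genome, inputs)
  hidden_weights = genome.first(NN_INPUTS * NN_HIDDEN)  # 96 genes
  output_weights = genome.drop(NN_INPUTS * NN_HIDDEN)   # 48 genes
  layer(layer(inputs, hidden_weights), output_weights)
end

# A random 144-gene genome; real genomes come from the 'genome' file.
genome  = Array.new(NN_INPUTS * NN_HIDDEN + NN_HIDDEN * NN_OUTPUTS) { rand(-1.0..1.0) }
inputs  = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0.8, 1, 1]
outputs = respond(genome, inputs)
puts outputs.size # => 8
```

Because sin is bounded, every output lands between -1 and 1, so the 'strongest' action node is always well defined.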

Instructing the Warrior

The action and impulse from the brain are passed to the warrior inside a rescue block to handle the event of the brain requesting actions which are not yet available. If the brain does request an unavailable action it rescues the NoMethodError, `puts` an error message and the warrior does nothing for that turn.
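A rough sketch of that rescue block is shown below. NoviceWarrior and instruct are hypothetical names used only for illustration; the real warrior object is supplied by RubyWarrior and gains action methods as the levels progress.

```ruby
# Hypothetical warrior that has only learnt to walk and attack so far.
class NoviceWarrior
  attr_reader :log

  def initialize
    @log = []
  end

  def walk!(direction = :forward)
    @log << [:walk, direction]
  end

  def attack!(direction = :forward)
    @log << [:attack, direction]
  end
end

# Passes the brain's decision to the warrior. Returns false (and the
# warrior does nothing) when the requested action is not available yet.
def instruct(warrior, action, impulse)
  warrior.public_send("#{action}!", impulse)
  true
rescue NoMethodError
  puts "brain requested unavailable action: #{action}"
  false
end

warrior = NoviceWarrior.new
instruct(warrior, :walk, :forward)   # performed
instruct(warrior, :shoot, :forward)  # rescued, turn is wasted
p warrior.log # => [[:walk, :forward]]
```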

Evolving the Solution

Evolving the NN's was not straightforward. My aim was to run evolution in each level until it managed to solve it and then progress to the next. When I reached the top I hoped that I could put the winning genomes from each level into one population and use that to evolve the epic solution. I was not expecting epic mode to allow all abilities in all levels, but by crossing winners from the later levels I was able to evolve the epic solution.

RubyWarrior presents a difficult environment for evolution as many positive changes need to have happened before any points are awarded at all and final scores are only shown after completing a level. The other problem is the genetic search space is big and pretty uneven resulting in much slower evolution.

To speed things up I created several classes which are used to run different evolution processes and assess performance (and lost the plot somewhat with the names).

To run evolution of the NN's you need 'darwin.rb' and 'bootcamp.rb' along with the player and its 'brains.rb' inside a rubywarrior folder. To get everything set up in one go, run this inside a rubywarrior folder;

git clone git@github.com:Sujimichi/ruby_warrior_NN_solution.git && sh ruby_warrior_NN_solution/init.sh

In 'bootcamp.rb' there are several 'training grounds' which all inherit from the class BasicTraining. They all work in the same way, running populations of NN's through evolution, but each does so in a different environment. Each training ground initializes a genetic algorithm and sets up a fitness function specific to that training ground. It can then be used to run the GA, save/load the population and perform some other common actions.
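For readers unfamiliar with genetic algorithms, the loop a training ground drives looks roughly like the sketch below. This is a generic GA (truncation selection, single-point crossover, per-gene mutation), not the algorithm in darwin.rb; every name, the selection scheme and the default mutation rate are assumptions, and the toy fitness simply sums the genes where a real training ground would score the warrior.

```ruby
GA_GENOME_LENGTH = 144

def random_genome(rng)
  Array.new(GA_GENOME_LENGTH) { rng.rand(-1.0..1.0) }
end

# Single-point crossover followed by per-gene mutation.
def breed(mum, dad, rng, mutation_rate)
  cut = rng.rand(1...GA_GENOME_LENGTH)
  child = mum.first(cut) + dad.drop(cut)
  child.map { |gene| rng.rand < mutation_rate ? rng.rand(-1.0..1.0) : gene }
end

# One generation: the fitter half survives unchanged and breeds replacements.
def next_generation(population, rng, mutation_rate: 0.02, &fitness)
  ranked   = population.sort_by { |genome| -fitness.call(genome) }
  parents  = ranked.first(population.size / 2)
  children = Array.new(population.size - parents.size) do
    breed(parents.sample(random: rng), parents.sample(random: rng), rng, mutation_rate)
  end
  parents + children
end

rng         = Random.new(42)
population  = Array.new(20) { random_genome(rng) }
toy_fitness = ->(genome) { genome.sum } # placeholder fitness function
50.times { population = next_generation(population, rng, &toy_fitness) }
puts population.map(&toy_fitness).max
```

Swapping the fitness block is all that distinguishes one environment from another in this sketch, which mirrors the idea of each training ground supplying its own fitness function.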

The training grounds are BootCamp, CombatTraining, AgentTraining and FieldTraining.

  • BootCamp runs very fast evolution with simulated inputs

  • CombatTraining runs evolution in current rubywarrior level

  • AgentTraining runs evolution in epic mode using all levels

  • FieldTraining runs evolution in normal mode, but with all levels. (requires some extra setup)

The last two run all levels for each NN to be tested, so each level is run in a separate thread to speed things up.

BootCamp

BootCamp runs very fast (without rubywarrior): it simulates inputs and awards points for correct output. For example, when given inputs that show an enemy in front, the NN is given a point for returning [:attack, :forward]. The predefined inputs/responses are a set of constants in AssaultCourse, e.g. AssaultCourse::BasicManuvers. An instance of the class DrillSergeant is used to score each NN over the AssaultCourse.
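The scoring idea can be sketched as below. This is not the DrillSergeant code: SAMPLE_DRILLS, score_drills and the two stand-in 'brains' are hypothetical, and the second drill (walk forward in an empty corridor) is an assumed example rather than one taken from AssaultCourse.

```ruby
# Hypothetical drills: simulated inputs paired with the expected response.
SAMPLE_DRILLS = [
  { inputs: [1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0.0, 0, 1], expect: [:attack, :forward] },
  { inputs: [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.0, 0, 1], expect: [:walk, :forward] }
].freeze

# One point for every drill where the brain gives the expected response.
def score_drills(brain, drills)
  drills.count { |drill| brain.call(drill[:inputs]) == drill[:expect] }
end

# Stand-in brains; a real one would run the inputs through the NN.
always_walk         = ->(_inputs) { [:walk, :forward] }
attacks_enemy_ahead = ->(inputs) { inputs[5] == 1 ? [:attack, :forward] : [:walk, :forward] }

p score_drills(always_walk, SAMPLE_DRILLS)         # => 1
p score_drills(attacks_enemy_ahead, SAMPLE_DRILLS) # => 2
```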

This evolution starts with a completely random population and over several thousand generations will have introduced some of the required genes. This does imply some degree of human design in the NN responses, but once evolved in BootCamp it is only enough to pass the first two levels. It is possible to not use BootCamp but it takes much longer. So far without BootCamp I've evolved a level 1-5 solution, but it's taken a couple of days.

Once the NN's start evolving in the real environment a lot of the behaviours learnt in BootCamp are lost or are changed. The other key use of BootCamp was to test the potential of NN's quickly; for example I was able to rule out using a single layered NN as it could not make all the distinctions needed to pass BootCamp.

To start a fresh BootCamp evolution in a rubywarrior folder:

require './bootcamp.rb'
bootcamp = BootCamp.new(2) #arg can be 1,2,3 defining number of layers for the NN.
bootcamp.train

Stop it (ctrl+c) at any point to save the population or change vars:

bootcamp.save_pop "popname"
bootcamp.mutation_rate 2 #set new mutation rate (call without args to show current)
bootcamp.write_best #test all population and write best to genome file
bootcamp.train #carry on

Once the population has individuals which can pass BootCamp they can graduate to the next training ground.

combat_training = bootcamp.graduate

CombatTraining

This takes a population and evolves it in the current rubywarrior level. It uses the Invigilator class to parse the output from rubywarrior and assign points. The Invigilator class evaluates the NN's run in a level and is essentially the 'fitness function'. As it is a separate class it can be changed mid-evolution: halt execution of .train on the training ground, paste in a redefined Invigilator, then run .train again on the current training ground instance.

combat_training = CombatTraining.new(2)
combat_training.load_pop "one_I_made_eariler"
combat_training.train

Once a population passed bootcamp it was passed to CombatTraining and run in the first level the bootcamp grads could not pass (level 3). It was run until a solution was found and then moved to the next level. At each level I used the population from the previous level; sometimes I'd introduce some more bootcamp graduates to shake things up. Level 7 took a long time to solve until I took the winners from each previous level and made a population from just those. The solution to 7 evolved from that quite quickly.

From an NN's point of view the last 2 levels are very different from the earlier ones and pose a problem. It seems that it is easy to learn to attack, walk, rescue, pivot and rest but not shoot, or to shoot, walk and rescue. Learning to both attack and shoot seems to be very hard to evolve. This meant starting from BootCamp again at levels 8 and 9. I hoped to be able to cross breed the winners from previous levels with the 8-9 solutions to get a generic solution. Still trying.

After evolving a solution to each level and unlocking epic mode, the next training ground can be used.

agent_training = combat_training.graduate

AgentTraining

Like CombatTraining this uses Invigilator to score the performance of an NN in rubywarrior levels. However it is run in epic mode so all abilities are available in all levels and it tests the NN's performance of each level. In one fitness evaluation the NN is tested on each level so to speed things up each level is run in a separate Thread and the score collected after all threads complete.
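The one-thread-per-level evaluation might look roughly like this. It is a sketch only: run_level is a hypothetical stand-in that returns a made-up score, where the real code would run rubywarrior on that level and have Invigilator score the output, and summing the per-level scores is an assumed way of combining them.

```ruby
# Hypothetical stand-in for running one level and scoring the result.
def run_level(genome, level)
  genome.sum * level
end

# Start one thread per level, then collect every score.
def epic_fitness(genome, levels = 1..9)
  threads = levels.map { |level| Thread.new { run_level(genome, level) } }
  threads.sum(&:value) # Thread#value waits for the thread and returns its result
end

puts epic_fitness([1, 2, 3]) # => 270
```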

FieldTraining

Like AgentTraining this tests the NN in all levels but in normal mode. Sadly rubywarrior does not have an option to run without all abilities when in epic, so for this to work a separate rubywarrior folder needs to be created for each level and set at the correct level.

FieldTraining was added after I found that epic mode allows all abilities in all levels. I wanted to take the winning genomes from each level and evolve a solution which could do all levels in normal mode. I'm still trying to get one that can do 8 + 9 as well as 1-7.

* Darwin's Law

I referred to 'the theory of evolution' as Darwin's LAW of selection. It so is! The process of selection which he defined does exactly what he said: 'produce complex novel behaviours via recombination and mutation'. Whether WE are here because of it is another matter, but that does not change the fact that Darwin's process is mathematically sound.
