
Conversation

@bennr01 (Contributor) commented Jul 15, 2017

I have added the ThreadedEvaluator class and the DistributedEvaluator class to allow a more flexible use of computational resources during evaluation.

ThreadedEvaluator: A class inspired by the ParallelEvaluator, but one that uses threads for evaluating genomes. This is useful when using a Python implementation without a GIL (e.g. Jython or pypy-stm).
DistributedEvaluator: An evaluator for evaluating genomes across multiple compute nodes. This class is also inspired by the ParallelEvaluator. However, its overhead is even larger than that of the ParallelEvaluator, so it is only useful when the evaluation function requires heavy computation.
Both the ThreadedEvaluator and the DistributedEvaluator are implemented using only standard-library modules.
Usage of the ThreadedEvaluator:
The usage of the ThreadedEvaluator is much like that of the ParallelEvaluator, with two differences: the ThreadedEvaluator does not support the timeout argument, and the worker threads are not stopped automatically when the instance is deleted. The threads can still be stopped using the stop() method of the ThreadedEvaluator, and they will stop automatically once all other non-daemonic threads are done.
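Here is a minimal sketch of this usage (the eval_genome body and the "config-feedforward" filename are placeholders, not part of this PR):

```python
import neat
from neat.threaded import ThreadedEvaluator

def eval_genome(genome, config):
    net = neat.nn.FeedForwardNetwork.create(genome, config)
    # ... run the network on the task and return the resulting fitness ...
    return 0.0  # placeholder fitness

config = neat.Config(neat.DefaultGenome, neat.DefaultReproduction,
                     neat.DefaultSpeciesSet, neat.DefaultStagnation,
                     "config-feedforward")
pop = neat.Population(config)

te = ThreadedEvaluator(4, eval_genome)  # 4 worker threads, started automatically
winner = pop.run(te.evaluate, 300)      # used like ParallelEvaluator.evaluate
te.stop()                               # the threads are NOT stopped on deletion
```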
Usage of the DistributedEvaluator:
The usage of the DistributedEvaluator is very simple. Both the master node (the computer mutating the genomes) and the slave nodes (the computers evaluating the genomes) can run the exact same script.
Please note that the master node will not try to evaluate any genomes by itself. At least one slave node is required, but you can launch a slave node on the same physical machine as the master node. Please keep in mind that in this case you will have to force that node into slave mode. The examples/xor/evolve-feedforward-distributed.py script has a --force-slave argument for this purpose.

  1. define evaluation logic, load config and create population as you would normally do
  2. create an instance of neat.DistributedEvaluator using the following arguments:
    • addr is a tuple of (hostname/ip, port) pointing to the master node.
    • authkey is a password used for authentication with the master node.
    • eval_function is the function for evaluating a single genome.
    • slave_chunksize=1 defines the number of genomes that will be sent to a slave at once. When a slave node uses multiple worker processes, this number should be at least equal to the number of worker processes. Higher values may reduce the communication overhead. Default: 1.
    • num_workers=1: When this value is greater than 1 and this node is in slave mode, use this many worker processes for the evaluation. Otherwise, evaluate in the thread which called DistributedEvaluator.start() (most likely the main thread). Default: 1.
    • worker_timeout: When this node is in slave mode, wait at most this many seconds for the results of the worker processes. Default: 60.
    • mode=neat.distributed.MODE_AUTO: In which mode this node should operate (one of MODE_MASTER, MODE_SLAVE, MODE_AUTO (the default), as defined in neat.distributed). If the value is MODE_AUTO (the default), check if the addr argument points to the localhost. If it does, set the mode to MODE_MASTER, otherwise to MODE_SLAVE. The other two values force this node into the corresponding mode.
  3. call the start() method of the DistributedEvaluator instance. This call blocks on the slave nodes, but returns on the master node. By default, the slave nodes will exit at the end of this call when the work is done. The arguments are all optional:
    • exit_on_stop=True: if in slave mode, call sys.exit(0) once the work is done.
    • slave_wait=0: If in slave mode, wait this many seconds before connecting. This is useful if the master node may take some time to start.
  4. call the run method of the population, using the evaluate method/attribute of the DistributedEvaluator instance.
  5. call the stop method of the DistributedEvaluator instance.
  6. proceed normally (print statistics, show graphs, ...)
    If you are using multiple worker processes, all calls starting from step 2 should be placed inside an if __name__ == "__main__": block (see the sketch below).
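Here is a minimal sketch of steps 1-6 (the eval_genome body, the address, the authkey and the "config-feedforward" filename are placeholders; the same script is run on the master and on every slave):

```python
import neat
from neat.distributed import MODE_AUTO

def eval_genome(genome, config):
    # ... heavy per-genome computation returning a fitness ...
    return 0.0  # placeholder fitness

if __name__ == "__main__":  # required when using multiple worker processes
    config = neat.Config(neat.DefaultGenome, neat.DefaultReproduction,
                         neat.DefaultSpeciesSet, neat.DefaultStagnation,
                         "config-feedforward")
    pop = neat.Population(config)

    de = neat.DistributedEvaluator(
        ("localhost", 8022),  # addr of the master node (placeholder)
        b"insecure-authkey",  # authkey; note that it is binary
        eval_genome,
        slave_chunksize=4,    # send 4 genomes to a slave at once
        num_workers=4,        # worker processes per slave node
        worker_timeout=60,    # seconds to wait for worker results
        mode=MODE_AUTO,       # master if addr is the localhost, slave otherwise
    )
    de.start(exit_on_stop=True)         # blocks on slaves, returns on the master
    winner = pop.run(de.evaluate, 300)  # only reached on the master node
    de.stop()
    print(winner)
```

Note that with MODE_AUTO and an addr pointing to the localhost, a slave launched on the same machine as the master would also detect itself as the master, which is exactly why the example script provides the --force-slave switch mentioned above.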

This PR also contains tests and xor examples for the ThreadedEvaluator and the DistributedEvaluator. The DistributedEvaluator example requires command-line arguments to specify the address of the master node. It is not the simplest possible example for the DistributedEvaluator, but it tries to use most of the arguments in order to explain what they do.
Important: This PR changes the travis.yml config to use pypy3.5-5.8.0 instead of pypy3. Travis apparently uses an outdated pypy3 version, which contains a few bugs.

This PR replaces PR #95. However, some changes (like the changes to the docstrings of ParallelEvaluator.__init__) have been removed using a history rewrite.

Edit: I added a note to this PR stating that the master node does not try to evaluate any genomes by itself.

bennr01 and others added 13 commits July 8, 2017 19:28
I have added a 'neat.threaded.ThreadedEvaluator' for evaluating genomes
in threads. This is useful when using a Python implementation without a
GIL.
The ThreadedEvaluator is based on the ParallelEvaluator.
neat.threaded.ThreadedEvaluator will now start its workers automatically
I have added a test for neat.threaded.ThreadedEvaluator based on the
test for neat.parallel.ParallelEvaluator.
I removed the test checking whether
'neat.threaded.ThreadedEvaluator.__del__' stops the threads. This is
because __del__ is not always called and may thus produce false test
results.
I have changed travis.yml to use `pypy3.5-5.8.0` instead of `pypy3`.
Travis uses an outdated version of `pypy3`.
`pypy3.5-5.8.0` contains some fixes for multithreaded scripts, which *may* fix the bug in the travis-ci build for `neat.threaded.ThreadedEvaluator`.
I added the first version of the DistributedEvaluator, an evaluator for evaluating genomes across multiple machines.
While the tests (I will commit them later) seem to work pretty well, further tests are needed.
The tests only use one machine, so I have to wait until I am able to use a cluster (or just use VMs).
I have added some tests for neat.distributed.
These tests are not perfect because they run on only one machine. However, I doubt that we can change this on Travis-CI.
Unfortunately, my neat-python clone now contains too many checkpoints, so I have to commit all the changes using the GitHub web interface :( .
Well, why am I even writing this here? I doubt anyone will actually read these commit messages. (Quick note: the perfect diary: hidden in front of everyone.)
The authkeys used in the tests are now explicitly binary.
@coveralls commented Jul 15, 2017

Coverage increased (+0.7%) to 93.259% when pulling 5eea101 on bennr01:pr_prepare into 765c5b5 on CodeReclaimers:master.

@drallensmith (Contributor)

Heh. Because the browser I mostly use displays the full commit messages instead of hiding them behind the '...', I actually did see that part of the commit message...

-Allen

@drallensmith (Contributor) commented Jul 15, 2017

BTW, even the most recent version of pypy3 is rather slow on the parallel evaluator (I don't know yet about threaded), according to profiling (on OS X 10.12) - lots of time is spent in mutexes (or the equivalent - waiting for thread locks), which didn't happen with other Python versions (including pypy 2.7).

@drallensmith (Contributor)

Now that I think about it, the parallel evaluator is supposed to be using subprocesses, not threads... I'm thinking that pypy3 may have problems with parallel/threaded execution because it's probably using a separate thread to do its JIT compilation. (To be fair, it's also clearly labeled as a beta...) Reducing the number of parallel subprocesses to 2 (from 4) in the test did not significantly affect this. (This is running on a machine with 1 processor, 2 cores, incidentally - 2011 Mac mini.)

@bennr01 (Contributor, Author) commented Jul 15, 2017

@drallensmith I think you are right. But I think the issue with the JIT and subprocesses is that each new subprocess spawned by multiprocessing launches its own instance of pypy with its own JIT. And because a JIT relies on predicting the next operations (I may be wrong about this, but I think I read somewhere that a JIT learns that a call on a variable always resolves to a specific method and can thus skip some of the abstract logic in between), the calls in a worker cannot be predicted well, because they are initiated by the parent process.
My experience using neat-python, pypy2 and cpython2.7 was:

  • single-process + pypy: (after a short warm-up) way faster than cpython (sometimes even faster than the parallel evaluator on cpython)
  • multi-process + pypy: slower than single-process + pypy
  • single-process + cpython: slow
  • multi-process + cpython: faster than single-process + cpython

However, this is probably very case dependent. An evaluation function which requires a minute of calculation time for a single genome on cpython may well be faster with pypy + multiple processes than with pypy + a single process.
Of course, the ThreadedEvaluator will currently be way slower than the ParallelEvaluator in most cases, but this will hopefully change when pypy-stm becomes stable. I think the pypy team estimated that pypy-stm would be about 25% slower than pypy when using a single thread, which means that just using two threads would already make the whole script faster. This could mean that threaded evaluation on pypy-stm may be way faster than serial evaluation on pypy and cpython, and also faster than pypy multi-process evaluation.
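A quick back-of-the-envelope check of that estimate (the 25% figure is from memory, so the numbers are only illustrative):

```python
# If plain pypy needs a normalized time T = 1.0 for a workload,
# pypy-stm at ~25% overhead needs 1.25 on a single thread.
T = 1.0
stm_one_thread = 1.25 * T               # pypy-stm, one thread
stm_two_threads = stm_one_thread / 2.0  # pypy-stm, two threads, ideal scaling
print(stm_one_thread, stm_two_threads)  # 1.25 0.625 -> faster than plain pypy
```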
Unfortunately, I can't say much about the performance of the DistributedEvaluator. I used the xor example for testing, but the overhead was way too large compared to the time required for evaluation.
Maybe I will make some benchmarks once I have finally fixed my neat-tetris.
By the way, which browser are you using?

@drallensmith (Contributor) commented Jul 15, 2017

lynx - text-only, no javascript (I use Firefox for doing things like typing this comment)... but also low-memory and fast.

NEAT-Tetris? Interesting! I've been looking at doing some experiments with LARG/HFO, though probably after some testing of possible enhancements on less-complex systems (lander, perhaps?).

BTW, I should add that my profiling was using the test suite (since I was looking at why pypy3 was having problems - good spotting of the older version in use on Travis, BTW!), which probably doesn't run long enough for compilation to help much. I did put together a variation of the test suite meant for profiling, but have been working more on other things (particularly since LARG/HFO is mostly C++)... about all I did was trim it down to just the tests actually doing runs, then up the generation count and population size (and adjust the fitness function termination criteria so they wouldn't happen).

@bennr01 (Contributor, Author) commented Jul 15, 2017

@drallensmith I knew it! I actually thought you might be using lynx (I use it sometimes too), but it seemed too unlikely.
About NEAT-tetris: well, it is very simple (no UI, but you can print an ASCII version of the current state). I finally got the rotate function to work, but the combination of the rotation logic and the not-yet-implemented movement logic for the x-axis is problematic. LARG/HFO seems interesting, though.

The `DistributedEvaluator` will now shut down its manager when `stop()` is called.
@CodeReclaimers (Owner)

Thank you thank you to both of you for the work on this! Apologies for not having the time to look through it yet, hoping to change that soon. :)

@coveralls commented Jul 17, 2017

Coverage increased (+0.7%) to 93.267% when pulling 4da38f4 on bennr01:pr_prepare into 765c5b5 on CodeReclaimers:master.

@CodeReclaimers merged commit a27b2c7 into CodeReclaimers:master Jul 17, 2017
@CodeReclaimers (Owner)

This looks so thorough that I figured it's just best to merge and let everybody try it out. :)
