Skip to content
Foreman and worker code for EC262
Find file
New pull request
Pull request Compare This branch is 3 commits ahead, 2 commits behind master.
Fetching latest commit...
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.

EC262 Client Libraries



Quick start


To run a job, import the ec262 module. You can then use the ec262.mapper and ec262.reducer decorator to specify your map/reduce functions like so:

def mymap(key, value):
    yield (key, value)

def myreduce(key, values):
    yield (key, sum(values))

Finally, to run your MapReduce job, call ec262.run_job(data), where data is a dictionary mapping each key to a value.

mapreduce_data = {
    'hello': 1,
    'world': 2

See for more information; it's a working script that counts the number of times each word appears in "Humpty Dumpty".


To start up a worker, run You can specify the port you want it to run on using the -P PORT flag (defaults to 11235), and you can have it run in either verbose mode (-v) or loud mode (-V).

Example (verbose mode, running on port 12345):

python -v -P 12345

You can also run it by importing ec262 from a script and then calling ec262.run_worker([port=11235]).

Design decisions

  • All data is sent around in dictionaries, which is serialized to JSON as a list of [key, value] lists sorted by key. These are encrypted using AES-128 in cipher-block chaining (CBC) mode, and finally serialized using Base64.


Unexpected challenges

  • Integrating with the discovery service. The teams had slightly different ideas about how the protocol worked, arising from the introduction of credits. Originally, the client library developers assumed that foreman were leased workers for the entire job, which enabled some optimizations. But because workers need to get paid for every task, the foreman can only ask workers to compute the one task that they're assigned.

  • Encryption and checking results. Because results are encoded as a dictionary, keys can be in arbitrary order. We need to enforce that they are always in the same order, so that all clients serialize them same way and the results can be easily checked for correctness. We now encode results in JSON as a list of ['key', 'value'] lists, sorted by key. (These results are then encrypted with AES, and serialized with Base64.)

  • Testing. Because keys are destroyed when a foreman requests them, it was somewhat tricky to figure out to test the code that gets a key from the discovery service to decrypt a task. We ended up breaking that function into two relatively simple pieces (getting a key, decrypting the data) and just testing those pieces separately. We also tried decrypting data that was encrypted with a bogus key, and simply made sure that the right exception was thrown during decryption.


  • Finish this guide

  • Make a list of dependencies somewhere? requests, pycrypto...

Something went wrong with that request. Please try again.