Skip to content

Blitzen is a python framework built on top of python's multiprocessing module intended to maximize processor utilization on individual computers and clusters.

License

Notifications You must be signed in to change notification settings

BlakeERichey/blitzen

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Blitzen

Blitzen is a python framework built on top of python's multiprocessing module intended to maximize processor utilization on individual computers and clusters.

Installation

pip install blitzen

Usage:

Blitzen uses a dispatcher to provide concurrent processing features. Below are the dispatchers available and their use case.

MulticoreDispatcher

Multicore dispatcher is similar to multiprocessing.Pool in that you initialize the dispatcher and can pass it tasks to complete concurrently.

The notable differences are:

  1. Workers you create with a dispatcher will remain open until you shutdown the dispatcher.
  2. You can queue multiple tasks, and the dispatcher will complete all of them using the specified number of processes.
  3. Workers are Exception resistant, meaning if the worker subprocess crashes, the dispatcher will shut down the process and initialize a new one until the specified number of workers are actively monitor incoming tasks.

Example:

import time
import random
from blitzen import MulticoreDispatcher

def f1():
  delay = random.randint(3,5)
  time.sleep(delay)
  print('Finished after', delay, 'secs.')
  return delay

def f2(delay):
  time.sleep(delay)
  raise ValueError('Throwing error')


if __name__ == '__main__':
  dispatcher = MulticoreDispatcher(workers=4)
  task_id = dispatcher.run(f2, 4)
  task_ids = [
    dispatcher.run(f1)
    for _ in range(5)
  ]

All tasks are either started or queued immediately upon running the dispatcher.run() call.

dispatcher.run() returns the dispatcher's task_id for the task you just passed it. This is used if you want to get specific task results from the dispatcher or tasks results in a specific order.

Getting results from the dispatcher:

print('Fetching Results.')
results = dispatcher.get_results(task_ids)
print('Recevied results from dispatcher:', results)

The results are always returned in the order of task_ids.

Eventually you will have to shutdown the dispatcher to close the underlying worker processes. You can do this two ways.

  1. dispatcher.shutdown() Terminates all worker subprocesses and deallocates memory resources.
  2. dispatcher.join() returns all results in task order as they were passed to the dispatcher, then calls dispatcher.shutdown().
print('Fetching Results.')
results = dispatcher.join()
print('Recevied results from dispatcher:', results)

Critical Note: Once you request a result from the dispatcher, it is removed the dispatchers memory by default. To override this feature use dispatcher.get_results(clear=False)

A full example is visible here.

DistributedDispatcher

DistributedDispatcher lets you utilize a cluster for concurrent computing. It will handle packet synchronization between clients, servers, and drivers. It will also log all activity on the clients and servers.

Define some common functions

The clients and servers need access to the same functions, so it is likely a good idea to make a common file that will be used to import functions that arent natively in python.

#common.py
import time

def time_consuming_function(delay):
  time.sleep(delay)
  return delay

Initialize your server

from blitzen.utils import get_local_ip
from blitzen.distributed import DistributedDispatcher
from common import time_consuming_function

if __name__ == '__main__':
  ip = get_local_ip()
  dispatcher = DistributedDispatcher(server_ip=ip)
  dispatcher.spawn_server(duration=30) #Run server for 30 seconds

Initialize your clients

from blitzen.utils import get_local_ip
from blitzen.distributed import DistributedDispatcher
from common import time_consuming_function

if __name__ == '__main__':
  ip = '192.168.1.2' #Server IP
  dispatcher = DistributedDispatcher(server_ip=ip)
  dispatcher.spawn_client(workers=6)

Run your driver code to be executed by the clients

from blitzen.utils import get_local_ip
from blitzen.distributed import DistributedDispatcher
from common import time_consuming_function

if __name__ == '__main__':
  ip = '192.168.1.2' #Server IP
  dispatcher = DistributedDispatcher(server_ip=ip)

  #With DistributedDispatcher you can specify an amount
  #of time each client has to finish their task
  #with the `timeout` keyword
  task_ids = [
    dispatcher.run(time_consuming_function, i+5, timeout=7) 
    for i in range(5)
  ]

  print('Requesting results.')
  results = dispatcher.join()
  print(results)

Logging

DistributedDispatchers provide some logging, using the python logging module, so you can monitor your cluster. By default the logger is configured to report all logging info of level logging.INFO. To some this can be excessive, so blitzen provides some functions to control the logger.

You can get the logger using logger = blitzen.logging.get_logger()
This logger is used by all blitzen submodules.

The other blitzen.logging fuctions are:

  • set_logfile(filename) lets you set a file to log out to.
  • set_loglevel(level) lets you set the logging level using the same convention as python's logging module.
  • disable() disabled the logger.
  • enable() enables the logger.

A full example visible here.

About

Blitzen is a python framework built on top of python's multiprocessing module intended to maximize processor utilization on individual computers and clusters.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages