Skip to content

GeneAssembly/biosal

Repository files navigation

biosal is a distributed BIOlogical Sequence Actor Library.

biosal applications are written in the form of actors which send each other messages, change their state, spawn other actors and eventually die of old age. These actors run on top of biosal runtime system called "The Thorium Engine" using the API. Thorium is a distributed actor machine. Thorium uses script-wise symmetric actor placement and is targeted for high-performance computing. Thorium is a general purpose actor model implementation.

Exciting actor applications for genomics

Name Description
argonnite k-mer counter
gc Guanine-cytosine counter
spate Exact, convenient, and scalable metagenome assembly and genome isolation for everyone (with emphasis on low-abundance species too)

Technologies

Key Value
Name biosal
Computation model actor model (Hewitt & Baker 1977)
Programming language C99 (ISO/IEC 9899:1999)
Message passing MPI 2.2
Threads Pthreads
License The BSD 2-Clause License

Try it out

git clone https://github.com/GeneAssembly/biosal.git
cd biosal
make tests # run tests
make examples # run examples

Branches

Branch name Person (alphabetical order) Clone URL Branch Build Status
master Sébastien Boisvert HTTPS Build Status
energy Sébastien Boisvert HTTPS
pami Huy Bui HTTPS
granularity George K. Thiruvathukal HTTPS Build Status
entropy Fangfang Xia HTTPS Build Status

Community

Actor model

The actor model has actors and messages, mostly.

When an actor receives a message, it can (Agha 1986, p. 12, 2.1.3):

  • send a finite number of messages to other actors (thorium_actor_send);
  • create a finite number of new actors (thorium_actor_spawn, ACTION_SPAWN);
  • designate the behavior to be used for the next message it receives (thorium_actor_concrete_actor, ACTION_STOP).

Other names for the actor model: actors, virtual processors, activation frames, streams (Hewitt, Bishop, Steiger 1973).

Also, in the actor model, the arrival order of messages is both arbitrary and unknown (Agha 1986, p. 22, 2.4).

One of the most important requirements of actors is that of acquaintances. An actor can only send message to one of its acquaintances. Acquaintance vectors were introduced in (Hewitt and Baker 1977, p. 7, section III.3). Any actor using an acquaintance vector is migratable by an actor machine. An actor machine can distribute and balance actors according to some arbitrary rules wwhen all the actors in an actor system use acquaintance vectors.

Important actor model papers

Actor model links

Languages using actors

Design

Key concepts

Concept Description Structure
Message Information with a source and a destination struct thorium_message
Actor Something that receives messages and behaves according to a script struct thorium_actor
Actor mailbox Messages for an actor are buffered there struct core_fast_ring
Script Describes the behavior of an actor (Hewitt, Bishop, Steiger 1973) struct thorium_script
Node A runtime system that can be connected to other nodes (see Erlang's definition) struct thorium_node
Worker pool A set of available workers inside a node struct thorium_worker_pool
Worker A instance that has a actor scheduling queue struct thorium_worker
Scheduler Each worker has a actor scheduling queue and an outbound message queue struct thorium_scheduler
Transport Each node has a transport subsystem for moving messages between nodes struct thorium_transport

Implementation of the runtime system

The code has to be formulated in term of actors. Actors are executes inside a controlled environment managed by the Thorium engine, which is a runtime system. An actor has a name, and does something when it receives a message. A message is however first received by a node. The node gives it to a worker_pool. The worker pool assigns the actor to a worker. Finally, worker eventually calls the corresponding receive function using the actor and message presented.

When an actor sends a message, the destination is either an actor on the same node or an actor on another node. The runtime sends messages for actors on other nodes with MPI. Otherwise, the message is prepared and given to a actor directly on the same node.

Runtime

Each biosal node is managed by an instance of the Thorium engine. These Thorium instances speak to each other. The number of nodes is set by mpiexec -n @number_of_thorium_nodes ./a.out. The number of threads on each node is set with -threads-per-node.

The following command starts 256 nodes (there is 1 MPI rank per node) and 64 threads per node for a total of 256 * 64 = 16384 threads.

mpiexec -n 256 ./a.out -threads-per-node 64

Because the whole thing is event-driven by inbound and outbound messages, a single node can run much more actors than the number of threads it has.

The runtime also supports asymmetric numbers of threads, but most platforms have identical compute nodes (examples: IBM Blue Gene/Q, Cray XC30) so this is not very useful. Example:

# launch 32 nodes with 32 threads, 244 threads, 32 threads, 244 threads, and so on
mpiexec -n 32 ./a.out -threads-per-node 32,244

Design links

Other possible names

  • name := biosal
  • bio := Biological or Biology
  • s := Sequence or Scalable or Salable or Salubrious or Satisfactory
  • a := Analysis or Actor or Actors
  • l := Library or Lift

Alternative name: BIOlogy Scalable Actor Library

Tested platforms

  • Cray XE6
  • IBM Blue Gene/Q
  • XPS 13 with Fedora
  • Xeon E7-4830 with CentOS 6.5
  • Apple Mavericks

Product team

see CREDITS.md