Home

Paco Nathan edited this page Dec 17, 2013 · 42 revisions

Exelixi is a distributed framework based on Apache Mesos, mostly implemented in Python using gevent for high-performance concurrency. It is intended to run cluster computing jobs (partitioned batch jobs, which include some messaging) in pure Python.

By default, Exelixi runs genetic algorithms (GA) at scale. However, it can handle a range of distributed computing problems based on a barrier pattern: use the --uow command-line option and override the uow.UnitOfWorkFactory class definition.
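To make the barrier pattern concrete, here is a minimal single-process sketch using only the Python standard library. It is not Exelixi's API: the names `SumOfSquaresWork`, `run_step`, and `run_with_barrier` are hypothetical, threads stand in for shards running on cluster nodes, and `threading.Barrier` stands in for the coordination that the framework would handle across the cluster. The idea it illustrates is that each shard works on its own partition in memory, then every shard waits at a barrier before the partial results are combined.

```python
import threading
from typing import List


class SumOfSquaresWork:
    """Hypothetical unit of work (not Exelixi's class): one shard, one partition."""

    def __init__(self, partition: List[int]) -> None:
        self.partition = partition
        self.partial = 0

    def run_step(self) -> None:
        # Local, in-memory work on this shard's partition only.
        self.partial = sum(x * x for x in self.partition)


def run_with_barrier(data: List[int], n_shards: int) -> int:
    """Partition the data, run one thread per shard, combine at the barrier."""
    shards = [SumOfSquaresWork(data[i::n_shards]) for i in range(n_shards)]
    results: List[int] = []

    def combine() -> None:
        # Called once, by a single thread, after every shard has arrived.
        results.append(sum(shard.partial for shard in shards))

    barrier = threading.Barrier(n_shards, action=combine)

    def worker(shard: SumOfSquaresWork) -> None:
        shard.run_step()
        barrier.wait()  # block until all shards have finished this step

    threads = [threading.Thread(target=worker, args=(s,)) for s in shards]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results[0]


if __name__ == "__main__":
    print(run_with_barrier(list(range(10)), 3))  # 285
```

In this sketch, swapping in a different problem means replacing only the unit-of-work class, while the partition, barrier, and combine steps stay the same. That is roughly the role a custom unit-of-work definition would play in a barrier-based framework.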


Rationale

Why build yet another framework for this purpose? Apache Hadoop would be a poor fit, because the work requires in-memory iteration. Apache Spark fits the problem more closely in terms of iterative tasks. However, task overhead can become high relative to the amount of work each task performs ("small file problem"). Server-side operations and coprocessors in Apache Cassandra or Apache HBase might also provide a good fit for GA processing, but both require a lot of configuration. Moreover, many of the features of these more heavyweight frameworks are not needed, and it helps to have some lightweight messaging available among the shards, which these existing frameworks tend to lack.

Pyevolve provides a complete GA framework written in pure Python, with an excellent API. However, it only runs on one node. Perhaps someday, these two projects could have some crossover...

On the one hand, Exelixi provides the basis for a tutorial for building distributed frameworks in Apache Mesos. On the other hand, it provides a general-purpose GA platform that emphasizes scalability and fault tolerance, while leveraging the wealth of available Python analytics packages.

Project Name

Where does the name come from? Exelixi derives from the Greek word ekseliksi, meaning progress, with a connotation close to the more contemporary notion of evolution. It is pronounced: e'kseliksi

Next: Quick Start