A distributed algorithms research tool in Haskell

Disco

Aims

The primary aim of this project is to facilitate **distributed algorithms research**. We want to enable researchers to write distributed algorithms in a high-level DSL (domain-specific language) embedded in Haskell.

To support this primary aim, we provide a **distributed systems management console**. This enables researchers to quickly launch, analyze and iterate on distributed algorithms.

Finally, as a personal aim, I would like to investigate abstractions built upon distributed system primitives that allow us to treat remote data as easily as local data.

Technical Overview

Here we provide a brief overview of the types used by Disco. In the future all documentation will likely be moved to a Tutorial.hs file.

Node

The definition of a node is given below. I recommend looking at src/Network.hs for the most important definitions used by this library. The file is quite short and nothing fancy is going on.

-- | A node represents a program in a network.
data Node = Node {
  -- | A unique identifier.
  _id        :: NodeID,
  -- | The program to run.
  _exe       :: Exe,
  -- | Where to start the node.
  _service   :: Service,
  -- | Information to enable high-level messaging.
  _messaging :: Messaging
}

Where a node is started depends on its Service information. Currently we only support starting a node in a local Docker container (one per node), but starting nodes on other services is a high priority. The service runs a lightweight boot program, which is passed the NodeID, Exe, and Messaging information.

The Exe determines which program to run. This program can be a distributed algorithm, or we can tell the service to run a default program. We would additionally like to support installation from Hackage, GitHub, etc., and to support binaries (maybe starting with Docker Hub).

The Messaging data type contains network topology information, which enables high-level messaging functions such as neighbours. On top of these high-level functions we can then provide analysis tools for distributed algorithms, e.g. message/bit complexity.
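To make the idea of a neighbours function concrete, here is a minimal sketch. It assumes that Messaging holds nothing more than a list of undirected edges and that NodeID is an Int; both are hypothetical stand-ins, and the real definitions in src/Network.hs may well differ.

```haskell
import Data.List (nub)

-- Hypothetical stand-in: the real NodeID may not be an Int.
type NodeID = Int

-- | Hypothetical stand-in for Disco's Messaging type: only the
-- undirected edges of the artificial network topology.
newtype Messaging = Messaging { _topology :: [(NodeID, NodeID)] }

-- | Every node that shares an edge with the given node.
-- Edges are treated as undirected and self-loops are ignored.
neighbours :: Messaging -> NodeID -> [NodeID]
neighbours (Messaging es) n =
  filter (/= n) (nub (outgoing ++ incoming))
  where
    outgoing = [b | (a, b) <- es, a == n]
    incoming = [a | (a, b) <- es, b == n]
```

With the triangle `Messaging [(0, 1), (1, 2), (2, 0)]`, `neighbours m 1` evaluates to `[2, 0]`.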

Network

We define a network as consisting of nodes and edges. The edges define an artificial network topology. The Service information of a node controls where it runs, and thus the physical network topology. When we talk simply of “network topology” we mean the artificial network topology.

data Network = Network { _nodes :: [Node], _edges :: Edges }

The Edges type lets us specify the network topology either directly as a set of edges or as a network topology shorthand (which we then convert to edges). The following code specifies a simple ring of five identical hello world nodes, each started in a local Docker container.

exampleNetwork = Network {
  _nodes = replicate 5 helloWorldOnDocker, _edges = Ring }
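As a rough illustration of how a topology shorthand could be converted to edges, the sketch below expands Ring over nodes numbered 0 to n-1. The Explicit constructor, the toEdges function and the Int node identifiers are assumptions made for this example only; just the Ring constructor appears in the code above.

```haskell
-- Hypothetical stand-in: the real NodeID may not be an Int.
type NodeID = Int

-- | Hypothetical shape for Edges: explicit edges or a shorthand.
data Edges
  = Explicit [(NodeID, NodeID)]  -- ^ Edges given directly (assumed name).
  | Ring                         -- ^ Each node is linked to its successor.
  deriving (Show, Eq)

-- | Expand a topology into explicit edges over nodes 0 .. n-1.
toEdges :: Int -> Edges -> [(NodeID, NodeID)]
toEdges _ (Explicit es) = es
toEdges n Ring
  | n < 2     = []        -- a single node has nobody to link to
  | n == 2    = [(0, 1)]  -- avoid listing the same undirected edge twice
  | otherwise = [(i, (i + 1) `mod` n) | i <- [0 .. n - 1]]
```

For the five-node example, `toEdges 5 Ring` gives `[(0,1),(1,2),(2,3),(3,4),(4,0)]`.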

Reaching Our Aims

The overview above describes the core (and very simple) architecture necessary to support the three aims at the top of this file.

To support the management console aim we need to support the modification of the network, and provide a graphical layer (maybe with brick to show a network overview/map).

To reach the distributed algorithms research goal we just need to implement some algorithms. Doing so will reveal the building blocks a distributed algorithms researcher needs, such as neighbours (which returns a running node’s neighbours). A simple ring election algorithm may be a good place to start. For message passing, integrating an Erlang-style message passing library like courier may be useful.
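As one possible starting point, the sketch below simulates a ring election in pure Haskell, with no networking and no Disco types. It uses the Chang-Roberts algorithm, chosen here only as a well-known example: every node sends its ID to its successor, forwards any larger ID it receives, discards smaller ones, and becomes leader when its own ID returns. The function name electLeader and the round-based simulation are assumptions for illustration, and node IDs are assumed to be unique.

```haskell
-- | Simulate Chang-Roberts leader election on a unidirectional ring.
-- The input lists the (unique) node IDs in ring order. The result is
-- the elected leader's ID and the total number of messages sent.
electLeader :: [Int] -> Maybe (Int, Int)
electLeader []  = Nothing
electLeader ids = go initial (length initial)
  where
    n = length ids

    -- Every node starts by sending its own ID to its successor.
    -- A message is a pair (position of the receiver, ID carried).
    initial = [((i + 1) `mod` n, x) | (i, x) <- zip [0 ..] ids]

    go msgs sent =
      case [v | (p, v) <- msgs, ids !! p == v] of
        -- A node received its own ID back: it is the leader.
        (leader : _) -> Just (leader, sent)
        -- Otherwise forward larger IDs and drop smaller ones.
        [] ->
          let forwarded =
                [((p + 1) `mod` n, v) | (p, v) <- msgs, v > ids !! p]
          in go forwarded (sent + length forwarded)
```

For example, `electLeader [3, 1, 4, 2]` evaluates to `Just (4, 8)`: node 4 wins after eight messages. Counting messages in this way is the kind of message complexity analysis mentioned earlier.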

Regarding the final aim, we would like to provide abstractions and examples for dealing with distributed data. Distributed STM exists (DSTM): can we integrate it, and should we? What about distributed FRP, as hinted at recently on this reddit thread? What about decentralized data?

Existing Libraries

Cloud Haskell and transient seem to be in a similar domain to disco, though disco has a distinct focus on distributed algorithms research. Furthermore, disco maintains a simpler semantic model of the network, and this simplicity is reflected in our data types. We aim to offload a lot of functionality, such as Erlang-style message passing, to third-party libraries like courier.

Distributed programming is given a rating of immature in the state of the Haskell ecosystem document. Quoting from the same document: “Work on the higher-level libraries seems to have stopped, but the low-level libraries are still good for distributing computation.” and “We need more analytics libraries”.

Getting Started

Install Docker, Docker Compose and Stack. On macOS you can do:

brew install --cask docker
brew install docker-compose haskell-stack

Clone the project and install Haskell dependencies:

git clone git@github.com:barischrooneyj/disco
cd disco && stack build --install-ghc  # Build Disco.
cd disco-docker && stack image container  # Build Docker 'Service'.

Start the Docker daemon. On macOS:

open -a docker

Start a disco:

stack exec disco-exe