
Cloud Haskell 3.0 Proposal


Discussion

Although I've called this iteration Cloud Haskell 3.0, the version number doesn't represent an epoch for any of the libraries - indeed distributed-process is still at v0.7.4 as of December 2018. Instead we're talking about a third rewrite, the first iteration being the remote package, and the second being the current live version.

Motivation

See the High Level Redesign Proposal for details.

Outline

The architecture of the current (2.0) release is outlined here.

We will present several viewpoints, starting with a bottom-up, dependency-oriented perspective.

+-------------------------------------------------------------------------------+
|                         Application \ Framework Code                          |
+-------------------------------------------------------------------------------+
          |            |     |   |         |               |             |
          V            |     |   |         |               |             |
+------------------+   |     |   |         |               |             |
|   Cloud Haskell  |   |     |   |         |               |             |
|     Libraries    |   |     |   |         |               |             |
| (supervisor etc) |   |     |   |         |               |             |
+------------------+   |     |   |         |               |             |
      |                |     |   |         |               |             |
      |                |     |   |         |               |             |
      V                V     |   |         |               |             |
+-------------------------+  |   |   +-------------+       |             | 
|      Cloud Haskell      |  |   |   |   Akka.hs   |       |             |
|  (distributed-process)  |  |   |   |  (akka-hs)  |       |             |
+-------------------------+  |   |   +-------------+       |             |
             |       |       |   |            |            |             |
             |       |_______|_. |            |            |             |
             V               | | |            V            V             |
+-------------------------+  | | |      +-------------------------+      |
|     Core Actor Model    |__| .-|----->|    Distributed Actors   |      |
|     (control-actor)     |      |      |   (control-actor-dist)  |      |
+-------------------------+      |      +-------------------------+      |
          |         ^            |          |               |            |
          |         |            |          |               V            V
          |         |            |          |          +--------------------------+
          |         |            |          |          |      Data.Distributed    |
          |         |            V          V          |   (data-dist-serialise)  |
          |         |    +------------------------+    +--------------------------+
          |         |    |  Distributed Channels  |        ^        |
          |         |    |  (control-dist*-chan)  |________|        |
          |         |    +------------------------+                 |
          |         |               |                               V
          |         |               V                       +-------------------+
          |      +------------------------------+           |  serialise/cborg  |
          |      |    Distribution Framework    |           +-------------------+
          |      |  (haskell-distributed-node)  |
          |      +------------------------------+
          |                                 |
          V                                 V
+-------------------------+   +------------------------------+
|    Typed Channel API    |   |      Transport Interface     |
|     (control-chan)      |   |      (network-transport)     |
+-------------------------+   +------------------------------+
          |                                 |
          V                                 V
+------------------+          +------------------------------+
| concurrency/STM  |          | Haskell/C Transport Library  |
+------------------+          +------------------------------+

Transport and distribution layer

Some preamble about backends...

It will be rather obvious from the diagram above that in this design, we have deferred the question of who is responsible for instantiating a specific network-transport implementation. My current assumption is that this will be a configuration element of the Distribution Framework, discussed in more detail on the Distribution Framework wiki page. My premise for this design decision is that the current structure, where user application code defers to a Cloud Haskell backend such as distributed-process-simplelocalnet or distributed-process-zookeeper, appears to be rarely used. The reason seems to be that there is no static coupling between the concept of a Cloud Haskell node and a Cloud Haskell backend. This is a separation I intend to codify in the API for defining nodes in a distributed Haskell system, so that the APIs for node and service discovery (and so on) are well defined; in order to instantiate a node/cluster, users will need to accept a default implementation or select an alternative that meets their needs.
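
To make that intent concrete, here is a purely illustrative sketch of what such a node configuration API might look like. The names (Backend, NodeConfig, startNode) are assumptions made for this page only and do not belong to any existing package.

```haskell
{-# LANGUAGE RecordWildCards #-}
-- Illustrative sketch only: how node configuration might statically couple a
-- node to its backend. None of these names exist in any current package.
module Sketch.NodeConfig where

import Network.Transport (Transport)

-- A backend bundles transport instantiation with node/service discovery.
data Backend = Backend
  { newTransport  :: IO Transport  -- ^ how this backend creates its transport
  , discoverPeers :: IO [String]   -- ^ backend-specific peer discovery (addresses as strings here)
  }

-- Node configuration makes the backend choice explicit and static.
data NodeConfig = NodeConfig
  { nodeName :: String
  , backend  :: Backend            -- ^ users accept a default or pick an alternative
  }

-- Starting a node consults the chosen backend rather than leaving the
-- transport/backend wiring to ad-hoc application code.
startNode :: NodeConfig -> IO ()
startNode NodeConfig{..} = do
  _transport <- newTransport backend
  _peers     <- discoverPeers backend
  -- ... hand the transport and peer list to the Distribution Framework ...
  pure ()
```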

Transport and distribution layer discussion

The basic premise for the transport and distribution layers here is the same as the original vision for Cloud Haskell, as outlined in the Cloud Haskell 2.0 wiki page. We defer to network-transport, as before, to provide us with a stable API - though distribution backends could conceivably eschew this decision and implement their own transport mechanisms should they wish to.
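
For reference, the stable surface we are deferring to is small. The sketch below shows the shape of a layer sitting directly on the abstract Network.Transport interface (newEndPoint, connect, send, receive); how the Transport value itself is obtained is left to whichever backend the node was configured with, and the partial pattern matches are kept only for brevity.

```haskell
-- A minimal sketch of a distribution layer's use of the abstract
-- Network.Transport interface; the Transport is taken as a parameter.
module Sketch.TransportUse where

import Control.Monad (forever)
import Data.ByteString (ByteString)
import Network.Transport

-- Bring up an endpoint on an existing transport and dispatch incoming events.
runEndPoint :: Transport -> IO ()
runEndPoint transport = do
  Right endpoint <- newEndPoint transport     -- partial match for brevity
  putStrLn $ "listening on " ++ show (address endpoint)
  forever $ do
    event <- receive endpoint
    case event of
      ConnectionOpened cid rel addr ->
        putStrLn $ "connection " ++ show cid ++ " from " ++ show addr ++ " (" ++ show rel ++ ")"
      Received cid payload ->
        putStrLn $ "message on " ++ show cid ++ ": " ++ show payload
      ConnectionClosed cid ->
        putStrLn $ "connection " ++ show cid ++ " closed"
      _ -> pure ()                             -- errors, multicast events, etc.

-- Open an outbound, reliable-ordered connection and send a payload.
sendTo :: EndPoint -> EndPointAddress -> [ByteString] -> IO ()
sendTo endpoint peer payload = do
  Right conn <- connect endpoint peer ReliableOrdered defaultConnectHints
  _ <- send conn payload
  close conn
```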

The Distribution Carrier is responsible for providing users with a means to choose from various guarantees about the semantics of inter-node communications, and for meeting the chosen guarantees. This has two fairly important consequences for the layers above, which rely on the transport and distribution framework.

  1. Local Applications/Frameworks can ignore semantics of inter-networked scenarios

This simply means that when implementing the local infrastructure for managing channels and/or actors, the code running locally can offer users the strongest semantic guarantees available, as the circumstances require. Remote interactions are deferred entirely to the Distribution Framework, which exposes an API that local infrastructure can utilise to provide a remote layer matching the semantic guarantees chosen when the distributed node was configured by the application.

  2. Applications/Frameworks can offer different guarantees for local vs. remote interactions

As a result of this separation between the layers, we are able to provide stronger guarantees for local interactions (i.e. within the same RTS), which users may opt to take advantage of if location transparency is unnecessary for some (or all) of their application's needs.
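
To make the local/remote distinction concrete, the following sketch models delivery guarantees as a type and caps remote interactions at whatever ceiling the Distribution Framework advertises. All of the names here (DeliveryGuarantee, remoteCeiling, effectiveGuarantee) are assumptions, not part of any proposed API.

```haskell
-- Illustrative only: one way the local and remote layers might express their
-- differing guarantees in types.
module Sketch.Guarantees where

-- Delivery semantics a node (or an individual channel) can be configured with.
data DeliveryGuarantee
  = AtMostOnce        -- ^ fire-and-forget; the weakest remote guarantee
  | AtLeastOnce       -- ^ retries allowed, duplicates possible
  | ExactlyOnceLocal  -- ^ only promised within a single RTS
  deriving (Show, Eq)

-- The Distribution Framework would advertise the strongest guarantee it can
-- honour for remote peers, so local infrastructure can match or degrade.
remoteCeiling :: DeliveryGuarantee
remoteCeiling = AtLeastOnce

-- Local interactions may opt into stronger semantics than remote ones.
effectiveGuarantee :: Bool -> DeliveryGuarantee -> DeliveryGuarantee
effectiveGuarantee isLocal requested
  | isLocal   = requested                     -- local layer can honour anything requested
  | otherwise = weaker requested remoteCeiling -- remote interactions are capped by the framework
  where
    weaker a b = if rank a <= rank b then a else b
    rank AtMostOnce       = 0 :: Int
    rank AtLeastOnce      = 1
    rank ExactlyOnceLocal = 2
```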

Relationship between distribution, actors, and distributed channels

The Distribution Framework is a standalone component, which should be usable without visible reference to components higher up the stack. If, for example, one wanted to build a distributed file system, then the framework ought to provide the right level of abstraction for connecting nodes (i.e. file systems) across multiple machines.

The core actor model, which is not distributed in nature, ought to provide a useful set of abstractions for building a reliable and fault tolerant distribution management layer. The Actor Model's built-in concepts of monitoring and supervision, for example, should make writing connection management, application (node) level buffering, and the like, very easy. Users of the Distribution Framework should not be obliged to use the Actor Model if it is not a good fit for their distributed application.
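
As an illustration of why monitoring makes connection management straightforward, here is a small sketch written against today's distributed-process monitoring primitives as a stand-in for the proposed (and as yet unspecified) control-actor API.

```haskell
-- Sketch of the kind of connection management the actor model makes easy,
-- using the existing distributed-process monitoring API; the proposed
-- control-actor package would offer equivalent primitives locally.
module Sketch.ConnectionSupervision where

import Control.Distributed.Process

-- Spawn a worker that owns one connection, monitor it, and restart it when it
-- dies. With monitoring built in, the restart logic stays trivial.
superviseConnection :: Process () -> Process ()
superviseConnection connectionWorker = go
  where
    go = do
      pid <- spawnLocal connectionWorker
      ref <- monitor pid
      ProcessMonitorNotification _ _ reason <- expect
      say $ "connection worker died: " ++ show reason ++ ", restarting"
      unmonitor ref
      go
```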

The Distributed Channels layer depends directly on the Distribution Framework beneath it. The distributed file system from our example above might find this abstraction useful, and if so, selecting both should allow it to utilise Distributed Channels without reference to the actor model (either local or distributed).