
Cylc-8 Architecture

Updated: 17 December 2018.

Author: Hilary Oliver.

Contributors: Dave Matthews, Matt Shin, Oliver Sanders, Sadie Bartholomew, Martin Ryan, Bruno Kinoshita, David Sutherland, Sujata Patnaik.

This document is a primary output of the 3-7 December 2018 Cylc Development Workshop at the Bureau of Meteorology, Melbourne, Australia.

Table of Contents

TOP

Appendices:

Cylc Terminology

TOP

  • A Cylc workflow is a single (possibly cycling) suite of inter-dependent tasks.

  • A Cylc workflow service is a workflow manager program for a single workflow, formerly known as a suite server program or a suite daemon. (Cylc has no central server - each workflow gets its own ad-hoc service that runs as the user).

Technology Glossary

TOP

(Hyperlinks in the text below point here for further information).

  • Python 3

    • An interpreted high-level programming language for general-purpose programming.
    • The primary language of Cylc implementation.
  • Tornado

    • A Python web framework and asynchronous networking library.
  • GraphQL

    • A data query language ... that provides an alternative to REST and ad-hoc web service architectures. It allows clients to define the structure of the data required, and exactly the same structure of the data is returned from the server.
    • Originally out of (and backed by) Facebook.
    • A single flexible endpoint, instead of many fixed inflexible REST endpoints.
    • Should allow the UI to request just what it needs very easily.
  • WebSocket

    • A communications protocol providing persistent full-duplex communication channels over a single TCP connection.
    • Alternative to HTTPS request/response exchanges (a WebSocket connection is itself initiated by an HTTP(S) upgrade handshake).
    • Good when server-side data changes quickly and unpredictably.
  • ZeroMQ

    • A high-performance asynchronous messaging library aimed at use in distributed or concurrent applications.
    • For back-end server-to-server communications.
  • JavaScript

    • A lightweight interpreted or JIT-compiled programming language with first-class functions, best known as the scripting language for Web pages (in which context it runs inside web browsers).
  • Node.js

    • A cross-platform JavaScript runtime environment that allows developers to build server-side and network applications with JavaScript.
  • Vue.js

    • A JavaScript framework for building user interfaces.
    • The smallest but fastest-growing of the current top JavaScript frameworks.
    • Lighter than Angular.js and React.js, and reputedly the easiest to learn.
    • In terms of UI components our needs are quite modest, so we will try the simplest modern framework first.
  • JSON

    • JavaScript Object Notation, an open-standard format for human-readable text transmission of data objects as attribute–value pairs and array types, commonly used for asynchronous browser–server communication.
  • JupyterHub

    • A multi-user Hub that spawns, manages, and proxies multiple instances of the single-user Jupyter Notebook Server.
    • Architecturally analogous to Cylc-8, with:
      • "Jupyter Notebook Server" -> "Cylc UI Server"
      • "Jupyter Notebook Kernel" -> "Cylc Workflow Service"
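
As a rough illustration of the GraphQL entry above, the following sketch mimics the idea that a client names the fields it needs and receives data in exactly that shape. It is plain Python, not the GraphQL query language or any GraphQL library; the `select` helper and the task fields are hypothetical.

```python
from typing import Any, Dict

# Hypothetical full server-side record for one task (field names are illustrative).
TASK = {
    "name": "forecast",
    "cycle_point": "20181217T00",
    "state": "running",
    "job": {"host": "hpc1", "submit_num": 2, "log_dir": "/path/to/logs"},
}


def select(data: Dict[str, Any], shape: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the fields named in `shape`, preserving its nesting.

    A value of True selects a leaf field; a nested dict selects sub-fields.
    """
    result: Dict[str, Any] = {}
    for key, sub in shape.items():
        if isinstance(sub, dict):
            result[key] = select(data[key], sub)
        else:
            result[key] = data[key]
    return result


# The client asks for the task state and the job host only.
wanted = {"state": True, "job": {"host": True}}
response = select(TASK, wanted)
print(response)
```

The response carries the same structure as the request and nothing else, which is the property that should let the UI ask for just what it needs.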

Motivation

TOP

Cylc-7 is written in Python 2, with PyGTK native desktop GUIs and a relatively simple local client/server architecture in which everything runs as the user; all clients are treated equally (user GUI and CLI, and job CLI); clients get some server information via the filesystem and port scanning; and authentication is automatic and owner-only, via a suite-specific passphrase file. (Un?)fortunately:

We have decided not to port the existing Cylc GUIs to PyGObject (the successor to PyGTK) because there is strong demand for a new architecture that supports an in-browser web GUI and integration with site identity management systems. The full web architecture is (necessarily) more complicated, but it is more powerful. It will enable us to:

  1. Provide a single point of access to many Cylc workflows on a pool of servers.
  2. Run and interact with workflows via a web browser, without requiring:
    • a Cylc installation on the front-end (browser) platform
    • a shared filesystem or SSH access between the front-end and workflow platforms
  3. Drop the requirement for port scanning by users.
  4. Retire the suite-specific passphrase files and self-signed SSL certificates.
  5. Integrate with site identity management.
  6. Support fine-grained authorized access to individual Workflow Services.

JupyterHub

TOP

The Hub and Proxy architecture described below is inspired by JupyterHub. JupyterHub is a proven technology that solves a very similar problem of managing back-end services spawned into user accounts. It is also commonly used in scientific modeling and HPC contexts (see below for details: Similarity with JupyterHub).

We hope to use JupyterHub "out of the box" for the Hub and Proxy components of the new architecture. Our back-end components are very different from Jupyter Notebook, but some of the technologies involved remain relevant.

Cylc-8 Architecture Diagram

TOP

Cylc-8 Architecture

Figure 1 Cylc-8 Architecture: The "user A" box represents processes owned by one user, but potentially spread over multiple workflow hosts (on a shared filesystem) and multiple job hosts. The term "HPC Platform" is used rather loosely - potentially only the jobs reside on actual HPC nodes. Yellow boxes show the technologies and protocols that will be used to implement each component.

Cylc Hub

TOP

Overview:

  • At start-up, the Hub launches a web proxy.
  • The proxy forwards requests to the Hub by default.
  • The Hub handles user login (authentication) and spawns UI Servers on demand.
  • The Hub configures the proxy to forward URL prefixes to the UI Servers.
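
The routing behaviour in the overview above can be sketched as a table of URL prefixes with a default route to the Hub. This is a minimal sketch and not the proxy that JupyterHub ships; the `RoutingTable` class, its methods, and the addresses are hypothetical, and the `/user/{name}/` prefix layout is assumed from the JupyterHub-style URLs mentioned later in this document.

```python
from typing import Dict


class RoutingTable:
    """Maps URL prefixes to back-end targets, falling back to the Hub."""

    def __init__(self, hub_target: str) -> None:
        self.hub_target = hub_target
        self.routes: Dict[str, str] = {}

    def add_route(self, prefix: str, target: str) -> None:
        """Called by the Hub when it has spawned a UI Server."""
        self.routes[prefix] = target

    def remove_route(self, prefix: str) -> None:
        """Called by the Hub when a UI Server goes away."""
        self.routes.pop(prefix, None)

    def resolve(self, path: str) -> str:
        """Return the target for a request path; the longest matching prefix wins."""
        matches = [p for p in self.routes if path.startswith(p)]
        if not matches:
            return self.hub_target  # default: forward to the Hub
        return self.routes[max(matches, key=len)]


# Hypothetical addresses, for illustration only.
table = RoutingTable(hub_target="http://127.0.0.1:8081")
table.add_route("/user/alice/", "http://127.0.0.1:9001")

print(table.resolve("/hub/login"))          # no route yet, so the Hub handles it
print(table.resolve("/user/alice/suites"))  # forwarded to alice's UI Server
```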

Detail:

  • The Hub must be a privileged process - either root or sudo.
  • Implemented in Python 3 with Tornado.
  • Spawns a Proxy that it dynamically configures to route requests to Cylc UI Servers. The proxy is:
  • Hub Authenticator: calls out to host or site identity management, with plugins for:
    • PAM, LDAP, OAuth (GitHub and Google accounts), etc.
    • (PAM sufficient for sites where local accounts are driven by AD or LDAP?)
    • (Extendable with custom authenticators).
  • Hub User Database
  • Hub Spawner: spawn Cylc UI Servers on user accounts; plugins for:
    • ssh, sudo, PBS, Docker, ...
    • (Extendable with custom spawners).

For more detail on component interaction, including session management, see JupyterHub Technical Overview.
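
The pluggable Authenticator and Spawner roles listed above might be pictured as two small interfaces that the Hub calls at login. The sketch below is illustrative only: the class and method names are hypothetical and are not the JupyterHub API, the toy authenticator stands in for PAM, LDAP, or OAuth, and the toy spawner only hands out addresses rather than starting processes.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class Authenticator(ABC):
    @abstractmethod
    def authenticate(self, username: str, password: str) -> Optional[str]:
        """Return the authenticated user name, or None on failure."""


class Spawner(ABC):
    @abstractmethod
    def spawn(self, username: str) -> str:
        """Ensure a UI Server exists for `username` and return its address."""


class DictAuthenticator(Authenticator):
    """Toy authenticator backed by an in-memory dict (illustration only)."""

    def __init__(self, passwords: Dict[str, str]) -> None:
        self.passwords = passwords

    def authenticate(self, username: str, password: str) -> Optional[str]:
        if self.passwords.get(username) == password:
            return username
        return None


class FakeSpawner(Spawner):
    """Pretends to start UI Servers by allocating one address per user."""

    def __init__(self, first_port: int = 9000) -> None:
        self.next_port = first_port
        self.servers: Dict[str, str] = {}

    def spawn(self, username: str) -> str:
        if username not in self.servers:  # spawn on demand, once per user
            self.servers[username] = f"http://127.0.0.1:{self.next_port}"
            self.next_port += 1
        return self.servers[username]


class Hub:
    """Authenticates a user, then spawns (or reuses) that user's UI Server."""

    def __init__(self, authenticator: Authenticator, spawner: Spawner) -> None:
        self.authenticator = authenticator
        self.spawner = spawner

    def login(self, username: str, password: str) -> Optional[str]:
        user = self.authenticator.authenticate(username, password)
        if user is None:
            return None
        return self.spawner.spawn(user)


hub = Hub(DictAuthenticator({"alice": "secret"}), FakeSpawner())
print(hub.login("alice", "secret"))
```

Keeping both roles behind small interfaces is what would allow a site to swap in its own identity management or job-submission mechanism without touching the Hub itself.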

Cylc UI Server

TOP

Overview:

  • Serves the UI to the user's browser.
    • For uniform presentation of stopped suites and static services as well as running suites (a workflow service can only be queried if it is running).
  • Spawned by Cylc Hub on demand, into "suite owner" user accounts.

Detail:

  • Must run as the user (that is the suite owner, not the UI user) because it must run sub-services as the user.
    • (consider another user authorized to view your suites: she must be able to read your suite files without relying on local file permissions - she might not even have a local account on the workflow host).
  • Implemented in Python 3, with a Vue.js-generated UI.
  • User-facing server communications:
    • Tornado web server, with GraphQL API over the WebSocket protocol.
    • WebSocket allows server push to UI Servers (no need for polling), and lets the server return a response to clients even if a command has to be queued for asynchronous execution.
    • GraphQL allows the UI to request exactly what it needs and no more.
  • Workflow Service-facing server, API, and network protocol:
    • JSON over ZeroMQ.
    • (ZeroMQ is used between Jupyter Notebook Servers and Kernels).
    • (Later: consider Protocol Buffers and/or gRPC - possibly better efficiency).
  • One UI Server per (suite owner) user - i.e. a UI server fronts multiple suites.
    • Efficiency benefits for multiple UIs looking at the same suite?
    • Relieves workflow services of some comms load.
    • BUT consider one UI server per UI (i.e. per browser tab) for simplicity, if the aforementioned efficiency benefits can't be realized.
  • (Could potentially scrape suite databases rather than query Workflow Services, to remove all comms load from the suites ... but this has the potential for disk latency problems on NFS?)
  • Pluggable Sub-Services:
    • Suite Listing Sub-Service:
      • Location and identity of running workflow services (host:port).
      • Location and identity of inactive suites (stopped or never started).
      • Status of stopped suites (e.g. "stopped with N failed tasks").
      • (use existing cylc scan, as the user)
    • Suite Start Sub-Service:
      • Start up new workflow services (from inactive suites).
      • (use existing cylc run, as the user)
    • Static Sub-Services, e.g.:
      • cylc graph (dependency and inheritance graph visualization).
      • cylc review (formerly Rose Bush).
      • View suite definition.
      • Suite analytics.
      • rose edit.
      • etc.
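
As a minimal sketch of the "JSON over ZeroMQ" link between a UI Server and a Workflow Service, the example below shows only the message encoding and a toy handler, using the standard library. The command names, the message fields (`command`, `args`, `user`), and both function names are assumptions for illustration, not a defined Cylc protocol; in a real deployment the byte strings would travel over a ZeroMQ socket rather than being passed directly between functions.

```python
import json
from typing import Any, Dict


def encode_request(command: str, args: Dict[str, Any], user: str) -> bytes:
    """Serialize a request as UTF-8 JSON, ready to send as one message."""
    message = {"command": command, "args": args, "user": user}
    return json.dumps(message).encode("utf-8")


def handle_request(raw: bytes) -> bytes:
    """Toy Workflow Service handler: decode, dispatch, and encode the reply."""
    request = json.loads(raw.decode("utf-8"))
    if request["command"] == "ping":
        reply = {"ok": True, "data": "pong"}
    else:
        reply = {"ok": False, "error": "unknown command: " + request["command"]}
    return json.dumps(reply).encode("utf-8")


# UI Server side: build a request on behalf of an authenticated user.
raw_request = encode_request("ping", {}, user="alice")

# Workflow Service side: handle it and return the encoded reply.
raw_reply = handle_request(raw_request)
print(json.loads(raw_reply))
```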

Cylc Workflow Services

TOP

Largely unchanged from Cylc-7 "suite server programs", except:

Command Line Interface

TOP

  • User-executed commands go via the Proxy.
    • Allows remote commands, and other authorized users.
    • Might need some kind of in-memory or on-disk CLI session management, akin to use of cookies for session management in the browser (TBD).
  • Job-executed commands (e.g. for job status messages) should go direct to the parent Workflow Service.
    • Job clients know where their own Workflow Service is.
    • Suites must carry on even if the Hub and Proxy are down.
    • Server authentication (trust) - some kind of single-use token? (TBD).
  • CLI clients need to talk both ZeroMQ (direct mode) and WebSocket (indirect).
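
The split between direct and indirect modes described above might look roughly like this in a CLI client. This is a sketch only: the `Route` type, the `choose_route` function, and the addresses are hypothetical, and no sockets are opened.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    protocol: str  # "zeromq" (direct) or "websocket" (indirect, via the Proxy)
    address: str


def choose_route(proxy_url: str, service_address: Optional[str] = None) -> Route:
    """Pick how a CLI client should reach a Workflow Service.

    Job clients know where their own Workflow Service is, so they pass
    `service_address` and connect directly; this keeps suites running even
    if the Hub and Proxy are down. User-executed commands have no such
    address and go via the Proxy.
    """
    if service_address is not None:
        return Route("zeromq", service_address)
    return Route("websocket", proxy_url)


# Hypothetical addresses, for illustration only.
user_route = choose_route("wss://cylc.example.org/user/alice/")
job_route = choose_route("wss://cylc.example.org/user/alice/", "tcp://wfhost1:43001")
print(user_route)
print(job_route)
```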

Authorization

TOP

  • Authenticated user name sent with requests.
  • Two-level authorization:
    1. Who is allowed to connect to my UI Server?
    2. Who is allowed to see or do what to which of my suites?
  • Both levels could be enforced by the UI Server, which runs as the user and can therefore see the same config files the Workflow Services can.
  • Simple text files that map user names or groups to privileges might do.
    • (Use Unix group names to authorize service/role accounts).
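
The two-level check above, backed by simple text files, could be sketched as follows. The file format (`name: privilege, ...`, with `@group` for Unix groups and `#` for comments), the privilege names, and the function names are all assumptions made for this example; the document does not define them.

```python
from typing import Dict, Iterable, Set


def parse_privileges(text: str) -> Dict[str, Set[str]]:
    """Parse lines of the form 'name: priv1, priv2' ('#' starts a comment)."""
    table: Dict[str, Set[str]] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, _, privs = line.partition(":")
        table[name.strip()] = {p.strip() for p in privs.split(",") if p.strip()}
    return table


def is_authorized(table: Dict[str, Set[str]], user: str,
                  groups: Iterable[str], privilege: str) -> bool:
    """True if the user, or any of the user's groups, holds the privilege."""
    names = [user] + ["@" + group for group in groups]
    return any(privilege in table.get(name, set()) for name in names)


# Level 1: who is allowed to connect to my UI Server?
UI_SERVER_ACCESS = parse_privileges("""
bob: connect
@ops: connect
""")

# Level 2: who is allowed to see or do what to this suite?
SUITE_ACCESS = parse_privileges("""
bob: read
@ops: read, stop   # a service/role group
""")


def check(user: str, groups: Iterable[str], privilege: str) -> bool:
    """Apply both levels in order, as the UI Server might."""
    groups = list(groups)
    return (is_authorized(UI_SERVER_ACCESS, user, groups, "connect")
            and is_authorized(SUITE_ACCESS, user, groups, privilege))


print(check("bob", [], "read"))
print(check("bob", [], "stop"))
print(check("carol", ["ops"], "stop"))
```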

Appendix

TOP

Deployment

  • Cylc-8 will be packaged for installation with pip and conda.
  • (This wasn't possible at Cylc-7 and earlier due to the extreme difficulty of PyGTK installation).
  • This will enable us to stop bundling 3rd party libraries like Jinja2.
  • It should also eliminate problems with version compatibility of software dependencies on the system.
  • Tools like safety will be able to scan software dependencies (listed in a standard requirements.txt) for security vulnerabilities.
  • No Cylc installation will be needed on the UI (browser) platform (the UI is served up by the Cylc UI Server).
  • We'll also consider .rpm and .deb packaging for system package managers, and containers.

Similarity with JupyterHub

TOP

Jupyter Notebooks are a proven technology commonly used in scientific and educational institutions worldwide for interactive programming and sharing web documents that contain embedded code. Architecturally, the user's browser talks to a Python Tornado-based Notebook Server that communicates via the ZeroMQ network library with a back-end Notebook Kernel that executes the code. Both the notebook server and kernel run as the user.

JupyterHub is used to deploy and manage single-user notebooks for large numbers of users within an institution. It consists of a privileged multi-user Hub (a Python Tornado process) that:

  • is a single point of access for all users
  • handles user authentication, with plugins to integrate with site identity management
  • spawns single-user Notebook Servers on user accounts
  • spawns a configurable Web Proxy to route requests to the single-user Notebook Servers
  • provides a REST API for "convenient administration of the Hub, its users, and services"

JupyterHub is extremely well documented, including (for example) a security overview of the architecture.

This architecture is (almost) exactly what we need for Cylc-8, with:

  • Notebook Server => Cylc UI Server, and
  • Notebook Kernel => Cylc Workflow Service

For an in-browser GUI to spawn and access a distributed set of Cylc workflow services running under various user accounts, we need a central "hub" that can spawn user processes and proxy requests to them, and a "UI Server" component to construct the UI (HTML/Javascript) around workflow status data obtained from the workflow services (and also static data about stopped workflows).

JupyterHub is Open Source, with the 3-Clause BSD License.

Differences from Jupyter Notebook

The Jupyter Notebook back-end is highly specific to the Notebook Document format. For Cylc, we have Workflow Services instead of Kernels, and we need a Cylc-specific UI Server (with a bunch of Cylc-specific sub-services for suite discovery and start-up etc.) instead of the Notebook Server. Further, a Jupyter Notebook Server fronts a single Kernel, whereas a Cylc UI Server may need to access (and possibly collate) data from multiple workflow services.

However, we do share the two-level "web server / kernel" back-end model, and we have therefore decided to use ZeroMQ for back-end communications. It is a well-used, efficient network library, and we may be able to learn by studying the Jupyter model - including the automatic trust (server authentication) mechanism.

Differences from JupyterHub

There is only one difference that we're aware of:

  • JupyterHub does not yet support access to other users' Notebooks, whereas we need (authorized) access to other users' workflow services.

However, multi-user access is presumably as simple as allowing access to /user/{other-name}/ URLs (in the Jupyter case the show-stopper is that the Notebooks themselves, at the back end, don't support shared use yet).

We therefore hope we can use JupyterHub "out of the box" for the Hub and Proxy components of the Cylc-8 architecture. The open question is how to make the (minor?) multi-user access change: can it be done (e.g. by plugin) without modifying the core of JupyterHub, so that we can treat it as a third-party software requirement; can we contribute a change back to JupyterHub to enable it; or do we need to fork the project and maintain our own "Cylc Hub" in the future?
