create a remote spawner with load-balancing capability #137

Open
oliver-sanders opened this issue May 13, 2020 · 4 comments · May be fixed by #213
@oliver-sanders
Member

[discussed at CylcCon2020]

We want the Cylc Hub to be able to start UI Servers on any one of a group of hosts.

There is already [at least one] JupyterHub spawner which is able to start servers on remote hosts, the remote spawner.

We would like to add basic load-balancing to this. Cylc has load-balancing built-in, see the cylc.flow.host_select module.

The idea is to build a new JupyterHub spawner which handles load balancing to work out where to start the server, then uses the configured spawner (e.g. the remote spawner) to actually start the server.

This would give sites a lot of flexibility with which spawners they install.

Depending on the nature of the installation, this spawner may be running as a privileged user, so its code should be kept to a minimum.
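
The wrapping idea described above could look roughly like the sketch below. It assumes nothing about the real JupyterHub API: `LoadBalancingSpawner`, `FakeRemoteSpawner`, `pick_first` and the `select_host` callable are hypothetical names used only for illustration. A real implementation would subclass `jupyterhub.spawner.Spawner` and could delegate the host choice to `cylc.flow.host_select`.

```python
import asyncio
from typing import Callable, Optional, Sequence, Tuple


class FakeRemoteSpawner:
    """Hypothetical stand-in for whichever remote spawner a site configures."""

    def __init__(self, host: str) -> None:
        self.host = host

    async def start(self) -> Tuple[str, int]:
        # A real spawner would launch the UI Server on self.host here.
        return (self.host, 8888)  # (host, port); the port is a placeholder


class LoadBalancingSpawner:
    """Choose a host, then delegate the actual spawn to another spawner."""

    def __init__(
        self,
        hosts: Sequence[str],
        select_host: Callable[[Sequence[str]], str],
        spawner_factory: Callable[[str], FakeRemoteSpawner],
    ) -> None:
        self.hosts = list(hosts)
        self.select_host = select_host
        self.spawner_factory = spawner_factory
        self.inner: Optional[FakeRemoteSpawner] = None

    async def start(self) -> Tuple[str, int]:
        # The only logic that lives here is the host choice; everything
        # else is handed to the wrapped spawner.
        host = self.select_host(self.hosts)
        self.inner = self.spawner_factory(host)
        return await self.inner.start()


def pick_first(hosts: Sequence[str]) -> str:
    """Trivial selection policy, used here in place of real load balancing."""
    return hosts[0]


if __name__ == '__main__':
    spawner = LoadBalancingSpawner(
        ['host-a', 'host-b'], pick_first, FakeRemoteSpawner
    )
    print(asyncio.run(spawner.start()))
```

Keeping the wrapper this thin is in line with the aim of minimising the code that may run as a privileged user.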

@oliver-sanders oliver-sanders added this to the 0.3 milestone May 13, 2020
@jarich

jarich commented Feb 18, 2021

From a security point of view it would be good if:

  1. All Cylc UI Server spawners allowed sudospawner capabilities. This reduces the attack surface if the hub is compromised, as all it can do is spawn UI Servers rather than do anything root can do.
  2. Remote spawners could be configured to exclusively spawn remote UI Servers (rather than current host plus other hosts). This further improves security by allowing the hub machine to be isolated from schedulers etc. The amount of protection this yields depends on how the UI Servers are spawned. For example, if they use passwordless ssh, then we can constrain the ssh keys to allow only the specific spawning commands. This would mean a hub compromise that allows filesystem access has a vastly reduced attack surface.

https://hub.docker.com/r/jupyterhub/singleuser/
https://jupyterhub.readthedocs.io/en/stable/reference/config-sudo.html
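
To illustrate the ssh key constraint in point 2, here is a small sketch that builds an OpenSSH `authorized_keys` entry with a forced command. The helper name `restricted_key_line`, the placeholder key and the choice of `cylc-uiserver` as the forced command are assumptions for illustration; the `restrict` and `command="..."` options are standard OpenSSH `authorized_keys` options.

```python
def restricted_key_line(public_key: str, command: str) -> str:
    """Return an authorized_keys entry that only permits one command.

    `restrict` disables port forwarding, agent forwarding, PTY allocation
    and similar features; `command="..."` forces the given command to run
    regardless of what the connecting client asks for.
    """
    if '"' in command or '\n' in command:
        # Keep the sketch simple: refuse values that would need escaping.
        raise ValueError('command must not contain quotes or newlines')
    return f'restrict,command="{command}" {public_key.strip()}'


if __name__ == '__main__':
    # The key below is a placeholder, not a real public key.
    print(restricted_key_line(
        'ssh-ed25519 AAAAC3placeholder hub@example', 'cylc-uiserver'
    ))
```

With an entry like this on the UI Server hosts, a stolen hub key could only trigger the spawn command, not an interactive shell.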

@jarich

jarich commented Feb 18, 2021

Related discussion from Riot (from early February):

If UI Servers run on machines that are not running the/a hub, how does the hub find them? Can we load balance UI Servers?

This comes down to the JupyterHub "spawner". The spawner is the thing that starts the UI Servers in the Cylc Hub (notebooks in regular JupyterHub).
We would like to build a special spawner which uses the scheduler-distribution logic from cylc-flow, allowing load balancing of UI Servers on startup.
This load-balancing system can rank hosts by psutil metrics (e.g. CPU, memory, server load) as well as apply hard limits.
This plugin would just do the load balancing, then defer to a regular distributed-spawner plugin to start the UI Server (i.e. it wraps a spawner of your choice).
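
As a rough illustration of "rank by metrics plus hard limits", the sketch below filters hosts against thresholds and then picks the least loaded one. It is not the `cylc.flow.host_select` implementation: the function name `select_host`, the metric keys and the default thresholds are all assumptions, and the metrics are passed in as plain dictionaries rather than being gathered with psutil on each host.

```python
from typing import Dict, Optional

# Per-host metrics, e.g. collected remotely with psutil (not shown here).
HostMetrics = Dict[str, Dict[str, float]]


def select_host(
    metrics: HostMetrics,
    max_cpu_percent: float = 90.0,
    min_memory_available: float = 1e9,
) -> Optional[str]:
    """Return the eligible host with the lowest CPU usage, or None."""
    # Hard limits: discard hosts that are too busy or too short of memory.
    eligible = {
        host: m
        for host, m in metrics.items()
        if m['cpu_percent'] <= max_cpu_percent
        and m['memory_available'] >= min_memory_available
    }
    if not eligible:
        return None
    # Ranking: prefer the host with the lowest CPU usage.
    return min(eligible, key=lambda host: eligible[host]['cpu_percent'])


if __name__ == '__main__':
    example = {
        'host-a': {'cpu_percent': 75.0, 'memory_available': 8e9},
        'host-b': {'cpu_percent': 20.0, 'memory_available': 5e8},
        'host-c': {'cpu_percent': 40.0, 'memory_available': 4e9},
    }
    # host-b is excluded by the memory limit, so host-c is chosen.
    print(select_host(example))
```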

@oliver-sanders
Member Author

oliver-sanders commented Feb 18, 2021

The spawners take the command to spawn from the JupyterHub configuration, which we have modified to get JupyterHub to run our UI Server:

c.Spawner.cmd = ['cylc-uiserver']

The spawners load this command from the config directly (rather than having it passed through from the hub):

https://github.com/jupyterhub/jupyterhub/blob/c3ca924ba896bf2da40cab29e2a08784850a945b/jupyterhub/spawner.py#L484-L499

For multi-user setups I think we will advise running the Hub under a special "cylc" user account with a limited subset of sudo privileges, but installing the UI Server under root or some higher authority, which would protect this base config from meddling.

Note1: Currently the hub is launched with the jupyterhub command and the UI Server by cylc-uiserver. We will change this to cylc hub and cylc uiserver which should make command whitelisting a little easier.

Note2: We will want to make the "base" config we provide in this repository overridable in some way, as there are some settings in there that sites (and possibly users) may wish to change (e.g. there will be a config for setting the scan interval).

@jarich

jarich commented Mar 3, 2023

Running JupyterHub with sudospawner isn't especially difficult, but you do have to ensure c.JupyterHub.internal_ssl = False in jupyterhub_config.py. Using sudospawner will add complexity to other spawners though. I'm happy to try to help meld them if necessary.

sudospawner-singleuser becomes:

#!/bin/bash -l

# Delegate to the Cylc hub server.
# "$@" is quoted so that arguments containing spaces are passed through intact.
exec "$(dirname "$0")/cylc" "hubapp" "$@"

and jupyterhub_config.py is:

# Configuration file for sudospawner jupyterhub (melded with Cylc hub config)
import os
from pathlib import Path
import pkg_resources

from cylc.uiserver import (
    __file__ as uis_pkg,
    getenv,
)
from cylc.uiserver.app import USER_CONF_ROOT

c.JupyterHub.bind_url = 'https://:8443'

## Enable SSL for all internal communication
#  
#          This enables end-to-end encryption between all JupyterHub components.
#          JupyterHub will automatically create the necessary certificate
#          authority and sign notebook certificates as they're created.
#
#  NOTE: these certificates are moved around in jupyterhub/spawner.py's move_certs which just assumes
#  that it has the right to create subdirectories and to put files into the authenticated user's $HOME, which
#  under sudospawner it does not.
#  
#  Default: False
c.JupyterHub.internal_ssl = False

## Path to SSL certificate file for the public facing interface of the proxy
#  
#          When setting this, you should also set ssl_key
#  Default: ''
c.JupyterHub.ssl_cert = '/path/to/semi-privileged-user/.tls/cylc-hub.cer'

## Path to SSL key file for the public facing interface of the proxy
#  
#          When setting this, you should also set ssl_cert
#  Default: ''
c.JupyterHub.ssl_key = '/path/to/semi-privileged-user/.tls/cylc-hub.key'

#  Some spawners allow shell-style expansion here, allowing you to use
#  environment variables. Most, including the default, do not. Consult the
#  documentation for your spawner to verify!
#  Default: ['jupyterhub-singleuser']
c.Spawner.cmd = ['sudospawner-singleuser']

c.JupyterHub.spawner_class = 'sudospawner.SudoSpawner'
c.JupyterHub.log_level = 10

# environment variables to pass to the spawner (if defined)
c.Spawner.environment = getenv(
    # site config path override
    'CYLC_SITE_CONF_PATH',
    # used to specify the Cylc version if using a wrapper script
    'CYLC_VERSION',
    'CYLC_ENV_NAME',
    # may be used by Cylc UI developers to use a development UI build
    'CYLC_DEV',
)

# this auto-spawns uiservers without user interaction
c.JupyterHub.implicit_spawn_seconds = 0.01

# apply cylc styling to jupyterhub
c.JupyterHub.logo_file = str(Path(uis_pkg).parent / 'logo.svg')
c.JupyterHub.log_datefmt = '%Y-%m-%dT%H:%M:%S'  # ISO8601 (expanded)
c.JupyterHub.template_paths = [
    # custom HTML templates
    pkg_resources.resource_filename(
        'cylc.uiserver',
        'templates'
    )
]

# store JupyterHub runtime files in the user config directory
USER_CONF_ROOT.mkdir(parents=True, exist_ok=True)
c.JupyterHub.cookie_secret_file = f'{USER_CONF_ROOT / "cookie_secret"}'
c.JupyterHub.db_url = f'{USER_CONF_ROOT / "jupyterhub.sqlite"}'
c.ConfigurableHTTPProxy.pid_file = f'{USER_CONF_ROOT / "jupyterhub-proxy.pid"}'

# write Cylc logging to the user config directory
# NOTE: Parallel UIS instances will conflict over this file.
#       See https://github.com/cylc/cylc-uiserver/issues/240
c.CylcUIServer.logging_config = {
    'version': 1,
    'handlers': {
        'file': {
            'class': 'logging.handlers.RotatingFileHandler',
            'level': 'INFO',
            'filename': str(USER_CONF_ROOT / 'uiserver.log'),
            'mode': 'a',
            'backupCount': 5,
            'maxBytes': 10485760,
            'formatter': 'file_fmt',
        },
    },
    'loggers': {
        'CylcUIServer': {
            'level': 'INFO',
            'handlers': ['console', 'file'],
        },
    },
    'formatters': {
        'file_fmt': {
            'format': '%(asctime)s %(levelname)-8s %(message)s',
            'datefmt': '%Y-%m-%dT%H:%M:%S',
        }
    },
}

and the sudoers changes are as advised in https://jupyterhub.readthedocs.io/en/stable/reference/config-sudo.html
