helgridly edited this page Dec 7, 2017 · 3 revisions

Leonardo is a service that provisions Spark clusters and stands up Jupyter notebooks on them.

Motivation

Jupyter notebooks are an increasingly popular way of creating reproducible bioinformatics analyses. They combine familiar and powerful programming languages, like R and Python, with the ability to create and share documents containing code, results, and narrative text.

Jupyter can integrate with powerful compute paradigms running on horizontally scalable environments, such as Spark or TensorFlow, and provides an excellent environment in which to run leading genomic analysis software such as Hail.

Many systems would also like to offer a hosted version of this capability, with resource management and security, in a cloud-based environment. Leonardo aims to provide those capabilities and is being used as part of FireCloud and the All of Us Researcher Tools platform.

Key Features

  • REST-based service
  • Endpoint- and resource-based access control
  • End users can pip install additional packages
  • Automated provisioning of Google Cloud Platform Dataproc clusters with Jupyter notebooks
  • Two-way SSL encryption between the Dataproc cluster and the proxy service

Roadmap

Immediate development is focused on meeting the needs of the All of Us Researcher Tools platform, as well as researcher usage within Broad and Verily on Google Cloud Platform. Through pluggable authorization and credential providers, we would like to support many other security and infrastructure use cases.
