
ESGF Node Design

The ESGF architecture is a global system of distributed Nodes that interoperate with each other according to a peer-to-peer paradigm. There is no rigid distinction of roles between Nodes: each Node can expose different services depending on how it is configured, and can act as either the provider or the consumer of a service depending on the situation. In a peer-to-peer system, Nodes can join or leave the federation dynamically without affecting the operations of the other Nodes. This is in stark contrast to a traditional client-server architecture, where a central server is a single point of failure for the operations of many clients. Two main characteristics make ESGF a peer-to-peer system:

  • The modularity and configurability of the ESGF software stack, which allows each Node to expose a graduated set of services depending on the specific site requirements.
  • The establishment of federation protocols that allow information to be exchanged from Node to Node on an egalitarian basis, without any special central location where the information is aggregated.
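To make the peer-to-peer exchange more concrete, here is a minimal Python sketch of how Nodes could share registry information without a central aggregator. It assumes a simple gossip-style scheme: every Node keeps its own view of the federation and, whenever it contacts a peer, adopts any entry that is newer than its local copy. The `Node` and `NodeEntry` classes, the per-entry `version` counter and the `exchange` method are hypothetical illustrations, not the actual ESGF federation protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeEntry:
    """A Node's self-description as seen by its peers (hypothetical fields)."""
    services: frozenset
    version: int  # bumped by the owning Node whenever its description changes


class Node:
    def __init__(self, name, services):
        self.name = name
        # A Node starts out knowing only itself: there is no central registry.
        self.registry = {name: NodeEntry(frozenset(services), 1)}

    def update_services(self, services):
        """Change this Node's own description and bump its version."""
        old = self.registry[self.name]
        self.registry[self.name] = NodeEntry(frozenset(services), old.version + 1)

    def merge(self, remote_registry):
        """Adopt entries that are unknown locally or newer than the local copy."""
        for name, entry in remote_registry.items():
            local = self.registry.get(name)
            if local is None or entry.version > local.version:
                self.registry[name] = entry

    def exchange(self, peer):
        """Symmetric exchange: afterwards both Nodes hold the newest entries."""
        self.merge(peer.registry)
        peer.merge(self.registry)


# Three Nodes exposing different services; none of them plays a special role.
a = Node("node-a", {"data"})
b = Node("node-b", {"index"})
c = Node("node-c", {"idp"})
a.exchange(b)
b.exchange(c)
c.exchange(a)

# node-b adds a service; the change spreads through whichever peers it talks to.
b.update_services({"index", "compute"})
b.exchange(c)
c.exchange(a)
```

In this sketch, losing one Node only stops updates about that Node; the remaining Nodes keep exchanging information with each other.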

These two characteristics are described in more detail below.

The ESGF Software Stack

A common ESGF Software Stack is deployed at each Node in the federation to provide services for data, metadata and user management. The installation can be configured to include all or only some of the available services, depending on the site's needs, and optionally to replicate some services across multiple servers at the same site. Specifically, the following flavors of ESGF Node can be installed:

  • Data Node: includes services for publishing and serving data, namely:

    • The ESGF Node Manager. The ESGF Node Manager is a web application that mediates the peer-to-peer interaction among all the Nodes in the federation. Its main purpose is to create and expose the ESGF Registry, a document that contains critical interoperability information such as the name and type of each Node, its available services and URL endpoints, its CA certificate, etc.
    • The ESGF Publisher and its associated Postgres relational database. The ESGF Publisher is a desktop application for publishing data to a Node. The publishing workflow starts by extracting metadata from files on disk, storing it in the Node database, creating THREDDS XML catalogs, and finally publishing the catalogs to the Node's publishing service. Postgres is a popular, freely available relational database that ESGF uses to store all metadata harvested by the ESGF Publisher, as well as user account information.
    • The THREDDS Data Server, configured with the ESGF security filters. The THREDDS Data Server (TDS), developed by Unidata, is the standard mechanism through which an ESGF Node delivers its data to clients. The TDS can serve data in a variety of forms and protocols: full-file HTTP download, OPeNDAP subsetting, GIS products via WMS and WCS, etc. The ESGF installation procedure configures the TDS with a set of special ESGF filters that intercept every data request and enforce the access control policies established for that dataset by interacting with the appropriate ESGF Security Services deployed throughout the federation.
    • The ESGF Security Services. The ESGF Security framework provides distributed access control throughout the federation. It is composed of client-side components (the access filters and the OpenID Relying Party) that protect access to the data, and server-side components (the Attribute and Authorization services) that can be queried to gather the information needed to make an access control decision. The framework supports access both by browsers (via OpenID authentication) and by desktop clients and libraries (via X.509 certificates).
    • The GridFTP server. GridFTP, developed by the Globus Alliance, is a high-performance protocol for reliable data transfer. Its software includes a server, deployed on an ESGF Node, and a client-side library that users must install on their desktops.
  • IdP Node: includes services for authenticating users:

    • The OpenID Identity Provider web application. The OpenID Identity Provider (IdP) allows users to register and authenticate with the system, including Single Sign-On functionality for browser-based access throughout the federation.
    • The Globus SimpleCA and MyProxy server. The MyProxy server, developed by NCSA, issues short-term certificates that client libraries and toolkits can use to authenticate the user when requesting data products. The certificates are signed by the locally installed Globus SimpleCA (Simple Certificate Authority).
  • Index Node: includes the applications necessary to index and search metadata:

    • The Apache Solr engine. Apache Solr is a high-performance, scalable search platform, deployed as a web application, that ESGF uses to store, index and search metadata.
    • The ESGF Search back-end services. The ESGF Search module includes facilities for harvesting external metadata repositories (such as the THREDDS XML catalogs produced by the ESGF Publisher), and for searching the distributed metadata indexes deployed within the federation.
    • The ESGF Web Portal application. The ESGF Web Portal is a web application that provides the user interface to many of the other ESGF modules. It exposes web pages for registering users, searching for data, downloading data, etc.
  • Compute Node: includes services for data analysis and visualization, namely:

    • The Live Access Server. The Live Access Server (LAS), developed by NOAA/PMEL, is an analysis and visualization engine that allows users to request advanced data and imaging products from multiple ESGF Nodes at once. Internally, it relies on the TDS catalogs and OPeNDAP services for configuration and remote data access. It can be configured with a pluggable visualization engine such as Ferret (the default) or UV-CDAT.
  • Future modules:

    • ESGF Dashboard. The ESGF Dashboard is a web application intended for system administrators to monitor the status of all services deployed at each Node.
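The modular installation described above can be pictured as a mapping from Node flavors to the services each flavor brings in. The following Python sketch uses short service identifiers invented for this example (they are not the ESGF installer's actual option names) and shows how a site could combine several flavors on one host while selecting each service only once.

```python
# Hypothetical service identifiers, grouped by the Node flavors described above.
FLAVOR_SERVICES = {
    "data": ["node-manager", "publisher", "postgres", "tds", "security-services", "gridftp"],
    "idp": ["openid-idp", "myproxy", "simpleca"],
    "index": ["solr", "esgf-search", "web-portal"],
    "compute": ["las"],
}


def services_for(flavors):
    """Return the services to install for the chosen flavors, without duplicates."""
    unknown = set(flavors) - set(FLAVOR_SERVICES)
    if unknown:
        raise ValueError(f"unknown Node flavor(s): {sorted(unknown)}")
    selected = []
    for flavor in flavors:
        for service in FLAVOR_SERVICES[flavor]:
            if service not in selected:  # skip anything already selected
                selected.append(service)
    return selected


# A site that publishes data and also runs its own identity provider:
data_and_idp = services_for(["data", "idp"])
```

Keeping the mapping as data means that adding a flavor, or moving a service from one flavor to another, is a one-line change to the table.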
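The role of the ESGF filters in front of the TDS can be summarized as: intercept a request, look up the user's attributes, and compare them with the policy for the requested dataset. The Python sketch below models that decision with in-memory dictionaries standing in for the remote Attribute and Authorization services. The policy format (one required group per dataset), the `authorize`, `lookup_groups` and `access_filter` names, the request dictionary and the deny-by-default rule are all assumptions of this sketch, not the ESGF Security API.

```python
# Stand-in for the Authorization service: dataset ID -> group whose members may read it.
POLICIES = {"example.dataset.v1": "Example Research Group"}

# Stand-in for the Attribute service: user OpenID -> groups the user belongs to.
USER_GROUPS = {"https://idp.example.org/openid/alice": {"Example Research Group"}}


def lookup_groups(openid):
    """Query the (simulated) Attribute service for a user's group memberships."""
    return USER_GROUPS.get(openid, set())


def authorize(openid, dataset_id):
    """Return True only if the user holds the group the dataset's policy requires."""
    if openid is None:  # unauthenticated request
        return False
    required = POLICIES.get(dataset_id)
    if required is None:  # no policy registered: fail closed
        return False
    return required in lookup_groups(openid)


def access_filter(handler):
    """Wrap a data handler so that every request is checked before data is served."""
    def filtered(request):
        if not authorize(request.get("openid"), request["dataset_id"]):
            return 403, "Forbidden"
        return handler(request)
    return filtered


@access_filter
def serve_file(request):
    return 200, f"contents of {request['dataset_id']}"
```

Using a wrapper keeps the check outside the data handler itself, loosely mirroring how the ESGF filters sit in front of the TDS without changing how it serves data.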
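Clients usually reach the ESGF Search services over HTTP. The sketch below only builds a query URL with Python's `urllib.parse.urlencode` and does not contact any server. The `/esg-search/search` path, the `type`, `format`, `limit` and `distrib` parameters and the `project`/`variable` facet names follow the ESGF Search REST interface as commonly deployed, but treat them as assumptions to verify against your Index Node; `index.example.org` is a placeholder host and `search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def search_url(index_host, facets, limit=10, distributed=True):
    """Build a dataset search URL; `facets` maps a facet name to the values to match."""
    params = [
        ("type", "Dataset"),
        ("format", "application/solr+json"),
        ("limit", str(limit)),
        # Assumed semantics: "true" asks the Index Node to query the other indexes too.
        ("distrib", "true" if distributed else "false"),
    ]
    for name in sorted(facets):
        # A facet with several values is sent as repeated parameters.
        params.extend((name, value) for value in facets[name])
    return f"https://{index_host}/esg-search/search?{urlencode(params)}"


url = search_url("index.example.org", {"project": ["CMIP5"], "variable": ["tas", "pr"]})
```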