ESGF_Search_REST_API

The ESGF Search RESTful API

The ESGF search service exposes a RESTful URL that can be used by clients (browsers and desktop clients) to query the contents of the underlying search index, and return results matching the given constraints. Because of the distributed capabilities of the ESGF search, the URL at any Index Node can be used to query that Node only, or all Nodes in the ESGF system.

Syntax

The general syntax of the ESGF search service URL is:

http://<base_search_URL>/search?[keyword parameters as (name, value) pairs][facet parameters as (name,value) pairs]

where <base_search_URL> is the base URL of the search service at a given Index Node.

All parameters (keyword and facet) are optional. Also, the value of all parameters must be URL-encoded, so that the complete search URL is well formed.
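As a minimal sketch of the syntax above, the following Python snippet assembles a search URL from (name, value) pairs and URL-encodes them with the standard library. The helper name build_search_url is hypothetical (it is not part of any ESGF client), and the base URL is the JPL example used elsewhere on this page.

```python
from urllib.parse import quote, urlencode

BASE = "http://esg-datanode.jpl.nasa.gov/esg-search"  # example Index Node

def build_search_url(base_search_url, params):
    """Return a search URL; `params` is a list of (name, value) pairs."""
    # quote_via=quote encodes spaces as %20, as in the examples on this page.
    query = urlencode(params, quote_via=quote)
    url = f"{base_search_url}/search"
    # All parameters are optional, so an empty list yields the bare URL.
    return f"{url}?{query}" if query else url

print(build_search_url(BASE, [("project", "obs4MIPs"), ("variable", "hus")]))
print(build_search_url(BASE, [("query", "specific humidity"), ("limit", 5)]))
```

Passing a list of pairs rather than a dictionary allows the same parameter name to appear more than once, which matters for facet constraints.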

Keywords

Keyword parameters are query parameters that have reserved names, and are interpreted by the search service to control the fundamental nature of a search request: where to issue the request to, how many results to return, etc.

The following keywords are currently used by the system - see below for usage examples:

facets= to return facet values and counts
shards= to specify an explicit list of shards to be queried
offset= , limit= to paginate through the available results (default: offset=0, limit=10)
fields= to return only specific metadata fields for each matching result (default: fields=*)
format= to specify the response document output format

Core Facets

Facet parameters are "search categories" that can be used to apply constraints to the search, and thus reduce the number of results returned. Internally, facets are metadata fields (single valued or multi-valued) that are stored for each search record. The search service will select records for which the metadata field values match the corresponding facet constraints.

The following facets are core system facets, and their names are reserved in the system. These facets can be used as valid query parameters at _all_ sites in the federation.

query= for free text searches (default: query=*)
distrib=true to execute a distributed query, distrib=false to execute a local query (default: distrib=true)
id , master_id , instance_id : core record identifiers carrying different semantics - see later for detailed explanation.
title : record (short) title
description : record (longer) description
type : denotes the intrinsic type of the record. Currently supported values: Dataset, File, Aggregation (default: Dataset)
replica : indicates whether the record is the "master" copy, or a replica. Use replica=false to return only originals, replica=true to return only replicas (default: no replica flag specified, i.e. return both replicas and originals)
latest : indicates whether the record is the latest available version, or a previous version. Use latest=true to return only the latest version of all records, latest=false to return previous versions (default: no latest flag specified, i.e. return all versions)
data_node : indicates the Data Node where the data is stored
index_node : the Index Node where the data is published
version : the record version (a string)
timestamp : the date and time when the record was last modified
url : specific URL(s) to access the record
access : high level access capability available for a record
xlink : reference to external record documentation, such as technical notes
size : record size (for Datasets or Files)
checksum , checksum_type : file checksum value and type
number_of_files : number of files contained in a dataset
number_of_aggregations : number of aggregations in a dataset
dataset_id : the "id" value of the enclosing dataset (Files and Aggregations only)
tracking_id : the UUID assigned to a File by some special publication software, if available
drs_id : a templated string assigned to a Dataset by some special publication software, if available. Note: this field is deprecated .
start= , end= to execute a temporal range query
bbox=[west,south,east,north] to execute a spatial coverage query
from= , to= to execute a query based on the record last update date and time

Custom Facets

Additionally, each ESGF Index Node can harvest and make available additional custom facets that are relevant to its projects and users. For example, most Index Nodes support the set of CMIP5 facets , plus others. These custom facets are configured by the Node administrator in the file /esgf/config/facets.properties and can be discovered by the user through the following query:

http://<base_search_URL>/search?facets=*&distrib=false&limit=0

Example:

Determine all the allowed facet names and values at a specific site: http://esg-datanode.jpl.nasa.gov/esg-search/search?facets=*&limit=0&distrib=false

CMIP5 Facets

The following set of facets is supported by most ESGF Index Nodes in the federation, and can be used to discover/query/retrieve CMIP5 data. (the fa

CF Standard Name: cf_standard_name
Ensemble: ensemble
Experiment: experiment
Experiment Family: experiment_family
Institute: institute
MIP Table: cmor_table
Model: model
Project: project
Product: product
Realm: realm
Time Frequency: time_frequency
Variable: variable
Variable Long Name: variable_long_name
Instrument: source_id

Example:

Determine all the possible values of the "model", "experiment" and "project" facets throughout the federation: http://esg-datanode.jpl.nasa.gov/esg-search/search?facets=model,experiment,project&limit=0
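A client that requests facet counts in JSON still has to unpack them. The sketch below assumes the response follows the usual Solr JSON layout, in which facet_counts.facet_fields maps each facet name to a flat list alternating values and counts; that layout is an assumption here, and the sample response (including its counts) is invented for illustration.

```python
def facet_counts(response):
    """Convert flat [value, count, value, count, ...] lists into dicts."""
    fields = response.get("facet_counts", {}).get("facet_fields", {})
    return {
        name: dict(zip(flat[0::2], flat[1::2]))  # pair each value with its count
        for name, flat in fields.items()
    }

# Hypothetical, heavily trimmed response body (the counts are made up).
sample = {
    "response": {"numFound": 3, "docs": []},
    "facet_counts": {
        "facet_fields": {
            "project": ["CMIP5", 2, "obs4MIPs", 1],
            "model": ["CCSM4", 2],
        }
    },
}

print(facet_counts(sample))
```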

Default Query

If no parameters at all are specified, the search service will execute a query using all the default values, specifically:

query=* (query all records)
distrib=true (execute a distributed search)
type=Dataset (return results of type "Dataset")

Example:

http://esg-datanode.jpl.nasa.gov/esg-search/search

Free Text Queries

The keyword parameter query= can be specified to execute a query that matches the given text _anywhere_ in the record's metadata fields. The parameter value can be any expression following the Apache Lucene query syntax (because it is passed "as-is" to the back-end Solr query), and must be URL-encoded.

Examples:

Search for any text, anywhere: http://esg-datanode.jpl.nasa.gov/esg-search/search?query=* (the default value of the query parameter)
Search for _humidity_ in all metadata fields: http://esg-datanode.jpl.nasa.gov/esg-search/search?query=humidity
Search for the exact phrase _specific humidity_ in all metadata fields: http://esg-datanode.jpl.nasa.gov/esg-search/search?query=%22specific%20humidity%22
Search for the words _specific_ AND _humidity_, but not necessarily in an exact sequence: http://esg-datanode.jpl.nasa.gov/esg-search/search?query=specific%20humidity
Search for the word _observations_ ONLY in the metadata field _product_: http://esg-datanode.jpl.nasa.gov/esg-search/search?query=product:observations
Using logical AND: http://esg-datanode.jpl.nasa.gov/esg-search/search?query=airs%20AND%20humidity (must use upper case "AND")
Using logical OR: http://esg-datanode.jpl.nasa.gov/esg-search/search?query=airs%20OR%20humidity (must use upper case "OR"). This is the same as simply using a blank space: http://esg-datanode.jpl.nasa.gov/esg-search/search?query=airs%20humidity
Search for all datasets that match an id pattern: http://esg-datanode.jpl.nasa.gov/esg-search/search?query=id:obs4MIPs.NASA-JPL.AIRS.*
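Because the Lucene expression must be URL-encoded, it is usually easier to let a library do the encoding than to write %22 and %20 by hand. The sketch below uses Python's urllib.parse.quote; the helper name query_param is hypothetical.

```python
from urllib.parse import quote

def query_param(expression):
    """Percent-encode a Lucene expression for use as the value of query=."""
    # safe="" encodes every reserved character, including ':' and '/'.
    return "query=" + quote(expression, safe="")

print(query_param('"specific humidity"'))   # exact phrase
print(query_param("airs AND humidity"))     # boolean operator in upper case
print(query_param("product:observations"))  # restrict the match to one field
```

Note that this encodes the colon as %3A, which is the percent-encoded equivalent of the literal ":" shown in the examples above.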

Facet Queries

A request to the search service can be constrained to return only those records that match specific values for one or more facets. Specifically, a facet constraint is expressed through the general form: <facet_name>=<facet_value>, where <facet_name> is chosen from the controlled vocabulary of facet names configured at each site, and <facet_value> must match _exactly_ one of the possible values for that particular facet.

When specifying more than one facet constraint in the request, multiple values for the same facet are combined with a logical OR, while multiple values for different facets are combined with a logical AND. For example, _experiment=decadal2000&variable=hus_ will return all records that match _experiment=decadal2000_ AND _variable=hus_, while _variable=hus&variable=ta_ will return all records that match _variable=hus_ OR _variable=ta_.

A facet constraint can be negated by using the != operator. For example, _model!=CCSM_ searches for all items that do NOT match the CCSM model. Note that all negative facets are combined in logical AND, for example _model!=CCSM&model!=HadCAM_ searches for all items that match neither _CCSM_ nor _HadCAM_.
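The combination rules can be sketched in a few lines of Python. The first helper builds the facet part of a query string, repeating a parameter for each value and using != for negation; the second applies the same logic to a local record, purely to illustrate the semantics (it is not how the server is implemented). Both function names are hypothetical, and records are assumed to store each facet as a list of values.

```python
from urllib.parse import quote

def facet_constraints(include=None, exclude=None):
    """Build the facet part of a query string from {facet: [values]} maps."""
    parts = []
    for name, values in (include or {}).items():
        parts += [f"{quote(name)}={quote(v, safe='')}" for v in values]
    for name, values in (exclude or {}).items():
        parts += [f"{quote(name)}!={quote(v, safe='')}" for v in values]
    return "&".join(parts)

def matches(record, include=None, exclude=None):
    """OR within a facet, AND across facets, AND over all negations."""
    for name, values in (include or {}).items():
        if not set(record.get(name, [])) & set(values):
            return False  # none of the allowed values is present
    for name, values in (exclude or {}).items():
        if set(record.get(name, [])) & set(values):
            return False  # an excluded value is present
    return True

include = {"project": ["obs4MIPs"], "variable": ["hus", "ta"]}
exclude = {"model": ["Obs-AIRS"]}
print(facet_constraints(include, exclude))
```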

By default, no facet counts are returned in the output document. Facet counts must be explicitly requested by specifying the facet names individually (for example: _facets=experiment,model_) or via the special notation _facets=*_. The facets list must be comma-separated, and white spaces are ignored. Note also that at this time, the special notation _facets=*_ will only count those facets that are explicitly configured in the file _application-context.xml_.

If facet counts are requested, facet values are sorted alphabetically (facet.sort=lex), and all facet values are returned (facet.limit=-1), provided they match one or more records (facet.mincount=1).

The facet type must be always specified as part of any request to the ESGF search services, so that the appropriate records can be examined and returned. If not specified explicitly, the default value is type=Dataset .

Examples:

http://esg-datanode.jpl.nasa.gov/esg-search/search?cf_standard_name=air_temperature
http://esg-datanode.jpl.nasa.gov/esg-search/search?cf_standard_name=air_temperature&project=obs4MIPs
Combining two values of the same facet with a logical _OR_: http://esg-datanode.jpl.nasa.gov/esg-search/search?project=obs4MIPs&variable=hus&variable=ta (search for all observational datasets that have variable _ta_ or _hus_)
Using a negative facet:
- http://esg-datanode.jpl.nasa.gov/esg-search/search?project=obs4MIPs&variable=hus&variable=ta&model!=Obs-AIRS (search for all observational datasets that have variable _ta_ or _hus_, excluding those produced by _AIRS_)
- http://esg-datanode.jpl.nasa.gov/esg-search/search?project=obs4MIPs&variable!=ta&variable!=huss (search for all observational datasets that contain neither variable _ta_ nor variable _huss_)
Search by tracking id: http://esg-datanode.jpl.nasa.gov/esg-search/search?type=File&tracking_id=2209a0d0-9b77-4ecb-b2ab-b7ae412e7a3f
Search by checksum: http://esg-datanode.jpl.nasa.gov/esg-search/search?type=File&checksum=cbff465c9cd8c9833fd7b85235be2d47
Issue a query for all supported facets and their values at one site, while returning no results (note that only facets with one or more values are returned):
- http://esg-datanode.jpl.nasa.gov/esg-search/search?facets=*&limit=0&distrib=false

Temporal Coverage Queries

The keyword parameters start= and/or end= can be used to query for data with temporal coverage that _overlaps_ the specified range. The parameter values can either be date-times in the format "YYYY-MM-DDTHH:MM:SSZ" (UTC ISO 8601 format), or special values supported by the Solr DateMath syntax.

Examples:

Search for data in the past year: http://esg-datanode.jpl.nasa.gov/esg-search/search?start=NOW-1YEAR (translates into the constraint datetime_stop > NOW-1YEAR)
Search for data before the year 2000: http://esg-datanode.jpl.nasa.gov/esg-search/search?end=2000-01-01T00:00:00Z (translates into the constraint datetime_start < 2000-01-01)
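The overlap test implied by the two constraints above can be written out as a small Python sketch. The field names datetime_start and datetime_stop come from the examples; the function names are hypothetical, and only the explicit "YYYY-MM-DDTHH:MM:SSZ" format is handled (DateMath expressions such as NOW-1YEAR are evaluated by Solr, not here).

```python
from datetime import datetime, timezone

def parse_utc(text):
    """Parse a 'YYYY-MM-DDTHH:MM:SSZ' string into a UTC-aware datetime."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)

def overlaps(datetime_start, datetime_stop, start=None, end=None):
    """True if a record's coverage overlaps the requested range."""
    if start is not None and not datetime_stop > start:
        return False  # record ends before the range begins
    if end is not None and not datetime_start < end:
        return False  # record begins after the range ends
    return True

rec_start = parse_utc("1995-01-01T00:00:00Z")
rec_stop = parse_utc("2005-12-31T00:00:00Z")
print(overlaps(rec_start, rec_stop, end=parse_utc("2000-01-01T00:00:00Z")))
```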

Spatial Coverage Queries

The keyword parameter bbox=[west, south, east, north] can be used to query for data with spatial coverage that _overlaps_ the given bounding box.

Examples:

http://esg-datanode.jpl.nasa.gov/esg-search/search?bbox=%5B-10,-10,+10,+10%5D (translates to: east_degrees:[-10 TO *] AND north_degrees:[-10 TO *] AND west_degrees:[* TO 10] AND south_degrees:[* TO 10])
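A short Python sketch of the same idea: one helper formats and encodes the bbox value, and another reproduces the overlap test locally using the four field names from the translated query. The helper names are hypothetical, and the local check is only an illustration of the range logic, with inclusive bounds assumed.

```python
from urllib.parse import quote

def bbox_param(west, south, east, north):
    """Return the encoded bbox=[west,south,east,north] parameter."""
    value = f"[{west},{south},{east},{north}]"
    return "bbox=" + quote(value, safe=",")  # brackets become %5B and %5D

def bbox_overlaps(record, west, south, east, north):
    """True if the record's extent overlaps the bounding box."""
    return (
        record["east_degrees"] >= west
        and record["north_degrees"] >= south
        and record["west_degrees"] <= east
        and record["south_degrees"] <= north
    )

print(bbox_param(-10, -10, 10, 10))
```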

Timestamp (aka "last update") Queries

The keyword parameters from= and/or to= can be used to query for data that was last updated in a given time range. These queries are executed against the "timestamp" field of the Solr records, which represents the date and time when the record was last modified. Note that if the timestamp cannot be set from the source metadata for that record, it is left unassigned so as not to bias the query for records that have a valid timestamp.

When parsing THREDDS catalogs, the timestamp is assigned from the value of the properties creation_time (for datasets) and mod_time (for files), which are interpreted in the local time zone (local to the harvesting agent), and converted to UTC for input into the index. For example, the input value of creation_time="2012-03-15 12:59:09" (in the PDT time zone) becomes timestamp="2012-03-15T19:59:09Z".
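The local-time-to-UTC conversion in this example can be reproduced with a short Python sketch. The function name is hypothetical, and PDT is modelled as a fixed UTC-7 offset purely for illustration; a real harvester would use its configured local time zone.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-7 offset standing in for PDT (an assumption for this example).
PDT = timezone(timedelta(hours=-7), "PDT")

def to_index_timestamp(local_text, tz):
    """Interpret 'YYYY-MM-DD HH:MM:SS' in `tz` and format it as UTC."""
    local = datetime.strptime(local_text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=tz)
    return local.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

print(to_index_timestamp("2012-03-15 12:59:09", PDT))  # 2012-03-15T19:59:09Z
```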

The constraint values can either be date-times in the format "YYYY-MM-DDTHH:MM:SSZ" (UTC ISO 8601 format), or special values supported by the Solr DateMath syntax.

Examples:

Distributed Queries

The keyword parameter distrib= can be used to control whether the query is executed against the local Index Node only, or distributed to all other Nodes in the federation. If not specified, the default value distrib=true is assumed.

Examples:

Search for all datasets in the federation: http://esg-datanode.jpl.nasa.gov/esg-search/search?distrib=true
Search for all datasets at one Node only: http://esg-datanode.jpl.nasa.gov/esg-search/search?distrib=false

Shard Queries

By default, a distributed query (_distrib=true_) targets all ESGF Nodes in the current peer group, i.e. all nodes that are listed in the local configuration file /esg/config/esgf_shards.xml, which is continuously updated by the local node manager to reflect the latest state of the federation. It is possible to execute a distributed search that targets only one or more specific nodes, by specifying them in the _shards_ parameter, as follows: _shards=hostname1:port1/solr,hostname2:port2/solr,..._. Note that the explicit shards value is ignored if _distrib=false_ (but distrib=true by default if not otherwise specified).

Examples:

Query for CMIP5 data at the PCMDI and BADC sites only: http://esg-datanode.jpl.nasa.gov/esg-search/search?project=CMIP5&shards=pcmdi9.llnl.gov:8983/solr,esgf-index1.ceda.ac.uk:8983/solr
Query for all files belonging to a given dataset at one site only: http://esg-datanode.jpl.nasa.gov/esg-search/search?type=File&shards=esg-datanode.jpl.nasa.gov:8983/solr&dataset_id=obs4MIPs.CNES.AVISO.mon.v1%7Cesg-datanode.jpl.nasa.gov
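The shards value is a comma-separated list of hostname:port/solr entries, which is straightforward to assemble programmatically. The sketch below is one way to do it in Python; the helper name shards_param is hypothetical, and the hosts and ports are the ones from the first example above.

```python
from urllib.parse import quote

def shards_param(shards):
    """Build shards= from an iterable of (hostname, port) pairs."""
    value = ",".join(f"{host}:{port}/solr" for host, port in shards)
    # Keep ':', '/' and ',' literal so the value reads as in the examples.
    return "shards=" + quote(value, safe=":/,")

print(shards_param([("pcmdi9.llnl.gov", 8983), ("esgf-index1.ceda.ac.uk", 8983)]))
```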

Replica Queries

Replicas (Datasets and Files) are distinguished from the original record (a.k.a. the _master_) in the Solr index by the value of two special keywords:

_replica_ : a flag that is set to false for master records, true for replica records.
_master_id_ : a string that is identical for the master and all replicas of a given logical record (Dataset or File).

By default, a query returns all records (masters and replicas) matching the search criteria, i.e. no _replica_ constraint is used. To return only master records, use _replica=false_; to return only replicas, use _replica=true_. To search for all identical Datasets or Files (i.e. for the master AND replicas of a Dataset or File), use _master_id=..._.

Examples:

Search for all datasets in the system (masters and replicas): http://esg-datanode.jpl.nasa.gov/esg-search/search
Search for just master datasets, no replicas: http://esg-datanode.jpl.nasa.gov/esg-search/search?replica=false
Search for just replica datasets, no masters: http://esg-datanode.jpl.nasa.gov/esg-search/search?replica=true
Search for the master AND replicas of a given dataset: http://esg-datanode.jpl.nasa.gov/esg-search/search?master_id=cmip5.output1.BCC.bcc-csm1-1.1pctCO2.day.atmos.day.r1i1p1
Search for the master and replicas of a given file: http://esg-datanode.jpl.nasa.gov/esg-search/search?type=File&master_id=cmip5.output1.BCC.bcc-csm1-1.1pctCO2.day.atmos.day.r1i1p1.huss_day_bcc-csm1-1_1pctCO2_r1i1p1_01600101-02991231.nc
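Once results are returned, the _replica_ and _master_id_ fields are enough to group each master with its replicas on the client side. The Python sketch below assumes each result has already been decoded into a dictionary with a boolean replica field and string id and master_id fields; the function name and the sample records (including the second host name) are invented for illustration.

```python
from collections import defaultdict

def group_by_master_id(docs):
    """Map each master_id to the ids of its master and replica records."""
    groups = defaultdict(lambda: {"master": [], "replicas": []})
    for doc in docs:
        key = "replicas" if doc["replica"] else "master"
        groups[doc["master_id"]][key].append(doc["id"])
    return dict(groups)

# Hypothetical records: one master and one replica of the same dataset.
docs = [
    {"id": "obs4MIPs.CNES.AVISO.mon.v1|esg-datanode.jpl.nasa.gov",
     "master_id": "obs4MIPs.CNES.AVISO.mon", "replica": False},
    {"id": "obs4MIPs.CNES.AVISO.mon.v1|replica-host.example.org",
     "master_id": "obs4MIPs.CNES.AVISO.mon", "replica": True},
]
print(group_by_master_id(docs))
```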

Latest and Version Queries

By default, a query to the ESGF search services will return all versions of the matching records (Datasets or Files). To return only the most recent, up-to-date version, include _latest=true_. To return a specific version, use _version=_. Using _latest=false_ will return only datasets that were _superseded_ by newer versions.

Examples:

Search for all latest CMIP5 datasets: http://esg-datanode.jpl.nasa.gov/esg-search/search?project=CMIP5&latest=true
Search for all versions of a given dataset: http://esg-datanode.jpl.nasa.gov/esg-search/search?project=CMIP5&master_id=cmip5.output1.NSF-DOE-NCAR.CESM1-CAM5-1-FV2.historical.mon.atmos.Amon.r1i1p1&facets=version
Search for a specific version of a given dataset: http://esg-datanode.jpl.nasa.gov/esg-search/search?project=CMIP5&master_id=cmip5.output1.NSF-DOE-NCAR.CESM1-CAM5-1-FV2.historical.mon.atmos.Amon.r1i1p1&version=20120712

Results Pagination

By default, a query to the search service will return the first 10 records matching the given constraints. The offset into the returned results, and the total number of returned results, can be changed through the keyword parameters offset= and limit= respectively. The system imposes a maximum value of limit <= 10,000.

Examples:

Query for 100 CMIP5 datasets in the system: http://esg-datanode.jpl.nasa.gov/esg-search/search?project=CMIP5&limit=100
Query for the next 100 CMIP5 datasets in the system: http://esg-datanode.jpl.nasa.gov/esg-search/search?project=CMIP5&limit=100&offset=100
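Paging through a large result set amounts to stepping offset by limit until all results are covered. The Python sketch below generates the (offset, limit) pairs; it assumes the total number of matches is already known (for example from a first response), and the function name and the max_limit default simply mirror the limit stated above.

```python
def page_params(total, limit=100, max_limit=10000):
    """Yield (offset, limit) pairs that together cover `total` results."""
    if not 0 < limit <= max_limit:
        raise ValueError("limit must be between 1 and max_limit")
    for offset in range(0, total, limit):
        yield offset, limit

for offset, limit in page_params(250, limit=100):
    print(f"project=CMIP5&limit={limit}&offset={offset}")
```

The last page may contain fewer than limit records; the server is expected to return whatever remains.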

Sorting

By default, the results returned by a search are unsorted. The query parameter sort=true can be used to sort the returned results in inverse order of last modification time, i.e. to return the most up to date records first.

Example:

Return the most recent datasets with variable "hus" published to the ESGF system: http://esg-datanode.jpl.nasa.gov/esg-search/search?variable=hus&sort=true&fields=timestamp,variable

Output Format

The keyword parameter format= can be used to request results in a specific output format. Currently the only available options are Solr/XML (the default) and Solr/JSON.

Examples:

Request results in Solr XML format: http://esg-datanode.jpl.nasa.gov/esg-search/search?format=application%2Fsolr%2Bxml
Request results in Solr JSON format: http://esg-datanode.jpl.nasa.gov/esg-search/search?format=application%2Fsolr%2Bjson
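The format value contains the characters "/" and "+", both of which must be percent-encoded. As a small sketch, Python's urlencode produces exactly the encoded form shown in the examples; the FORMATS mapping and the helper name are hypothetical conveniences.

```python
from urllib.parse import urlencode

# The two format values documented above, keyed by a short alias.
FORMATS = {
    "xml": "application/solr+xml",
    "json": "application/solr+json",
}

def format_param(kind):
    """Return the encoded format= parameter for 'xml' or 'json'."""
    return urlencode({"format": FORMATS[kind]})

print(format_param("json"))  # format=application%2Fsolr%2Bjson
```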

Returned Metadata Fields

By default, all available metadata fields are returned for each result. The keyword parameter fields= can be used to limit the number of fields returned in the response document, for each matching result. The list must be comma-separated, and white spaces are ignored. Use _fields=*_ to return all fields (same as not specifying it, since it is the default). Note that the pseudo field _score_ is always appended to any fields list.

Examples:

Return all available metadata fields for CMIP5 datasets: http://esg-datanode.jpl.nasa.gov/esg-search/search?project=CMIP5&fields=*
Return only the _model_ and _experiment_ fields for CMIP5 datasets: http://esg-datanode.jpl.nasa.gov/esg-search/search?project=CMIP5&fields=model,experiment

Identifiers

Each search record in the system is assigned the following identifiers (all of type string):

id : universally unique for each record across the federation, i.e. specific to each dataset or file, version and replica (and the data node storing the data). It is intended to be "opaque", i.e. it should not be parsed by clients to extract any information.

* Example: id=obs4MIPs.CNES.AVISO.mon.v1|esg-datanode.jpl.nasa.gov

master_id : same for all replicas and versions across the federation. When parsing THREDDS catalogs, it is extracted from the properties "dataset_id" or "file_id".

* Example: obs4MIPs.CNES.AVISO.mon

instance_id : same for all replicas across the federation, but specific to each version. When parsing THREDDS catalogs, it is extracted from the ID attribute of the corresponding tag in THREDDS (for both Datasets and Files).

* Example: obs4MIPs.CNES.AVISO.mon.v1

Note also that the record version is the same for all replicas of that record, but different across versions. Examples:

version=20120201
version=1

Access URLs

In the returned Solr XML output document, URLs that are access points for Datasets and Files are encoded as a 3-tuple of the form url|mime type|service name, where the fields are separated by the _|_ character, and the _mime type_ and _service name_ are chosen from the ESGF controlled vocabulary.

Examples of Dataset URLs:

Examples of File URLs:

http://esg-datanode.jpl.nasa.gov/thredds/fileServer/esg_dataroot/obs4MIPs/observations/atmos/taNobs/mon/grid/NASA-JPL/AIRS/v20110608/taNobs_AIRS_L3_RetStd-v5_200209-201105.nc|application/netcdf|HTTPServer
http://esg-datanode.jpl.nasa.gov/thredds/dodsC/esg_dataroot/obs4MIPs/observations/atmos/taNobs/mon/grid/NASA-JPL/AIRS/v20110608/taNobs_AIRS_L3_RetStd-v5_200209-201105.nc.html|application/opendap-html|OpenDAP
gsiftp://esg.anl.gov:2811//Hiram/atmos/av/annual_1year/atmos.1980.ann.nc|application/gridftp|GridFTP
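Splitting such a value back into its three parts is a one-liner, sketched here in Python. The AccessUrl type and the function name are hypothetical; splitting from the right is a defensive choice in case the URL part itself ever contains a "|" character.

```python
from typing import NamedTuple

class AccessUrl(NamedTuple):
    url: str
    mime_type: str
    service_name: str

def parse_access_url(encoded):
    """Split 'url|mime type|service name' into its three fields."""
    # rsplit keeps any '|' inside the URL itself attached to the URL part.
    url, mime_type, service_name = encoded.rsplit("|", 2)
    return AccessUrl(url, mime_type, service_name)

example = ("gsiftp://esg.anl.gov:2811//Hiram/atmos/av/annual_1year/"
           "atmos.1980.ann.nc|application/gridftp|GridFTP")
print(parse_access_url(example))
```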

Wget scripting

The same RESTful API that is used to query the ESGF search services can also be used, with minor modifications, to generate a Wget script to download all files matching the given constraints. Specifically, each ESGF Index Node exposes the following URL for generating Wget scripts:

http://<base_search_URL>/wget?[keyword parameters as (name, value) pairs][facet parameters as (name,value) pairs]

where again <base_search_URL> is the base URL of the search service at a given Index Node. The only syntax differences with respect to the search URL are:

The keyword parameter _type=_ is not allowed, as the wget URL always assumes type=File.
The keyword parameter _format=_ is not allowed, as the wget URL always returns a shell script as response document.
The keyword parameter _limit=_ is assigned a default value of limit=1000 (and must still be limit < 10,000).
The keyword parameter _download_structure=_ is used for defining a relative directory structure for the download by using the facet values (i.e. of Files and not Datasets). For example, if you want to create a CMIP5 directory structure on your local computer and to copy your downloaded files into this structure, run the Wget script created by http://esgf-data.dkrz.de/esg-search/wget?download_structure=project,product,institute,model,experiment,time_frequency,realm,cmor_table,ensemble,version,variable&project=CMIP5&experiment=historical&cmor_table=Amon&variable=tas&variable=pr
The keyword parameter _download_emptypath=_ is used to define what to do if download_structure is set and a facet returns no value (e.g. when mixing files from CMIP5 and obs4MIPs, selecting _instrument_ as a facet will return an empty value for all CMIP5 files).

A typical workflow pattern consists of first identifying all datasets or files matching some scientific criteria, then changing the request URL from "/search?" to "/wget?" to generate the corresponding shell scripts for bulk download of files.
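This rewrite step can be sketched in Python as follows. The function name search_to_wget is hypothetical; the sketch swaps the final path segment and drops the two parameters that the wget URL does not allow, leaving all other constraints untouched.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit

NOT_ALLOWED = {"type", "format"}  # rejected by the wget endpoint

def search_to_wget(search_url):
    """Turn a '/search?...' URL into the matching '/wget?...' URL."""
    parts = urlsplit(search_url)
    if not parts.path.endswith("/search"):
        raise ValueError("not a search URL")
    path = parts.path[: -len("/search")] + "/wget"
    pairs = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in NOT_ALLOWED
    ]
    # safe="!" keeps negated facets such as model!=CCSM readable.
    query = urlencode(pairs, safe="!", quote_via=quote)
    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))

print(search_to_wget(
    "http://esg-datanode.jpl.nasa.gov/esg-search/search"
    "?variable=hus&project=obs4MIPs&distrib=false&type=File"
))
```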

Example:

Download all observational files with variable _hus_: http://esg-datanode.jpl.nasa.gov/esg-search/wget/?variable=hus&project=obs4MIPs&distrib=false

For more information on the wget script, see ESGF_wget.
