# Earth System Grid Federation Data Access

The Earth System Grid Federation (ESGF) exposes a search API that clients can use to query catalog content matching a set of constraints (see the [API documentation](https://github.com/ESGF/esgf.github.io/wiki/ESGF_Search_REST_API)). Requests can be sent directly to the API with a simple function (see this [example](https://esgf2.github.io/cmip6-cookbook/notebooks/foundations/esgf-opendap.html)), but here we'll use a Python client named `pyesgf` to interact with the search API and get data from the ESGF THREDDS servers. The cells below show examples of typical data queries.
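
For reference, here is a minimal sketch of what a direct request could look like using only the standard library. The helper names `build_search_url` and `esgf_search` are hypothetical, and the default parameters (`format`, `limit`, `distrib`, `type`) are assumptions modelled on the requests that `pyesgf` issues; check the API documentation before relying on them. Building the URL needs no network access, so only `esgf_search` contacts the server.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Search endpoint of the node used in this notebook.
SEARCH_URL = "https://esgf-node.llnl.gov/esg-search/search"


def build_search_url(base_url=SEARCH_URL, facets=None, limit=0, distrib=False, **constraints):
    """Build a search URL for the ESGF search API (hypothetical helper)."""
    params = {
        "format": "application/solr+json",  # assumed response format
        "limit": limit,
        "distrib": str(distrib).lower(),
        "type": "Dataset",
    }
    # Facet constraints such as project="CMIP6" become plain query parameters.
    params.update(constraints)
    if facets:
        params["facets"] = ",".join(facets)
    return f"{base_url}?{urlencode(params)}"


def esgf_search(**kwargs):
    """Send the request and return the decoded JSON. Requires network access."""
    with urlopen(build_search_url(**kwargs), timeout=30) as response:
        return json.load(response)


url = build_search_url(
    project="CMIP6",
    experiment_id="ssp245",
    query="humidity",
    facets=["variable_id", "source_id"],
)
print(url)
```

Calling `esgf_search(project="CMIP6", experiment_id="ssp245", query="humidity")` would then send the query, subject to the same server availability as the `pyesgf` calls in the rest of this notebook.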

If a login username and credentials are required, follow these [instructions](https://esgf-pyclient.readthedocs.io/en/latest/notebooks/examples/logon.html).

In [1]:
# NBVAL_IGNORE_OUTPUT

from pyesgf.search import SearchConnection

# Create a connection for search on ESGF nodes. Note that setting `distrib=True` can lead to unique failures.
conn = SearchConnection("https://esgf-node.llnl.gov/esg-search/", distrib=False)

# Launch a search query.
# Here we're looking for any variable related to humidity within the CMIP6 SSP2-4.5 experiment.
# Results will be stored in a dictionary with keys defined by the `facets` argument.
ctx = conn.new_context(
    project="CMIP6",
    experiment_id="ssp245",
    query="humidity",
    facets="variable_id,source_id",
)

print("Number of results: ", ctx.hit_count)
print("Variables related to humidity: ")
ctx.facet_counts["variable_id"]

In [2]:
# NBVAL_IGNORE_OUTPUT

# Now let's look for simulations that have the `hurs` variable and pick the first member.
# `constrain` returns a new context, so reassign `ctx` to keep the added constraints.
ctx = ctx.constrain(variable_id="hurs", ensemble="r1i1p1f1")
ctx.facet_counts["source_id"]

In [3]:
# We can now refine the search and retrieve the datasets matching our search context.
results = ctx.constrain(source_id="CanESM5").search()
r = results[0]
r.dataset_id

In [4]:
# To get file download links, there's an extra step
file_ctx = r.file_context()
file_ctx.facets = "*"
files = file_ctx.search()
[f.download_url for f in files]

In [5]:
# Instead of a download URL, we can also get OPeNDAP links.
urls = [f.opendap_url for f in files]
print(urls)

# It's sometimes possible to request an aggregation of multiple netCDF files as a single OPeNDAP link,
# but this option is often unavailable.
agg_ctx = r.aggregation_context()
agg_ctx.facets = "*"
agg = agg_ctx.search()[0]
print(agg.opendap_url)
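
Not every file necessarily exposes an OPeNDAP endpoint. The sketch below assumes that `opendap_url` is `None` when a file has no OPeNDAP service, and drops those entries before the URLs are passed to xarray. The helper name `opendap_urls` is hypothetical, and the stand-in objects and their `example.org` URLs are made up purely to illustrate the filtering.

```python
from types import SimpleNamespace


def opendap_urls(files):
    """Return the OPeNDAP URLs of `files`, skipping entries without one."""
    return [f.opendap_url for f in files if getattr(f, "opendap_url", None)]


# Stand-in objects with made-up URLs, used only to illustrate the filtering.
fake_files = [
    SimpleNamespace(opendap_url="https://example.org/thredds/dodsC/a.nc"),
    SimpleNamespace(opendap_url=None),
    SimpleNamespace(opendap_url="https://example.org/thredds/dodsC/b.nc"),
]
print(opendap_urls(fake_files))
```

With real search results, `urls = opendap_urls(files)` would replace the plain list comprehension used above.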

In [6]:
# Open the OPeNDAP link with xarray
import xarray as xr

try:
    ds = xr.open_mfdataset(urls)
    display(ds)
except OSError as e:
    print(
        "Looks like the remote server is down at the moment. "
        f"Please try with another dataset stored on a different ESGF node.\n{e}"
    )
