# Earth System Grid Federation Data Access

The Earth System Grid Federation (ESGF) provides a search API that clients can use to query catalog content matching a set of constraints (see the [API documentation](https://github.com/ESGF/esgf.github.io/wiki/ESGF_Search_REST_API)). Requests can be sent directly to the API with a simple function (see this [example](https://esgf2.github.io/cmip6-cookbook/notebooks/foundations/esgf-opendap.html)), but here we use `pyesgf`, a Python client, to interact with the search API and get data from the ESGF THREDDS servers. The following cells show typical data queries.
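For reference, a direct request boils down to a URL with the constraints in its query string. The sketch below only builds such a URL with the standard library and sends nothing. The helper name `build_search_url` is hypothetical, and the `/search` endpoint path and parameter names are assumptions based on the API documentation linked above, so check them against the node you use.

```python
from urllib.parse import urlencode


def build_search_url(base_url, facets=None, limit=10, distrib=False, **constraints):
    """Build an ESGF search URL (hypothetical helper, not part of pyesgf)."""
    params = {
        "format": "application/solr+json",  # ask for a JSON response
        "type": "Dataset",                  # search datasets rather than files
        "limit": limit,                     # maximum number of records returned
        "distrib": str(distrib).lower(),    # "true" queries the whole federation
    }
    params.update(constraints)              # facet constraints, e.g. project="CMIP6"
    if facets:
        params["facets"] = ",".join(facets)  # facets to count in the response
    return f"{base_url.rstrip('/')}/search?{urlencode(params)}"


url = build_search_url(
    "https://esgf-node.llnl.gov/esg-search/",
    facets=["variable_id", "source_id"],
    limit=0,
    project="CMIP6",
    experiment_id="ssp245",
    query="humidity",
)
print(url)
```

The resulting URL could then be fetched with any HTTP client. The `pyesgf` client used in the rest of this page assembles an equivalent request for you.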

If a username and credentials are required to log in, follow these [instructions](https://esgf-pyclient.readthedocs.io/en/latest/notebooks/examples/logon.html).

In [1]:
# NBVAL_IGNORE_OUTPUT

from pyesgf.search import SearchConnection

# Create a connection for search on ESGF nodes. Note that setting `distrib=True` can lead to unique failures.
conn = SearchConnection("https://esgf-node.llnl.gov/esg-search/", distrib=False)

# Launch a search query.
# Here we're looking for any variable related to humidity within the CMIP6 SSP2-4.5 experiment.
# Results will be stored in a dictionary with keys defined by the `facets` argument.
ctx = conn.new_context(
    project="CMIP6",
    experiment_id="ssp245",
    query="humidity",
    facets="variable_id,source_id",
)

print(f"Number of results: {ctx.hit_count}")
print("Variables related to humidity: ")
ctx.facet_counts["variable_id"]

HTTPError: 504 Server Error: Gateway Time-out for url: https://esgf-node.ornl.gov/esgf-1-5-bridge?format=application%2Fsolr%2Bjson&limit=0&distrib=false&query=humidity&type=Dataset&project=CMIP6&experiment_id=ssp245&facets=variable_id%2Csource_id

In [2]:
# NBVAL_IGNORE_OUTPUT

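# Now let's look for simulations that have the `hurs` variable and pick the first member.
# `constrain` returns a new context instead of modifying `ctx` in place, so keep the result.
# CMIP6 identifies ensemble members with the `variant_label` facet.
ctx = ctx.constrain(variable_id="hurs", variant_label="r1i1p1f1")
ctx.facet_counts["source_id"]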

{'EC-Earth3-AerChem': 2,
 'AWI-ESM-1-REcoM': 4,
 'GISS-E2-1-G-CC': 12,
 'CAMS-CSM1-0': 14,
 'CIESM': 15,
 'MCM-UA-1-0': 16,
 'FGOALS-f3-L': 20,
 'BCC-CSM2-MR': 32,
 'NESM3': 35,
 'IITM-ESM': 35,
 'CAS-ESM2-0': 36,
 'AWI-CM-1-1-MR': 36,
 'CNRM-CM6-1-HR': 38,
 'CMCC-CM2-SR5': 38,
 'FIO-ESM-2-0': 40,
 'E3SM-1-1': 42,
 'CMCC-ESM2': 44,
 'TaiESM1': 45,
 'CanESM5-CanOE': 48,
 'KIOST-ESM': 53,
 'INM-CM5-0': 62,
 'INM-CM4-8': 62,
 'GFDL-ESM4': 62,
 'GISS-E2-2-G': 80,
 'EC-Earth3-Veg-LR': 85,
 'NorESM2-MM': 88,
 'GFDL-CM4': 90,
 'MIROC-ES2H': 102,
 'KACE-1-0-G': 120,
 'MPI-ESM1-2-HR': 126,
 'FGOALS-g3': 144,
 'GISS-E2-1-H': 164,
 'EC-Earth3-CC': 171,
 'ACCESS-CM2': 175,
 'CanESM5-1': 216,
 'CNRM-CM6-1': 220,
 'CESM2-WACCM': 250,
 'EC-Earth3-Veg': 265,
 'CESM2': 270,
 'HadGEM3-GC31-LL': 278,
 'CNRM-ESM2-1': 303,
 'UKESM1-0-LL': 347,
 'NorESM2-LM': 516,
 'IPSL-CM6A-LR': 618,
 'GISS-E2-1-G': 720,
 'MRI-ESM2-0': 883,
 'ACCESS-ESM1-5': 1203,
 'CanESM5': 2041,
 'EC-Earth3': 2275,
 'MPI-ESM1-2-LR': 25
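
The facet counts are a dictionary mapping each facet value to its number of matching datasets, so ordinary Python is enough to rank them. A minimal sketch, using a handful of the counts printed above as sample data (the helper `top_sources` is hypothetical):

```python
def top_sources(counts, n=3):
    """Return the n (source_id, count) pairs with the highest counts."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:n]


# Sample copied from the output above; in the notebook, pass
# ctx.facet_counts["source_id"] instead.
sample_counts = {
    "CESM2": 270,
    "MRI-ESM2-0": 883,
    "ACCESS-ESM1-5": 1203,
    "CanESM5": 2041,
    "EC-Earth3": 2275,
}

for source_id, count in top_sources(sample_counts):
    print(f"{source_id}: {count}")
```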

In [3]:
# We can now refine the search and retrieve the datasets that match our search context
results = ctx.constrain(source_id="CanESM5").search()
r = results[0]
r.dataset_id

'CMIP6.ScenarioMIP.CCCma.CanESM5.ssp245.r21i1p2f1.Amon.hur.gn.v20190429|esgf-node.ornl.gov'
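
A CMIP6 `dataset_id` encodes its facets in a fixed order, followed by `|` and the data node that hosts the dataset. Splitting it is a quick way to check what a result actually contains. The helper `parse_dataset_id` below is a hypothetical sketch, not part of `pyesgf`, and it assumes the ten-component CMIP6 layout seen in the identifier above.

```python
# Facet order of a CMIP6 dataset identifier, as seen in the id above.
DRS_FIELDS = (
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label", "version",
)


def parse_dataset_id(dataset_id):
    """Split a CMIP6 dataset_id into a dict of facets plus the data node."""
    drs, _, data_node = dataset_id.partition("|")
    parts = drs.split(".")
    if len(parts) != len(DRS_FIELDS):
        raise ValueError(f"Unexpected number of components in {drs!r}")
    info = dict(zip(DRS_FIELDS, parts))
    info["data_node"] = data_node
    return info


info = parse_dataset_id(
    "CMIP6.ScenarioMIP.CCCma.CanESM5.ssp245.r21i1p2f1.Amon.hur.gn.v20190429|esgf-node.ornl.gov"
)
print(info["variable_id"], info["member_id"], info["data_node"])
```

For the identifier shown above this reports the variable `hur` and the member `r21i1p2f1`, which makes it easy to verify that a result matches the variable and member you intended to select.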

In [4]:
# To get file download links, there's an extra step
file_ctx = r.file_context()
file_ctx.facets = "*"
files = file_ctx.search()
[f.download_url for f in files]

HTTPError: 422 Client Error: Unprocessable Content for url: https://esgf-node.ornl.gov/esgf-1-5-bridge?format=application%2Fsolr%2Bjson&limit=0&distrib=false&type=File&dataset_id=CMIP6.ScenarioMIP.CCCma.CanESM5.ssp245.r21i1p2f1.Amon.hur.gn.v20190429%7Cesgf-node.ornl.gov&facets=%2A

In [5]:
# Instead of a download URL, we can also get OPeNDAP links.
urls = [f.opendap_url for f in files]
print(urls)

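# It's sometimes possible to request an aggregation of multiple netCDF files as one OPeNDAP link,
# but this option is often unavailable, so check that the search returned something.
agg_ctx = r.aggregation_context()
agg_ctx.facets = "*"
agg_results = agg_ctx.search()
if len(agg_results) > 0:
    print(agg_results[0].opendap_url)
else:
    print("No aggregation is available for this dataset.")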

NameError: name 'files' is not defined

In [6]:
# Open the OPeNDAP links with xarray
import xarray as xr

try:
    ds = xr.open_mfdataset(urls)
    display(ds)
except OSError as e:
    print(
        "Looks like the remote server is down at the moment. "
        f"Please try with another dataset stored on a different ESGF node.\n{e}"
    )

NameError: name 'urls' is not defined
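
The message in the cell above suggests falling back to a copy of the data on another node. One way to sketch that fallback is a small helper that tries candidates in order and returns the first one that opens. The helper `open_first_available` is hypothetical, and the stand-in opener and placeholder URLs below only simulate a server failure so the example runs offline. In the notebook you would pass `xr.open_dataset` and real OPeNDAP URLs instead.

```python
def open_first_available(candidates, opener):
    """Return (candidate, result) for the first candidate that opens without OSError."""
    errors = []
    for candidate in candidates:
        try:
            return candidate, opener(candidate)
        except OSError as err:  # e.g. the remote server is unreachable
            errors.append(f"{candidate}: {err}")
    raise OSError("No candidate could be opened:\n" + "\n".join(errors))


def fake_opener(url):
    """Stand-in for xr.open_dataset: fails for the made-up 'down' node."""
    if "down" in url:
        raise OSError("server unavailable")
    return f"dataset from {url}"


# Placeholder URLs for illustration only.
candidates = ["https://down.example.org/a.nc", "https://up.example.org/a.nc"]
used, result = open_first_available(candidates, fake_opener)
print(used)
```

Collecting the individual error messages before raising keeps the final failure informative when every node is unavailable.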