## Examples of pyesgf.search usage

Prelude:

```python
from pyesgf.search import SearchConnection
conn = SearchConnection('http://esgf-data.dkrz.de/esg-search',
                        distrib=True)
```

**Warning**: don't use the default search with `facets=*`.

This behavior is kept for backward-compatibility, but ESGF indexes might not
successfully perform a distributed search when this option is used, so some
results may be missing.  For full results, it is recommended to pass a list of
facets of interest when instantiating a context object. For example,

      ctx = conn.new_context(facets='project,experiment_id')

Only the facets that you specify will be present in the `facet_counts` dictionary.

This warning is displayed when a distributed search is performed with the
`facets=*` default, at most once per context object. To suppress this warning,
set the environment variable `ESGF_PYCLIENT_NO_FACETS_STAR_WARNING` to any value
or explicitly use `conn.new_context(facets='*')`.
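Since the recommendation is to pass an explicit list of facets, one option is to keep the facet names in a Python list and join them just before creating the context. `facets_arg` below is a hypothetical helper, not part of pyesgf; it only builds the comma-separated string that `new_context(facets=...)` takes in these examples, and it rejects `'*'` as a design choice, since the point is to request specific facets.

```python
def facets_arg(names):
    """Join facet names into the comma-separated string used by new_context(facets=...).

    Hypothetical helper, not part of pyesgf.
    """
    # Strip whitespace and drop empty entries.
    cleaned = [name.strip() for name in names if name.strip()]
    if not cleaned:
        raise ValueError("pass at least one facet name")
    if "*" in cleaned:
        raise ValueError("'*' requests every facet; list the facets you need instead")
    return ",".join(cleaned)


facets = facets_arg(["project", "experiment_id"])
print(facets)  # project,experiment_id
# ctx = conn.new_context(facets=facets)
```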

```python
facets = 'project,experiment_family'
```

Find how many CMIP5 datasets match *humidity*, broken down by experiment family:

```python
ctx = conn.new_context(project='CMIP5', query='humidity', facets=facets)
ctx.hit_count
```

```
30637
```

```python
ctx.facet_counts['experiment_family']
```

```
{'RCP': 2841,
 'Paleo': 213,
 'Idealized': 1318,
 'Historical': 4501,
 'ESM': 506,
 'Decadal': 18731,
 'Control': 709,
 'Atmos-only': 2372,
 'All': 30637}
```
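The facet counts come back as an ordinary dict, so they can be post-processed directly. The sketch below copies the counts printed above instead of querying the index, ranks the experiment families, and expresses each as a share of the `'All'` total. The per-family counts add up to 31191, more than the 30637 under `'All'`, which suggests that some datasets are counted under more than one family, so treat the percentages as rough.

```python
# Facet counts copied from the output above (project='CMIP5', query='humidity').
family_counts = {
    'RCP': 2841, 'Paleo': 213, 'Idealized': 1318, 'Historical': 4501,
    'ESM': 506, 'Decadal': 18731, 'Control': 709, 'Atmos-only': 2372,
    'All': 30637,
}

total = family_counts['All']

# Drop the 'All' entry and rank the remaining families by count, largest first.
ranked = sorted(
    ((name, n) for name, n in family_counts.items() if name != 'All'),
    key=lambda item: item[1],
    reverse=True,
)

for name, n in ranked:
    print(f"{name:<12}{n:>7}  {100 * n / total:5.1f}%")

# The per-family counts add up to more than the 'All' total.
family_sum = sum(n for _, n in ranked)
print(family_sum, total)  # 31191 30637
```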

Search using a partial ESGF dataset ID (and get the first download URL):

```python
conn = SearchConnection('http://esgf-index1.ceda.ac.uk/esg-search', distrib=False)
ctx = conn.new_context(facets=facets)
dataset_id_pattern = "cmip5.output1.MOHC.HadGEM2-CC.historical.mon.atmos.Amon.*"
results = ctx.search(query="id:%s" % dataset_id_pattern)
len(results)
```

```
3
```
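The `id:` query takes the leading components of a dataset ID joined with dots, with a trailing `*` standing in for the components that follow (in the download URL for this dataset, those are the ensemble member and version). `dataset_id_query` below is a hypothetical helper, not part of pyesgf; it rebuilds the query string used above from its components.

```python
def dataset_id_query(*components, wildcard=True):
    """Build an 'id:' query string from the leading dataset ID components.

    Hypothetical helper, not part of pyesgf. With wildcard=True a trailing '.*'
    matches whatever components follow.
    """
    pattern = ".".join(components)
    if wildcard:
        pattern += ".*"
    return "id:%s" % pattern


query = dataset_id_query(
    "cmip5", "output1", "MOHC", "HadGEM2-CC", "historical", "mon", "atmos", "Amon"
)
print(query)  # id:cmip5.output1.MOHC.HadGEM2-CC.historical.mon.atmos.Amon.*
# results = ctx.search(query=query)
```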

```python
files = results[0].file_context().search()
len(files)
```

```
282
```

```python
download_url = files[0].download_url
print(download_url)
```

```
http://esgf-data1.ceda.ac.uk/thredds/fileServer/esg_dataroot/cmip5/output1/MOHC/HadGEM2-CC/historical/mon/atmos/Amon/r1i1p1/v20110927/cl/cl_Amon_HadGEM2-CC_historical_r1i1p1_185912-188411.nc
```
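The file name at the end of the download URL encodes the variable, MIP table, model, experiment, ensemble member and date range. The sketch below splits it with the standard library, assuming the `<variable>_<table>_<model>_<experiment>_<ensemble>_<dates>.nc` layout seen in this URL; `split_cmip5_filename` is a hypothetical helper, and file names without a date range, or from other projects, will not match and raise `ValueError`.

```python
from posixpath import basename
from urllib.parse import urlsplit

# Assumed field order, taken from the CMIP5 file name in the URL above.
CMIP5_FIELDS = ("variable", "table", "model", "experiment", "ensemble", "dates")


def split_cmip5_filename(url):
    """Split a CMIP5-style file name taken from a download URL (hypothetical helper)."""
    name = basename(urlsplit(url).path)
    stem, _, ext = name.rpartition(".")
    fields = stem.split("_")
    if ext != "nc" or len(fields) != len(CMIP5_FIELDS):
        raise ValueError("unexpected file name: %s" % name)
    return dict(zip(CMIP5_FIELDS, fields))


url = ("http://esgf-data1.ceda.ac.uk/thredds/fileServer/esg_dataroot/cmip5/output1/"
       "MOHC/HadGEM2-CC/historical/mon/atmos/Amon/r1i1p1/v20110927/cl/"
       "cl_Amon_HadGEM2-CC_historical_r1i1p1_185912-188411.nc")
parts = split_cmip5_filename(url)  # or split_cmip5_filename(files[0].download_url)
start, end = parts["dates"].split("-")
print(parts["variable"], parts["ensemble"], start, end)  # cl r1i1p1 185912 188411
```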


Find the OPeNDAP URL for an aggregated dataset:

```python
conn = SearchConnection('http://esgf-data.dkrz.de/esg-search', distrib=False)
ctx = conn.new_context(project='CMIP5', model='MPI-ESM-LR',
                       experiment='decadal2000', time_frequency='day')
print('Hits: {}, Realms: {}, Ensembles: {}'.format(
    ctx.hit_count,
    ctx.facet_counts['realm'],
    ctx.facet_counts['ensemble']))
```

```
Hits: 20, Realms: {'ocean': 10, 'atmos': 10}, Ensembles: {'r9i1p1': 2, 'r8i1p1': 2, 'r7i1p1': 2, 'r6i1p1': 2, 'r5i1p1': 2, 'r4i1p1': 2, 'r3i1p1': 2, 'r2i1p1': 2, 'r1i1p1': 2, 'r10i1p1': 2}
```
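The ensemble facet values above are not in numeric order, and plain string sorting does not fix that: `sorted()` puts `r10i1p1` before `r1i1p1` because `'0'` sorts before `'i'`. A small sort key that reads the three integers out of each `r<N>i<M>p<L>` label gives the expected order. `ensemble_key` is a hypothetical helper, not part of pyesgf, and it raises `ValueError` for labels that do not follow that pattern.

```python
import re


def ensemble_key(label):
    """Sort key for CMIP5 ensemble labels of the form r<N>i<M>p<L>."""
    match = re.fullmatch(r"r(\d+)i(\d+)p(\d+)", label)
    if match is None:
        raise ValueError("not an r<N>i<M>p<L> label: %r" % label)
    return tuple(int(part) for part in match.groups())


# Ensemble facet values copied from the output above;
# with a live context use ctx.facet_counts['ensemble'] instead.
ensembles = ['r9i1p1', 'r8i1p1', 'r7i1p1', 'r6i1p1', 'r5i1p1',
             'r4i1p1', 'r3i1p1', 'r2i1p1', 'r1i1p1', 'r10i1p1']
ordered = sorted(ensembles, key=ensemble_key)
print(ordered)
```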


```python
ctx = ctx.constrain(realm='atmos', ensemble='r1i1p1')
ctx.hit_count
```

```
1
```

```python
result = ctx.search()[0]
agg_ctx = result.aggregation_context()
agg = agg_ctx.search()[0]
print(agg.opendap_url)
```

```
http://esgf1.dkrz.de/thredds/dodsC/cmip5.output1.MPI-M.MPI-ESM-LR.decadal2000.day.atmos.day.r1i1p1.tasmax.20111122.aggregation
```


Find download URLs for all files in a dataset:

```python
conn = SearchConnection('http://esgf-data.dkrz.de/esg-search', distrib=False)
ctx = conn.new_context(project='obs4MIPs')
ctx.hit_count
```

```
2
```

```python
ds = ctx.search()[0]
files = ds.file_context().search()
len(files)
```

```
1
```

```python
for f in files:
    print(f.download_url)
```

```
http://eridanus.eoc.dlr.de/thredds/fileServer/esg_dataroot/obs4MIPs/observations/atmos/od550aer/mon/grid/SU/ATSR2-AATSR/v20160922/AOD550mean_Amon_ESA-CCI-Aerosol-AOD-SU_historical_r1i1p1_199607-201203.nc
```
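To fetch the files outside Python, one option is to save the URLs to a text file, one per line, and pass it to a downloader that reads URL lists, such as `wget -i urls.txt`. `write_url_list` is a hypothetical helper, not a pyesgf function, and whether plain HTTP downloads succeed without logging in may depend on the data node.

```python
from pathlib import Path


def write_url_list(urls, path):
    """Write one URL per line, skipping duplicates; return the number written.

    Hypothetical helper, not part of pyesgf.
    """
    unique = list(dict.fromkeys(urls))  # de-duplicate while keeping first-seen order
    Path(path).write_text("".join(url + "\n" for url in unique))
    return len(unique)


# With the file results from above:
# n = write_url_list([f.download_url for f in files], "urls.txt")
```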


Define a dataset search that includes a temporal range:

```python
conn = SearchConnection('http://esgf-index1.ceda.ac.uk/esg-search', distrib=False)
ctx = conn.new_context(
    project="CMIP5", model="HadGEM2-ES",
    time_frequency="mon", realm="atmos", ensemble="r1i1p1", latest=True,
    from_timestamp="2100-12-30T23:23:59Z", to_timestamp="2200-01-01T00:00:00Z")
ctx.hit_count
```

```
3
```
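The timestamps above are written as `YYYY-MM-DDTHH:MM:SSZ`, with a trailing `Z` for UTC. If the range starts out as `datetime` objects, a sketch like the following produces strings in that same style. `esgf_timestamp` is a hypothetical helper, and it assumes that naive datetimes are already in UTC.

```python
from datetime import datetime, timezone


def esgf_timestamp(dt):
    """Format a datetime as 'YYYY-MM-DDTHH:MM:SSZ', the style used above.

    Hypothetical helper. Aware datetimes are converted to UTC first;
    naive datetimes are assumed to already be in UTC.
    """
    if dt.tzinfo is not None:
        dt = dt.astimezone(timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")


from_ts = esgf_timestamp(datetime(2100, 12, 30, 23, 23, 59))
to_ts = esgf_timestamp(datetime(2200, 1, 1))
print(from_ts, to_ts)  # 2100-12-30T23:23:59Z 2200-01-01T00:00:00Z
# ctx = conn.new_context(..., from_timestamp=from_ts, to_timestamp=to_ts)
```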

Or do the same thing by searching without temporal constraints first and then applying them with `constrain()`:

```python
ctx = conn.new_context(
    project="CMIP5", model="HadGEM2-ES",
    time_frequency="mon", realm="atmos", ensemble="r1i1p1", latest=True)
ctx.hit_count
```

```
22
```

```python
ctx = ctx.constrain(from_timestamp="2100-12-30T23:23:59Z", to_timestamp="2200-01-01T00:00:00Z")
ctx.hit_count
```

```
3
```
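Both approaches gave 3 hits when this notebook was run. Keeping the shared constraints and the time range in two dicts makes it easy to run the comparison again without retyping the keywords. The sketch below uses only the calls shown in this section (`new_context`, `constrain`, `hit_count`); `hit_counts_both_ways` is a hypothetical helper, and it takes the connection as an argument rather than importing pyesgf itself.

```python
def hit_counts_both_ways(conn, base, time_range):
    """Return (one-step, two-step) hit counts for the same temporal search.

    one-step: every constraint, including the time range, passed to new_context();
    two-step: new_context() with the base constraints, then constrain() with the range.
    """
    one_step = conn.new_context(**base, **time_range)
    two_step = conn.new_context(**base).constrain(**time_range)
    return one_step.hit_count, two_step.hit_count


base = dict(project="CMIP5", model="HadGEM2-ES", time_frequency="mon",
            realm="atmos", ensemble="r1i1p1", latest=True)
time_range = dict(from_timestamp="2100-12-30T23:23:59Z",
                  to_timestamp="2200-01-01T00:00:00Z")

# With the CEDA connection created above:
# print(hit_counts_both_ways(conn, base, time_range))
```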