# Catalog testing

**Author:** Xavier R Nogueira

**Overview:** The intake catalog will be the core of our first server implementation. Reading each dataset's description, driver, and storage location from the catalog will let xpublish provide the expected OPeNDAP endpoint functionality. Later on we may also implement reading from a STAC catalog.

In [76]:
import intake
import intake_xarray
import panel
import xarray as xr
import fsspec
import zarr
from pathlib import Path

In [3]:
INTAKE_CATALOG_DIR = Path.cwd().parent / 'intake_catalogs'

# find YAML intake catalogs
intake_yamls = []
for cat in INTAKE_CATALOG_DIR.iterdir():
    if cat.suffix == '.yml':
        intake_yamls.append(cat)
print(intake_yamls)

[WindowsPath('C:/Users/xrnogueira/Documents/Xpublish-OPeNDAP-Server/intake_catalogs/sample_zarr_catalog.yml')]
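The loop above only matches the `.yml` suffix. As a minimal sketch, assuming catalogs might also be saved with a `.yaml` suffix, the search could be wrapped in a small helper. The name `find_catalog_files` is hypothetical and is not part of intake.

```python
from pathlib import Path


def find_catalog_files(directory: Path) -> list[Path]:
    """Return the YAML files in `directory`, sorted by path (hypothetical helper)."""
    # Assumption: both common YAML suffixes should count as catalogs.
    yaml_suffixes = {".yml", ".yaml"}
    return sorted(
        path
        for path in directory.iterdir()
        if path.is_file() and path.suffix.lower() in yaml_suffixes
    )
```

Sorting makes the result independent of the order in which the operating system lists the directory, so `find_catalog_files(INTAKE_CATALOG_DIR)[0]` is stable between runs.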


In [4]:
catalog = intake.open_catalog(intake_yamls[0])
catalog

sample_zarr_catalog:
  args:
    path: C:\Users\xrnogueira\Documents\Xpublish-OPeNDAP-Server\intake_catalogs\sample_zarr_catalog.yml
  description: ''
  driver: intake.catalog.local.YAMLFileCatalog
  metadata: {}


In [7]:
# check that the catalog loaded correctly
intake.interface.gui.GUI([catalog])

In [47]:
type(catalog['prism-v2-osn'])

intake_xarray.xzarr.ZarrSource

In [90]:
cat_info = catalog['prism-v2-osn'].describe()
cat_info

{'name': 'prism-v2-osn',
 'container': 'xarray',
 'plugin': ['zarr'],
 'driver': ['zarr'],
 'description': 'USGS THREDDS Holdings/Parameter-elevation Regressions on Independent Slopes Model Monthly Climate Data for the Continental United States',
 'direct_access': 'forbid',
 'user_parameters': [],
 'metadata': {},
 'args': {'urlpath': 's3://rsignellbucket2/nhgf/sample_data/prism_v2.zarr',
  'consolidated': True,
  'storage_options': {'anon': True,
   'requester_pays': False,
   'client_kwargs': {'endpoint_url': 'https://renc.osn.xsede.org'}}}}
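The dictionary above already holds everything needed to open the dataset: a location (`urlpath`), a driver, and the storage options. A minimal sketch of pulling those out is shown here. The helper name `open_spec_from_description` and the `DRIVER_TO_ENGINE` mapping are assumptions for illustration; neither intake nor xarray defines them.

```python
# Assumption: illustrative mapping from intake driver names to xarray engine
# names. It is not defined by intake or xarray.
DRIVER_TO_ENGINE = {"zarr": "zarr", "netcdf": "h5netcdf"}


def open_spec_from_description(info: dict) -> dict:
    """Collect location, engine and storage options from a describe()-style dict."""
    args = info.get("args", {})
    drivers = info.get("driver", [])
    if isinstance(drivers, str):
        # Tolerate a single driver name as well as a list of names.
        drivers = [drivers]
    driver = drivers[0] if drivers else None
    return {
        "urlpath": args["urlpath"],
        "engine": DRIVER_TO_ENGINE.get(driver),  # None if the driver is unmapped
        "storage_options": dict(args.get("storage_options", {})),
    }


# The description printed above, trimmed to the fields used here.
example_info = {
    "name": "prism-v2-osn",
    "driver": ["zarr"],
    "args": {
        "urlpath": "s3://rsignellbucket2/nhgf/sample_data/prism_v2.zarr",
        "consolidated": True,
        "storage_options": {
            "anon": True,
            "requester_pays": False,
            "client_kwargs": {"endpoint_url": "https://renc.osn.xsede.org"},
        },
    },
}

spec = open_spec_from_description(example_info)
print(spec["urlpath"], spec["engine"])
```

Returning `None` for an unmapped driver leaves it to the caller to decide whether to raise an error or let xarray guess the engine.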

In [105]:
xr.engines

AttributeError: module 'xarray' has no attribute 'engines'
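xarray has no top-level `engines` attribute, which explains the error above. A minimal sketch for listing the installed backends, assuming an xarray release that provides `xarray.backends.list_engines()`:

```python
import xarray as xr

# Maps each engine name (for example "zarr" or "h5netcdf") to its backend
# entrypoint. Only engines whose dependencies are installed are listed.
engines = xr.backends.list_engines()
print(sorted(engines))
```

The names returned here are the values accepted by the `engine=` argument of `xr.open_dataset`.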

In [99]:
intake.registry.keys()

['alias',
 'catalog',
 'csv',
 'intake_remote',
 'json',
 'jsonl',
 'ndzarr',
 'numpy',
 'textfiles',
 'tiled',
 'tiled_cat',
 'yaml_file_cat',
 'yaml_files_cat',
 'zarr_cat',
 'netcdf',
 'opendap',
 'rasterio',
 'remote-xarray',
 'xarray_image',
 'zarr',
 'remote_xarray']

In [84]:
type(catalog)

intake.catalog.local.YAMLFileCatalog

In [None]:
catalog['prism-v2-osn'].discover()

In [101]:
catalog['prism-v2-osn'].urlpath

's3://rsignellbucket2/nhgf/sample_data/prism_v2.zarr'

In [67]:
# storage options (anonymous access, custom endpoint) for the S3 filesystem
k = catalog['prism-v2-osn'].describe()['args']['storage_options']
k

{'anon': True,
 'requester_pays': False,
 'client_kwargs': {'endpoint_url': 'https://renc.osn.xsede.org'}}

In [79]:
from fsspec.mapping import FSMap

In [127]:
%%time
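# create an S3 filesystem using the catalog's storage options
fs = fsspec.filesystem(
    's3',
    **k,
)

# wrap the Zarr store as a key-value mapping that xarray can read
s3_map = FSMap(catalog['prism-v2-osn'].urlpath, fs)

# open lazily: data variables are not loaded until they are accessed
ds = xr.open_dataset(
    s3_map,
    engine='zarr',
)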
ds

CPU times: total: 46.9 ms
Wall time: 1.05 s


In [83]:
ds['ppt']

In [133]:
%%time
# start an S3 filesystem with default options
fs = fsspec.filesystem('s3')

# open the remote NetCDF file and read it as a dataset
nc_file = fs.open(
    's3://era5-pds/2008/01/data/air_temperature_at_2_metres.nc',
)
ds2 = xr.open_dataset(
    nc_file,
    engine='h5netcdf',
)


CPU times: total: 688 ms
Wall time: 27.6 s


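**WOW, opening from Zarr is roughly 26x faster (1.05 s vs. 27.6 s wall time), even with a bigger dataset!** This is likely because the Zarr store keeps its metadata and chunks as separate small objects, so opening it only needs to fetch the metadata, whereas the NetCDF file has to be parsed through many small reads over S3.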
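The ratio can be checked directly from the two wall times reported by `%%time`. The sketch below also includes a small timer for repeating the measurement outside a notebook. The name `wall_timer` is hypothetical, and a single run over the network is a rough indication rather than a benchmark.

```python
import time
from contextlib import contextmanager


@contextmanager
def wall_timer(label: str, results: dict):
    """Store the elapsed wall time of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


# Wall times copied from the two %%time outputs above, in seconds.
zarr_wall_s = 1.05
netcdf_wall_s = 27.6
speedup = netcdf_wall_s / zarr_wall_s
print(f"Zarr open was about {speedup:.1f}x faster")

# Example use of the timer with a placeholder workload.
timings = {}
with wall_timer("sleep", timings):
    time.sleep(0.01)
```

To compare the two formats more fairly, each `xr.open_dataset` call could be wrapped in `wall_timer` and repeated a few times, since the first request to a bucket often includes connection setup.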

In [120]:
ds2