# The ACCESS-NRI Intake Catalog

The [ACCESS-NRI Intake Catalog](https://access-nri-intake-catalog.readthedocs.io/en/latest/) was developed by Dougie Squire (ACCESS-NRI).

It is built on top of an existing catalog system called [Intake ESM](https://intake-esm.readthedocs.io/en/stable/), to replicate some of the functionality of the COSIMA Cookbook. 

To quote from the documentation:

> The catalog aims to provide a way for Python users to discover and load data across a broad range of climate data products available on the Australian NCI supercomputer Gadi

Because it has similar functionality to the COSIMA Cookbook, this is a good opportunity to highlight how the two are similar and, importantly, how they differ. You are likely to benefit from using both in your research, potentially in the same notebook, as they offer complementary capabilities.

The first step is to import the catalog. Because we are using the correct `CLEX`-supported `conda` environment, the catalog is available without any extra steps.

In [None]:
import intake
import cf_xarray

This is the first obvious difference from the COSIMA Cookbook: no connection to a database is required. The catalog information is made available through an `import` and can be accessed directly.

In [None]:
catalog = intake.cat.access_nri

## Find the data we want to use

There is no GUI for finding data, but the search API is very powerful and flexible. For example, find every 1 degree ACCESS-OM2 experiment with daily `sea_level` data:

In [None]:
variable_name    = "sea_level"
model_name       = "ACCESS-OM2"
experiment_regex = "^1deg.*"
frequency        = "1day"

In [None]:
filtered = catalog.search(model=model_name, name=experiment_regex, variable=variable_name, frequency=frequency)
filtered

Note: 

1. The `name` argument is a regular expression (regex) and is matched as a pattern search (see the [Python docs for `re.search`](https://docs.python.org/3/library/re.html#re.search)).
2. In this case the other three arguments are plain strings, so they return only exact matches, but they could also be regular expressions.
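
The difference between a pattern search and an exact match can be seen in plain Python, outside the catalog. This is only a sketch: the experiment names are made up for illustration, and it does not reproduce the catalog's internal search logic.

```python
import re

# Hypothetical experiment names, for illustration only
names = [
    "1deg_jra55_iaf_omip2_cycle6",
    "025deg_jra55_iaf_omip2_cycle6",
    "01deg_jra55v140_iaf",
    "1deg_jra55_ryf9091_gadi",
]

# Pattern search: the leading ^ anchors the match to the start of the name
regex_hits = [n for n in names if re.search("^1deg.*", n)]

# Without the anchor, "1deg" is also found inside "01deg_..."
unanchored_hits = [n for n in names if re.search("1deg", n)]

# Exact comparison: a plain string must equal the whole value
exact_hits = [n for n in names if n == "1deg"]

print(regex_hits)
print(unanchored_hits)
print(exact_hits)
```

The anchored pattern keeps only the two names that start with `1deg`, which is why the `^` matters when 0.1 degree experiments sit in the same catalog.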

Finding combinations of variables and frequency is not as easy to do with the COSIMA Cookbook.

Import a dask client library and start a client to allow parallelisation via dask.

In [None]:
from distributed import Client

client = Client(threads_per_worker=1)
client.dashboard_link

## Compare to COSIMA Cookbook

Choose the last of the OMIP2 experiments listed in the search above, which is the same data loaded in the sea level notebook using the COSIMA Cookbook. This is to show how accessing data through the two tools differs.

In [None]:
experiment_name = "1deg_jra55_iaf_omip2_cycle6"

Save a variable that points to this experiment

In [None]:
esm_datastore = catalog[experiment_name]

This is an example of where Intake differs from the COSIMA Cookbook. An ESM Datastore can contain multiple datasets, so it is necessary to search *within* an experiment for the variable you already know exists, and then call `.to_dask()` to open it as an [xarray Dataset](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html). In contrast, the COSIMA Cookbook returns a `DataArray` with just the variable requested: by design, it hides the detail of the files opened.

In [None]:
%%time
sea_level = esm_datastore.search(variable="sea_level", frequency="1day").to_dask()

In [None]:
sea_level

This is exactly the same data as was loaded in the previous notebook (`Sea_level.ipynb`).

![sea-level.png](attachment:f6514d9f-1e83-42f1-a0fa-07c06ae1db6b.png)

Note that there is no way to reduce the number of files opened while gathering the metadata needed to construct the `xarray.Dataset` object. The dataset can, of course, still be reduced using `sel`/`isel` before computation or data loading.
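
As a rough illustration of reducing a dataset with `sel`/`isel`, the sketch below builds a small synthetic `Dataset` rather than reading from the catalog. The grid size, monthly frequency and zero-filled values are assumptions made to keep it small; only the variable and dimension names are chosen to resemble the ACCESS-OM2 output.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the opened dataset: monthly values on a tiny grid.
# Shapes and coordinate values are assumptions, not the real model grid.
time = pd.date_range("1950-01-01", "1999-12-01", freq="MS")
ds = xr.Dataset(
    {
        "sea_level": (
            ("time", "yt_ocean", "xt_ocean"),
            np.zeros((time.size, 4, 5)),
        )
    },
    coords={
        "time": time,
        "yt_ocean": np.linspace(-60.0, 60.0, 4),
        "xt_ocean": np.linspace(-280.0, 80.0, 5),
    },
)

# Indexing a Dataset by variable name returns a single-variable DataArray
da = ds["sea_level"]

# Label-based selection: keep 1957 to 1986 inclusive
subset = da.sel(time=slice("1957", "1986"))

# Position-based selection: keep only the first two latitude rows
subset = subset.isel(yt_ocean=slice(0, 2))

print(type(da).__name__, subset.shape)
```

With a dask-backed dataset such as the one returned by `.to_dask()`, the same selections are applied lazily, so only the selected portion is read when the result is computed.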

## Loading data not available in the Cookbook

The ACCESS-NRI Intake Catalog indexes approximately 3 PB of climate-related data on Gadi. That is a lot: far more than is available through the COSIMA Cookbook database. For this reason alone it is worth using alongside the COSIMA Cookbook, to access data that would otherwise be difficult, or impossible, to find.

For this example we'll search for the variable `zos`, the [sea_surface_height_above_geoid](https://pcmdi.llnl.gov/mips/cmip3/variableList.html#Table_O1c), which is the same as the `sea_level` variable loaded before.

In [None]:
catalog = catalog.search(variable="zos.*")
catalog

So straight away we can see there are a large number of CMIP datasets we could compare our data to, including the ACCESS-OM2 CMIP submissions. 

In this case we'll choose `cmip6_fs38` as it contains some ACCESS coupled models that share the same grid as the ACCESS-OM2 1 degree model, allowing easy comparison without regridding.

Note: we're calling this a product. The catalog is quite flexible, so different products can have quite different structures. The CMIP data has multiple realms, sources and experiments for each product.

In [None]:
product_name = "cmip6_fs38"

In [None]:
esm_datastore = catalog[product_name]

In [None]:
esm_datastore

OK, here is where the schema-rubber hits the catalog-road. There is no getting around it: you need to understand CMIP land to be able to create a meaningful query for this data source. This will be obvious to fluent speakers of CMIP; for everyone else, the query searches for the historical ACCESS-CM2 experiment, chooses a single ensemble member, selects the `zos` variable, and asks only for real files rather than symbolic links (that last bit comes from the way NCI structures its CMIP collection).

Nonetheless, it works and returns an ACCESS-CM2 dataset with a date-range of 1850 to 2014.

In [None]:
zos_cm2 = esm_datastore.search(source_id="ACCESS-CM2", experiment_id="historical", member_id="r10i1p1f1", variable_id="zos", file_type="f").to_dask()
zos_cm2
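
One way to picture the facet search above is as a filter over a table with one row per file, where every keyword must match. An intake-esm datastore exposes its table as `.df`; the sketch below uses a toy `pandas` table instead, and its rows and paths are entirely hypothetical.

```python
import pandas as pd

# Hypothetical rows standing in for a CMIP6 datastore table; paths are made up
columns = ["source_id", "experiment_id", "member_id", "variable_id", "file_type", "path"]
rows = [
    ("ACCESS-CM2", "historical", "r10i1p1f1", "zos", "f", "/hypothetical/a.nc"),
    ("ACCESS-CM2", "historical", "r10i1p1f1", "zos", "l", "/hypothetical/b.nc"),
    ("ACCESS-CM2", "historical", "r1i1p1f1", "zos", "f", "/hypothetical/c.nc"),
    ("ACCESS-CM2", "ssp585", "r10i1p1f1", "zos", "f", "/hypothetical/d.nc"),
    ("ACCESS-ESM1-5", "historical", "r10i1p1f1", "tos", "f", "/hypothetical/e.nc"),
]
table = pd.DataFrame(rows, columns=columns)

# The same facets as the catalog query
query = {
    "source_id": "ACCESS-CM2",
    "experiment_id": "historical",
    "member_id": "r10i1p1f1",
    "variable_id": "zos",
    "file_type": "f",
}

# A row is kept only if it matches every facet (logical AND across columns)
mask = pd.Series(True, index=table.index)
for column, value in query.items():
    mask &= (table[column] == value)

matches = table[mask]
print(matches["path"].tolist())
```

Dropping a facet from `query` widens the result, which is a quick way to see which values a product actually offers before narrowing down to a single dataset.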

We can now recreate the same climatology as in the previous sea level notebook and compare the two. Note that the [cf_xarray accessor](https://cosima-recipes.readthedocs.io/en/latest/Tutorials/Model_Agnostic_Analysis.html) is used here so the variable can be selected by its CF standard name, which makes the code more model agnostic.

In [None]:
zos_cm2_climatology = zos_cm2.sel(time=slice('1957', '1986')).cf['sea_surface_height_above_geoid'].mean(dim='time')

In [None]:
sea_level_climatology = sea_level.sel(time=slice('1957', '1986')).mean('time').compute()

In [None]:
zos_cm2_climatology.plot(vmin=-2, vmax=2, cmap='RdBu_r')

In [None]:
sea_level_climatology.sea_level.plot(vmin=-2, vmax=2, cmap='RdBu_r')

## Appendix

Regarding the use of cf_xarray, just displaying the `.cf` accessor provides a wealth of information that the `cf_xarray` library has managed to infer from this data.

In [None]:
zos_cm2.cf