# ACCESS-NRI Intake catalog for loading model output

This tutorial demonstrates how to use the ACCESS-NRI Intake catalog to load model or other output.

⚠️ **Membership to project `xp65` is required to access the ACCESS-NRI Intake catalog** ⚠️

This is a concise version of the longer [ACCESS-NRI Intake catalog documentation](https://access-nri-intake-catalog.readthedocs.io/) and related [COSIMA training workshop](https://github.com/ACCESS-Hive/cosima-training-workshop-2023/blob/main/Intake.ipynb). Users are encouraged to refer to these for more detail and demonstrations.

Requirements: The `conda/analysis3` module from `/g/data/xp65/public/modules`.

# Start a dask Client

Starting a dask `Client` is not specific to the ACCESS-NRI Intake catalog, but it lets the data loading and computation below run in parallel.

In [1]:
from dask.distributed import Client

client = Client(threads_per_worker=1)
client

Perhaps you already have a cluster running?
Hosting the HTTP server on port 45801 instead


Connection method: Cluster object, Cluster type: distributed.LocalCluster
Dashboard: /proxy/45801/status
Workers: 28, Total threads: 28, Total memory: 126.00 GiB
Status: running, Using processes: True
(28 workers, each with 1 thread and 4.50 GiB of memory; per-worker details omitted)


# Opening and searching the catalog

To use the ACCESS-NRI Intake catalog, we need to import `intake`:

In [2]:
import intake

We can open the catalog as follows:

In [3]:
catalog = intake.cat.access_nri

The returned object `catalog` is an instance of the ACCESS-NRI Intake catalog that we can use to find and load data.

Printing the `catalog` object will return a dataframe of experiments that you can browse:

In [4]:
catalog

name,model,description,realm,frequency,variable
01deg_jra55_ryf_Control,{ACCESS-OM2-01},"{0.1° ACCESS-OM2 repeat year forcing control run for the simulations performed in Huguenin et al. (2024, GRL)}","{ocean, seaIce}","{fx, 1mon}","{HTN, st_ocean, nv, mlt_onset_m, grid_yu_ocean, mld, total_ocean_lw_heat, grid_xt_ocean, dzt, tx_trans_nrho_submeso, mass_pmepr_on_nrho, pme_net, kmt, sens_heat, sw_heat_on_nrho, potrho, total_oce..."
01deg_jra55_ryf_ENFull,{ACCESS-OM2},"{0.1° ACCESS-OM2 El Níño run for the simulations performed in Huguenin et al. (2024, GRL)}","{ocean, seaIce}","{fx, 1mon}","{HTN, st_ocean, nv, mlt_onset_m, grid_yu_ocean, mld, total_ocean_lw_heat, grid_xt_ocean, dzt, pme_net, kmt, sens_heat, potrho, total_ocean_swflx_vis, total_ocean_melt, sfc_hflux_from_runoff, salt_..."
01deg_jra55_ryf_LNFull,{ACCESS-OM2},"{0.1° ACCESS-OM2 La Níña run for the simulations performed in Huguenin et al. (2024, GRL)}","{ocean, seaIce}","{fx, 1mon}","{HTN, st_ocean, nv, mlt_onset_m, grid_yu_ocean, mld, total_ocean_lw_heat, grid_xt_ocean, dzt, pme_net, kmt, sens_heat, potrho, total_ocean_swflx_vis, total_ocean_melt, sfc_hflux_from_runoff, salt_..."
01deg_jra55v13_ryf9091,{ACCESS-OM2-01},{0.1 degree ACCESS-OM2 global model configuration with JRA55-do v1.3 RYF9091 repeat year forcing (May 1990 to Apr 1991)},"{ocean, seaIce}","{3mon, 3hr, 1day, fx, 1mon}","{HTN, st_ocean, nv, mlt_onset_m, mld, grid_yu_ocean, total_ocean_lw_heat, grid_xt_ocean, dzt, eta_t, pme_net, kmt, sens_heat, potrho, total_ocean_swflx_vis, xu_ocean_sub01, total_ocean_melt, sfc_h..."
01deg_jra55v13_ryf9091_easterlies_down10,{ACCESS-OM2-01},{0.1 degree ACCESS-OM2 global model configuration with JRA55-do v1.3 RYF9091 repeat year forcing (May 1990 to Apr 1991) and zonal/meridional wind speed around Antarctica decreased by 10%.},"{ocean, seaIce}","{1day, fx, 1mon}","{HTN, st_ocean, nv, mlt_onset_m, mld, grid_yu_ocean, total_ocean_lw_heat, grid_xt_ocean, dzt, eta_t, pme_net, kmt, sens_heat, potrho, total_ocean_swflx_vis, total_ocean_melt, sfc_hflux_from_runoff..."
01deg_jra55v13_ryf9091_easterlies_up10,{ACCESS-OM2-01},{0.1 degree ACCESS-OM2 global model configuration with JRA55-do v1.3 RYF9091 repeat year forcing (May 1990 to Apr 1991) and zonal/meridional wind speed around Antarctica increased by 10%.},"{ocean, seaIce}","{1day, fx, 1mon}","{HTN, st_ocean, nv, mlt_onset_m, mld, grid_yu_ocean, total_ocean_lw_heat, grid_xt_ocean, dzt, eta_t, pme_net, kmt, sens_heat, potrho, total_ocean_swflx_vis, total_ocean_melt, sfc_hflux_from_runoff..."
01deg_jra55v13_ryf9091_easterlies_up10_meridional,{ACCESS-OM2-01},{0.1 degree ACCESS-OM2 global model configuration with JRA55-do v1.3 RYF9091 repeat year forcing (May 1990 to Apr 1991) and meridional wind speed around Antarctica increased by 10%.},"{ocean, seaIce}","{1day, fx, 1mon}","{HTN, st_ocean, nv, mlt_onset_m, mld, grid_yu_ocean, total_ocean_lw_heat, grid_xt_ocean, dzt, eta_t, pme_net, kmt, sens_heat, potrho, total_ocean_swflx_vis, total_ocean_melt, sfc_hflux_from_runoff..."
01deg_jra55v13_ryf9091_easterlies_up10_zonal,{ACCESS-OM2-01},{0.1 degree ACCESS-OM2 global model configuration with JRA55-do v1.3 RYF9091 repeat year forcing (May 1990 to Apr 1991) and zonal wind speed around Antarctica increased by 10%.},"{ocean, seaIce}","{1day, fx, 1mon}","{HTN, st_ocean, nv, mlt_onset_m, mld, grid_yu_ocean, total_ocean_lw_heat, grid_xt_ocean, dzt, eta_t, pme_net, kmt, sens_heat, potrho, total_ocean_swflx_vis, total_ocean_melt, sfc_hflux_from_runoff..."
01deg_jra55v13_ryf9091_qian_wthmp,{ACCESS-OM2},"{Future perturbations with wind, thermal and meltwater forcing, branching off 01deg_jra55v13_ryf9091, as described in Li et al. 2023, https://www.nature.com/articles/s41586-023-05762-w}","{ocean, seaIce}","{fx, 1mon}","{HTN, st_ocean, nv, mlt_onset_m, grid_yu_ocean, mld, total_ocean_lw_heat, grid_xt_ocean, dzt, pme_net, kmt, sens_heat, potrho, total_ocean_swflx_vis, total_ocean_melt, sfc_hflux_from_runoff, salt_..."
01deg_jra55v13_ryf9091_qian_wthp,{ACCESS-OM2},"{Future perturbation with wind and thermal forcing, branching off 01deg_jra55v13_ryf9091, as described in Li et al. 2023, https://www.nature.com/articles/s41586-023-05762-w}","{ocean, seaIce}","{fx, 1mon}","{HTN, st_ocean, nv, mlt_onset_m, grid_yu_ocean, mld, total_ocean_lw_heat, grid_xt_ocean, dzt, pme_net, kmt, sens_heat, potrho, total_ocean_swflx_vis, total_ocean_melt, sfc_hflux_from_runoff, salt_..."


You can also search based on the columns in this dataframe to find experiments that are relevant to you. For example, you might be interested in all ACCESS-OM2 experiments that have the variable `"surface_salt"` at daily frequency. There are 6 such experiments currently available through the catalog:

In [5]:
catalog.search(model="ACCESS-OM2", variable="surface_salt", frequency="1day")

name,model,description,realm,frequency,variable
025deg_era5_iaf,{ACCESS-OM2},{0.25 degree ACCESS-OM2 global model configuration with ERA5 interannual\nforcing (1980-2021)},{ocean},{1day},{surface_salt}
025deg_era5_ryf,{ACCESS-OM2},{0.25 degree ACCESS-OM2 global model configuration with ERA5 RYF9091 repeat\nyear forcing (May 1990 to Apr 1991)},{ocean},{1day},{surface_salt}
025deg_jra55_iaf_era5comparison,{ACCESS-OM2},{0.25 degree ACCESS-OM2 global model configuration with JRA55-do v1.5.0\ninterannual forcing (1980-2019)},{ocean},{1day},{surface_salt}
025deg_jra55_ryf_era5comparison,{ACCESS-OM2},{0.25 degree ACCESS-OM2 global model configuration with JRA55-do v1.4.0\nRYF9091 repeat year forcing (May 1990 to Apr 1991)},{ocean},{1day},{surface_salt}
1deg_era5_iaf,{ACCESS-OM2},{1 degree ACCESS-OM2 global model configuration with ERA5 interannual\nforcing (1960-2019)},{ocean},{1day},{surface_salt}
1deg_jra55_iaf_era5comparison,{ACCESS-OM2},{1 degree ACCESS-OM2 global model configuration with JRA55-do v1.4.0\ninterannual forcing (1960-2019)},{ocean},{1day},{surface_salt}
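
To make the idea of searching on several columns at once concrete, here is a toy sketch in plain Python. It is an illustration only, not the catalog's implementation: the `toy_rows` data and the `toy_search` helper are hypothetical, and the sketch assumes that a row matches when every query term matches (membership for set-valued columns, equality otherwise).

```python
# Toy rows, one per (experiment, frequency) pair. All values are made up.
toy_rows = [
    {"name": "exp_a", "model": "ACCESS-OM2", "frequency": "1day",
     "variable": {"surface_salt", "surface_temp"}},
    {"name": "exp_a", "model": "ACCESS-OM2", "frequency": "1mon",
     "variable": {"surface_salt"}},
    {"name": "exp_b", "model": "ACCESS-OM2", "frequency": "1mon",
     "variable": {"surface_salt"}},
    {"name": "exp_c", "model": "ACCESS-OM2-01", "frequency": "1day",
     "variable": {"surface_salt"}},
]


def toy_search(rows, **query):
    """Return sorted experiment names with at least one row matching every query term."""
    def matches(row):
        for column, wanted in query.items():
            value = row[column]
            # Set-valued columns match on membership, scalar columns on equality.
            if isinstance(value, set):
                if wanted not in value:
                    return False
            elif value != wanted:
                return False
        return True

    return sorted({row["name"] for row in rows if matches(row)})


result = toy_search(
    toy_rows, model="ACCESS-OM2", variable="surface_salt", frequency="1day"
)
print(result)
```

Only `exp_a` has a single row that satisfies all three terms, which is why combining search terms narrows the list of experiments so quickly.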


# Opening data

There are [multiple ways](https://access-nri-intake-catalog.readthedocs.io/en/latest/usage/quickstart.html#loading-intake-sources) to open data from the experiments in `catalog`. Here we'll demonstrate how to do this when you know the name of the experiment you are interested in, since this is typical for COSIMA users.

For example, we can open monthly data for the `surface_salt` variable in the `01deg_jra55v13_ryf9091` experiment as follows:

In [6]:
experiment = "01deg_jra55v13_ryf9091"
variable = "surface_salt"

In [7]:
data_ic = catalog[experiment].search(
    variable=variable, 
    frequency="1mon"
).to_dask()

In [8]:
data_ic["surface_salt"]

,Array,Chunk
Bytes,121.67 GiB,2.32 MiB
Shape,"(3360, 2700, 3600)","(1, 675, 900)"
Dask graph,53760 chunks in 2233 graph layers,53760 chunks in 2233 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray
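
The sizes in this summary follow directly from the array shape, the chunk shape and the 4-byte `float32` data type. As a small worked check (plain Python, no data access needed; the variable names are just for illustration):

```python
import math

shape = (3360, 2700, 3600)    # (time, yt_ocean, xt_ocean), from the output above
chunk_shape = (1, 675, 900)   # chunk shape, from the output above
itemsize = 4                  # bytes per float32 value

total_bytes = math.prod(shape) * itemsize
chunk_bytes = math.prod(chunk_shape) * itemsize
# Number of chunks along each dimension, multiplied together.
n_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunk_shape))

print(f"Array: {total_bytes / 2**30:.2f} GiB")
print(f"Chunk: {chunk_bytes / 2**20:.2f} MiB")
print(f"Chunks: {n_chunks}")
```

This reproduces the 121.67 GiB, 2.32 MiB and 53760 chunks reported above. Nothing of that size is read at this point, since the array is only a lazy description of the data.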


# Some important facts

There are a few important facts in the ACCESS-NRI Intake catalog that users should be aware of.

## 1. The catalog returns `Dataset`s (not `DataArray`s)

This is because with the catalog you can load multiple variables into a single dataset with a single call (when these variables are in the same file). For example,

In [9]:
data_ic_multivar = catalog[experiment].search(
    variable=["surface_salt", "surface_temp"], 
    frequency="1mon"
).to_dask()

In [10]:
data_ic_multivar

Both data variables (`surface_salt` and `surface_temp`) have the same array layout:

,Array,Chunk
Bytes,121.67 GiB,2.32 MiB
Shape,"(3360, 2700, 3600)","(1, 675, 900)"
Dask graph,53760 chunks in 2233 graph layers,53760 chunks in 2233 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray


## 2. The catalog knows which files make up distinct datasets

The catalog knows which files make up distinct datasets and provides methods to open multiple datasets from a single query. For example, we can search for the variable without specifying a frequency and call `to_dataset_dict()` rather than `to_dask()`. Doing so returns a dictionary of Datasets containing the variable at all the available frequencies (daily and monthly in this case).

In [11]:
data_ic_multifreq = catalog[experiment].search(variable=variable).to_dataset_dict()


--> The keys in the returned dictionary of datasets are constructed as follows:
	'file_id'



In [12]:
data_ic_multifreq

{'ocean.1day.nv:2.xt_ocean:3600.xu_ocean:3600.yt_ocean:2700.yu_ocean:2700': <xarray.Dataset> Size: 610GB
 Dimensions:       (time: 15695, yt_ocean: 2700, xt_ocean: 3600)
 Coordinates:
   * xt_ocean      (xt_ocean) float64 29kB -279.9 -279.8 -279.7 ... 79.85 79.95
   * yt_ocean      (yt_ocean) float64 22kB -81.11 -81.07 -81.02 ... 89.94 89.98
   * time          (time) object 126kB 2137-01-01 12:00:00 ... 2179-12-31 12:0...
 Data variables:
     surface_salt  (time, yt_ocean, xt_ocean) float32 610GB dask.array<chunksize=(1, 675, 900), meta=np.ndarray>
 Attributes:
     filename:                        ocean_daily.nc
     title:                           ACCESS-OM2-01
     grid_type:                       mosaic
     grid_tile:                       1
     intake_esm_vars:                 ['surface_salt']
     intake_esm_attrs:filename:       ocean_daily.nc
     intake_esm_attrs:file_id:        ocean.1day.nv:2.xt_ocean:3600.xu_ocean:3...
     intake_esm_attrs:frequency:      1day
     ...
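
Because the dictionary is keyed by `file_id`, picking out one dataset means matching on the key. The sketch below assumes keys follow the `realm.frequency.dimension:size...` pattern seen in the output above. The helper `select_by_frequency` is hypothetical, the second key is made up for illustration, and plain strings stand in for the Datasets so the snippet runs without access to the data.

```python
def select_by_frequency(datasets, frequency):
    """Return the entries whose key has `frequency` as its second dot-separated field."""
    selected = {}
    for key, ds in datasets.items():
        fields = key.split(".")
        if len(fields) > 1 and fields[1] == frequency:
            selected[key] = ds
    return selected


toy = {
    # Key copied from the output above.
    "ocean.1day.nv:2.xt_ocean:3600.xu_ocean:3600.yt_ocean:2700.yu_ocean:2700": "<daily Dataset>",
    # Hypothetical key, for illustration only.
    "ocean.1mon.nv:2.xt_ocean:3600.yt_ocean:2700": "<monthly Dataset>",
}

monthly = select_by_frequency(toy, "1mon")
print(list(monthly.values()))
```

With a real result you would pass the dictionary returned by `to_dataset_dict()` instead of `toy`.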

Alternatively, multiple datasets can be opened directly into an [xarray-datatree](https://xarray-datatree.readthedocs.io/en/latest/) by calling `to_datatree` rather than `to_dataset_dict` (in an upcoming release, it will be easier for users to control how the groups are structured in the datatree). For example:

In [13]:
data_ic_datatree = catalog[experiment].search(variable=variable).to_datatree()


--> The keys in the returned dictionary of datasets are constructed as follows:
	'file_id'



In [14]:
data_ic_datatree

The datatree contains three datasets with the following array layouts:

,Array,Chunk
Bytes,568.31 GiB,2.32 MiB
Shape,"(15695, 2700, 3600)","(1, 675, 900)"
Dask graph,251120 chunks in 345 graph layers,251120 chunks in 345 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray

,Array,Chunk
Bytes,2.33 TiB,2.32 MiB
Shape,"(65885, 2700, 3600)","(1, 675, 900)"
Dask graph,1054160 chunks in 1445 graph layers,1054160 chunks in 1445 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray

,Array,Chunk
Bytes,121.67 GiB,2.32 MiB
Shape,"(3360, 2700, 3600)","(1, 675, 900)"
Dask graph,53760 chunks in 2233 graph layers,53760 chunks in 2233 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray


## 3. The frequency vocabulary:

In the catalog, frequency follows a standard vocabulary that is very similar to CMIP6:

```python
"fx" # fixed
"subhr" # subhourly
"<int>hr" # hourly
"<int>day" # daily
"<int>mon" # monthly
"<int>yr" # yearly
"<int>dec" # decadal
```
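
A quick way to check a frequency string against this vocabulary before searching is a regular expression. This is a minimal sketch: the `is_valid_frequency` helper is hypothetical, and it assumes `<int>` means a positive integer without leading zeros, which the list above does not state explicitly.

```python
import re

# "fx", "subhr", or a positive integer followed by one of the unit suffixes.
# Assumption: <int> is a positive integer with no leading zeros.
FREQUENCY_PATTERN = re.compile(r"fx|subhr|[1-9]\d*(hr|day|mon|yr|dec)")


def is_valid_frequency(value):
    """Return True if `value` fits the frequency vocabulary listed above."""
    return FREQUENCY_PATTERN.fullmatch(value) is not None


for candidate in ["fx", "1mon", "3hr", "monthly", "mon"]:
    print(candidate, is_valid_frequency(candidate))
```

Strings such as `"monthly"` or a bare `"mon"` are rejected, which is a common reason for a search returning nothing.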

## 4. Method for passing keyword arguments

With the catalog, keyword arguments for xarray's `open_dataset` and `combine_by_coords` functions are passed separately to `to_dask` (or `to_dataset_dict`). For example:

In [15]:
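# Keyword arguments for xarray's open_dataset
xarray_open_kwargs = dict(
    chunks={"xt_ocean": -1, "yt_ocean": -1}
)
# Keyword arguments for xarray's combine_by_coords
xarray_combine_by_coords_kwargs = dict(
    compat="override",
    data_vars="minimal",
    coords="minimal"
)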

data_ic_kw = catalog[experiment].search(
    variable=variable, 
    frequency="1mon"
).to_dask(
    xarray_open_kwargs=xarray_open_kwargs,
    xarray_combine_by_coords_kwargs=xarray_combine_by_coords_kwargs,
)

## 5. Catalog does not allow search by start and end date

It's not possible to query on a time range with the Intake catalog.

We can always slice the time axis afterwards though. That is, with the catalog you'd just do:

In [17]:
data_ic = catalog[experiment].search(
    variable=variable, 
    frequency="1mon"
).to_dask()

start_time = "2000-01-01"
end_time = "2180-01-01"

data_ic_times = data_ic.sel(time=slice(start_time, end_time))


Opening every file and then slicing takes a few seconds longer than opening only the files within the requested time range would.

This overhead is acceptable because datasets are opened [lazily](https://docs.xarray.dev/en/stable/user-guide/dask.html#parallel-computing-with-dask) and in parallel, so opening all files and reducing the time range with xarray's `sel` method doesn't add much. In most cases where the overhead of opening the files seems large, it can be reduced through sensible choices of keyword arguments provided to `open_dataset` and `combine_by_coords`; see the xarray documentation on [Reading multi-file datasets](https://docs.xarray.dev/en/stable/user-guide/io.html#reading-multi-file-datasets) for details.
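
The slicing step itself can be tried without access to the catalog. The sketch below builds a small synthetic monthly Dataset (the values and the 24-month time axis are made up, and it uses ordinary pandas timestamps rather than the time objects in the real output) and selects a time range by label with `sel`:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a monthly dataset: 24 month-start time stamps.
time = pd.date_range("2000-01-01", periods=24, freq="MS")
ds = xr.Dataset(
    {"surface_salt": ("time", np.arange(24, dtype="float32"))},
    coords={"time": time},
)

# Label-based slicing is inclusive of both end points.
subset = ds.sel(time=slice("2000-06-01", "2001-05-31"))
print(subset.sizes["time"])
```

The selection keeps the twelve months from June 2000 to May 2001. On a dask-backed Dataset the same call only trims the task graph, so no data is read until a result is computed.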

## 6. Applying a preprocessing function

You can pass a `preprocess` function, as in `xarray`, to apply a function to each dataset before `intake` concatenates them. In some cases, this can make loading into memory faster.

For example:

In [18]:
def select_region(ds):
    ds = ds.sel(xt_ocean=slice(-230, -180), yt_ocean=slice(-50, -20))
    return ds

data_ic = catalog['01deg_jra55v13_ryf9091'].search(
    variable='surface_temp', 
    frequency="1mon"
).to_dask(preprocess=select_region)
data_ic


,Array,Chunk
Bytes,2.35 GiB,571.88 kiB
Shape,"(3360, 375, 500)","(1, 366, 400)"
Dask graph,13440 chunks in 3349 graph layers,13440 chunks in 3349 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray
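
To see why preprocessing helps, the sketch below mimics the pattern with small synthetic datasets standing in for individual output files: `select_region` is applied to each one before they are concatenated along `time`, so only the selected region is ever combined. The `make_file` helper, the coarse grid values and the use of `xr.concat` in place of the catalog's own combine step are assumptions for illustration.

```python
import numpy as np
import xarray as xr


def make_file(t0):
    """Build a tiny synthetic stand-in for one monthly output file."""
    return xr.Dataset(
        {
            "surface_temp": (
                ("time", "yt_ocean", "xt_ocean"),
                np.zeros((1, 6, 8), dtype="float32"),
            )
        },
        coords={
            "time": [t0],
            "yt_ocean": np.linspace(-60, -10, 6),    # -60, -50, ..., -10
            "xt_ocean": np.linspace(-250, -180, 8),  # -250, -240, ..., -180
        },
    )


def select_region(ds):
    """Same selection as in the cell above."""
    return ds.sel(xt_ocean=slice(-230, -180), yt_ocean=slice(-50, -20))


files = [make_file(t) for t in range(3)]

# Apply the preprocessing to every "file" first, then concatenate the results.
combined = xr.concat([select_region(ds) for ds in files], dim="time")
print(dict(combined.sizes))
```

Each synthetic file shrinks from 6 x 8 to 4 x 6 grid points before concatenation, which is the same effect that reduces the real dataset above to a few GiB.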


# Tips, gotchas and workarounds

## 1. Speeding up opening your datasets

Try passing the following argument to your `to_dask` or `to_dataset_dict` call:

```python
xarray_combine_by_coords_kwargs=dict(
    compat="override",
    data_vars="minimal",
    coords="minimal"
)
```

See the xarray documentation on [Reading multi-file datasets](https://docs.xarray.dev/en/stable/user-guide/io.html#reading-multi-file-datasets) for more details about these arguments.

## 2. Choosing chunksizes

Correctly choosing chunk sizes when you open datasets will greatly improve the speed of your analysis. Check out the [Chunking tutorial](https://access-nri-intake-catalog.readthedocs.io/en/latest/usage/chunking.html) in the ACCESS-NRI Intake catalog documentation.

## 3. Loading time-invariant variables

Many COSIMA experiments include multiple repeated files containing the same fixed-frequency data (e.g. grid information). Use the option `fx` for the frequency argument when searching for these variables; otherwise the catalog fails to concatenate the files, since they don't contain a clear dimension to concatenate along.

In [20]:
data_ic_fixed = catalog[experiment].search(
    variable='area_t',
    frequency='fx'
).to_dask()
data_ic_fixed


Each of the three arrays in the output has the same layout:

,Array,Chunk
Bytes,37.08 MiB,2.32 MiB
Shape,"(2700, 3600)","(675, 900)"
Dask graph,16 chunks in 2 graph layers,16 chunks in 2 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray


## 4. Determining what can be searched upon in an experiment

You can see what can be `search`ed on within an experiment with:

In [21]:
catalog[experiment].df.columns.tolist()

['filename',
 'path',
 'file_id',
 'frequency',
 'start_date',
 'end_date',
 'variable',
 'variable_long_name',
 'variable_standard_name',
 'variable_cell_methods',
 'variable_units',
 'realm']

It can also be helpful to look at the `catalog[experiment].df` object itself, which is a dataframe of all of the files in the experiment and their metadata:

In [22]:
catalog[experiment].df.head()

,filename,path,file_id,frequency,start_date,end_date,variable,variable_long_name,variable_standard_name,variable_cell_methods,variable_units,realm
0,iceh.1900-01.nc,/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v13_ryf9091/output000/ice/OUTPUT/iceh.1900-01.nc,seaIce.1mon.d2:2.nc:5.ni:3600.nj:2700,1mon,"1900-01-01, 00:00:00","1900-02-01, 00:00:00","(ANGLE, ANGLET, HTE, HTN, NCAT, TLAT, TLON, Tsfc_m, ULAT, ULON, aice_m, aicen_m, alidf_ai_m, alidr_ai_m, alvdf_ai_m, alvdr_ai_m, blkmask, congel_m, divu_m, dxt, dxu, dyt, dyu, flatn_ai_m, fmeltt_a...","(angle grid makes with latitude line on U grid, angle grid makes with latitude line on T grid, T cell width on East side, T cell width on North side, category maximum thickness, T grid center lati...","(, , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , )","(, , , , , , , time: mean, , , time: mean, time: mean, time: mean, time: mean, time: mean, time: mean, , time: mean, time: mean, , , , , time: mean, time: mean, time: mean, time: mean, time: mean,...","(radians, radians, m, m, m, degrees_north, degrees_east, C, degrees_north, degrees_east, 1, 1, %, %, %, %, , cm/day, %/day, m, m, m, m, W/m^2, W/m^2, W/m^2, cm/day, day of year, kg/m^2/s, kg/m^2/s...",seaIce
1,iceh.1900-02.nc,/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v13_ryf9091/output000/ice/OUTPUT/iceh.1900-02.nc,seaIce.1mon.d2:2.nc:5.ni:3600.nj:2700,1mon,"1900-02-01, 00:00:00","1900-03-01, 00:00:00","(ANGLE, ANGLET, HTE, HTN, NCAT, TLAT, TLON, Tsfc_m, ULAT, ULON, aice_m, aicen_m, alidf_ai_m, alidr_ai_m, alvdf_ai_m, alvdr_ai_m, blkmask, congel_m, divu_m, dxt, dxu, dyt, dyu, flatn_ai_m, fmeltt_a...","(angle grid makes with latitude line on U grid, angle grid makes with latitude line on T grid, T cell width on East side, T cell width on North side, category maximum thickness, T grid center lati...","(, , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , )","(, , , , , , , time: mean, , , time: mean, time: mean, time: mean, time: mean, time: mean, time: mean, , time: mean, time: mean, , , , , time: mean, time: mean, time: mean, time: mean, time: mean,...","(radians, radians, m, m, m, degrees_north, degrees_east, C, degrees_north, degrees_east, 1, 1, %, %, %, %, , cm/day, %/day, m, m, m, m, W/m^2, W/m^2, W/m^2, cm/day, day of year, kg/m^2/s, kg/m^2/s...",seaIce
2,iceh.1900-03.nc,/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v13_ryf9091/output000/ice/OUTPUT/iceh.1900-03.nc,seaIce.1mon.d2:2.nc:5.ni:3600.nj:2700,1mon,"1900-03-01, 00:00:00","1900-04-01, 00:00:00","(ANGLE, ANGLET, HTE, HTN, NCAT, TLAT, TLON, Tsfc_m, ULAT, ULON, aice_m, aicen_m, alidf_ai_m, alidr_ai_m, alvdf_ai_m, alvdr_ai_m, blkmask, congel_m, divu_m, dxt, dxu, dyt, dyu, flatn_ai_m, fmeltt_a...","(angle grid makes with latitude line on U grid, angle grid makes with latitude line on T grid, T cell width on East side, T cell width on North side, category maximum thickness, T grid center lati...","(, , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , )","(, , , , , , , time: mean, , , time: mean, time: mean, time: mean, time: mean, time: mean, time: mean, , time: mean, time: mean, , , , , time: mean, time: mean, time: mean, time: mean, time: mean,...","(radians, radians, m, m, m, degrees_north, degrees_east, C, degrees_north, degrees_east, 1, 1, %, %, %, %, , cm/day, %/day, m, m, m, m, W/m^2, W/m^2, W/m^2, cm/day, day of year, kg/m^2/s, kg/m^2/s...",seaIce
3,ocean.nc,/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v13_ryf9091/output000/ocean/ocean.nc,ocean.3mon.grid_xt_ocean:3600.grid_xu_ocean:3600.grid_yt_ocean:2700.grid_yu_ocean:2700.neutral:80.neutralrho_edges:81.nv:2.potrho:80.potrho_edges:81.st_edges_ocean:76.st_ocean:75.sw_edges_ocean:76...,3mon,"1900-01-01, 00:00:00","1900-04-01, 00:00:00","(age_global, average_DT, average_T1, average_T2, dzt, grid_xt_ocean, grid_xu_ocean, grid_yt_ocean, grid_yu_ocean, neutral, neutralrho_edges, nv, pot_rho_0, pot_temp, potrho, potrho_edges, rho, sal...","(Age (global), Length of average period, Start time for average period, End time for average period, t-cell thickness, tcell longitude, ucell longitude, tcell latitude, ucell latitude, neutral den...","(sea_water_age_since_surface_contact, , , , cell_thickness, , , , , , , , sea_water_potential_density, sea_water_potential_temperature, , , , sea_water_salinity, , , , , , , , , , ocean_mass_x_tra...","(time: mean, , , , time: mean, , , , , , , , time: mean, time: mean, , , time: mean, time: mean, , , , , time: mean, time: mean, time: mean, , , time: mean, time: mean, time: mean, time: mean, tim...","(yr, days, days since 1900-01-01 00:00:00, days since 1900-01-01 00:00:00, m, degrees_E, degrees_E, degrees_N, degrees_N, kg/m^3, kg/m^3, none, kg/m^3, degrees K, kg/m^3, kg/m^3, kg/m^3, psu, mete...",ocean
4,ocean_grid.nc,/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v13_ryf9091/output000/ocean/ocean_grid.nc,ocean.fx.xt_ocean:3600.xu_ocean:3600.yt_ocean:2700.yu_ocean:2700,fx,"1900-04-01, 00:00:00","1900-04-01, 00:00:00","(area_t, area_u, drag_coeff, dxt, dxu, dyt, dyu, geolat_c, geolat_t, geolon_c, geolon_t, ht, hu, kmt, kmu, time, xt_ocean, xu_ocean, yt_ocean, yu_ocean)","(tracer cell area, velocity cell area, Dimensionless bottom drag coefficient, ocean dxt on t-cells, ocean dxu on u-cells, ocean dyt on t-cells, ocean dyu on u-cells, uv latitude, tracer latitude, ...","(, , , , , , , , , , , sea_floor_depth_below_geoid, , , , , , , , )","(time: point, time: point, time: point, time: point, time: point, time: point, time: point, time: point, time: point, time: point, time: point, time: point, time: point, time: point, time: point, ...","(m^2, m^2, dimensionless, m, m, m, m, degrees_N, degrees_N, degrees_E, degrees_E, m, m, dimensionless, dimensionless, days since 1900-01-01 00:00:00, degrees_E, degrees_E, degrees_N, degrees_N)",ocean


## 5. Finding all variables in an experiment

You can get a list of all available variable names from an experiment with:


In [23]:
variables = catalog.search(name=experiment).unique().variable
print(variables)

['HTN', 'st_ocean', 'nv', 'mlt_onset_m', 'mld', 'grid_yu_ocean', 'total_ocean_lw_heat', 'grid_xt_ocean', 'dzt', 'eta_t', 'pme_net', 'kmt', 'sens_heat', 'potrho', 'total_ocean_swflx_vis', 'xu_ocean_sub01', 'total_ocean_melt', 'sfc_hflux_from_runoff', 'salt_global_ave', 'TLAT', 'total_ocean_fprec_melt_heat', 'alvdf_ai_m', 'average_T1', 'evap', 'yu_ocean_sub02', 'pbot_t', 'bih_fric_v', 'neutralrho_edges', 'yt_ocean_sub01', 'HTE', 'frazil_3d_int_z', 'fmeltt_ai_m', 'fprec_melt_heat', 'blkmask', 'vocn_m', 'temp_submeso', 'shear_m', 'alidf_ai_m', 'aicen_m', 'temp_surface_ave', 'tx_trans', 'eta_global', 'yu_ocean_sub01', 'total_ocean_hflux_prec', 'vhrho_nt', 'tx_trans_submeso', 'xu_ocean', 'salt_surface_ave', 'usurf', 'net_sfc_heating', 'total_ocean_hflux_evap', 'ULON', 'geolon_t', 'fswup_m', 'surface_salt', 'ke_tot', 'rho', 'river', 'ANGLE', 'temp_xflux_adv', 'vvel_m', 'pot_rho_1', 'sw_heat', 'yt_ocean', 'potrho_edges', 'total_ocean_runoff_heat', 'tx_trans_rho', 'xt_ocean', 'sfc_salt_flux_cou

### We could similarly filter for any of the keys in our catalog - see the intake dataframe catalog below


In [24]:
catalog.search(name=experiment)

Unnamed: 0_level_0,model,description,realm,frequency,variable
name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
01deg_jra55v13_ryf9091,{ACCESS-OM2-01},{0.1 degree ACCESS-OM2 global model configuration with JRA55-do v1.3 RYF9091 repeat year forcing (May 1990 to Apr 1991)},"{ocean, seaIce}","{3mon, 3hr, 1day, fx, 1mon}","{HTN, st_ocean, nv, mlt_onset_m, mld, grid_yu_ocean, total_ocean_lw_heat, grid_xt_ocean, dzt, eta_t, pme_net, kmt, sens_heat, potrho, total_ocean_swflx_vis, xu_ocean_sub01, total_ocean_melt, sfc_h..."


In [25]:
catalog.search(name=experiment, realm='ocean')

Unnamed: 0_level_0,model,description,realm,frequency,variable
name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
01deg_jra55v13_ryf9091,{ACCESS-OM2-01},{0.1 degree ACCESS-OM2 global model configuration with JRA55-do v1.3 RYF9091 repeat year forcing (May 1990 to Apr 1991)},{ocean},"{3mon, 3hr, 1day, fx, 1mon}","{st_ocean, nv, mld, grid_yu_ocean, total_ocean_lw_heat, grid_xt_ocean, dzt, eta_t, pme_net, kmt, sens_heat, potrho, total_ocean_swflx_vis, xu_ocean_sub01, total_ocean_melt, sfc_hflux_from_runoff, ..."


In [26]:
# Let's pull out all the unique frequencies, just like we did for variable above
catalog.search(name=experiment, realm='ocean').unique().frequency

['3mon', '3hr', '1day', 'fx', '1mon']

We could also open the experiment (using square brackets) to search for variables, frequencies, etc. - but this opens the datastore: see how the output of the cell below is displayed differently.

If we open the datastore: 
1. It is slower - opening datastores requires extra work
2. The items we can search on might change - the datastore below contains no model field, for example.

The opened datastore can contain extra information, e.g. `variable_long_name` below, so sometimes you might want to open it and search the datastore directly. In general, though, try `catalog.search(name='xyz', ...)` before you use `catalog['xyz'].search(...)`.

In [27]:
catalog[experiment]

Unnamed: 0,unique
filename,3469
path,11947
file_id,22
frequency,5
start_date,3361
end_date,3360
variable,205
variable_long_name,197
variable_standard_name,36
variable_cell_methods,3


For more information about the available variables, you can use the following function - just copy and paste it in where you need it:

In [28]:
from intake_esm.utils import MinimalExploder

def get_detailed_variable_info(intake_catalog, experiment_name: str, variable: str | None = None) -> "pd.DataFrame":
    """
    Get detailed information about all the variables available in an experiment contained within the catalog.

    If a specific variable is passed, then the returned dataframe will be filtered to include only information
    about that variable

    Returns a pandas dataframe, reorganised to use the variable as the index.

    Parameters:
    -----------
    intake_catalog: 
        The variable holding the intake catalog. If you have opened the catalog using
        `cat = intake.cat.access_nri`, then `intake_catalog=cat`, etc.
    experiment_name: str
        The name of the experiment you are interested in, e.g. `experiment_name="01deg_jra55v13_ryf9091"`
    variable: str | None
        If you want detailed information about just a single variable, then pass it here. For 
        example, if you only want information about potential temperature, pass `variable='pot_temp'`
    """


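    # Open the datastore for this experiment (slower than searching the catalog)
    expt_ds = intake_catalog[experiment_name]
    # Expand the tuple-valued columns so that each variable gets its own row
    df = MinimalExploder(expt_ds.esmcat.pl_df)()

    # Keep a single row per variable, sorted by variable name
    df = df.unique('variable').sort('variable')

    df = df.to_pandas().set_index("variable")

    # Restrict the result to the requested variable, if one was given
    if variable is not None:
        df = df[df.index == variable]

    return df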
    

To get detailed info about all variables:

In [29]:
df = get_detailed_variable_info(catalog, experiment)
df.head(10)

Unnamed: 0_level_0,filename,path,file_id,frequency,start_date,end_date,variable_long_name,variable_standard_name,variable_cell_methods,variable_units,realm
variable,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
ANGLE,iceh.1900-01.nc,/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v13_ryf9091/output000/ice/OUTPUT/iceh.1900-01.nc,seaIce.1mon.d2:2.nc:5.ni:3600.nj:2700,1mon,"1900-01-01, 00:00:00","1900-02-01, 00:00:00",angle grid makes with latitude line on U grid,,,radians,seaIce
ANGLET,iceh.1900-01.nc,/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v13_ryf9091/output000/ice/OUTPUT/iceh.1900-01.nc,seaIce.1mon.d2:2.nc:5.ni:3600.nj:2700,1mon,"1900-01-01, 00:00:00","1900-02-01, 00:00:00",angle grid makes with latitude line on T grid,,,radians,seaIce
HTE,iceh.1900-01.nc,/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v13_ryf9091/output000/ice/OUTPUT/iceh.1900-01.nc,seaIce.1mon.d2:2.nc:5.ni:3600.nj:2700,1mon,"1900-01-01, 00:00:00","1900-02-01, 00:00:00",T cell width on East side,,,m,seaIce
HTN,iceh.1900-01.nc,/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v13_ryf9091/output000/ice/OUTPUT/iceh.1900-01.nc,seaIce.1mon.d2:2.nc:5.ni:3600.nj:2700,1mon,"1900-01-01, 00:00:00","1900-02-01, 00:00:00",T cell width on North side,,,m,seaIce
NCAT,iceh.1900-01.nc,/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v13_ryf9091/output000/ice/OUTPUT/iceh.1900-01.nc,seaIce.1mon.d2:2.nc:5.ni:3600.nj:2700,1mon,"1900-01-01, 00:00:00","1900-02-01, 00:00:00",category maximum thickness,,,m,seaIce
TLAT,iceh.1900-01.nc,/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v13_ryf9091/output000/ice/OUTPUT/iceh.1900-01.nc,seaIce.1mon.d2:2.nc:5.ni:3600.nj:2700,1mon,"1900-01-01, 00:00:00","1900-02-01, 00:00:00",T grid center latitude,,,degrees_north,seaIce
TLON,iceh.1900-01.nc,/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v13_ryf9091/output000/ice/OUTPUT/iceh.1900-01.nc,seaIce.1mon.d2:2.nc:5.ni:3600.nj:2700,1mon,"1900-01-01, 00:00:00","1900-02-01, 00:00:00",T grid center longitude,,,degrees_east,seaIce
Tsfc_m,iceh.1900-01.nc,/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v13_ryf9091/output000/ice/OUTPUT/iceh.1900-01.nc,seaIce.1mon.d2:2.nc:5.ni:3600.nj:2700,1mon,"1900-01-01, 00:00:00","1900-02-01, 00:00:00",snow/ice surface temperature,,time: mean,C,seaIce
ULAT,iceh.1900-01.nc,/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v13_ryf9091/output000/ice/OUTPUT/iceh.1900-01.nc,seaIce.1mon.d2:2.nc:5.ni:3600.nj:2700,1mon,"1900-01-01, 00:00:00","1900-02-01, 00:00:00",U grid center latitude,,,degrees_north,seaIce
ULON,iceh.1900-01.nc,/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v13_ryf9091/output000/ice/OUTPUT/iceh.1900-01.nc,seaIce.1mon.d2:2.nc:5.ni:3600.nj:2700,1mon,"1900-01-01, 00:00:00","1900-02-01, 00:00:00",U grid center longitude,,,degrees_east,seaIce
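
The table above has one row per variable, whereas the opened datastore holds one row per file, with tuple-valued columns such as `variable` and `variable_units`. The reshaping between the two can be sketched in plain Python. This is only an illustration under stated assumptions: the rows are a small hand-made subset modelled on the entries shown earlier, `explode_by_variable` is a hypothetical helper, and it is not how `MinimalExploder` is implemented.

```python
# Toy rows modelled on the datastore: one row per file, tuple-valued columns.
# Hand-made subset for illustration only, not read from the catalog.
rows = [
    {
        "filename": "iceh.1900-01.nc",
        "frequency": "1mon",
        "variable": ("TLAT", "ANGLE", "Tsfc_m"),
        "variable_units": ("degrees_north", "radians", "C"),
    },
    {
        "filename": "iceh.1900-02.nc",
        "frequency": "1mon",
        "variable": ("TLAT", "ANGLE", "Tsfc_m"),
        "variable_units": ("degrees_north", "radians", "C"),
    },
    {
        "filename": "ocean_grid.nc",
        "frequency": "fx",
        "variable": ("area_t", "kmt"),
        "variable_units": ("m^2", "dimensionless"),
    },
]


def explode_by_variable(rows):
    """Hypothetical helper: return {variable: info}, one entry per variable."""
    info = {}
    for row in rows:
        # Pair each variable name with the units stored at the same position
        for name, units in zip(row["variable"], row["variable_units"]):
            # setdefault keeps the first file seen for each variable
            info.setdefault(
                name,
                {
                    "filename": row["filename"],
                    "frequency": row["frequency"],
                    "variable_units": units,
                },
            )
    # Sort by variable name (case-sensitive, so upper case sorts first)
    return dict(sorted(info.items()))


variable_info = explode_by_variable(rows)
for name, details in variable_info.items():
    print(name, details)
```

Each variable ends up with a single row describing one file that contains it, which is why the `filename`, `start_date` and `end_date` columns in the table above refer to one representative file rather than the full time range of the experiment.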


Say we are only interested in zonal wind stress, `tau_x`:

In [30]:
df = get_detailed_variable_info(catalog, experiment, 'tau_x')
df.head(10)



If you have any further questions after reading this notebook and the documentation linked from it, please open an issue in the [ACCESS-NRI Intake catalog GitHub repo](https://github.com/ACCESS-NRI/access-nri-intake-catalog) or open a topic on the [ACCESS-Hive forum](https://forum.access-hive.org.au/).

In [None]:
client.close()