# [NCI fs38 catalog] what does `file_type = f` vs `l` mean and which is the correct one to use?

#### https://github.com/Thomas-Moore-Creative/ACDtools/issues/2

#### Date: 28 October, 2024

Author = {"name": "Thomas Moore", "affiliation": "CSIRO", "email": "thomas.moore@csiro.au", "orcid": "0000-0003-3930-1946"}

### filter warnings

In [1]:
import warnings
warnings.filterwarnings("ignore") # Suppress warnings

# Dask cluster

In [2]:
from dask.distributed import Client, LocalCluster
dask_settings = {'threads_per_worker': 1} # threads per worker set to deal with https://forum.access-hive.org.au/t/netcdf-not-a-valid-id-errors/389
# Start the Dask cluster with the settings by unpacking the dictionary using **
cluster = LocalCluster(**dask_settings) 
# Connect a client to the cluster
client = Client(cluster)
# Show some basic information about the cluster
print(f"Cluster started with {len(cluster.workers)} workers.")
print(f"Dashboard available at: {cluster.dashboard_link}")

Cluster started with 28 workers.
Dashboard available at: /proxy/8787/status


# Issue: [NCI fs38 catalog] what does `file_type = f` vs `l` mean and which is the correct one to use?
- https://github.com/Thomas-Moore-Creative/ACDtools/issues/2

##### Information on climate data catalogs across Australian HPC

**ACCESS-NRI** https://access-nri-intake-catalog.readthedocs.io/en/latest/usage/how.html <br>
**NCI** https://opus.nci.org.au/pages/viewpage.action?pageId=213713098


## import packages

In [3]:
import intake
import xarray as xr
import numpy as np

## get catalog for `fs38` and apply search dictionaries for `piControl` & `intpp` examples

In [4]:
nri_catalog = intake.cat.access_nri

In [5]:
cmip6_fs38_datastore = nri_catalog.search(name='cmip6_fs38').to_source()
search_dict = {'experiment_id': ['piControl'],
      'source_id': 'ACCESS-ESM1-5',
      'variable_id': ['intpp'],
      'frequency': 'mon',
      'file_type': 'f'}
piControl_intpp_catalog_f = cmip6_fs38_datastore.search(**search_dict)
piControl_intpp_catalog_f

Unnamed: 0,unique
path,3
file_type,1
realm,1
frequency,1
table_id,1
project_id,1
institution_id,1
source_id,1
experiment_id,1
member_id,1


In [6]:
piControl_intpp_catalog_f.unique().path

['/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/files/d20191112/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_010101-060012.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/files/d20210316/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_100101-110012.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/files/d20191214/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_060101-100012.nc']

## compare to ls of `/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/latest` 
```
(base) tm4888@gadi-login-03 /g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/latest ls -ltrh
total 38K
lrwxrwxrwx 1 fo3_esgfpub fs38 82 Apr 26  2021 intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_060101-100012.nc -> ../files/d20191214/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_060101-100012.nc
lrwxrwxrwx 1 fo3_esgfpub fs38 82 Apr 26  2021 intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_010101-060012.nc -> ../files/d20191112/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_010101-060012.nc
lrwxrwxrwx 1 fo3_esgfpub fs38 82 Apr 26  2021 intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_100101-110012.nc -> ../files/d20210316/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_100101-110012.nc
```
### In this case `'file_type': 'f'` yields the correct result
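
The `ls -ltrh` check above can also be done from Python. The following is a minimal sketch, assuming a POSIX filesystem. It builds a throwaway directory that mimics the `files/` and `latest/` layout (the file name `example.nc` and the helper `list_link_targets` are hypothetical, not real catalog entries) and prints each link with its target.

```python
import os
import tempfile
from pathlib import Path

def list_link_targets(directory):
    """Map each symlink name in `directory` to the target it points at."""
    targets = {}
    for entry in sorted(Path(directory).iterdir()):
        if entry.is_symlink():
            targets[entry.name] = os.readlink(entry)
    return targets

# Hypothetical stand-in for .../gn/files/d<date>/ and .../gn/latest/
root = Path(tempfile.mkdtemp())
(root / "files" / "d20191112").mkdir(parents=True)
(root / "latest").mkdir()
(root / "files" / "d20191112" / "example.nc").touch()
os.symlink("../files/d20191112/example.nc", root / "latest" / "example.nc")

link_targets = list_link_targets(root / "latest")
for name, target in link_targets.items():
    print(f"{name} -> {target}")
```

On Gadi the same helper could be pointed at the real `latest` directory instead of the temporary one.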

In [7]:
cmip6_fs38_datastore = nri_catalog.search(name='cmip6_fs38').to_source()
search_dict = {'experiment_id': ['piControl'],
      'source_id': 'ACCESS-ESM1-5',
      'variable_id': ['intpp'],
      'frequency': 'mon',
      'file_type': 'l'}
piControl_intpp_catalog_l = cmip6_fs38_datastore.search(**search_dict)
piControl_intpp_catalog_l

Unnamed: 0,unique
path,6
file_type,1
realm,1
frequency,1
table_id,1
project_id,1
institution_id,1
source_id,1
experiment_id,1
member_id,1


In [8]:
piControl_intpp_catalog_l.unique().path

['/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/v20210316/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_100101-110012.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/v20191214/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_060101-100012.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/v20210316/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_060101-100012.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/v20191214/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_010101-060012.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/v20191112/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_010101-060012.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/v20210316/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_010101-0600

## In this case `l` yields multiple file versions for the `010101-060012` and `060101-100012` timeframes
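
One way to see the duplication is to group the returned paths by file name and collect the version directory of each. This is a minimal sketch using shortened stand-ins for the paths above; `group_versions` and `latest_only` are illustrative helper names, not part of intake. It assumes the version directory is always the immediate parent of the file and that version names sort correctly as strings.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Shortened, illustrative stand-ins for the `l` paths listed above
paths = [
    "gn/v20210316/intpp_100101-110012.nc",
    "gn/v20191214/intpp_060101-100012.nc",
    "gn/v20210316/intpp_060101-100012.nc",
    "gn/v20191214/intpp_010101-060012.nc",
    "gn/v20191112/intpp_010101-060012.nc",
    "gn/v20210316/intpp_010101-060012.nc",
]

def group_versions(paths):
    """Map each file name to the sorted list of version directories holding it."""
    versions = defaultdict(list)
    for p in map(PurePosixPath, paths):
        versions[p.name].append(p.parent.name)
    return {name: sorted(found) for name, found in versions.items()}

def latest_only(paths):
    """Keep one path per file name: the one in the highest-sorting version directory."""
    best = {}
    for p in map(PurePosixPath, paths):
        if p.name not in best or p.parent.name > best[p.name].parent.name:
            best[p.name] = p
    return sorted(str(p) for p in best.values())

for name, found in group_versions(paths).items():
    print(name, found)
print(latest_only(paths))
```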

# Try `pr` variable in `historical`

In [9]:
cmip6_fs38_datastore = nri_catalog.search(name='cmip6_fs38').to_source()
search_dict = {'experiment_id': ['historical'],
      'source_id': 'ACCESS-ESM1-5',
      'variable_id': ['pr'],
      'frequency': 'mon',
      'file_type': 'l'}
historical_pr_catalog_l = cmip6_fs38_datastore.search(**search_dict)
historical_pr_catalog_l

Unnamed: 0,unique
path,40
file_type,1
realm,1
frequency,1
table_id,1
project_id,1
institution_id,1
source_id,1
experiment_id,1
member_id,40


# In the case above (`pr`, `historical`, `l`), 40 paths are returned, one per member

In [10]:
cmip6_fs38_datastore = nri_catalog.search(name='cmip6_fs38').to_source()
search_dict = {'experiment_id': ['historical'],
      'source_id': 'ACCESS-ESM1-5',
      'variable_id': ['pr'],
      'frequency': 'mon',
      'file_type': 'f'}
historical_pr_catalog_f = cmip6_fs38_datastore.search(**search_dict)
historical_pr_catalog_f

Unnamed: 0,unique
path,40
file_type,1
realm,1
frequency,1
table_id,1
project_id,1
institution_id,1
source_id,1
experiment_id,1
member_id,40


# In the case above (`pr`, `historical`, `f`), 40 paths are returned, one per member

# Are these two path lists the same?

In [11]:
path_list_f = historical_pr_catalog_f.unique().path
path_list_l = historical_pr_catalog_l.unique().path

In [12]:
if path_list_l == path_list_f:
    print("The path lists are exactly equal")
else:
    print("The path lists are not equal")

The path lists are not equal


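### Is this just a sorting issue?

In [13]:
# list.sort() sorts in place and returns None, so `a.sort() == b.sort()` is always True.
# Compare sorted copies instead. The output below was recorded with the `.sort()`
# comparison, so it needs re-running before it can be taken as confirmation.
if sorted(path_list_l) == sorted(path_list_f):
    print("The path lists are exactly equal")
else:
    print("The path lists are not equal")

The path lists are exactly equal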
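
A note on order-insensitive comparison: `list.sort()` sorts in place and returns `None`, so an expression of the form `a.sort() == b.sort()` compares `None` with `None` and is true for any two lists. Below is a minimal sketch of an alternative that also reports what differs, using made-up paths; `compare_path_lists` is a hypothetical helper.

```python
def compare_path_lists(a, b):
    """Return (equal, only_in_a, only_in_b), ignoring order and duplicates."""
    set_a, set_b = set(a), set(b)
    return set_a == set_b, sorted(set_a - set_b), sorted(set_b - set_a)

# Made-up paths for illustration only
list_a = ["/x/files/d1/a.nc", "/x/files/d2/b.nc"]
list_b = ["/x/files/d2/b.nc", "/x/files/d1/a.nc"]   # same entries, different order
list_c = ["/x/v1/a.nc", "/x/v2/b.nc"]               # different entries

# .sort() returns None, so this "comparison" cannot fail
always_true = list_a.copy().sort() == list_c.copy().sort()
print(always_true)

# sorted() returns new lists, so this compares the actual contents
print(sorted(list_a) == sorted(list_b))
print(sorted(list_a) == sorted(list_c))

equal, only_a, only_c = compare_path_lists(list_a, list_c)
print(equal, only_a, only_c)
```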


# CLEX example
https://github.com/coecms/nci-intake-catalogue/blob/main/docs/intake_cmip6_demo.ipynb

In [14]:
cat = intake.cat.nci
list(cat)

['era5', 'era5_land', 'ecmwf', 'esgf', 'cosima', 'erai']

In [15]:
cmip6 = cat['esgf'].cmip6
cmip6.df.head()

Unnamed: 0,project,activity_id,institution_id,source_id,experiment_id,member_id,table_id,variable_id,grid_label,date_range,path,version
0,CMIP6,AerChemMIP,BCC,BCC-ESM1,histSST,r1i1p1f1,AERmon,o3,gn,185001-201412,/g/data/oi10/replicas/CMIP6/AerChemMIP/BCC/BCC-ESM1/histSST/r1i1p1f1/AERmon/o3/gn/v20190718/o3_AERmon_BCC-ESM1_histSST_r1i1p1f1_gn_185001-201412.nc,v20190718
1,CMIP6,AerChemMIP,BCC,BCC-ESM1,ssp370,r1i1p1f1,AERmon,ch4,gn,201501-205512,/g/data/oi10/replicas/CMIP6/AerChemMIP/BCC/BCC-ESM1/ssp370/r1i1p1f1/AERmon/ch4/gn/v20190718/ch4_AERmon_BCC-ESM1_ssp370_r1i1p1f1_gn_201501-205512.nc,v20190718
2,CMIP6,AerChemMIP,BCC,BCC-ESM1,ssp370,r1i1p1f1,AERmon,lossch4,gn,201501-205512,/g/data/oi10/replicas/CMIP6/AerChemMIP/BCC/BCC-ESM1/ssp370/r1i1p1f1/AERmon/lossch4/gn/v20190718/lossch4_AERmon_BCC-ESM1_ssp370_r1i1p1f1_gn_201501-205512.nc,v20190718
3,CMIP6,AerChemMIP,BCC,BCC-ESM1,ssp370,r1i1p1f1,AERmon,oh,gn,201501-205512,/g/data/oi10/replicas/CMIP6/AerChemMIP/BCC/BCC-ESM1/ssp370/r1i1p1f1/AERmon/oh/gn/v20190718/oh_AERmon_BCC-ESM1_ssp370_r1i1p1f1_gn_201501-205512.nc,v20190718
4,CMIP6,AerChemMIP,BCC,BCC-ESM1,ssp370,r1i1p1f1,Amon,fco2nat,gn,201501-205512,/g/data/oi10/replicas/CMIP6/AerChemMIP/BCC/BCC-ESM1/ssp370/r1i1p1f1/Amon/fco2nat/gn/v20190624/fco2nat_Amon_BCC-ESM1_ssp370_r1i1p1f1_gn_201501-205512.nc,v20190624


In [16]:
cmip6.description

"CMIP6 (Latest Versions)\n\nDatasets on Gadi, both publised and replicated. Only the latest available file versions are in the listing, see catalogue 'cmip6_all' for all available versions\n\nCatalogue columns match those used by ESGF search (esgf.nci.org.au). intake-esm dict keys are in the form '{esgf instance_id}'.\n\nProject: oi10, fs38\nMaintained By: NCI\nContact: help@nci.org.au\nReferences:\n    - https://pcmdi.llnl.gov/CMIP6/\n"

In [17]:
search_dict = {'experiment_id': ['piControl'],
      'source_id': 'ACCESS-ESM1-5',
      'variable_id': ['intpp']}
piControl_intpp_catalog_clex = cmip6.search(**search_dict)
piControl_intpp_catalog_clex

Unnamed: 0,unique
project,1
activity_id,1
institution_id,1
source_id,1
experiment_id,1
member_id,1
table_id,1
variable_id,1
grid_label,1
date_range,3


In [18]:
piControl_intpp_catalog_clex.unique()['path']

['/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/v20210316/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_010101-060012.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/v20210316/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_060101-100012.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/v20210316/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_100101-110012.nc']

In [19]:
search_dict = {'experiment_id': ['historical'],
      'source_id': 'ACCESS-ESM1-5',
      'variable_id': ['intpp']}
historical_intpp_catalog_clex = cmip6.search(**search_dict)
historical_intpp_catalog_clex

Unnamed: 0,unique
project,1
activity_id,1
institution_id,1
source_id,1
experiment_id,1
member_id,40
table_id,1
variable_id,1
grid_label,1
date_range,1


In [20]:
historical_intpp_catalog_clex.unique()['path']

['/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/Omon/intpp/gn/v20200605/intpp_Omon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r11i1p1f1/Omon/intpp/gn/v20200803/intpp_Omon_ACCESS-ESM1-5_historical_r11i1p1f1_gn_185001-201412.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r12i1p1f1/Omon/intpp/gn/v20200803/intpp_Omon_ACCESS-ESM1-5_historical_r12i1p1f1_gn_185001-201412.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r13i1p1f1/Omon/intpp/gn/v20200803/intpp_Omon_ACCESS-ESM1-5_historical_r13i1p1f1_gn_185001-201412.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r14i1p1f1/Omon/intpp/gn/v20200803/intpp_Omon_ACCESS-ESM1-5_historical_r14i1p1f1_gn_185001-201412.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r15i1p1f1/Omon/intpp/gn/v20200803/intpp_Omon_ACCESS-ESM1-5_historical_

# Comparison

In [21]:
piControl_intpp_catalog_f.unique().path

['/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/files/d20191112/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_010101-060012.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/files/d20210316/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_100101-110012.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/files/d20191214/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_060101-100012.nc']

In [22]:
piControl_intpp_catalog_clex.unique().path

['/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/v20210316/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_010101-060012.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/v20210316/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_060101-100012.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/v20210316/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_100101-110012.nc']

In [23]:
!readlink -f /g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/v20210316/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_010101-060012.nc

/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/files/d20191112/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_010101-060012.nc


In [24]:
import os

def resolve_symlinks(paths):
    """
    Given a list of paths, this function returns a list of resolved actual paths.
    
    Parameters:
    - paths (list of str): A list of symbolic link paths to be resolved.
    
    Returns:
    - list of str: A list of resolved actual paths.
    """
    return [os.path.realpath(path) for path in paths]

In [25]:
resolve_symlinks(piControl_intpp_catalog_clex.unique().path)

['/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/files/d20191112/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_010101-060012.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/files/d20191214/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_060101-100012.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/files/d20210316/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_100101-110012.nc']
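
The paths seen so far are consistent with `file_type` following the `find -type` convention, where `f` is a regular file and `l` is a symbolic link. That reading is an assumption here, not something confirmed from the catalog documentation. Under that assumption, a minimal sketch of classifying a path (with a hypothetical `classify_path` helper and temporary files standing in for catalog entries) could look like this:

```python
import os
import tempfile

def classify_path(path):
    """Return 'l' for a symbolic link, 'f' for a regular file, '?' otherwise."""
    if os.path.islink(path):      # check links first: isfile() follows symlinks
        return "l"
    if os.path.isfile(path):
        return "f"
    return "?"

# Temporary stand-ins for a real data file and a versioned symlink to it
tmp = tempfile.mkdtemp()
real_file = os.path.join(tmp, "data.nc")
link = os.path.join(tmp, "link.nc")
open(real_file, "w").close()
os.symlink(real_file, link)

labels = {os.path.basename(p): classify_path(p) for p in (real_file, link)}
print(labels)

# Both entries resolve to the same underlying file
same_target = os.path.realpath(link) == os.path.realpath(real_file)
print(same_target)
```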

In [26]:
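resolved_clex_paths = resolve_symlinks(piControl_intpp_catalog_clex.unique().path)
# Compare sorted copies: list.sort() returns None, so comparing two .sort() calls is always True
if sorted(resolved_clex_paths) == sorted(piControl_intpp_catalog_f.unique().path):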
    print("The path lists are exactly equal")
else:
    print("The path lists are not equal")

The path lists are exactly equal


In [27]:
cmip6_fs38_datastore = nri_catalog.search(name='cmip6_fs38').to_source()
search_dict = {'experiment_id': ['historical'],
      'source_id': 'ACCESS-ESM1-5',
      'variable_id': ['intpp'],
      'frequency': 'mon',
      'file_type': 'f'}
historical_intpp_catalog_f = cmip6_fs38_datastore.search(**search_dict)
historical_intpp_catalog_f

Unnamed: 0,unique
path,40
file_type,1
realm,1
frequency,1
table_id,1
project_id,1
institution_id,1
source_id,1
experiment_id,1
member_id,40


In [28]:
cmip6_fs38_datastore = nri_catalog.search(name='cmip6_fs38').to_source()
search_dict = {'experiment_id': ['historical'],
      'source_id': 'ACCESS-ESM1-5',
      'variable_id': ['intpp'],
      'frequency': 'mon',
      'file_type': 'l'}
historical_intpp_catalog_l = cmip6_fs38_datastore.search(**search_dict)
historical_intpp_catalog_l

Unnamed: 0,unique
path,40
file_type,1
realm,1
frequency,1
table_id,1
project_id,1
institution_id,1
source_id,1
experiment_id,1
member_id,40


In [29]:
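# Compare sorted copies: list.sort() returns None, so comparing two .sort() calls is always True.
# The output below was recorded with the `.sort()` comparison and needs re-running to confirm.
if sorted(resolve_symlinks(historical_intpp_catalog_clex.unique().path)) == sorted(historical_intpp_catalog_f.unique().path):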
    print("The path lists are exactly equal")
else:
    print("The path lists are not equal")

The path lists are exactly equal


In [30]:
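# Compare sorted copies: list.sort() returns None, so comparing two .sort() calls is always True.
# The output below was recorded with the `.sort()` comparison and needs re-running to confirm.
if sorted(resolve_symlinks(historical_intpp_catalog_clex.unique().path)) == sorted(historical_intpp_catalog_l.unique().path):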
    print("The path lists are exactly equal")
else:
    print("The path lists are not equal")

The path lists are exactly equal


# Rui's email from "[NCI] (HELP-198258) Meaning of file_type for CMIP6 intake catalog" 

In [32]:
cmip6 = intake.open_esm_datastore("/g/data/dk92/catalog/v2/esm/cmip6-fs38/catalog.json")
search_dict = {'experiment_id': ['piControl'],
      'source_id': 'ACCESS-ESM1-5',
      'variable_id': ['intpp'],
      'frequency': 'mon'}
subset = cmip6.search(**search_dict)

In [33]:
subset.keys()

['f.CMIP.CSIRO.ACCESS-ESM1-5.piControl.r1i1p1f1.mon.ocnBgchem.Omon.intpp.gn.d20191112',
 'f.CMIP.CSIRO.ACCESS-ESM1-5.piControl.r1i1p1f1.mon.ocnBgchem.Omon.intpp.gn.d20191214',
 'f.CMIP.CSIRO.ACCESS-ESM1-5.piControl.r1i1p1f1.mon.ocnBgchem.Omon.intpp.gn.d20210316',
 'l.CMIP.CSIRO.ACCESS-ESM1-5.piControl.r1i1p1f1.mon.ocnBgchem.Omon.intpp.gn.v20191112',
 'l.CMIP.CSIRO.ACCESS-ESM1-5.piControl.r1i1p1f1.mon.ocnBgchem.Omon.intpp.gn.v20191214',
 'l.CMIP.CSIRO.ACCESS-ESM1-5.piControl.r1i1p1f1.mon.ocnBgchem.Omon.intpp.gn.v20210316']
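
The keys above can be broken down to show which versions appear under each `file_type`. This is a minimal sketch that assumes the key is a `.`-separated string whose first field is the `file_type` and whose last field is the version, which matches the six keys shown but is not taken from the catalog specification; `split_key` is a hypothetical helper.

```python
from collections import defaultdict

# Keys copied from the output above
keys = [
    "f.CMIP.CSIRO.ACCESS-ESM1-5.piControl.r1i1p1f1.mon.ocnBgchem.Omon.intpp.gn.d20191112",
    "f.CMIP.CSIRO.ACCESS-ESM1-5.piControl.r1i1p1f1.mon.ocnBgchem.Omon.intpp.gn.d20191214",
    "f.CMIP.CSIRO.ACCESS-ESM1-5.piControl.r1i1p1f1.mon.ocnBgchem.Omon.intpp.gn.d20210316",
    "l.CMIP.CSIRO.ACCESS-ESM1-5.piControl.r1i1p1f1.mon.ocnBgchem.Omon.intpp.gn.v20191112",
    "l.CMIP.CSIRO.ACCESS-ESM1-5.piControl.r1i1p1f1.mon.ocnBgchem.Omon.intpp.gn.v20191214",
    "l.CMIP.CSIRO.ACCESS-ESM1-5.piControl.r1i1p1f1.mon.ocnBgchem.Omon.intpp.gn.v20210316",
]

def split_key(key):
    """Return (file_type, version), assumed to be the first and last '.'-separated fields."""
    fields = key.split(".")
    return fields[0], fields[-1]

versions_by_type = defaultdict(list)
for key in keys:
    file_type, version = split_key(key)
    versions_by_type[file_type].append(version)

for file_type, versions in versions_by_type.items():
    print(file_type, versions)
```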

In [34]:
for key in subset.keys():
    print(key)
    print(subset[key].df[['time_range','version','file_type']])

f.CMIP.CSIRO.ACCESS-ESM1-5.piControl.r1i1p1f1.mon.ocnBgchem.Omon.intpp.gn.d20191112
      time_range    version file_type
0  010101-060012  d20191112         f
f.CMIP.CSIRO.ACCESS-ESM1-5.piControl.r1i1p1f1.mon.ocnBgchem.Omon.intpp.gn.d20191214
      time_range    version file_type
0  060101-100012  d20191214         f
f.CMIP.CSIRO.ACCESS-ESM1-5.piControl.r1i1p1f1.mon.ocnBgchem.Omon.intpp.gn.d20210316
      time_range    version file_type
0  100101-110012  d20210316         f
l.CMIP.CSIRO.ACCESS-ESM1-5.piControl.r1i1p1f1.mon.ocnBgchem.Omon.intpp.gn.v20191112
      time_range    version file_type
0  010101-060012  v20191112         l
l.CMIP.CSIRO.ACCESS-ESM1-5.piControl.r1i1p1f1.mon.ocnBgchem.Omon.intpp.gn.v20191214
      time_range    version file_type
0  060101-100012  v20191214         l
1  010101-060012  v20191214         l
l.CMIP.CSIRO.ACCESS-ESM1-5.piControl.r1i1p1f1.mon.ocnBgchem.Omon.intpp.gn.v20210316
      time_range    version file_type
0  100101-110012  v20210316         l
1 

# suggested approach for 02 Dec 2024 ACCESS-ESM1.5

In [36]:
clex_esgf_cat = intake.cat.nci['esgf']
list(clex_esgf_cat)

['cmip5',
 'cmip5_all',
 'cmip5_gr1p5',
 'cmip6',
 'cmip6_all',
 'cmip6_gr1p5',
 'cordex',
 'cordex_all']

In [40]:
clex_esgf_cat._entries

{'cmip5': name: cmip5
 container: xarray
 plugin: ['esm_datastore']
 driver: ['esm_datastore']
 description: CMIP5 (Latest Versions)
 
 Datasets on Gadi, both publised and replicated. Only the latest available file versions are in the listing, see catalogue 'cmip5_all' for all available versions
 
 Catalogue columns match those used by ESGF search (esgf.nci.org.au). intake-esm dict keys are in the form '{esgf instance_id}.{variable}'. Columns 'model_id' and 'institution_id' mirror the non-'_id' columns, with minor formatting differences needed to create the 'instance_id'.
 
 Project: al33, rr3
 Maintained By: NCI
 Contact: help@nci.org.au
 References:
     - https://pcmdi.llnl.gov/mips/cmip5/
 
 direct_access: forbid
 user_parameters: []
 metadata: 
 args: 
   obj: {{CATALOG_DIR}}/cmip5/catalogue_latest.json,
 'cmip5_all': name: cmip5_all
 container: xarray
 plugin: ['esm_datastore']
 driver: ['esm_datastore']
 description: CMIP5 (All Versions)
 
 Datasets on Gadi, both publised and re

In [41]:
clex_cmip6_cat = clex_esgf_cat['cmip6']

In [45]:
clex_cmip6_cat.df.head()

Unnamed: 0,project,activity_id,institution_id,source_id,experiment_id,member_id,table_id,variable_id,grid_label,date_range,path,version
0,CMIP6,AerChemMIP,BCC,BCC-ESM1,histSST,r1i1p1f1,AERmon,o3,gn,185001-201412,/g/data/oi10/replicas/CMIP6/AerChemMIP/BCC/BCC-ESM1/histSST/r1i1p1f1/AERmon/o3/gn/v20190718/o3_AERmon_BCC-ESM1_histSST_r1i1p1f1_gn_185001-201412.nc,v20190718
1,CMIP6,AerChemMIP,BCC,BCC-ESM1,ssp370,r1i1p1f1,AERmon,ch4,gn,201501-205512,/g/data/oi10/replicas/CMIP6/AerChemMIP/BCC/BCC-ESM1/ssp370/r1i1p1f1/AERmon/ch4/gn/v20190718/ch4_AERmon_BCC-ESM1_ssp370_r1i1p1f1_gn_201501-205512.nc,v20190718
2,CMIP6,AerChemMIP,BCC,BCC-ESM1,ssp370,r1i1p1f1,AERmon,lossch4,gn,201501-205512,/g/data/oi10/replicas/CMIP6/AerChemMIP/BCC/BCC-ESM1/ssp370/r1i1p1f1/AERmon/lossch4/gn/v20190718/lossch4_AERmon_BCC-ESM1_ssp370_r1i1p1f1_gn_201501-205512.nc,v20190718
3,CMIP6,AerChemMIP,BCC,BCC-ESM1,ssp370,r1i1p1f1,AERmon,oh,gn,201501-205512,/g/data/oi10/replicas/CMIP6/AerChemMIP/BCC/BCC-ESM1/ssp370/r1i1p1f1/AERmon/oh/gn/v20190718/oh_AERmon_BCC-ESM1_ssp370_r1i1p1f1_gn_201501-205512.nc,v20190718
4,CMIP6,AerChemMIP,BCC,BCC-ESM1,ssp370,r1i1p1f1,Amon,fco2nat,gn,201501-205512,/g/data/oi10/replicas/CMIP6/AerChemMIP/BCC/BCC-ESM1/ssp370/r1i1p1f1/Amon/fco2nat/gn/v20190624/fco2nat_Amon_BCC-ESM1_ssp370_r1i1p1f1_gn_201501-205512.nc,v20190624


In [47]:
search_dict = {'experiment_id': ['piControl'],
      'source_id': 'ACCESS-ESM1-5',
      'variable_id': ['intpp']}
search = clex_cmip6_cat.search(**search_dict)

In [48]:
search.df.head()

Unnamed: 0,project,activity_id,institution_id,source_id,experiment_id,member_id,table_id,variable_id,grid_label,date_range,path,version
0,CMIP6,CMIP,CSIRO,ACCESS-ESM1-5,piControl,r1i1p1f1,Omon,intpp,gn,010101-060012,/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/v20210316/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_010101-060012.nc,v20210316
1,CMIP6,CMIP,CSIRO,ACCESS-ESM1-5,piControl,r1i1p1f1,Omon,intpp,gn,060101-100012,/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/v20210316/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_060101-100012.nc,v20210316
2,CMIP6,CMIP,CSIRO,ACCESS-ESM1-5,piControl,r1i1p1f1,Omon,intpp,gn,100101-110012,/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/v20210316/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_100101-110012.nc,v20210316


# $THE$ $END$