# [NCI fs38 catalog] what does `file_type = f` vs `l` mean and which is the correct one to use?

#### https://github.com/Thomas-Moore-Creative/ACDtools/issues/2

#### Date: 28 October, 2024

Author = {"name": "Thomas Moore", "affiliation": "CSIRO", "email": "thomas.moore@csiro.au", "orcid": "0000-0003-3930-1946"}

### filter warnings

In [1]:
import warnings
warnings.filterwarnings("ignore") # Suppress warnings

# Dask cluster

In [2]:
from dask.distributed import Client, LocalCluster
dask_settings = {'threads_per_worker': 1} # threads per worker set to deal with https://forum.access-hive.org.au/t/netcdf-not-a-valid-id-errors/389
# Start the Dask cluster with the settings by unpacking the dictionary using **
cluster = LocalCluster(**dask_settings) 
# Connect a client to the cluster
client = Client(cluster)
# Show some basic information about the cluster
print(f"Cluster started with {len(cluster.workers)} workers.")
print(f"Dashboard available at: {cluster.dashboard_link}")

Cluster started with 28 workers.
Dashboard available at: /proxy/8787/status


# Issue: [NCI fs38 catalog] what does `file_type = f` vs `l` mean and which is the correct one to use?
- https://github.com/Thomas-Moore-Creative/ACDtools/issues/2

##### Information on climate data catalogs across Australian HPC

**ACCESS-NRI** https://access-nri-intake-catalog.readthedocs.io/en/latest/usage/how.html <br>
**NCI** https://opus.nci.org.au/pages/viewpage.action?pageId=213713098


## import packages

In [3]:
import intake
import xarray as xr
import numpy as np

### import the ACCESS-NRI catalog

In [4]:
catalog = intake.cat.access_nri

## get catalog for `fs38` and apply search dictionaries for `piControl` & `intpp` examples

In [5]:
nri_catalog = intake.cat.access_nri

In [6]:
cmip6_fs38_datastore = nri_catalog.search(name='cmip6_fs38').to_source()
search_dict = {'experiment_id': ['piControl'],
      'source_id': 'ACCESS-ESM1-5',
      'variable_id': ['intpp'],
      'frequency': 'mon',
      'file_type': 'f'}
piControl_intpp_catalog_f = cmip6_fs38_datastore.search(**search_dict)
piControl_intpp_catalog_f

Unnamed: 0,unique
path,3
file_type,1
realm,1
frequency,1
table_id,1
project_id,1
institution_id,1
source_id,1
experiment_id,1
member_id,1


In [7]:
piControl_intpp_catalog_f.unique().path

['/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/files/d20191112/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_010101-060012.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/files/d20210316/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_100101-110012.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/files/d20191214/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_060101-100012.nc']

## compare to ls of `/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/latest` 
```
(base) tm4888@gadi-login-03 /g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/latest ls -ltrh
total 38K
lrwxrwxrwx 1 fo3_esgfpub fs38 82 Apr 26  2021 intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_060101-100012.nc -> ../files/d20191214/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_060101-100012.nc
lrwxrwxrwx 1 fo3_esgfpub fs38 82 Apr 26  2021 intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_010101-060012.nc -> ../files/d20191112/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_010101-060012.nc
lrwxrwxrwx 1 fo3_esgfpub fs38 82 Apr 26  2021 intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_100101-110012.nc -> ../files/d20210316/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_100101-110012.nc
```
### In this case `'file_type': 'f'` yields the correct result: the same three files that the `latest` symlinks point to

In [8]:
cmip6_fs38_datastore = nri_catalog.search(name='cmip6_fs38').to_source()
search_dict = {'experiment_id': ['piControl'],
      'source_id': 'ACCESS-ESM1-5',
      'variable_id': ['intpp'],
      'frequency': 'mon',
      'file_type': 'l'}
piControl_intpp_catalog_l = cmip6_fs38_datastore.search(**search_dict)
piControl_intpp_catalog_l

Unnamed: 0,unique
path,6
file_type,1
realm,1
frequency,1
table_id,1
project_id,1
institution_id,1
source_id,1
experiment_id,1
member_id,1


In [9]:
piControl_intpp_catalog_l.unique().path

['/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/v20210316/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_100101-110012.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/v20191214/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_060101-100012.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/v20210316/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_060101-100012.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/v20191214/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_010101-060012.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/v20191112/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_010101-060012.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/v20210316/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_010101-0600

## In this case `l` yields multiple file versions for the `010101-060012` and `060101-100012` timeframes
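
To see at a glance which files appear under more than one version directory, the paths can be grouped by file name. The following is a minimal sketch, not part of the catalog API: the helper names `group_versions_by_filename` and `latest_version_paths` are hypothetical, the example paths are shortened stand-ins for the six `l` paths above, and it assumes the version directory is the immediate parent of each file and that `vYYYYMMDD` names sort chronologically as strings.

```python
import os
from collections import defaultdict


def group_versions_by_filename(paths):
    """Map each file name to the sorted list of version directories it appears under."""
    versions = defaultdict(list)
    for path in paths:
        directory, filename = os.path.split(path)
        versions[filename].append(os.path.basename(directory))
    return {name: sorted(dirs) for name, dirs in versions.items()}


def latest_version_paths(paths):
    """Keep, for each file name, the path under the largest version directory name."""
    best = {}
    for path in paths:
        directory, filename = os.path.split(path)
        version = os.path.basename(directory)
        if filename not in best or version > best[filename][0]:
            best[filename] = (version, path)
    return sorted(p for _, p in best.values())


# Hypothetical, shortened paths that mimic the layout of the six `l` paths
example_paths = [
    "/data/gn/v20210316/intpp_100101-110012.nc",
    "/data/gn/v20191214/intpp_060101-100012.nc",
    "/data/gn/v20210316/intpp_060101-100012.nc",
    "/data/gn/v20191214/intpp_010101-060012.nc",
    "/data/gn/v20191112/intpp_010101-060012.nc",
    "/data/gn/v20210316/intpp_010101-060012.nc",
]

grouped = group_versions_by_filename(example_paths)
latest = latest_version_paths(example_paths)
for name, dirs in grouped.items():
    print(name, dirs)
```

Any file name listed with more than one version directory is a candidate for double counting if all `l` paths are opened together.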

# Try `pr` variable in `historical`

In [10]:
cmip6_fs38_datastore = nri_catalog.search(name='cmip6_fs38').to_source()
search_dict = {'experiment_id': ['historical'],
      'source_id': 'ACCESS-ESM1-5',
      'variable_id': ['pr'],
      'frequency': 'mon',
      'file_type': 'l'}
historical_pr_catalog_l = cmip6_fs38_datastore.search(**search_dict)
historical_pr_catalog_l

Unnamed: 0,unique
path,40
file_type,1
realm,1
frequency,1
table_id,1
project_id,1
institution_id,1
source_id,1
experiment_id,1
member_id,40


# In this case above (`pr` & `historical` & `l`) 40 paths (for 40 members) are returned

In [11]:
cmip6_fs38_datastore = nri_catalog.search(name='cmip6_fs38').to_source()
search_dict = {'experiment_id': ['historical'],
      'source_id': 'ACCESS-ESM1-5',
      'variable_id': ['pr'],
      'frequency': 'mon',
      'file_type': 'f'}
historical_pr_catalog_f = cmip6_fs38_datastore.search(**search_dict)
historical_pr_catalog_f

Unnamed: 0,unique
path,40
file_type,1
realm,1
frequency,1
table_id,1
project_id,1
institution_id,1
source_id,1
experiment_id,1
member_id,40


# In this case above (`pr` & `historical` & `f`) 40 paths (for 40 members) are returned

# Are these two path lists the same?

In [12]:
path_list_f = historical_pr_catalog_f.unique().path
path_list_l = historical_pr_catalog_l.unique().path

In [13]:
if path_list_l == path_list_f:
    print("The path lists are exactly equal")
else:
    print("The path lists are not equal")

The path lists are not equal


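### is this just a sorting issue?

In [14]:
# list.sort() sorts in place and returns None, so `a.sort() == b.sort()` is always True.
# Compare sorted copies instead; the "exactly equal" output first recorded for this cell
# came from the always-True form, so re-run it for a meaningful answer.
if sorted(path_list_l) == sorted(path_list_f):
    print("The path lists are exactly equal")
else:
    print("The path lists are not equal")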
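
When checking whether two path lists differ only in order, note that `list.sort()` returns `None`, whereas `sorted()` returns a new list. A small sketch with made-up lists (the variable names are illustrative only) shows the difference, and how set differences reveal which entries are unmatched:

```python
a = ["b", "a", "c"]
b = ["x", "y", "z"]

# list.sort() sorts in place and returns None, so this compares None == None
always_true = (a.sort() == b.sort())

# sorted() returns new lists, so this compares the actual contents
really_equal = (sorted(a) == sorted(b))

# When order and duplicates do not matter, set differences show what is unmatched
only_in_a = set(a) - set(b)

print(always_true, really_equal, sorted(only_in_a))
```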


# CLEX example
https://github.com/coecms/nci-intake-catalogue/blob/main/docs/intake_cmip6_demo.ipynb

In [15]:
cat = intake.cat.nci
list(cat)

['era5', 'era5_land', 'ecmwf', 'esgf', 'cosima', 'erai']

In [17]:
cmip6 = cat['esgf'].cmip6
cmip6.df.head()

Unnamed: 0,project,activity_id,institution_id,source_id,experiment_id,member_id,table_id,variable_id,grid_label,date_range,path,version
0,CMIP6,AerChemMIP,BCC,BCC-ESM1,histSST,r1i1p1f1,AERmon,o3,gn,185001-201412,/g/data/oi10/replicas/CMIP6/AerChemMIP/BCC/BCC-ESM1/histSST/r1i1p1f1/AERmon/o3/gn/v20190718/o3_AERmon_BCC-ESM1_histSST_r1i1p1f1_gn_185001-201412.nc,v20190718
1,CMIP6,AerChemMIP,BCC,BCC-ESM1,ssp370,r1i1p1f1,AERmon,ch4,gn,201501-205512,/g/data/oi10/replicas/CMIP6/AerChemMIP/BCC/BCC-ESM1/ssp370/r1i1p1f1/AERmon/ch4/gn/v20190718/ch4_AERmon_BCC-ESM1_ssp370_r1i1p1f1_gn_201501-205512.nc,v20190718
2,CMIP6,AerChemMIP,BCC,BCC-ESM1,ssp370,r1i1p1f1,AERmon,lossch4,gn,201501-205512,/g/data/oi10/replicas/CMIP6/AerChemMIP/BCC/BCC-ESM1/ssp370/r1i1p1f1/AERmon/lossch4/gn/v20190718/lossch4_AERmon_BCC-ESM1_ssp370_r1i1p1f1_gn_201501-205512.nc,v20190718
3,CMIP6,AerChemMIP,BCC,BCC-ESM1,ssp370,r1i1p1f1,AERmon,oh,gn,201501-205512,/g/data/oi10/replicas/CMIP6/AerChemMIP/BCC/BCC-ESM1/ssp370/r1i1p1f1/AERmon/oh/gn/v20190718/oh_AERmon_BCC-ESM1_ssp370_r1i1p1f1_gn_201501-205512.nc,v20190718
4,CMIP6,AerChemMIP,BCC,BCC-ESM1,ssp370,r1i1p1f1,Amon,fco2nat,gn,201501-205512,/g/data/oi10/replicas/CMIP6/AerChemMIP/BCC/BCC-ESM1/ssp370/r1i1p1f1/Amon/fco2nat/gn/v20190624/fco2nat_Amon_BCC-ESM1_ssp370_r1i1p1f1_gn_201501-205512.nc,v20190624


In [19]:
cmip6.description

"CMIP6 (Latest Versions)\n\nDatasets on Gadi, both publised and replicated. Only the latest available file versions are in the listing, see catalogue 'cmip6_all' for all available versions\n\nCatalogue columns match those used by ESGF search (esgf.nci.org.au). intake-esm dict keys are in the form '{esgf instance_id}'.\n\nProject: oi10, fs38\nMaintained By: NCI\nContact: help@nci.org.au\nReferences:\n    - https://pcmdi.llnl.gov/CMIP6/\n"

In [29]:
search_dict = {'experiment_id': ['piControl'],
      'source_id': 'ACCESS-ESM1-5',
      'variable_id': ['intpp']}
piControl_intpp_catalog_clex = cmip6.search(**search_dict)
piControl_intpp_catalog_clex

Unnamed: 0,unique
project,1
activity_id,1
institution_id,1
source_id,1
experiment_id,1
member_id,1
table_id,1
variable_id,1
grid_label,1
date_range,3


In [30]:
piControl_intpp_catalog_clex.unique()['path']

['/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/v20210316/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_010101-060012.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/v20210316/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_060101-100012.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/v20210316/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_100101-110012.nc']

In [31]:
search_dict = {'experiment_id': ['historical'],
      'source_id': 'ACCESS-ESM1-5',
      'variable_id': ['intpp']}
historical_intpp_catalog_clex = cmip6.search(**search_dict)
historical_intpp_catalog_clex

Unnamed: 0,unique
project,1
activity_id,1
institution_id,1
source_id,1
experiment_id,1
member_id,40
table_id,1
variable_id,1
grid_label,1
date_range,1


In [32]:
historical_intpp_catalog_clex.unique()['path']

['/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/Omon/intpp/gn/v20200605/intpp_Omon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r11i1p1f1/Omon/intpp/gn/v20200803/intpp_Omon_ACCESS-ESM1-5_historical_r11i1p1f1_gn_185001-201412.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r12i1p1f1/Omon/intpp/gn/v20200803/intpp_Omon_ACCESS-ESM1-5_historical_r12i1p1f1_gn_185001-201412.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r13i1p1f1/Omon/intpp/gn/v20200803/intpp_Omon_ACCESS-ESM1-5_historical_r13i1p1f1_gn_185001-201412.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r14i1p1f1/Omon/intpp/gn/v20200803/intpp_Omon_ACCESS-ESM1-5_historical_r14i1p1f1_gn_185001-201412.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r15i1p1f1/Omon/intpp/gn/v20200803/intpp_Omon_ACCESS-ESM1-5_historical_

# Comparison

In [33]:
piControl_intpp_catalog_f.unique().path

['/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/files/d20191112/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_010101-060012.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/files/d20210316/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_100101-110012.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/files/d20191214/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_060101-100012.nc']

In [34]:
piControl_intpp_catalog_clex.unique().path

['/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/v20210316/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_010101-060012.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/v20210316/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_060101-100012.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/v20210316/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_100101-110012.nc']

In [36]:
!readlink -f /g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/v20210316/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_010101-060012.nc

/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/files/d20191112/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_010101-060012.nc


In [37]:
import os

def resolve_symlinks(paths):
    """
    Given a list of paths, this function returns a list of resolved actual paths.
    
    Parameters:
    - paths (list of str): A list of symbolic link paths to be resolved.
    
    Returns:
    - list of str: A list of resolved actual paths.
    """
    return [os.path.realpath(path) for path in paths]

In [38]:
resolve_symlinks(piControl_intpp_catalog_clex.unique().path)

['/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/files/d20191112/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_010101-060012.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/files/d20191214/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_060101-100012.nc',
 '/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/piControl/r1i1p1f1/Omon/intpp/gn/files/d20210316/intpp_Omon_ACCESS-ESM1-5_piControl_r1i1p1f1_gn_100101-110012.nc']

In [39]:
# Compare sorted copies: list.sort() returns None, so comparing its return values is always True
if sorted(resolve_symlinks(piControl_intpp_catalog_clex.unique().path)) == sorted(piControl_intpp_catalog_f.unique().path):
    print("The path lists are exactly equal")
else:
    print("The path lists are not equal")

The path lists are exactly equal
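
A yes/no answer hides which paths differ. As a hedged sketch, the helper below (the name `compare_resolved` is hypothetical) resolves both lists with `os.path.realpath` and returns the entries unique to each side. It is demonstrated on a throwaway directory that mimics the `files/dYYYYMMDD` (real file) and `vYYYYMMDD` (symlink) layout seen above, and it assumes a filesystem where symlinks can be created.

```python
import os
import tempfile


def compare_resolved(paths_a, paths_b):
    """Resolve symlinks in both lists and return (only_in_a, only_in_b) as sorted lists."""
    resolved_a = {os.path.realpath(p) for p in paths_a}
    resolved_b = {os.path.realpath(p) for p in paths_b}
    return sorted(resolved_a - resolved_b), sorted(resolved_b - resolved_a)


# Build a temporary layout: files/d20191112/x.nc is real, v20210316/x.nc links to it
with tempfile.TemporaryDirectory() as root:
    real_dir = os.path.join(root, "files", "d20191112")
    link_dir = os.path.join(root, "v20210316")
    os.makedirs(real_dir)
    os.makedirs(link_dir)
    real_file = os.path.join(real_dir, "x.nc")
    link_file = os.path.join(link_dir, "x.nc")
    open(real_file, "w").close()
    os.symlink(real_file, link_file)

    only_links, only_files = compare_resolved([link_file], [real_file])
    link_is_symlink = os.path.islink(link_file)

print(only_links, only_files, link_is_symlink)
```

Two empty lists mean both catalogs point at the same physical files, regardless of ordering or of which side lists symlinks.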


In [41]:
cmip6_fs38_datastore = nri_catalog.search(name='cmip6_fs38').to_source()
search_dict = {'experiment_id': ['historical'],
      'source_id': 'ACCESS-ESM1-5',
      'variable_id': ['intpp'],
      'frequency': 'mon',
      'file_type': 'f'}
historical_intpp_catalog_f = cmip6_fs38_datastore.search(**search_dict)
historical_intpp_catalog_f

Unnamed: 0,unique
path,40
file_type,1
realm,1
frequency,1
table_id,1
project_id,1
institution_id,1
source_id,1
experiment_id,1
member_id,40


In [42]:
cmip6_fs38_datastore = nri_catalog.search(name='cmip6_fs38').to_source()
search_dict = {'experiment_id': ['historical'],
      'source_id': 'ACCESS-ESM1-5',
      'variable_id': ['intpp'],
      'frequency': 'mon',
      'file_type': 'l'}
historical_intpp_catalog_l = cmip6_fs38_datastore.search(**search_dict)
historical_intpp_catalog_l

Unnamed: 0,unique
path,40
file_type,1
realm,1
frequency,1
table_id,1
project_id,1
institution_id,1
source_id,1
experiment_id,1
member_id,40


In [43]:
# Compare sorted copies: `.sort()` returns None, so the original `a.sort() == b.sort()` test
# was always True and its recorded "exactly equal" output is not evidence of a match.
if sorted(resolve_symlinks(historical_intpp_catalog_clex.unique().path)) == sorted(historical_intpp_catalog_f.unique().path):
    print("The path lists are exactly equal")
else:
    print("The path lists are not equal")


In [44]:
# Same comparison against the `l` catalog; re-run both cells to get meaningful results
if sorted(resolve_symlinks(historical_intpp_catalog_clex.unique().path)) == sorted(historical_intpp_catalog_l.unique().path):
    print("The path lists are exactly equal")
else:
    print("The path lists are not equal")


# $THE$ $END$

### (1) "I know I want Australian CMIP6 data - so that's fs38 and I need access to that NCI project"

In [None]:
cmip6_fs38_datastore = catalog.search(name='cmip6_fs38').to_source()

### (2) "what are the realms covered by cmip6_fs38?"

In [None]:
report_esm_unique(cmip6_fs38_datastore,keep_list=['realm'])

### (3) I want to see what variables, over what frequencies, are available in both the 'ocean' & 'ocnBgchem' realms

In [None]:
cmip6_fs38_ocean_datastore = cmip6_fs38_datastore.search(realm=['ocean','ocnBgchem'])

In [None]:
[sorted_unique_dict, table_data] = report_esm_unique(cmip6_fs38_ocean_datastore,return_results=True)

## what is the long name of a particular variable?

In [None]:
var_name_info(cmip6_fs38_ocean_datastore,'intpp')

## filter catalog for final ACCESS-ESM1.5 dataset

In [None]:
final_search = cmip6_fs38_ocean_datastore.search(file_type='l',
                    variable_id='intpp',source_id='ACCESS-ESM1-5',experiment_id='historical')

In [None]:
report_esm_unique(final_search)

## what is the chunking of the files in this final_search catalog?

In [None]:
final_search.df['path'].iloc[0]

In [None]:
find_chunking_info(final_search,'intpp',return_results=False)

## load without specifying any chunking

In [None]:
%%time
ds_ESM15_esorted = load_ACCESS_ESM_ensemble(final_search)

In [None]:
ds_ESM15_esorted

#### One still needs to know what the dimensions (1, 300, 360) refer to, and roughly how many MB each chunk holds, to arrive at a `time` chunk of 220. These rules of thumb should live in the YAML settings file until more sophisticated heuristics can be coded

In [None]:
%%time
ds_ESM15_esorted = load_ACCESS_ESM_ensemble(final_search,chunking_settings={'chunks':{'member':1,'time':220,'j':300,'i':360}})

In [None]:
ds_ESM15_esorted
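
The arithmetic behind a choice like `time=220` can be made explicit. This is a rough sketch only: the helper `chunk_size_mib` is hypothetical, and it assumes the variable is stored as 4-byte floats (`float32`), which should be checked against the actual file's dtype.

```python
import math


def chunk_size_mib(chunks, bytes_per_value=4):
    """Return the in-memory size of one chunk in MiB (assumes a fixed-width dtype)."""
    n_values = math.prod(chunks.values())
    return n_values * bytes_per_value / 2**20


chunks = {"member": 1, "time": 220, "j": 300, "i": 360}
size_mib = chunk_size_mib(chunks)  # 1 * 220 * 300 * 360 values * 4 bytes each
print(f"{size_mib:.1f} MiB per chunk")
```

Under that assumption each chunk holds about 90.6 MiB, in the neighbourhood of the roughly 100 MB per chunk that is often suggested as a starting point for Dask arrays.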

In [None]:
%%time
ds_ESM15_esorted = load_ACCESS_ESM_ensemble(final_search,chunking_key='ACCESS_ESM15_2D')

In [None]:
ds_ESM15_esorted

In [None]:
ds_ESM15_esorted.isel(member=0).mean('time').intpp.plot()

## 3D dataset?

In [None]:
thetao_search = cmip6_fs38_ocean_datastore.search(file_type='l',
                    variable_id='thetao',source_id='ACCESS-ESM1-5',experiment_id='historical')

In [None]:
report_esm_unique(thetao_search)

In [None]:
find_chunking_info(thetao_search,'thetao',return_results=False)

In [None]:
find_chunking_info(thetao_search,'thetao',return_results=True)

In [None]:
xr.open_mfdataset('/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r3i1p1f1/Omon/thetao/gn/v20191203/thetao_Omon_ACCESS-ESM1-5_historical_r3i1p1f1_gn_189001-189912.nc')

In [None]:
%%time
ds_ESM15_esorted = load_ACCESS_ESM_ensemble(thetao_search)

In [None]:
ds_ESM15_esorted

In [None]:
%%time
ds_ESM15_esorted = load_ACCESS_ESM_ensemble(thetao_search,chunking_key='ACCESS_ESM15_3D')

In [None]:
ds_ESM15_esorted

# let's use the tools as they exist to try to start the workflow

## I want Australian CMIP6 data

In [None]:
cmip6_fs38_datastore = load_cmip6_fs38_datastore()

In [None]:
report_esm_unique(cmip6_fs38_datastore.search(**load_config()['catalog_search_query_dict']['ACCESS_ESM15']['CSEPTA']))

In [None]:
CSEPTA_intpp_catalog = cmip6_fs38_datastore.search(**load_config()['catalog_search_query_dict']['ACCESS_ESM15']['CSEPTA'])

In [None]:
CSEPTA_intpp_catalog

In [None]:
show_methods(CSEPTA_intpp_catalog)

In [None]:
report_esm_unique(CSEPTA_intpp_catalog)

In [None]:
CSEPTA_intpp_catalog.unique()['path']

In [None]:
search_dict = dict(experiment_id='historical', source_id='ACCESS-ESM1-5', variable_id=['intpp'],
                   realm=['ocnBgchem'], frequency='mon', file_type='f')

In [None]:
search = cmip6_fs38_datastore.search(**search_dict)
search

In [None]:
search.unique()['path']

In [None]:
CSEPTA_datatree = CSEPTA_intpp_catalog.to_datatree(index=["experiment_id"],progressbar=False)

In [None]:
# Iterate over the experiments in the datatree
for experiment_id, node in CSEPTA_datatree.items():
    # Access the dataset
    ds = node.ds
    print(f"Working with dataset for experiment: {experiment_id}")
    
    # Perform operations on the dataset, for example, print variable names
    print(ds.variables)