# Introduction to the ACCESS-NRI Intake catalog

**Aims**: This tutorial introduces the ACCESS-NRI Intake catalog and shows you how to use it to find and load model data for analysis.

**Project membership requirements**:

 - If using the `xp65` conda environment: `xp65`, `dk92`, `fs38`, `p73`, `ik11` and `oi10`
 - If using the `hh5` conda environment: as above but also `hh5`


The ACCESS-NRI Intake Catalog is curated by ACCESS-NRI. Further information is available on its <a href="https://access-nri-intake-catalog.readthedocs.io/en/latest/index.html" target="_blank">documentation website</a> and on <a href="https://access-hive.org.au/model_evaluation/data/model_catalogs/" target="_blank">this ACCESS-Hive page</a>.

----

# Exercise 1

In [None]:
import intake

In [None]:
catalog = intake.cat.access_nri

View the catalog

In [None]:
catalog

Search the catalog by any of its columns, e.g.

```python
catalog.search(model='ACCESS-OM2')
catalog.search(model=..., frequency=..., variable=...)
```

Select a single datastore using its <b>name</b>, e.g.

```python
datastore = catalog['01deg_jra...']
```

If your search returns a single experiment:

```python
datastore = catalog.search(name=...).to_source()
```


View the datastore

```python
datastore
```
and

```python
datastore.df
```

Find a variable

```python
datastore.search(variable=..., frequency=...)
```
You can search by any of the columns in the datastore.

Refine your search to reach <b>1 dataset</b>, using `.keys()` and `.keys_info()` to assist

# Exercise 2

Save your search returning one dataset

```python
search = datastore.search(...)
```

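(If you need to catch up, you could use:
```python
# Select the datastore by name first, then search within it,
# so that `search` can be opened with `.to_dask()` later on
search = catalog['025deg_jra55_iaf_omip2_cycle6'].search(variable='sst', frequency='1mon')
```
)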

Start a dask cluster

In [None]:
from dask.distributed import Client

In [None]:
client = Client(threads_per_worker=1)

In [None]:
client

Open your dataset

```python
ds = search.to_dask()
```

and check `ds` contains the variable you expect

Try a search which returns more than one variable:
```python
search = catalog['01deg_jra55v140_iaf'].search(variable=['temp_surface_ave', 'salt_surface_ave'])
search
```

Can we open that with `to_dask()`?

Repeat for:

```python
search = catalog['01deg_jra55v140_iaf'].search(variable=['surface_salt', 'surface_temp'], frequency='1mon')
search
```

Use `search.keys()` and `search.keys_info()` if needed

For the second search, we can combine after opening:


```python
import xarray as xr
search = catalog['01deg_jra55v140_iaf'].search(variable=['surface_salt', 'surface_temp'], frequency='1mon')
ds_dict = search.to_dataset_dict()
ds_dict
ds = xr.merge(
    ds_dict.values(), 
)
```
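Whether `to_dask()` or `to_dataset_dict()` is appropriate depends on how many datasets the search holds, which `search.keys()` reports. As a minimal sketch, the hypothetical helper `open_search` below (it is not part of intake-esm or the ACCESS-NRI catalog) picks between the two. It assumes only that the object passed in provides `.keys()`, `.to_dask()` and `.to_dataset_dict()` in the way they are used in this tutorial.

```python
def open_search(search, **kwargs):
    """Open a search result, choosing the method from the number of datasets.

    `open_search` is a hypothetical helper for illustration only. `search` is
    assumed to behave like the datastore searches used in this tutorial.
    """
    keys = list(search.keys())
    if not keys:
        raise ValueError("Search matched no datasets; loosen the query.")
    if len(keys) == 1:
        # Exactly one dataset: to_dask() returns it directly.
        return search.to_dask(**kwargs)
    # Several datasets: return a dictionary with one entry per dataset key.
    return search.to_dataset_dict(**kwargs)
```

For a search that holds one dataset, `ds = open_search(search)` returns that dataset; otherwise the result is a dictionary whose values can be passed to `xr.merge` as shown above.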

Try a search with more than one experiment
```python
search = catalog.search(name='01deg_jra55v140_iaf_cycle4.*')
search
```

With more than one experiment, we need to use `.to_source_dict()`:
```python
datastore_dict = catalog.search(name='01deg_jra55v140_iaf_cycle4.*').to_source_dict()
```

This returns a dictionary of intake-esm datastores, which we can open in a loop:

```python

dataset_dict = {
    name: datastore.search(variable="temp_surface_ave").to_dask()
    for name, datastore in datastore_dict.items()
}
```

We could just merge them:

```python
ds = xr.merge(
    dataset_dict.values(),
)
```

but this is slow because of the number of file operations involved. One reason is that, by default, each chunk in the source files is opened individually. Instead, try setting the `chunks` argument:

```python
dataset_dict = {
    name: datastore.search(variable="temp_surface_ave").to_dask(
        xarray_open_kwargs={'chunks':{'time':-1}}
    )
    for name, datastore in datastore_dict.items()
}

ds = xr.merge(dataset_dict.values())
```

This means that the chunks from each NetCDF file are all loaded together. Inspect the resulting dataset: what are the chunk sizes? What defines the chunk sizes now?
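The two dictionary comprehensions above differ only in whether `chunks` is passed through `xarray_open_kwargs`. As a sketch, that choice can be made a parameter of a small helper. The name `open_from_each` is hypothetical and used here for illustration only; the helper assumes each datastore supports `.search(variable=...)` and `.to_dask(...)` as used above.

```python
def open_from_each(datastore_dict, variable, chunks=None):
    """Open `variable` from every datastore in `datastore_dict`.

    Hypothetical helper for illustration. With `chunks=None` the default
    chunking is used; otherwise `chunks` is passed on via
    `xarray_open_kwargs`, as in the comprehension above.
    """
    kwargs = {}
    if chunks is not None:
        kwargs["xarray_open_kwargs"] = {"chunks": chunks}
    return {
        name: datastore.search(variable=variable).to_dask(**kwargs)
        for name, datastore in datastore_dict.items()
    }
```

For example, `open_from_each(datastore_dict, "temp_surface_ave", chunks={"time": -1})` reproduces the second comprehension, and leaving out `chunks` reproduces the first, which makes it easy to compare the two.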

Search for some high-resolution daily data, e.g.

```python
search = catalog['01deg_jra55v140_iaf'].search(variable='temp_surface_ave')
```

or 

```python
search = catalog['01deg_jra55v140_iaf'].search(variable='surface_temp', frequency='1day')
```

Run `search.to_dask()` on your search, and inspect the resulting variable. How many chunks are there, and how big are they? Can you reduce the number of chunks using `search.to_dask(xarray_open_kwargs={'chunks':{...}})`?
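The trade-off between the number of chunks and their size is simple arithmetic, which the sketch below works through in plain Python. The array shape, chunk shapes and 4-byte item size are illustrative assumptions, not values read from the catalog, and `chunk_summary` is a hypothetical helper; compare its output with what you see on your own dataset.

```python
import math


def chunk_summary(shape, chunks, itemsize=4):
    """Return (number of chunks, bytes in the largest chunk).

    `shape` and `chunks` are tuples with one entry per dimension. A chunk
    size of -1 means one chunk spanning the whole dimension, which is the
    convention used by dask and xarray.
    """
    n_chunks = 1
    chunk_bytes = itemsize
    for size, chunk in zip(shape, chunks):
        if chunk == -1:
            chunk = size
        chunk = min(chunk, size)
        n_chunks *= math.ceil(size / chunk)
        chunk_bytes *= chunk
    return n_chunks, chunk_bytes


# Illustrative numbers only: one year of daily data on a 2700 x 3600 grid,
# stored as 4-byte floats.
shape = (365, 2700, 3600)
per_timestep = chunk_summary(shape, (1, 675, 900))
whole_year = chunk_summary(shape, (-1, 675, 900))
print(per_timestep)  # (5840, 2430000)
print(whole_year)    # (16, 886950000)
```

Under these assumptions, merging the time dimension into a single chunk cuts the chunk count from 5840 to 16, but each chunk grows from about 2.4 MB to about 887 MB, so the best setting lies somewhere in between and depends on worker memory.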

In [None]:
client.close()