I'm pleased to announce the release of `intake-esm` version 2020.3.16. This release brings bug fixes and new features. Everyone is invited to try it out and share their thoughts, suggestions, and feedback! This blog post outlines the changes. The full changelog is available [here](https://intake-esm.readthedocs.io/en/latest/changelog.html#intake-esm-v2020-03-16).


On GitHub: https://github.com/NCAR/intake-esm

Documentation: https://intake-esm.readthedocs.io/

<!-- TEASER_END -->

## Installation

`intake-esm` can be installed from PyPI with pip:

```bash
python -m pip install intake-esm --upgrade
```

It is also available from the conda-forge channel for conda installations:

```bash
conda install -c conda-forge intake-esm
```

## New Features


### Enhanced search: enforce query criteria via `require_all_on` argument

By default, `intake-esm`'s `search()` method returns entries that fulfill **any of the criteria** specified in the query. With this release, `intake-esm` can return only the entries that fulfill **all query criteria** when the user supplies the `require_all_on` argument. The `require_all_on` parameter can be **a dataframe column** or **a list of dataframe columns** across which all elements must satisfy the query criteria.

The `require_all_on` argument is best explained with an example. Consider the `intake-esm` catalog for the CMIP6 data stored on Google Cloud Storage:


```python
# Open collection for CMIP6 data hosted on Google Storage
import intake

url = "https://git.io/JvP9r"
col = intake.open_esm_datastore(url)
col
```

```
pangeo-cmip6-ESM Collection with 235624 entries:
	> 15 activity_id(s)
	> 32 institution_id(s)
	> 69 source_id(s)
	> 101 experiment_id(s)
	> 135 member_id(s)
	> 29 table_id(s)
	> 313 variable_id(s)
	> 10 grid_label(s)
	> 235624 zstore(s)
	> 60 dcpp_init_year(s)
```

Let's define a query for our collection that requests multiple `variable_id` values and multiple `experiment_id` values from the `Omon` `table_id`, all from three different `source_id` values:

```python
# Define our query
query = dict(
    variable_id=["thetao", "o2"],
    experiment_id=["historical", "ssp245", "ssp585"],
    table_id=["Omon"],
    source_id=["ACCESS-ESM1-5", "AWI-CM-1-1-MR", "FGOALS-f3-L"],
)
```

Now, let's use this `query` to search for all assets in the collection that satisfy *any combination* of these requests (i.e., with `require_all_on=None`, which is the default):

```python
col_subset = col.search(**query)

col_subset.df.groupby("source_id")[
    ["experiment_id", "variable_id", "table_id"]
].nunique()
```

| source_id | experiment_id | variable_id | table_id |
| --- | --- | --- | --- |
| ACCESS-ESM1-5 | 3 | 2 | 1 |
| AWI-CM-1-1-MR | 2 | 1 | 1 |
| FGOALS-f3-L | 1 | 1 | 1 |


As you can see, the search results above include `source_ids` for which we only have one of the two variables, and one or two of the three experiments.

We can tell `intake-esm` to discard any `source_id` that doesn't have *both* variables `["thetao", "o2"]` *and* all three experiments `["historical", "ssp245", "ssp585"]` by passing `require_all_on=["source_id"]` to the search method:

```python
col_subset = col.search(require_all_on=["source_id"], **query)
col_subset.df.groupby("source_id")[
    ["experiment_id", "variable_id", "table_id"]
].nunique()
```

| source_id | experiment_id | variable_id | table_id |
| --- | --- | --- | --- |
| ACCESS-ESM1-5 | 3 | 2 | 1 |


Notice that with `require_all_on=["source_id"]`, the only `source_id` returned by our query is the one for which all of the requested variables and experiments were found.
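
To make the difference between the two search modes concrete, here is a minimal pure-Python sketch of the idea. It is not `intake-esm`'s actual implementation: the `search` helper, the toy `rows` list, and the model names `model-a` and `model-b` are all hypothetical, and the sketch assumes that a group is kept only when it contains every combination of the requested values.

```python
from itertools import product

# Toy catalog (hypothetical entries): model-a has everything, model-b does not.
rows = [
    {"source_id": "model-a", "variable_id": "thetao", "experiment_id": "historical"},
    {"source_id": "model-a", "variable_id": "thetao", "experiment_id": "ssp585"},
    {"source_id": "model-a", "variable_id": "o2", "experiment_id": "historical"},
    {"source_id": "model-a", "variable_id": "o2", "experiment_id": "ssp585"},
    {"source_id": "model-b", "variable_id": "thetao", "experiment_id": "historical"},
    {"source_id": "model-b", "variable_id": "tos", "experiment_id": "historical"},
]


def search(rows, require_all_on=None, **query):
    """Hypothetical stand-in for a catalog search over a list of dicts."""
    # Step 1: keep rows whose value in each queried column is a requested value.
    matches = [
        row for row in rows
        if all(row[col] in wanted for col, wanted in query.items())
    ]
    if require_all_on is None:
        return matches

    # Step 2: group the matches and keep only the "complete" groups.
    group_cols = [require_all_on] if isinstance(require_all_on, str) else list(require_all_on)
    other_cols = [col for col in query if col not in group_cols]
    # Assumption: a complete group holds every combination of requested values.
    required = set(product(*(query[col] for col in other_cols)))

    found = {}
    for row in matches:
        key = tuple(row[col] for col in group_cols)
        found.setdefault(key, set()).add(tuple(row[col] for col in other_cols))

    return [
        row for row in matches
        if found[tuple(row[col] for col in group_cols)] == required
    ]


query = dict(
    variable_id=["thetao", "o2"],
    experiment_id=["historical", "ssp585"],
    source_id=["model-a", "model-b"],
)

any_match = search(rows, **query)
all_match = search(rows, require_all_on="source_id", **query)

print(sorted({row["source_id"] for row in any_match}))
print(sorted({row["source_id"] for row in all_match}))
```

In this toy catalog the default search keeps rows from both models, while the `require_all_on="source_id"` search keeps only the rows from `model-a`, the one model that provides both variables for both experiments.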


Thanks to [Julius Busecke](https://github.com/jbusecke) for proposing this feature and reviewing the implementation. 

### Single File Catalogs



Earlier versions of the [esm collection spec](https://github.com/NCAR/esm-collection-spec) required that the `catalog_file` entry in the input JSON file point to a CSV file. In some cases, it is useful to embed the content that would otherwise live in the CSV file directly in the input JSON file. To support this use case, a `catalog_dict` entry was added to the esm collection spec (see [NCAR/esm-collection-spec#15](https://github.com/NCAR/esm-collection-spec/pull/15)).

Example: 


`sample-collection.json`:
```json
{
    "esmcat_version":"0.1.0",
    "id":"aws-cesm1-le",
    "description":"This is an ESM collection for CESM1 Large Ensemble Zarr dataset publicly available on Amazon S3 (us-west-2 region)",
    "catalog_file": "sample-catalog.csv",
    "attributes":[
        { "column_name":"component", "vocabulary":""},
        { "column_name":"frequency", "vocabulary":""},
        { "column_name":"experiment", "vocabulary":""},
        { "column_name":"variable", "vocabulary": ""}
        ],
        
    "assets":{ "column_name":"path", "format":"zarr"},
    ...
 }

```

`sample-catalog.csv`:
```
component,frequency,experiment,variable,path
atm,daily,20C,FLNS,s3://ncar-cesm-lens/atm/daily/cesmLE-20C-FLNS.zarr
atm,daily,20C,FLNSC,s3://ncar-cesm-lens/atm/daily/cesmLE-20C-FLNSC.zarr
```
    
    
For the example above, we have both `sample-collection.json` and `sample-catalog.csv` files. We can put the content of the `sample-catalog.csv` in the `sample-collection.json` file as follows:
    


`sample-collection.json`:
```json
{
    "esmcat_version":"0.1.0",
    "id":"aws-cesm1-le",
    "description":"This is an ESM collection for CESM1 Large Ensemble Zarr dataset publicly available on Amazon S3 (us-west-2 region)",
    "catalog_dict":[
        {
            "component":"atm",
            "frequency":"daily",
            "experiment":"20C",
            "variable":"FLNS",
            "path":"s3://ncar-cesm-lens/atm/daily/cesmLE-20C-FLNS.zarr"
        },
        {
            "component":"atm",
            "frequency":"daily",
            "experiment":"20C",
            "variable":"FLNSC",
            "path":"s3://ncar-cesm-lens/atm/daily/cesmLE-20C-FLNSC.zarr"
        },
        ...
    ],
    "attributes":[
        { "column_name":"component", "vocabulary":""},
        { "column_name":"frequency", "vocabulary":""},
        { "column_name":"experiment", "vocabulary":""},
        { "column_name":"variable", "vocabulary": ""}
        ],
        
    "assets":{ "column_name":"path", "format":"zarr"},
    ...
    
}
```


As you can see, the `catalog_file` entry has been replaced with the corresponding `catalog_dict` entry. We now only have to keep track of a single file (`sample-collection.json`), which `intake-esm` can parse:
 

```python
import intake
col = intake.open_esm_datastore("sample-collection.json")
```
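
If you already have a collection with a separate CSV file, the conversion to a single file can be scripted. The following is a minimal sketch using only the Python standard library; the `embed_catalog` helper is hypothetical (it is not part of `intake-esm`), and the collection is abbreviated to a few of the fields shown above.

```python
import csv
import io
import json


def embed_catalog(collection, csv_text):
    """Return a copy of `collection` with the CSV rows embedded as `catalog_dict`.

    Hypothetical helper for illustration; not part of intake-esm.
    """
    # Each CSV row becomes one dictionary keyed by the CSV header.
    records = [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]
    # Drop the pointer to the external CSV file and embed its rows instead.
    embedded = {key: value for key, value in collection.items() if key != "catalog_file"}
    embedded["catalog_dict"] = records
    return embedded


# Abbreviated version of the sample collection above.
collection = {
    "esmcat_version": "0.1.0",
    "id": "aws-cesm1-le",
    "catalog_file": "sample-catalog.csv",
    "assets": {"column_name": "path", "format": "zarr"},
}

csv_text = """component,frequency,experiment,variable,path
atm,daily,20C,FLNS,s3://ncar-cesm-lens/atm/daily/cesmLE-20C-FLNS.zarr
atm,daily,20C,FLNSC,s3://ncar-cesm-lens/atm/daily/cesmLE-20C-FLNSC.zarr
"""

single_file = embed_catalog(collection, csv_text)
print(json.dumps(single_file, indent=4))
```

In practice you would read the JSON and CSV content from disk and write the result back out with `json.dump`; in-memory strings are used here only to keep the sketch self-contained.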

Thanks to [Joe Hamman](https://github.com/jhamman) for proposing this feature and reviewing the implementation. Thanks to [Brian Bonnlander](https://github.com/bonnland) for implementing this feature. 

### Relative paths for catalog files

Fetching and loading catalog files in earlier versions of `intake-esm` required using absolute paths/URLs for the catalog file (CSV).

For example:


- `old_sample.json`: 
```json
{
  "esmcat_version": "0.1.0",
  "id": "campaign-cesm2-cmip6-timeseries",
  "description": "ESM collection for the CESM2 raw output that went into CMIP6 data. Located in campaign storage, accessible via GLADE on casper",
  "catalog_file": "/glade/collections/cmip/catalog/intake-esm-datastore/catalogs/campaign-cesm2-cmip6-timeseries.csv.gz",
    ...
}
```
    

As of this release, the `catalog_file` entry can be either a full path or a path relative to the location of the input JSON file:


- `new_sample.json`:
```json
{
  "esmcat_version": "0.1.0",
  "id": "campaign-cesm2-cmip6-timeseries",
  "description": "ESM collection for the CESM2 raw output that went into CMIP6 data. Located in campaign storage, accessible via GLADE on casper",
  "catalog_file": "campaign-cesm2-cmip6-timeseries.csv.gz",
    ...
}
```
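
The resolution rule can be sketched in a few lines of Python. This is only an illustration of the idea, not `intake-esm`'s actual code: the `resolve_catalog_file` helper and the `/data/catalogs/` and `https://example.com/` locations are hypothetical, and the sketch assumes POSIX-style paths or URLs.

```python
import posixpath


def resolve_catalog_file(json_path, catalog_file):
    """Resolve `catalog_file` against the location of the input JSON file.

    Hypothetical helper for illustration; not part of intake-esm.
    """
    # Absolute paths and full URLs are used as-is.
    if "://" in catalog_file or posixpath.isabs(catalog_file):
        return catalog_file
    # Anything else is interpreted relative to the directory of the JSON file.
    return posixpath.join(posixpath.dirname(json_path), catalog_file)


# Relative entry next to a local JSON file (hypothetical location).
print(resolve_catalog_file(
    "/data/catalogs/new_sample.json",
    "campaign-cesm2-cmip6-timeseries.csv.gz",
))

# Relative entry next to a JSON file served over HTTPS (hypothetical URL).
print(resolve_catalog_file(
    "https://example.com/catalogs/new_sample.json",
    "campaign-cesm2-cmip6-timeseries.csv.gz",
))

# An absolute entry is returned unchanged.
print(resolve_catalog_file(
    "/data/catalogs/new_sample.json",
    "/data/other/catalog.csv.gz",
))
```

The practical benefit is that a collection JSON file and its CSV file can be moved or copied together without editing the `catalog_file` entry.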

## Acknowledgements

The following people contributed to the [NCAR/intake-esm](https://github.com/NCAR/intake-esm) and [NCAR/esm-collection-spec](https://github.com/NCAR/esm-collection-spec) repositories since `intake-esm` release `2019.12.13` on December 13th, 2019:


- Anderson Banihirwe
- Brian Bonnlander
- Joe Hamman
- Julius Busecke