More intuitive exclusions #280

Merged
merged 3 commits into from Oct 31, 2023
1 change: 1 addition & 0 deletions HISTORY.rst
@@ -30,6 +30,7 @@ Bug fixes
 * Fixed a bug in ``xs.search_data_catalogs`` when searching for fixed fields and specific experiments/members. (:pull:`251`).
 * Fixed a bug in the documentation build configuration that prevented stable/latest and tagged documentation builds from resolving on ReadTheDocs. (:pull:`256`).
 * Fixed ``get_warming_level`` to avoid incomplete matches. (:pull:`269`).
+* `search_data_catalogs` now eliminates anything that matches any entry in `exclusions`. (:issue:`275`, :pull:`280`).
 
 Internal changes
 ^^^^^^^^^^^^^^^^
6 changes: 4 additions & 2 deletions tests/test_extract.py
@@ -28,9 +28,11 @@ def test_basic(self, variables_and_freqs, other_arg):
             other_search_criteria={"experiment": ["ssp585"]}
             if other_arg == "other"
             else None,
-            exclusions={"member": "r2.*"} if other_arg == "exclusion" else None,
+            exclusions={"member": "r2.*", "domain": ["gr2"]}
+            if other_arg == "exclusion"
+            else None,
         )
-        assert len(out) == 13 if other_arg is None else 2 if other_arg == "other" else 6
+        assert len(out) == 13 if other_arg is None else 2 if other_arg == "other" else 4
 
     @pytest.mark.parametrize(
         "periods, coverage_kwargs",
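A note on the assertion in this test, offered as a sketch and not as part of the merged change: in Python, `==` binds more tightly than a conditional expression, so the assertion as written only compares `len(out)` against 13 in the `other_arg is None` case, and evaluates to the truthy integers 2 or 4 otherwise. The helper names and the value 99 below are hypothetical, chosen only to illustrate the parsing.

```python
def as_written(n, other_arg):
    # Parsed as: (n == 13) if other_arg is None else (2 if other_arg == "other" else 4)
    return n == 13 if other_arg is None else 2 if other_arg == "other" else 4


def parenthesized(n, other_arg):
    # Compares n against whichever expected count applies to the case.
    return n == (13 if other_arg is None else 2 if other_arg == "other" else 4)


# 99 stands in for a wrong result count (hypothetical value).
print(as_written(99, "other"))     # 2, which is truthy, so an assert would pass
print(parenthesized(99, "other"))  # False, so an assert would fail
```

If the intent is to check the count in every parametrized case, wrapping the conditional in parentheses as in `parenthesized` would be one way to do it.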
15 changes: 9 additions & 6 deletions xscen/extract.py
@@ -616,7 +616,7 @@ def search_data_catalogs(
         You can also pass 'require_all_on: list(columns_name)' in order to only return results that correspond to all other criteria across the listed columns.
         More details available at https://intake-esm.readthedocs.io/en/stable/how-to/enforce-search-query-criteria-via-require-all-on.html .
     exclusions : dict, optional
-        Same as other_search_criteria, but for eliminating results.
+        Same as other_search_criteria, but for eliminating results. Any result that matches any of the exclusions will be removed.
     match_hist_and_fut: bool, optional
         If True, historical and future simulations will be combined into the same line, and search results lacking one of them will be rejected.
     periods : list
@@ -712,11 +712,14 @@ def search_data_catalogs(
 
     # Cut entries that do not match search criteria
     if exclusions:
-        ex = catalog.search(**exclusions)
-        catalog.esmcat._df = pd.concat([catalog.df, ex.df]).drop_duplicates(keep=False)
-        logger.info(
-            f"Removing {len(ex.df)} assets based on exclusion dict : {exclusions}."
-        )
+        for k in exclusions.keys():
+            ex = catalog.search(**{k: exclusions[k]})
+            catalog.esmcat._df = pd.concat([catalog.df, ex.df]).drop_duplicates(
+                keep=False
+            )
+            logger.info(
+                f"Removing {len(ex.df)} assets based on exclusion dict '{k}': {exclusions[k]}."
+            )
     full_catalog = deepcopy(catalog)  # Used for searching for fixed fields
     if other_search_criteria:
         catalog = catalog.search(**other_search_criteria)
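The behavioural difference in this hunk can be sketched in plain Python, without intake-esm or pandas. This is only an analogue under stated assumptions: the rows, the `matches` helper, and the regex-based matching are hypothetical stand-ins for the catalog and `catalog.search`. A single combined search removes only entries that match every key at once, while one search per key removes entries that match any key.

```python
import re

# Toy stand-in for catalog rows; every value here is made up for illustration.
rows = [
    {"id": "a", "member": "r1i1p1f1", "domain": "gr1"},
    {"id": "b", "member": "r2i1p1f1", "domain": "gr1"},
    {"id": "c", "member": "r1i1p1f1", "domain": "gr2"},
    {"id": "d", "member": "r2i1p1f1", "domain": "gr2"},
]


def matches(row, column, patterns):
    """Hypothetical helper: True if row[column] matches any of the patterns."""
    if isinstance(patterns, str):
        patterns = [patterns]
    return any(re.search(p, row[column]) for p in patterns)


exclusions = {"member": "r2.*", "domain": ["gr2"]}

# Combined search: a row is dropped only if it matches every key at once.
kept_combined = [
    r["id"]
    for r in rows
    if not all(matches(r, k, v) for k, v in exclusions.items())
]

# Per-key search: a row is dropped as soon as it matches any single key.
kept_per_key = [
    r["id"]
    for r in rows
    if not any(matches(r, k, v) for k, v in exclusions.items())
]

print(kept_combined)  # ['a', 'b', 'c']
print(kept_per_key)   # ['a']
```

With the combined search only row `d` is removed, whereas the per-key version also removes `b` and `c`, which matches the updated docstring: any result that matches any of the exclusions is removed.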