Proper way to handle failing preprocess output. #332

Open
jbusecke opened this issue Apr 9, 2021 · 2 comments
Labels
enhancement Issues that are found to be a reasonable candidate feature additions

Comments

@jbusecke
Contributor

jbusecke commented Apr 9, 2021

I am encountering an issue with one dataset when loading many CMIP6 datasets using intake-esm (see #331).

I believe this is actually an issue with the raw data, but either way it got me curious if there is a way to handle the following scenario properly:

Let's say I have two datasets (ds_a, ds_b) in two different zarr stores and an appropriately set-up intake-esm catalog.
Now I have some preprocessing function func.

func modifies something on each dataset; it works fine on ds_a but fails on ds_b.
Currently that leads to a complete failure when reading in the full catalog with .to_dataset_dict().

Is there a way to simply exclude the failing dataset but continue to process only the ones that work? This would be very helpful to me.

EDIT: In further investigating this, it seems that in #331 the preprocessing is not even needed, but I guess this question can be phrased more generally: Is there a way to still output some datasets if errors are coming up for some of them?

@andersy005
Member

@jbusecke,

Yes, I think this is doable. We could make this an optional setting that the user opts into. To make sure this doesn't happen silently, we could raise a warning that lists which keys failed.

Do you have suggestions on what the API would look like? I am imagining something along these lines:

col.to_dataset_dict(...., errors='ignore')

or

col.to_dataset_dict(...., skip_erroneous_datasets=True)

@andersy005 andersy005 added the enhancement Issues that are found to be a reasonable candidate feature additions label Apr 15, 2021
@jbusecke
Contributor Author

How about skip_errors=True? It's a happy medium?
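
For illustration only, here is a minimal sketch of the behaviour discussed in this thread: load each dataset independently, skip the ones that raise, and warn with the keys that failed. It is not intake-esm code. The helper `load_all`, the `loaders` mapping, and the plain dicts standing in for datasets are all hypothetical; only the `skip_errors` name comes from the proposal above.

```python
import warnings
from typing import Any, Callable, Dict, Optional


def load_all(
    loaders: Dict[str, Callable[[], Any]],
    preprocess: Optional[Callable[[Any], Any]] = None,
    skip_errors: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: load every entry, optionally skipping failures."""
    loaded: Dict[str, Any] = {}
    failed: Dict[str, Exception] = {}
    for key, loader in loaders.items():
        try:
            ds = loader()
            if preprocess is not None:
                ds = preprocess(ds)
        except Exception as exc:
            if not skip_errors:
                # Default behaviour: one failure aborts the whole load.
                raise
            failed[key] = exc
            continue
        loaded[key] = ds
    if failed:
        # Never skip silently: report which keys were dropped.
        warnings.warn(
            f"Skipped {len(failed)} dataset(s) that failed to load: {sorted(failed)}",
            stacklevel=2,
        )
    return loaded


def func(ds):
    """Stand-in preprocessing function that fails when 'tas' is missing."""
    if "tas" not in ds:
        raise KeyError("tas")
    return {**ds, "processed": True}


# Plain dicts stand in for the two datasets from the question.
loaders = {
    "ds_a": lambda: {"tas": [1, 2]},
    "ds_b": lambda: {"pr": [3]},
}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = load_all(loaders, preprocess=func, skip_errors=True)

print(list(result))
```

With `skip_errors=False` (the default in this sketch) the first failure propagates unchanged, which matches the current all-or-nothing behaviour described in the question.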
