Proper way to handle failing `preprocess` output. #332

jbusecke · 2021-04-09T14:57:25Z

I am encountering an issue with one dataset when loading many CMIP6 datasets using intake-esm (see #331).

I believe this is actually an issue with the raw data, but either way it got me curious if there is a way to handle the following scenario properly:

Let's say I have 2 datasets (ds_a, ds_b) in 2 different zarr stores and an appropriately set up intake-esm catalog.
Now I have some preprocessing function func.

func modifies something on each dataset, works fine on ds_a, but fails on ds_b.
Currently that will lead to a complete failure when reading in the full catalog with .to_dataset_dict().

Is there a way to simply exclude the failing dataset but continue to process only the ones that work? This would be very helpful to me.

EDIT: In further investigating this, it seems that in #331 the preprocessing is not even needed, but I guess this question can be phrased more generally: Is there a way to still output some datasets if errors are coming up for some of them?
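
To make the scenario concrete, here is a minimal sketch of the behavior being asked for, written as a user-side loop rather than anything intake-esm provides. The helper name `apply_preprocess` is hypothetical, and plain dicts stand in for the two datasets so the snippet runs without any zarr stores.

```python
def apply_preprocess(datasets, func):
    """Apply func to every dataset and return (succeeded, failed).

    Hypothetical helper, not part of intake-esm. `failed` maps each
    failing key to the exception it raised so it can be inspected later.
    """
    succeeded, failed = {}, {}
    for key, ds in datasets.items():
        try:
            succeeded[key] = func(ds)
        except Exception as exc:
            # Record the failure and move on to the next dataset.
            failed[key] = exc
    return succeeded, failed


# Stand-ins for ds_a and ds_b: plain dicts instead of real datasets.
datasets = {"ds_a": {"lev": [0, 10]}, "ds_b": {}}


def func(ds):
    # Hypothetical preprocessing step that requires a "lev" entry.
    return {**ds, "lev": [v * 100 for v in ds["lev"]]}


succeeded, failed = apply_preprocess(datasets, func)
print(sorted(succeeded))
print({key: type(exc).__name__ for key, exc in failed.items()})
```

Keeping the exceptions around, instead of discarding them, makes it possible to report the problematic datasets afterwards.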

andersy005 · 2021-04-15T14:50:03Z

@jbusecke,

Yes, I think this is doable. We could make this an optional setting that the user opts into. To be sure that this doesn't happen silently, we could raise a warning to let them know which keys failed.

Do you have suggestions on what the API would look like? I am imagining something along these lines:

col.to_dataset_dict(...., errors='ignore')

or

col.to_dataset_dict(...., skip_erroneous_datasets=True)

jbusecke · 2021-04-15T15:02:35Z

How about skip_errors=True? It's a happy medium?
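
For reference, the opt-in semantics discussed above could look roughly like the sketch below. It is not intake-esm code: `open_all` and the `openers` mapping (key to zero-argument callable) are assumptions made for illustration, and only the `skip_errors` name comes from this thread. The default still raises, and the opt-in path warns with the keys that failed.

```python
import warnings


def open_all(openers, skip_errors=False):
    """Call every opener and return the datasets that loaded.

    Hypothetical stand-in for a to_dataset_dict-like method. With
    skip_errors=False (the default) the first failure propagates; with
    skip_errors=True failures are skipped and reported in one warning.
    """
    datasets, failed_keys = {}, []
    for key, opener in openers.items():
        try:
            datasets[key] = opener()
        except Exception:
            if not skip_errors:
                raise
            failed_keys.append(key)
    if failed_keys:
        # Never skip silently: name the keys that were dropped.
        warnings.warn(f"Skipped datasets that failed to load: {failed_keys}")
    return datasets


def broken_opener():
    # Simulates a store with bad raw data.
    raise ValueError("corrupt store")


openers = {"ds_a": lambda: "dataset a", "ds_b": broken_opener}

datasets = open_all(openers, skip_errors=True)  # warns about ds_b
print(datasets)
```

Raising by default keeps the current behavior for anyone who does not pass the flag.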

andersy005 added the enhancement label Apr 15, 2021
