
Minor fixes following ESPO-R5's diagnostics #106

Merged: 5 commits into main, Nov 1, 2022
Conversation

@aulemahal (Collaborator) commented Oct 17, 2022

Pull Request Checklist:

  • pre-commit hooks are installed/active in my local clone ($ pre-commit install)
  • This PR addresses an already opened issue (for bug fixes / features)
    • This PR fixes #xyz
  • (If applicable) Documentation has been added / updated (for bug fixes / features)
  • If a merge request has been made in parallel to this PR in xscen-notebooks, it is merged and the submodules have been updated.
  • HISTORY.rst has been updated (with summary of main changes)
    • Links to the issue (:issue:number) and pull request (:pull:number) have been added

What kind of change does this PR introduce?

  • New "timeout_cleanup" option for save_to_zarr, which removes variables that were in the process of being written when receiving a TimeoutException. (:pull:106).
  • New scripting.skippable context, allowing the use of CTRL-C to skip code sections. (:pull:106)
    • properties_and_measures no longer casts month coordinates to string (:pull:106).
  • Fixed parse_config. There is a very special option that logs what args were parsed from the config. The log calls have been fixed,
  • Added a bit more information in the "insufficient coverage" log warning.
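The scripting.skippable idea can be illustrated with a minimal sketch: catching KeyboardInterrupt turns CTRL-C into a "skip this section" signal instead of aborting the whole script. This is an illustration only, not the actual xscen.scripting.skippable implementation.

```python
# Hypothetical sketch of a "skippable" context manager.
# Not the real xscen code; names and messages are placeholders.
import contextlib

@contextlib.contextmanager
def skippable(msg="Section skipped."):
    try:
        yield
    except KeyboardInterrupt:
        # CTRL-C raises KeyboardInterrupt; swallow it so the script continues.
        print(msg)

with skippable("Diagnostics skipped."):
    raise KeyboardInterrupt  # stands in for the user pressing CTRL-C
print("execution continues after the skipped section")
```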

Does this PR introduce a breaking change?

No.

Other information:

These are the changes I used to help run my "properties and diagnostics" script for ESPO-R5. Forget the branch name; it is meaningless.

@RondeauG (Contributor) left a comment:

Looks good! Minor comments.

@@ -244,8 +244,6 @@ def properties_and_measures(
# to be able to save in zarr, convert object to string
if "season" in ds1:
ds1["season"] = ds1.season.astype("str")
if "month" in ds1:
ds1["month"] = ds1.month.astype("str")
Contributor:

@juliettelavoie Would that break the change you are implementing in your PR?

Contributor:

see my comment below

xscen/io.py (outdated; resolved)
@@ -244,8 +244,6 @@ def properties_and_measures(
# to be able to save in zarr, convert object to string
if "season" in ds1:
ds1["season"] = ds1.season.astype("str")
if "month" in ds1:
Contributor:

Not sure why you removed month but not season?

Also, I am not 100% sure, but I think the new code I added to save_to_zarr in PR #100 might fix the problem I was trying to fix here.
The bit of code:

    # to address this issue https://github.com/pydata/xarray/issues/3476
    for v in list(ds.coords.keys()):
        if ds.coords[v].dtype == object:
            ds[v].encoding.clear()

@aulemahal (Collaborator, author):

Hum, on my side month is an int! This line would make my script fail: I was creating the ref properties with a string month coordinate, then passing them in to compute the measures of sim or scen. It raised an error above, when trying to compare both datasets, because at that point sim's properties still use integer months.

If the new code fixes it, it looks less invasive, that's good!

Contributor:

Oh OK! But yes, I think it will be fine with the new code.

xscen/extract.py (outdated; resolved)
        return ds.to_zarr(
            filename, compute=compute, mode="a", encoding=encoding, **zarr_kwargs
        )
except TimeoutException:
Contributor:

I don't mind keeping that, but it doesn't seem that useful to me if you use a catalog.
If you follow the order [not exists_in_catalog, xscen function, save_to_zarr, update_catalog], it doesn't matter that the file is half written; it will be overwritten.
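The ordering argument can be sketched like this: because the catalog is updated only after a complete write, a half-written store is simply not in the catalog and gets redone (overwritten) on the next pass. All names below are placeholders, not the actual xscen API.

```python
# Placeholder sketch of the [not in catalog -> compute/write -> update catalog]
# ordering; none of these names are the real xscen API.
catalog = set()  # records only fully written stores

def write_store(path, fail=False):
    if fail:
        raise TimeoutError("simulated frozen write")  # half-written store left behind

def process(path, fail=False):
    if path in catalog:            # step 1: skip what is already done
        return "skipped"
    write_store(path, fail=fail)   # steps 2-3: compute and write (overwriting)
    catalog.add(path)              # step 4: update the catalog only on success
    return "written"

try:
    process("tasmax.zarr", fail=True)  # first pass times out mid-write
except TimeoutError:
    pass
print(process("tasmax.zarr"))          # second pass: not in catalog, so rewritten
```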

Contributor:

Also, we use the timeout to get out of dask being frozen. Will it even be able to do the removal, or will it keep being frozen?

@aulemahal (Collaborator, author):

In my case the usefulness comes from the fact that I am appending to the zarr dataset. I think I am indeed not using the catalog to its full potential. But my workflow creates the full dataset, then writes it to zarr, skipping all variables that already exist. This timeout cleanup ensures partially written variables are rewritten on the second pass.

AFAIU, within the except TimeoutException: block, the fact that dask hangs is not important. The shutil calls do not go through dask, so the files are removed even if dask objects are still "linked" to them. My use case looks like:

    from dask.distributed import Client

    for dataset in catalog:
        with Timeout():
            with Client():
                out = do_things(dataset)
                save_to_zarr(out)

Thus, on timeout the files are first removed, then the Client is killed and its objects are garbage collected, and then we move to the next iteration, restart the counter, and create a new client.
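A rough sketch of what such a cleanup could do (an assumption for illustration; the real timeout_cleanup may differ): each zarr variable lives in its own sub-directory of the store, so removing those directories with shutil, which never touches dask, leaves the rest of the store openable.

```python
# Assumed sketch of a timeout cleanup: remove the directories of variables
# that were mid-write when the timeout hit. shutil works on plain files,
# so a hung dask graph cannot block it. Not the real xscen implementation.
import shutil
from pathlib import Path

def cleanup_partial_write(store, variables):
    """Delete the given variables from a zarr store on disk."""
    for name in variables:
        vdir = Path(store) / name        # each zarr variable is a sub-directory
        if vdir.exists():
            shutil.rmtree(vdir)          # plain filesystem call, no dask involved
```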

@aulemahal (Collaborator, author):

A small advantage of timeout_cleanup in your case is that before you re-run your workflow to patch the partially written files, the zarr stores are still openable. Without the cleanup, you would get broken datasets.
Again, this applies when you are appending to an existing dataset or writing by iterating over each variable.

Contributor:

OK, I understand that this approach makes sense when you write many indicators into one zarr.

@aulemahal aulemahal merged commit b53bb9d into main Nov 1, 2022
@aulemahal aulemahal deleted the propmeas-combo branch November 1, 2022 21:32