Slight refactoring of diag `galytska23/select_variables_for_tigramite.py` for generality and portability (for Changelog v2.10: authors: @valeriupredoi and @egalytska) #3298

valeriupredoi · 2023-07-18T15:27:43Z

Description

@egalytska and myself are trying to refactor galytska23/select_variables_for_tigramite.py diagnostic so it is:

more readable
allows for more variables to be added and handled
Closes Further improvement of the diagnostic select_variables_for_tigramite.py #3256
Link to documentation: to be added at final version run

Before you get started

☝ Create an issue to discuss what you are going to do

Checklist

It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.

🛠 This pull request has a descriptive title
🛠 Code is written according to the code quality guidelines
🛠 Documentation is available
🛠 Tests run successfully
🛠 All checks below this pull request were successful

New or updated recipe/diagnostic

🧪 Recipe runs successfully
🧪 Recipe is well documented
🧪 Figure(s) and data look as expected from literature
🛠 Provenance information has been added

valeriupredoi · 2023-07-19T13:58:01Z

@egalytska I have slightly refactored the diag, and made some variable names changes (code variables, not actual analysis variables eg "key" -> "dataset" etc - so that the code reads better), so that the thing is more readable, also simplified some functionality, and, more importantly, I pulled all your references to specific variables out of the bellows of code, so that they can be set at a higher level. I have not tested it, but I don't expect it to fail, or to produce different results, since I've kept the functionality as it was before. Note the variables dictionaries at the very top - you can pull those out and make them recipe options now, so you allow contributors to add/remove variables etc. If you are happy with these changes, please take it from here and tune it more 👍

esmvaltool/diag_scripts/galytska23/select_variables_for_tigramite.py

egalytska

Hi @valeriupredoi
Thanks a lot for the amazing suggestions.
Unfortunately, the code crashed (please see the comments), but I suggested some alternatives.
Now the test run was successful.
But Levante is super slow today, so the full run takes forever (to compare the results).
P.S. This is my first review, I hope I did not mess things up.

valeriupredoi · 2023-07-20T14:33:57Z

excellent review work @egalytska 🎖️ Please commit the suggested changes straight to the branch here - me not having run the actual thing was not aware of the subtleties of previous implementation - this way we can still keep the higher degree of generality, and it also works (which is rather important 😁 ) 🍺

Also, I'll leave it up to you how to bring forth those variables and expose them to the API outside the code (if you think that's useful - I reckon so) ie put them straight into the recipe, or create a configuration file for the diag etc. Let me know if I can help in any way 👍

valeriupredoi · 2023-08-02T15:17:07Z

hi @egalytska you got a bit of time operate the changes you requested - no worries for time, people are mostly on holidays around this time, will go on hols myself in a couple weeks 🏖️

egalytska · 2023-08-03T06:14:18Z

Hi @valeriupredoi
I just came back from vacation myself. Planning to fix this next days.

…ite.py

egalytska · 2023-08-07T11:40:25Z

Hi @valeriupredoi , I did several tests to check if the output is the same. All looks good. Should I now go for "Ready for review"?

valeriupredoi · 2023-08-07T15:16:29Z

stellar work @egalytska - many thanks for your changes and checks, I made this RfR, and if you are happy with it, please approve, so we can get it in 🍺

egalytska

Hi @valeriupredoi. I am approving this.
Though there is an issue: for a few days this diagnostics is incredibly slow for me in Levante. And I am concerned whether its the fault of Levante, my diagnostics, or smth in ESMVAlTool?
To be on a safe side, I tested my original diagnostics (before its improvements) and it is now also incredibly slow (which it was not before).
So I wanted to ask whether you could do a test run (just with a few models for a short time, and without cvdp), so we could hopefully track down the reason.

valeriupredoi · 2023-08-08T16:07:18Z

Many thanks, Evgenia! I will, tomorrow - not too fussed about it since it's most probably Levante acting up, but a test won't hurt 👍

remi-kazeroni · 2023-08-09T11:16:49Z

Hi @valeriupredoi. I am approving this. Though there is an issue: for a few days this diagnostics is incredibly slow for me in Levante. And I am concerned whether its the fault of Levante, my diagnostics, or smth in ESMVAlTool? To be on a safe side, I tested my original diagnostics (before its improvements) and it is now also incredibly slow (which it was not before). So I wanted to ask whether you could do a test run (just with a few models for a short time, and without cvdp), so we could hopefully track down the reason.

Could this be related to the new Dask configuration? recipe_galytska23jgr was run successfully for the release of v2.9 and it took about 2h on a Levante compute node (see this table). Maybe this is one of the recipes for which the default configuration leads to longer runtimes?

valeriupredoi · 2023-08-09T15:04:20Z

cheers @remi-kazeroni - running a minitest now myself; don't think it's the Dask issue since @egalytska could run fine this very recipe using this branch until the very last time she tested, unless she's updated her env fully to bring in the new Core and iris for the last run. Think it's ready to merge, would you have a minute to have a quick high-level look at this PR, Remi, pls mate? 🍺

valeriupredoi · 2023-08-10T11:30:29Z

OK my minitest is indeed taking for ages, bit it's the same with the latest main - the problem is the preprocessing and not the diagnostic (it didn't even reach diag running) - it seems almost 100% that @remi-kazeroni is correct about the new Dask+iris being the roadblock here, so this can be merged (I would literally eat my hat if the changes we made here did indeed slow down the show). @egalytska when you tested this PR and it ran fast enough for you, do you remember which esmvalcore/iris version did you use? At any rate, I am very happy with this PR (from a technical standpoint, no clue about the science, but that's Evgenia's land) 🍺

egalytska · 2023-08-10T12:41:40Z

So my environment has ESMValCore v2.9.0 and iris 3.6.0 and I did not update the environment between the "fast" runs and "slow" runs. I think the runs became this slow for me after I pulled the main branch into my local branch. Does it make sense?

valeriupredoi · 2023-08-10T13:35:31Z

So my environment has ESMValCore v2.9.0 and iris 3.6.0 and I did not update the environment between the "fast" runs and "slow" runs. I think the runs became this slow for me after I pulled the main branch into my local branch. Does it make sense?

cheers @egalytska - only Thor knows, it could be the file system is just slow on Levante since a while ago, it could be the shared node is busier than normal, all manners of causes, defo not this poor branch here 😁

remi-kazeroni · 2023-08-10T15:32:33Z

@egalytska and @valeriupredoi, I have just run successfully recipe_galytska23jgr on a Levante compute node using the latest main branches for Tool and Core and got the same runtime and memory consumption as we had for the testing of v2.9. I'm not saying the performance is "optimal", but simply that this has not changed since the release of version v2.9. I will make another test using the updated diagnostic from this branch. If the performance is still unchanged, this can be merged.

For reference, we have a utility script that produces batch scripts for DKRZ-Levante with parameters automatically tweaked for each recipe. This is what I have just used again to get something like:

#!/bin/bash -l

#SBATCH --job-name=recipe_galytska23jgr.%J
#SBATCH --output=recipe_galytska23jgr.%J.out
#SBATCH --error=recipe_galytska23jgr.%J.err
#SBATCH --account=bdXXXX
#SBATCH --partition=compute
#SBATCH --time=04:00:00
#SBATCH --mem=0

set -eo pipefail
unset PYTHONPATH

. PATH_TO/mambaforge/etc/profile.d/conda.sh
conda activate my_env

esmvaltool run recipe_galytska23jgr.yml

valeriupredoi · 2023-08-10T15:44:06Z

thanks, Remi, breathing a statistics-induced sigh of relief 😁

remi-kazeroni · 2023-08-11T07:42:38Z

Running the recipe with this branch leads to very similar runtime (Time for running the recipe was: 2:05:49.256957), so nothing to worry about. I agree with @valeriupredoi, it's good to see that there is no further performance decrease since v2.9.0 release.

I understand from @egalytska 's comments that the recipe is slower than what it used to be with earlier versions of the Core. But this is part of a larger issue related to the ongoing daskification that has some side-effects in terms of performance...

This PR is ready to be merged imo

remi-kazeroni

Thanks for your contribution @valeriupredoi and @egalytska! I could run the recipe successfully and the time series produced look identical to that of results from v2.9 testing 👍

valeriupredoi · 2023-08-11T10:37:26Z

brilliant, very many thanks @remi-kazeroni for spending time test it 🍺 And many thanks to @egalytska for the work here too 🍺

@valeriupredoi

…_RTW * recipe_test_workflow_prototype: (237 commits) Add version to dataset in python example recipe to avoid "Unknown file format" issue on JASMIN (#3322) CMORizer for NASA MERRA reanalysis (#3039) Add `OBS-maintainers` team to documentation on OBS data maintenance and CMORizer reviews (#3335) Fixed provenance tracking for NCL multipanel PNGs (#3332) Cmorizer for NOAA-CIRES-20CR v3 reanalysis (clt, clwvi, hus, prw, rlut, rlutcs, rsut, rsutcs) (#3137) [Condalock] Update Linux condalock file (#3321) Slight refactoring of diag `galytska23/select_variables_for_tigramite.py` for generality and portability (for Changelog v2.10: authors: @valeriupredoi and @egalytska) (#3298) Removed recipe_carvalhais14nat from list of broken recipes (#3319) add Romain Beucher to CITATION as contributor (#3318) update `mamba` version in readthedocs configuration docs builds (#3310) [Github Actions] Compress all bash shell setters into one default option per workflow (#3315) [condalock] update conda lock creation Github Action workflow and ship updated (bot-generated) conda-lock file (#3307) Allow NCL unit conversion `kg s-1` -> `GtC y-1` (#3300) Add list of failing recipes for v2.9.0 release (#3294) Update diag_shapeselect.py to work with shapely v2 (#3283) Update release schedule after release of v2.9.0 (#3289) Add merge instructions to release instructions (#3292) Made sklearn test backwards-compatible with sklearn < 1.3 (#3285) Add release notes for v2.9 (#3266) Add release notes for v2.9 (#3266) ...

@valeriupredoi

….py` for generality and portability (for Changelog v2.10: authors: @valeriupredoi and @egalytska) (#3298) Co-authored-by: Evgenia Galytska <egalytska@iup.physik.uni-bremen.de>

first set of simplifications

d423cde

valeriupredoi added enhancement diagnostic labels Jul 18, 2023

valeriupredoi added this to the v2.10.0 milestone Jul 18, 2023

valeriupredoi requested a review from egalytska July 18, 2023 15:27

typo

968d3d1

valeriupredoi mentioned this pull request Jul 18, 2023

Further improvement of the diagnostic select_variables_for_tigramite.py #3256

Closed

valeriupredoi added 3 commits July 18, 2023 17:18

second round of refactoring

b17f0aa

yapf it

ee91009

make the plotting func more readable

e1ae770

egalytska reviewed Jul 20, 2023

View reviewed changes

esmvaltool/diag_scripts/galytska23/select_variables_for_tigramite.py Outdated Show resolved Hide resolved

egalytska reviewed Jul 20, 2023

View reviewed changes

esmvaltool/diag_scripts/galytska23/select_variables_for_tigramite.py Outdated Show resolved Hide resolved

egalytska reviewed Jul 20, 2023

View reviewed changes

esmvaltool/diag_scripts/galytska23/select_variables_for_tigramite.py Outdated Show resolved Hide resolved

egalytska reviewed Jul 20, 2023

View reviewed changes

egalytska added 5 commits August 3, 2023 13:37

Update esmvaltool/diag_scripts/galytska23/select_variables_for_tigram…

8d8529f

…ite.py

Update esmvaltool/diag_scripts/galytska23/select_variables_for_tigram…

01d0033

…ite.py

Update esmvaltool/diag_scripts/galytska23/select_variables_for_tigram…

c27e5f9

…ite.py

typo fixes

1529071

Merge branch 'main' into improve_diag_select_vars_tigramite_galytska23

e733760

valeriupredoi marked this pull request as ready for review August 7, 2023 15:15

egalytska approved these changes Aug 8, 2023

View reviewed changes

valeriupredoi mentioned this pull request Aug 8, 2023

Add two authors in release Changelog (per PR) if contributions warrant it ESMValGroup/ESMValCore#2167

Open

remi-kazeroni approved these changes Aug 11, 2023

View reviewed changes

valeriupredoi merged commit a8186dd into main Aug 11, 2023
6 checks passed

valeriupredoi deleted the improve_diag_select_vars_tigramite_galytska23 branch August 11, 2023 10:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Slight refactoring of diag `galytska23/select_variables_for_tigramite.py` for generality and portability (for Changelog v2.10: authors: @valeriupredoi and @egalytska) #3298

Slight refactoring of diag `galytska23/select_variables_for_tigramite.py` for generality and portability (for Changelog v2.10: authors: @valeriupredoi and @egalytska) #3298

valeriupredoi commented Jul 18, 2023 •

edited

valeriupredoi commented Jul 19, 2023

egalytska left a comment

valeriupredoi commented Jul 20, 2023 •

edited

valeriupredoi commented Aug 2, 2023

egalytska commented Aug 3, 2023

egalytska commented Aug 7, 2023

valeriupredoi commented Aug 7, 2023

egalytska left a comment

valeriupredoi commented Aug 8, 2023

remi-kazeroni commented Aug 9, 2023

valeriupredoi commented Aug 9, 2023

valeriupredoi commented Aug 10, 2023

egalytska commented Aug 10, 2023

valeriupredoi commented Aug 10, 2023 •

edited

remi-kazeroni commented Aug 10, 2023

valeriupredoi commented Aug 10, 2023

remi-kazeroni commented Aug 11, 2023

remi-kazeroni left a comment

valeriupredoi commented Aug 11, 2023

Slight refactoring of diag galytska23/select_variables_for_tigramite.py for generality and portability (for Changelog v2.10: authors: @valeriupredoi and @egalytska) #3298

Slight refactoring of diag galytska23/select_variables_for_tigramite.py for generality and portability (for Changelog v2.10: authors: @valeriupredoi and @egalytska) #3298

Conversation

valeriupredoi commented Jul 18, 2023 • edited

Description

Before you get started

Checklist

New or updated recipe/diagnostic

valeriupredoi commented Jul 19, 2023

egalytska left a comment

Choose a reason for hiding this comment

valeriupredoi commented Jul 20, 2023 • edited

valeriupredoi commented Aug 2, 2023

egalytska commented Aug 3, 2023

egalytska commented Aug 7, 2023

valeriupredoi commented Aug 7, 2023

egalytska left a comment

Choose a reason for hiding this comment

valeriupredoi commented Aug 8, 2023

remi-kazeroni commented Aug 9, 2023

valeriupredoi commented Aug 9, 2023

valeriupredoi commented Aug 10, 2023

egalytska commented Aug 10, 2023

valeriupredoi commented Aug 10, 2023 • edited

remi-kazeroni commented Aug 10, 2023

valeriupredoi commented Aug 10, 2023

remi-kazeroni commented Aug 11, 2023

remi-kazeroni left a comment

Choose a reason for hiding this comment

valeriupredoi commented Aug 11, 2023

Slight refactoring of diag `galytska23/select_variables_for_tigramite.py` for generality and portability (for Changelog v2.10: authors: @valeriupredoi and @egalytska) #3298

Slight refactoring of diag `galytska23/select_variables_for_tigramite.py` for generality and portability (for Changelog v2.10: authors: @valeriupredoi and @egalytska) #3298

valeriupredoi commented Jul 18, 2023 •

edited

valeriupredoi commented Jul 20, 2023 •

edited

valeriupredoi commented Aug 10, 2023 •

edited