Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate defining write_plots and write_netcdf in config-user file #808

Merged
merged 14 commits into from Jan 29, 2021

Conversation

bouweandela
Copy link
Member

@bouweandela bouweandela commented Oct 2, 2020

Deprecate use of write_plots, write_netcdf, output_file_type config-user file

Please define those settings in the diagnostic script setting of the recipe instead, for those diagnostic scripts that support these settings

See #93

Tasks

  • Create an issue to discuss what you are going to do, if you haven't done so already (and add the link at the bottom)
  • This pull request has a descriptive title that can be used in a changelog
    Add unit tests Skipped because all this code is going to be replaced in Implement importable config object #785 soon
  • Update documentation
  • Circle/CI tests pass. Status can be seen below your pull request. If the tests are failing, click the link to find out why.
  • Codacy code quality checks pass. Status can be seen below your pull request. If there is an error, click the link to find out why. If you suspect Codacy may be wrong, please ask by commenting.
  • If you make backward incompatible changes to the recipe format, make a new pull request in the ESMValTool repository and add the link below

…fig-user.yml

Please define those settings in the diagnostic script setting of the recipe instead, for those diagnostic scripts that support these settings
@bouweandela
Copy link
Member Author

@stefsmeets Do you think it still makes sense to implement unit tests for this, given that you're going to change everything anyway in #785?

Copy link
Contributor

@stefsmeets stefsmeets left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I was thinking just marking the config parameters is enough (see comments).

@@ -1242,15 +1242,22 @@ def _initialize_scripts(self, diagnostic_name, raw_scripts,
if self._cfg['write_ncl_interface']:
settings['exit_on_ncl_warning'] = self._cfg['exit_on_warning']
for key in (
'output_file_type',
Copy link
Contributor

@stefsmeets stefsmeets Oct 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is enough to just mark the parameters as deprecated in the config and raise a warning. I think it would be better to keep this part of the code 'as-is' for now. The checks to re-add the keys below adds clutter. Save these for a later patch, and apply it to remove the parameters from the entire codebase for 2.3.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with keeping this as is, is that the parameters from config-user.yml then overwrite what it says in the recipe, and people might get confused that it doesn't work if they add the setting in the recipe

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if the warning is given for the config, then that should be enough in my opinion. Maybe it would be then a good idea to do a double check in the recipe runner and re-issue the warning there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would be then a good idea to do a double check in the recipe runner and re-issue the warning there.

Do you mean here by the 'recipe runner'? Wouldn't that result in the same amount of code clutter, but less user friendly?

Copy link
Contributor

@stefsmeets stefsmeets Oct 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I was thinking if that is the problem, we can also warn users as late as possible -- when the recipe is being executed by what I called the recipe runner 😅

I did not understand that the variables from the user config overwrite the recipe, which I guess is what added the complexity in the first place. Whatever is added here, we should make sure it is easy to undo.

@@ -367,10 +367,7 @@ def _write_ncl_settings(self):
'run_dir',
'plot_dir',
'work_dir',
'output_file_type',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, don't delete these variables just yet and remove the block to add them back below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some change is needed here, because otherwise these settings will not be available as variable diag_script_info@write_plots etc, making it impossible to adjust diagnostic scripts to the new situtation before they are removed in 2.3. As an intermediate step, I thought I would make them available as both diag_script_info@write_plots and config_user_info@write_plots and then remove the latter in version 2.3.

@@ -549,15 +553,12 @@ def _collect_provenance(self):
'exit_on_ncl_warning',
'input_files',
'log_level',
'output_file_type',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would ignore recording their value in the provenance, so it would break the idea that all diagnostic script settings are recorded in the provenance

esmvalcore/_config.py Show resolved Hide resolved
@stefsmeets
Copy link
Contributor

@stefsmeets Do you think it still makes sense to implement unit tests for this, given that you're going to change everything anyway in #785?

Should be alright. The fact that they are deprecated means we should not be testing for it anyway.

@bouweandela
Copy link
Member Author

Looks good. I was thinking just marking the config parameters is enough (see comments).

Thanks for reviewing!

Copy link
Contributor

@valeriupredoi valeriupredoi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

am not actually requesting changes, I am just thinking these options are indeed not needed anymore but the functional infrastructure they offer is useful - am thinking for the future when we may get to a point where we use a web based interface that runs diagnostics in the backend and the user gets a nice output that they can immediately use for a paper or presentation etc - the write_netcdf and output_format would be nice to be a tunable button

esmvalcore/_config.py Show resolved Hide resolved
@schlunma
Copy link
Contributor

schlunma commented Oct 2, 2020

I vote strongly against removing output_file_type. Especially having the command line option output-file-type to quickly change the file format for plots is really convenient. For example, I often switch between PDF (used for TeX or for seperate files of plots sent to journals) and PNG (used for Word).

I don't have a strong opinion about the other two options (more in favor of removing them), but I would really like to keep output_file_type.

@valeriupredoi
Copy link
Contributor

I vote strongly against removing output_file_type. Especially having the command line option output-file-type to quickly change the file format for plots is really convenient. For example, I often switch between PDF (used for TeX or for seperate files of plots sent to journals) and PNG (used for Word).

I don't have a strong opinion about the other two options (more in favor of removing them), but I would really like to keep output_file_type.

albeit never used by me (but then again I don't do science), am glad to see this being used and with a good potential for the future quick-run WPS user. Cheers for commenting @schlunma 🍺

@bouweandela
Copy link
Member Author

bouweandela commented Oct 5, 2020

Thanks for the comments @valeriupredoi and @schlunma! Did you also have a look at issue #93 for the reasoning behind removing these settings?

I vote strongly against removing output_file_type. Especially having the command line option output-file-type to quickly change the file format for plots is really convenient. For example, I often switch between PDF (used for TeX or for seperate files of plots sent to journals) and PNG (used for Word).

@schlunma It would be even more convenient if the diagnostic simply wrote both file formats, because then you wouldn't even need to re-run or change a setting to get the plots in a different the format. Since plotting is not a very time-consuming process, I think the time saving is very minimal. The arguments against having the feature are that it provides a poor user experience because many of the diagnostics do not respect the setting and the user needs to select the right output type before running. Also, it might not be so easy to implement this feature consistently across all diagnostics, for example: I doubt whether the R plotting library supports the same output file types as the Python plotting library or the NCL plotting library. I can add a small plotting function to the ESMValTool diag_scripts/shared function that plots several types by default and use it from the example recipe for demonstration purposes, would that make sense? Do you think .png and .pdf would be enough, or would other file formats also make sense?

am thinking for the future when we may get to a point where we use a web based interface that runs diagnostics in the backend and the user gets a nice output that they can immediately use for a paper or presentation etc - the write_netcdf and output_format would be nice to be a tunable button

@valeriupredoi write_netcdf might reduce the runtime and disk space usage of diagnostics that write very large output files, but the majority of diagnostics write very small netcdf files, because there is only so much information that can go into a plot. Therefore it's much more convenient to simply always write all netcdf files. For diagnostic developers who really want to be able to disable this, we can recommend in the documentation that they implement this setting in the recipe. For a web based interface it would probably be nicest to have the netcdf files, because then you can implement an interactive viewer so that the users of the website can zoom in, etc. Regarding configuring the output file type: it's much simpler to always write all plot types, the extra compute time won't be very much.

@schlunma
Copy link
Contributor

schlunma commented Oct 5, 2020

@schlunma It would be even more convenient if the diagnostic simply wrote both file formats, because then you wouldn't even need to re-run or change a setting to get the plots in a different the format. Since plotting is not a very time-consuming process, I think the time saving is very minimal. The arguments against having the feature are that it provides a poor user experience because many of the diagnostics do not respect the setting and the user needs to select the right output type before running. Also, it might not be so easy to implement this feature consistently across all diagnostics, for example: I doubt whether the R plotting library supports the same output file types as the Python plotting library or the NCL plotting library. I can add a small plotting function to the ESMValTool diag_scripts/shared function that plots several types by default and use it from the example recipe for demonstration purposes, would that make sense? Do you think .png and .pdf would be enough, or would other file formats also make sense?

Alright, that does make sense for me. A small function that writes .png and .pdf would be sufficient for me. Maybe something like this (to consider matplotlib.pyplot.savefig and matplotlib.figure.Figure.savefig)?

import matplotlib.pyplot as plt

def savefig(basename, cfg, fig=None, ext=None **kwargs):
    if ext is None:
        ext = ['png', 'pdf']
    paths = ...  # get correct paths with basename, ext and cfg
    plot_object = plt if fig is None else fig
    for path in paths:
        plot_object.savefig(path, **kwargs)

Just as a small side note: Plotting might be time-consuming in rare cases, e.g. I have an example where I plot a scatterplot with ~200'000 points. In this case it's also not useful to save the file as '*.pdf', since this has more than 100 MB and takes 10min to open on my local machine. However, I think is a rare exception.

@valeriupredoi
Copy link
Contributor

cheers for the debate, guys - always useful to discuss things this way 👍 I agree with you all - I am happy the defaults are back and as for the plots file types I think there's more than two right? - eps, png, jpg, pdf, gif (yeah we should implement a gif, we can make climate change memes 😁 - no but seriously, for looking at moving stuff in plots) - so I think that should be tunable since I don't see the need to output all these out. I reckon it's best kept under the hood now (hence remove the optionality from config) but keep the functionality 🍺

@valeriupredoi
Copy link
Contributor

@schlunma

However, I think is a rare exception.

might not be as rare as David Attenborough would say - the Autoassess diags output hundreds of plots 🍺

@bouweandela
Copy link
Member Author

This needs to be adjusted to take into account #868 and the feature will now be removed in v2.4, because it didn't make it for the 2.1 release.

@bouweandela bouweandela added this to the v2.2.0 milestone Dec 8, 2020
@stefsmeets
Copy link
Contributor

stefsmeets commented Dec 9, 2020

This needs to be adjusted to take into account #868 and the feature will now be removed in v2.4, because it didn't make it for the 2.1 release.

Do we have a deprecation plan somewhere to keep track of which variables are deprecated / removed in which version? Maybe this can be added in #93?

@bouweandela
Copy link
Member Author

bouweandela commented Jan 5, 2021

Do we have a deprecation plan somewhere

No, there is no big plan for what will be deprecated when, but once something is deprecated this should be written down in the changelog for that release, including a note on when the feature will be removed.

Maybe this can be added in #93?

I updated the issue

@jvegreg
Copy link
Contributor

jvegreg commented Jan 26, 2021

Can you please update these, so we can do it for this release?

Copy link
Contributor

@jvegreg jvegreg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You missed a couple of 2.3, but nothing really serious

esmvalcore/_config.py Outdated Show resolved Hide resolved
esmvalcore/_config.py Outdated Show resolved Hide resolved
bouweandela and others added 2 commits January 27, 2021 13:27
Co-authored-by: Javier Vegas-Regidor <javier.vegas@bsc.es>
@stefsmeets
Copy link
Contributor

Just wondering if these are still up-to-date:

I marked them for 2.2, but I think it should be 2.3 or 2.4 now also?

@bouweandela
Copy link
Member Author

I think they're fine, deprecate in 2.2 (the upcoming release) and remove in 2.4 (in autumn).

@bouweandela
Copy link
Member Author

After looking through the code of ESMValTool to see what it would take to remove output_file_type from config-user.yml, I think that may be a lot of work. I think it's probably not worth the effort.

@bouweandela
Copy link
Member Author

@jvegasbsc @valeriupredoi @stefsmeets Could you have another look at this please to see if anything remains to be done before this can be merged? It would be nice if we could include it in the v2.2 release.

Copy link
Contributor

@valeriupredoi valeriupredoi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, cheers @bouweandela - do we have an issue in ESMValTool about this (to remove the bits in the diags that are deprecated)?

@bouweandela
Copy link
Member Author

Good point! See ESMValGroup/ESMValTool#2005

@valeriupredoi
Copy link
Contributor

Good point! See ESMValGroup/ESMValTool#2005

noice! I added a "future release" label there because it's cool to have one 😁

@bouweandela bouweandela changed the title Deprecate defining write_plots, write_netcdf, output_file_type in config-user file Deprecate defining write_plots and write_netcdf in config-user file Jan 27, 2021
Copy link
Contributor

@jvegreg jvegreg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This two should point to 2.4, no?

esmvalcore/experimental/config/_config_validators.py Outdated Show resolved Hide resolved
esmvalcore/experimental/config/_config_validators.py Outdated Show resolved Hide resolved
Co-authored-by: Javier Vegas-Regidor <javier.vegas@bsc.es>
@bouweandela bouweandela dismissed stale reviews from jvegreg and stefsmeets January 29, 2021 15:44

done

@bouweandela bouweandela merged commit 1c9885b into master Jan 29, 2021
@bouweandela bouweandela deleted the deprecate-write-plots-netcdf-type branch January 29, 2021 15:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants