Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Util (& Helper) Functions #46

Closed
Evelyn-M opened this issue Jul 27, 2020 · 6 comments
Closed

Util (& Helper) Functions #46

Evelyn-M opened this issue Jul 27, 2020 · 6 comments
Labels
conventions Coding conventions or style question

Comments

@Evelyn-M
Copy link
Collaborator

Evelyn-M commented Jul 27, 2020

Issue Description

  • Where is the appropriate location for functions shared between modules?
    No matter what solution, under all circumstances we should avoid "cross imports", i.e. situations where module A imports a method from B and vice versa

I'd like to deviate a bit from this location-only focus of the issue and add the following:

  • How do we deal with functions that should be util functions (because they perform a frequently-requested action), but are coded individually (in slightly differing ways) in several modules at the moment?
  • How do we deal with existing util functions that could be used in more modules than they currently are?

Some background

What is a util function? What is a helper function? Are there meaningful differences?
Most sources admit the terms util and helper are used ambiguously and inconsistently. There is no very clear consensus about their distinction.

A helper method is a term used to describe some method that is reused often by other methods or parts of a program. Helper methods are typically not too complex and help shorten code for frequently used minor tasks. Using helper methods can also help to reduce error in code by having the logic in one place

Some point out that "helper" functions could simply be sub-functions of a main function, used for refactoring & readability purposes within one single module, whereas the term util makes the inter-modular use purpose relatively explicit.

I'll use the term util functions from now on for the scope of this issue, referring to short, re-usable, relatively general functions performing repetitive tasks.

How are we handling util functions at the moment in climada?

  • In folder climada/util we have 10 files that contain some sort of util functions (and 4 other items: config.py, constants.py, test/, and earth_engine, which is a module that never quite made it to a stand-alone feature and was hence dumped into util)

  • Checking "references" of use of those util functions via github's code navigation tool reveals that, while indeed many are used in several modules, quite a few of those util functions are used only by one module or not at all (i.e. apart from its definition and a corresponding unit test)

Example

alpha_shape

coordinates

checker

scalebar_plot

  • The logic of how util functions are currently grouped within those 10 files is not always very intuitive (e.g. there is a file_handler, but also a hdf5_handler, which is a file-format, after all)

  • Checking some of the code base, there are many instances of module-agnostic, repetitive tasks that could be handled by a util function (mostly reading / loading files, converting dates / times, handling strings, converting object-types, etc.)

Example Will add examples!

Propositions

  • Location in folder "utils" generally makes sense. However: Should there be some naming hint in the scripts to distinguish scripts containing util funcs form other scripts in that folder (config, earth_engine, constants), e.g. ufuncs_scriptname.py?

  • Make a list of currently "unused / single-used" functions in utils, try to find author & critically reflect its potential future use in utils vs relocation to specific module.

  • "Awareness raising" on new util functions: Should people send around some kind of notification to currently active climada developers when they add a new util function that is potentially useful to others, incl. short description of what it can do? E.g. via slack / mail? If on the other hand, no use-case can be imagined --> reflect on whether this function really needs to go to utils

  • Re-structuring of some of the scripts to logically group util functions (I did not find a good overview on "common util classes", I guess that's a bit project specific. Ideas for further "groups" will be welcome.

Example

ufuncs_plot.py? --> functions for plotting, maybe move scalebar_plot to this as well?
ufuncs_file_handlers.py --> functions for opening / reading, closing / saving any kind of files. Maybe also the current hdf5_handler script?
ufuncs_interpolation.py --> seems quite sensible to me
ufuncs_dates_times.py --> also quite sensible grouping. See lots of potential for those functions to be used more! And other datetime conversions to be added to this!
...

  • To think about: what about gentle, gradual re-factoring of exisiting modules on master branch, which could integrate more util funcs? Who would take care of this?
@Evelyn-M Evelyn-M added question conventions Coding conventions or style labels Jul 27, 2020
@tovogt
Copy link
Collaborator

tovogt commented Jul 29, 2020

Thanks for this overview! I agree that the climada.utils subpackage is in need of some cleanup. Some of the modules are too large (more than 1000 lines according to pylint). Especially the coordinates module should be split up into different modules, e.g. for geodata and rasterio (many utility functions are actually wrappers to the actual rasterio Python package).

As a side note: ufunc is usually used as an abbreviation for "universal function" in NumPy contexts: https://numpy.org/doc/stable/reference/ufuncs.html Instead, if a submodule of climada.utils doesn't actually contain utility functions, we should probably move it to a different place:

  • There is already an ongoing discussion about (re-)moving the config and the constants module: Conventions on the use of constants #37
  • The earth_engine module isn't tested and can't be used without the earthengine-api Python package which is not in CLIMADA's official requirements. Furthermore, it's not used in any other CLIMADA subpackage. At some point, we should probably either move or remove it.

@chahank
Copy link
Member

chahank commented Jul 31, 2020

All good points! Just wanted to add an example of a function that I think should be rather a utility than a class function:

In climada.engine.cost_benefit there is the private method _norm_values that computes the prefix 'thousand, million, billion'.

@mmyrte
Copy link
Collaborator

mmyrte commented Aug 3, 2020

I agree on all that's been said - a cleanup within the util functions is definitely worthwhile. I think the only realistic way of reducing existing duplicate logic throughout the code base is if someone can dedicate time specifically to refactoring. If we're able to visualise the dependency structure of functions and methods within CLIMADA (sort of like pyreverse UML, see classes_climada.pdf), I think that new contributors could better avoid introducing redundancies.

Regarding util/config.py: it contains what I'd expect in __init.py__ at the base level of the package, since there are other setup functions in there as well.

Maybe off topic: it may be necessary to overthink the setup procedure anyways if we're going to package CLIMADA: setup_environ() e.g. does environment variable manipulation based on the availability of eio. It's not obvious to me why that is necessary, since eio is not called anywhere in the codebase. I also think that migrating to configparser/INI from a custom json-based reader may be sensible, albeit breaking backwards compatibility.

@chahank
Copy link
Member

chahank commented Dec 11, 2020

This was rediscussed in the CLIMADA code convention of 11.12.2020.

A very minimal consensus was to promote better the existence of the util functions in the developer's guide, the coding conventions, and the code reviews.

@chahank
Copy link
Member

chahank commented Mar 26, 2021

Should this discussion continue or can it be closed?

@emanuel-schmid
Copy link
Collaborator

For the time being we stick to the current implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
conventions Coding conventions or style question
Projects
None yet
Development

No branches or pull requests

5 participants