Add pre-commit to CI and fix all typing and linting #306

Merged
merged 27 commits into from
Sep 20, 2022


@rhugonnet rhugonnet commented Sep 18, 2022

Summary

This PR adds pre-commit to xDEM's CI, runs the linters on all package files, and manually fixes all of the errors raised: codespell (~10), flake8 (~200) and mypy (~500), plus minor errors from other small modules.

Additionally, to avoid np.ndarray types being considered as Any (the default before Python 3.9, and after if their dtype is not specified) which prevents us from differentiating them from Raster objects in function calls/outputs, this PR adds the NumPy plugin to mypy (https://numpy.org/devdocs/reference/typing.html#mypy-plugin). This solved several errors but also raised another ~300, now solved.

What's new in the pre-commit config

The implementation of this PR mirrors that of @erikmannerfelt in GeoUtils, with the exception of the following points:

  1. mypy now understands np.ndarray types through the NumPy plugin (supported since NumPy 1.21), which is enabled by passing --config-file=mypy.ini in .pre-commit-config.yaml. The mypy.ini file declares the plugin under its [mypy] section: plugins = numpy.typing.mypy_plugin. mypy uses a NumPy version installed locally in pre-commit, so it is independent of the one used in the package, which avoids issues when testing older Python versions;
  2. isort is now made compatible with black to avoid infinite-loop conflicts between the two (that also started happening in GeoUtils recently), by passing args: ["--profile", "black"].
  3. mypy now allows implicit optional types in function parameters: for example, x: int = None is automatically understood as x: int | None = None, following an old comment from @adehecq and myself. This is done by passing --implicit-optional to mypy. Careful: there are exceptions related to class definitions and overloads.
  4. codespell now ignores several words that we use in xDEM. It was hard to pass several arguments as a list in the args of the .pre-commit-config.yaml, but I finally managed to make it work from a GitHub issue, the syntax is (need to convert to list of strings): ['--ignore-words-list', 'nd,alos', '--ignore-regex', '\bhist\b', '--'] (the '--' serving to indicate the end of the list of arguments);
  5. flake8 is now more flexible about how dicts are written. This is done by ignoring C408 errors, which forbid writing dicts in any way other than as literals (e.g., dict(a=b) is forbidden, but {"a": b} is fine). That rule can be a bit annoying when we define dictionaries whose keys typically correspond to function arguments.
  6. All typing versions of pre-commit have been upgraded.
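
Points 3 and 5 above can be illustrated with a short Python sketch. The function names and the `factor` parameter are hypothetical and do not come from xDEM; the comments only restate what the list describes.

```python
from __future__ import annotations

from typing import Optional


# Implicit form: only accepted by mypy when --implicit-optional is passed.
def scale_implicit(value: float, factor: float = None) -> float:
    """The None default makes `factor` optional without the annotation saying so."""
    return value if factor is None else value * factor


# Explicit form: what mypy understands the signature above to mean.
def scale_explicit(value: float, factor: Optional[float] = None) -> float:
    """Same behaviour, with the optional type spelled out."""
    return value if factor is None else value * factor


# C408 flags the dict() call; the literal form is always accepted.
# With C408 ignored, both spellings pass flake8.
kwargs_call = dict(factor=2.0)
kwargs_literal = {"factor": 2.0}
```

The dict() spelling is convenient when the keys are later unpacked as keyword arguments, e.g. scale_explicit(3.0, **kwargs_call).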

What's new in xDEM's type hinting

  1. The NumPy dtypes must now be passed for mypy type checking to work correctly. For this, this PR uses the NumPy generic array type NDArray[X], a convenience wrapper corresponding to np.ndarray[Any, np.dtype[X]]. In xDEM we use floating types almost exclusively, so this PR defines two new NumPy-based types: NDArrayf, corresponding to NDArray[np.floating[Any]], i.e. an array of any shape with any floating type; and MArrayf, corresponding to np.ma.masked_array[Any, np.dtype[np.floating[Any]]]. This syntax is only supported on Python >= 3.9, so compatibility is handled in a new _typing.py module:
import sys
from typing import Any

import numpy as np

if sys.version_info.minor >= 9:

    from numpy.typing import NDArray  # this syntax works starting on Python 3.9

    NDArrayf = NDArray[np.floating[Any]]
    MArrayf = np.ma.masked_array[Any, np.dtype[np.floating[Any]]]

else:
    NDArrayf = np.array  # type: ignore
    MArrayf = np.ma.masked_array  # type: ignore
  1. Some important class or function dictionaries are now typed. Coreg classes and spatialstats.py objects were a bit of a mess for type checking, because they both use dictionaries with many different keys: Coreg._meta in the first case, and various **kwargs arguments in the second. To address this, this PR adds TypedDict objects to describe the components of these dictionaries. For instance, for Coreg classes:
class CoregDict(TypedDict, total=False):
    """
    Defining the type of each possible key in the metadata dictionary of Coreg classes.
    The parameter total=False means that the keys are not required. In the recent PEP 655 (
    https://peps.python.org/pep-0655/) there is an easy way to specify Required or NotRequired for each key, if we
    want to change this in the future.
    """

    bias_func: Callable[[NDArrayf], np.floating[Any]]
    func: Callable[[NDArrayf, NDArrayf], NDArrayf]
    bias: np.floating[Any] | float | np.integer[Any] | int
    matrix: NDArrayf
    centroid: tuple[float, float, float]
    offset_east_px: float
    offset_north_px: float
    coefficients: NDArrayf
    coreg_meta: list[Any]
    resolution: float
    # The pipeline metadata can have any value of the above
    pipeline: list[Any]
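
To show what total=False changes in practice, here is a minimal sketch with a trimmed-down, hypothetical stand-in for CoregDict (the class name ShiftMeta and the values are illustrative only):

```python
from __future__ import annotations

from typing import TypedDict


class ShiftMeta(TypedDict, total=False):
    """Hypothetical stand-in for CoregDict, keeping three of its keys."""

    offset_east_px: float
    offset_north_px: float
    resolution: float


# With total=False, a partially filled dictionary is accepted by the type checker.
meta: ShiftMeta = {"offset_east_px": 1.5}
meta["resolution"] = 30.0

# At runtime a TypedDict is a plain dict: keys that were never set stay missing.
east_shift_m = meta["offset_east_px"] * meta["resolution"]
north_px = meta.get("offset_north_px", 0.0)
```

A type checker rejects unknown keys or wrongly typed values, but nothing is enforced at runtime, so .get() with a default remains the safe way to read a key that may not have been set.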
  1. Some float and integer types were updated with NumPy support. A lot of instances of float had to be changed to np.floating[Any] now that NumPy is supported. However, this is not always the case and is still inconsistent for now. At a later stage, we could create types such as Float = float | np.floating[Any] to ensure typing is consistent everywhere. Here's an example for nmad that fails with a float return type:
def nmad(data: NDArrayf, nfact: float = 1.4826) -> np.floating[Any]:
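
As a rough illustration of why the return type is np.floating[Any] rather than float, here is a sketch of a scaled median absolute deviation. It is not xDEM's implementation: the name nmad_sketch and the body are assumptions, and the alias is redefined locally so the block is self-contained (it needs Python >= 3.9 and a NumPy version with subscriptable scalar types).

```python
from __future__ import annotations

from typing import Any

import numpy as np
from numpy.typing import NDArray

# Local copy of the alias described above (assumes Python >= 3.9).
NDArrayf = NDArray[np.floating[Any]]


def nmad_sketch(data: NDArrayf, nfact: float = 1.4826) -> np.floating[Any]:
    """Scaled median absolute deviation, ignoring NaNs (illustrative only)."""
    median = np.nanmedian(data)
    # np.nanmedian returns a NumPy scalar, so the product is a NumPy floating scalar too.
    return nfact * np.nanmedian(np.abs(data - median))


spread = nmad_sketch(np.array([1.0, 2.0, 3.0, 4.0, 100.0], dtype=np.float32))
```

Because the result comes from a NumPy reduction, it is a NumPy scalar whose precision depends on the input dtype and the NumPy version, which is why annotating the return as a plain float does not match.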
  1. Most of our @overload statements are "saved" 👼 by the new NumPy array plugin. mypy now differentiates Raster and NDArrayf (instead of considering them both as Any and ignoring the overloads). However, mypy still raised a lot of issues that were really hard to resolve, for two reasons: 1/ the broadest type overload must always be listed last; 2/ all parameter defaults must be fully copied in each overload (the error message is quite unclear).
    As an example of 1/, this will fail...
@overload
def fractal_roughness(dem: RasterType, window_size: int = 13) -> RasterType:
    ...

@overload
def fractal_roughness(dem: NDArrayf | MArrayf, window_size: int = 13) -> NDArrayf:
    ...

def fractal_roughness(dem: NDArrayf | MArrayf | RasterType, window_size: int = 13) -> NDArrayf | RasterType:

...because RasterType is considered as Any! So the first @overload would also work for NDArrayf, making the second one irrelevant. To solve this, the second @overload with NDArrayf needs to be moved to the first position.
I added these points to xDEM's wiki page on Tips to make mypy happy.
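
The layout that follows from rules 1/ and 2/ can be sketched with pure-Python stand-ins. The Raster class, the scale function and its factor parameter are hypothetical; a list of floats plays the role of NDArrayf so the block runs without NumPy.

```python
from __future__ import annotations

from typing import List, TypeVar, overload


class Raster:
    """Hypothetical stand-in for a raster class wrapping a list of values."""

    def __init__(self, values: List[float]) -> None:
        self.values = values


RasterType = TypeVar("RasterType", bound=Raster)


# Narrower array-like overload first, with the default repeated in full.
@overload
def scale(dem: List[float], factor: float = 2.0) -> List[float]: ...


# Broader raster overload last, again with the same default.
@overload
def scale(dem: RasterType, factor: float = 2.0) -> RasterType: ...


def scale(dem: List[float] | RasterType, factor: float = 2.0) -> List[float] | RasterType:
    """Implementation: the overloads only guide the type checker, this body runs."""
    if isinstance(dem, Raster):
        dem.values = [factor * v for v in dem.values]  # modified in place for brevity
        return dem
    return [factor * v for v in dem]
```

In this stand-in the two input types do not overlap, so a type checker would accept either order; in the PR the order matters because RasterType was treated as Any and therefore also matched arrays.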

Resolves #53

Final note: is all of this really useful?!

I was skeptical about type hinting when we started doing it. My experience from correcting hundreds of type errors in xDEM over the past few days has convinced me: although some errors can be a bit of a pain because we aren't used to them (the Wiki on this is growing! 💪), type checking will save us a lot of time on possible type-related errors down the road! 😁

Why? Well, look at this xDEM PR: we had already been type hinting over the past years (and I was following PyCharm's internal type checker while adding code). Despite this, mypy still raised ~800 errors (~600 of them redundant, solved in a couple dozen fixes). From going through the ~200 other individual errors, I would say that ~50 of those would have created small bugs/issues at some point (illogical passing of objects between functions, overly broad user input, etc.). And, actually, ~10 of those helped correct little errors that existed in our libraries! 🙂

Go type hinting! 🦸

@rhugonnet rhugonnet marked this pull request as draft September 18, 2022 18:50
@rhugonnet rhugonnet changed the title Add pre-commit to CI Add pre-commit to CI and fix all typing and linting Sep 19, 2022
@rhugonnet rhugonnet marked this pull request as ready for review September 20, 2022 12:36
@rhugonnet

Ready for merging! 🥳


@adehecq adehecq left a comment


Pffff what a work !! Thank you for doing that.
I must admit it was difficult to distinguish between the really important changes and the purely linting changes. I had to scroll quite quickly, but your nice summary was super useful.

Successfully merging this pull request may close these issues.

Homogenize annotation and typing syntax