To what level should astropy be typed? #15170
I don't have time to attack this right now, but I think it would be awesome if astropy had type annotations. It's pretty annoying to me that I have to force mypy to ignore everything from astropy right now. However, a possibly larger issue is that I don't think numpy has added support for generic ufuncs or array functions (correct me if I'm wrong), so if astropy.units did have typing, using any numpy function would ruin the type information. |
Personally I'm most in favor of 2a (it's easiest to maintain), followed by 2b (at least we got it), then 1 (we got it where people care about it). For some parts of Python typing is no longer optional, but required, e.g.
def func(x: str | bytes | os.PathLike | ReadableFileLike | tuple[str, str] | ... | Sequence[str | bytes | os.PathLike | ReadableFileLike] | ...):
    """Description.

    Parameters
    ----------
    x  # (no longer need to explain this, the types are inserted at docs compilation)
        But a text explanation is nice.
        A path to a file or the file itself. Can be a sequence, in which case the operation is performed for
        each element in the sequence. A 2-elt tuple indicates something special.
    """
Furthermore, yes, well-typed code can be better compiled by Python than untyped code. Compilation tools like |
Very good points above! And thank you for raising this issue. I'd be hesitant to have typing information in One approach we've taken in PlasmaPy is to create typing constructs for commonly used situations. For example, we have I've also been meaning to look into tools that could automatically add type hints. |
I'm not answering exactly the question @mhvk asked, but I think this will still be relevant for the discussion. The way I see it there are three tiers to typing. In the order of increasing strictness and complexity they are:
Tier 1. should be uncontroversial because Once a large enough fraction of @namurphy mentioned introducing definitions for commonly used union types. |
I agree with this on the whole, with the caveat that we currently compile performance-necessary code. With mypy+mypyc applied to very select and agreed-upon sections of the library we can increase performance of some currently-not-compiled-but-performance-desired code. That's just saying we might want to have some tier-3 stuff right away. NOT EVERYWHERE, just where we want. E.g. @mhvk and I are interested in this for some units stuff. This naturally fits in with our other compiled C code, so I think it falls under the same set of guidelines, we just need to add the requisite settings to our config. But I think the important takeaway is that this is a related, but separate, discussion about how to start with tier-1 and shouldn't stop us from helping our users and improving the code. |
I'm strongly in favor of having type annotations in the modules rather than I think that "Go with the flow" and "Embrace typing" are not mutually exclusive, and I am taking this incremental approach with the dozens of packages we use in Chandra operations. Many of these are 10+ years old and have varying degrees of rigor. So my process is:
I find that any level of type hints is helpful and that it can be entirely incremental. It can even be little things like taking 15 minutes to add annotations to a few functions. For astropy I think a big issue is dealing with the very flexible API of some of the most prominent classes. Basically the original design approach for some core classes like In these cases we could define type aliases like It would likely be very beneficial for an incremental process to provide a few things:
I think that an incremental crowd-sourced approach can be effective since these PRs are in the good-first-issue category of requiring very little detailed astropy knowledge. |
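To make the type-alias idea above concrete, here is a minimal sketch. The names `PathInput` and `as_text_path` are hypothetical and chosen only for illustration; they are not existing astropy names, and the alias the comment had in mind may look different.

```python
import os
from typing import Union

# Hypothetical alias: one shared name for a union that many flexible
# functions would otherwise have to spell out in every signature.
PathInput = Union[str, bytes, os.PathLike]


def as_text_path(path: PathInput) -> str:
    """Return ``path`` as a ``str``, whatever accepted form it came in."""
    # os.fsdecode accepts str, bytes and os.PathLike objects.
    return os.fsdecode(path)
```

A signature written as `path: PathInput` stays short, and widening the accepted inputs later means editing the alias in one place.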
Related note, typing syntax is about to get a whole lot cleaner — https://peps.python.org/pep-0695. |
It sounds like a bit of a convergence on option 1, of just adding typing as we go, and clearly the preference is for doing it inside the So, maybe what needs to be done now is to open an issue about creating a p.s. To slightly argue for a less as-we-go mode: most of our docstrings are, I think, fairly good & standard in laying out the type of the parameters - it should be possible to have some kind of script that transfers that information to the signature. Indeed, maybe that exists already? |
Agreed. I also think most modules will end up having a domain-specific set of types and thus their own typing module (though it may be private). For many things we can organically discover what are the common types and move those to the common |
I'm joining the emerging consensus on option 1 and I think that the roadmap drawn by @taldcroft makes perfect sense.
I would agree here on the condition that we don't add a |
A |
@namurphy - since we generate the docstring for the various submodules with unit definitions automatically, it should be possible to generate the astropy/astropy/units/utils.py Line 80 in 6568d90
|
Would it make sense to start by enabling mypy in pre-commit, while initially setting the mypy configuration to ignore all existing errors? I've been attempting to do that for PlasmaPy (https://github.com/PlasmaPy/PlasmaPy/pull/2424)...and got mypy to pass by temporarily ignoring all errors in only 79 files! 🥲 Our next step will be to gradually improve type hint annotations throughout the package and hopefully get mypy to eventually pass in strict mode. I'm planning to look in the broader pythoniverse for which tools are best for automagically adding type hint annotations...since without the tools, it'll probably take a lot of effort. I'd be happy to share my experiences about what does or does not work well. |
@namurphy - I'm not sure it is a good idea to start requiring typing in new PRs; I think this should be opt-in by submodule maintainers who have done the basics in their modules and are willing to do the reviewing. I don't think we should make contributing harder before we have the basics in place, and have documented where to find the existing astropy types, etc. That said, once we have that, I'm all for letting mypy check PRs to submodules that are (mostly) typed. Or might it be possible to do it on a per-file basis? If it passes mypy on main, it should pass on the PR? Or is this what you meant? It certainly is good to ensure one doesn't regress! Also, yes, some automation would be great! We're fairly good with our numpydoc format, so some automation should be possible. It may be worth reaching out to the numpy folks who did it. |
Thanks @namurphy! I'm looking forward to seeing how this works in PlasmaPy. |
To quickly reply to this: mypy is limited if run from pre-commit because each hook is installed in isolation, while mypy needs to be installed alongside the package being analysed (and, most importantly, that package's dependencies). I would recommend that type-checking instead be integrated with tox/CI |
This is a really helpful and enjoyable discussion — thank you all so much!
Yeah, I'm not sure how to do that automagically. A related approach would be to have mypy ignore particular lines using
Awesome! I'm wondering if it would make sense to start the typing process by going up the dependency tree, i.e. starting with the subpackages & modules that other subpackages & modules depend on.
Interesting! Another advantage of using tox & GitHub Actions would be that we could add a note at the end that says something like, "For more information about typing and troubleshooting with mypy, see mypy's typing cheatsheet and this page in Astropy's contributor guide" with the appropriate hyperlinks. Having mypy in pre-commit would also add a performance overhead to pre-commit since it needs to be run on the whole project. It'd probably be helpful to have a section or page in the contributor guide on this too, which would at least point to the appropriate typing resources and then discuss particular Astropy nuances. |
Agreed! This is especially true since I've been finding the error messages in mypy a bit hard to understand, in particular because a type error in one line may have been caused by an incorrect annotation in a completely different location.
Yes, this is essentially what I'm doing in PlasmaPy/PlasmaPy#2424. Instead of ignoring all subpackages, I'm setting it to ignore the individual files that mypy found errors in. We'll likely be going through the files one-by-one to either fix/add annotations or add We could perhaps start by configuring mypy so that it doesn't require type hints to be present, but rather check that type hint annotations are correct if they are present. |
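The gradual mode described above (annotations are optional, but checked where present) can be illustrated with a small sketch. The function names are hypothetical. The `# type: ignore[...]` comment is mypy's standard per-line suppression; whether it is the mechanism the comments above had in mind is an assumption.

```python
def parse_count(text: str) -> int:
    """Annotated, so a type checker verifies the body and every caller."""
    return int(text)


def legacy_parse(text):
    """Unannotated: not an error under mypy's default (non-strict) settings."""
    return int(text)


# Per-line suppression with an explicit error code. The call works at runtime
# because int() accepts a float, but mypy would otherwise report [arg-type]
# here, since parse_count is declared to take a str.
value = parse_count(3.0)  # type: ignore[arg-type]
```

Naming the error code in the suppression keeps it narrow: a different kind of error appearing on the same line later would still be reported.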
A good thing to do sooner rather than later would be to address the remaining violations of the Ruff ANN ruleset: Lines 6 to 11 in 6568d90
|
I just submitted a demo PR in #15794 that takes the approach that I described above and am using in PlasmaPy/PlasmaPy#2424. It gets mypy to pass by ignoring errors on a per-file and per-error basis. It's by no means the only approach, and since there are tradeoffs, it's definitely worth considering other approaches (like in #12971). In any case, it would be a starting point at least! I also created an issue about #15170 about writing an APE about typing, but I'd suggest we continue this conversation here for the time being. Thank you again everyone! |
I'm not going to read all this, but I'm going to assume you're gonna need some hands to actually type all these types out. I volunteer. |
I'm in the process of using So far, I'm adding return annotations to special methods like
|
Just wanted to chime in on this and point out that this behaviour can be configured: https://mypy.readthedocs.io/en/stable/config_file.html#confval-check_untyped_defs |
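As a rough illustration of what the linked `check_untyped_defs` option changes, consider the sketch below (the function is hypothetical). By default mypy does not check the bodies of functions that have no annotations, so the mistake in the first branch would only be reported once that option is enabled.

```python
def describe(count):
    # No annotations here, so with mypy's defaults this body is not checked.
    label = "items"
    if count is None:
        # str + int is a type error. mypy reports it only when
        # check_untyped_defs is turned on for this module.
        return label + 1
    return f"{count} {label}"
```

The faulty branch still fails at runtime with a `TypeError` if it is ever reached; the option only decides whether the checker looks inside the function.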
We recently added mypy to PlasmaPy's CI 🎉, and I've been finding it to be pretty good overall. The main pain points that I've found are (1) it's a bit annoying to configure, and (2) the error messages are sometimes a bit hard to understand. For example, sometimes the error message will point to a line, while the problem was actually in an earlier line. A takeaway lesson from my experience with PlasmaPy is that it is worthwhile to investigate alternatives to mypy. For example, pyright is an open source static type checker developed by Microsoft that is designed for high performance and to work with large projects. Most of the informal discussions I've seen on the World Wide Web tend to favor pyright over mypy, partly because of its better error messages. I also found an article on static type checkers that also covers pytype and pyre. Edited to add: I just tried out pyright and shared my initial impressions in PlasmaPy/PlasmaPy#2451. For the moment I'm inclined to stick with mypy, but the next time I go through a file to add type hint annotations in PlasmaPy, I'm going to try using both mypy and pyright so that I can better compare them. |
Just to add my opinion at this point after a lot of good discussion on implementing static type checking. The original issue description did not touch on adding type checking to CI and here I think we need to be cautious. I'm a huge fan of type annotation as documentation for the developer. Basically you and your IDE can look at code and quickly see the intended types. This helps the developer and makes it easier to make quality code contributions. I am worried that enforcing static type checking, even in a limited way, will make it substantially harder for developers to contribute code. This is based on my experience when trying to enable pyright "basic" checking in VS code (not even strict) on some small pieces of code that have basic type annotations (at the level which provide useful documentation). Basically what happens is that pyright shows a lot of warnings which are not related to code correctness. Many of them come from upstream untyped packages, which is basically a show-stopper because that means my own code always has a bunch of spurious warnings and those hide real warnings that I should be noticing. Beyond that, trying to fix them all ends up taking a lot of time, and sometimes they should not even be fixed. Python is still fundamentally a dynamic language after all. A simple example is a function that requires a |
What @taldcroft describes seems to be very similar to what I referred to as tier 1 in an earlier comment of mine. One of the reasons implementing that level of typing should be uncontroversial is that it does not require us to settle on using any particular type checker. I will also point out that it is not at all obvious that a type checker that might be the best for an untyped code base, like Right now the best thing to do would be to simply start adding type annotations without worrying too much about configuring any type checkers for a CI job. If developers find some tool or another to be useful then those tools can be used without having to run them in CI. PS: @taldcroft, do you know about type guards? |
@eerovaher - about type guards, I had seen them previously but have not used them. I feel like this is a case where typing starts to drive code changes that may be undesirable. For instance replacing a very-fast inline statement like In my particular case I don't think a type guard would even help to remove that warning. The first function is typed
The type checker will always complain that I'm passing the wrong type to I do appreciate that typing and the sorts of warnings that are revealed tend to drive us to generally better code. The biggest lesson I am learning is being specific about return types. |
@taldcroft, we are getting somewhat off topic, so I'll be very brief:
from typing import cast
...
def func2(arg: str | Path) -> SomeType:
    try:
        out = func1(cast(str, arg))
    except Func1Error:
        out = something_else(arg)
|
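For comparison, here is a self-contained sketch of the `cast` pattern next to an `isinstance` check. All names (`read_name`, `with_cast`, `with_narrowing`) are hypothetical stand-ins, not the functions from the comments above, and `AttributeError` stands in for whatever error the real function would raise.

```python
from __future__ import annotations

from pathlib import Path
from typing import cast


def read_name(name: str) -> str:
    """Hypothetical stand-in for a function that only accepts ``str``."""
    return name.upper()


def with_cast(arg: str | Path) -> str:
    # cast() does nothing at runtime. It only tells the checker to treat
    # arg as a str, so a Path still reaches read_name and fails there.
    try:
        return read_name(cast(str, arg))
    except AttributeError:
        return str(arg)


def with_narrowing(arg: str | Path) -> str:
    # isinstance() is a real runtime check that type checkers also
    # understand, so no cast is needed in either branch.
    if isinstance(arg, str):
        return read_name(arg)
    return str(arg)
```

Both versions return the same results. The `cast` version keeps the try/except flow unchanged, while the `isinstance` version adds a runtime check in exchange for dropping the cast.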
Thank you all for continuing this discussion!
Agreed! The biggest part of my motivation for adding type hint annotations to
This is the biggest worry I have about enabling mypy in CI for PlasmaPy, and I've been working on some mitigation strategies (like discussing type hint annotations in our contributor guide, and mentioning type hints in the comment that gets posted to every PR). I'll be happy to share which strategies end up working best for us. Given the size of the Astropy community, it's probably worth writing an APE about the process of adding type hint annotations. To come at this from a different angle, a friend pointed out that adding type hint annotations could potentially be a great first issue for new contributors, in particular if we can point to a good discussion of the process of adding type hint annotations.
The way I handled this situation for PlasmaPy was to configure |
Thanks @eerovaher and @namurphy - I'm learning a lot and this seems productive! The experience from plasmapy is quite valuable. About a document for adding type hint annotations, I might suggest a wiki page rather than an APE. APEs are quite difficult to update, whereas I think a process document / tutorial on adding type annotations should be a living document and get frequent updates as we gain experience and as typing continues to evolve. The example from @eerovaher would be a perfect thing to quickly add to a typing hints guide without much friction. |
Perhaps we could add a page to Astropy's contributor guide? I'm planning to add a section or page to PlasmaPy's contributor guide on type check annotations and static type checking, which could potentially be adapted to Astropy's as well. The main reason why I was suggesting an APE was that I figured that this would be a big enough change that we'd need the coordinating committee to decide. But, if we could get by without an APE, that would be fantastic! |
I also wanted to say more about why I'm suggesting that a static type checker be added to Astropy's CI as the first step before moving towards any of the tiers of typing. 🤔 We have been adding annotations to functions in PlasmaPy during most of its development. We used type hints when possible, but had unit annotations in a lot of our functions (i.e., Long story short, we ended up with a code base that was ∼half annotated, but without a consistent way to check that the annotations are actually correct. Since starting to use mypy in the last few weeks 🥳, I've frequently had to correct pre-existing annotations after mypy found errors in them. If type hint annotations start being added to Astropy without having a consistent way to check that the annotations are correct, then it will probably be necessary to revisit those annotations again after static type checking has been enabled in CI. But that doesn't mean that the static type checker should be enabled globally. Rather it should be enabled gradually. I really like #12971 as a first step because it configured mypy to not check any files. We could have mypy ignore files until we start adding type hint annotations to them, at which point we could enable mypy for those files and use it to check that the type hints we add are actually correct. That way, it's less likely that we'd have to revisit those files later. |
https://cosmology.readthedocs.io/projects/api/en/latest/introduction.html for an introduction to typing that focuses on structural sub-typing. |
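For readers unfamiliar with structural sub-typing, a minimal sketch using `typing.Protocol` follows. The names `HasUnit`, `Length` and `unit_of` are hypothetical and are not taken from the linked documentation.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class HasUnit(Protocol):
    """Hypothetical protocol: anything that exposes a ``unit`` attribute."""

    unit: str


class Length:
    # Does not inherit from HasUnit; it matches by structure alone.
    def __init__(self, value: float, unit: str) -> None:
        self.value = value
        self.unit = unit


def unit_of(obj: HasUnit) -> str:
    """Accept any object with a ``unit``, whatever its class hierarchy."""
    return obj.unit
```

A type checker accepts `unit_of(Length(2.0, "m"))` because `Length` has the required attribute, even though the two classes are unrelated by inheritance.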
@namurphy - can you clarify what it means to check that the annotations are "correct"? That seems like a question that can have many different answers. I have no experience with static type checking so I need some help here, but maybe you mean "passes mypy" with some level of checking enabled. This is actually an important point to establish in this discussion of "to what level should astropy be typed". Are we going to require that new type annotations pass some static type check analysis prior to being committed? I suspect this would be a show-stopper for an effort to incrementally add type annotations to astropy. Requiring an entire file to be "correct" means that adding type annotations cannot be done on a few functions at a time. Knowing that getting all 4200 lines of
|
About the APE vs. contributor guide -- if we are talking about a new policy that adding type annotations goes hand-in-hand with enforcing some level of static type checking, that would definitely require an APE (which would certainly be controversial). Just starting to add annotations can be done any time at the discretion of package maintainers. |
Although I agree that there exist tools that can be helpful for adding type annotations, I think at this point it might be best if any interested developers run such tools locally. This can lead to different developers using different tools and settings, but if we ever get to a point where we might want to add a type checking CI job then it could be for the better because then we could have a more informed discussion about which tools and settings the CI job should use. |
@nstarman — that looks like an awesome resource! @taldcroft and @eerovaher — thank you for continuing to bring up important points!
Operationally, that's pretty much exactly what I mean. It's worth taking a look at the mypy error codes enabled by default and the error codes for optional checks.
The approach I've been taking in PlasmaPy has been to:
For Astropy, I'd suggest enabling mypy only for a single subpackage or module at first to make the process even more gradual. I've also been using tools like autotyping to automagically add type hints. The |
Hi! Would someone kindly give me a brief rundown of what an APE is? I looked on google, but I didn't get definitive answers. |
Astropy Proposal for Enhancement (https://github.com/astropy/astropy-APEs) |
OK. We should write an APE regardless, as a Plan B. Because if we go the wiki route, there is a non-zero chance that someone will want an APE. |
See https://lwn.net/Articles/958326/ "Growing pains for typing in Python" for a summary of the discussion on typing within python itself. Let's ensure that /the focus should be on user experience and not "making programming harder for humans in order to make it easier for IDEs"/. |
Thank you for posting this article!
Looks like that article is behind a paywall at the moment, but will become freely available on Jan 25.
A flip side of this is that making it easier for IDEs can turn into making things easier for humans. There have been a few times recently where type hints helped me find errors as I was writing code. For example, PyCharm highlighted when I was calling a method that only existed on some of the types in the type hints, so I was able to fix it in the moment without having to run tests or mypy. With all that said, my biggest worry about type hints is adding a barrier to entry to new contributors, so I'm particularly curious about what the article says. |
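The kind of mistake described above, calling a method that exists on only some members of a union, can be sketched as follows. The function is hypothetical and is not the code from the comment.

```python
from __future__ import annotations


def first_line(data: str | bytes) -> str:
    # Calling data.decode() unconditionally would be flagged by a type
    # checker or an IDE, because str has no decode() method.
    if isinstance(data, bytes):
        data = data.decode("utf-8")
    # From here on the checker knows data is a str.
    return data.splitlines()[0] if data else ""
```

With the annotation in place, the problem shows up while editing rather than when a test happens to pass in the other type.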
I just started a PR for an APE on the process for gradually enabling static type checking to Astropy. If anyone wants to work on this too, please let me know! It's still in the early stages, and I want to make sure that the APE encompasses the important concerns and considerations brought up in this incredibly helpful discussion. |
Not sure whether discussion should now move to the APE, but it is always great to see actual examples, and while looking at #15914 (which introduces typing in
I'd strongly suggest that we agree in advance that this kind of stuff does not belong with the regular code, but has to be delegated to a More generally, I think this reinforces @taldcroft's suggestion to start with adding type information to public functions, and not worry about getting mypy to pass, etc. Indeed, I think it might be wise not to put any checkers in CI for quite a while, to avoid getting drawn in to over-typing -- some of the |
Actually, that bit about the CI is an implementation detail, which is probably best discussed with the APE. But the more general point is that I really want to avoid the tools dictating to us. |
What is the question that needs to be resolved?
A bit of typing has entered astropy, with some sub-packages being substantially further along than others. We also have had a few bug reports pointing out that type information is missing. It may be good to decide package-wide how we want to approach this. Options:
.pyi files (as numpy has done). This would mean only those who care about typing would do the work, but that only well set-up editing environments have this information readily available.
Personally, I have no particular preference, though I think I'd prefer to spend time reviewing bug fixes/enhancements (though I'd change my mind if there were, e.g., clear performance improvements to be had, as @nstarman suggested might be possible with mypy). So, I'd like a scheme where it is clear who is responsible for adding and maintaining the typing information (which may be easier with option 1!).
Describe the desired outcome
Some kind of decision on whether to add typing information as we go, and where to add it. If it is effectively option 1, then I think we are done, otherwise an APE may be needed.