Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace "import *"; define __all__; no implicit relative imports #17

Closed
wants to merge 2 commits into from

Conversation

tobych
Copy link
Contributor

@tobych tobych commented May 24, 2020

Using explicit, individual imports instead of "from X import *" makes it
clear to contributors (and others reading the code) which module
each name is being imported from, without relying on an IDE to tell you.

None of the imports in __init__.py are now being relied on, in the
source. The unit tests still rely on them. Ideally they wouldn't,
as this would make the unit tests similarly clearer, but I don't want
to touch the unit tests in this commit, or really in this PR at all.

Defining __all__ in each module makes it explicit what names are
intended to be used by client modules. Another benefit is that your
IDE can warn you when importing something that isn't necessarily
a sensible thing to import.

Defining __all__ also, incidentally, means we could use "import *"
in the __init__.py for most of the modules. But I didn't want to touch
__init__.py in this PR.

I've also made all relative imports explicit (by making sure there's
a . in front of the module name). Implicit relative imports are
deprecated in Python 3. Using them can get you into all sorts of messes.

Finally, I've used PyCharm's "Optimize Imports" feature to remove a few
redundant imports and to get those that remain into alphabetical order.
I find that useful and I reckon you will too.

@tobych
Copy link
Contributor Author

tobych commented May 24, 2020

The 2 flake8 CI test failures are as expected. I saw these problems but didn't want to fix them in this PR. I'll...

  • Create separate PR's for these

@alexcrichton
Copy link
Member

Out of curiosity is this an idiomatic thing to do? It seems like it's a bit of a bummer to have to redundantly state what's imported for each module?

@whitequark
Copy link
Contributor

Out of curiosity is this an idiomatic thing to do?

Some Python programmers insist that it's the only reasonable approach, but that is by no means an universally agreed upon truth. There are downsides for it as you mention and I would suggest that you choose based on personal preference and experience rather than perceived legitimacy as an idiom.

@alexcrichton
Copy link
Member

Ah ok, thanks for the clarification! I'd personally like to at least ensure that we're not accidentally exporting private internals (like setter_property), so I think __init__.py is going to always need __all__ defined, and if we're defining it there I'm not sure if we're gaining much defining it everywhere else too?

@whitequark
Copy link
Contributor

whitequark commented May 26, 2020

I'd personally like to at least ensure that we're not accidentally exporting private internals (like setter_property)

Unfortunately you can't really do that in Python idiomatically, it's one of the things I consider a significant language flaw. __all__ only affects glob imports; if you have a function defined in a module you can always import it with an explicit import statement. Basically the toplevel lexical scope of a module is both the scope of globals (including anything imported into that module) and scope for exports and you can't separate the two, by design.

The unidiomatic fix to this is to wrap your entire module into a function that hides the imports into its lexical scope but literally no one actually does that.

@tobych
Copy link
Contributor Author

tobych commented May 26, 2020

@whitequark, I like your style. "... but literally no one actually does that." Made me laugh. :-)

This is indeed a big ol' PR from someone you'd never heard of until last week isn't it: I'm not surprised it wasn't merged straight away. It's been fun to see my other smaller commits get merged in, especially the README one!

Over the weekend I gradually came to terms with the fact that I would probably need to make a case for this one, and that this would be a good thing to have to do.

So... I'm very sure that what I'm suggesting has big benefits and few costs. But I clearly need to make a case, and I might be wrong. This is my first open source project for a while and at work I'm used to being able to pick up the phone or get on call: doing this by writing messages is hard to do without failing to get anywhere in a reasonable time, or failing to convince anyone, or very occasionally, failing to be thought a decent person. @whitequark I can see you've a lot of experience (probably more than @alexcrichton) with Python (I've done a bit of work trying to figure out who's who on this project and in WebAssembly world). Would you be up for a phone call or something? Or is that just not a thing in open source world?

Meanwhile I'll write a little more here. Because maybe I can convince, or learn, in less time than I was thinking. And it'll be good practice anyway and no doubt useful for me to try writing about something technical I feel strongly about.

  • all does indeed only affect glob (*) imports at run-time
  • Though that's not why I'm suggesting this, actually.
  • For me, it's not about runtime at all. It's about expressing the intent of the code. About what I can tell from reading the code, and moreover what my IDE (PyCharm) can tell me about the code, and about what might be a bad idea. I rely on this a lot, along with unit tests of course, given the usual problems with a dynamically-typed language, and Python's habit of not really letting you hide anything at all.
  • Specifically, PyCharm (and no doubt other tools) will tell me (as I'm merrily breaking things, and/or when I'm doing a package-wide code check) when I'm importing something into a module that isn't included in its __all__ (it shows me squiggly lines under the name of the name I'm importing). Assuming the __all__ is right, that usually means I probably shouldn't be importing it.
  • I think that's a huge gain for anyone reading the code (to respond to @alexcrichton's "I'm not sure if we're gaining much defining it everywhere else too?"). One of many gains you get when working with Python in an IDE, but only if the code is helping the IDE out.
  • Another benefit is that it'll be clear what modules are depending on other modules. I don't think it's possible to do that with import * being used everywhere. When I first looked at the source for the module, the first thing that struck me was that there weren't any clear layers, of groups of modules dending on each other. And that there were probably too many modules in the first place. The fact that WasmError couldn't be imported anywhere without causing circular dependencies was a bit of an indication of the problem. When there's a lot of interdependencies between modules (some of them circular) (lots of coupling), it's usually worth considering either bringing all the code together into one module (so more cohesion), or creating subpackages that can act as layers. I wanted to get in there and start moving things around, but couldn't even begin to think about that until it was clearer who was importing what from where.

That's all for now! I hope this makes sense, regardless of its ability to convince.

@whitequark
Copy link
Contributor

whitequark commented May 26, 2020

Would you be up for a phone call or something? Or is that just not a thing in open source world?

For personal reasons I am not able to do phone calls. But I've heard that some people working on OSS do them, so don't take this as an universal.

But I clearly need to make a case, and I might be wrong.

To be clear, I am not rejecting this PR (or proposing that it be). I understand that a lot of people have very strong opinions on how code should be structured in Python and ultimately a lot of this is a personal call. That's fine though and you don't need to convince me otherwise as I'm not blocking anything! In the projects I maintain I would not accept it, but I am not the maintainer of wasmtime-py and this is not my call at all.

All I'm doing here is providing @alexcrichton with the context: namely that the approach here is not quite as overwhelmingly prevalent as some other things we discussed in previous PRs, and that I believe it should be considered on its own technical merits rather than simply because it is the idiom to use. I'm sure he'll consider your points fairly and do something that's right for wasmtime-py.

@tobych
Copy link
Contributor Author

tobych commented May 26, 2020

Hi @whitequark. Thanks, nice! Okay, message received about phone calls. I did realize you're not "rejecting" the PR, and I know you're a contributor and not a committer. But I've assumed that your opinion is likely to be trusted by @alexcrichton more than mine, so figured I'd try to convince you, and if that fails, Alex. Your comment has shown me I should be explicit about my intent here, just as I try to be in code. :-)

Also I really liked "There are downsides for it as you mention and I would suggest that you choose based on personal preference and experience rather than perceived legitimacy as an idiom." and wanted to address some of the upsides and share my experience with @alexcrichton (and you too) to help him decide.

If you don't mind, and have a moment, what are the downsides, from your point of view? One is no doubt having to figure out where each name is. If you've an IDE, it'll help you do that in milliseconds, of course. And out of interest, for context, what IDE do you use? I'm beginning to think my reliance on PyCharm might make me sounds like a bit of a zealot, or like I'm all about enabling PyCharm's squiggling habit.

@whitequark
Copy link
Contributor

And out of interest, for context, what IDE do you use?

I don't. I use Sublime Text 3. That makes maintaining import lists manual work, which essentially causes issues during testing (because a name might not be available), and during PR submission (because I have to remove unused names).

In my experience figuring out where a name comes from is not something that is a significant obstacle, even in projects I'm unfamiliar with, provided that the project does not have the same name in many different namespaces. In general I think that it is a good idea to avoid having identical names in multiple namespaces (with some exceptions that don't apply to wasmtime, e.g. sometimes it makes sense to have identical names that are always qualified when used), because you are not always able to rely on an IDE. You can't use an IDE when you're reviewing a PR on GitHub!

On the other hand ST3, GitHub, and many other editors (maybe even ctags?) provide a "jump to definition" feature, which makes it a lot less important to be able to see at a glance where an (unique) imported name comes from just by looking at the source. Read-only indexing is a lot easier to add to an editor than a feature that rewrites source code so I think it is reasonable to expect such a feature to be widely available.

@tobych
Copy link
Contributor Author

tobych commented May 26, 2020

Interesting. And a good data point. I'm usually kinda "What! No IDE!" so its good to see another data point. I guess if I was reviewing a big PR, which I've never done, I might pull the commit into my working copy and review it using the IDE. Perhaps by "not always able to rely on an IDE" you're thinking of situations where you're just looking at GitHub and don't even have your development environment to hand.

@whitequark
Copy link
Contributor

Perhaps by "not always able to rely on an IDE" you're thinking of situations where you're just looking at GitHub and don't even have your development environment to hand.

I don't normally check out the source code of a PR I am reviewing, I'm using just the GitHub interface for that. In my experience this is a farily common approach since it'd be laborous to track down the right line to comment on when you want to raise some concern.

@alexcrichton
Copy link
Member

Thanks for the background provided here @tobych!

I am indeed trying to boot up on all the context here because I'm no Python wizard myself. To clarify something I'm wondering about, do the red squigglies in your IDE come from the usage of from x import *? Was I perhaps a bit too eager initially writing * imports with all this? Would this be fixed by just removing * imports entirely?

@tobych
Copy link
Contributor Author

tobych commented May 26, 2020

You're welcome. Glad it helped.

The (yellow) squigglies show if you're importing a name from a module with an __all__ defined, but the name isn't included in the __all__.

image

I want to see those squigglies if I'm importing something I shouldn't be. I don't see them in wasmtime-py because, as you say, you were quite eager writing all those import *. So I'm importing kinda all the things all of time.

Adding the __all__ definitions could be seen as a separate task to removing the import *, I suppose. I've been conflating two mostly-separate issues, I guess. I'm happy to split that work out into a separate PR, as I've perhaps been muddying the waters a little.

@alexcrichton
Copy link
Member

Oh it's definitely a bug we should fix if any names are imported from wasmtime that aren't listed in the __all__ array, but are there other classes of warnings that your IDE is warning you about? Could this PR perhaps be slimmed down to just fix the warnings in your IDE?

@tobych
Copy link
Contributor Author

tobych commented May 27, 2020

My motivation for this work is not because there are bugs in the package as it stands. It's about being able to change the code, perhaps refactoring it, with confidence. I see potential for improving the architecture of the package, but wouldn't want to move forward without sorting out the imports problems.

What I'll do next is separate out the work into three commits, ordered as follows:

  • Define __all__ for all modules
  • Replace all import * with explicit imports

This is the order I'd typically do this work on a codebase.

The commits will also, it seems, be ordered by controversiality.

Defining `__all__` in each module makes it explicit what names are
intended to be used by client modules. An "import *" from a module
will only import names that included in `__all__`, if it's defined.

IDEs can warn you if you try to explicitly import something not
included in a module's `__all__`.
Using explicit, individual imports instead of "from X import *" makes it
clear to contributors (and others reading the code) which module
each name is being imported from, without relying on an IDE to tell you.

None of the imports in __init__.py are now being relied on, in the
source. The unit tests still rely on them. Ideally they wouldn't,
as this would make the unit tests similarly clearer, but I don't want
to touch the unit tests in this commit, or really in this PR at all.
@tobych
Copy link
Contributor Author

tobych commented May 28, 2020

@alexcrichton Okay, I've split this PR into two commits, as planned. I'm envisaging you merging the __all__ without much fanfare, then unless you've had a sudden change of heart, perhaps talking more about the import * commit. Hope it all makes sense!

@alexcrichton
Copy link
Member

Sorry I'm still a bit confused. To be clear I would prefer to not list everything in __all__ multiple times, that seems like a lot of overhead that isn't worth it for these internal modules. I think the only reason I did it for __init__.py was to appease pdoc, but it doesn't even have the semantics I thought (I thought it locked the exports of the modules to just that set, turns out it only affects * imports).

I've been trying to figure out why this change was motivated in the discussion here. It sounded related to your IDE but I still don't feel like I've pinned down exactly why? I'm sort of confused...

In any case for any * imports if you'd prefer it seems fine to switch them out for imports of particular items. If it imports more than like 10 items though it seems reasonable to leave it as a * import. My hope though is that for these "internal" modules it's just an organization of the package so there's not too much overhead in managing them?

@alexcrichton alexcrichton changed the base branch from master to main June 16, 2020 19:31
@alexcrichton
Copy link
Member

It's been quite some time since this was opened so I'm going to close this, but if it's rebased we can continue the discussion!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants