Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add B031: Warn when using groupby() result multiple times #347

Merged
merged 1 commit into from
Feb 9, 2023

Conversation

malthejorgensen
Copy link
Contributor

I ran into a somewhat subtle bug by accidentally iterating over the result of itertools.groupby() twice:

from itertools import groupby

items = [
    ('lettuce', 'greens'),
    ('tomatoes', 'greens'),
    ('chicken breast', 'meats & fish'),
]

# Group by shopping section
for _section, section_items in groupby(items, key=lambda p: p[1]):
    for shopper in ['Jane', 'Joe']:
        collect_shop_items(shopper, section_items) # "Joe" won't get any items
        # assuming that `collect_shop_items()` iterates over `section_items`

I've written B031 (since there's a PR open for B030) to detect these types of erroneous usages.

There are likely other functions in itertools that return generators and should potentially be covered by this check.
Also, the check is fairly rudimentary. It only works if the groupby() is used in a loop and unpacked directly into a tuple as shown above.

Another subtle thing about groupby() is that it only groups when the items are sorted (but this doesn't cover that).

@cooperlees
Copy link
Collaborator

Man, I suck @ the web edits :|

@cooperlees
Copy link
Collaborator

Anyways, that aside, This is pretty standard behavior of generators or am I missing something here? I don't know if we want to cover every generator the standard library returns ... Thoughts there? If I'm missing something just say so ...

I think mypy should maybe warn on this if it does not ...

@malthejorgensen
Copy link
Contributor Author

malthejorgensen commented Feb 5, 2023

You're right, I had a very similar thought when making the PR. I made the PR anyways because of how surprised I was when I discovered the bug in my code, and how subtle the bug ended up being.

I wasn't able to formulate it then – but thinking about it now –I think the special thing about groupby is that it's a generator returning a generator (in a tuple).
E.g. for i in range(10) is used all the time and is a generator. But it's also hard to misuse in the same way since the generator it returns is "consumed" immediately on the same line where the function is called. The same is the case for most other stdlib functions returning generators (enumerate, itertools.takewhile, itertools.dropwhile and so on). The most common usage pattern for those will not cause any bugs.

Since groupby returns a "nested" generator that isn't consumed on the same line it's much easier to misuse.

@malthejorgensen
Copy link
Contributor Author

From a brief look at the built-in Python functions and itertools, the only other function that returns "nested" generators like this is itertools.tee. I'm not very familiar with that function so I'd have to think a bit about whether the code in this PR can apply directly to that function.

@malthejorgensen malthejorgensen changed the title Add B031: Warn when using groupby() result multiple time Add B031: Warn when using groupby() result multiple times Feb 5, 2023
@malthejorgensen
Copy link
Contributor Author

I rebased on main and did a git push --force so it should hopefully be cleaned up now (passes tests and black/isort locally).

@cooperlees
Copy link
Collaborator

cooperlees commented Feb 8, 2023

This seems fine to me and "cheap" to add, so I guess it can't hurt to add. I would like a second opinion tho. Have been waiting, but requested another maintainer :)

@Zac-HD
Copy link
Member

Zac-HD commented Feb 9, 2023

Thanks for your careful contribution, @malthejorgensen! Open source is made of this kind of generosity with time and skills, and we deeply appreciate it 😍

I haven't done a really deep review, but this looks fine to me - I think the strongest argument against would be along the lines of "we shouldn't add an incomplete check", and, well... basically every Python analysis is incomplete. Heck, most of the project is from people who had a bug, and said something like "never. again.!" so I find the patchwork nature kinda charming even besides several of the bugbears being mine.

So... passes the "improves on status quo" test for me, let's merge 🚀

@Zac-HD Zac-HD merged commit a7c7ac9 into PyCQA:main Feb 9, 2023
@malthejorgensen
Copy link
Contributor Author

Thank you! 🙏🏻

@eachimei
Copy link

FYI: #356

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants