Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure regen utility class gets marked as done when concretised #6789

Merged
merged 4 commits into from
Jun 14, 2021

Conversation

maybe-sybr
Copy link
Contributor

Description

Ensure regen() instances get marked as done when concretised via the
UserList.data property implementation. We also improve laziness of
regen.__repr__() so it reflects the fact that it's lazy.

Fixes: #6786

@maybe-sybr maybe-sybr changed the title Fix/regen data doesnt set done Ensure regen utility class gets marked as done when concretised May 28, 2021
@codecov
Copy link

codecov bot commented May 28, 2021

Codecov Report

Merging #6789 (d419012) into master (799f839) will increase coverage by 0.00%.
The diff coverage is 85.71%.

❗ Current head d419012 differs from pull request most recent head 64da3e8. Consider uploading reports for the commit 64da3e8 to get more accurate results
Impacted file tree graph

@@           Coverage Diff           @@
##           master    #6789   +/-   ##
=======================================
  Coverage   70.69%   70.70%           
=======================================
  Files         138      138           
  Lines       16602    16601    -1     
  Branches     2091     2093    +2     
=======================================
  Hits        11737    11737           
+ Misses       4669     4668    -1     
  Partials      196      196           
Flag Coverage Δ
unittests 70.70% <85.71%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
celery/utils/functional.py 70.51% <85.71%> (+0.44%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 799f839...64da3e8. Read the comment docs.

g = regen(die())
assert "..." in repr(g)

@pytest.mark.xfail(raises=StopIteration, reason="XXX NEEDS AN ISSUE")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is for an edge case where a repeatable iterator could raise an exception at some point before completion which would result in regen() stashing the elements it saw but never marking itself as done. Then the next time we yield elements from an iterator derived from the regen() instance, we'd yield the stashed elements, then repeat them from the underlying iterator if it's capable of giving them to us again.

IDK if this is something we can even bother handling. If the underlying iterator explodes, we could choose to mark the regen instance as __done and just give up on attempting to get more elements? If we want to go down that route, I can roll a change into this PR since it'd be pretty simple and still thematically relevant. If we think we need to do something more involved or optimistic, I'll make a follow up issue for discussion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made #6794 to track the discussion on this point. Could just be a thing we need to document, happy to have this PR merged without addressing that issue yet.

@lgtm-com
Copy link

lgtm-com bot commented May 28, 2021

This pull request fixes 1 alert when merging 42685f2 into c93371d - view on LGTM.com

fixed alerts:

  • 1 for Non-exception in 'except' clause

@maybe-sybr
Copy link
Contributor Author

Why do we have except StopIteration here at all? list() should handle that.

Good question. That was added in 9982773 and looks like it could have been deliberate, but I have no idea why that might be the case. AFAICT you'd have to be raising StopIteration from inside a generator (which is wrong) to need something like that :/ A quick sanity check of list(iter(range(5))) works without raising and, as expected, spits out StopIteration when I next() the iterator manually. Not sure what the point is there.

Removing the try: except StopIteration: doesn't seem to break any unit or integration tests (against redis), so I'll amend the diff and re-push shortly.

@maybe-sybr maybe-sybr marked this pull request as ready for review June 1, 2021 00:12
@lgtm-com
Copy link

lgtm-com bot commented Jun 1, 2021

This pull request introduces 2 alerts when merging 9852391 into b0ebc3b - view on LGTM.com

new alerts:

  • 1 for Non-exception in 'except' clause
  • 1 for Module is imported with 'import' and 'import from'

celery/utils/functional.py Outdated Show resolved Hide resolved
@Suor
Copy link

Suor commented Jun 1, 2021

AFAICT you'd have to be raising StopIteration from inside a generator (which is wrong) to need something like that

Since Python 3.7 raising StopIteration inside generator results in a RuntimeError.

@thedrow
Copy link
Member

thedrow commented Jun 1, 2021

AFAICT you'd have to be raising StopIteration from inside a generator (which is wrong) to need something like that

Since Python 3.7 raising StopIteration inside generator results in a RuntimeError.

This is probably an oversight we made during the Python 3 only migration in 5.0.
Thanks for noticing 😄

@maybe-sybr maybe-sybr force-pushed the fix/regen-data-doesnt-set-done branch 2 times, most recently from d419012 to 64da3e8 Compare June 3, 2021 00:31
@maybe-sybr
Copy link
Contributor Author

maybe-sybr commented Jun 3, 2021

I just pushed a change which removed what I thought was a pointless iter(self) but I now realise that it was to ensure that the __done attribute gets set when the underlying iterator is completed without having to do hard stuff in the __getitem__() method. I've stripped that commit out, but I still think it's wasteful to spend time iterating on the (potentially many) initial elements of ourself just to get some subsequent element from the underlying iterator.

Edit: I spent some more time on this and have an improved implementation but some of the tests aren't playing nice with it. For the moment, I'm going to call this changeset done and ready for a review prior to merge

@lgtm-com
Copy link

lgtm-com bot commented Jun 3, 2021

This pull request introduces 1 alert and fixes 1 when merging 64da3e8 into 799f839 - view on LGTM.com

new alerts:

  • 1 for Non-exception in 'except' clause

fixed alerts:

  • 1 for Module is imported with 'import' and 'import from'

@Suor
Copy link

Suor commented Jun 3, 2021

I just pushed a change which removed what I thought was a pointless iter(self) but I now realise that it was to ensure that the __done attribute gets set when the underlying iterator is completed without having to do hard stuff in the getitem() method. I've stripped that commit out, but I still think it's wasteful to spend time iterating on the (potentially many) initial elements of ourself just to get some subsequent element from the underlying iterator.

I wonder how much use that .__getitem__() call gets. At least for .map() and friends we simply serialize, i.e. convert to list the thing eventually, we only need regen() to not consume immediately, since a signature might never get sent.

@maybe-sybr
Copy link
Contributor Author

I wonder how much use that .__getitem__() call gets. At least for .map() and friends we simply serialize, i.e. convert to list the thing eventually, we only need regen() to not consume immediately, since a signature might never get sent.

I'd guess the only time it really gets called is to access the 0th or -1th item which are used for various things in the canvas code. Otherwise I'd expect it to be mostly simple for x in some_regen_thing: usage.

Copy link
Member

@auvipy auvipy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's merge it after some 5.1.x point releases

@thedrow
Copy link
Member

thedrow commented Jun 13, 2021

let's merge it after some 5.1.x point releases

I think we can include it in 5.1.1 unless there's a special reason not to do so that I missed.

@auvipy auvipy merged commit 536849c into master Jun 14, 2021
@auvipy
Copy link
Member

auvipy commented Jun 14, 2021

OK. merged as bugfix

@auvipy auvipy removed this from the 5.2 milestone Jun 14, 2021
@thedrow thedrow deleted the fix/regen-data-doesnt-set-done branch June 14, 2021 08:44
@auvipy auvipy added this to the 5.1.x milestone Jun 15, 2021
thedrow pushed a commit that referenced this pull request Jun 27, 2021
* fix: `regen.data` property now marks self as done

Fixes: #6786

* improv: Don't concretise regen on `repr()`

This ensures that the generator remains lazy if it's passed to `repr()`,
e.g. for logging or something.

* test: Add failing test for regen duping on errors

* refac: Remove unnecessary try in `regen.data`
jeyrce pushed a commit to jeyrce/celery that referenced this pull request Aug 25, 2021
…ry#6789)

* fix: `regen.data` property now marks self as done

Fixes: celery#6786

* improv: Don't concretise regen on `repr()`

This ensures that the generator remains lazy if it's passed to `repr()`,
e.g. for logging or something.

* test: Add failing test for regen duping on errors

* refac: Remove unnecessary try in `regen.data`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

regen() duplicates any non-list and non-tuple sequence
4 participants