
regen() duplicates any non-list and non-tuple sequence #6786

Closed
10 of 18 tasks
Suor opened this issue May 26, 2021 · 2 comments · Fixed by #6789

Comments

@Suor

Suor commented May 26, 2021

Checklist

  • I have verified that the issue exists against the master branch of Celery.
  • This has already been asked to the discussion group first.
  • I have read the relevant section in the
    contribution guide
    on reporting bugs.
  • I have checked the issues list
    for similar or identical bug reports.
  • I have checked the pull requests list
    for existing proposed fixes.
  • I have checked the commit log
    to find out if the bug was already fixed in the master branch.
  • I have included all related issues and possible duplicate issues
    in this issue (If there are none, check this box anyway).

Mandatory Debugging Information

  • I have included the output of celery -A proj report in the issue.
    (if you are not able to do this, then at least specify the Celery
    version affected).
  • I have verified that the issue exists against the master branch of Celery.
  • I have included the contents of pip freeze in the issue.
  • I have included all the versions of all the external dependencies required
    to reproduce this bug.

Optional Debugging Information

  • I have tried reproducing the issue on more than one Python version
    and/or implementation.
  • I have tried reproducing the issue on more than one message broker and/or
    result backend.
  • I have tried reproducing the issue on more than one version of the message
    broker and/or result backend.
  • I have tried reproducing the issue on more than one operating system.
  • I have tried reproducing the issue on more than one workers pool.
  • I have tried reproducing the issue with autoscaling, retries,
    ETA/Countdown & rate limits disabled.
  • I have tried reproducing the issue after downgrading
    and/or upgrading Celery and its dependencies.

Related Issues and Possible Duplicates

Related Issues

  • None

Possible Duplicates

  • None

Environment & Settings

Celery version:
5.1.0 (sun-harmonics)

celery report Output:

Steps to Reproduce

In [1]: from celery.utils.functional import regen
In [2]: from collections import deque
In [3]: l = regen(deque([1]))
In [4]: l
Out[4]: [1]
In [5]: l
Out[5]: [1, 1]
In [6]: l
Out[6]: [1, 1, 1]

When .map() is passed a non-list, non-tuple sequence, every item is processed three times:

from collections import deque

from celery import shared_task

@shared_task
def print_task(i):
    print(i)

# Prints 42 three times; a Django QuerySet or any other Sequence behaves the same
print_task.map(deque([42])).delay()
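The IPython session above can be explained with a stripped-down model. `BuggyLazyList` below is hypothetical and only loosely modelled on the lazy wrapper that `regen()` returns: it keeps the original object rather than an iterator over it, and its `data` property extends the cache from that object on every access without recording that it is finished. A re-iterable sequence such as a `deque` therefore yields its items again on each access, while a generator is exhausted after the first pass. This is consistent with only non-list, non-tuple sequences being affected (lists and tuples are presumably returned unchanged by `regen()`).

```python
from collections import deque


class BuggyLazyList:
    """Hypothetical, simplified model of the bug; not Celery's actual code."""

    def __init__(self, it):
        self._it = it         # keeps the original object, not iter(it)
        self._consumed = []   # items materialised so far

    @property
    def data(self):
        # The modelled bug: nothing records that the source was already
        # consumed, so every access iterates it again and appends its items.
        self._consumed.extend(self._it)
        return self._consumed

    def __repr__(self):
        # repr() goes through .data, as an interactive display would.
        return repr(self.data)


seq = BuggyLazyList(deque([1]))
print(repr(seq))  # [1]
print(repr(seq))  # [1, 1]

gen = BuggyLazyList(x for x in [1])
print(repr(gen))  # [1]
print(repr(gen))  # [1]  (the generator is exhausted after one pass)
```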

Required Dependencies

No dependencies

  • Minimal Python Version: checked 3.8.2 and 3.9.0+
  • Minimal Celery Version: checked 4.4.2, 4.4.7 and 5.1.0
  • Minimal Kombu Version: N/A or Unknown
  • Minimal Broker Version: N/A or Unknown
  • Minimal Result Backend Version: N/A or Unknown
  • Minimal OS and/or Kernel Version: N/A or Unknown
  • Minimal Broker Client Version: N/A or Unknown
  • Minimal Result Backend Client Version: N/A or Unknown

Python Packages

pip freeze Output:

amqp==5.0.6
backcall==0.2.0
billiard==3.6.4.0
-e git+git@github.com:celery/celery.git@025bad6e93087414b3ddc288060c367d1937774b#egg=celery
click==7.1.2
click-didyoumean==0.0.3
click-plugins==1.1.1
click-repl==0.1.6
decorator==5.0.9
ipython==7.23.1
ipython-genutils==0.2.0
jedi==0.18.0
kombu==5.1.0
matplotlib-inline==0.1.2
parso==0.8.2
pexpect==4.8.0
pickleshare==0.7.5
prompt-toolkit==3.0.18
ptyprocess==0.7.0
Pygments==2.9.0
pytz==2021.1
six==1.16.0
traitlets==5.0.5
vine==5.0.0
wcwidth==0.2.5

Other Dependencies

N/A

Minimally Reproducible Test Case

Same as the session under Steps to Reproduce above.

Expected Behavior

Expected regen() not to duplicate entries, and .map() to run each item once.

Actual Behavior

See above.

@open-collective-bot

Hey @Suor 👋,
Thank you for opening an issue. We will get back to you as soon as we can.
Also, check out our Open Collective and consider backing us - every little helps!

We also offer priority support for our sponsors.
If you require immediate assistance please consider sponsoring us.

@maybe-sybr
Contributor

A quick bisect suggests this has been broken since 998277302, when regen was first rewritten to make it lazy. We duplicate the elements because we don't set self.__done = True when the regen() instance is concretised by accessing .data, which happens for something like an equality check, or for repr() in the IPython example above.

I've got a bit of random time, so I'll put up a PR for this shortly.
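A minimal sketch of the fix described in this comment, using a hypothetical model rather than Celery's code (`FixedLazyList` and `_done` are illustrative names; `_done` stands in for `self.__done`): the flag is set the first time the source is concretised through `data`, so later equality checks or `repr()` calls return the cached list instead of re-reading the source.

```python
from collections import deque


class FixedLazyList:
    """Hypothetical model of the fix; names are illustrative only."""

    def __init__(self, it):
        self._it = it
        self._consumed = []
        self._done = False    # stands in for self.__done

    @property
    def data(self):
        # Concretise at most once and remember that we did.
        if not self._done:
            self._consumed.extend(self._it)
            self._done = True
        return self._consumed

    def __eq__(self, other):
        # An equality check concretises through .data.
        return self.data == other

    def __repr__(self):
        return repr(self.data)


seq = FixedLazyList(deque([1]))
print(seq == [1])  # True
print(seq == [1])  # True, no duplicates on the second access
print(repr(seq))   # [1]
```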

thedrow assigned maybe-sybr and unassigned xirdneh May 30, 2021
maybe-sybr added a commit that referenced this issue Jun 1, 2021
maybe-sybr added a commit that referenced this issue Jun 3, 2021
auvipy pushed a commit that referenced this issue Jun 14, 2021
* fix: `regen.data` property now marks self as done

Fixes: #6786

* improv: Don't concretise regen on `repr()`

This ensures that the generator remains lazy if it's passed to `repr()`,
e.g. for logging or something.

* test: Add failing test for regen duping on errors

* refac: Remove unnecessary try in `regen.data`
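The second item in this commit keeps `repr()` from concretising the generator. A minimal sketch of what that can look like, assuming a wrapper that caches consumed items: `LazyReprList` is a hypothetical name, and the `<Name: [items...]>` format is an illustrative choice, not necessarily what Celery prints. Its `repr()` only formats items already consumed, so logging an instance wrapped around an infinite iterator such as `itertools.count()` returns immediately instead of hanging.

```python
from itertools import count


class LazyReprList:
    """Hypothetical model: repr() never pulls items from the source."""

    def __init__(self, it):
        self._it = iter(it)
        self._consumed = []
        self._done = False

    def __iter__(self):
        # Replay cached items, then keep pulling from the source iterator.
        yield from self._consumed
        if not self._done:
            for x in self._it:
                self._consumed.append(x)
                yield x
            self._done = True

    def __repr__(self):
        # Only formats what is already cached; "..." marks pending items.
        items = ", ".join(repr(x) for x in self._consumed)
        suffix = "" if self._done else "..."
        return f"<{type(self).__name__}: [{items}{suffix}]>"


nums = LazyReprList(count())
print(repr(nums))  # <LazyReprList: [...]>
it = iter(nums)
next(it), next(it)
print(repr(nums))  # <LazyReprList: [0, 1...]>
```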
thedrow pushed a commit that referenced this issue Jun 27, 2021
jeyrce pushed a commit to jeyrce/celery that referenced this issue Aug 25, 2021
…ry#6789)
