
Basic Django config files break upgrading from 4.4.6 to 4.4.7 #6285

Closed
chrisconlan opened this issue Aug 4, 2020 · 31 comments

Comments

@chrisconlan

The basic tutorial setup outlined here: https://docs.celeryproject.org/en/latest/django/first-steps-with-django.html now fails to set the broker / CELERY_BROKER_URL parameter. Celery just ignores the settings and defaults to searching for a host at amqp://guest:**@127.0.0.1:5672//, which is not my host of choice.

I'm ignoring the issue template because this is a very basic regression that the last committer can fix quickly.
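For reference, the tutorial setup in question boils down to a celery.py module roughly like the following sketch (the project name proj and the broker URL are placeholders, not taken from this thread; it depends on a Django project, so it is shown as a configuration sketch rather than a runnable example):

```python
# proj/celery.py -- sketch of the "First steps with Django" layout.
# "proj" and the broker URL below are placeholders, not from this issue.
import os

from celery import Celery

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'proj.settings')

app = Celery('proj')

# Pull every CELERY_*-prefixed value from Django's settings, e.g.
# CELERY_BROKER_URL = 'amqp://user:pass@broker-host:5672//'
# It is exactly these settings that 4.4.7 appears to ignore.
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks()
```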

@pg1671

pg1671 commented Aug 4, 2020

4.4.7 broke my django site. Just reverted to 4.4.6.

Did not investigate further but jobs were not being processed. Assume it is related to this issue.

@rdmrocha

rdmrocha commented Aug 4, 2020

Another confirmation of this issue. A Django 3 project stopped being able to connect because its settings were ignored.
Going back to 4.4.6 fixed it.

@yoch

yoch commented Aug 4, 2020

Same issue for me (no Django, but settings were ignored as well).

@A4TIF

A4TIF commented Aug 4, 2020

Same here. Reverting back to 4.4.6

@thedrow
Member

thedrow commented Aug 4, 2020

@auvipy Please address this.
Did we do something I'm not aware of?

@dratchkov

The symptom in my case is that messages are received but rejected, because the called task is not registered.
What was really confusing is that Flower sees the list of tasks.

But yes, 4.4.6 resolves this.

@nkuttler

nkuttler commented Aug 5, 2020

The pre-4.0 style config doesn't seem to be supported any more in 4.4.7, despite the docs saying otherwise. We also reverted.

@gargantuanprism

Confirmed, but not with Django. If I have env vars set in my shell and then run celery worker interactively, it's fine. As soon as I add --detach, it seems to ignore my broker URL env var. Downgraded to 4.4.6 and it works again.

@fnk0c

fnk0c commented Aug 6, 2020

Same here. Whenever I add -D it simply ignores the configuration file. Also, when running celery worker --help it ignores the --help and starts the worker anyway.

@auvipy
Member

auvipy commented Aug 7, 2020

> @auvipy Please address this.
> Did we do something I'm not aware of?

I don't think so, but I will look into it next Monday, hopefully.

@auvipy
Member

auvipy commented Aug 7, 2020

> The basic tutorial setup outlined here: https://docs.celeryproject.org/en/latest/django/first-steps-with-django.html now fails to set the broker / CELERY_BROKER_URL parameter. Celery just ignores the settings and defaults to searching for a host at amqp://guest:**@127.0.0.1:5672//, which is not my host of choice.
>
> I'm ignoring the issue template because this is a very basic regression that the last committer can fix quickly.

Try checking the commit logs to see which commit could have caused the apparent regression.

nickdibari added a commit to Moody-Tunes/moodytunes that referenced this issue Aug 9, 2020
There is an issue with Celery not setting the BROKER_URL variable,
instead using the default amqp broker URL. This causes problems
with running Celery, so we'll delay upgrading it for now. See
 celery/celery#6285 for context.
@nickdibari

As a starting point, perhaps #6223 is somehow related? Looking at the changelog from 4.4.6 -> 4.4.7, this is the one change related to argument parsing. This is totally a guess on my part, but it seems like a good place to start at least.

nickdibari added a commit to Moody-Tunes/moodytunes that referenced this issue Aug 10, 2020
There is an issue with Celery not setting the BROKER_URL variable,
instead using the default amqp broker URL. This causes problems
with running Celery, so we'll delay upgrading it for now. See
 celery/celery#6285 for context.
@ShaheedHaque
Contributor

ShaheedHaque commented Aug 10, 2020

Unable to start as per other reports. After reverting to Celery 4.4.6:

2020-08-10 06:09:50,469 [INFO] MainProcess: Connected to amqp://<some username>:**@127.0.0.1:5672/celery
2020-08-10 06:09:50,495 [INFO] MainProcess: mingle: searching for neighbors
2020-08-10 06:09:51,527 [INFO] MainProcess: mingle: all alone
2020-08-10 06:09:51,546 [INFO] MainProcess: celery@ip-172-31-10-102 ready.

With Celery 4.4.7:

[2020-08-10 06:11:51,832: CRITICAL/MainProcess] Unrecoverable error: AccessRefused(403, 'ACCESS_REFUSED - Login was refused using authentication mechanism AMQPLAIN. For details see the broker logfile.', (0, 0), '')
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/celery/worker/worker.py", line 208, in start
    self.blueprint.start(self)
  File "/usr/local/lib/python3.8/dist-packages/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "/usr/local/lib/python3.8/dist-packages/celery/bootsteps.py", line 369, in start
    return self.obj.start()
  File "/usr/local/lib/python3.8/dist-packages/celery/worker/consumer/consumer.py", line 318, in start
    blueprint.start(self)
  File "/usr/local/lib/python3.8/dist-packages/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "/usr/local/lib/python3.8/dist-packages/celery/worker/consumer/connection.py", line 23, in start
    c.connection = c.connect()
  File "/usr/local/lib/python3.8/dist-packages/celery/worker/consumer/consumer.py", line 405, in connect
    conn = self.connection_for_read(heartbeat=self.amqheartbeat)
  File "/usr/local/lib/python3.8/dist-packages/celery/worker/consumer/consumer.py", line 411, in connection_for_read
    return self.ensure_connected(
  File "/usr/local/lib/python3.8/dist-packages/celery/worker/consumer/consumer.py", line 437, in ensure_connected
    conn = conn.ensure_connection(
  File "/usr/local/lib/python3.8/dist-packages/kombu/connection.py", line 389, in ensure_connection
    self._ensure_connection(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/kombu/connection.py", line 441, in _ensure_connection
    return retry_over_time(
  File "/usr/local/lib/python3.8/dist-packages/kombu/utils/functional.py", line 344, in retry_over_time
    return fun(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/kombu/connection.py", line 874, in _connection_factory
    self._connection = self._establish_connection()
  File "/usr/local/lib/python3.8/dist-packages/kombu/connection.py", line 809, in _establish_connection
    conn = self.transport.establish_connection()
  File "/usr/local/lib/python3.8/dist-packages/kombu/transport/pyamqp.py", line 130, in establish_connection
    conn.connect()
  File "/usr/local/lib/python3.8/dist-packages/amqp/connection.py", line 320, in connect
    self.drain_events(timeout=self.connect_timeout)
  File "/usr/local/lib/python3.8/dist-packages/amqp/connection.py", line 508, in drain_events
    while not self.blocking_read(timeout):
  File "/usr/local/lib/python3.8/dist-packages/amqp/connection.py", line 514, in blocking_read
    return self.on_inbound_frame(frame)
  File "/usr/local/lib/python3.8/dist-packages/amqp/method_framing.py", line 55, in on_frame
    callback(channel, method_sig, buf, None)
  File "/usr/local/lib/python3.8/dist-packages/amqp/connection.py", line 520, in on_inbound_method
    return self.channels[channel_id].dispatch_method(
  File "/usr/local/lib/python3.8/dist-packages/amqp/abstract_channel.py", line 145, in dispatch_method
    listener(*args)
  File "/usr/local/lib/python3.8/dist-packages/amqp/connection.py", line 650, in _on_close
    raise error_for_code(reply_code, reply_text,
amqp.exceptions.AccessRefused: (0, 0): (403) ACCESS_REFUSED - Login was refused using authentication mechanism AMQPLAIN. For details see the broker logfile.

And from the broker log file:

2020-08-10 06:11:51.830 [info] <0.993.0> accepting AMQP connection <0.993.0> (127.0.0.1:44638 -> 127.0.0.1:5672)
2020-08-10 06:11:51.832 [error] <0.993.0> Error on AMQP connection <0.993.0> (127.0.0.1:44638 -> 127.0.0.1:5672, state: starting):
AMQPLAIN login refused: user 'guest' - invalid credentials

Note that it is trying to use the guest account instead of our <some username>. So 4.4.7 is - as the original reporter has said - not reading the established configuration, but instead applying some defaults.

In case it is relevant, this is what our command line looks like according to "ps":

ubuntu   2036879       1  0 Aug06 ?        00:06:44 [celeryd: celery@ip-172-31-7-165:MainProcess] -active- (worker -A=<some username> -l=info --concurrency=2 --max-tasks-per-child=50 -O=fair --logfile=/someplace/celery.log --pidfile=/someplace/celery.pid)

and according to systemd:

ExecStart=/usr/local/bin/celery worker -A=<some username> --detach -l=info \
    --concurrency=2 \
    --max-tasks-per-child=50 \
    -O=fair \
    --pidfile=/someplace/%i.pid \
    --logfile=/someplace/%i.log

@auvipy
Copy link
Member

auvipy commented Aug 12, 2020

commit log celery/py-amqp@9e9f4c0

@diemuzi
Contributor

diemuzi commented Aug 14, 2020

I too have a similar problem, but I also noticed something I did not see posted here about this issue. When starting the multi worker I can see this running, for example: /home/gwhcp/api.gwhcp.dev/venv/bin/python -m celery worker -Q 10.2.1.1 --loglevel=DEBUG --time-limit=300 --concurrency=2 --logfile=/var/log/celery/worker.log

This was supposed to also contain the -A application option. Downgrading to 4.4.6 fixes the entire problem for me and now shows: /home/gwhcp/api.gwhcp.dev/venv/bin/python -m celery worker -Q 10.2.1.1 -A application --loglevel=DEBUG --time-limit=300 --concurrency=2 --logfile=/var/log/celery/worker.log

With 4.4.7 the entire Django settings are completely ignored, which is also why it is trying to connect as guest to the amqp server. I believe it may have something to do with this recent patch: #6223

@auvipy
Member

auvipy commented Aug 14, 2020

> I too have a similar problem, but I also noticed something I did not see posted here about this issue. When starting the multi worker I can see this running, for example: /home/gwhcp/api.gwhcp.dev/venv/bin/python -m celery worker -Q 10.2.1.1 --loglevel=DEBUG --time-limit=300 --concurrency=2 --logfile=/var/log/celery/worker.log
>
> This was supposed to also contain the -A application option. Downgrading to 4.4.6 fixes the entire problem for me and now shows: /home/gwhcp/api.gwhcp.dev/venv/bin/python -m celery worker -Q 10.2.1.1 -A application --loglevel=DEBUG --time-limit=300 --concurrency=2 --logfile=/var/log/celery/worker.log
>
> With 4.4.7 the entire Django settings are completely ignored, which is also why it is trying to connect as guest to the amqp server. I believe it may have something to do with this recent patch: #6223

@JulienPalard would you mind checking this?

@thenewguy

I have also had to revert to 4.4.6 on some projects using django with the redis broker. Oddly, there are no logs from celery on the projects when using 4.4.7 so I cannot be more specific other than all works well under 4.4.6.

@auvipy auvipy added this to the 4.5 milestone Aug 17, 2020
@garw

garw commented Aug 17, 2020

I can confirm that commit e3ac73b (#6223) introduces the issue. Just reverting this makes celery multi behave in the old way and pass the -A option on to the worker. Otherwise v4.4.7 just doesn't pass the -A option to the worker, and some default configuration seems to be loaded as a consequence.

@auvipy
Member

auvipy commented Aug 17, 2020

> I can confirm that commit e3ac73b (#6223) introduces the issue. Just reverting this makes celery multi behave in the old way and pass the -A option on to the worker. Otherwise v4.4.7 just doesn't pass the -A option to the worker, and some default configuration seems to be loaded as a consequence.

If you can provide an integration test with a failing unit test, it would be easier to find a fix rather than reverting that commit.

@garw

garw commented Aug 18, 2020

> I can confirm that commit e3ac73b (#6223) introduces the issue. Just reverting this makes celery multi behave in the old way and pass the -A option on to the worker. Otherwise v4.4.7 just doesn't pass the -A option to the worker, and some default configuration seems to be loaded as a consequence.

> if you can provide an integration test with a failing unit test it would be easier to find a fix rather than reverting that fix

See PR #6305

I took a deeper look into the situation. The problem seems to arise from the following sequence of events:
The celery command implementation CeleryCommand calls execute_from_commandline from Command in base.py. This, in turn, calls setup_app_from_commandline and passes the return value of this function as arguments to handle_argv.

The behavior of setup_app_from_commandline changed in #6223 so that it only returns those parameters that cannot already be parsed by parse_preload_options. Consequently, all global options are now gone from argv.

The actual implementation of CeleryCommand from this point onwards uses only the (already filtered) argv. In the case of multi, it just passes the (filtered) argv into the multi command, which then instantiates a MultiTool and calls execute_from_commandline on it with the filtered command line. That MultiTool is also not connected to the instance of CeleryCommand that previously read the application information, so at this point the information about which application to start seems lost to me.

Sorry for not actually providing a fix, but I feel a bit lost in the huge code base. If commands other than multi also still need the global options in argv, they will have a similar problem right now, and it is probably easiest to change the behavior of setup_app_from_commandline back. This, however, seems to reintroduce the issue that #6223 was meant to fix.
If multi is special, it might be possible to change run_from_argv to pass arguments closer to sys.argv. However, some filtering was previously applied to argv as well, so I cannot really judge whether this would then pass too many parameters.

The regression test that I provided in #6305 just checks whether the parameter still ends up in run_from_argv of the multi command, so any solution that simply feeds the argument back in from sys.argv there invalidates this test case. It already makes a bit of an assumption about how this can be solved; it just illustrates the problem. It passes in v4.4.6 and fails in v4.4.7.
If one changes setup_app_from_commandline back so that it does not filter the general arguments out, this test passes. However, test_short_and_long_arguments_be_the_same then fails.

This behavior also seems to be the origin of #6287. The argument parser in _parse_preload_options accepts --help as a valid option and thus doesn't return it as an unknown argument. Consequently, --help is also stripped and will not be passed on to the individual commands to be handled.

Additionally, setting a CELERY_APP environment variable provides a workaround for this issue on my end. It can be added to a systemd service file with an Environment="CELERY_APP=myapp" line.
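The filtering step described above can be illustrated with a stdlib-only toy. The option names mirror Celery's CLI, but the parser here is a reconstruction for illustration, not Celery's actual code:

```python
import argparse

# Toy stand-in for the preload-option parser discussed above; the real
# _parse_preload_options lives in celery, this is only an illustration.
preload = argparse.ArgumentParser(add_help=False)
preload.add_argument('-A', '--app')
preload.add_argument('--workdir')

argv = ['multi', 'start', 'w1', '-A', 'proj', '--loglevel=INFO']

# parse_known_args consumes every option it recognises and returns only
# the leftovers -- the step that strips -A before `multi` ever sees it.
opts, leftover = preload.parse_known_args(argv)

print(opts.app)    # captured by the preload parser
print(leftover)    # what the subcommand actually receives: no -A here
```

Running this shows the application name ends up in the preload namespace while the leftover argv handed to the subcommand no longer contains `-A proj`, matching the lost-app symptom reported above.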

@auvipy
Member

auvipy commented Aug 19, 2020

> [garw's full analysis and workaround, quoted in full above]

Thanks a lot for the test and the detailed observations. Someone will provide a fix for sure.

@leonardoarroyo

> Additionally, adding a CELERY_APP environment variable provides a workaround for this issue on my end. Can be added to a systemd service file by adding Environment="CELERY_APP=myapp" line

Thanks for the workaround. This fixes the issue for me after upgrading from 4.4.6 to 4.4.7.
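For anyone applying this workaround under systemd, the unit fragment would look roughly like this (a sketch: myapp and the paths are placeholders, and the remaining options are carried over from the ExecStart example earlier in the thread):

```
# /etc/systemd/system/celery.service (fragment; names are placeholders)
[Service]
Environment="CELERY_APP=myapp"
ExecStart=/usr/local/bin/celery worker --detach -l info \
    --pidfile=/someplace/%i.pid \
    --logfile=/someplace/%i.log
```

Note that, per a later comment in this thread, the -A option may need to be dropped from ExecStart for the environment variable to take effect.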

@JulienPalard
Contributor

> If other commands than multi also need the global options still in argv, they will have a similar problem right now and it is probably easiest to change the behavior of setup_app_from_commandline back.

I agree.

> This, however, seems to also reintroduce the issue to be fixed by #6223.

Which is not a big deal. #4558 can be reopened. I don't think there's a clean fix: Celery argument parsing is just too complicated and does not have enough tests. I think the sane way is to deprecate odd usages (like giving subcommands arguments before subcommands), which will break a lot of things.

nijel added a commit to WeblateOrg/weblate that referenced this issue Sep 5, 2020
The command line processing is broken in Celery 4.4.7 release, see
celery/celery#6285

Fixes #4445
@ShaheedHaque
Contributor

I suggest that the correct way to handle #4558 without causing this regression is to separate the handling of global options from command-specific options. The standard argparse is perfectly happy to do this simply by declaring things in the natural order, i.e. all the global options first, followed by the command verb, followed by the command-specific options. For example, one of my CLIs does this (-v or --verbose is a global option):

$ ./deploy.py list -v
usage: deploy.py [-h] [-v] {archive,create,dbdump,delete,list,restore,upgrade,who} ...

while also doing this:

$ ./deploy.py -v list --help
usage: deploy.py list [-h]

optional arguments:
  -h, --help  show this help message and exit

$ ./deploy.py -v dbdump -h
usage: deploy.py dbdump [-h] [--old-prefix OLD_PREFIX] old-sha1 identity tgzfile
...

This suggests that simply by getting rid of the attempt to reorder options to after the verb would work just fine, provided the global options are set up first, and the commands handled using subparsers. OTOH, I have no idea if/how that might be made to work while using click.
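The layout described above falls out naturally from argparse subparsers. A minimal sketch (deploy.py's real implementation is not shown in this thread, so the names here are illustrative):

```python
import argparse

# Global options on the top-level parser, command-specific options on
# subparsers: argparse routes `-v dbdump --old-prefix x` correctly
# without any reordering of arguments relative to the verb.
parser = argparse.ArgumentParser(prog='deploy.py')
parser.add_argument('-v', '--verbose', action='store_true')

sub = parser.add_subparsers(dest='command')
sub.add_parser('list')
dbdump = sub.add_parser('dbdump')
dbdump.add_argument('--old-prefix')

args = parser.parse_args(['-v', 'dbdump', '--old-prefix', 'x'])
print(args.verbose, args.command, args.old_prefix)  # True dbdump x
```

The global flag before the verb and the command option after it each land on the right parser; placing -v after the verb would instead require declaring it on the subparser too, which matches the `deploy.py list -v` usage error shown above.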

@thedrow
Member

thedrow commented Sep 10, 2020

@auvipy Any updates?

@auvipy
Member

auvipy commented Sep 13, 2020

It is a regression. I know which commit caused this, but I still haven't found time to revert it and create a new release.

browniebroke added a commit to cookiecutter/cookiecutter-django that referenced this issue Sep 14, 2020
It looks like 4.4.7 has some issues with Redis broker:

celery/celery#6285
browniebroke added a commit to cookiecutter/cookiecutter-django that referenced this issue Sep 14, 2020
It looks like 4.4.7 has some issues with Redis broker:

celery/celery#6285
@bsolomon1124

This is not a Django issue. As reported in #6370 it is reproducible with a 25-line module with no dependencies.

It may be helpful to create a pinned issue marking 4.4.7 as a bad release. This is a regression that truly breaks Celery and makes it unusable.
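Until such a marker exists, projects can exclude the broken release in their dependency declarations with a version specifier, e.g. (a requirements.txt fragment; the exact bounds are a suggestion, not from this thread):

```
celery>=4.4.6,!=4.4.7
```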

nokome added a commit to stencila/hub that referenced this issue Oct 13, 2020
Celery v4.4.7 introduced a regression in setting of configuration options which still seems to exist in v5.0.0.

See celery/celery#6285 and others.
@ogai

ogai commented Oct 23, 2020

> Additionally, adding a CELERY_APP environment variable provides a workaround for this issue on my end. Can be added to a systemd service file by adding Environment="CELERY_APP=myapp" line

In my case it also fixed the problem, but only after removing the -A myapp parameter.

@sbnmanthey

Does this issue affect only 4.4.7, or also all versions above it (including 5.0+)?

jraddaoui added a commit to CCA-Public/scope that referenced this issue Nov 24, 2020
@bsolomon1124

@mensachan according to #6370 (comment), this issue isn't present in 5.0.2+. (I haven't verified that, though.)

@auvipy
Member

auvipy commented Sep 16, 2021

The issue doesn't exist in Celery v5, as the CLI was completely rewritten.

@auvipy auvipy closed this as completed Sep 16, 2021