Exception while fetching queue length #143

Closed
abellotti opened this issue Jul 22, 2022 · 8 comments · Fixed by #144
Comments

@abellotti
Contributor

abellotti commented Jul 22, 2022

Creating this issue so I don't lose track of it.

I was giving the latest 0.4.0 build a try because we could use the new celery_queue_length metric. With our setup, however, I get an exception for each of our queues, and those counters stay at 0.

Our usage is:

python cli.py --port 8000 --broker-url "redis://...:6379" --retry-interval 5 --log-level INFO \
                       --broker-transport-option global_keyprefix=aab-

In the output, we get an error like the following for each queue:

2022-07-22 20:38:18.033 | ERROR    | src.exporter:track_queue_length:127 - Queue check_azure_subscription_and_create_cloud_account declare failed: Channel.queue_declare: (404) NOT_FOUND - no queue 'check_azure_subscription_and_create_cloud_account' in vhost '/'
Traceback (most recent call last):

  File "/usr/lib64/python3.9/threading.py", line 930, in _bootstrap
    self._bootstrap_inner()
    │    └ <function Thread._bootstrap_inner at 0x7f08c5f77ee0>
    └ <Thread(waitress-0, started daemon 139675548083968)>
  File "/usr/lib64/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
    │    └ <function Thread.run at 0x7f08c5f77c10>
    └ <Thread(waitress-0, started daemon 139675548083968)>
  File "/usr/lib64/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
    │    │        │    │        │    └ {}
    │    │        │    │        └ <Thread(waitress-0, started daemon 139675548083968)>
    │    │        │    └ (0,)
    │    │        └ <Thread(waitress-0, started daemon 139675548083968)>
    │    └ <bound method ThreadedTaskDispatcher.handler_thread of <waitress.task.ThreadedTaskDispatcher object at 0x7f08bff4f460>>
    └ <Thread(waitress-0, started daemon 139675548083968)>

  File "/usr/local/lib/python3.9/site-packages/waitress/task.py", line 84, in handler_thread
    task.service()
    │    └ <function HTTPChannel.service at 0x7f08c012d4c0>
    └ <waitress.channel.HTTPChannel connected 10.128.4.1:56130 at 0x7f08bfef6940>

  File "/usr/local/lib/python3.9/site-packages/waitress/channel.py", line 397, in service
    task.service()
    │    └ <function Task.service at 0x7f08c00db700>
    └ <waitress.task.WSGITask object at 0x7f08bfef6040>


  File "/usr/local/lib/python3.9/site-packages/waitress/task.py", line 168, in service
    self.execute()
    │    └ <function WSGITask.execute at 0x7f08c00dbb80>
    └ <waitress.task.WSGITask object at 0x7f08bfef6040>

  File "/usr/local/lib/python3.9/site-packages/waitress/task.py", line 434, in execute
    app_iter = self.channel.server.application(environ, start_response)
               │    │       │      │           │        └ <function WSGITask.execute.<locals>.start_response at 0x7f08bd0993a0>
               │    │       │      │           └ {'REMOTE_ADDR': '10.128.4.1', 'REMOTE_HOST': '10.128.4.1', 'REMOTE_PORT': '56130', 'REQUEST_METHOD': 'GET', 'SERVER_PORT': '8...
               │    │       │      └ <Flask 'src.http_server'>
               │    │       └ <waitress.server.TcpWSGIServer listening 0.0.0.0:8000 at 0x7f08bfee4f10>
               │    └ <waitress.channel.HTTPChannel connected 10.128.4.1:56130 at 0x7f08bfef6940>
               └ <waitress.task.WSGITask object at 0x7f08bfef6040>

  File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 2091, in __call__
    return self.wsgi_app(environ, start_response)
           │    │        │        └ <function WSGITask.execute.<locals>.start_response at 0x7f08bd0993a0>
           │    │        └ {'REMOTE_ADDR': '10.128.4.1', 'REMOTE_HOST': '10.128.4.1', 'REMOTE_PORT': '56130', 'REQUEST_METHOD': 'GET', 'SERVER_PORT': '8...
           │    └ <function Flask.wsgi_app at 0x7f08c0188f70>
           └ <Flask 'src.http_server'>

  File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 2073, in wsgi_app
    response = self.full_dispatch_request()
               │    └ <function Flask.full_dispatch_request at 0x7f08c0188550>
               └ <Flask 'src.http_server'>

  File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1516, in full_dispatch_request
    rv = self.dispatch_request()
         │    └ <function Flask.dispatch_request at 0x7f08c01884c0>
         └ <Flask 'src.http_server'>

  File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1502, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
           │    │           │    │              │    │            │   └ {}
           │    │           │    │              │    │            └ <Request 'http://10.128.5.5:8000/metrics' [GET]>
           │    │           │    │              │    └ 'celery_exporter.metrics'
           │    │           │    │              └ <Rule '/metrics' (OPTIONS, HEAD, GET) -> celery_exporter.metrics>
           │    │           │    └ {'static': <function Flask.__init__.<locals>.<lambda> at 0x7f08c00a5550>, 'celery_exporter.index': <function index at 0x7f08c...
           │    │           └ <Flask 'src.http_server'>
           │    └ <function Flask.ensure_sync at 0x7f08c0188820>
           └ <Flask 'src.http_server'>

  File "/opt/celery-exporter/src/http_server.py", line 32, in metrics
    current_app.config["metrics_puller"]()
    └ <Flask 'src.http_server'>

> File "/opt/celery-exporter/src/exporter.py", line 122, in track_queue_length
    ret = connection.default_channel.queue_declare(
          │          └ <property object at 0x7f08c1a96860>
          └ <Connection: redis://cloudigrade-redis.ephemeral-ztg6s3.svc:6379// at 0x7f08bd02ef70>

  File "/usr/local/lib/python3.9/site-packages/kombu/transport/virtual/base.py", line 516, in queue_declare
    raise ChannelError(
          └ <class 'amqp.exceptions.ChannelError'>

amqp.exceptions.ChannelError: Channel.queue_declare: (404) NOT_FOUND - no queue 'check_azure_subscription_and_create_cloud_account' in vhost '/'

Quickly looking at the new code, the logged exception is coming from here:

logger.exception(f"Queue {queue} declare failed: {str(ex)}")

Temporarily changing that call to logger.info quiets the error, but the exception is still raised and caught, and the counters are still set to 0.

We've been running celery-exporter without any issues since May.

@danihodovic
Owner

Can you revert to an earlier release (probably one without queue length metrics) while I have a look at this?

cc @homholueng

@derom
Contributor

derom commented Jul 28, 2022

Redis "removes" the queue when it's empty, which would explain the "NOT_FOUND - no queue" error.
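
To make that explanation concrete: Redis deletes a list key once its last element is popped, so a passive declare against a drained queue has nothing to find. The sketch below models this with a plain dict rather than a real broker. `FakeRedisLists`, `QueueNotFound`, and `queue_length` are hypothetical names used only for illustration; they are not part of kombu or the exporter, and this is not necessarily how #144 addresses the problem.

```python
class QueueNotFound(Exception):
    """Hypothetical stand-in for the (404) NOT_FOUND channel error."""


class FakeRedisLists:
    """Toy model of Redis list keys: a key disappears once its list is empty."""

    def __init__(self):
        self._lists = {}

    def push(self, key, value):
        self._lists.setdefault(key, []).append(value)

    def pop(self, key):
        items = self._lists[key]
        value = items.pop(0)
        if not items:
            # Redis removes the key itself when the last element is popped.
            del self._lists[key]
        return value

    def passive_declare(self, key):
        """Return the message count, or raise if the key does not exist."""
        if key not in self._lists:
            raise QueueNotFound(f"no queue {key!r}")
        return len(self._lists[key])


def queue_length(broker, queue):
    """Treat a missing queue as empty instead of as an error."""
    try:
        return broker.passive_declare(queue)
    except QueueNotFound:
        return 0


broker = FakeRedisLists()
broker.push("celery", "task-1")
length_with_message = queue_length(broker, "celery")
broker.pop("celery")
length_after_drain = queue_length(broker, "celery")
print(length_with_message, length_after_drain)
```

Under this model a drained queue and a queue that never existed look identical, so reporting 0 for both is a deliberate simplification.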

@danihodovic
Owner

@derom Figure you could resolve this in a PR?

@derom
Contributor

derom commented Jul 28, 2022

@danihodovic I'll try

@danihodovic
Owner

@abellotti See if danihodovic/celery-exporter:0.5.3 resolves your issue. It worked for my Django project @ https://django.wtf

@abellotti
Contributor Author

Hi @danihodovic, both 0.5.3 and the latest 0.4.1 tagged release resolve the issue (no exception is logged). However, the queue lengths are always 0.0, even though I'm creating hundreds of async tasks.
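
Since the command in the original report passes `global_keyprefix=aab-`, one thing worth ruling out is whether the exporter and the workers are looking at the same Redis keys. The sketch below is a guess at how to check, under the assumption that kombu's Redis transport keeps each queue's default-priority messages in a list named after the queue and prepends the global key prefix to that name. `prefixed_queue_key` is a hypothetical helper, and it does not cover any extra keys used for non-default priorities.

```python
def prefixed_queue_key(queue, global_keyprefix=""):
    """Return the Redis key a prefixed deployment would plausibly use for a queue.

    Assumption: the key is simply the prefix followed by the queue name.
    """
    return f"{global_keyprefix}{queue}"


queues = ["celery", "check_azure_subscription_and_create_cloud_account"]
keys = [prefixed_queue_key(q, "aab-") for q in queues]
for key in keys:
    # Each printed line is a command to try by hand in redis-cli.
    print(f"LLEN {key}")
```

Comparing the `LLEN` result for the prefixed and unprefixed key while the workers are stopped would show under which name, if any, messages are accumulating.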

@derom
Contributor

derom commented Aug 3, 2022

Hi @abellotti, could you stop the Celery workers for a few minutes and see whether the queues start growing?
Also, I've noticed a change in the queues graph, though I'm not sure how it's related to the update.
With a 30s interval the new version of the exporter shows "empty queues", but with a 10s interval it shows the queues as before.
[Screenshots: 2022-08-03 at 10:55:13 and 10:55:47]

@danihodovic
Owner

It's working for me with 0.5.3

# HELP celery_queue_length The number of message in broker queue.
# TYPE celery_queue_length gauge
celery_queue_length{queue_name="celery"} 3187.0
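
For anyone comparing their own `/metrics` output against the sample above, here is a minimal sketch that pulls the per-queue values out of the exposition text. It assumes each `celery_queue_length` sample carries exactly one label, `queue_name`, as in the sample; `parse_queue_lengths` is a hypothetical helper, not part of the exporter.

```python
import re

# Sample copied from the comment above.
METRICS_TEXT = '''\
# HELP celery_queue_length The number of message in broker queue.
# TYPE celery_queue_length gauge
celery_queue_length{queue_name="celery"} 3187.0
'''

# Matches lines such as: celery_queue_length{queue_name="celery"} 3187.0
SAMPLE_RE = re.compile(r'^celery_queue_length\{queue_name="([^"]*)"\}\s+(\S+)$')


def parse_queue_lengths(text):
    """Map queue name to reported length, skipping comments and other metrics."""
    lengths = {}
    for line in text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if match:
            lengths[match.group(1)] = float(match.group(2))
    return lengths


queue_lengths = parse_queue_lengths(METRICS_TEXT)
print(queue_lengths)
```

A queue that is missing from the result entirely is a different symptom from one that is present with a value of 0.0, which is what the earlier comments describe.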
