Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add celery_queue_length, celery_active_consumer_count gauge metrics for exporter #118

Merged

Conversation

homholueng
Copy link
Contributor

@homholueng homholueng commented Jun 23, 2022

close #100

Hi Dani, as promised I am making a PR to add queue_length metric for exporter.

The implementation is simple. A daemon thread will start with main thread and it will scrape length of queues which declare by developer.

Developer can add concerned queues to exporter by --track-queue option. e.g. celery-exporter --broker-url=amqp://guest:guest@host.docker.internal:5672/ --track-queue celery --track-queue my_queue

Default queue length scrape interval is 30s and it can be customized by --queue-track-interval option. e.g. celery-exporter --broker-url=amqp://guest:guest@host.docker.internal:5672/ --track-queue celery --queue-track-interval 60

Then the exporter will export broker queue length metrics like this

celery_queue_length{queue_name="celery"} 3.0
celery_queue_length{queue_name="my_queue"} 0.0

Looking forward to your opinions.

Thx.

@homholueng homholueng changed the title feature: add queue_length gauge metrics for exporter Add queue_length gauge metrics for exporter Jun 23, 2022
@danihodovic
Copy link
Owner

Looking good. Are there any other metrics we can get from .queue_declare() ?

@homholueng
Copy link
Contributor Author

homholueng commented Jun 24, 2022

Looking good. Are there any other metrics we can get from .queue_declare() ?

According to kombu API Refrence, queue_declare returns

A named tuple representing the declared queue as a named tuple. The tuple values are ordered as queue, message count, and the active consumer count.

We can add Gauge(active_consumer_count){label=queuename} metric to exporter if it's necessary.

@danihodovic
Copy link
Owner

Could you add the active consumer count? That would be awesome :)

@homholueng
Copy link
Contributor Author

celery_active_consumer_count is added.

@homholueng homholueng changed the title Add queue_length gauge metrics for exporter Add celery_queue_length, celery_active_consumer_count gauge metrics for exporter Jun 27, 2022
@danihodovic
Copy link
Owner

danihodovic commented Jun 27, 2022

Would it be possible to get the queue length and consumer count for all queues, without having to specify them as options in the exporter cli?

I've managed to get the queue length metric in a production environment by deploying this branch, however the active consumer count shows 0, despite tasks being consumed from the queue (and the queue shrinking in size).

# HELP celery_queue_length The number of message in broker queue.
# TYPE celery_queue_length gauge
celery_queue_length{queue_name="celery"} 15226.0
# HELP celery_active_consumer_count The number of active consumer in broker queue.
# TYPE celery_active_consumer_count gauge
celery_active_consumer_count{queue_name="celery"} 0.0

Could you have a second look that it works? I believe the test should also show a consumer count of 1.

Instead of scheduling scrapes on a internal timer, we could pull metrics on demand when the /metrics endpoint is hit. This is what the Prometheus community recommends when writing scrapers. https://prometheus.io/docs/instrumenting/writing_exporters/#scheduling

Metrics should only be pulled from the application when Prometheus scrapes them, exporters should not perform scrapes based on their own timers. That is, all scrapes should be synchronous.

Perhaps we could scrape queue length before building the output:

output = encoder(current_app.config["registry"])

@homholueng
Copy link
Contributor Author

homholueng commented Jun 28, 2022

Q1

Would it be possible to get the queue length and consumer count for all queues, without having to specify them as options in the exporter cli?

The reasons for doing this are as follows:

  1. The lower kombu API have to declare queue first and then fetch the information of queue. So the specify of queues from developer is required.
  2. Maybe there is a way to auto discover queues in broker. But I think not every queue in broker is used by celery or by celery instance of "current developer". There is possible that the broker is used by many developer's celery instance so the data of every queues in the broker may cause confused. e.g. Celery will create queues like celeryev.xxxx to transport worker's event message, and these queues should not be concerned by developer.

So, I think the specify from developer is more reasonable and explicit. What do you think?

Q2

Could you have a second look that it works? I believe the test should also show a consumer count of 1.

Sure. I have checked the source code of kombu and the documentation of rabbitmq, this metric does reflect the number of consumers. And I also check it on my host. It's really works. Could you check the broker's consumer data and consumer process state when you read the metrics?
image

Q3

Perhaps we could scrape queue length before building the output:

Yep. But after that, the pull request will become a non-constant complexity operation. It will affect by the length of queues which is O(n) complexity. It's acceptable?

@danihodovic
Copy link
Owner

danihodovic commented Jun 28, 2022

The lower kombu API have to declare queue first and then fetch the information of queue. So the specify of queues from developer is required.

I managed to run a breakpoint() in the test suite to inspect the celery app. Listing queues seems to work for Redis, but you can try it for RabbitMQ to see if it shows the same data. I set the breakpoint() here https://github.com/danihodovic/celery-exporter/blob/master/src/exporter.py#L201

self.app.control.inspect().active_queues()

{
	'gen89100@x1': [
		{
			'name': 'celery',
			'exchange': {'name': 'celery', 'type': 'direct', 'arguments': None, 'durable': True, 'passive': False, 'auto_delete': False, 'delivery_mode': None, 'no_declare': False},
			'routing_key': 'celery',
			'queue_arguments': None,
			'binding_arguments': None,
			'consumer_arguments': None,
			'durable': True,
			'exclusive': False,
			'auto_delete': False,
			'no_ack': False,
			'alias': None,
			'bindings': [],
			'no_declare': None,
			'expires': None,
			'message_ttl': None,
			'max_length': None,
			'max_length_bytes': None,
			'max_priority': None
		}
	]
}

This should give us a list of queues for each worker. We can flatten the list and then get the length for each.

Sure. I have checked the source code of kombu and the documentation of rabbitmq, this metric does reflect the number of consumers.

If so, shouldn't the unit test you added show 1 consumer instead of 0? https://github.com/danihodovic/celery-exporter/pull/118/files#diff-fb60222581054c3ac15e1981679b0a596c22dfeb483873337979183826f559ffR76. In any case this is not a deal breaker. We can merge the changes without the active consumers working for Redis.

Yep. But after that, the pull request will become a non-constant complexity operation. It will affect by the length of queues which is O(n) complexity. It's acceptable?

That's fine. Given that there is a finite set of queues (usually less than 10) I don't think reading queued data for all queues will have an outsized performance impact. If it does, we can cache the values:

If a metric is particularly expensive to retrieve, i.e. takes more than a minute, it is acceptable to cache it. This should be noted in the HELP string.

https://prometheus.io/docs/instrumenting/writing_exporters/#scheduling

@homholueng
Copy link
Contributor Author

homholueng commented Jun 28, 2022

Thanks for your advice.

According to the previous discussion, I will make the following modifications:

  1. Remove --track-queue and --queue-track-interval option and fetch track queue list by celery_app.inspect().active_queues.
  2. Remove queue metrics scrape daemon thread. celery_queue_length and celery_active_consumer_count metrics scrape only perform at metrics pulling
  3. Add non-zero celery_active_consumer_count metrics unit test case.

Those modifications will be committed ASAP.

@danihodovic
Copy link
Owner

Awesome!

@homholueng
Copy link
Contributor Author

homholueng commented Jun 29, 2022

Something unexpected happened while I was adding non-zero celery_active_consumer_count metrics unit test case.

It seems the memory://localhost broker type does not follow the amqp protocol so the queue_declare method can not get the correct celery_active_consumer_count.

So I also verified it with Redis as broker and can not get the correct celery_active_consumer_count too.

Should we keep the celery_active_consumer_count metric?

@homholueng
Copy link
Contributor Author

homholueng commented Jun 29, 2022

Something unexpected happened while I was adding non-zero celery_active_consumer_count metrics unit test case.

It seems the memory://localhost broker type does not follow the amqp protocol so the queue_declare method can not get the correct celery_active_consumer_count.

So I also verified it with Redis as broker and can not get the correct celery_active_consumer_count too.

Should we keep the celery_active_consumer_count metric?

I dig into the kombu implementation and find some clue.

queue_declare will raise amqp.exceptions.ChannelError: Channel.queue_declare: (404) NOT_FOUND - no queue 'celery' in vhost '/' from _has_queue check when there is no message in Redis broker because the "queue" in Redis is a list and it seems not durable. (The key will be deleted when the queue is empty)

Redis keys return

when queue is not empty

127.0.0.1:6379> keys celery
1) "celery"

when queue is empty

127.0.0.1:6379> keys celery
(empty array)

So the _has_queue method of kombu Redis transport will return different value base on the empty state of queue

@danihodovic
Copy link
Owner

If the queue is empty there is no active consumer, have I understood this right? In that case we could set it to zero.

@homholueng
Copy link
Contributor Author

If the queue is empty there is no active consumer, have I understood this right? In that case we could set it to zero.

Yep, but I also test with non-empty Redis queue and alive worker. The queue_declare method didn't raise but celery_active_consumer_count is still 0.

So, I guess only broker that support amqp protocol will return consumer count.

@danihodovic
Copy link
Owner

What is your preference? Keep if for RabbitMQ or remove it so that it's uniform for all brokers?

@homholueng
Copy link
Contributor Author

homholueng commented Jun 29, 2022

If the queue is empty there is no active consumer, have I understood this right? In that case we could set it to zero.

Yep, but I also test with non-empty Redis queue and alive worker. The queue_declare method didn't raise but celery_active_consumer_count is still 0.

So, I guess only broker that support amqp protocol will return consumer count.

Root cause found, the base class return 0 consumer count as default: queue_declare in base Channel

And Redis Channel doesn't override the method: Redis channel

Only qpid transport override it. 😓

image

@homholueng
Copy link
Contributor Author

homholueng commented Jun 29, 2022

What is your preference? Keep if for RabbitMQ or remove it so that it's uniform for all brokers?

I think we can keep the metric as long as we explain the limitation of broker type in the documentation.

@danihodovic
Copy link
Owner

Sounds good, maybe we can make a PR to Kombu to add the functionality for Redis.

@homholueng
Copy link
Contributor Author

homholueng commented Jun 29, 2022

Sounds good, maybe we can make a PR to Kombu to add the functionality for Redis.

We can mark as a future work. 😄

Need more learning on the pattern of Celery interact with Redis broker.

@danihodovic
Copy link
Owner

Is the PR ready for review?

@homholueng
Copy link
Contributor Author

Is the PR ready for review?

Yes

Copy link
Owner

@danihodovic danihodovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit, rest looks good :)

src/test_cli.py Outdated Show resolved Hide resolved
@danihodovic danihodovic merged commit 1145721 into danihodovic:master Jun 30, 2022
@danihodovic
Copy link
Owner

Nice work @homholueng 👏 伟大的工程工作

@danihodovic
Copy link
Owner

Released as danihodovic/celery-exporter:0.5.0

Can you update README.md to add the new metrics? https://github.com/danihodovic/celery-exporter#metrics

@homholueng
Copy link
Contributor Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

queue length metrics support
2 participants