Throttling does not work in ThrottleableTransport #4321

Closed
ueokande opened this Issue Nov 6, 2017 · 8 comments

ueokande commented Nov 6, 2017

Throttling in KafkaTransport does not work even when the server status becomes THROTTLED.
As a result, KafkaTransport continues to write messages to disk, and disk usage keeps growing.

Expected Behavior

KafkaTransport throttles input data when disk journal utilization exceeds 100% (or lb_throttle_threshold_percentage). KafkaTransport then pauses consuming messages until the server status becomes ALIVE or the journal has free space again. A minimal configuration sketch follows.
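
For reference, the threshold mentioned above is the lb_throttle_threshold_percentage setting in graylog.conf. A minimal sketch of what such a configuration could look like (the value shown is the 100% figure from this report, not a recommended default):

```
# graylog.conf
# Journal utilization (in percent) above which the node reports THROTTLED.
lb_throttle_threshold_percentage = 100
```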

Current Behavior

Throttling does not work. KafkaTransport continues to write to the journal even when disk journal utilization exceeds 100%.

Possible Solution

The throttle state in ThrottleableTransport is changed by updateThrottleState, but that method is never invoked when a node becomes throttled, because nothing posts a ThrottleState event to the EventBus.

I see two possible solutions (a sketch of the first follows the list):

  1. Post ThrottleState to the EventBus.
  2. Have ThrottleableTransport subscribe to Lifecycle events.
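
To illustrate solution 1, here is a minimal, self-contained sketch of the missing publish/subscribe wiring using Guava's EventBus (which Graylog uses internally). The ThrottleState fields and the JournalMonitor class below are simplified, hypothetical stand-ins, not the actual Graylog classes:

```java
import com.google.common.eventbus.EventBus;
import com.google.common.eventbus.Subscribe;

// Hypothetical, simplified stand-in for Graylog's ThrottleState.
class ThrottleState {
    final long journalSize;
    final long journalSizeLimit;

    ThrottleState(long journalSize, long journalSizeLimit) {
        this.journalSize = journalSize;
        this.journalSizeLimit = journalSizeLimit;
    }

    boolean shouldThrottle() {
        return journalSize >= journalSizeLimit;
    }
}

// Simplified transport: subscribes the way ThrottleableTransport's
// updateThrottleState handler does.
class ThrottleableTransport {
    private volatile boolean throttled = false;

    @Subscribe
    public void updateThrottleState(ThrottleState state) {
        throttled = state.shouldThrottle();
        System.out.println("Throttled: " + throttled);
    }

    boolean isThrottled() {
        return throttled;
    }
}

public class JournalMonitor {
    public static void main(String[] args) {
        EventBus eventBus = new EventBus();
        ThrottleableTransport transport = new ThrottleableTransport();
        eventBus.register(transport);

        // The missing piece in this bug: something has to post the current
        // throttle state, otherwise updateThrottleState never fires.
        eventBus.post(new ThrottleState(1_500_000L, 1_000_000L));

        assert transport.isThrottled();
    }
}
```

With no such post() call anywhere, the @Subscribe handler never fires and the transport never sees the THROTTLED state.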

Steps to Reproduce (for bugs)

  1. Stop the Elasticsearch cluster
  2. Launch a Raw/Plaintext Kafka input
  3. Wait for disk journal utilization to reach the throttling threshold

Context

Kafka is one way to build an at-least-once data delivery system, because Kafka guarantees at-least-once delivery. Therefore, KafkaTransport should guarantee at-least-once transport from Kafka to Elasticsearch by observing the throttled state.

Your Environment

  • Graylog Version: 2.3 (4824e52) and master (f1ab021)
  • Elasticsearch Version: 5.6.3
  • MongoDB Version: 2.6.10
  • Operating System: Ubuntu 16.04
  • Browser version:

leffen commented Dec 5, 2017

We are experiencing the same issue with Graylog 2.3.2 and Kafka 10.2.1.

@bernd bernd added bug triaged labels Dec 18, 2017

@bernd bernd added this to the 3.0.0 milestone Dec 18, 2017


jam49 commented Apr 18, 2018

Same here. Without throttling, when multiple sources feed into a Kafka cluster, the Graylog journals can easily be overrun by volume, rather than the input being throttled and messages staying in the Kafka queue while the journal nears its maximum. That kind of defeats the purpose of using Kafka. I would suggest this be corrected before version 3.0, e.g. in a 2.4.x release.


gizmonicus commented May 22, 2018

I also have this issue with Raw/Plaintext AMQP inputs on the Graylog 2.4.4-1 Docker image. The "Allow throttling this input" option is set, but it doesn't seem to actually work: the input just drains the queue and fills up the disk journal. I have plenty of space on the RabbitMQ nodes, which I was hoping would reduce the need for a large disk journal in Graylog. Am I missing something? Is this expected behavior?


ueokande commented May 22, 2018

@gizmonicus The AMQP transport inherits from ThrottleableTransport, too, so your problem is likely caused by the same bug. A simplified sketch of that relationship follows.
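
For illustration, here is a simplified sketch of why the same bug would hit both transports; the class shape and method names below are hypothetical stand-ins for the real Graylog code:

```java
// Both inputs funnel their throttling logic through the same base class.
// If ThrottleState is never published on the EventBus, the 'throttled'
// flag is never set, so neither transport ever pauses.
abstract class ThrottleableTransport {
    private volatile boolean throttled = false;

    // In Graylog this is driven by an EventBus subscription; with no
    // publisher, this method is simply never called.
    void updateThrottleState(boolean shouldThrottle) {
        this.throttled = shouldThrottle;
    }

    boolean isThrottled() {
        return throttled;
    }
}

class KafkaTransport extends ThrottleableTransport {
    void consumeLoop() {
        if (!isThrottled()) {
            // poll Kafka and write messages to the disk journal
        }
    }
}

class AmqpTransport extends ThrottleableTransport {
    void consumeLoop() {
        if (!isThrottled()) {
            // consume from RabbitMQ and write messages to the disk journal
        }
    }
}
```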


gizmonicus commented May 23, 2018

@ueokande have you found any workarounds? I hadn't planned on using the Graylog disk journal for queueing when I have RabbitMQ for that, but I suppose one workaround would be to add more disk space for the journal. It's not totally pointless, since the RabbitMQ nodes can still queue messages when Graylog is down; it's just not ideal.


awlx commented May 29, 2018

Any updates or plans for this? If you need any input to track down the issue, please contact me; I am glad to help.

@bernd bernd self-assigned this Jun 18, 2018


bernd commented Jun 18, 2018

Thank you for the reports. This seems to have been broken since #1948.

We will work on a fix and let you know once it's done.

bernd added a commit that referenced this issue Jun 18, 2018

Unbreak input throttling by publishing throttle state again
This broke in PR #1948 where I refactored the cluster event handling and removed the throttle state publishing by accident.

Fixes #4321
Refs #1948

kroepke added a commit that referenced this issue Jun 18, 2018

Unbreak input throttling by publishing throttle state again (#4849)
This broke in PR #1948 where I refactored the cluster event handling and removed the throttle state publishing by accident.

Fixes #4321
Refs #1948

**Note:** This should be cherry-picked into 2.4 as well.

bernd added a commit that referenced this issue Jun 18, 2018

Unbreak input throttling by publishing throttle state again
This broke in PR #1948 where I refactored the cluster event handling and removed the throttle state publishing by accident.

Fixes #4321
Refs #1948

(cherry picked from commit 060bd15)

bernd commented Jun 18, 2018

This will be fixed in Graylog 3.0 and the next 2.4 stable release.

kroepke added a commit that referenced this issue Jun 19, 2018

Unbreak input throttling by publishing throttle state again (#4850)
This broke in PR #1948 where I refactored the cluster event handling and removed the throttle state publishing by accident.

Fixes #4321
Refs #1948

(cherry picked from commit 060bd15)