Add throttling to the AWS Flow Logs/Logs inputs (#88) #94

Merged
merged 5 commits into from Nov 28, 2018
Member

@bernd bernd commented Nov 28, 2018

  • Add throttling support for the AWS Logs and AWS Flow Logs inputs

This change applies only to the Kinesis transport, which these inputs use. If the throttling state becomes active while messages are being processed, the Kinesis consumer waits up to 60 seconds and continues processing as soon as the throttling state clears. If the throttling state does not clear within that time, the consumer is temporarily shut down until the throttling state is cleared.
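The wait-then-stop behavior described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the class `ThrottleWait`, the method `awaitUnthrottled`, and the polling interval are hypothetical names and values chosen for the example.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical helper: waits for a throttle flag to clear, up to a maximum time. */
final class ThrottleWait {
    private ThrottleWait() {}

    /**
     * Polls isThrottled until it returns false or maxWaitMs has elapsed.
     *
     * @return true if throttling cleared in time (keep processing),
     *         false if still throttled or interrupted (caller should stop the consumer)
     */
    static boolean awaitUnthrottled(BooleanSupplier isThrottled, long maxWaitMs, long pollMs) {
        final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMs);
        while (isThrottled.getAsBoolean()) {
            final long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false; // throttling outlasted the wait window
            }
            try {
                // Sleep for one poll interval, but never past the deadline.
                TimeUnit.NANOSECONDS.sleep(
                        Math.min(remainingNanos, TimeUnit.MILLISECONDS.toNanos(pollMs)));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
        return true;
    }
}
```

With the 60-second window from the description, a caller might invoke `ThrottleWait.awaitUnthrottled(flag, 60_000, 1_000)` and shut the consumer down when it returns `false`.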

  • Improve documentation and log messages

  • Add Throttling documentation to AWS plugin Readme file

  • Improve Throttling documentation in Readme

  • Fix documentation typos

  • Add config options for Kinesis record batch size and throttle pause time (ms)

  • Add state management around throttlable Kinesis consumer restart

The Kinesis consumer shuts down if it was paused too long due to throttling. The restart of the consumer needed state management to ensure that the consumer does not restart if unthrottling occurs during deliberate input shutdown.

  • Fix bug causing multiple Kinesis consumers to start during throttling

If the consumer was stopped due to extended throttling and then restarted while it was still shutting down, a resource collision could occur, causing persistent errors from within the AWS Kinesis client.
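One way to picture the state management described in the two items above is a small guard built on compare-and-set transitions. This is only a sketch under assumptions: the class `ConsumerLifecycle` and all of its state and method names are hypothetical and are not taken from the plugin.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical lifecycle guard; state names are illustrative, not the plugin's. */
final class ConsumerLifecycle {
    enum State { STOPPED, RUNNING, STOPPING_FOR_THROTTLE, STOPPED_FOR_THROTTLE, SHUT_DOWN }

    private final AtomicReference<State> state = new AtomicReference<>(State.STOPPED);

    /** Initial start: succeeds only from STOPPED. */
    boolean tryStart() {
        return state.compareAndSet(State.STOPPED, State.RUNNING);
    }

    /** Sustained throttling: begin stopping a running consumer. */
    boolean beginThrottleStop() {
        return state.compareAndSet(State.RUNNING, State.STOPPING_FOR_THROTTLE);
    }

    /** The old consumer has fully released its resources. */
    boolean finishThrottleStop() {
        return state.compareAndSet(State.STOPPING_FOR_THROTTLE, State.STOPPED_FOR_THROTTLE);
    }

    /**
     * Throttling cleared: restart only if the previous consumer has finished stopping.
     * Returns false while it is still stopping (avoids two consumers at once)
     * and after a deliberate shutdown (avoids an unwanted restart).
     */
    boolean tryRestartAfterUnthrottle() {
        return state.compareAndSet(State.STOPPED_FOR_THROTTLE, State.RUNNING);
    }

    /** Deliberate input shutdown: terminal, wins over any pending restart. */
    void shutDown() {
        state.set(State.SHUT_DOWN);
    }

    State current() {
        return state.get();
    }
}
```

In this sketch a caller that gets `false` from `tryRestartAfterUnthrottle()` while the state is `STOPPING_FOR_THROTTLE` would need to retry once the stop has completed; how the plugin handles that case is not shown here.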

  • Prevent duplicate records due to missing checkpoint when throttled

Checkpointing is now performed before the Kinesis consumer is shut down due to sustained throttling. This prevents records received since the last checkpoint from being reprocessed.

Change info logs -> Debug.
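The checkpoint-before-shutdown ordering described above might look like the following sketch. The `Checkpointer` interface and `ThrottledShutdown` class are hypothetical stand-ins for illustration; they are not the Kinesis client's or the plugin's types.

```java
/** Hypothetical stand-in for whatever persists the consumer's read position. */
interface Checkpointer {
    void checkpoint() throws Exception;
}

final class ThrottledShutdown {
    private ThrottledShutdown() {}

    /**
     * Checkpoints first, then stops the consumer. If the checkpoint fails the
     * consumer is still stopped; records since the last successful checkpoint
     * may then be delivered again after the restart.
     *
     * @return true if the checkpoint succeeded
     */
    static boolean checkpointThenStop(Checkpointer checkpointer, Runnable stopConsumer) {
        boolean checkpointed;
        try {
            checkpointer.checkpoint();
            checkpointed = true;
        } catch (Exception e) {
            checkpointed = false;
        }
        stopConsumer.run();
        return checkpointed;
    }
}
```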

  • Put individual Flow Log entries in Journal instead of large Kinesis batches

Kinesis record batches can contain up to 1MB of log files (possibly more than 1k log entries). Writing a whole batch as a single journal entry prevents throttling from engaging at the appropriate time (e.g. it should engage at 100k uncommitted journal entries), so the journal would have to fill up to 90% before throttling triggered (possibly preventing other inputs from utilizing even a small portion of the Journal).

Now each Kinesis record is broken out into individual log messages, which are committed to the Journal separately, so throttling engages at the appropriate, earlier time.
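The effect of writing individual entries can be illustrated with a small sketch. It assumes the batch has already been decoded into a list of log lines; `BatchSplitter` and the `journal` callback are hypothetical and stand in for the plugin's decoding and journal-writing code.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: one journal entry per log message instead of one per batch. */
final class BatchSplitter {
    private BatchSplitter() {}

    /**
     * Hands each non-empty log entry of a decoded batch to the journal separately.
     *
     * @return the number of journal entries written
     */
    static int writeIndividually(List<String> decodedEntries, Consumer<String> journal) {
        int written = 0;
        for (String entry : decodedEntries) {
            if (entry == null || entry.isEmpty()) {
                continue; // skip blank lines so they do not inflate the entry count
            }
            journal.accept(entry);
            written++;
        }
        return written;
    }
}
```

For illustration, if each batch held about 1,000 log messages, 100 batches written whole would show up as 100 journal entries, while the same data written individually would show up as 100,000, which is the order of magnitude at which an entry-count threshold can actually react.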

  • Remove explicit throttling criteria and instead reference new docs page

  • Improve AWS Logs/AWS Flow Logs overview/setup instructions

  • Add an overview for how these inputs operate
  • Add IAM policy rights needed to run these inputs
  • Add details on throttling
  • Minor code cleanup from reviewing diffs

  • Simplify default specification for THROTTLED_WAIT parameter

Move the default specification to the location where the configuration parameter is originally parsed from the UI. This improves maintainability, since the parsing and the default are now handled in the same location (instead of in different classes).
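The idea of keeping the default next to the parsing code can be sketched as below. Everything here is an assumption made for illustration: the class `ThrottleConfig`, the key `throttled_wait_ms`, and the 60,000 ms default (chosen to match the 60-second wait mentioned in the description) are not taken from the plugin.

```java
import java.util.Map;

/** Hypothetical config holder: the value is parsed and defaulted in one place. */
final class ThrottleConfig {
    static final String KEY_THROTTLED_WAIT_MS = "throttled_wait_ms"; // hypothetical key
    static final int DEFAULT_THROTTLED_WAIT_MS = 60_000;             // assumed default (60 s)

    private final int throttledWaitMs;

    ThrottleConfig(Map<String, Object> raw) {
        Object value = raw.get(KEY_THROTTLED_WAIT_MS);
        // Parsing and defaulting happen together, so no other class needs to know the default.
        this.throttledWaitMs = (value instanceof Number)
                ? ((Number) value).intValue()
                : DEFAULT_THROTTLED_WAIT_MS;
    }

    int throttledWaitMs() {
        return throttledWaitMs;
    }
}
```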

  • Remove emojis in log messages

  • Clean up comments

  • Make accessors private

  • Initialize executor in constructor, so it can be final

  • Change info log messages to debug

  • Fix invalid KINESIS_RECORD_BATCH_SIZE default argument

  • Use atomic reference for throttle state

  • Perform initialization in constructor

  • Unconditionally set throttling state to false when input is stopped

Fixes #85

(cherry picked from commit 8b06e61)

danotorrey and others added 2 commits Nov 28, 2018
* Add throttling support for the AWS Logs and AWS Flow Logs inputs

@bernd bernd added this to the 2.5.0 milestone Nov 28, 2018
@bernd bernd requested a review from danotorrey Nov 28, 2018
danotorrey and others added 3 commits Nov 28, 2018
Import location reflects the current location in 2.5 branch (which subsequently changed in master)
This resolves an errorprone error: [MissingCasesInEnumSwitch] Non-exhaustive switch; either add a default or handle the remaining cases: STARTING, RUNNING
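For readers unfamiliar with this check, the sketch below shows one of the two remedies the message names: adding a `default` branch so the switch covers every enum constant. The enum and method are hypothetical (only `STARTING` and `RUNNING` appear in the message above); the actual fix in this PR may have handled the cases explicitly instead.

```java
/** Hypothetical example of a switch that Error Prone accepts as exhaustive. */
final class StateDescriber {
    // STOPPING and STOPPED are made-up constants for the example.
    enum State { STARTING, RUNNING, STOPPING, STOPPED }

    private StateDescriber() {}

    static String describe(State state) {
        switch (state) {
            case STOPPING:
                return "stopping";
            case STOPPED:
                return "stopped";
            default:
                // Covers STARTING and RUNNING, which were previously unhandled.
                return "active";
        }
    }
}
```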
@danotorrey danotorrey merged commit bad1fdd into 2.5 Nov 28, 2018
2 checks passed
graylog-project/pr Jenkins build graylog-project-pr-snapshot 2308 has succeeded
license/cla Contributor License Agreement is signed.
@danotorrey danotorrey deleted the pr-88-2.5 branch Nov 28, 2018