Add throttling support for the AWS Flow Logs input #85

Closed
danotorrey opened this issue Sep 6, 2018 · 1 comment


danotorrey commented Sep 6, 2018

The configuration for the AWS Flow Logs input includes an "Allow throttling this input" checkbox. Currently, the status of this checkbox is ignored on the backend. Implement throttling support for this input and enable it when this box is checked.

Currently, if large batches of Flow Logs are received (sometimes as large as 5GB), the Graylog journal may completely fill up, which may disrupt processing for the input.

The temporary workaround is to increase the size of the Graylog journal.

danotorrey self-assigned this Sep 6, 2018
danotorrey added this to the 3.0.0 milestone Sep 6, 2018
danotorrey commented

This is implemented. Just waiting for PR review. We're planning to release this at the same time as 2.5.

danotorrey modified the milestones: 3.0.0, 2.5.0 Nov 19, 2018
bernd closed this as completed in 8b06e61 Nov 28, 2018
bernd pushed a commit that referenced this issue Nov 28, 2018
* Add throttling support for the AWS Logs and AWS Flow Logs inputs

This change applies only to the Kinesis transport, which is used for these inputs. If the throttling state becomes active while processing messages, the Kinesis consumer will wait up to 60 seconds (during which the input will continue processing if the throttling state clears). If the throttling state does not clear, the consumer will be temporarily shut down until the throttling state is cleared.
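The wait-then-shut-down behavior described above can be sketched roughly as follows. This is only an illustration, not the plugin's actual code: `ThrottleGate`, `setThrottled`, and `awaitCleared` are hypothetical names, and the timeout is passed in by the caller rather than fixed at 60 seconds.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical helper (not the plugin's class): blocks a consumer while throttled. */
final class ThrottleGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition cleared = lock.newCondition();
    private boolean throttled; // guarded by lock

    /** Called when the throttling state changes. */
    void setThrottled(boolean value) {
        lock.lock();
        try {
            throttled = value;
            if (!value) {
                cleared.signalAll(); // wake any waiting consumer
            }
        } finally {
            lock.unlock();
        }
    }

    /**
     * Waits up to timeoutMillis for the throttling state to clear.
     * Returns true if processing may continue, false if the caller
     * should shut the consumer down (timeout or interruption).
     */
    boolean awaitCleared(long timeoutMillis) {
        long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        lock.lock();
        try {
            while (throttled) {
                if (nanos <= 0L) {
                    return false; // throttling did not clear in time
                }
                nanos = cleared.awaitNanos(nanos);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        } finally {
            lock.unlock();
        }
    }
}
```

A consumer loop would call `awaitCleared(60_000)` before handing off each batch and begin its shutdown path when the call returns `false`.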

* Improve documentation and log messages

* Add Throttling documentation to AWS plugin Readme file

* Improve Throttling documentation in Readme

* Fix documentation typos

* Add config options for Kinesis record batch size, Throttle pause time ms

* Add state management around throttlable Kinesis consumer restart

The Kinesis consumer shuts down if it was paused too long due to throttling. The restart of the consumer needed state management to ensure that the consumer does not restart if unthrottling occurs during deliberate input shutdown.
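One way to picture this kind of state management is a small state holder with compare-and-set transitions. The sketch below is an assumption about the general shape, not the plugin's implementation; `ConsumerLifecycle` and its state names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical lifecycle tracker for a throttlable consumer. */
final class ConsumerLifecycle {
    enum State { RUNNING, PAUSED_BY_THROTTLE, STOPPED }

    private final AtomicReference<State> state = new AtomicReference<>(State.RUNNING);

    /** Sustained throttling: only a running consumer can be paused. */
    boolean pauseForThrottle() {
        return state.compareAndSet(State.RUNNING, State.PAUSED_BY_THROTTLE);
    }

    /** Deliberate input shutdown always wins, whatever the current state. */
    void stop() {
        state.set(State.STOPPED);
    }

    /**
     * Unthrottling: restart only if the consumer was paused by throttling.
     * Returns false if the input was deliberately stopped in the meantime,
     * or if another caller already performed the restart.
     */
    boolean resumeAfterThrottle() {
        return state.compareAndSet(State.PAUSED_BY_THROTTLE, State.RUNNING);
    }

    State current() {
        return state.get();
    }
}
```

Because the transition is a single compare-and-set, an unthrottle event that arrives after `stop()` cannot bring the consumer back, and at most one caller wins the restart.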

* Fix bug causing multiple Kinesis consumers to start during throttling

If the consumer was stopped (due to extended throttling) and then restarted while it was still shutting down, a resource collision could occur, causing persistent errors from within the AWS Kinesis client.

* Prevent duplicate records due to missing checkpoint when throttled

Checkpointing is now performed before the Kinesis consumer is shut down due to sustained throttling. This prevents records received since the last checkpoint from being reprocessed after the consumer restarts.
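The effect of a missing checkpoint can be shown with a toy simulation. This is not Kinesis client code; `CheckpointDemo` is a hypothetical stand-in that only models a read position and a saved checkpoint.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a consumer that resumes from its last checkpoint after a restart. */
final class CheckpointDemo {
    private int checkpoint = 0; // index the consumer resumes from after a restart
    private int position = 0;   // index of the next record in the current session
    private final List<String> delivered = new ArrayList<>();

    /** Delivers up to count records starting at the current position. */
    void process(List<String> stream, int count) {
        for (int i = 0; i < count && position < stream.size(); i++) {
            delivered.add(stream.get(position++));
        }
    }

    /** Persists the current position. */
    void checkpoint() {
        checkpoint = position;
    }

    /** Simulates shutdown and restart: reading resumes at the last checkpoint. */
    void restart() {
        position = checkpoint;
    }

    List<String> delivered() {
        return delivered;
    }
}
```

With a stream of `a, b, c, d`, calling `process(stream, 2)`, `restart()`, `process(stream, 4)` delivers `a, b` twice. Inserting `checkpoint()` before `restart()` delivers each record once.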

Change info logs -> Debug.

* Put individual Flow Log entries in Journal instead of large Kinesis batches

Kinesis record batches can contain up to 1MB of log data (possibly more than 1k log entries). Writing each batch as a single journal entry prevents throttling from engaging at the appropriate time (e.g., it should engage at 100k uncommitted journal entries), so the journal would have to fill up to 90% before throttling would trigger (possibly preventing other inputs from utilizing even a small portion of the Journal).

Now, the Kinesis record is broken out into individual log messages and committed to the Journal, and throttling engages at the appropriate early time.
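The reasoning can be illustrated with a toy journal that throttles on entry count. This is a sketch under assumptions, not Graylog's journal: `JournalSim`, its methods, and the threshold used in the example are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Toy journal that signals throttling based on the number of uncommitted entries. */
final class JournalSim {
    private final long throttleAtEntries; // hypothetical entry-count threshold
    private long uncommittedEntries;
    private long uncommittedBytes;

    JournalSim(long throttleAtEntries) {
        this.throttleAtEntries = throttleAtEntries;
    }

    /** Writes one journal entry, whatever its size. */
    void write(String payload) {
        uncommittedEntries++;
        uncommittedBytes += payload.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Old behavior: the whole batch becomes a single journal entry. */
    void writeBatchAsOneEntry(List<String> logEvents) {
        write(String.join("\n", logEvents));
    }

    /** New behavior: every log event becomes its own journal entry. */
    void writeBatchAsIndividualEntries(List<String> logEvents) {
        for (String event : logEvents) {
            write(event);
        }
    }

    boolean shouldThrottle() {
        return uncommittedEntries >= throttleAtEntries;
    }

    long uncommittedEntries() {
        return uncommittedEntries;
    }

    long uncommittedBytes() {
        return uncommittedBytes;
    }
}
```

With a threshold of 100 entries, a batch of 1,000 events written as one entry counts as a single entry and does not trip the check, while the same batch written individually trips it immediately.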

* Remove explicit throttling criteria and instead reference new docs page

* Improve AWS Logs/AWS Flow Logs overview/setup instructions

 - Add an overview of how these inputs operate
 - Add IAM policy rights needed to run these inputs
 - Add details on throttling

* Minor code cleanup from reviewing diffs

* Simplify default specification for THROTTLED_WAIT parameter

Move the default specification to the location where the configuration parameter is originally parsed from the UI. This improves maintainability, since the parsing and default are now handled in the same location (instead of in different classes).
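A minimal sketch of keeping the parse and the default together is shown below. The class name, the `throttled_wait_ms` key, and the 60,000 ms default (taken from the 60-second wait mentioned earlier in this commit message) are assumptions for illustration, not the plugin's actual identifiers.

```java
import java.util.Map;

/** Hypothetical config reader: parsing and default live in one place. */
final class ThrottleConfig {
    static final String KEY_THROTTLED_WAIT_MS = "throttled_wait_ms"; // assumed key name
    static final int DEFAULT_THROTTLED_WAIT_MS = 60_000;             // assumed default

    private ThrottleConfig() {
    }

    /** Returns the configured wait time, or the default if unset or not numeric. */
    static int throttledWaitMs(Map<String, Object> rawConfig) {
        Object value = rawConfig.get(KEY_THROTTLED_WAIT_MS);
        return (value instanceof Number)
                ? ((Number) value).intValue()
                : DEFAULT_THROTTLED_WAIT_MS;
    }
}
```

Callers then never need to know the default; they only ask for the parsed value.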

* Remove emojis in log messages

* Clean up comments

* Make accessors private

* Initialize executor in constructor, so it can be final

* Change info log messages to debug

* Fix invalid KINESIS_RECORD_BATCH_SIZE default argument

* Use atomic reference for throttle state

* Perform initialization in constructor

* Unconditionally set throttling state to false when input is stopped

Fixes #85

(cherry picked from commit 8b06e61)
danotorrey pushed a commit that referenced this issue Nov 28, 2018
* Add throttling to the AWS Flow Logs/Logs inputs (#88)

* Add throttling support for the AWS Logs and AWS Flow Logs inputs

This change applies only to the Kinesis transport, which is used for these inputs. If the throttling state becomes active while processing messages, the Kinesis consumer will wait up to 60 seconds (during which the input will continue processing if the throttling state clears). If the throttling state does not clear, the consumer will be temporarily shut down until the throttling state is cleared.

* Improve documentation and log messages

* Add Throttling documentation to AWS plugin Readme file

* Improve Throttling documentation in Readme

* Fix documentation typos

* Add config options for Kinesis record batch size, Throttle pause time ms

* Add state management around throttlable Kinesis consumer restart

The Kinesis consumer shuts down if it was paused too long due to throttling. The restart of the consumer needed state management to ensure that the consumer does not restart if unthrottling occurs during deliberate input shutdown.

* Fix bug causing multiple Kinesis consumers to start during throttling

If the consumer was stopped (due to extended throttling) and then restarted while it was still shutting down, a resource collision could occur, causing persistent errors from within the AWS Kinesis client.

* Prevent duplicate records due to missing checkpoint when throttled

Checkpointing is now performed before the Kinesis consumer is shut down due to sustained throttling. This prevents records received since the last checkpoint from being reprocessed after the consumer restarts.

Change info logs -> Debug.

* Put individual Flow Log entries in Journal instead of large Kinesis batches

Kinesis record batches can contain up to 1MB of log data (possibly more than 1k log entries). Writing each batch as a single journal entry prevents throttling from engaging at the appropriate time (e.g., it should engage at 100k uncommitted journal entries), so the journal would have to fill up to 90% before throttling would trigger (possibly preventing other inputs from utilizing even a small portion of the Journal).

Now, the Kinesis record is broken out into individual log messages and committed to the Journal, and throttling engages at the appropriate early time.

* Remove explicit throttling criteria and instead reference new docs page

* Improve AWS Logs/AWS Flow Logs overview/setup instructions

 - Add an overview of how these inputs operate
 - Add IAM policy rights needed to run these inputs
 - Add details on throttling

* Minor code cleanup from reviewing diffs

* Simplify default specification for THROTTLED_WAIT parameter

Move the default specification to the location where the configuration parameter is originally parsed from the UI. This improves maintainability, since the parsing and default are now handled in the same location (instead of in different classes).

* Remove emojis in log messages

* Clean up comments

* Make accessors private

* Initialize executor in constructor, so it can be final

* Change info log messages to debug

* Fix invalid KINESIS_RECORD_BATCH_SIZE default argument

* Use atomic reference for throttle state

* Perform initialization in constructor

* Unconditionally set throttling state to false when input is stopped

Fixes #85

(cherry picked from commit 8b06e61)

* Fix AWSObjectMapper import

* Fix import location for AWSObjectMapper

Import location reflects the current location in 2.5 branch (which subsequently changed in master)

* Add default case to switch statement to clarify intent

This resolves an errorprone error: [MissingCasesInEnumSwitch] Non-exhaustive switch; either add a default or handle the remaining cases: STARTING, RUNNING
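For illustration, a switch over an enum with an explicit `default` might look like the sketch below. Only `STARTING` and `RUNNING` come from the error message; the other constants, the class name, and the returned strings are hypothetical.

```java
/** Hypothetical example of an enum switch with an explicit default branch. */
final class InputStateSwitch {
    // STARTING and RUNNING appear in the Error Prone message; the rest are assumed.
    enum State { STARTING, RUNNING, STOPPING, STOPPED }

    private InputStateSwitch() {
    }

    static String describe(State state) {
        String description;
        switch (state) {
            case STOPPING:
                description = "stopping";
                break;
            case STOPPED:
                description = "stopped";
                break;
            default:
                // STARTING and RUNNING intentionally need no special handling.
                description = "no action";
                break;
        }
        return description;
    }
}
```

The `default` branch documents that the unhandled constants are ignored on purpose, which is what the check asks for.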

* Add missing "break"