Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Send data in batches #3

Merged
merged 12 commits into from Feb 11, 2018

Conversation

patrobinson
Copy link
Contributor

@patrobinson patrobinson commented Feb 2, 2018

While using this plugin in production I discovered performance was extremely bad when the volume of events was more than a dozen per second. Output plugins are, by default, single threaded and since the multi_receive method isn't implemented it's just a single thread sending one message at a time to firehose. That basically limits this plugin to 10-20 events per second.

Firehose supports batch sends, which allows sending up to 500 at a time. This PR implements the multi_receive method, which Logstash automatically takes advantage of, to send up to 500 records at a time. We also check to ensure other limits, such as the maximum size of a single payload or single record, are not exceeded. Finally we refactor the tests to provide better guarantees it's working as required.

Copy link
Owner

@chupakabr chupakabr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for your PR mate. Just curious, does logstash have an official repo of this plugin? I think I made PR to their repo long time ago..

@@ -43,12 +43,19 @@ class LogStash::Outputs::Firehose < LogStash::Outputs::Base
TEMPFILE_EXTENSION = "txt"
FIREHOSE_STREAM_VALID_CHARACTERS = /[\w\-]/

# These are hard limits
FIREHOSE_PUT_BATCH_SIZE_LIMIT = 4_000_000 # 4MB
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be nice to have these properties configurable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a hard limit from AWS, you cannot change it so I'm not sure the value in making it configurable.

@patrobinson
Copy link
Contributor Author

patrobinson commented Feb 8, 2018

@chupakabr there is, but your original PR has not been merged https://github.com/logstash-plugins/logstash-output-firehose

I figured this is as close to the upstream as I can get my changes.

Alternatively I can release this as a gem on Rubygems and become the unofficial maintainer

@chupakabr
Copy link
Owner

@patrobinson if you are fancy becoming an unofficial maintainer - feel free! :) I'll approve your PRs meanwhile here.
Cheers!

@chupakabr
Copy link
Owner

@patrobinson please resolve merge conflicts first, happened after merging your first PR ;)

@chupakabr chupakabr merged commit 7b34b8f into chupakabr:master Feb 11, 2018
@chupakabr
Copy link
Owner

cheers mate! @patrobinson

@patrobinson patrobinson deleted the send-data-in-batches branch June 22, 2018 02:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants