Throughput question #374
Hello. When you're looking at lc-admin, can you see what it says under the transport section? The key metric is "pending payloads". If that sticks at 10, it's essentially saying log-courier has sent all it could (10 is the default maximum) to Logstash and is waiting for responses. If it's less than 10, it's never waiting and is always sending.

In all honesty I've never seen a case where it was less than 10 while there were active harvesters ("completion" < 100), which would indicate Logstash running faster than logs can be read and sent. Usually things are IO bound, and the slowest link is nearly always Logstash due to the grok/user-agent lookup/IP lookup work going on. Based on that, the usual way to speed things up is to look at the Logstash side and see if it needs more CPU, or at the ES ingestion side to see if IO/CPU headroom is there.

Regarding files held open: nowadays I see that as undesirable. When things are working fine, holding a file open after deletion for a few minutes to make sure everything was read and sent is a good thing. But if things aren't working fine, it will hold the file until all logs are sent, and that could mean many deleted files held open if Logstash is hitting resource limits. It's something I haven't got around to looking at, but log-courier really should have a setting to stop holding a deleted file open after a specific amount of time.
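The pending-payloads mechanic described above can be sketched as a simple in-flight window. This is an illustrative model only, not log-courier's actual code; the limit of 10 is the default mentioned above:

```python
# Illustrative sketch of an in-flight payload window, NOT log-courier's
# actual implementation.
MAX_PENDING = 10  # default "max pending payloads" per the comment above

pending = []  # payloads sent but not yet acknowledged by Logstash

def try_send(payload):
    """Send a payload unless the window is full.

    If this constantly returns False (pending pinned at MAX_PENDING),
    the bottleneck is downstream: the shipper is waiting on Logstash
    acknowledgements. If pending rarely reaches MAX_PENDING, reading
    and shipping the logs is the slower side instead.
    """
    if len(pending) >= MAX_PENDING:
        return False
    pending.append(payload)
    return True

def acknowledge(payload):
    """Logstash confirmed receipt; free a slot in the window."""
    pending.remove(payload)
```

In these terms, a "pending payloads" figure stuck at 10 in lc-admin corresponds to `try_send` constantly hitting the `False` branch.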
Hello, and thanks for the response. Wonderful software. I must have an "older" version (PPA from Ubuntu) because I don't see that metric. This is the output I currently get from running a status:
Or is there a command-line switch I need to add? Thanks again!
Hmmm, would you be willing to post the full status output? Possibly it is called publisher rather than transport.
Ah yes, I see it now:
It looks like one endpoint isn't doing much, as it's got 0 pending. What's the "method"? I think the default is random, which will only send to one endpoint at any one time and use any other as failover; if both are fine to be processing, it may be worth trying the loadbalance method. https://github.com/driskell/log-courier/blob/master/docs/Configuration.md#method It is reporting 700 lines per second, though, so maybe it is catching up? If you started log-courier with lots of full files, it might still be sending everything and could catch up eventually.
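For reference, the method is set in the network section of the log-courier configuration. A minimal sketch, where the hostnames and port are placeholders and the exact keys should be checked against the Configuration.md linked above:

```json
{
  "network": {
    "servers": [
      "logstash-1.example.com:5043",
      "logstash-2.example.com:5043"
    ],
    "method": "loadbalance"
  }
}
```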
Method is "loadbalance". I ran a status again and now I see this:
Ah, here we have speed 0 and, in essence, a momentarily blocked pipeline, so yes, I think there is some bottleneck on the Logstash side. Adding another instance could help, but one thing I found helpful was also to optimise the processing on Logstash. Specifically, consider looking at the actions taken on events, such as grok, as I know that tends to be the slowest; some patterns can be slower than others.
Worth checking Logstash CPU etc. too, and the ES cluster CPU, to make sure you pinpoint where the restricted element is.
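On the grok point: one common slowdown is unanchored patterns, which force the regex engine to retry the match at every position of lines that don't match. A hedged illustration (the field names and pattern here are hypothetical, not taken from this thread):

```
filter {
  grok {
    # Anchoring with ^ lets non-matching lines fail fast instead of
    # backtracking across the whole message.
    match => { "message" => "^%{IPORHOST:clientip} %{WORD:verb} %{URIPATHPARAM:request}" }
  }
}
```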
I tweaked my Logstash workers and batch size settings, and CPU/memory are doing just fine, so this leads me to believe it is on the Elasticsearch side of things, since those nodes are CPU bound.
As a follow-up, when I run lc-admin to get a status, I see pending payloads maxed at 10, which, as stated above, means it is waiting on Logstash. I have 2 Logstash servers, load balanced, and both have CPUs (x4) running at about 50%; memory is not fully consumed either, and there is no IOWait. In your experience, could this also be a network issue?
You could try increasing the max pending payloads. Memory usage will increase, but perhaps the throughput is such that 10 is not enough. Essentially it dictates the maximum amount of data in transit at any one moment in time.
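That option also lives in the network section of the configuration. A sketch, where the value 20 is an arbitrary example and the key name should be verified against the log-courier docs:

```json
{
  "network": {
    "max pending payloads": 20
  }
}
```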
The documentation says the default is 4. Is that correct, or is it 10? Assuming since I see
Thank you for all the help! Great app.
@dale-busse-av No problem at all! Thanks for the patience. Did you manage to get things improved in the end? |
Yep we did. Thanks to your help, I was able to determine my ES cluster was the bottleneck. Learned about log-courier and appreciated all the help. Thanks again!! |
Hello,
I have a question about general throughput. I have a log file that gets about 200 logs/second, where the average log line length is about 225 characters. We run it on AWS (a c5n.4xlarge instance) with plenty of RAM/CPU to spare. What I am witnessing when I run lc-admin and look at this particular file is that the "completion" statistic sits at about 58 (which, as I understand it, is the percentage read). Since I use logrotate and rotate every hour, the file gets deleted, but log-courier holds onto it and continues processing. Then I have two files being processed: the orphaned one and the new one that was created. However, this does take up space on the filesystem. So my questions (we are on version 2.0.6):

2. Any way to speed this up?
Thanks!
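For scale, the volume described above works out to a fairly modest byte rate, which a quick back-of-envelope calculation shows:

```python
# Back-of-envelope throughput from the figures above.
logs_per_second = 200
avg_line_bytes = 225  # average line length in characters (~bytes for ASCII logs)

bytes_per_second = logs_per_second * avg_line_bytes
print(bytes_per_second)         # 45000 bytes/s
print(bytes_per_second / 1024)  # ~43.9 KiB/s
```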