Avoid opening too many jobs in the collector. Only parse so many at o… #203
Conversation
This corrects the problem identified in #201 of the collector having too many file handles open at once.
tested and approved :-)
I want this configurable (you can add it to your .yath.rc) with no behavior change by default.
This would also require documentation on when/why someone would use the option (your situation).
I know ZipRecruiter used -j56 on a super beefy machine when I was there. I do think we ran into this limit and simply bumped the max open filehandles limit. OK, I am fine with a default; since it is a number of jobs, we can stick with 200, but I want it configurable. Also, I wonder if we can catch this exception and handle it gracefully... or, if nothing else, catch it, report that the user should probably lower this config option's value, then die. That does not need to be part of this PR, though, if it is a lot of work.
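For context, the per-process limit being discussed can be inspected from a script. This is a generic illustration (assuming a POSIX system with Python available), not part of yath itself:

```python
import resource

# The per-process open-file limit the comment refers to; on many Linux
# systems the soft limit defaults to 1024.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft}, hard limit: {hard}")

# "Bumping the max open filehandles limit" means raising the soft limit,
# which a process may do up to its hard limit without privileges:
if hard != resource.RLIM_INFINITY:
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```

Raising the hard limit itself requires privileges (or an `ulimit`/PAM change), which is why capping the collector's open jobs is the more portable fix.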
We're totally OK with an option and documenting it. Updating the PR now.
OK, yeah, add the option and docs. I will merge this, and then, without a rush, we can look into speeding up the collector (or fixing whatever slowed it down).
@exodist: To add an option, am I going to have to create
Hmm, yes, that may be necessary. Any command that starts a collector (test and run currently) would need to include those options, so yeah, they should be their own category/module. But that's the kind of flexibility this system was made to allow, so it should not be hard to do. I think everything is likely to get passed to all the right places already, so it should not be a difficult change. Let me know if you get stuck.
(force-pushed from f0e7316 to 83c92da)
@toddr @atoomic I was going to explain how I needed things to change, but I had the time, so I just added a commit instead. Now I need you two to verify my changes still fix your problem. The changes you made would have fixed your problem, but the option was not actually usable, and the collector would have been broken in subtle ways. If any of my changes need explanation, please let me know. If this does fix your issue, I can merge and release.
I can't look until the morning, but I am almost certain that 1000 will break it. Processing each job requires multiple handles, so 1000 × 2 is > 1024. It's really not important that the loop doesn't suck everything up each time, since it'll get them on the next go-round.
Yeah, please do wait until you have time to fully read it. The 1000 you commented on is a different setting that has no effect on the number of open handles; it is just a second option I chose to expose, and its value is unchanged. I kept the 300 default that you had in your PR commit.
Thanks for your help on this.
The changes look sane except for how and when it messages that it's backed up. We should probably recommend reducing the -j value, since setting it higher than the CPU count can drive up load, which is pretty much the root cause of this problem. This also means that -j can never exceed 300 without significant complications.
```perl
my $max_open_jobs = $self->settings->collector->max_open_jobs // 1024;
my $additional_jobs_to_parse = $max_open_jobs - keys %$jobs;
if ($additional_jobs_to_parse <= 0) {
    $self->send_backed_up;
```
This should never happen. If it does, it means an entire processing loop completed and no jobs were processed to completion. Given the 300-job default, that would be bad, or you passed a REALLY big number to -j.
It happened in my testing when I set -j to 2 and set --max-open-jobs to 1. Not everyone will be using the defaults.
Sure, but I would argue it's a bug to set --max-open-jobs lower than -j; maybe we should warn on that?
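A minimal sketch of the sanity check being suggested, with hypothetical names (Python standing in for the Perl option-handling code; this is not what yath actually does):

```python
import warnings

def check_collector_limits(jobs, max_open_jobs):
    """Hypothetical validation: warn when the open-job cap is below the
    -j concurrency, since the collector would then back up immediately."""
    if max_open_jobs < jobs:
        warnings.warn(
            f"--max-open-jobs ({max_open_jobs}) is lower than -j ({jobs}); "
            "the collector will report being backed up"
        )

check_collector_limits(jobs=2, max_open_jobs=1)  # triggers the warning
```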
I've pushed a commit trying to straighten out the messaging. I'm going to try to test it now.
…ne time. Most Unix systems limit the number of file handles any one process can have open at a time to 1024. When the collector gets behind, it may try to open too many files for completed jobs at once. Limit this with a new collector setting, which defaults to 300. They'll be picked up once the ones polled are processed. This commit adds App::Yath::Options::Collector.
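The throttling this commit message describes can be illustrated with a small sketch (Python pseudocode standing in for the Perl collector; the function and its names are illustrative):

```python
def jobs_to_open(pending, currently_open, max_open_jobs=300):
    """Pick at most enough pending jobs to stay under the cap; anything
    left over is simply picked up on the next polling pass."""
    budget = max_open_jobs - len(currently_open)
    if budget <= 0:
        return []  # backed up: the collector warns instead of opening more
    return pending[:budget]

# With 295 jobs already open and a cap of 300, only 5 more are opened.
picked = jobs_to_open(list(range(20)), currently_open=range(295))
assert len(picked) == 5
```

The key design point from the discussion: skipping jobs on a pass is harmless because the poll loop runs repeatedly, so deferred jobs are opened as earlier ones close their handles.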
* Added another collector option
* Renamed collector option
* Put option in test and start commands
* Make warning when job limit is hit a proper event
* Only issue the warning once per test run
* Use the correct settings file in collector
We're testing this right now on our systems. Surprisingly even with
This has been merged.