
Avoid opening too many jobs in the collector. Only parse so many at one time. #203

Closed · wants to merge 3 commits

Conversation
Conversation

@toddr (Collaborator) commented Nov 9, 2020

Most Unix systems limit the number of file handles any one process can have open at a time to 1024. When the collector gets behind, it may try to open too many files for completed jobs at once. Limit this with a constant; the remaining jobs will be picked up once the polled ones are processed.
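The approach, as a minimal Perl sketch (the constant name, job hash, and methods below are illustrative, not the actual Test2::Harness::Collector code):

use constant MAX_OPEN_JOBS => 300;    # assumed cap, comfortably under the 1024 fd limit

sub poll_jobs {
    my ($self, $queue) = @_;

    my $open = scalar keys %{$self->{jobs}};    # jobs whose files are already open
    for my $item ($queue->poll) {
        last if $open >= MAX_OPEN_JOBS;    # defer the rest to the next poll pass
        $self->start_job($item);           # opens this job's event/log handles
        $open++;
    }
}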

@toddr (Collaborator, Author) commented Nov 9, 2020

This corrects the problem identified in #201 of the collector holding too many file handles open at once.

@atoomic (Collaborator) left a comment

tested and approved :-)

@exodist (Member) left a comment

I want this configurable (you can add it to your .yath.rc) with no behavior change by default.

This would also require documentation on when/why someone would use the option (your situation).
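For illustration, assuming the option ends up spelled --max-open-jobs (the flag name that appears later in this thread) and is honored by the test command, a .yath.rc entry might look like:

[test]
--max-open-jobs=300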

lib/Test2/Harness/Collector.pm (outdated review thread, resolved)
@exodist (Member) commented Nov 9, 2020

I know ZipRecruiter used -j56 on a super beefy machine when I was there. I do think we ran into this limit and simply bumped the max open filehandles limit.

OK, I am fine with a default; since it is a number of jobs, we can stick with 200. But I want it configurable.

Also, I wonder if we can catch this exception and handle it gracefully... or, if nothing else, catch it, report that the user should probably lower this config option's value, and then die. That does not need to be part of this PR, though, if it is a lot of work.
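A hedged sketch of what catching that failure might look like; EMFILE is the errno for "Too many open files", and everything else here (the path, the message) is illustrative:

use Errno qw(EMFILE);

my $job_file = '/path/to/job/events.jsonl';    # placeholder path

open my $fh, '<', $job_file or do {
    # EMFILE means the process hit its open-file limit; point the user
    # at the config option rather than dying with a bare error.
    die "Out of file handles; consider lowering the max open jobs setting.\n"
        if $! == EMFILE;
    die "Could not open $job_file: $!\n";
};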

@toddr (Collaborator, Author) commented Nov 9, 2020

Concerning the -j56: I think some recent jitter in releases has somehow slowed down the collector. I was even wondering if we were accidentally not using Cpanel::JSON::XS. What we do know is that updating to the latest T2 caused this to happen where it did not previously.

@toddr (Collaborator, Author) commented Nov 9, 2020

We're totally ok with an option and documenting it. Updating the PR now.

@exodist (Member) commented Nov 9, 2020

OK, yeah: add the option and docs and I will merge this. Then, without a rush, we can look into speeding up the collector (or fixing whatever slowed it down).

@toddr (Collaborator, Author) commented Nov 9, 2020

@exodist: To add an option, am I going to have to create lib/App/Yath/Options/Collector.pm?

@exodist (Member) commented Nov 9, 2020

Hmm, yes, that may be necessary. Any command that starts a collector (currently test and run) would need to include those options, so they should be their own category/module. That's the kind of flexibility this system was made to allow, though, so it should not be hard to do. I think everything is likely to get passed to all the right places already, so it should not be a difficult change. Let me know if you get stuck.
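For reference, a rough sketch of the shape such a module could take using the App::Yath option DSL; the group/option syntax is modeled on how other App::Yath::Options modules are structured, and the default and description here are illustrative:

package App::Yath::Options::Collector;
use strict;
use warnings;

use App::Yath::Options;

option_group {prefix => 'collector', category => "Collector Options"} => sub {
    option max_open_jobs => (
        type        => 's',    # option takes a value
        default     => 300,
        description => "Maximum number of job files the collector will hold open at once",
    );
};

1;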

lib/App/Yath/Command/collector.pm (outdated review thread, resolved)
lib/App/Yath/Command/collector.pm (outdated review thread, resolved)
lib/Test2/Harness/Collector.pm (outdated review thread, resolved)
lib/Test2/Harness/Util/Queue.pm (review thread, resolved)
@toddr force-pushed the collector_fh branch 2 times, most recently from f0e7316 to 83c92da on November 10, 2020 at 00:50
@exodist (Member) commented Nov 10, 2020

@toddr @atoomic I was going to explain how I needed things to change, but I had the time, so I just added a commit instead. Now I need you guys to verify that my changes still fix your problem. The changes you made would have fixed it, but the option was not actually usable and the collector would have been broken in subtle ways. If any of my changes need explanation, please let me know.

If this does fix your issues I can merge and release.

@exodist exodist self-assigned this Nov 10, 2020
@exodist exodist requested a review from atoomic November 10, 2020 04:06
@toddr (Collaborator, Author) commented Nov 10, 2020

I can't look till the morning, but I am almost certain that 1000 will break it: processing each job requires multiple handles, so 1000 x 2 > 1024. It's really not important that the loop doesn't suck everything up each time, since it'll get the rest on the next go-around.

@exodist (Member) commented Nov 10, 2020

Yeah, please do wait until you have time to fully read it. The 1000 you commented on is a different setting that has no effect on the number of open handles; it is just a second option I chose to expose, and its value is unchanged. I kept the 300 default that you had in your PR commit.
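To make the distinction concrete, a sketch of such an event-poll cap (the constant name and method names here are assumptions): it bounds how many events are read per pass from an already-open job, so it never changes how many handles are open:

use constant MAX_POLL_EVENTS => 1000;    # caps events per pass, not open handles

for my $job (values %$jobs) {
    my $count = 0;
    while (my $event = $job->next_event) {    # assumed accessor
        $self->handle_event($event);          # assumed handler
        last if ++$count >= MAX_POLL_EVENTS;  # yield to the next job
    }
}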

@toddr (Collaborator, Author) commented Nov 10, 2020

Thanks for your help on this.

@toddr (Collaborator, Author) left a comment

The changes look sane except for how and when it messages when it's backed up. We should probably recommend reducing the -j value, since setting it higher than the CPU count can cause load, which is pretty much the root cause of this problem. This also means that -j can never exceed 300 without significant complications.

my $max_open_jobs = $self->settings->collector->max_open_jobs // 1024;
my $additional_jobs_to_parse = $max_open_jobs - keys %$jobs;
if ($additional_jobs_to_parse <= 0) {
    $self->send_backed_up;
}
@toddr (Collaborator, Author) commented:

This should never happen. If it does, it means an entire processing loop happened and no jobs were processed to completion. Given there are 300 jobs, that would be bad, or you passed a REALLY big number to -j.

@exodist (Member) commented:

It happened in my testing when I set -j to 2 and set --max-open-jobs to 1. Not everyone will be using the defaults.

@toddr (Collaborator, Author) commented:

Sure, but I would argue it's a bug to set --max-open-jobs to less than -j; maybe we should warn on that?
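A hypothetical version of that warning (the settings accessors are assumptions, not the verified Test2::Harness API):

my $job_count = $settings->runner->job_count;         # the -j value (assumed accessor)
my $max_open  = $settings->collector->max_open_jobs;  # --max-open-jobs (assumed accessor)

warn "--max-open-jobs ($max_open) is less than -j ($job_count); "
   . "the collector will always report being backed up.\n"
    if defined $max_open && $max_open < $job_count;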

lib/Test2/Harness/Collector.pm (outdated review thread, resolved)
lib/Test2/Harness/Collector.pm (outdated review thread, resolved)
@toddr (Collaborator, Author) commented Nov 10, 2020

I've pushed a commit trying to straighten out the messaging. I'm going to try to test it now.

toddr and others added 2 commits November 10, 2020 09:15
Avoid opening too many jobs in the collector. Only parse so many at one time.

Most Unix systems limit the number of file handles any one process can
have open at a time to 1024. When the collector gets behind, it may try
to open too many files for completed jobs at once. Limit this with a
new collector setting that defaults to 300; the remaining jobs will be
picked up once the polled ones are processed.

This commit adds App::Yath::Options::Collector
 * Added another collector option
 * Renamed collector option
 * Put option in test and start commands
 * Make warning when job limit is hit a proper event
 * Only issue the warning once per test run
 * Use the correct settings file in collector
@toddr (Collaborator, Author) commented Nov 12, 2020

We're testing this right now on our systems. Surprisingly, even with -j$nproc, the collector runs behind for pretty much the whole run, until it reaches the end where the single-threaded tests run.

@exodist (Member) commented Nov 18, 2021

This has been merged.

@exodist exodist closed this Nov 18, 2021