Processing files not in language list #134

Closed
ChristopherRogers opened this issue Nov 7, 2019 · 13 comments
Labels
bug (Something isn't working), question (Further information is requested)

Comments

@ChristopherRogers

Describe the bug
Filtering on a language correctly limits the output to files of that language, but there seems to be an inefficiency where scc still processes, or attempts to process, all files regardless of the language filter.

To Reproduce

  1. Run scc -i language-of-choice --no-cocomo in a directory & environment that will cause "too many files open" errors to be output.
  2. Notice that the file names may be for files that aren't in the language filter passed. (Whether this actually occurs is likely dependent on the composition of the files.)

Expected behavior
Specifying the language of choice should result in scc not attempting to open files that aren't in the language list.

Desktop (please complete the following information):
macOS 10.15.1

@boyter
Owner

boyter commented Nov 7, 2019

While that's not a great outcome, I am more worried about the fact that you are getting "too many files open" errors. There is potentially a performance optimisation in there, but the crash should not happen.

Are you able to share details about how many files are in the directory you are trying to count? Both the file count and the size of the directory. I'd also love to know how much RAM you have and what CPU you are running.

@boyter added the bug and question labels Nov 7, 2019
@dbaggerman
Collaborator

Google tells me the default open file limit on macOS is 256. We run runtime.NumCPU() * 4 file readers, and runtime.NumCPU() gives the number of hyperthreads, not physical cores.

According to apple.com, the current maximum spec for a Mac Pro is 28 cores, or 56 hyperthreads. 56 * 4 == 224, which is getting fairly close to the limit, only leaving 32 spare handles. Might be worth capping FileReadJobWorkers at around 64/128, and/or printing a warning on high core count systems.
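A minimal sketch of what such a cap could look like, assuming a ceiling of 64 workers; the constant and function name here are illustrative, not scc's existing code:

```go
package main

import "runtime"

// maxFileReadWorkers is an assumed ceiling (somewhere in the suggested
// 64-128 range); it is not an existing scc constant.
const maxFileReadWorkers = 64

// fileReadJobWorkers keeps the current NumCPU()*4 sizing but caps it so
// very high core-count machines stay well under the default macOS limit
// of 256 open file descriptors.
func fileReadJobWorkers() int {
	workers := runtime.NumCPU() * 4
	if workers > maxFileReadWorkers {
		workers = maxFileReadWorkers
	}
	return workers
}
```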

The directory walker also opens (directory) file handles, which also scale to the number of cores/hyperthreads. This would easily push through the 32 spare handles of the above scenario.

The directory handles can also be kept open if the file job channel is full. We could close those directory handles a bit sooner by changing defer file.Close() to an explicit close immediately after file.Readdir(). It won't reduce the upper limit of open file handles, but it would be good hygiene anyway.
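A hedged sketch of that change; listDir and its signature are illustrative rather than scc's actual walker code:

```go
package main

import "os"

// listDir reads a directory's entries, closing the directory handle as soon
// as Readdir returns instead of deferring the close until the surrounding
// walk step finishes (a deferred close can hold the handle open while the
// file job channel is full).
func listDir(path string) ([]os.FileInfo, error) {
	dir, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	entries, err := dir.Readdir(-1)
	dir.Close() // release the handle immediately rather than via defer
	return entries, err
}
```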

@ChristopherRogers
Author

I'm running this on an 18-core Xeon W (with hyperthreading) iMac Pro with 32 GB of RAM. The entire directory is 20.82 GB, with 81,899 files & directories. The files I'm interested in number 7,754 (per scc's output).

@dbaggerman
Collaborator

Based on that, I would expect an upper limit of 144 (36 * 4 file readers) + 36 (directory walkers), or 180 open files at a time. Reasonably high, but within the 256 limit that Google told me about. Maybe I'm missing something else that's keeping files open?

@ChristopherRogers, could you confirm your open file limit please? You can check this by running ulimit -n in a terminal.

@boyter
Owner

boyter commented Nov 8, 2019

I agree. A cap on the FileReadJobWorkers might be a good approach here. It's certainly going to be required in the future, since AMD seems to be doubling the number of cores every year and HDD speed is not coming close to matching.

I hate to put in an artificial limit, but I'm not sure what else to do.

I guess it could be some number under whatever the ulimit is for each of the supported systems. Maybe half or something? I'm going to see if there is a cross-platform way to determine this at runtime. At least then it should, in theory, scale with the number of cores and the system limits.

As mentioned by @dbaggerman, can you confirm your ulimit please, @ChristopherRogers? Also, would you be willing to try one-off builds, either from source or ideally from branches or binaries attached to this ticket? I doubt either @dbaggerman or I will be able to find an 18-core Xeon iMac easily to replicate this and test potential fixes.

@dbaggerman
Collaborator

I guess it could be some number under whatever the ulimit is for each of the supported systems.

ulimit is configurable. The ideal would be to query the number dynamically (via syscall.Getrlimit) and respond appropriately. However, this gets into territory where Windows is very different from POSIX systems, and would need to have its own implementation.
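A hedged sketch of the POSIX side of that idea, deriving the worker cap from the soft limit at runtime; the function name and the "leave half the handles spare" heuristic are assumptions for illustration, not scc's behaviour, and Windows would need a separate path since it has no RLIMIT_NOFILE:

```go
//go:build !windows

package main

import (
	"runtime"
	"syscall"
)

// cappedFileReadWorkers keeps the NumCPU()*4 sizing but never uses more than
// half of the process's soft open-file limit, leaving the rest for the
// directory walker and anything else the process has open.
func cappedFileReadWorkers() int {
	workers := runtime.NumCPU() * 4

	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return workers // fall back to the current behaviour if the query fails
	}

	if half := uint64(rl.Cur) / 2; uint64(workers) > half {
		workers = int(half)
	}
	return workers
}
```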

@ChristopherRogers
Author

Yes, it's 256. launchctl limit maxfiles gives 256 & unlimited as the soft & hard limits, respectively. Setting it to 2048 was reliably not enough, 2365 gave me about a 50% success rate, and I couldn't get it to spit out any errors at 3000.

The manpage for setrlimit says that the process should still be able to proceed when the soft limit is exceeded, just that it may receive a signal. I guess scc is skipping files when it gets this signal.
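For reference, a process can also raise its own soft limit up to the hard limit through the same API. A hedged sketch on POSIX systems only, purely to illustrate the setrlimit mechanics being discussed; the 4096 figure is an arbitrary example and this is not something scc is confirmed to do:

```go
//go:build !windows

package main

import "syscall"

// raiseOpenFileLimit bumps the process's soft open-file limit to a modest
// fixed value when the hard limit allows it. macOS in particular rejects
// requests above its kernel OPEN_MAX even when the hard limit reports as
// "unlimited", so asking for a small fixed number is safer than rl.Max.
func raiseOpenFileLimit() error {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return err
	}
	if rl.Cur < 4096 && rl.Max >= 4096 {
		rl.Cur = 4096
		return syscall.Setrlimit(syscall.RLIMIT_NOFILE, &rl)
	}
	return nil
}
```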

Sure, I wouldn't mind testing out a one-off build.

@boyter
Owner

boyter commented Nov 10, 2019

I guess it could be some number under whatever the ulimit is for each of the supported systems.

ulimit is configurable. The ideal would be to query the number dynamically (via syscall.Getrlimit) and respond appropriately. However, this gets into territory where Windows is very different from POSIX systems, and would need to have its own implementation.

Yep, hence my comment about finding a cross-platform way to look at it.

@boyter
Owner

boyter commented Nov 10, 2019

@ChristopherRogers Would you be able to try out the attached build and let us know if it resolves the issue, please?

scc-2.11.0-x86_64-apple-darwin.zip

@ChristopherRogers
Author

That build works with ulimit set to 256. 👍

@boyter
Owner

boyter commented Nov 12, 2019

Neat. @dbaggerman looks like a winner.

Might push it out as a bugfix release soon.

@dbaggerman
Collaborator

Excellent. The maximum open files limit is still a potential problem that will come up as computers approach the 32-core mark, but it's good that we've got the immediate problem sorted out.

@boyter
Owner

boyter commented Nov 12, 2019

Bug fix release pushed out. Going to close this down then.
