Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Excessive CPU and memory use + thread panics #23

Closed
pkolaczk opened this issue Jun 20, 2020 · 11 comments
Closed

Excessive CPU and memory use + thread panics #23

pkolaczk opened this issue Jun 20, 2020 · 11 comments

Comments

@pkolaczk
Copy link

Tried this on my home directory with ~1.5 million files.

/usr/bin/time -v target/release/ddh ~
thread '<unnamed>' panicked at 'Error reading Symlink Metadata: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/lib.rs:285:25
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread '<unnamed>' panicked at 'Error reading Symlink Metadata: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/lib.rs:285:25
thread '<unnamed>' panicked at 'Error reading Symlink Metadata: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/lib.rs:285:25
thread '<unnamed>' panicked at 'Error reading Symlink Metadata: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/lib.rs:285:25
thread '<unnamed>' panicked at 'Error reading Symlink Metadata: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/lib.rs:285:25
thread '<unnamed>' panicked at 'Error reading Symlink Metadata: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/lib.rs:285:25
thread '<unnamed>' panicked at 'Error reading Symlink Metadata: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/lib.rs:285:25
thread '<unnamed>' panicked at 'Error reading Symlink Metadata: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/lib.rs:285:25
thread '<unnamed>' panicked at 'Error reading Symlink Metadata: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/lib.rs:285:25
thread '<unnamed>' panicked at 'Error reading Symlink Metadata: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/lib.rs:285:25
thread '<unnamed>' panicked at 'Error reading Symlink Metadata: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/lib.rs:285:25
thread '<unnamed>' panicked at 'Error reading Symlink Metadata: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/lib.rs:285:25
thread '<unnamed>' panicked at 'Error reading Symlink Metadata: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/lib.rs:285:25
...

There seems to be no way to turn off symlink following, so I guess it might have escaped from my home dir and tried to scan the whole filesystem. But anyways, the CPU and memory usage was extremely high. System time usage is quite unusal assuming this was I/O bound workload:

^CCommand terminated by signal 2
	Command being timed: "target/release/ddh /home/pkolaczk"
	User time (seconds): 50.08
	System time (seconds): 983.47
	Percent of CPU this job got: 776%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 2:13.10
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 18724728
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 4688422
	Voluntary context switches: 27301
	Involuntary context switches: 150856
	Swaps: 0
	File system inputs: 103592
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096

ddh consumed 18 GBs, while other programs like
fdupes, jdupes, rmfind, fclones need about 150-500 MB.

@darakian
Copy link
Owner

Interesting. So, ddh shouldn't be following symlinks at all. That error comes from this bit of code

let good_entries: Vec<_> = read_dir_results
                    .filter(|x| x.is_ok())
                    .map(|x| x.unwrap())
                    .collect();
let (files, dirs): (Vec<&DirEntry>, Vec<&DirEntry>) = good_entries.par_iter().partition(|&x|
                        x.path()
                        .as_path()
                        .symlink_metadata()
                        .expect("Error reading Symlink Metadata")
                        .file_type()
                        .is_file()
                        );

Where directory entries are sorted as files and dirs prior to hashing. The symlink term is used to map the rust file api. I can definitely make a patch to better handle that error, but would you be willing to test some changes on a branch? I'd like to dig into what the root cause is if possible.

@darakian
Copy link
Owner

The debug-thread-file-panics branch might resolve your issue due to a change in metadata access. If you can get a reproducible panic can you share what the type of filesystem entry it is?

@pkolaczk
Copy link
Author

Panics are not happening on the debug-thread-file-panics branch, but excessive memory use is still present. It consumes tens of gigabytes very quickly, before even printing anything. Is there any progress bar, to at least know how far it went?

@pkolaczk
Copy link
Author

Ah, I was wrong: there was a panic, but I needed to wait longer:

thread '<unnamed>' panicked at 'Error reading path length: Os { code: 40, kind: Other, message: "Too many levels of symbolic links" }', src/lib.rs:270:17
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@darakian
Copy link
Owner

darakian commented Jun 23, 2020

I was able to reproduce the error locally and made a new commit on debug-thread-file-panics which resolves the bug for me. Would you mind giving it another spin?

@darakian
Copy link
Owner

@pkolaczk any update?

@pkolaczk
Copy link
Author

pkolaczk commented Jul 3, 2020

I was busy, I'll try to look at this on weekend.

This was referenced Jul 8, 2020
@darakian
Copy link
Owner

I've merged the debug-thread-file-panics branch into master as it corrects a bug. When you have a chance please let me know if that resolves your bug or if there is yet another :)

@darakian
Copy link
Owner

Any update @pkolaczk?

@darakian
Copy link
Owner

Ping @pkolaczk

@darakian
Copy link
Owner

Closing. I believe the issue is resolved and the reporter has not followed up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants