b3sum: Implement recursive file hashing #170
base: master
Conversation
I've been thinking about my implementation of this. I'm a Rust newbie and I'd appreciate it if someone could review the code and check that my understanding is correct:
If that's the case, is the order of the output this change gives correct for a
```rust
if md.is_dir() && args.recurse() {
    let mut entries = fs::read_dir(path)?
        .map(|res| res.map(|e| e.path()))
        .collect::<Result<Vec<_>, io::Error>>()?;
```
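For context on the output-order question, here is a hedged sketch of how the collected entries could be sorted so that recursive output has a defined order. The helper name `sorted_entries` is mine, not from the PR:

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Hypothetical helper: fs::read_dir yields entries in a platform-dependent
// order, so sorting the collected Vec gives the "defined order" the PR
// description asks for.
fn sorted_entries(path: &Path) -> io::Result<Vec<PathBuf>> {
    let mut entries = fs::read_dir(path)?
        .map(|res| res.map(|e| e.path()))
        .collect::<Result<Vec<_>, io::Error>>()?;
    entries.sort(); // PathBuf implements Ord, comparing component-wise
    Ok(entries)
}

fn main() -> io::Result<()> {
    for entry in sorted_entries(Path::new("."))? {
        println!("{}", entry.display());
    }
    Ok(())
}
```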
`.collect::<Result<Vec<_>, io::Error>>()?`
very smooth but now I'm not sure I believe you when you say "I'm a Rust newbie" :)
I can neither confirm nor deny this was copied from some documentation I looked up. 😁
This doesn't sound right to me. I think you're referring to the call to
```rust
}
if args.no_names() {
    let md = metadata(path).unwrap();
    if md.is_dir() && args.recurse() {
```
Have you thought about what will happen for directory entries that are symlinks in this case? It looks like if they're symlinks to directories, they'll be opened as though they were files, which will probably cause an error. But in fixing this, we have to be careful not to loop infinitely on circular symlinks. The right thing to do here isn't clear to me, and we might want to look at what similar recursive tools do.
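One way to surface this distinction, sketched here with a hypothetical helper of my own (not code from the PR): `fs::symlink_metadata` inspects the entry itself, whereas `fs::metadata` follows the link and reports the target.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical helper: classify a directory entry without following
// symlinks, so the caller can decide whether to skip or follow them.
fn classify(path: &Path) -> io::Result<&'static str> {
    let md = fs::symlink_metadata(path)?;
    if md.file_type().is_symlink() {
        Ok("symlink") // candidate for skipping instead of recursing
    } else if md.is_dir() {
        Ok("dir")
    } else {
        Ok("file")
    }
}

fn main() -> io::Result<()> {
    let tmp = std::env::temp_dir();
    println!("{} is a {}", tmp.display(), classify(&tmp)?);
    Ok(())
}
```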
No, good point, I hadn't considered that. My use case is on Windows, where symlinks are very unlikely to exist, but I'll add another option `--follow-symlinks` to follow them, and investigate how to detect loops. IMHO it would make sense not to follow them by default, but I don't mind doing the opposite and having a `--no-follow-symlinks` option.
I think we need something similar to JavaScript's `WeakSet` to track symlinks, thereby avoiding infinite loops.

Edit: found it
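In Rust, one way to get the same effect is a `HashSet` of canonicalized paths. This is a sketch of the idea under my own naming, not code from the PR: `fs::canonicalize` resolves symlinks, so two links to the same directory map to one entry in the set and a cycle is descended at most once.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Hypothetical walker: remember the canonical path of every directory we
// enter; if we see one twice (a symlink cycle), stop recursing into it.
fn walk(path: &Path, visited: &mut HashSet<PathBuf>, out: &mut Vec<PathBuf>) -> io::Result<()> {
    let canonical = fs::canonicalize(path)?;
    if !visited.insert(canonical) {
        return Ok(()); // already visited: symlink cycle, stop here
    }
    for entry in fs::read_dir(path)? {
        let p = entry?.path();
        if p.is_dir() {
            walk(&p, visited, out)?;
        } else {
            out.push(p);
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let root = std::env::temp_dir().join("gr_walk_demo");
    let _ = fs::remove_dir_all(&root);
    fs::create_dir_all(root.join("sub"))?;
    fs::write(root.join("a.txt"), b"a")?;
    fs::write(root.join("sub").join("b.txt"), b"b")?;
    let (mut visited, mut out) = (HashSet::new(), Vec::new());
    walk(&root, &mut visited, &mut out)?;
    println!("found {} files in {} dirs", out.len(), visited.len());
    Ok(())
}
```

Note that `canonicalize` hits the filesystem for every directory; comparing device and inode numbers (as some recursive tools do) avoids that cost, but isn't portable to Windows via std alone.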
> My use-case is on Windows where symlinks are very unlikely to exist

Directory junctions have been commonly used by Microsoft to ensure backward compatibility when OS upgrades changed the directory structure, e.g. the XP => Vista migration created `%userprofile%/My Documents` <<===>> `%userprofile%/Documents`.
I was about to suggest that a Zsh glob could accomplish something similar, but lo and behold it didn't work when I tried it:
So I immediately see some value in this feature! :) Could you say more about the use case that you're interested in using it for?

A high level thought: So far,

So this becomes something of an existential question: Do we want

It could be that the answer here is "no". Maybe it could make sense to support
I have a project to move directory trees of files of differing sizes from one server to another, and to be able to cryptographically prove that the source matches the destination, i.e. that nothing was modified in the transfer. The tree might contain terabyte-size files or millions of tiny files. I'd want to use this along with #171 to show the digest of the whole directory tree.

You make good points about the parallelism. If the processing were to run in parallel and the output were made upon completion of each file, then my objective of comparing the whole set of files wouldn't be as simple, as the files may not be processed in the same order on each server. In that case the results would have to be stored in some kind of tree map to be output at the end of the process, actually an idea I was contemplating for #117 anyway. Would you be open to considering that?
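The "tree map" idea above can be sketched in Rust with a `BTreeMap` (names and dummy digests here are illustrative, not from the PR): results arrive in whatever order parallel hashing finishes, but a `BTreeMap` keyed by path iterates in sorted key order, so the final listing is identical on both servers regardless of scheduling.

```rust
use std::collections::BTreeMap;

// Hypothetical helper: collect (path, digest) pairs produced in arbitrary
// order, then emit them sorted by path for deterministic output.
fn ordered_output(results: Vec<(String, String)>) -> Vec<String> {
    let map: BTreeMap<String, String> = results.into_iter().collect();
    map.into_iter()
        .map(|(path, digest)| format!("{}  {}", digest, path))
        .collect()
}

fn main() {
    // Completion order is b-before-a; output is still sorted by path.
    let results = vec![
        ("sub/b.txt".to_string(), "f00d".to_string()),
        ("a.txt".to_string(), "beef".to_string()),
    ];
    for line in ordered_output(results) {
        println!("{}", line);
    }
}
```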
Ok, that's my lack of Rust knowledge coming through. Is the
I've had a quick look at those crates and they seem like they might fit this project. Where do you envisage the overheads being?

Do we want to look at using a producer/consumer pattern with (at least) two threads: one to find the files and put them in a queue, and the other to process them off the queue as quickly as possible? I've done that a few times in Java for similar projects and it works well. Perhaps using
I had a go at doing this and gave up, because it's way beyond my current knowledge of Rust and would take me a few months to implement. I have to stick to synchronous code for now!
A very practical feature. Is there any progress now?
If there was, you'd see it above in the activity/discussion. The associated issue provides a workaround CLI command to use.

The associated issue also had a comment after that recently, linking to
Add an argument `-r` (`--recurse`) to recurse through any directories in the list of files and process all the files they contain in a defined order.