-
Notifications
You must be signed in to change notification settings - Fork 333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding --check to b3sum #44
Conversation
OK I think this is ready for review. |
|
||
// Check if hashes match. | ||
// TODO: Should this be constant time? | ||
if check_hash == hash { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@oconnor663 , I'd like your opinion of if we should be doing a constant-time check for equality here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any reason not to, and it might matter in some very obscure scenario. Let's stick to constant-time. We could even do a constant-time comparison on the hex bytes directly, and save ourselves the trouble of parsing them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You would have to lowercase both strings in that case, which is about the same trouble as parsing the hex string into bytes and comparing there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we allowed to assume that the output we're parsing came from b3sum
? If so, maybe we don't have to support uppercase hex if we don't emit any? But anyway, parsing hex to bytes is also fine with me.
let line: String = line.with_context(|| format!("line {}", line_num + 1))?; | ||
let line = line.trim(); | ||
|
||
// Skip the line if it's a comment or empty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these conditions come up in practice? What does coreutils do with these cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was following along with the behavior here:
https://github.com/jtdowney/b2sum-rs/blob/master/src/main.rs#L127-L129
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For what it's worth, shasum
(on my mac at least) allows one trailing empty line, but errors on more than one empty line (which seems like odd behavior..)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can simplify things by demanding that the input format is exactly the same as the tool output, I'm all for simplifying.
let filename: &str = line | ||
.next() | ||
.ok_or_else(|| Error::msg(format!("missing filename on line {}", line_num + 1)))?; | ||
let os_filename: std::ffi::OsString = filename.into(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This raises an interesting point. Should we try to handle really weird filenames and escaping? It looks like md5sum * | md5sum --check
can handle some very nasty stuff, including filenames with newlines or invalid unicode. The newline case even involves some special syntax where the output line starts with a backslash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a big fan of drawing the line at non-unicode filenames. Non-unicode filenames need to be relegated to the dustbin of history :)
But if you really think we need to support non-unicode filenames, then I can work on adding support.
I agree that we should add tests for all sorts of pathological filename cases though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the one hand, it might not be worth the effort. On the other hand, if we treat everything as raw bytes, maybe we get this "support" for free? (But then again, Windows paths aren't amenable to raw bytes, are they? Not making Window support an annoyance might be a good reason to stick with Unicode/UTF-8.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, deciding what is whitespace and a line-end when interpreting raw bytes in a cross platform way is a PITA. I think I would want to bring in regex::bytes
as a dependency to handle this if we decide to support non-unicode.
.with_context(|| format!("filename: {}", filename))?; | ||
check_reader | ||
.take(hash.len() as u64) | ||
.read_to_end(&mut check_hash)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little skeptical of deriving the length from the input hashes. Say for example that the input hashes are just 1 byte long. That could lead to a situation where you think you're checking file integrity, but you're really not checking much. This gets more problematic if it's possible for some attacker to control the length of the hashes you check.
On the flip side, checking hashes longer than 32 bytes doesn't add any security, but people might get confused and think that it does.
Maybe just have --check
always assume the default length? Maybe even conflict with --length
? (Incidentally it should probably also conflict with --keyed
and --derive-key
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to ensure that all output from b3sum
is also valid input for b3sum --check
. So that if you do b3sum * --length=128
and pass that back into b3sum --check
it should work.
I would recommend two things:
-
Throw a warning if the hash lengths are under some threshold at which we consider the security compromised. We should do this for both checking and generating the checksum file.
-
Add a "strict" mode where warnings are upgraded to errors.
This has some precedence. b2sum-rs
has the following options that we could emulate:
The following five options are useful only when verifying checksums:
--ignore-missing don't fail or report status for missing files
--quiet don't print OK for each successfully verified file
--status don't output anything, status code shows success
--strict exit non-zero for improperly formatted checksum lines
However, if we do decide to always error on checking non-32-byte checksums, then we should make sure we at least generate a warning when generating them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if we have --check
throw an error saying "this only works with 32-byte hashes", we'll get 90% of the value with 10% of the effort. Trying to determine "security compromised" in general is too vague, and it depends on the application. (Collision-resistant checksums need to be 256 bits, but MACs can usually get away with 128, for example.)
@@ -177,8 +184,94 @@ fn read_key_from_stdin() -> Result<[u8; blake3::KEY_LEN]> { | |||
} | |||
} | |||
|
|||
fn check_file(input: impl Read, mmap_disabled: bool) -> Result<u64> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a little confusing that you've got check_files
and check_file
, but it's the latter that loops over all the filenames?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. I will rename things to make it clearer.
.pipe(cmd!(b3sum_exe(), "--check")) | ||
.run(); | ||
assert!(result.is_ok()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll definitely want to test multiple inputs. Also whatever we do with weird whitespace and invalid unicode, we should try to test here.
I've pushed a different |
This is for adding
--check
to theb3sum
utility.