Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding --check to b3sum #44

Closed
wants to merge 2 commits into from
Closed

Conversation

phayes
Copy link
Contributor

@phayes phayes commented Jan 26, 2020

This is for adding --check to the b3sum utility.

@phayes phayes changed the title WIP: Adding --check to b3sum Adding --check to b3sum Jan 26, 2020
@phayes
Copy link
Contributor Author

phayes commented Jan 26, 2020

OK I think this is ready for review.


// Check if hashes match.
// TODO: Should this be constant time?
if check_hash == hash {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oconnor663 , I'd like your opinion of if we should be doing a constant-time check for equality here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any reason not to, and it might matter in some very obscure scenario. Let's stick to constant-time. We could even do a constant-time comparison on the hex bytes directly, and save ourselves the trouble of parsing them.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You would have to lowercase both strings in that case, which is about the same trouble as parsing the hex string into bytes and comparing there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we allowed to assume that the output we're parsing came from b3sum? If so, maybe we don't have to support uppercase hex if we don't emit any? But anyway, parsing hex to bytes is also fine with me.

let line: String = line.with_context(|| format!("line {}", line_num + 1))?;
let line = line.trim();

// Skip the line if it's a comment or empty
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these conditions come up in practice? What does coreutils do with these cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was following along with the behavior here:

https://github.com/jtdowney/b2sum-rs/blob/master/src/main.rs#L127-L129

Copy link
Contributor Author

@phayes phayes Jan 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth, shasum (on my mac at least) allows one trailing empty line, but errors on more than one empty line (which seems like odd behavior..)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can simplify things by demanding that the input format is exactly the same as the tool output, I'm all for simplifying.

let filename: &str = line
.next()
.ok_or_else(|| Error::msg(format!("missing filename on line {}", line_num + 1)))?;
let os_filename: std::ffi::OsString = filename.into();
Copy link
Member

@oconnor663 oconnor663 Jan 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This raises an interesting point. Should we try to handle really weird filenames and escaping? It looks like md5sum * | md5sum --check can handle some very nasty stuff, including filenames with newlines or invalid unicode. The newline case even involves some special syntax where the output line starts with a backslash.

Copy link
Contributor Author

@phayes phayes Jan 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a big fan of drawing the line at non-unicode filenames. Non-unicode filenames need to be relegated to the dustbin of history :)

But if you really think we need to support non-unicode filenames, then I can work on adding support.

I agree that we should add tests for all sorts of pathological filename cases though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the one hand, it might not be worth the effort. On the other hand, if we treat everything as raw bytes, maybe we get this "support" for free? (But then again, Windows paths aren't amenable to raw bytes, are they? Not making Window support an annoyance might be a good reason to stick with Unicode/UTF-8.)

Copy link
Contributor Author

@phayes phayes Jan 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, deciding what is whitespace and a line-end when interpreting raw bytes in a cross platform way is a PITA. I think I would want to bring in regex::bytes as a dependency to handle this if we decide to support non-unicode.

.with_context(|| format!("filename: {}", filename))?;
check_reader
.take(hash.len() as u64)
.read_to_end(&mut check_hash)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little skeptical of deriving the length from the input hashes. Say for example that the input hashes are just 1 byte long. That could lead to a situation where you think you're checking file integrity, but you're really not checking much. This gets more problematic if it's possible for some attacker to control the length of the hashes you check.

On the flip side, checking hashes longer than 32 bytes doesn't add any security, but people might get confused and think that it does.

Maybe just have --check always assume the default length? Maybe even conflict with --length? (Incidentally it should probably also conflict with --keyed and --derive-key.)

Copy link
Contributor Author

@phayes phayes Jan 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to ensure that all output from b3sum is also valid input for b3sum --check. So that if you do b3sum * --length=128 and pass that back into b3sum --check it should work.

I would recommend two things:

  1. Throw a warning if the hash lengths are under some threshold at which we consider the security compromised. We should do this for both checking and generating the checksum file.

  2. Add a "strict" mode where warnings are upgraded to errors.

This has some precedence. b2sum-rs has the following options that we could emulate:

The following five options are useful only when verifying checksums:
      --ignore-missing  don't fail or report status for missing files
      --quiet           don't print OK for each successfully verified file
      --status         don't output anything, status code shows success
      --strict          exit non-zero for improperly formatted checksum lines

However, if we do decide to always error on checking non-32-byte checksums, then we should make sure we at least generate a warning when generating them.

Copy link
Member

@oconnor663 oconnor663 Jan 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we have --check throw an error saying "this only works with 32-byte hashes", we'll get 90% of the value with 10% of the effort. Trying to determine "security compromised" in general is too vague, and it depends on the application. (Collision-resistant checksums need to be 256 bits, but MACs can usually get away with 128, for example.)

@@ -177,8 +184,94 @@ fn read_key_from_stdin() -> Result<[u8; blake3::KEY_LEN]> {
}
}

fn check_file(input: impl Read, mmap_disabled: bool) -> Result<u64> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little confusing that you've got check_files and check_file, but it's the latter that loops over all the filenames?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. I will rename things to make it clearer.

.pipe(cmd!(b3sum_exe(), "--check"))
.run();
assert!(result.is_ok());
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll definitely want to test multiple inputs. Also whatever we do with weird whitespace and invalid unicode, we should try to test here.

@oconnor663
Copy link
Member

I've pushed a different --check implementation to master, and documented it at https://github.com/BLAKE3-team/BLAKE3/blob/master/b3sum/what_does_check_do.md. If you get a chance, and let me know what you think at #33?

@oconnor663 oconnor663 closed this May 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants