Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Blake3 support for Verify Store #575

Merged
merged 1 commit into from
Jan 9, 2024

Conversation

steedmicro
Copy link
Contributor

@steedmicro steedmicro commented Dec 22, 2023

Description

I've added Blake3 hash support Verify Store.
I just used the digest_hasher utiliy module to keep consistency.
For simplicity, I just used the default hash function which is set from the global config.
I've done integration tests on my machine to verify blake3 function actually works.
But since the global hash function is OnceLock and it can't be set twice, I couldn't add specific unit tests for blake3 and sha256 separately.
May it would be better if we add hash function config variable for the Verify Store Config?
Or is it just adding extra complexity?
I'd like to make this clear.

Fixes #444

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist

  • Updated documentation if needed
  • Tests added/amended
  • bazel test //... passes locally
  • PR is contained in a single commit, using git amend see some docs

This change is Reviewable

@steedmicro
Copy link
Contributor Author

Hope you to review this. cc: @allada, @MarcusSorealheis, @aaronmondal

@steedmicro steedmicro changed the title Add blake3 support for verify store Add Blake3 support for Verify Store Dec 22, 2023
Copy link
Member

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 LGTMs obtained

a discussion (no related file):
Sadly I don't think we should do it this way. The original request knows what hash function to use and here we just assume it's the default.

I'd rather do it the right way.

I'm not sure what the best way to do this is. On first glance it feels like adding the hash enum type to DigestInfo seems like the right way, but we frequently cast DigestInfo into the proto's Digest which looses this information.

Another possible option is to change the store api so it passes the digest function along with the requests. This will require a lot of pluming though and should be done in multiple PRs if this is the right way.

As for right now I'm not sure what the best option is.


Copy link
Member

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 LGTMs obtained

a discussion (no related file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

Sadly I don't think we should do it this way. The original request knows what hash function to use and here we just assume it's the default.

I'd rather do it the right way.

I'm not sure what the best way to do this is. On first glance it feels like adding the hash enum type to DigestInfo seems like the right way, but we frequently cast DigestInfo into the proto's Digest which looses this information.

Another possible option is to change the store api so it passes the digest function along with the requests. This will require a lot of pluming though and should be done in multiple PRs if this is the right way.

As for right now I'm not sure what the best option is.

I thought about this a bit more and did some rough tests to see how difficult it would be for a refactor in this area, and believe that adding a config for what hash function to use on the config for this store is the right way for now.

So if we change the config to support a hash function I think we can proceed.


Copy link
Contributor Author

@steedmicro steedmicro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 LGTMs obtained

a discussion (no related file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

I thought about this a bit more and did some rough tests to see how difficult it would be for a refactor in this area, and believe that adding a config for what hash function to use on the config for this store is the right way for now.

So if we change the config to support a hash function I think we can proceed.

Thanks for your feedback, @allada . Let me do work in that way.


Copy link
Member

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 LGTMs obtained

a discussion (no related file):
Also needs tests.


Copy link
Member

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r2.
Reviewable status: 0 of 1 LGTMs obtained, and pending CI: Remote / large-ubuntu-22.04


nativelink-config/src/stores.rs line 17 at r2 (raw file):

use serde::{Deserialize, Serialize};

use crate::cas_server::ConfigDigestHashFunction;

Lets move this into this crate. By doing this, it creates a circular reference, which is bad practice.


nativelink-config/src/stores.rs line 338 at r2 (raw file):

    /// This should be set to false for AC, but true for CAS stores.
    #[serde(default)]
    pub verify_hash: bool,

nit: Lets rename this:
hash_verification_function and make it an Option<ConfigDigestHashFunction>

Also add a comment above saying that None means it won't verify the hash.


nativelink-store/src/verify_store.rs line 60 at r2 (raw file):

        mut rx: DropCloserReadHalf,
        size_info: UploadSizeInfo,
        mut maybe_hasher: Option<([u8; 32], DigestHasher)>,

nit: Lets split this up. Les pass


nativelink-store/src/verify_store.rs line 83 at r2 (raw file):

                }
                if let Some((original_hash, hasher)) = maybe_hasher.as_mut() {
                    let hash_result: [u8; 32] = hasher.finalize_digest(i64::try_from(sum_size)?).packed_hash;

nit: We can't trust this size. But we also don't need it. So instead, lets pass -1 here and make a comment that this digest is only validating the hash.

Copy link
Collaborator

@MarcusSorealheis MarcusSorealheis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll defer to Blaise on this one, as I think his comments point in the right direction and you have done some important work to get started. At a high level:

Tests, as Blaise said
Config, as Blaise said. You already have a lot in place for that direction.
The only thing I would think about on the code is making the implementation in verify_store.rs less coupled to one hashing algo or another. It should be straightforward but happy to help (or ask someone else) if needed.

Nice work and thank you for the contribution.


/// Digest hash function to use for hashing contents in the verify store
///
/// Default: ConfigDigestHashFunction::sha256
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe specify what another option might be here in the comments. Blake3

@@ -544,7 +544,7 @@ pub enum WorkerConfig {
}

#[allow(non_camel_case_types)]
#[derive(Deserialize, Debug, Clone, Copy)]
#[derive(Serialize, Deserialize, Debug, Clone, Copy)]
pub enum ConfigDigestHashFunction {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, just below the changed line, we have the enum we need. to build upon to support sha256 or Blake3.

hasher = Some((
digest.packed_hash,
DigestHasher::from(DigestHasherFunc::from(
self.digest_hash_function.unwrap_or(ConfigDigestHashFunction::sha256),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this here? Im confused.

if let Some((original_hash, hasher)) = maybe_hasher {
let hash_result: [u8; 32] = hasher.finalize().into();
if original_hash != hash_result {
if let Some((original_hash, hasher)) = maybe_hasher.as_mut() {
Copy link
Collaborator

@MarcusSorealheis MarcusSorealheis Dec 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the if let conditional is fine, but computing the hash should live elsewhere. That way, here you can still have the comparator on current line 84 (if *original_hash != hash_result), with which type of hash being used is outside of the block. It should work for either hashing algorithm.

Right now, hash_result depends on Blake3 and that would be bad.

@steedmicro steedmicro force-pushed the add-blake3-verify-store branch 2 times, most recently from 121af6d to c6cd616 Compare December 27, 2023 11:10
Copy link
Contributor Author

@steedmicro steedmicro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 LGTMs obtained, and pending CI: Analyze (javascript-typescript), Analyze (python), Bazel Dev / ubuntu-22.04, Cargo Dev / ubuntu-22.04, Local / ubuntu-22.04, Remote / large-ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), integration-tests (20.04), integration-tests (22.04), pre-commit-checks, publish-image, ubuntu-20.04, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, windows-2022 / stable

a discussion (no related file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

Also needs tests.

Done.



nativelink-config/src/cas_server.rs line 548 at r2 (raw file):

Previously, MarcusSorealheis (Marcus Eagan) wrote…

oh, just below the changed line, we have the enum we need. to build upon to support sha256 or Blake3.

Done.


nativelink-config/src/stores.rs line 17 at r2 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

Lets move this into this crate. By doing this, it creates a circular reference, which is bad practice.

Done.


nativelink-config/src/stores.rs line 342 at r2 (raw file):

Previously, MarcusSorealheis (Marcus Eagan) wrote…

maybe specify what another option might be here in the comments. Blake3

Done.


nativelink-store/src/verify_store.rs line 82 at r2 (raw file):

Previously, MarcusSorealheis (Marcus Eagan) wrote…

I think the if let conditional is fine, but computing the hash should live elsewhere. That way, here you can still have the comparator on current line 84 (if *original_hash != hash_result`), with which type of hash being used is outside of the block. It should work for either hashing algorithm.

Right now, hash_result depends on Blake3 and that would be bad.

Done.


nativelink-store/src/verify_store.rs line 146 at r2 (raw file):

Previously, MarcusSorealheis (Marcus Eagan) wrote…

why is this here? Im confused.

Done.

Copy link
Member

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 8 of 11 files at r3, all commit messages.
Reviewable status: 0 of 1 LGTMs obtained, and pending CI: pre-commit-checks


nativelink-config/src/stores.rs line 342 at r2 (raw file):

Previously, steed924 (Steed) wrote…

Done.

I don't think we should encourage blake3. Sha256 is what the default is for most implementations. Using blake3 requires the client to be setup for blake3 (which none to my knowledge do by default. They all use sha256 by default).

I'd rather just say "Like sha256" and not even mention blake3, since it requires a lot to go right.


nativelink-store/src/verify_store.rs line 141 at r3 (raw file):

        }

        let mut hasher = None;

nit: use (you can inline this too after it's done):

let hasher = self.hash_verification_function.map(|v| DigestHasher::from(DigestHasherFunc::from(v)));

nativelink-store/src/verify_store.rs line 186 at r3 (raw file):

            "If the verification store is verifying the size of the data",
        );
        if let Some(hash_verification_function) = self.hash_verification_function {

nit: Just use:

c.publish("hash_verification_function", format!("{:?}", self.hash_verification_function));

Copy link
Contributor Author

@steedmicro steedmicro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 LGTMs obtained, and pending CI: Analyze (javascript-typescript), Analyze (python), Bazel Dev / ubuntu-22.04, Cargo Dev / ubuntu-22.04, Local / ubuntu-22.04, Remote / large-ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), integration-tests (20.04), integration-tests (22.04), pre-commit-checks, publish-image, ubuntu-20.04, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, windows-2022 / stable


nativelink-config/src/stores.rs line 342 at r2 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

I don't think we should encourage blake3. Sha256 is what the default is for most implementations. Using blake3 requires the client to be setup for blake3 (which none to my knowledge do by default. They all use sha256 by default).

I'd rather just say "Like sha256" and not even mention blake3, since it requires a lot to go right.

Done.

Copy link
Member

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 11 files at r3, 2 of 2 files at r4, all commit messages.
Dismissed @MarcusSorealheis from 4 discussions.
Reviewable status: 1 of 1 LGTMs obtained, and pending CI: Remote / large-ubuntu-22.04, pre-commit-checks

Copy link
Member

@adam-singer adam-singer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed all commit messages.
Reviewable status: 2 of 1 LGTMs obtained, and pending CI: pre-commit-checks

Copy link
Collaborator

@MarcusSorealheis MarcusSorealheis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: 2 of 1 LGTMs obtained, and pending CI: pre-commit-checks

Copy link
Collaborator

@MarcusSorealheis MarcusSorealheis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: 3 of 1 LGTMs obtained, and pending CI: pre-commit-checks

Copy link
Member

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@steed924 , can you rebase and we'll get this landed.

Reviewable status: 3 of 1 LGTMs obtained, and pending CI: pre-commit-checks

Copy link

vercel bot commented Jan 8, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
nativelink-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jan 9, 2024 5:08pm

@steedmicro
Copy link
Contributor Author

@steed924 , can you rebase and we'll get this landed.

Reviewable status: 3 of 1 LGTMs obtained, and pending CI: pre-commit-checks

Just done.

Copy link
Member

@aaronmondal aaronmondal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI @steed924 The failing CI job are the pre-commit hooks. I think it's trailing whitespace in the README.
:lgtm:

Reviewed all commit messages.
Reviewable status: 4 of 1 LGTMs obtained, and pending CI: pre-commit-checks

@steedmicro
Copy link
Contributor Author

steedmicro commented Jan 9, 2024

FYI @steed924 The failing CI job are the pre-commit hooks. I think it's trailing whitespace in the README. :lgtm:

Reviewed all commit messages.
Reviewable status: 4 of 1 LGTMs obtained, and pending CI: pre-commit-checks

Thanks, @aaronmondal . Just fixed that issue. ;)

@steedmicro steedmicro merged commit 3acefc7 into TraceMachina:main Jan 9, 2024
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

VerifyStore should support Blake3
5 participants