Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Blake3 digest support #403

Merged
merged 1 commit into from Nov 20, 2023
Merged

Add Blake3 digest support #403

merged 1 commit into from Nov 20, 2023

Conversation

allada
Copy link
Collaborator

@allada allada commented Nov 17, 2023

Fully support Blake3 digest upload format. This can increase
performance significantly for cases where hashing files is a
bottleneck.

closes #395


This change is Reviewable

Copy link
Collaborator Author

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+@aaronmondal , @blakehatch

Let me know and I'll show you how to review just the last commit (the other commits are already up in review).

Reviewable status: 0 of 40 files reviewed, all discussions resolved (waiting on @aaronmondal)

Copy link
Contributor

@aaronmondal aaronmondal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 13 of 13 files at r1, 4 of 4 files at r2, 34 of 34 files at r3, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @allada)


cas/cas_main.rs line 646 at r3 (raw file):

            global_cfg
                .default_digest_hash_function
                .unwrap_or(ConfigDigestHashFunction::sha256),

I wonder whether we'd want to make blake3 the default here. I'm aware that bazel defaults to SHA2 (and blake3 support was only just added in 6.4), but this would force users to explicitly set sha2 in their config. Considering that blake3 is ~15x faster than SHA2 this seems reasonable to me to make users think twice whether they actually want to keep using the significantly slower SHA2.


cas/scheduler/tests/utils/scheduler_utils.rs line 42 at r3 (raw file):

        },
        skip_cache_lookup: false,
        digest_function: DigestHasherFunc::Sha256,

Is this intentionally hardcoded?


cas/worker/tests/local_worker_test.rs line 216 at r3 (raw file):

            platform_properties: PlatformProperties::default(),
            priority: 0,
            load_timestamp: SystemTime::UNIX_EPOCH,

Will this still work on Windows?


cas/worker/tests/local_worker_test.rs line 305 at r3 (raw file):

            },
            skip_cache_lookup: true,
            digest_function: DigestHasherFunc::Sha256,

Note: Not for this commit, but we should look into a way to make tests parametric. This way we could "matrix-test" stuff like this.


config/cas_server.rs line 550 at r3 (raw file):

    /// by client.
    ///
    /// Default: ConfigDigestHashFunction::Sha256

Nit: I vote for making Blake3 the default and make users set sha2 if they want 15x slower hashing 😈


util/digest_hasher.rs line 90 at r3 (raw file):

            )),
        }
    }

4 variations of From/TryFrom seems overkill.

Copy link
Collaborator Author

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @aaronmondal)


cas/cas_main.rs line 646 at r3 (raw file):

Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…

I wonder whether we'd want to make blake3 the default here. I'm aware that bazel defaults to SHA2 (and blake3 support was only just added in 6.4), but this would force users to explicitly set sha2 in their config. Considering that blake3 is ~15x faster than SHA2 this seems reasonable to me to make users think twice whether they actually want to keep using the significantly slower SHA2.

I would rather trade ease of use in exchange for slower builds.

Bazel and buck2 are the only clients that support blake3 to my knowledge and Bazel requires a command line argument to enable it.

Also, most companies cannot just upgrade bazel at any time, it's a big deal to go through an upgrade.

Copy link
Collaborator Author

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @aaronmondal)


cas/scheduler/tests/utils/scheduler_utils.rs line 42 at r3 (raw file):

Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…

Is this intentionally hardcoded?

Yes, its a test util.


cas/worker/tests/local_worker_test.rs line 216 at r3 (raw file):

Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…

Will this still work on Windows?

It should, Unix timestamps are supported on windows.


cas/worker/tests/local_worker_test.rs line 305 at r3 (raw file):

Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…

Note: Not for this commit, but we should look into a way to make tests parametric. This way we could "matrix-test" stuff like this.

I thought of that but was worried of the complexity it'd introduce and difficulty to debug.

Id use integration tests to validate entire builds to for matrix-like tests.


config/cas_server.rs line 550 at r3 (raw file):

Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…

Nit: I vote for making Blake3 the default and make users set sha2 if they want 15x slower hashing 😈

but then most clients wont work.

Remember there is a capabilities property that tells the client we support blake3. the client can choose whatever. This is only used when none is specified.


util/digest_hasher.rs line 90 at r3 (raw file):

Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…

4 variations of From/TryFrom seems overkill.

I was trying to hide complexity here. Otherwise some places will need to define the complexity where the .into()/.from() is called.

Copy link
Contributor

@aaronmondal aaronmondal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r4, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @allada)


cas/worker/tests/local_worker_test.rs line 34 at r3 (raw file):

use action_messages::{ActionInfo, ActionInfoHashKey, ActionResult, ActionStage, ExecutionMetadata};
use common::{encode_stream_proto, fs, DigestInfo};
use config::cas_server::{LocalWorkerConfig, WrokerProperty};

WorkerProperty is correct. This was previously changed in #390

Copy link
Contributor

@aaronmondal aaronmondal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 22 of 22 files at r5, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @allada)

Fully support Blake3 digest upload format. This can increase
performance significantly for cases where hashing files is a
bottleneck.

closes #395
Copy link
Collaborator Author

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r6, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @allada)

@allada allada merged commit 2c8f0f0 into main Nov 20, 2023
15 checks passed
@allada allada deleted the add-blake3-support branch November 20, 2023 17:22
TripleKai pushed a commit to TripleKai/turbo-cache that referenced this pull request Nov 29, 2023
Fully support Blake3 digest upload format. This can increase
performance significantly for cases where hashing files is a
bottleneck.

closes TraceMachina#395
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants