Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

assets: use blake3 instead of md5 #10208

Merged
merged 3 commits into from Oct 23, 2023
Merged

Conversation

mockersf
Copy link
Member

@mockersf mockersf commented Oct 21, 2023

Objective

Solution

  • Replace md5 by blake3

Putting this PR in the 0.12 as once it's released, changing the hash algorithm will be a painful breaking change

@mockersf mockersf added the A-Assets Load files from disk to use for things like images, models, and sounds label Oct 21, 2023
@mockersf mockersf added this to the 0.12 milestone Oct 21, 2023
@Elabajaba
Copy link
Contributor

Wouldn't it make more sense to use a faster hash like xxhash or wyhash? (ahash specifically tells you not to use it for hashing files as it doesn't have a fixed standard) Siphash is quite slow and afaik provides questionable value re: security (and if we want something actually secure blake3 seems to be faster, but using a cryptographic hash for assets seems questionable).

I'm not sure how hot asset hashes might be with eg. an asset streaming system, but imo for a game engine it's probably better to err on the side of more performance (especially since siphash only seems to manage ~15% of the throughput of a midrange nvme SSD these days).

@mockersf
Copy link
Member Author

I don't really have opinions on hashes, except avoiding md5. Which would you recommend?

@james7132
Copy link
Member

james7132 commented Oct 21, 2023

If the intent is to be used for asset deduplication, I'd agree with @Elabajaba here. xxHash64 and wyhash are about the same speed and quality in terms of output. I personally would veer towards xxHash given its use of platform specific acceleration, all other properties being equal.

There's also meow_hash, which was explicitly made for the use case of large asset deduplication in game engines and heavily leverages platform specific instructions for acceleration. I'm no cryptoanalyst, so the exact security properties of the hash are alien to me. On a more practical side, there is a Rust implementation, but it seems to be under MPL and require C/C++ compiler toolchains to be available to build, so an alternative will probably be required.

@Elabajaba
Copy link
Contributor

For xxhash (specifically xxh3) it looks like there's 3 main choices in rust?

twox-hash doesn't seem to have SIMD implementations making their xxh3 implementation significantly slower.

xxhash-rust has compile time SIMD for sse and avx2 (but nothing for ARM), and seems to be fine but doesn't mark some unsafe functions as unsafe (the Boost Software License seems to basically be MIT, and has nothing to do with the Business Source License)

xxh3 has significantly fewer downloads, but has runtime SIMD detection for AVX2 and NEON (arm).

@alice-i-cecile
Copy link
Member

Agreed that this should be in 0.12, but no strong feelings on what hash algorithm to use.

@mockersf
Copy link
Member Author

switched to https://crates.io/crates/blake3

@mockersf mockersf changed the title assets: use SipHasher instead of md5 assets: use blake3 instead of md5 Oct 22, 2023
@alice-i-cecile alice-i-cecile added this pull request to the merge queue Oct 23, 2023
Merged via the queue into bevyengine:main with commit d938275 Oct 23, 2023
25 checks passed
ameknite pushed a commit to ameknite/bevy that referenced this pull request Nov 6, 2023
# Objective

- Replace md5 by another hasher, as suggested in
bevyengine#8624 (comment)
- md5 is not secure, and is slow. use something more secure and faster

## Solution

- Replace md5 by blake3


Putting this PR in the 0.12 as once it's released, changing the hash
algorithm will be a painful breaking change
@mockersf mockersf mentioned this pull request Nov 6, 2023
26 tasks
rdrpenguin04 pushed a commit to rdrpenguin04/bevy that referenced this pull request Jan 9, 2024
# Objective

- Replace md5 by another hasher, as suggested in
bevyengine#8624 (comment)
- md5 is not secure, and is slow. use something more secure and faster

## Solution

- Replace md5 by blake3


Putting this PR in the 0.12 as once it's released, changing the hash
algorithm will be a painful breaking change
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Assets Load files from disk to use for things like images, models, and sounds
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants