Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.
Sign upThere should be a standard way to recusively hash structs #29
Comments
This comment has been minimized.
This comment has been minimized.
burdges
commented
Jun 29, 2017
•
|
I'd consider this unwise based on previous discussions. Right now, Conversely, all these cryptographic hash functions would destroy the performance of That said, one could provide a version using roughly the endian-hasher crate, unless its horribly broken.
As previously, we'd keep There is still an issue here in that we trust I could hack up a pull request for this if @newpavlov thinks it wise. I think the first question is : Should the |
This comment has been minimized.
This comment has been minimized.
burdges
commented
Jun 29, 2017
•
|
I'm tired right now, but I think argument against my alternative goes roughly : You should document your protocols and file formats better by using a more explicit serialization. That could just be a warning somewhere. |
This comment has been minimized.
This comment has been minimized.
|
Sorry to be this blunt, but this is a terrible idea.
What? The trait definition stops this, because types. Nor is there any standard notion of how to map a cryptographic hash function's digest onto a 64-bit integer. But beyond that, cryptographic hash functions utilized by The fundamental tradeoff is:
To me, if your goal is #2, anything slower than SipHash is unacceptable, because you still want a Do you disagree? Can you explain the use case where you want something that's simultaneously slow and insecure? |
This comment has been minimized.
This comment has been minimized.
|
@burdges the endianess issues are indeed a problem. If there's no way to fix those it's unfortunate. I'm not sure if I understand why this happens. Are primitive types hashing by feeding the hasher one u8 at a time? Because the I'll have a look if your suggestion works for my use case.
Nothing stops me from doing a You're assuming I want to use the normal Hasher API with SHA256 to use in a Hashmap or something like that. That would be a bad idea indeed, SipHash is strictly better in that application. What I need is a longer hash that I can use to uniquely identify a certain I generate hashes that represent the state of the image pipeline up to that point as a function of it's settings. I'm currently using MetroHash but I want at least 128bit and a crypto hash for this. I'm only using the |
This comment has been minimized.
This comment has been minimized.
burdges
commented
Jun 29, 2017
|
It worse so long as people use You can always locally employ a version of In fact, I doubt I think those are the arguments against |
This comment has been minimized.
This comment has been minimized.
|
@burdges It seems the standard implementation does just that: So it seems |
This comment has been minimized.
This comment has been minimized.
burdges
commented
Jun 29, 2017
|
Yikes, I completely missed that! I'm tempted to raise that as a libs issue. In any case, I suppose serde should be considered the right way to do this for now. |
This comment has been minimized.
This comment has been minimized.
|
@burdges That's very interesting how would you use serde for this? All my structs already implement |
This comment has been minimized.
This comment has been minimized.
|
@pedrocr perhaps something like this is what you want: https://github.com/cryptosphere/objecthash-rs Note that I never wrote a procedural macro for it. I also intend on doing something slightly different and a bit more fleshed out soon. |
This comment has been minimized.
This comment has been minimized.
|
@tarcieri yes, this is precisely my use case, thanks. Wouldn't it be easier to just write a serde |
This comment has been minimized.
This comment has been minimized.
|
@pedrocr it may be possible to finagle it through serde visitors. I would advise against using it for now though. I'll have a better library out soon. |
This comment has been minimized.
This comment has been minimized.
|
@tarcieri it seems objecthash solves a bigger problem of having redactable structs. For me that's not an issue so maybe just writing a |
This comment has been minimized.
This comment has been minimized.
burdges
commented
Jun 29, 2017
|
Yes, even using the As an aside, I'm pretty happy doing all my cryptographic hashing manually so far because frequently if I need a cryptographic hashes then it actually needs data from multiple sources or maybe should not cover all the data present in the |
This comment has been minimized.
This comment has been minimized.
|
@pedrocr as is, the Rust implementation doesn't support redaction, but yes, redaction/Merkle inclusion proofs are one of the nice features you get out of a hashing scheme like that. |
This comment has been minimized.
This comment has been minimized.
|
@burdges cool, I'll have a look at this then. Would a PR with something like an adapter between As for doing it manually I was hoping to do it automatically as these structs are specifically designed to be all the settings for the image operation and nothing more. We'll see :) |
This comment has been minimized.
This comment has been minimized.
burdges
commented
Jun 29, 2017
•
|
I found an easy way : You It'll look vaguely like what I wrote above, but using Also, your original question could now be : Why not implement |
This comment has been minimized.
This comment has been minimized.
|
@burdges sounds good, avoids having to write a full |
This comment has been minimized.
This comment has been minimized.
|
@burdges also is bincode guaranteed to always send the same bytes? Or can it do things like reorder fields between versions? It may be safer to just implement the |
This comment has been minimized.
This comment has been minimized.
|
@pedrocr you seem to be running afoul of the canonicalization problem. One of the nice things about objecthash-like schemes is you don't need to canonicalize prior to hashing. Structurally similar objects will always have the same hash. |
This comment has been minimized.
This comment has been minimized.
|
@tarcieri see this for my conclusions on the topic: https://internals.rust-lang.org/t/f32-f64-should-implement-hash/5436/33 My worry in this case is only that the same exact struct will hash differently because bincode changes. Making sure that reordering fields doesn't change the hash for example isn't as important. Since my structs already implement Having a crate that does derive and allows me to use any hash would be nice though so please do ping me once you have something to test. The other features can't hurt either. |
This comment has been minimized.
This comment has been minimized.
burdges
commented
Jun 29, 2017
|
Yes, serde explicitly supports non-self-describing formats, naming bincode as the example. There are various instructions for Also bincode never emits any thing about the fields besides lengths and contents. I'd imagine bincode would use a feature or a major version update if they broke binary comparability, but who knows. |
This comment has been minimized.
This comment has been minimized.
|
@burdges so hashing bincode does seem fairly reasonable. Will have a look at that then. |
This comment has been minimized.
This comment has been minimized.
|
@pedrocr |
This comment has been minimized.
This comment has been minimized.
burdges
commented
Jul 3, 2017
|
There is a lingering question as to whether the |
This comment has been minimized.
This comment has been minimized.
|
@burdges I think the current |
This comment has been minimized.
This comment has been minimized.
|
It would also be nice if a |
This comment has been minimized.
This comment has been minimized.
|
Should this issue be closed? I think it's semantically incorrect to implement |
pedrocr
changed the title
Why not implement Hasher?
There should be a standard way to recusively hash structs
Jul 4, 2017
This comment has been minimized.
This comment has been minimized.
|
@tarcieri the core of the issue is still unresolved. I've changed the title to reflect that. Right now the best way to use crypto hashes on structs is |
This comment has been minimized.
This comment has been minimized.
|
@pedrocr fn digest_serialized<S: Serialize, F: Serializer>(s: S, f: F) -> Result<Output<Self::OutputSize>, F::Error>;UPD: edited code a bit, removed bincode method |
This comment has been minimized.
This comment has been minimized.
"Recursively hashing structs" is pretty much the definition of a structured hashing scheme. However, you seem to be using it to mean "a content hash for serialized bincode" which, to me, is something very different from "recursively hashing structs"
As covered earlier, this is simply the wrong tool for the job.
So you want to invent your own scheme, which is what I was recommending against. At least think about taking one of these two paths as you invent your own scheme:
Done correctly you can solve all of the problems I just mentioned and have a format that works across languages. Or you can invent your own thing that's specific to Rust, ala bincode, but please think through the issues that have been addressed in all of the content hashing schemes that have been created in the past. You are walking well-worn ground here, where many mistakes have been made by past naive attempts and much hard-won knowledge has been accrued. Before you go invent something, please make sure you study the past. |
This comment has been minimized.
This comment has been minimized.
I've stated multiple times I don't want to use bincode for anything and it's just a hack to get it to work right now. So no, I don't want to hash bincode, all I want is a recursive crypto hash of arbitrary rust structs. I don't want to make any canonical representations of data. I don't want to serialize and then hash. I don't want to invent a hashing scheme, just have a hashing trait that isn't broken like the default |
This comment has been minimized.
This comment has been minimized.
Then, by definition, you need an algorithm that knows how to compute cryptographic hashes of structured data. Better make sure it isn't vulnerable to second-preimage attacks. |
This comment has been minimized.
This comment has been minimized.
|
@tarcieri as far as I can tell that only applies if you are hashing individual structs and then hashing the concatenated hashes. I don't want that at all. I just want to initialize the hash, pass in all the values recursively, and finalize the hash. This has the same properties as serializing and then hashing without the performance penalty. |
This comment has been minimized.
This comment has been minimized.
|
You need to domain separate the values somehow, either by length or by structure, and in either case you'll want to encode the types. The former pretty much involves a bincode-like scheme, and the latter an objecthash one. Otherwise, you'll have ambiguities where different structures compute the same hash. |
This comment has been minimized.
This comment has been minimized.
None of these are an issue for me.
That's how |
This comment has been minimized.
This comment has been minimized.
|
Ok, so it sounds like you want to abandon security... in which case why are you using a cryptographic hash? |
This comment has been minimized.
This comment has been minimized.
|
@tarcieri none of these abandon security but even if they did I'm using hashes for caching. All I need is a statistical guarantee of non-collision absent an attacker. |
This comment has been minimized.
This comment has been minimized.
|
If you're using hashes like this in any sort of security context and have ambiguities in a structured hashing format, an attacker can exploit them to trick you into accepting documents containing e.g. nested content and have it verify as if it was the same as a non-nested message. Second preimage attacks on naive Merkle Trees are an example of this, and it's similar in concept to other attacks regarding confusion of message structure such as packet injection attacks. If you want to build a naive hashing scheme using cryptographically secure hashes for your own purposes, that's fine, but I would be strongly opposed to the inclusion of naive schemes in cryptography libraries, because people generally expect cryptography libraries to be built for security purposes. |
This comment has been minimized.
This comment has been minimized.
|
@tarcieri I suggest you bring this up with the rust lib developers then. All I want is to get a hash calculated the exact same way |
This comment has been minimized.
This comment has been minimized.
You're talking about creating a
These same attacks can happen without hashes-of-hashes. As I pointed out the general attack is one of failure to disambiguate message structures. Packet injection is the same basic idea, without any hashing involved whatsoever. That's a parsing ambiguity, but the same concept applies: the security system is confused about the message structure, allowing an attacker to trick it into doing something it shouldn't. In my opinion if any messages with different structures can realistically produce the same content hash, the scheme is broken from a security perspective. |
This comment has been minimized.
This comment has been minimized.
From what I've seen the current implementation doesn't actually randomize the key. But even if it did that's not relevant if it's trivial to generate a collision by just manipulating the structure.
I'd agree I just haven't found a case where that can happen with a setup like that of |
This comment has been minimized.
This comment has been minimized.
According to the documentation, the implementation used by https://doc.rust-lang.org/std/collections/struct.HashMap.html
|
This comment has been minimized.
This comment has been minimized.
|
At least on my machine the hash is not seeded. This always returns the same value: https://gist.github.com/pedrocr/8d0283bed3f56e5cb6fb9fe0a785f947 But that's not relevant to the point as I specifically mentioned. If you can generate structs that have the same hashing because you manipulate the structure it doesn't matter that your hash function is randomly keyed. Within the same execution of the program those structs will colide, just with different values. And that's what would be the bug in |
This comment has been minimized.
This comment has been minimized.
Use a And to reiterate yet again, in the context where You're colluding threat models here: In the one I would find acceptable for I'm not sure this is continuing to be a productive conversation: I would like the threat of ambiguous messages to be solved strategically for anything named I would find a hand-rolled naive and ambiguous scheme unacceptable. You haven't made a concrete proposal, but you're downplaying everything you need to do to solve these problems from a security perspective, and that's why I'm worried. |
This comment has been minimized.
This comment has been minimized.
And that's the attack that if it works with
My proposal is very simple. Do exactly what |
This comment has been minimized.
This comment has been minimized.
Incorrect. Again, hashDoS is an algorithmic attack that relies on the attacker being able to find large numbers of collisions. Finding a single collision is not catastrophic, because it doesn't help the attacker that much. Furthermore, the interesting cases for hashDoS are With a These are totally different threat models you are trying to collude.
The onus is on you to show why your scheme is secure. Otherwise you're shifting the burden of proof. Your (ill-defined) scheme is not secure simply because I haven't found an attack. You need to be able to answer questions like: how is the scheme domain separated everywhere type-by-type? How does it distinguish a A secure scheme will be demonstrably unambiguous and free of collisions for arbitrarily structured messages. Anything less is insecure. This is a much higher bar than |
This comment has been minimized.
This comment has been minimized.
If you care about collisions across types then yes, it's easy to generate collisions that are benign in |
This comment has been minimized.
This comment has been minimized.
|
After looking at
This makes it easier to have values for which |
This comment has been minimized.
This comment has been minimized.
Trojan295
commented
Jul 5, 2017
•
I would argue, if it's a bug, cause the byte representation of the data is the same. Hash functions work on bytes, not data structures. |
This comment has been minimized.
This comment has been minimized.
|
@Trojan295 I'm not sure if you're arguing that it's a bug or not. |
This comment has been minimized.
This comment has been minimized.
Trojan295
commented
Jul 5, 2017
|
Are you trying to use those structured hashing functions only for internal Rust purposes like Let's take this Generally, I don't think such feature is required in case cryptographic hashes, as in case of crypto you mostly operate on numbers/bytes and not custom data structures. The way of hashing needs to be well known and unified. |
This comment has been minimized.
This comment has been minimized.
|
@Trojan295 short of But as I've previously stated, and if it's the thing I have a bug up my butt about, it's conflating security domains and concerns, and that's really what I want to avoid. I am a huge fan of something like a |
This comment has been minimized.
This comment has been minimized.
|
I will close this issue and will create a separate issue in the |
newpavlov
closed this
Jul 13, 2017
This comment has been minimized.
This comment has been minimized.
burdges
commented
Jul 13, 2017
|
I think a utility crate could provide a wrapper type |
This comment has been minimized.
This comment has been minimized.
|
@burdges |
pedrocr commentedJun 29, 2017
std::hash::Hashercan be derived for structs and is a standard hashing interface in rust. The standard interface only allows 64bit outputs but there's nothing stopping extra outpust tailored to specific hashes. So for ergonomic purposes wouldn't it make sense to have an adapter to allow using theHasherAPI?