Conversation
// EncLen encodes a length (int) as a slice of 4 bytes.
func EncLen(length int) (b []byte) {
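For reference, the two length encodings under debate can be sketched as below. This is an illustrative sketch, not the actual `EncLen` implementation; the little-endian byte order and the helper names `encLen4`/`encLen8` are assumptions.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encLen4 encodes a length as 4 little-endian bytes, mirroring the
// 4-byte EncLen under discussion. (Endianness is an assumption here.)
func encLen4(length int) []byte {
	b := make([]byte, 4)
	binary.LittleEndian.PutUint32(b, uint32(length))
	return b
}

// encLen8 is the proposed 8-byte variant; the top four bytes are
// always zero for any length an honest host will produce.
func encLen8(length int) []byte {
	b := make([]byte, 8)
	binary.LittleEndian.PutUint64(b, uint64(length))
	return b
}

func main() {
	fmt.Println(encLen4(300)) // [44 1 0 0]
	fmt.Println(encLen8(300)) // [44 1 0 0 0 0 0 0]
}
```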
I thought we were changing this to 8 bytes?
The only major thing is using 8 bytes instead of 4.
I really don't see the value in that. Honest hosts will never send or marshal anything requiring 8 bytes. I can see the ideological appeal of using 8 bytes everywhere, but I don't think it has any grounding in practicality, and it incurs unnecessary overhead. Encoding all integers as 8 bytes doesn't make a lot of sense to me either, except in the case of
In principle it makes things easier for a newcomer to implement. No need to remember which things are 8 bytes and which things are 4 bytes.

wrt "we just need to enforce 8-bytes-ness while hashing things": I just meant that it's okay for there to be multiple encoding protocols when messaging (for example, down the line we're 100% going to compress messages before sending; even a 10% savings makes that worthwhile), but it's not okay for there to be multiple protocols when hashing things (the original reason we switched away from JSON). For the time being though, it's best just to focus on having one protocol, because it'll simplify the codebase.

Switching from 4 bytes to 8 bytes only introduces a tiny bit of overhead, and once we start compressing things before sending them over the network that overhead will disappear, except while hashing (which takes longer as you add more bytes). But we're talking about such a small increase in overhead. I think it's worthwhile.
I should hope that if someone's implementing a client, they aren't working off memory, but rather consulting the spec. Oh well. I'm inclined to believe that message compression is premature optimization. When you say "a 10% savings is worthwhile," what specific benefits are you talking about? The ability to process 10% more blocks/transactions per second? Won't that only be an issue if nodes are pulling down their maximum available bandwidth?
I guess I'll wait for the benchmarks though. For what it's worth, our current encoding scheme is quite efficient (and could be further optimized). It's also very well defined and straightforward to implement; easier, I dare say, than Bitcoin's.
case reflect.Slice:
	// slices are variable length, but otherwise the same as arrays.
	// just have to allocate them first, then we can fallthrough to the array logic.
	var sliceLen int
	sliceLen, b, consumed = int(DecUint64(b[:8])), b[8:], 8
	val.Set(reflect.MakeSlice(val.Type(), sliceLen, sliceLen))
see, this is where things get a bit silly. MakeSlice takes ints as arguments; the language itself prohibits you from allocating slices larger than that. So you have to cast the decoded length and drop the extra 4 bytes anyway. So every time we encode a slice, we add 4 zeros to the end and then lop them off when we decode. They serve no functional purpose whatsoever.
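The cast in question can be sketched as follows. This is a hypothetical standalone helper, not the real decoder; `DecUint64` is assumed to behave like `binary.LittleEndian.Uint64`.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodeSliceLen reads an 8-byte length prefix off the wire and
// narrows it to an int, since reflect.MakeSlice takes ints. On honest
// input the top four bytes of the prefix are always zero, so the
// narrowing discards nothing.
func decodeSliceLen(b []byte) (sliceLen int, rest []byte) {
	sliceLen = int(binary.LittleEndian.Uint64(b[:8]))
	return sliceLen, b[8:]
}

func main() {
	// a 3-element payload with its 8-byte length prefix
	wire := append([]byte{3, 0, 0, 0, 0, 0, 0, 0}, 'a', 'b', 'c')
	n, rest := decodeSliceLen(wire)
	fmt.Println(n, string(rest)) // 3 abc
}
```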
Not that we had any need for multi-exabyte slices in the first place.
It's definitely overhead and it's not needed. I'm only advocating 8 bytes because it's consistent and keeps the spec as simple as possible. Low complexity means fewer mistakes. Even if you're following a spec, an 80-line spec is easier to implement without mistakes than a 90-line spec. Maybe I'm being overly aggressive about keeping things simple, but right now it's the heuristic I'm emphasizing.
I don't think it'll actually play out that way. The spec will look something like:
Encoding:
- All integers are encoded as 8 bytes.
- Variable-length values, such as strings, are prefixed with their length,
encoded as an 8-byte unsigned integer.
- ...
Networking:
- All messages are prefixed with their length, encoded as an 8-byte unsigned integer.
- ...
All that changes is the specific values.
But this is getting pedantic.
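The spec sketch above could be implemented along these lines. A hedged sketch only: the function name `encString` and the little-endian byte order are assumptions, not the actual spec.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encString encodes a variable-length value prefixed with its length
// as an 8-byte unsigned integer, per the spec sketch above.
func encString(s string) []byte {
	b := make([]byte, 8+len(s))
	binary.LittleEndian.PutUint64(b, uint64(len(s)))
	copy(b[8:], s)
	return b
}

func main() {
	fmt.Println(encString("hi")) // [2 0 0 0 0 0 0 0 104 105]
}
```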
I'm not advocating implementing network compression any time soon, but a 10% savings means faster block and transaction propagation times, which is healthy for the network. Timestamp is low entropy; nonce is high entropy (it's going to be more or less random, very slightly weighted toward 0, less so as the hash rate goes up).
4-byte unsigned length prefixes are now used for both network messages and encoded objects. I think it's safe to assume that we aren't going to be sending or marshalling objects larger than 4GB. The networking code already rejects messages longer than 1 << 24.
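A minimal sketch of the 4-byte prefix plus the size cap described above. The helper name `readMsgLen` and the error text are illustrative, not the real networking code; only the `1 << 24` limit comes from the discussion.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// maxMsgLen matches the limit mentioned above: messages longer than
// 1 << 24 bytes are rejected.
const maxMsgLen = 1 << 24

// readMsgLen decodes a 4-byte unsigned length prefix and rejects
// oversized messages.
func readMsgLen(prefix []byte) (uint32, error) {
	n := binary.LittleEndian.Uint32(prefix[:4])
	if n > maxMsgLen {
		return 0, errors.New("message too long")
	}
	return n, nil
}

func main() {
	n, err := readMsgLen([]byte{0, 0, 1, 0}) // prefix encoding 1 << 16
	fmt.Println(n, err)                      // 65536 <nil>
}
```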
CatchUp and SendBlocks have also been moved to siad, which was trickier than anticipated because they were accessing private State fields. These have been rewritten to use getters. (I didn't bother writing a BlockID getter; instead I get blocks via BlockAtHeight() and then call ID() on them. A BlockID getter would be saner and more efficient, but I've left it unimplemented for now.) I may have fixed a bug or two while rewriting these, but it's hard to say without testing.
Other notes: encoding.go has been split into three files. CreateGenesisState() has been moved to state.go. MaxCatchUpBlocks has been moved to synchronize.go.