This repository has been archived by the owner on Nov 2, 2018. It is now read-only.

Encoding and CatchUp #88

Merged
merged 8 commits into master from enc
Dec 2, 2014

Conversation

lukechampine
Member

4-byte unsigned length prefixes are now used for both network messages and encoded objects. I think it's safe to assume that we aren't going to be sending or marshalling objects larger than 4GB. The networking code already rejects messages longer than 1 << 24.

CatchUp and SendBlocks have also been moved to siad, which was trickier than anticipated because they were accessing private State fields. These have been rewritten to use getters. (I didn't bother writing a BlockID getter; instead I get blocks via BlockAtHeight() and then call ID() on them. A BlockID getter would be saner and more efficient, but I've left it unimplemented for now.)
I may have fixed a bug or two while rewriting these, but it's hard to say without testing.

Other notes: encoding.go has been split into three files. CreateGenesisState() has been moved to state.go. MaxCatchUpBlocks has been moved to synchronize.go.
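For reference, a minimal sketch of the 4-byte length-prefix scheme described above (the `EncLen` name appears in this PR, but the little-endian byte order and the function bodies here are assumptions, not the PR's actual code):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// EncLen encodes a length as a 4-byte unsigned prefix.
// Little-endian order is an assumption for this sketch.
func EncLen(length int) []byte {
	b := make([]byte, 4)
	binary.LittleEndian.PutUint32(b, uint32(length))
	return b
}

// DecLen is the inverse: it reads a 4-byte prefix back into an int.
func DecLen(b []byte) int {
	return int(binary.LittleEndian.Uint32(b))
}

func main() {
	msg := []byte("hello")
	// frame a message: 4-byte length prefix, then the payload
	framed := append(EncLen(len(msg)), msg...)
	fmt.Println(len(framed), DecLen(framed[:4])) // 9 5
}
```

A receiver would read 4 bytes, decode the length, reject anything at or above 1 << 24, and then read exactly that many payload bytes.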

}

// EncLen encodes a length (int) as a slice of 4 bytes.
func EncLen(length int) (b []byte) {
Member


I thought we were changing this to 8 bytes?

@DavidVorick
Member

The only major thing is using 8 bytes instead of 4.

@lukechampine
Member Author

I really don't see the value in that. Honest hosts will never send or marshal anything requiring 8 bytes. I can see the ideological appeal of using 8 bytes everywhere, but I don't think it has any grounding in practicality, and it incurs unnecessary overhead. Encoding all integers as 8 bytes doesn't make a lot of sense to me either, except in the case of int and uint. In #30 you mentioned "we just need to enforce 8 bytes-ness while hashing things." Can you explain that more?

@DavidVorick
Member

In principle it makes things easier for a newcomer to implement. No need to remember which things are 8 bytes and which things are 4 bytes.

wrt "we just need to enforce 8 bytes-ness while hashing things." I just meant that it's okay for there to be multiple encoding protocols (for example, down the line we're 100% going to compress messages before sending - even a 10% savings makes that worthwhile) when messaging, but it's not okay for there to be multiple protocols when hashing things (the original reason we switched away from JSON). For the time being though, it's best just to focus on having one protocol, because it'll simplify the codebase.

Switching from 4 bytes to 8 bytes only introduces a tiny bit of overhead, and once we start compressing things before sending them over the network that overhead will disappear, except while hashing (which takes longer as you add more bytes). But we're talking about such a small increase in overhead.

I think it's worthwhile.

@lukechampine
Member Author

I should hope that if someone's implementing a client, they aren't working off memory, but rather consulting the spec. Oh well.

I'm inclined to believe that message compression is premature optimization. When you say "a 10% savings is worthwhile," what specific benefits are you talking about? The ability to process 10% more blocks/transactions per second? Won't that only be an issue if nodes are pulling down their maximum available bandwidth?
I don't think most messages will have a high compression ratio anyway:

type Block struct {
    ParentBlockID BlockID       // high entropy
    Timestamp     Timestamp     // low (?) entropy
    Nonce         uint64        // low entropy (depends on how it is chosen)
    MinerAddress  CoinAddress   // high entropy
    MerkleRoot    hash.Hash     // high entropy
    Transactions  []Transaction // see below
}

type Transaction struct {
    ArbitraryData []byte                 // probably low entropy, or left empty
    Inputs        []Input                // high entropy
    MinerFees     []Currency             // low entropy
    Outputs       []Output               // high entropy
    FileContracts []FileContract         // high entropy
    StorageProofs []StorageProof         // high entropy
    Signatures    []TransactionSignature // high entropy
}

I guess I'll wait for the benchmarks though.

For what it's worth, our current encoding scheme is quite efficient (and could be further optimized). It's also very well defined and straightforward to implement; easier, I dare say, than Bitcoin's.

case reflect.Slice:
	// slices are variable length, but otherwise the same as arrays.
	// just have to allocate them first, then we can fallthrough to the array logic.
	var sliceLen int
-	sliceLen, b, consumed = int(b[0]), b[1:], 1 // remember to count the length byte as consumed
+	sliceLen, b, consumed = int(DecUint64(b[:8])), b[8:], 8
	val.Set(reflect.MakeSlice(val.Type(), sliceLen, sliceLen))
Member Author


see, this is where things get a bit silly. MakeSlice takes ints as arguments; the language itself prohibits you from allocating slices larger than that. So you have to cast the decoded length and drop the extra 4 bytes anyway. So every time we encode a slice, we add 4 zeros to the end and then lop them off when we decode. They serve no functional purpose whatsoever.

Member


Not that we had any need for multi-exabyte slices in the first place.

It's definitely overhead and it's not needed. I'm only advocating 8 bytes because it's consistent and keeps the spec as simple as possible. Low complexity means fewer mistakes. Even if you're following a spec, an 80-line spec is easier to implement without mistakes than a 90-line spec. Maybe I'm being overly aggressive about keeping things simple, but right now it's the heuristic I'm emphasizing.

Member Author


I don't think it'll actually play out that way. The spec will look something like:

Encoding:
- All integers are encoded as 8 bytes.
- Variable-length values, such as strings, are prefixed with their length,
  encoded as an 8-byte unsigned integer.
- ...

Networking:
- All messages are prefixed with their length, encoded as an 8-byte unsigned integer.
- ...

All that changes is the specific values.
But this is getting pedantic.

@DavidVorick
Copy link
Member

I'm not advocating implementing network compression any time soon, but a 10% savings means faster block and transaction propagation times, which is healthy for the network.

Timestamp is low entropy; nonce is high entropy (it's going to be more or less random, very slightly weighted towards 0, less so as the hash rate goes up).

DavidVorick added a commit that referenced this pull request Dec 2, 2014
@DavidVorick DavidVorick merged commit 9cf388e into master Dec 2, 2014
@lukechampine lukechampine mentioned this pull request Dec 2, 2014
@lukechampine lukechampine deleted the enc branch December 2, 2014 20:53