This repository has been archived by the owner on Nov 2, 2018. It is now read-only.

Encoding and CatchUp #88

Merged
merged 8 commits into master from enc
Dec 2, 2014

Conversation

lukechampine
Member

4-byte unsigned length prefixes are now used for both network messages and encoded objects. I think it's safe to assume that we aren't going to be sending or marshalling objects larger than 4GB. The networking code already rejects messages longer than 1 << 24.

CatchUp and SendBlocks have also been moved to siad, which was trickier than anticipated because they were accessing private State fields. These have been rewritten to use getters. (I didn't bother writing a BlockID getter; instead I get blocks via BlockAtHeight() and then call ID() on them. A BlockID getter would be saner and more efficient, but I've left it unimplemented for now.)
I may have fixed a bug or two while rewriting these, but it's hard to say without testing.

Other notes: encoding.go has been split into three files. CreateGenesisState() has been moved to state.go. MaxCatchUpBlocks has been moved to synchronize.go.
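For reference, a minimal sketch of the 4-byte length-prefix scheme described above (the `EncLen` name appears in this PR, but the little-endian byte order and the function bodies here are assumptions, not the PR's actual code):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// EncLen encodes a length as a 4-byte unsigned prefix.
// Little-endian order is an assumption for this sketch.
func EncLen(length int) []byte {
	b := make([]byte, 4)
	binary.LittleEndian.PutUint32(b, uint32(length))
	return b
}

// DecLen is the inverse: it reads a 4-byte prefix back into an int.
func DecLen(b []byte) int {
	return int(binary.LittleEndian.Uint32(b))
}

func main() {
	msg := []byte("hello")
	// frame a message: 4-byte length prefix, then the payload
	framed := append(EncLen(len(msg)), msg...)
	fmt.Println(len(framed), DecLen(framed[:4])) // 9 5
}
```

A receiver would read 4 bytes, decode the length, reject anything at or above 1 << 24, and then read exactly that many payload bytes.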

}

// EncLen encodes a length (int) as a slice of 4 bytes.
func EncLen(length int) (b []byte) {
Member


I thought we were changing this to 8 bytes?

@DavidVorick
Member

The only major thing is using 8 bytes instead of 4.

@lukechampine
Member Author

I really don't see the value in that. Honest hosts will never send or marshal anything requiring 8 bytes. I can see the ideological appeal of using 8 bytes everywhere, but I don't think it has any grounding in practicality, and it incurs unnecessary overhead. Encoding all integers as 8 bytes doesn't make a lot of sense to me either, except in the case of int and uint. In #30 you mentioned "we just need to enforce 8 bytes-ness while hashing things." Can you explain that more?

@DavidVorick
Member

In principle it makes things easier for a newcomer to implement. No need to remember which things are 8 bytes and which things are 4 bytes.

wrt "we just need to enforce 8 bytes-ness while hashing things." I just meant that it's okay for there to be multiple encoding protocols (for example, down the line we're 100% going to compress messages before sending - even a 10% savings makes that worthwhile) when messaging, but it's not okay for there to be multiple protocols when hashing things (the original reason we switched away from JSON). For the time being though, it's best just to focus on having one protocol, because it'll simplify the codebase.

Switching from 4 bytes to 8 bytes only introduces a tiny bit of overhead, and once we start compressing things before sending them over the network that overhead will disappear, except while hashing (which takes longer as you add more bytes). But we're talking about such a small increase in overhead.

I think it's worthwhile.

@lukechampine
Member Author

I should hope that if someone's implementing a client, they aren't working off memory, but rather consulting the spec. Oh well.

I'm inclined to believe that message compression is premature optimization. When you say "a 10% savings is worthwhile," what specific benefits are you talking about? The ability to process 10% more blocks/transactions per second? Won't that only be an issue if nodes are pulling down their maximum available bandwidth?
I don't think most messages will have a high compression ratio anyway:

type Block struct {
    ParentBlockID BlockID       // high entropy
    Timestamp     Timestamp     // low (?) entropy
    Nonce         uint64        // low entropy (depends on how it is chosen)
    MinerAddress  CoinAddress   // high entropy
    MerkleRoot    hash.Hash     // high entropy
    Transactions  []Transaction // see below
}

type Transaction struct {
    ArbitraryData []byte                 // probably low entropy, or left empty
    Inputs        []Input                // high entropy
    MinerFees     []Currency             // low entropy
    Outputs       []Output               // high entropy
    FileContracts []FileContract         // high entropy
    StorageProofs []StorageProof         // high entropy
    Signatures    []TransactionSignature // high entropy
}

I guess I'll wait for the benchmarks though.

For what it's worth, our current encoding scheme is quite efficient (and could be further optimized). It's also very well defined and straightforward to implement; easier, I dare say, than Bitcoin's.

case reflect.Slice:
	// slices are variable length, but otherwise the same as arrays.
	// just have to allocate them first, then we can fallthrough to the array logic.
	var sliceLen int
-	sliceLen, b, consumed = int(b[0]), b[1:], 1 // remember to count the length byte as consumed
+	sliceLen, b, consumed = int(DecUint64(b[:8])), b[8:], 8
	val.Set(reflect.MakeSlice(val.Type(), sliceLen, sliceLen))
Member Author


see, this is where things get a bit silly. MakeSlice takes ints as arguments; the language itself prohibits you from allocating slices larger than that. So you have to cast the decoded length and drop the extra 4 bytes anyway. So every time we encode a slice, we add 4 zeros to the end and then lop them off when we decode. They serve no functional purpose whatsoever.

Member


Not that we had any need for multi-exabyte slices in the first place.

It's definitely overhead and it's not needed. I'm only advocating 8 bytes because it's consistent and keeps the spec as simple as possible. Low complexity means fewer mistakes. Even if you're following a spec, an 80-line spec is easier to implement without mistakes than a 90-line spec. Maybe I'm being overly aggressive about keeping things simple, but right now it's the heuristic I'm emphasizing.

Member Author


I don't think it'll actually play out that way. The spec will look something like:

Encoding:
- All integers are encoded as 8 bytes.
- Variable-length values, such as strings, are prefixed with their length,
  encoded as an 8-byte unsigned integer.
- ...

Networking:
- All messages are prefixed with their length, encoded as an 8-byte unsigned integer.
- ...

All that changes is the specific values.
But this is getting pedantic.

@DavidVorick
Copy link
Member

I'm not advocating implementing network compression any time soon, but a 10% savings means faster block and transaction propagation times, which is healthy for the network.

Timestamp is low entropy; nonce is high entropy (it's going to be more or less random, very slightly weighted towards 0, less so as the hash rate goes up).

DavidVorick added a commit that referenced this pull request Dec 2, 2014
@DavidVorick DavidVorick merged commit 9cf388e into master Dec 2, 2014
@lukechampine lukechampine mentioned this pull request Dec 2, 2014
@lukechampine lukechampine deleted the enc branch December 2, 2014 20:53