Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal on integer field size adjustments and clever way to shrink value field #273

Open
DeftNerd opened this issue Oct 13, 2014 · 6 comments

Comments

@DeftNerd
Copy link
Contributor

I'm assuming that the field sizes are static, other than the null-terminated string fields.

Just for reference with some common values:
1 byte would allow for unsigned integer values from 0 to 255
2 bytes would allow for unsigned integer values from 0 to 65535
3 bytes would allow for unsigned integer values from 0 to 16777215
4 bytes would allow for unsigned integer values from 0 to 4294967295
5 bytes would allow for unsigned integer values from 0 to 1099511627775
6 bytes would allow for unsigned integer values from 0 to 281474976710655
The specification has a few fields that I think might be optimized.

Ecosystem

The Ecosystem field seems wasteful since it's more of just a boolean value that's using a whole byte. Why not just pay attention to the Currency Identifier field? If the Currency Identifier field is set to 2 (Test MSC) then you know the ecosystem is the Mastercoin Testnet. That lets you drop the entire Ecosystem field. 1 byte can be shaved off.

Currency Identifier

The currency identifier field is set to be a 32-bit unsigned integer that takes 4 bytes. I know it's possibly treading on "640k should be enough for everybody" territory, but is Mastercoin anticipating more than 65535 currency types? 2 bytes could be shaved off.

Transaction Types and Transaction Version

I can understand that you want to be able to support a multitude of transaction types. 2 bytes allows for 65535 transaction types in the future, which is a bit excessive. Then you add the field for Transaction Version with another 65535 types in the future. This is a bit excessive. Why not just use the Transaction Type field and everytime there is a change in the spec for a transaction type just add another type ID rather than increment the version number?

Or perhaps, if you want to keep things cleaner, use 1 byte to store the transaction type (255 possible types) with 1 byte for the transaction version (255 possible versions) and if the transaction version ever gets to 255, then create a new transaction type?

Both proposals would shave off 2 bytes.

Possibly crazy idea but kinda neat way to make the Value integer field a variable length

The size of the Number of Coins field could be shaved down if each coin had multiple listings in currency type. This would allow the system to determine how many bytes is needed for the number of coins field during reading or writing the data. For instance:

Currency ID Currency Value Field Size
1001 Bitcoin Less than 100 Satoshi 1 Byte
1002 Bitcoin Less than 10,000 Satoshi 2 Bytes
1003 Bitcoin Less than 10,000,000 Satoshi (0.1 BTC) 3 Bytes
1004 Bitcoin Less than 1,000,000,000 Satoshi (10 BTC) 4 Bytes
1005 Bitcoin Less than 1,000,000,000,000 Satoshi (10,000 BTC) 5 Bytes
1006 Bitcoin Less than 100,000,000,000,000 Satoshi (1,000,000 BTC) 6 Bytes
1011 USD Less than 100 cents 1 Byte
1012 USD Less than 10,000 cents ($100) 2 Bytes
1013 USD Less than 10,000,000 cents ($100k) 3 Bytes
1014 USD Less than 1,000,000,000 cents ($10m) 4 Bytes
1015 USD Less than 1,000,000,000,000 cents ($10b) 5 Bytes
1016 USD Less than 100,000,000,000,000 cents ($1t) 6 Bytes

So, if someone wanted to sell a Bible for 0.3 BTC, the system would convert that to satoshi's and see that it's 30,000,000 satoshi, which fits in Currency ID #1004. When it puts together the data payload to write into the blockchain, it'll make sure to use a 4 byte field and set the two byte Currency ID field to 1004.

When someone reads the mastercoin transaction, they'll look at the Currency ID and see that it's 1004 and know to expect the Value field to be 4 bytes long. They don't even need to look it up in a table, you can use the last integer of the Currency ID to see how many bytes to expect from the Value field!

Currently the currency ID and value takes a total of 12 bytes of space. For a sale with a value set to 0.05 BTC, the currency ID and Value field will just take 6 Bytes! For a 6 BTC transaction, it'll be a total of 7 bytes!

Conclusion

The Ecosystem, Currency ID, Transaction Type, and Value fields currently take 17 bytes. If these proposals are accepted, that can drop down anywhere from 5 to 10 bytes depending on the value listed in the transaction. That's a 7 to 12 byte savings!

Forgive me if any of my calculations aren't quite right, this is more of a theoretical proposal. Someone should bust out a calculator and double check everything :-)

@DeftNerd
Copy link
Contributor Author

Oh, an addendum. I realize that some parts of the Spec look at the Currency ID. For instance, when you post a property for sale and specify what currency you want.

This can be taken advantage of by specifying the Currency ID to be 1000, which the software will interpret as 1001 to 1009 which maps to all the Bitcoin currency ID's (in my example table above). If they list it as 1010, it would map to 1011 to 1019 which is all the USD entries (in my example table above), etc.

You just have to assign currencies blocks of 10 integers. The 0 integer is like the broadcast address to the currency assigned to that 10 integer space.

This allows for 6553 currency types if the Currency ID field is 2 bytes in size. 6543 if you start at 100 to makes the old currency types backwards compatible.

@DeftNerd
Copy link
Contributor Author

An alternative is using a variable-length integer algorithm like http://en.wikipedia.org/wiki/LEB128 and then just null terminating the field. This lets you not have to have a table of currency ID's mapped to byte sizes and you can just specify the currency ID and put in the value which is compressed down and then add one byte for the NULL. Most values would still shrink from the 8 bytes they are now down to 4 or 5 bytes.

@dexX7
Copy link
Member

dexX7 commented Oct 13, 2014

Yay, nice input, thanks! You mention a point where I see great potential for improvement as well.

In general I see two routes: define a new type for each action or use sub-action. Here is probably the best (worst) example for the second route:

Version: 1
Type: 20 (Sell Mastercoins for Bitcoin)
Currency identifier: 1 (Mastercoin) X
Amount for sale: 1.5 MSC X
Amount of bitcoins desired: 2 BTC X
Payment window: 10 blocks X
Commitment fee: 0.0005 BTC X
Action: 3 (Cancel)

Because there is a constraint of only one active (Bitcoin) offer per "address", all fields marked with X could be removed entirely. But that aside, let's quickly modify the transaction:

Example 1:

Version: 2
Type: 20 (Sell Mastercoins for Bitcoin)
Action: 1 (New)
Currency identifier: 1 (Mastercoin)
Amount for sale: 5.0 MSC
Amount of bitcoins desired: 5 BTC
...

Example 2:

Version: 2
Type: 20 (Sell Mastercoins for Bitcoin)
Action: 3 (Cancel)

In this case the action fields determines the structure of the rest of the transaction, but it would also be possible to define an explicit transaction type for each action, for example:

Type: 23 (New offer for Bitcoin)
Currency identifier: 1 (Mastercoin)
Amount for sale: 5.0 MSC
Amount of bitcoins desired: 5 BTC
...
Type: 24 (Update offer)
Currency identifier: 1 (Mastercoin)
Updated amount for sale: 5.0 MSC
Updated amount of bitcoins desired: 10 BTC
...
Type: 25 (Cancel offer)
...

Why not just use the Transaction Type field and everytime there is a change in the spec for a transaction type just add ...?

I would be surprised, if we'd run out of numbers, even if we'd use only one byte to encode both version as well as transaction type. A special value could be reserved to indicate a transition to a greater range, similar to what you proposed by merging currency identifiers with an "amount length indicator" - which could be either combined into one field like 1012 <> 2 byte wide amount in USD or more explicitly:

[length (or length hint instead of exact length) of next field] [field with variable length]

An alternative is using a variable-length integer algorithm like http://en.wikipedia.org/wiki/LEB128 and then just null terminating the field.

If I'm not mistaken, there would be no need for null-terminating the number due to how LEB128 is constructed: the last byte is always greater or equal than 128 and all bytes before are less than 128.

@dacoinminster
Copy link
Contributor

I agree we'll probably never have more than 65k REAL properties. unfortunately, since creating a property is very cheap, it's not very hard to see people burning through two bytes just messing around. 4 bytes on the other hand is much harder to burn through.

I've tried to not use the ecosystem flag any place the ecosystem could be reliably inferred from a currency ID. Did you find an exception? If so, we should fix that.

I like the variable-length integer idea, although I wonder how much range we give up . . . .

@dexX7
Copy link
Member

dexX7 commented Oct 14, 2014

I like the variable-length integer idea, although I wonder how much range we give up . . .

Actually none. It would - in the worst case - result in fields that are longer than in pure form, but without giving up range.

I played a bit with LEB128. It has an overhead of 1 bit per 1 byte, that means 4 byte no longer hold 32 bit, but only 28, ... This allows the identify the end of a varlength encoded field, even if it's not fully padded - sort of similar to the trick of using 0x00 to terminate strings.

A version of 2 would no longer be encoded as 0x0002, but simply as 0x02. A number of coins field with a value of 321 would be reduced from 0x0000000000000141 to an encoded form of 0xc102 (if I'm not mistaken).

Let me phrase it differently: I believe this would be a very, very significant improvement and could save lots of space without any significant downside. The wikipedia article isn't bad for some information about the encoding.

I already see some further tweaks here and there - for example we may seperate the ecosystem flag from currency identifiers or using an actual 1 bit wide flag instead of a full byte etc.

Now the really tricky question: how could such a transition done best, if at all in the near future?

@dacoinminster
Copy link
Contributor

I'm tentatively in favor of a change like this IF there are suitable
well-maintained third-party libraries for handling these numbers (I'd be
nervous about writing our own parsers), and IF we feel the transition could
be made without significant risk to people running older versions of
MasterCore.

I well-remember the thread about version changes, where we decided that new
versions and old versions cannot peacefully coexist. I believe we
eventually decided that we need everyone to upgrade when we make changes.

On Mon, Oct 13, 2014 at 8:53 PM, dexX7 notifications@github.com wrote:

I like the variable-length integer idea, although I wonder how much range
we give up . . .

Actually none. It would - in the worst case - result in fields that are
longer than in pure form, but without giving up range.

I played a bit with LEB128. It has an overhead of 1 bit per 1 byte, that
means 4 byte no longer hold 32 bit, but only 28, ... This allows the
identify the end of a varlength encoded field, even if it's not fully
padded - sort of similar to the trick of using 0x00 to terminate strings.

A version of 2 would no longer be encoded as 0x0002, but simply as 0x02. A
number of coins field with a value of 321 would be reduced from
0x0000000000000141 to an encoded form of 0xc102 (if I'm not mistaken).

Let me phrase it differently: I believe this would be a very, very
significant improvement and could save lots of space without any
significant downside. The wikipedia article isn't bad for some information
about the encoding.

I already see some further tweaks here and there - for example we may
seperate the ecosystem flag from currency identifiers or using an actual 1
bit wide flag instead of a full byte etc.

Now the really tricky question: how could such a transition be done? I'm
playing with the idea of an optimized MSC for quite some time, but I wish
we could solve the utmost biggest challenge (imho) first: finding a
solution for the case where any change results in a completely different
global state from the POV of an old-version client. The black hole
thread... I'm sure you recall.


Reply to this email directly or view it on GitHub
#273 (comment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants