Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

64 bits unsigned int #64

Merged
merged 10 commits into from
Aug 22, 2018
Merged

64 bits unsigned int #64

merged 10 commits into from
Aug 22, 2018

Conversation

pmconrad
Copy link

Test case and initial implementation for bitshares/bitshares-core#1088

@@ -6,5 +6,5 @@ namespace fc
void to_variant( const signed_int& var, variant& vo, uint32_t max_depth ) { vo = var.value; }
void from_variant( const variant& var, signed_int& vo, uint32_t max_depth ) { vo.value = static_cast<int32_t>(var.as_int64()); }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks good so far. But this opens the can of worms... signed_int is 32 bit and unsigned_int is 64. Perhaps a separate ticket.

@abitmore
Copy link
Member

Steem team is trying this: steemit/steem#2780.

Copy link

@jmjatlanta jmjatlanta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

template<typename Stream> inline void unpack( Stream& s, unsigned_int& vi, uint32_t _max_depth ) {
uint64_t v = 0; char b = 0; uint8_t by = 0;
do {
s.get(b);
v |= uint32_t(uint8_t(b) & 0x7f) << by;
by += 7;
} while( uint8_t(b) & 0x80 );
vi.value = static_cast<uint32_t>(v);
}

I believe the static cast needs to be changed.

Note that the fix @abitmore mentioned (steemit/steem#2780) included some added protection for malformed varint unpacking. I am not sure if we want to add that here too. That may warrant a separate issue.

@abitmore
Copy link
Member

I tend to do the protection in this PR.

Line 196 should use 64 bit as well:

template<typename Stream> inline void unpack( Stream& s, unsigned_int& vi, uint32_t _max_depth ) {
uint64_t v = 0; char b = 0; uint8_t by = 0;
do {
s.get(b);
v |= uint32_t(uint8_t(b) & 0x7f) << by;
by += 7;
} while( uint8_t(b) & 0x80 );
vi.value = static_cast<uint32_t>(v);
}

@pmconrad
Copy link
Author

Latest commit shows that fc::signed_int serialization is broken for values in the range 0x40000000..0x7fffffff.
Will remove it from the codebase since it isn't used by the core (there are a few references though, so bumping fc will require core changes).

Copy link
Member

@abitmore abitmore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about throwing an exception when the input is not valid, e.g. too large?

@@ -193,10 +193,10 @@ namespace fc {
uint64_t v = 0; char b = 0; uint8_t by = 0;
do {
s.get(b);
v |= uint32_t(uint8_t(b) & 0x7f) << by;
v |= uint64_t(uint8_t(b) & 0x7f) << by;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about checking by before the bit shift? For example, if b is 0x7f and by is 7*9=63, it will overflow, so result of the bit shift will be 1.

BTW I suspected that bitshares/bitshares-core#1256 was caused by something like this, but I guess it won't be the case due to limit of maximum message size.

Copy link
Author

@pmconrad pmconrad Aug 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://en.cppreference.com/w/cpp/language/operator_arithmetic

For unsigned and positive a, the value of a << b is the value of a * 2^b, reduced modulo maximum value of the return type plus 1 (that is, bitwise left shift is performed and the bits that get shifted out of the destination type are discarded).

So for uint64_t, (0x7f << 63) == (1 << 63).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, that means this change still doesn't enforce a canonical representation. Not sure if that was the intention.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So (0x7e << 63) == 0 which doesn't seem right to me (talking about our code but not the << operator itself).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still, how about throwing an exception when the input is not valid, e.g. too large?

by += 7;
} while( uint8_t(b) & 0x80 );
vi.value = static_cast<uint32_t>(v);
} while( (uint8_t(b) & 0x80) && by < 64 );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note: according to the discussion in steemit/steem#2780, at a glance this change will break consensus, but our serialization code won't produce data to trigger it, so it's safe to sanitize inputs like this.

@jmjatlanta
Copy link

I don't want to throw a wrench in the machine, but was just thinking "out of the box" a bit.

This implementation is very similar to the varint implementation that Google Protobuf uses. Perhaps looking at their implementation could give some insight as to how to code it. I would guess their implementation has even been tested for when Tuesday falls on a weekend.

@pmconrad
Copy link
Author

pmconrad commented Aug 19, 2018

Rebased on master for testing with bitshares/bitshares-core#1267 branch

@@ -172,9 +172,11 @@ namespace fc {
uint64_t v = 0; char b = 0; uint8_t by = 0;
do {
s.get(b);
if( by >= 64 || (by == 63 && b > 1) )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

b > 1: should convert b to unsigned.

Copy link
Member

@abitmore abitmore Aug 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this check can be changed to if( by == 63 && b != 1 ), since

  • we can reject the data if b==0 (it doesn't make sense to add one more byte which is zero, although it's not an overflow);
  • we will either end the loop or throw an exception when b is 63, so by >= 64 will always be false thus can be removed.

Copy link
Author

@pmconrad pmconrad Aug 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If by == 63 && b < 0 then uint8_t(b) & 0x80 != 0, so the while loop will continue, and in the next round by >= 64 will trigger the condition and throw.

by >= 64 cannot be left out, because I removed the check from the while loop (want to throw, not just exit the loop).

We could disallow the case by == 63 && b == 0, but it will be handled by the enclosing logic (deserialize, then re-serialize and check the signature) anyway. Even if we disallow trailing 0 or 0x80 bytes completely, we do not have a completely canonical representation, because we do not know the original type that is being serialized as an unsigned_int here (could be uint8_t, uint16_t, uint32_t, uint64_t).

I think the original intent of the EOS fix was only to avoid the undefined behaviour associated with by growing bigger than the bit size of value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, thanks.
I still think it's better to check whether v < 0 is true here to make the code/logic clearer.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added cast to uint8_t, it's cheaper than separately checking for <0.

@pmconrad
Copy link
Author

This implementation is very similar to the varint implementation that Google Protobuf uses.

In principle this is not a bad suggestion. I'm all for not re-inventing wheels.

For blockchain code though, it is important to be internally consistent. It is always dangerous to rely on external libraries, because an internal change in that library could lead to breaking consensus.

Copy link
Member

@abitmore abitmore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed code looks fine. Not sure if there is something missing.

@pmconrad
Copy link
Author

@jmjatlanta please re-review.

@pmconrad pmconrad merged commit 2405081 into master Aug 22, 2018
@pmconrad pmconrad deleted the 1088_unsigned_int branch August 22, 2018 15:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants