
tally calculation has precision issue #7640

Closed
1 of 4 tasks
yihuang opened this issue Oct 23, 2020 · 12 comments · Fixed by #7641
@yihuang
Collaborator

yihuang commented Oct 23, 2020

Summary of Bug

In our test case, validator V has voting power 1000000000, and an account A delegates 10 to V and votes independently.
V votes yes and A votes no, so the current code calculates the tally of no as:

(10 / 1000000010) * 1000000010

which, with the current decimal settings, results in 9.999999999999999000, i.e. 9 when converted to tokens.
I suggest we calculate it as:

(10 * 1000000010) / 1000000010

which should give a better result under limited precision.
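The difference is easy to reproduce outside the SDK. Below is a rough sketch (not the SDK's actual code) that mimics an sdk.Dec-style 18-decimal fixed point with truncating big.Int division; the helper names `dec`, `quo`, and `mul` are made up for illustration:

```go
package main

import (
	"fmt"
	"math/big"
)

// scale is the 10^18 fixed-point scale used by sdk.Dec-style decimals.
var scale = new(big.Int).Exp(big.NewInt(10), big.NewInt(18), nil)

// dec converts an integer token amount into its raw fixed-point form.
func dec(n int64) *big.Int {
	return new(big.Int).Mul(big.NewInt(n), scale)
}

// quo divides two fixed-point values, truncating digits below 10^-18.
func quo(a, b *big.Int) *big.Int {
	n := new(big.Int).Mul(a, scale)
	return n.Quo(n, b)
}

// mul multiplies two fixed-point values, truncating digits below 10^-18.
func mul(a, b *big.Int) *big.Int {
	n := new(big.Int).Mul(a, b)
	return n.Quo(n, scale)
}

func main() {
	votes, power := dec(10), dec(1000000010)

	divFirst := mul(quo(votes, power), power) // (10 / 1000000010) * 1000000010
	mulFirst := quo(mul(votes, power), power) // (10 * 1000000010) / 1000000010

	fmt.Println(divFirst) // raw 9999999999999999000, i.e. 9.999999999999999000
	fmt.Println(mulFirst) // raw 10000000000000000000, i.e. exactly 10
}
```

Dividing first truncates the tiny ratio 10/1000000010 to 18 decimal places, and the multiplication then amplifies that truncation error to almost a whole token; multiplying first keeps the intermediate value an integer, so the final division is exact.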

Version

Steps to Reproduce


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@alexanderbez
Contributor

/cc @sunnya97

@sunnya97
Member

Sounds reasonable

yihuang added a commit to yihuang/cosmos-sdk that referenced this issue Oct 31, 2020
Solution:
- change `(a / b) * c` to `(a * c) / b`
@alessio
Contributor

alessio commented Oct 31, 2020

I wonder whether this is reproducible with Launchpad too? CC'ing @ethanfrey @njmurarka

@ethanfrey
Contributor

In general (A * C) / B gives better precision than (A / B) * C, as it does not round off least-significant digits.

However, there is more potential for overflow with the first case. If these are u64 we would have to be very careful about this, and likely use a GCD algorithm. I believe in this case we are using sdk.Dec (which wraps big.Int) so no overflow is possible.

I feel like I saw a very similar issue recently. In any case, .MulInt(x).Quo(y) looks better than .Quo(y).MulInt(x), and I see no problem with backporting this to Launchpad.
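The overflow caveat above is real for raw uint64 values: the product A * C can exceed 64 bits even when the final quotient fits. A hypothetical `mulDiv` helper (illustrative, not an SDK function) can sidestep this with Go's math/bits 128-bit primitives, reporting failure only when the quotient itself cannot fit:

```go
package main

import (
	"fmt"
	"math/bits"
)

// mulDiv computes (a * c) / b for uint64 inputs using a 128-bit
// intermediate product, returning ok=false when the quotient would
// overflow uint64 (which is also the case bits.Div64 would panic on).
func mulDiv(a, c, b uint64) (q uint64, ok bool) {
	hi, lo := bits.Mul64(a, c) // full 128-bit product, never overflows
	if hi >= b {
		return 0, false // quotient does not fit in 64 bits
	}
	q, _ = bits.Div64(hi, lo, b)
	return q, true
}

func main() {
	// Small case: 10 * 1000000010 / 1000000010 == 10 exactly.
	fmt.Println(mulDiv(10, 1000000010, 1000000010))

	// 1e10 * 1e10 = 1e20 overflows uint64 (max ~1.8e19), but the
	// 128-bit intermediate keeps the result exact: 1e20 / 16 = 6.25e18.
	fmt.Println(mulDiv(10000000000, 10000000000, 16))

	// Here the quotient itself (2.5e19) cannot fit, so ok is false.
	fmt.Println(mulDiv(10000000000, 10000000000, 4))
}
```

Since sdk.Dec wraps big.Int, none of this guarding is needed there, which matches the comment above.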

@aaronc
Member

aaronc commented Nov 1, 2020

So this is actually a bigger problem than it appears to be.

I was thinking about your reply @ethanfrey, and if order of operations matters this much, we should delay rounding as long as possible, which something like big.Rat would get us. Needing to think that carefully about order of operations is hard to get right; it should be easy to just get the answer 10. Then I discovered that the SDK used to use big.Rat, but it was replaced with a custom "big decimal" type sdk.Dec (#1807).

So I figured I'd do some tests. As expected, big.Rat gets the correct answer independent of order of operations. What about other "big decimal" libraries? I tried https://github.com/cockroachdb/apd, which I'd previously suggested in #7113. It properly implements General Decimal Arithmetic (IEEE 754-2008). Sure enough, apd produces the correct result 10 using decimal128, decimal64, and even decimal32 settings! How is it possible that decimal32 is more accurate than sdk.Dec, which supposedly has 18 decimal places?

Even worse, float32 gets the correct result with either order of operations. Run this example and see: https://play.golang.org/p/f9cF7ItDtxe. So sdk.Dec is basically less precise than 32-bit floating point...

Looks like we have a bigger problem on our hands.

I would say that if we are building "the financial infrastructure of the future", getting arithmetic correct is non-negotiable.

I will open up a follow-up issue.


Note @odeke-em this sort of stuff is likely good material for fuzzing.

@ethanfrey
Contributor

ethanfrey commented Nov 1, 2020

I think decimal32 would have precision rounding errors with other numbers, like (123 / 1234567) * 1234567, or even better (1 / 12345678901234567890) * 12345678901234567890. It would be good to check these with many input values.

I think there are two approaches, and both may be useful.

  1. use a general purpose (but possibly slow) library to always get the correct result regardless of ordering and with no underflow or overflow issues - big.Rat is possibly a good approach here
  2. use a limited precision library that gets correct and fast results over a reasonable range of values and returns errors on any under/overflow - this looks like the cockroachdb decimalXXX cases

The argument for moving from big.Rat to sdk.Dec was speed, but it came at a cost in correctness. And big.Rat should be significantly slower than a decimal64/decimal128 type (arbitrary-precision ops are always much slower than fixed-precision ones). I would support reverting sdk.Dec to use big.Rat under the hood (without changing APIs) and testing the cockroach version. If its speed is good, it could be recommended for any operations that really need speed (A * B / C is very unlikely to be a performance bottleneck in any code path that even reads one item from disk anyway).

@alessio
Contributor

alessio commented Nov 1, 2020

I would support reverting sdk.Dec to use big.Rat under the hood (without changing APIs)

So would I. I also think we should keep the current API around at least until 1.0.0 is out. That should give us plenty of time to possibly even overhaul all numerical computing types and work out better interfaces and APIs.

Happy to do some work on this 👍

@aaronc
Member

aaronc commented Nov 2, 2020

Did some actual benchmarks here: #7772

Obviously this is just one case to consider of many... but some surprising results:

  • sdk.Dec can never get (x/y)*y quite right, with an error similar to a float64 or decimal64, and always gets (x*y)/y right 🤷
  • big.Rat always gets both correct and is actually marginally faster than sdk.Dec
  • the https://github.com/cockroachdb/apd data types are generally a bit slower, although reasonably accurate
  • surprisingly big.Float configured as a float128 always gets this right and is actually quite fast... there must be some hardware optimization happening there 🤔

@aaronc
Member

aaronc commented Nov 2, 2020

I would support reverting sdk.Dec to use big.Rat under the hood (without changing APIs)

So would I. I also think we should keep the current API around at least until 1.0.0 is out. That should give us plenty of time to possibly even overhaul all numerical computing types and work out better interfaces and APIs.

Ideally we'd have proper numerical computing and an interface we want to stick with for 1.0.

Happy to do some work on this 👍

Awesome 👍

@aaronc
Member

aaronc commented Nov 2, 2020

Created a new issue for addressing all this: #7773

@mergify mergify bot closed this as completed in #7641 Nov 2, 2020
mergify bot added a commit that referenced this issue Nov 2, 2020
Solution:
- change `(a / b) * c` to `(a * c) / b`

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
@njmurarka
Contributor

@alessio If I recall, weren't we having some issues related to voting power back in the early part of this year? I think they had to do with the fact that if you had a small number of tokens associated with the stake, your power lost so much accuracy that it no longer made a lot of sense. The general solution was to use small base units so that all magnitudes were larger.

@alessio Is there something I can check to help? I am not clear on what that is.

@odeke-em
Collaborator

odeke-em commented Nov 3, 2020

Regardless of precision, independence from the order of operations is an important fundamental property. I am going to evaluate the issues as well as usages and potential pitfalls. In the future, if performance is a problem but correctness isn't, we can wiggle into high performance, but we shouldn't trade off on correctness at all.
