
tally calculation has precision issue #7640

Closed
1 of 4 tasks
yihuang opened this issue Oct 23, 2020 · 12 comments · Fixed by #7641
@yihuang
Collaborator

yihuang commented Oct 23, 2020

Summary of Bug

In our test case, validator V has voting power 1000000000, and an account A delegates 10 to V and votes independently.
V votes yes and A votes no, so the current code calculates the tally of no as:

(10 / 1000000010) * 1000000010

which, with the current decimal settings, results in 9.999999999999999000, i.e. 9 when converted to tokens.
I suggest we calculate it as:

(10 * 1000000010) / 1000000010

which should give a better result under limited precision.
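The difference is easy to reproduce outside the SDK. Below is a rough sketch (not the SDK's actual code) that mimics an sdk.Dec-style 18-decimal fixed point with truncating big.Int division; the helper names `dec`, `quo`, and `mul` are made up for illustration:

```go
package main

import (
	"fmt"
	"math/big"
)

// scale is the 10^18 fixed-point scale used by sdk.Dec-style decimals.
var scale = new(big.Int).Exp(big.NewInt(10), big.NewInt(18), nil)

// dec converts an integer token amount into its raw fixed-point form.
func dec(n int64) *big.Int {
	return new(big.Int).Mul(big.NewInt(n), scale)
}

// quo divides two fixed-point values, truncating digits below 10^-18.
func quo(a, b *big.Int) *big.Int {
	n := new(big.Int).Mul(a, scale)
	return n.Quo(n, b)
}

// mul multiplies two fixed-point values, truncating digits below 10^-18.
func mul(a, b *big.Int) *big.Int {
	n := new(big.Int).Mul(a, b)
	return n.Quo(n, scale)
}

func main() {
	votes, power := dec(10), dec(1000000010)

	divFirst := mul(quo(votes, power), power) // (10 / 1000000010) * 1000000010
	mulFirst := quo(mul(votes, power), power) // (10 * 1000000010) / 1000000010

	fmt.Println(divFirst) // raw 9999999999999999000, i.e. 9.999999999999999000
	fmt.Println(mulFirst) // raw 10000000000000000000, i.e. exactly 10
}
```

Dividing first truncates the tiny ratio 10/1000000010 to 18 decimal places, and the multiplication then amplifies that truncation error to almost a whole token; multiplying first keeps the intermediate value an integer, so the final division is exact.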

Version

Steps to Reproduce


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@alexanderbez
Contributor

/cc @sunnya97

@sunnya97
Member

Sounds reasonable

yihuang added a commit to yihuang/cosmos-sdk that referenced this issue Oct 31, 2020
Solution:
- change `(a / b) * c` to `(a * c) / b`
@alessio
Contributor

alessio commented Oct 31, 2020

I wonder whether this is reproducible with Launchpad too? CC'ing @ethanfrey @njmurarka

@ethanfrey
Contributor

In general (A * C) / B gives better precision than (A / B) * C, as it does not round off least-significant digits.

However, there is more potential for overflow with the first case. If these are u64 we would have to be very careful about this, and likely use a GCD algorithm. I believe in this case we are using sdk.Dec (which wraps big.Int) so no overflow is possible.

I feel like I saw a very similar issue recently. In any case, .MulInt(x).Quo(y) looks better than .Quo(y).MulInt(x), and I see no problem with backporting this to Launchpad.
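The overflow caveat above is real for raw uint64 values: the product A * C can exceed 64 bits even when the final quotient fits. A hypothetical `mulDiv` helper (illustrative, not an SDK function) can sidestep this with Go's math/bits 128-bit primitives, reporting failure only when the quotient itself cannot fit:

```go
package main

import (
	"fmt"
	"math/bits"
)

// mulDiv computes (a * c) / b for uint64 inputs using a 128-bit
// intermediate product, returning ok=false when the quotient would
// overflow uint64 (which is also the case bits.Div64 would panic on).
func mulDiv(a, c, b uint64) (q uint64, ok bool) {
	hi, lo := bits.Mul64(a, c) // full 128-bit product, never overflows
	if hi >= b {
		return 0, false // quotient does not fit in 64 bits
	}
	q, _ = bits.Div64(hi, lo, b)
	return q, true
}

func main() {
	// Small case: 10 * 1000000010 / 1000000010 == 10 exactly.
	fmt.Println(mulDiv(10, 1000000010, 1000000010))

	// 1e10 * 1e10 = 1e20 overflows uint64 (max ~1.8e19), but the
	// 128-bit intermediate keeps the result exact: 1e20 / 16 = 6.25e18.
	fmt.Println(mulDiv(10000000000, 10000000000, 16))

	// Here the quotient itself (2.5e19) cannot fit, so ok is false.
	fmt.Println(mulDiv(10000000000, 10000000000, 4))
}
```

Since sdk.Dec wraps big.Int, none of this guarding is needed there, which matches the comment above.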

@aaronc
Member

aaronc commented Nov 1, 2020

So this is actually a bigger problem than it appears to be.

I was thinking about your reply @ethanfrey, and if order of operations matters this much, we should delay rounding as long as possible, which something like big.Rat would get us. Needing to think that carefully about order of operations is hard to get right; it should be easy to just get the answer 10. Then I discovered that the SDK used to use big.Rat, but it was replaced with a custom "big decimal" type sdk.Dec (#1807).

So I figured I'd do some tests. As expected, big.Rat gets the correct answer independent of order of operations. What about other "big decimal" libraries? I tried https://github.com/cockroachdb/apd, which I'd previously suggested in #7113. It properly implements General Decimal Arithmetic (IEEE 754-2008). Sure enough, apd produces the correct result 10 using decimal128, decimal64, and even decimal32 settings! How is it possible that decimal32 is more accurate than sdk.Dec, which supposedly has 18 decimal places?

Even worse, float32 gets the correct result with either order of operations. Run this example and see: https://play.golang.org/p/f9cF7ItDtxe. So sdk.Dec is basically less precise than 32-bit floating point...

Looks like we have a bigger problem on our hands.

I would say that if we are building "the financial infrastructure of the future", getting arithmetic correct is non-negotiable.

I will open up a follow-up issue.


Note @odeke-em this sort of stuff is likely good material for fuzzing.

@ethanfrey
Contributor

ethanfrey commented Nov 1, 2020

I think decimal32 would have precision rounding errors with other numbers, like (123 / 1234567) * 1234567, or even better (1 / 12345678901234567890) * 12345678901234567890. It would be good to check these with many input values.

I think there are two approaches, and both may be useful.

  1. use a general purpose (but possibly slow) library to always get the correct result regardless of ordering and with no underflow or overflow issues - big.Rat is possibly a good approach here
  2. use a limited precision library that gets correct and fast results over a reasonable range of values and returns errors on any under/overflow - this looks like the cockroachdb decimalXXX cases

The argument for moving from big.Rat to sdk.Dec was speed, but it came at a cost in correctness. And big.Rat should be significantly slower than a decimal64/decimal128 type (arbitrary-precision ops are always much slower than fixed-precision ones). I would support reverting sdk.Dec to use big.Rat under the hood (without changing APIs) and testing the cockroach version. If its speed is good, it could be recommended for any operations that really need speed (A * B / C is very unlikely to be a performance bottleneck in any code path that even reads one item from disk anyway).

@alessio
Contributor

alessio commented Nov 1, 2020

I would support reverting sdk.Dec to use big.Rat under the hood (without changing APIs)

So would I. I also think we should keep the current API around at least until 1.0.0 is out. That should give us plenty of time to possibly even overhaul all numerical computing types and work out better interfaces and APIs.

Happy to do some work on this 👍

@aaronc
Member

aaronc commented Nov 2, 2020

Did some actual benchmarks here: #7772

Obviously this is just one case to consider of many... but some surprising results:

  • sdk.Dec can never get (x/y)*y quite right, with an error similar to a float64 or decimal64, and always gets (x*y)/y right 🤷
  • big.Rat always gets both correct and is actually marginally faster than sdk.Dec
  • the https://github.com/cockroachdb/apd data types are generally a bit slower, although reasonably accurate
  • surprisingly big.Float configured as a float128 always gets this right and is actually quite fast... there must be some hardware optimization happening there 🤔

@aaronc
Member

aaronc commented Nov 2, 2020

I would support reverting sdk.Dec to use big.Rat under the hood (without changing APIs)

So would I. I also think we should keep the current API around at least until 1.0.0 is out. That should give us plenty of time to possibly even overhaul all numerical computing types and work out better interfaces and APIs.

Ideally we'd have proper numerical computing and an interface we want to stick with for 1.0.

Happy to do some work on this 👍

Awesome 👍

@aaronc
Member

aaronc commented Nov 2, 2020

Created a new issue for addressing all this: #7773

@mergify mergify bot closed this as completed in #7641 Nov 2, 2020
mergify bot added a commit that referenced this issue Nov 2, 2020
Solution:
- change `(a / b) * c` to `(a * c) / b`

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
@njmurarka
Contributor

@alessio If I recall, weren't we having some issues related to voting power back in the early part of this year? I think they had to do with the fact that if you had a small number of tokens associated with the stake, your power lost so much accuracy that it no longer made a lot of sense. The general solution was to use small base units so that all magnitudes were larger.

@alessio Is there something I can check to help? I am not clear on what that is.

@odeke-em
Collaborator

odeke-em commented Nov 3, 2020

Regardless of precision, independence from the order of operations is an important fundamental property. I am going to evaluate the issues as well as usages and potential pitfalls. In the future, if performance is a problem but correctness isn't, we can wiggle into high performance, but we shouldn't trade off on correctness at all.
