Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vendor+sql: update apd, change str representation of DECIMAL #17029

Merged
merged 1 commit into from Jul 17, 2017
Merged

vendor+sql: update apd, change str representation of DECIMAL #17029

merged 1 commit into from Jul 17, 2017

Conversation

maddyblue
Copy link
Contributor

apd has had various improvements and API changes:

  • ToStandard is now Text('f')
  • ToIntegral renamed
  • Modf can now take nil, easing the implementation of trunc

Additionally, change our default decimal -> string implementation to
use the scientific notation (use exponents when needed) instead of the
so-called standard (a name I made up) representation. The 'standard'
representation was used to match Postgres. It simply never prints
exponents. This causes two problems: large exponents are converted
into lots of zeros, and it is no longer possible to extract the
actual precision from the number if there are zeros appended to the
right, because it's not clear if those were in the original value or
were appended during the string conversion. The change to scientific
notation fixes both of these problems, allowing for smaller numbers
to be printed and allowing users to always know exactly how many
zeros were in their values.

There is a possibility that this change causes problems to users of
decimals if their string -> decimal parser doesn't support exponents. I
think we should allow users to report that problem and attempt to
deal with it if it comes up. Mostly I think it'll be just fine,
because pretty much everything that deals with decimals should know
about exponents.

This change allows for the exception in copy_in_test.go to be removed,
since trailing zeros are handled correctly now.

This also allows us to easily in the future allow session settings to
control decimal formatting, since all decimal formatting operations
now go through the Go fmt.Formatter interface, and thus accept the
standard e, f, and g verbs.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@knz knz changed the title vendor: update apd vendor+sql: update apd, change str representation of DECIMAL Jul 14, 2017
@knz knz requested a review from jordanlewis July 14, 2017 20:45
@knz
Copy link
Contributor

knz commented Jul 14, 2017

@jordanlewis plz confirm this is unlikely to cause orm breakage

@knz
Copy link
Contributor

knz commented Jul 14, 2017

code LGTM

@jordanlewis
Copy link
Member

Hmm, this patch makes me nervous, even though I agree with all of your rationale for making this the new default. This is exactly the kind of change that strikes me as very likely to break clients.

Even though the new output format is valid as an input format for CockroachDB and Postgres, I think that there are almost certainly clients that expect the output format to be the same as Postgres's, and this will upset them.

What about the pgwire binary format? Did that change, or is that the same as before?

At the bare minimum we need to add tests for this to the various client tests (java_test, c_test etc).

@RaduBerinde
Copy link
Member

I'm particularly worried about cases where people use decimals with things that are always just (potentially large) integers. Whatever is consuming those results might not like 1E+1 instead of 10.


Review status: 0 of 13 files reviewed at latest revision, all discussions resolved, all commit checks successful.


Comments from Reviewable

@maddyblue
Copy link
Contributor Author

I've added a test for 1e1 to java_test.go, and it passes. This did not change the pgwire binary representation, just text.

I am sympathetic to these possible problems, but I think we should move forward just like we have with the other differences we have to postgres. If it causes too many client issues we can change it back or add a session setting.


Review status: 0 of 14 files reviewed at latest revision, all discussions resolved.


Comments from Reviewable

@knz
Copy link
Contributor

knz commented Jul 15, 2017

haha, you got me there. You're right. I checked with both MySQL and Postgres and both allow the scientific notation for decimal/numeric. Let's put this through, and see if users complain.

LGTM

@jordanlewis
Copy link
Member

Wait, I am still uncomfortable. I think it is a mistake to land this in 1.1 without more significant testing on our side. Thanks for adding the Java test. We need to check the C libpq, js Ruby and python too. Sorry for the high burden of proof but I strongly believe the onus falls on us to validate this, not our users.

@maddyblue
Copy link
Contributor Author

How would you suggest I do the C version? Is there a standard bigdecimal library to use? There's no example code in c_test.go to use as a template like there was for java.

apd has had various improvements and API changes:

- ToStandard is now Text('f')
- ToIntegral renamed
- Modf can now take nil, easing the implementation of trunc

Additionally, change our default decimal -> string implementation to
use the scientific notation (use exponents when needed) instead of the
so-called standard (a name I made up) representation. The 'standard'
representation was used to match Postgres. It simply never prints
exponents. This causes two problems: large exponents are converted
into lots of zeros, and it is no longer possible to extract the
actual precision from the number if there are zeros appended to the
right, because it's not clear if those were in the original value or
were appended during the string conversion. The change to scientific
notation fixes both of these problems, allowing for smaller numbers
to be printed and allowing users to always know exactly how many
zeros were in their values.

There is a possibility that this change causes problems to users of
decimals if their string -> decimal parser doesn't support exponents. I
think we should allow users to report that problem and attempt to
deal with it if it comes up. Mostly I think it'll be just fine,
because pretty much everything that deals with decimals should know
about exponents.

This change allows for the exception in copy_in_test.go to be removed,
since trailing zeros are handled correctly now.

This also allows us to easily in the future allow session settings to
control decimal formatting, since all decimal formatting operations
now go through the Go fmt.Formatter interface, and thus accept the
standard e, f, and g verbs.
@maddyblue
Copy link
Contributor Author

I've added a nice python test. I also added a less nice ruby test. The ruby test is totally boring because, by default, the pg package just returns everything as a string. You have to manually instruct it to use a type decoder. If we do that, then it barfs on the OID of numeric (so, nothing to do with this change). Trivial to get it working by assigning the decoder to be something that can parse those strings, so I don't think it's a useful test. A useful test would be to hook up a full rails thing, which might have these decoders defined, and see what that does. (If someone wants that, I'll need help doing it since it's out of my familiarity.)

Adding a C test is similar to the ruby one (i.e., not useful): you just have to choose a decimal parsing library that knows how to parse exponents, there's no built in thing (see https://stackoverflow.com/questions/32651069/c-libpq-get-float-value-from-numeric).

@knz
Copy link
Contributor

knz commented Jul 17, 2017 via email

@jordanlewis
Copy link
Member

Thanks for adding the tests - I'm starting to feel better about this change.

I have a few more questions. I think that it might be user unfriendly to always print decimals in scientific notation even in the "normal" case of a number with no trailing zeros. Thoughts on that? Maybe we could output an ordinary value in the "normal" case? As a user, I think I would be confused and maybe even annoyed to see SELECT 100::decimal get outputted as 1E+2.

Also, can you elaborate on the problem with ambiguity around trailing zeroes? I don't understand where string conversion would introduce that ambiguity.

@maddyblue
Copy link
Contributor Author

The exponents are not always printed, only when needed. See https://github.com/cockroachdb/apd/blob/master/testdata/base.decTest for the test cases. Also, 100 would not be printed as 1e2 because that doesn't show the full precision. All the string printing algorithm can do is choose to move the decimal within existing digits and adjust the exponent appropriately. But it only does this in some cases (https://github.com/cockroachdb/apd/blob/f03b0011b0efc3437be20fa3979ae89082de8861/format.go#L67). So it already does what you request, and outputs an ordinary value in the normal case.

To illustrate the problem, take the numbers you had above: 100, 1e2. Both are equal to 100, but the first one is more precise. The second is less precise, and using the compare_total algorithm (https://godoc.org/github.com/cockroachdb/apd#Decimal.CmpTotal), which is defined in the decimal spec, is not equal to 100, because it doesn't have exactly the same precision.

It's like saying: "I'm 1 mile away" and "I'm 1.00000" miles away. If users provide that precision, it should be kept and correctly displayed.

@maddyblue
Copy link
Contributor Author

GMP supports exponents (https://gmplib.org/manual/Assigning-Floats.html#Assigning-Floats) in their mpf_set_str function.

@knz
Copy link
Contributor

knz commented Jul 17, 2017

thanks.

@jordanlewis
Copy link
Member

Got it - thanks for the tutorial. LGTM in that case.

I wonder if it's worth introducing a session setting for this kind of thing at some point down the line, to permit customizable output.

@maddyblue
Copy link
Contributor Author

Yes, I so want decimal session settings. To control: formatting, rounding, context precision during arithmetic operations. Seems like that + distsql would make cockroach a really nice platform to do distributed scientific computing or something.

@maddyblue maddyblue merged commit 1e87710 into cockroachdb:master Jul 17, 2017
@maddyblue maddyblue deleted the update-apd branch July 17, 2017 17:55
@jseldess
Copy link
Contributor

@mjibson, can you help me understand any updates needed to the user-facing DECIMAL type docs?

@maddyblue
Copy link
Contributor Author

@jess-edwards No docs need to be updated, since they don't list the text output format.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants