Skip to content

Comments

docs: clarify Java precision#13671

Merged
vtlim merged 9 commits intoapache:masterfrom
317brian:math-precision-doc
Mar 15, 2023
Merged

docs: clarify Java precision#13671
vtlim merged 9 commits intoapache:masterfrom
317brian:math-precision-doc

Conversation

@317brian
Copy link
Contributor

Description

Clarifies how Druid handles Java precision.



This PR has:

  • been self-reviewed.

Copy link
Contributor

@johnImply johnImply left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

@benkrug
Copy link
Contributor

benkrug commented Jan 18, 2023

As far as max/min bounds, this seems accurate?
https://cs.fit.edu/~ryan/java/language/java-data.html (see the "Numeric" section.)
So, for long, it's 2^63 (since one bit is for sign).
Doubles are floating point, and precision varies depending on the size of the number.
We follow java, which follows standard IEEE 754 . Precisions are discussed and graphed in the standard here.

I don't pretend to understand that fully, but I think it would be good to list min and max for long, and maybe reference the standards doc for double.

Copy link
Contributor

@paul-rogers paul-rogers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for trying to clear up the precision question! I think we need just a bit more info. See comments.


Operators in [Druid SQL](./sql.md) typically operate on one or two values and return a result based on the values. Types of operators in Druid SQL include arithmetic, comparison, logical, and more, as described here.

When performing math operations, Druid uses the integer datatype unless there are double or float values. If double or float values are involved, Druid uses double. Note that the highest precision way to store digits in Druid are 64-bit integers (long) or 64-bit floats (double). In essence, a double can represent 52 binary digits, so Druid may return incorrect results for doubles if the value exceeds 2^52.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reword: Druid uses the 64-bit integer (long) data type...

If double or float values are involved, Druid uses double.

Not sure what this means. Is it:

  • If an operation uses float or double values, then the result is double.
  • If an operation uses only float types, the result type is float. If an operation uses only double values, or both double and float values, the result is double.

Nit: "datatype" is not really a word. Consider "data type".

Probably worth saying that the precision of float and double are defined by Java and by the IEEE standard. Perhaps we can link to those reference materials.


When performing math operations, Druid uses the integer datatype unless there are double or float values. If double or float values are involved, Druid uses double. Note that the highest precision way to store digits in Druid are 64-bit integers (long) or 64-bit floats (double). In essence, a double can represent 52 binary digits, so Druid may return incorrect results for doubles if the value exceeds 2^52.

For more information about how Java handles primitive data types and how it may impact the results you get, see [Primitive data types in Java are a matter of precision](https://blogs.oracle.com/javamagazine/post/java-primitive-datatypes-int-float-double).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This really punts! The user has no control over the expressions we use. Druid is interpreted: we decide how to do casting to get arguments to the right type, and we decide on the final result value of our functions. Referring to what Java does is not useful except to someone who reads the code, sees where we use bits of Java, then works that backward through our type inference system.

We should spell out our rules explicitly, which are one of the two bullets above. @clintropolis can probably provide details.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commit is a bit off still. According to current docs, druid has long, float and double. "float" is a 32-bit float, and "double" is a 64-bit float.

I suggested some references because the math of float precision is beyond the scope of druid docs, imo. The main points are that longs can store up to 2^63 accurately (the current commit says doubles, that should be longs, iiuc), and floats and doubles use 32-bit and 64-bit floating point. Any floating point storage format will have variable precision depending on the size of the numbers. (See the linked URLs in my earlier comment.) Floating point precision is really complicated and mathematical, beyond the scope of druid docs imo (again), and it's a general condition in software. Just saying that "float" and "double" both use floating point is the main point.

We could get in to the bounds for integers being represented exactly in floats, but that ignores decimals, which is probably the point of using a float. Maybe we can also add a comment along the lines that if exact decimal values are needed, and you need, eg, 3 decimal places, you can store the number multiplied by 1000 as long, and divide again when querying. This will be exact, up to the min and max values for longs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(O.T. but if we had your bigdecimal @paul-rogers, I bet many people would use that instead. Back in my Oracle days, most people I knew, and I, used their DECIMAL datatype, to avoid floating point precision issues.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imo the latest revisions are much improved. Maybe it would further clarify if we mentioned that doubles are 64-bit floats (which is explained elsewhere in the docs too). Eg, where it says "then the result is a double", maybe it could say "then the result is a double (ie, 64-bit float)"? Idk if "ie" is good for docs style, but something like that?

Copy link
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm other than link


Operators in [Druid SQL](./sql.md) typically operate on one or two values and return a result based on the values. Types of operators in Druid SQL include arithmetic, comparison, logical, and more, as described here.

When performing math operations, Druid uses 64-bit integer (long) data type unless there are double or float values. If an operation uses float or double values, then the result is a double, which is a 64-bit float. The precision of float and double values are defined by [Java](https://docs.oracle.com/javase/specs/jls/se16/html/jls-5.html#jls-5.1) and [the IEEE standard](https://en.wikipedia.org/wiki/IEEE_754).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest linking to java 8, 11, or 17 instead of 16

Copy link
Contributor

@paul-rogers paul-rogers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the revisions. LGTM.

Copy link
Contributor

@ektravel ektravel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a couple of suggestions.

@317brian 317brian requested a review from clintropolis March 2, 2023 19:15
Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
Copy link
Member

@vtlim vtlim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nit otherwise LGTM 🦖

@vtlim vtlim merged commit 65a663a into apache:master Mar 15, 2023
@clintropolis clintropolis added this to the 26.0 milestone Apr 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants