docs: clarify Java precision by 317brian · Pull Request #13671 · apache/druid

317brian · 2023-01-14T01:36:08Z

Description

Clarifies how Druid handles Java precision.

This PR has:

- [x] been self-reviewed.

johnImply

.

benkrug · 2023-01-18T00:19:26Z

As far as max/min bounds, this seems accurate?
https://cs.fit.edu/~ryan/java/language/java-data.html (see the "Numeric" section.)
So, for long, the range is -2^63 to 2^63 - 1 (since one bit is used for the sign).
Doubles are floating point, and precision varies depending on the size of the number.
We follow Java, which follows the IEEE 754 standard. Precisions are discussed and graphed in the standard here.

I don't pretend to understand that fully, but I think it would be good to list min and max for long, and maybe reference the standards doc for double.
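
For reference, here is a minimal plain-Java sketch of the long bounds and of the point where a double stops representing integers exactly. It only demonstrates Java primitive semantics, which the comment above says Druid follows. The class and method names are hypothetical and are not Druid code.

```java
// Hypothetical demo class, not Druid code: plain Java long and double limits.
public class LongBoundsDemo {
    // A 64-bit signed long uses one bit for the sign, so its range is -2^63 .. 2^63 - 1.
    static final long MAX = Long.MAX_VALUE; //  9223372036854775807
    static final long MIN = Long.MIN_VALUE; // -9223372036854775808

    // A double has a 53-bit significand (52 stored bits plus an implicit leading 1),
    // so every integer with magnitude up to 2^53 is exactly representable.
    static final long TWO_POW_53 = 1L << 53;

    /** Returns true if the long survives a round trip through double unchanged. */
    static boolean exactAsDouble(long v) {
        return (long) (double) v == v;
    }

    public static void main(String[] args) {
        System.out.println("long max = " + MAX);
        System.out.println("long min = " + MIN);
        // Plain Java long arithmetic wraps around silently on overflow.
        System.out.println("max + 1  = " + (MAX + 1)); // -9223372036854775808
        System.out.println(exactAsDouble(TWO_POW_53));     // true
        System.out.println(exactAsDouble(TWO_POW_53 + 1)); // false: rounds to 2^53
    }
}
```

Note that the exact-integer limit for double is 2^53, not 2^52, because of the implicit leading bit.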

paul-rogers

Thanks for trying to clear up the precision question! I think we need just a bit more info. See comments.

paul-rogers · 2023-01-18T18:23:44Z

docs/querying/sql-operators.md


 Operators in [Druid SQL](./sql.md) typically operate on one or two values and return a result based on the values. Types of operators in Druid SQL include arithmetic, comparison, logical, and more, as described here. 

+When performing math operations, Druid uses the integer datatype unless there are double or float values. If double or float values are involved, Druid uses double. Note that the highest precision way to store digits in Druid are 64-bit integers (long) or 64-bit floats (double). In essence, a double can represent 52 binary digits, so Druid may return incorrect results for doubles if the value exceeds 2^52.


Reword: Druid uses the 64-bit integer (long) data type...

If double or float values are involved, Druid uses double.

Not sure what this means. Is it:

If an operation uses float or double values, then the result is double.

If an operation uses only float types, the result type is float. If an operation uses only double values, or both double and float values, the result is double.
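
For comparison, plain Java follows the second reading: its binary numeric promotion keeps float op float as float and only widens to double when a double is involved. The sketch below (hypothetical class name, not Druid code) shows this. If Druid's documented rule is the first reading (the result is double whenever a float or double is involved, as in the revised text quoted later in this thread), then float + float is one case where Druid SQL and plain Java differ, which is another reason to spell the rule out rather than defer to Java.

```java
// Hypothetical demo class, not Druid code: Java's own numeric promotion rules.
public class PromotionDemo {
    /** Reports the runtime type of a boxed arithmetic result. */
    static String typeOf(Object boxed) {
        return boxed.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        long l = 3L;
        float f = 1.5f;
        double d = 2.5;

        // Java promotion: any double operand -> double; else any float -> float; else any long -> long.
        System.out.println("long + long    -> " + typeOf(l + l)); // Long
        System.out.println("float + float  -> " + typeOf(f + f)); // Float
        System.out.println("float + double -> " + typeOf(f + d)); // Double
        System.out.println("long + float   -> " + typeOf(l + f)); // Float
        System.out.println("long + double  -> " + typeOf(l + d)); // Double
    }
}
```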

Nit: "datatype" is not really a word. Consider "data type".

Probably worth saying that the precision of float and double is defined by Java and by the IEEE standard. Perhaps we can link to those reference materials.

paul-rogers · 2023-01-18T18:25:39Z

docs/querying/sql-operators.md


+When performing math operations, Druid uses the integer datatype unless there are double or float values. If double or float values are involved, Druid uses double. Note that the highest precision way to store digits in Druid are 64-bit integers (long) or 64-bit floats (double). In essence, a double can represent 52 binary digits, so Druid may return incorrect results for doubles if the value exceeds 2^52.
+
+For more information about how Java handles primitive data types and how it may impact the results you get, see [Primitive data types in Java are a matter of precision](https://blogs.oracle.com/javamagazine/post/java-primitive-datatypes-int-float-double).


This really punts! The user has no control over the expressions we use. Druid is interpreted: we decide how to do casting to get arguments to the right type, and we decide on the final result value of our functions. Referring to what Java does is not useful except to someone who reads the code, sees where we use bits of Java, then works that backward through our type inference system.

We should spell out our rules explicitly, which are one of the two bullets above. @clintropolis can probably provide details.

The commit is a bit off still. According to current docs, Druid has long, float, and double. "float" is a 32-bit float, and "double" is a 64-bit float.

I suggested some references because the math of float precision is beyond the scope of Druid docs, imo. The main points are that longs can store integers up to 2^63 - 1 exactly (the current commit says doubles, that should be longs, iiuc), and floats and doubles use 32-bit and 64-bit floating point. Any floating point storage format will have variable precision depending on the size of the numbers. (See the linked URLs in my earlier comment.) Floating point precision is really complicated and mathematical, beyond the scope of Druid docs imo (again), and it's a general condition in software. Just saying that "float" and "double" both use floating point is the main point.
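
To make "variable precision depending on the size of the numbers" concrete, here is a small plain-Java sketch comparing 32-bit float and 64-bit double. It uses only `java.lang.Math.ulp`, which returns the gap to the next representable value; the class and method names are hypothetical and this is not Druid code.

```java
// Hypothetical demo class, not Druid code: plain Java float vs double precision.
public class FloatVsDoubleDemo {
    /** True if the int survives a round trip through a 32-bit float unchanged. */
    static boolean exactAsFloat(int v) {
        return (int) (float) v == v;
    }

    public static void main(String[] args) {
        // float has a 24-bit significand, so integers are exact only up to 2^24 = 16,777,216.
        System.out.println(exactAsFloat(16_777_216)); // true
        System.out.println(exactAsFloat(16_777_217)); // false: rounds to 16,777,216

        // The gap between adjacent representable values grows with magnitude.
        System.out.println(Math.ulp(1.0f));   // float spacing near 1
        System.out.println(Math.ulp(1.0e6f)); // 0.0625: float spacing near one million
        System.out.println(Math.ulp(1.0));    // double spacing near 1
        System.out.println(Math.ulp(1.0e6));  // double spacing near one million
    }
}
```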

We could get into the bounds for integers being represented exactly in floats, but that ignores decimals, which is probably the point of using a float. Maybe we can also add a comment along the lines that if exact decimal values are needed, and you need, e.g., 3 decimal places, you can store the number multiplied by 1000 as long, and divide again when querying. This will be exact, up to the min and max values for longs.
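
A minimal plain-Java sketch of the scaled-long idea, assuming 3 decimal places (scale factor 1000). It only shows the arithmetic; how values get scaled at ingestion and divided back at query time is up to the pipeline. The class and method names are hypothetical; `BigDecimal` is used here only to parse and print the decimals exactly.

```java
import java.math.BigDecimal;

// Hypothetical demo class, not Druid code: fixed-point decimals stored as scaled longs.
public class ScaledLongDemo {
    // Assumption for illustration: 3 decimal places, i.e. values stored multiplied by 1000.
    static final int DECIMALS = 3;

    /** Parses a decimal string into value * 1000; throws if it has more than 3 decimals or overflows long. */
    static long toScaled(String decimal) {
        return new BigDecimal(decimal).movePointRight(DECIMALS).longValueExact();
    }

    /** Divides a scaled long back by 1000, exactly, for display. */
    static BigDecimal fromScaled(long scaled) {
        return BigDecimal.valueOf(scaled, DECIMALS);
    }

    public static void main(String[] args) {
        // Plain doubles: 0.1 and 0.2 have no exact binary representation.
        double d = 0.1 + 0.2;
        System.out.println(d);        // 0.30000000000000004
        System.out.println(d == 0.3); // false

        // Scaled longs: 100 + 200 is exact integer arithmetic.
        long s = toScaled("0.1") + toScaled("0.2");
        System.out.println(s);             // 300
        System.out.println(fromScaled(s)); // 0.300
    }
}
```

As the comment says, this stays exact only while the scaled values fit within the long range.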

(O.T. but if we had your bigdecimal @paul-rogers, I bet many people would use that instead. Back in my Oracle days, most people I knew, and I, used their DECIMAL datatype, to avoid floating point precision issues.)

imo the latest revisions are much improved. Maybe it would further clarify if we mentioned that doubles are 64-bit floats (which is explained elsewhere in the docs too). Eg, where it says "then the result is a double", maybe it could say "then the result is a double (ie, 64-bit float)"? Idk if "ie" is good for docs style, but something like that?

clintropolis

lgtm other than link

clintropolis · 2023-01-20T23:03:46Z

docs/querying/sql-operators.md


 Operators in [Druid SQL](./sql.md) typically operate on one or two values and return a result based on the values. Types of operators in Druid SQL include arithmetic, comparison, logical, and more, as described here. 

+When performing math operations, Druid uses 64-bit integer (long) data type unless there are double or float values. If an operation uses float or double values, then the result is a double, which is a 64-bit float. The precision of float and double values are defined by [Java](https://docs.oracle.com/javase/specs/jls/se16/html/jls-5.html#jls-5.1) and [the IEEE standard](https://en.wikipedia.org/wiki/IEEE_754).


suggest linking to java 8, 11, or 17 instead of 16

paul-rogers

Thanks for the revisions. LGTM.

docs/querying/sql-operators.md

ektravel

Added a couple of suggestions.

Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>

docs/querying/sql-operators.md

vtlim

One nit otherwise LGTM 🦖

317brian added 3 commits January 10, 2023 12:36

wip

3138f77

docs: add mention of java precision

70f7f1c

docs: add mention of java precision

87be0b1

johnImply reviewed Jan 14, 2023


kfaraz added the Area - Documentation label Jan 17, 2023

fix

d8aff43

paul-rogers reviewed Jan 18, 2023


317brian added 2 commits January 20, 2023 11:24

incorporate comments

510b22f

double = 64 bit float

ed66f84

clintropolis reviewed Jan 20, 2023


change java version for the link

90e7fda

paul-rogers approved these changes Jan 26, 2023


ektravel reviewed Feb 27, 2023

docs/querying/sql-operators.md (outdated)

ektravel reviewed Feb 27, 2023

docs/querying/sql-operators.md (outdated)

ektravel reviewed Feb 27, 2023


317brian requested a review from clintropolis March 2, 2023 19:15

Apply suggestions from code review

701f070

Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>

vtlim reviewed Mar 15, 2023

docs/querying/sql-operators.md (outdated)

vtlim approved these changes Mar 15, 2023


Update docs/querying/sql-operators.md

99a129a

vtlim merged commit 65a663a into apache:master Mar 15, 2023

clintropolis added this to the 26.0 milestone Apr 10, 2023



8 participants
