As far as max/min bounds, this seems accurate? I don't pretend to understand that fully, but I think it would be good to list min and max for long, and maybe reference the standards doc for double.
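For reference, the long bounds asked about here can be read straight from the Java constants. This is a minimal sketch of plain Java behavior, not a statement about how Druid itself reports these limits; the class and method names are made up for illustration.

```java
public class LongBounds {
    /** Smallest 64-bit signed integer: -2^63. */
    public static long minLong() {
        return Long.MIN_VALUE;
    }

    /** Largest 64-bit signed integer: 2^63 - 1. */
    public static long maxLong() {
        return Long.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(minLong()); // -9223372036854775808
        System.out.println(maxLong()); // 9223372036854775807
        // The largest finite 64-bit IEEE 754 double, for comparison.
        System.out.println(Double.MAX_VALUE); // 1.7976931348623157E308
    }
}
```

The double range is far wider than the long range, but only because doubles trade exactness for magnitude.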
paul-rogers left a comment
Thanks for trying to clear up the precision question! I think we need just a bit more info. See comments.
docs/querying/sql-operators.md
Outdated
Operators in [Druid SQL](./sql.md) typically operate on one or two values and return a result based on the values. Types of operators in Druid SQL include arithmetic, comparison, logical, and more, as described here.
When performing math operations, Druid uses the integer datatype unless there are double or float values. If double or float values are involved, Druid uses double. Note that the highest precision way to store digits in Druid are 64-bit integers (long) or 64-bit floats (double). In essence, a double can represent 52 binary digits, so Druid may return incorrect results for doubles if the value exceeds 2^52.
Reword: "Druid uses the 64-bit integer (long) data type..."
"If double or float values are involved, Druid uses double."
Not sure what this second sentence means. Is it:
- If an operation uses float or double values, then the result is double.
- If an operation uses only float types, the result type is float. If an operation uses only double values, or both double and float values, the result is double.
Nit: "datatype" is not really a word. Consider "data type".
Probably worth saying that the precision of float and double is defined by Java and by the IEEE 754 standard. Perhaps we can link to those reference materials.
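To make the precision point concrete, here is a small sketch of how a 64-bit IEEE 754 double behaves in plain Java. It says nothing about Druid's own type inference rules, and the class and helper names are hypothetical. Note that a double stores 52 significand bits plus one implicit leading bit, so every integer up to 2^53 (not 2^52) is represented exactly; the first integer that is lost is 2^53 + 1.

```java
import java.math.BigDecimal;

public class DoublePrecision {
    /** True if converting the long to a double preserves its exact value. */
    public static boolean isExactAsDouble(long value) {
        // new BigDecimal(double) yields the exact binary value of the double.
        return new BigDecimal((double) value).compareTo(BigDecimal.valueOf(value)) == 0;
    }

    public static void main(String[] args) {
        long twoTo53 = 1L << 53; // 9007199254740992

        System.out.println(isExactAsDouble(twoTo53));        // true
        System.out.println(isExactAsDouble(twoTo53 + 1));    // false: rounds to 2^53
        System.out.println(isExactAsDouble(Long.MAX_VALUE)); // false: rounds to 2^63

        // Precision is relative: the gap between adjacent doubles grows with magnitude.
        System.out.println(Math.ulp(1.0) < Math.ulp(1.0e15)); // true
    }
}
```

Below 2^53 a long and a double agree on every integer; above it, only some integers survive the conversion.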
docs/querying/sql-operators.md
Outdated
For more information about how Java handles primitive data types and how it may impact the results you get, see [Primitive data types in Java are a matter of precision](https://blogs.oracle.com/javamagazine/post/java-primitive-datatypes-int-float-double).
This really punts! The user has no control over the expressions we use. Druid is interpreted: we decide how to do casting to get arguments to the right type, and we decide on the final result value of our functions. Referring to what Java does is not useful except to someone who reads the code, sees where we use bits of Java, then works that backward through our type inference system.
We should spell out our rules explicitly, which are one of the two bullets above. @clintropolis can probably provide details.
The commit is a bit off still. According to the current docs, Druid has long, float, and double. "float" is a 32-bit float, and "double" is a 64-bit float.
I suggested some references because the math of float precision is beyond the scope of the Druid docs, imo. The main points are that longs can store integers exactly up to 2^63 - 1 (the current commit says doubles; that should be longs, iiuc), and that floats and doubles use 32-bit and 64-bit floating point. Any floating point storage format will have variable precision depending on the size of the numbers. (See the linked URLs in my earlier comment.) Floating point precision is really complicated and mathematical, and it's a general condition in software. Just saying that "float" and "double" both use floating point is the main point.
We could get into the bounds for integers being represented exactly in floats, but that ignores decimals, which is probably the point of using a float. Maybe we can also add a comment along the lines that if exact decimal values are needed, and you need, e.g., 3 decimal places, you can store the number multiplied by 1000 as a long, and divide again when querying. This will be exact, up to the min and max values for longs.
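The scaled-long idea above can be sketched in plain Java as follows. This is only an illustration under stated assumptions: three decimal places, and hypothetical helper names (`toScaled`, `fromScaled`) that are not part of Druid. In Druid the multiplication would happen at ingestion and the division at query time.

```java
import java.math.BigDecimal;

public class ScaledDecimal {
    // Assumption for this sketch: values need exactly 3 decimal places.
    private static final int DECIMAL_PLACES = 3;

    /** Converts a decimal string such as "0.1" to a long scaled by 10^3. */
    public static long toScaled(String decimal) {
        // longValueExact throws ArithmeticException on overflow or leftover fraction.
        return new BigDecimal(decimal).movePointRight(DECIMAL_PLACES).longValueExact();
    }

    /** Converts a scaled long back to its decimal string form. */
    public static String fromScaled(long scaled) {
        return BigDecimal.valueOf(scaled, DECIMAL_PLACES).toPlainString();
    }

    public static void main(String[] args) {
        // Summing 0.1 ten times as a double accumulates rounding error.
        double doubleSum = 0.0;
        for (int i = 0; i < 10; i++) {
            doubleSum += 0.1;
        }
        System.out.println(doubleSum == 1.0); // false

        // The same sum over scaled longs is exact.
        long scaledSum = 0;
        for (int i = 0; i < 10; i++) {
            scaledSum += toScaled("0.1");
        }
        System.out.println(fromScaled(scaledSum)); // 1.000
    }
}
```

The trade-off is that the usable range shrinks by the scale factor: with three decimal places, the largest representable value is Long.MAX_VALUE divided by 1000.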
(O.T. but if we had your bigdecimal @paul-rogers, I bet many people would use that instead. Back in my Oracle days, most people I knew, and I, used their DECIMAL datatype, to avoid floating point precision issues.)
imo the latest revisions are much improved. Maybe it would further clarify if we mentioned that doubles are 64-bit floats (which is explained elsewhere in the docs too). Eg, where it says "then the result is a double", maybe it could say "then the result is a double (ie, 64-bit float)"? Idk if "ie" is good for docs style, but something like that?
docs/querying/sql-operators.md
Outdated
When performing math operations, Druid uses 64-bit integer (long) data type unless there are double or float values. If an operation uses float or double values, then the result is a double, which is a 64-bit float. The precision of float and double values are defined by [Java](https://docs.oracle.com/javase/specs/jls/se16/html/jls-5.html#jls-5.1) and [the IEEE standard](https://en.wikipedia.org/wiki/IEEE_754).
Suggest linking to Java 8, 11, or 17 instead of 16.
paul-rogers left a comment
Thanks for the revisions. LGTM.
ektravel left a comment
Added a couple of suggestions.
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
Description
Clarifies how Druid handles Java precision.
This PR has: