[SPARK-31491][SQL][DOCS] Re-arrange Data Types page to document Floating Point Special Values

### What changes were proposed in this pull request?
Re-arrange Data Types page to document Floating Point Special Values

### Why are the changes needed?
To complete SQL Reference

### Does this PR introduce any user-facing change?
Yes

- add Floating Point Special Values in Data Types page
- move NaN Semantics to Data Types page

<img width="1050" alt="Screen Shot 2020-04-24 at 9 14 57 AM" src="https://user-images.githubusercontent.com/13592258/80233996-3da25600-860c-11ea-8285-538efc16e431.png">

<img width="1050" alt="Screen Shot 2020-04-24 at 9 15 22 AM" src="https://user-images.githubusercontent.com/13592258/80234001-4004b000-860c-11ea-8954-72f63c92d50d.png">

<img width="1049" alt="Screen Shot 2020-04-24 at 9 15 44 AM" src="https://user-images.githubusercontent.com/13592258/80234006-41ce7380-860c-11ea-96bf-15e1aa2102ff.png">

### How was this patch tested?
Manually build and check

Closes #28264 from huaxingao/datatypes.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
huaxingao authored and maropu committed Apr 25, 2020
1 parent 8424f55 commit 054bef9
Showing 3 changed files with 119 additions and 31 deletions.
2 changes: 0 additions & 2 deletions docs/_data/menu-sql.yaml
@@ -84,8 +84,6 @@
url: sql-ref-literals.html
- text: Null Semantics
url: sql-ref-null-semantics.html
- text: NaN Semantics
url: sql-ref-nan-semantics.html
- text: ANSI Compliance
url: sql-ref-ansi-compliance.html
subitems:
119 changes: 119 additions & 0 deletions docs/sql-ref-datatypes.md
@@ -19,6 +19,8 @@ license: |
limitations under the License.
---

### Supported Data Types

Spark SQL and DataFrames support the following data types:

* Numeric types
@@ -706,3 +708,120 @@ The following table shows the type names as well as aliases used in Spark SQL pa
</table>
</div>
</div>

### Floating Point Special Values

Spark SQL supports several special floating point values in a case-insensitive manner:

 * Inf/+Inf/Infinity/+Infinity: positive infinity
   * ```FloatType```: equivalent to Scala <code>Float.PositiveInfinity</code>.
   * ```DoubleType```: equivalent to Scala <code>Double.PositiveInfinity</code>.
 * -Inf/-Infinity: negative infinity
   * ```FloatType```: equivalent to Scala <code>Float.NegativeInfinity</code>.
   * ```DoubleType```: equivalent to Scala <code>Double.NegativeInfinity</code>.
 * NaN: not a number
   * ```FloatType```: equivalent to Scala <code>Float.NaN</code>.
   * ```DoubleType```: equivalent to Scala <code>Double.NaN</code>.
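
As an analogy only, the spellings above can be tried outside Spark. The sketch below uses Python's built-in `float()` parser, which happens to accept the same case-insensitive spellings; it is not Spark's parser, and `classify` is a hypothetical helper introduced here for illustration.

```python
import math

def classify(literal: str) -> str:
    """Name the special value a literal denotes (hypothetical helper)."""
    value = float(literal)  # Python's own parser, standing in for a Spark cast
    if math.isnan(value):
        return "NaN"
    if math.isinf(value):
        return "Infinity" if value > 0 else "-Infinity"
    return "finite"

# Spellings from the list above, in mixed case.
for literal in ["Inf", "+inf", "INFINITY", "+Infinity", "-Inf", "-infinity", "nan"]:
    print(f"{literal:>10} -> {classify(literal)}")
```

Every positive spelling should map to `Infinity`, every negative one to `-Infinity`, and `nan` to `NaN`, regardless of case.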

#### Positive/Negative Infinity Semantics

There is special handling for positive and negative infinity. They have the following semantics:

* Positive infinity multiplied by any positive value returns positive infinity.
* Negative infinity multiplied by any positive value returns negative infinity.
* Positive infinity multiplied by any negative value returns negative infinity.
* Negative infinity multiplied by any negative value returns positive infinity.
* Positive/negative infinity multiplied by 0 returns NaN.
* Positive/negative infinity is equal to itself.
* In aggregations, all positive infinity values are grouped together. Similarly, all negative infinity values are grouped together.
* Positive infinity and negative infinity are treated as normal values in join keys.
* Positive infinity sorts lower than NaN and higher than any other value.
* Negative infinity sorts lower than any other value.
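
The multiplication and equality rules above match IEEE 754 arithmetic, so they can be checked with ordinary floats. This is a minimal sketch in plain Python rather than Spark; it assumes only that Python floats are IEEE 754 doubles, and the `products` dictionary is an illustrative name.

```python
import math

inf = math.inf

# Sign rules for multiplying an infinity by a finite value.
products = {
    "inf * 2": inf * 2.0,
    "-inf * 2": -inf * 2.0,
    "inf * -2": inf * -2.0,
    "-inf * -2": -inf * -2.0,
    "inf * 0": inf * 0.0,    # undefined product, yields NaN
    "-inf * 0": -inf * 0.0,  # undefined product, yields NaN
}
for expr, value in products.items():
    print(f"{expr:>10} = {value}")

# Each infinity equals itself but not the other one.
same_sign_equal = (inf == inf) and (-inf == -inf)
opposite_sign_equal = inf == -inf
print(same_sign_equal, opposite_sign_equal)
```

The grouping, join, and sort rules in the list are about how Spark compares values and are not exercised by this arithmetic check.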

#### NaN Semantics

Spark SQL handles not-a-number (NaN) values of `float` or `double` type in a way that
does not exactly match standard floating point semantics.
Specifically:

* NaN = NaN returns true.
* In aggregations, all NaN values are grouped together.
* NaN is treated as a normal value in join keys.
* NaN values go last when in ascending order, larger than any other numeric value.
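
One way to picture these rules is as a comparison key under which every NaN is the same value and sits above all other numbers. The sketch below models that idea in plain Python; `spark_like_key` is a hypothetical helper, not a Spark API, and the sample values are arbitrary.

```python
import math
from collections import Counter

def spark_like_key(x: float) -> tuple:
    """Hypothetical key: NaN equals NaN and sorts after every other number."""
    if math.isnan(x):
        return (1, 0.0)  # all NaN values collapse to one key, ranked last
    return (0, x)        # everything else keeps its usual numeric order

inf = math.inf
values = [1.5, float("nan"), inf, -inf, float("nan"), 0.0]

# Equality: two NaN floats are unequal under IEEE 754 but share one key.
ieee_equal = values[1] == values[4]
key_equal = spark_like_key(values[1]) == spark_like_key(values[4])
print(ieee_equal, key_equal)

# Ascending sort: -inf, finite values, +inf, then the NaN values last.
ordered = sorted(values, key=spark_like_key)
print(ordered)

# Grouping: both NaN values fall into the same bucket.
groups = Counter(spark_like_key(v) for v in values)
print(groups[(1, 0.0)])
```

Under standard IEEE 754 comparison, `NaN == NaN` is false and NaN is unordered, which is the behavior these rules depart from.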

#### Examples

{% highlight sql %}
SELECT double('infinity') AS col;
+--------+
|     col|
+--------+
|Infinity|
+--------+

SELECT float('-inf') AS col;
+---------+
|      col|
+---------+
|-Infinity|
+---------+

SELECT float('NaN') AS col;
+---+
|col|
+---+
|NaN|
+---+

SELECT double('infinity') * 0 AS col;
+---+
|col|
+---+
|NaN|
+---+

SELECT double('-infinity') * (-1234567) AS col;
+--------+
|     col|
+--------+
|Infinity|
+--------+

SELECT double('infinity') < double('NaN') AS col;
+----+
| col|
+----+
|true|
+----+

SELECT double('NaN') = double('NaN') AS col;
+----+
| col|
+----+
|true|
+----+

SELECT double('inf') = double('infinity') AS col;
+----+
| col|
+----+
|true|
+----+

CREATE TABLE test (c1 int, c2 double);
INSERT INTO test VALUES (1, double('infinity'));
INSERT INTO test VALUES (2, double('infinity'));
INSERT INTO test VALUES (3, double('inf'));
INSERT INTO test VALUES (4, double('-inf'));
INSERT INTO test VALUES (5, double('NaN'));
INSERT INTO test VALUES (6, double('NaN'));
INSERT INTO test VALUES (7, double('-infinity'));
SELECT COUNT(*), c2 FROM test GROUP BY c2;
+---------+---------+
| count(1)|       c2|
+---------+---------+
|        2|      NaN|
|        2|-Infinity|
|        3| Infinity|
+---------+---------+
{% endhighlight %}
29 changes: 0 additions & 29 deletions docs/sql-ref-nan-semantics.md

This file was deleted.
