[SPARK-42430][SQL][DOC] Add documentation for TimestampNTZ type #40005

Closed
wants to merge 1 commit into apache:master from gengliangwang:ntzDoc

Conversation

gengliangwang (Member)

What changes were proposed in this pull request?

Add documentation for TimestampNTZ type

Why are the changes needed?

Add documentation for the new data type TimestampNTZ.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Build docs and preview:
<img width="782" alt="image" src="https://user-images.githubusercontent.com/1097932/218656254-096df429-851d-4046-8a6f-f368819c405b.png">
<img width="777" alt="image" src="https://user-images.githubusercontent.com/1097932/218656277-e8cfe747-2c45-476d-b70f-83c654e0b0f2.png">

@@ -185,6 +191,7 @@ from pyspark.sql.types import *
|**BinaryType**|bytearray|BinaryType()|
|**BooleanType**|bool|BooleanType()|
|**TimestampType**|datetime.datetime|TimestampType()|
|**TimestampNTZType**|datetime.datetime|TimestampNTZType()|
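To illustrate the new row above, here is a minimal PySpark sketch (not part of this PR's diff; it assumes Spark 3.4+, and the app and column names are made up): a naive `datetime.datetime` value lands in a `TimestampNTZType()` column and is kept as a wall-clock value, with no session-time-zone conversion.

```
import datetime

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, TimestampNTZType

spark = SparkSession.builder.appName("ntz-demo").getOrCreate()

# datetime.datetime maps to TimestampNTZType: the wall-clock value is stored
# as-is, with no adjustment to the session time zone (unlike TimestampType).
schema = StructType([StructField("event_time", TimestampNTZType(), True)])
df = spark.createDataFrame([(datetime.datetime(2023, 2, 14, 9, 30),)], schema)

df.printSchema()            # event_time: timestamp_ntz (nullable = true)
df.show(truncate=False)
```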
gengliangwang (Member Author)

cc @HyukjinKwon on this one.

@gengliangwang (Member Author)

Merging to master/3.4

gengliangwang added a commit that referenced this pull request Feb 14, 2023
### What changes were proposed in this pull request?

Add documentation for TimestampNTZ type

### Why are the changes needed?

Add documentation for the new data type TimestampNTZ.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Build docs and preview:
<img width="782" alt="image" src="https://user-images.githubusercontent.com/1097932/218656254-096df429-851d-4046-8a6f-f368819c405b.png">
<img width="777" alt="image" src="https://user-images.githubusercontent.com/1097932/218656277-e8cfe747-2c45-476d-b70f-83c654e0b0f2.png">

Closes #40005 from gengliangwang/ntzDoc.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(cherry picked from commit 46a2341)
Signed-off-by: Gengliang Wang <gengliang@apache.org>
@@ -154,6 +159,7 @@ please use factory methods provided in
|**BinaryType**|byte[]|DataTypes.BinaryType|
|**BooleanType**|boolean or Boolean|DataTypes.BooleanType|
|**TimestampType**|java.sql.Timestamp|DataTypes.TimestampType|
|**TimestampNTZType**|java.time.LocalDateTime| TimestampNTZType|
Contributor

nit: seems all others are prefixed by DataTypes.

HyukjinKwon pushed a commit that referenced this pull request Feb 15, 2023
### What changes were proposed in this pull request?

With the configuration `spark.sql.timestampType`, TIMESTAMP in Spark is a user-specified alias associated with one of the TIMESTAMP_LTZ and TIMESTAMP_NTZ variations. This is quite complicated for Spark users.

There is another option, `spark.sql.sources.timestampNTZTypeInference.enabled`, for schema inference. I intended to introduce it in #40005, but having two flags seems like too much. After some thought, I decided to merge `spark.sql.sources.timestampNTZTypeInference.enabled` into `spark.sql.timestampType` and let `spark.sql.timestampType` control the schema inference behavior.

We can follow up to add the data source option "inferTimestampNTZType" for CSV/JSON/partition columns, as the JDBC data source already does.
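For illustration only (not part of this commit), a hedged PySpark sketch of what letting `spark.sql.timestampType` drive type resolution looks like in practice; it assumes a local SparkSession, and the resulting schemas are noted as expectations in the comments:

```
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("timestamp-type-demo").getOrCreate()

# With TIMESTAMP_NTZ, the TIMESTAMP keyword (and, after this change, schema
# inference in the built-in data sources) resolves to TimestampNTZType.
spark.conf.set("spark.sql.timestampType", "TIMESTAMP_NTZ")
print(spark.sql("SELECT TIMESTAMP '2023-02-15 10:00:00' AS ts").schema)
# Expected: ts is TimestampNTZType

# With TIMESTAMP_LTZ (the default), TIMESTAMP is an alias for TIMESTAMP_LTZ.
spark.conf.set("spark.sql.timestampType", "TIMESTAMP_LTZ")
print(spark.sql("SELECT TIMESTAMP '2023-02-15 10:00:00' AS ts").schema)
# Expected: ts is TimestampType
```
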
### Why are the changes needed?

Make the new feature simpler.

### Does this PR introduce _any_ user-facing change?

No, the feature is not released yet.

### How was this patch tested?

Existing UT
I also tried
```
git grep spark.sql.sources.timestampNTZTypeInference.enabled
git grep INFER_TIMESTAMP_NTZ_IN_DATA_SOURCES
```
to make sure the flag INFER_TIMESTAMP_NTZ_IN_DATA_SOURCES is removed.

Closes #40022 from gengliangwang/unifyInference.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
HyukjinKwon pushed a commit that referenced this pull request Feb 15, 2023
(cherry picked from commit 46226c2)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
MaxGekk pushed a commit that referenced this pull request Feb 18, 2023
…ANSI interval types

### What changes were proposed in this pull request?

As #40005 (review) pointed out, the Java doc for data types recommends using the factory methods provided in org.apache.spark.sql.types.DataTypes.
Since the ANSI interval types are also missing the `DataTypes` prefix, this PR revises their doc as well.

### Why are the changes needed?

Unify the data type doc

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Local preview
<img width="826" alt="image" src="https://user-images.githubusercontent.com/1097932/219821685-321c2fd1-6248-4930-9c61-eec68f0dcb50.png">

Closes #40074 from gengliangwang/reviseNTZDoc.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
MaxGekk pushed a commit that referenced this pull request Feb 18, 2023
(cherry picked from commit 8cfd5bf)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
a0x8o added a commit to a0x8o/spark that referenced this pull request Feb 18, 2023
snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023
snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023
snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023