[SPARK-35975][SQL] New configuration spark.sql.timestampType for the default timestamp type #33176

@@ -2502,7 +2502,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
case ("float" | "real", Nil) => FloatType
case ("double", Nil) => DoubleType
case ("date", Nil) => DateType
-case ("timestamp", Nil) => TimestampType
+case ("timestamp", Nil) => SQLConf.get.timestampType
cloud-fan (Contributor) commented on Jul 1, 2021:

We need to support parsing the type strings for LTZ and NTZ specifically, so that people can avoid relying on the config in their queries.

Member (PR author) replied:

I would prefer doing that in separate PRs to keep this one clean. We should support the new keywords in both data type parsing and type literals.
https://issues.apache.org/jira/browse/SPARK-35977
https://issues.apache.org/jira/browse/SPARK-35978
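The follow-up keyword support discussed here can be sketched as a small pattern match. This is a hypothetical standalone model, not the actual code of SPARK-35977/SPARK-35978; the type names below are local stand-ins that mirror Catalyst's, and `parseTimestampKeyword` is an invented helper:

```scala
import java.util.Locale

// Sketch: explicit LTZ/NTZ keywords bypass the config, while the bare
// "timestamp" keyword falls back to the session default. All names here
// are local stand-ins, not Spark's actual parser code.
sealed trait SqlType
case object TimestampType extends SqlType     // TIMESTAMP WITH LOCAL TIME ZONE
case object TimestampNTZType extends SqlType  // TIMESTAMP WITHOUT TIME ZONE

def parseTimestampKeyword(keyword: String, sessionDefault: SqlType): SqlType =
  keyword.toLowerCase(Locale.ROOT) match {
    case "timestamp_ltz" => TimestampType     // explicit, config-independent
    case "timestamp_ntz" => TimestampNTZType  // explicit, config-independent
    case "timestamp"     => sessionDefault    // resolved via spark.sql.timestampType
  }
```

With explicit keywords, a query's result type no longer changes when the session config changes, which is the point cloud-fan raises above.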

case ("string", Nil) => StringType
case ("character" | "char", length :: Nil) => CharType(length.getText.toInt)
case ("varchar", length :: Nil) => VarcharType(length.getText.toInt)
@@ -44,6 +44,7 @@ import org.apache.spark.sql.catalyst.plans.logical.HintErrorHandler
import org.apache.spark.sql.catalyst.util.DateTimeUtils
import org.apache.spark.sql.connector.catalog.CatalogManager.SESSION_CATALOG_NAME
import org.apache.spark.sql.errors.{QueryCompilationErrors, QueryExecutionErrors}
import org.apache.spark.sql.types.{AtomicType, TimestampNTZType, TimestampType}
import org.apache.spark.unsafe.array.ByteArrayMethods
import org.apache.spark.util.Utils

@@ -2820,6 +2821,24 @@ object SQLConf {
.booleanConf
.createWithDefault(true)

object TimestampTypes extends Enumeration {
val TIMESTAMP_NTZ, TIMESTAMP_LTZ = Value
}

val TIMESTAMP_TYPE =
buildConf("spark.sql.timestampType")
.doc("Configures the default timestamp type of Spark SQL, including SQL DDL and Cast " +
Member (PR author) commented:

The scope is bigger than this: it will also affect timestamp literals, the function to_timestamp, data source IO, etc. We will get to that later.

Member commented:

I wonder about type literals, for instance timestamp'2021-07-01 01:02:03'. Should the config influence it too?

Member (PR author) replied:

Yes, I mentioned it in #33176 (comment). The type literal logic with TIMESTAMP_NTZ:

  1. if there is no time zone part, return timestamp without time zone literal
  2. otherwise, return timestamp with local time zone.
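The two-step rule above can be modeled as a small standalone function. The helper names are invented for illustration, and the regex is only a rough stand-in for Spark's real timestamp parsing, which is out of scope for this PR:

```scala
// Sketch of the literal rule: only when the session default is NTZ does the
// presence of a time zone part matter; otherwise everything stays LTZ.
def hasZonePart(text: String): Boolean =
  text.matches(""".*(?:Z|UTC|GMT|[+-]\d{2}:\d{2})\s*$""")

def literalType(text: String, ntzIsDefault: Boolean): String =
  if (ntzIsDefault && !hasZonePart(text)) "TIMESTAMP_NTZ" else "TIMESTAMP_LTZ"
```

So timestamp'2021-07-01 01:02:03' would resolve to TIMESTAMP WITHOUT TIME ZONE under the NTZ default, while timestamp'2021-07-01 01:02:03+08:00' would stay TIMESTAMP WITH LOCAL TIME ZONE.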

s"clause. Setting the configuration as ${TimestampTypes.TIMESTAMP_NTZ.toString} will " +
Member commented:

I thought the compiler automatically calls the toString method in string interpolations, so you can skip it. Or am I wrong?

"use TIMESTAMP WITHOUT TIME ZONE as the default type while putting it as " +
s"${TimestampTypes.TIMESTAMP_LTZ.toString} will use TIMESTAMP WITH LOCAL TIME ZONE. " +
"Before the 3.2.0 release, Spark only supports the TIMESTAMP WITH " +
"LOCAL TIME ZONE type.")
.version("3.2.0")
.stringConf
.transform(_.toUpperCase(Locale.ROOT))
.checkValues(TimestampTypes.values.map(_.toString))
.createWithDefault(TimestampTypes.TIMESTAMP_LTZ.toString)
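End to end, the conf pipeline above (the upper-case `.transform`, the `.checkValues` validation, and the `timestampType` getter that appears later in this diff) behaves roughly like this self-contained model. The names are local stand-ins, not Spark's ConfigBuilder API; the Catalyst types are represented as strings so the sketch stays standalone:

```scala
import java.util.Locale

// Self-contained model of spark.sql.timestampType handling: normalize case,
// validate against the enumeration, then map the string onto a type.
object TimestampTypes extends Enumeration {
  val TIMESTAMP_NTZ, TIMESTAMP_LTZ = Value
}

def resolveTimestampType(raw: String): String = {
  val normalized = raw.toUpperCase(Locale.ROOT)            // .transform step
  require(TimestampTypes.values.map(_.toString).contains(normalized),
    s"invalid value for spark.sql.timestampType: $raw")    // .checkValues step
  normalized match {                                       // getter mapping
    case "TIMESTAMP_LTZ" => "TimestampType"    // WITH LOCAL TIME ZONE (historical name)
    case "TIMESTAMP_NTZ" => "TimestampNTZType" // WITHOUT TIME ZONE
  }
}
```

Because `.checkValues` rejects anything outside the enumeration before the getter runs, the getter's two-case match cannot fall through at runtime.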

val DATETIME_JAVA8API_ENABLED = buildConf("spark.sql.datetime.java8API.enabled")
.doc("If the configuration property is set to true, java.time.Instant and " +
"java.time.LocalDate classes of Java 8 API are used as external types for " +
@@ -3897,6 +3916,15 @@ class SQLConf extends Serializable with Logging {

def ansiEnabled: Boolean = getConf(ANSI_ENABLED)

def timestampType: AtomicType = getConf(TIMESTAMP_TYPE) match {
case "TIMESTAMP_LTZ" =>
// For historical reason, the TimestampType maps to TIMESTAMP WITH LOCAL TIME ZONE
TimestampType

case "TIMESTAMP_NTZ" =>
TimestampNTZType
}

def nestedSchemaPruningEnabled: Boolean = getConf(NESTED_SCHEMA_PRUNING_ENABLED)

def serializerNestedSchemaPruningEnabled: Boolean =
@@ -18,9 +18,12 @@
package org.apache.spark.sql.catalyst.parser

import org.apache.spark.SparkFunSuite
import org.apache.spark.sql.catalyst.plans.SQLHelper
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.internal.SQLConf.TimestampTypes
import org.apache.spark.sql.types._

-class DataTypeParserSuite extends SparkFunSuite {
+class DataTypeParserSuite extends SparkFunSuite with SQLHelper {

def parse(sql: String): DataType = CatalystSqlParser.parseDataType(sql)

@@ -135,6 +138,15 @@ class DataTypeParserSuite extends SparkFunSuite {
assert(intercept("unknown(1,2,3)").getMessage.contains("unknown(1,2,3) is not supported"))
}

test("Set default timestamp type") {
withSQLConf(SQLConf.TIMESTAMP_TYPE.key -> TimestampTypes.TIMESTAMP_NTZ.toString) {
assert(parse("timestamp") === TimestampNTZType)
}
withSQLConf(SQLConf.TIMESTAMP_TYPE.key -> TimestampTypes.TIMESTAMP_LTZ.toString) {
assert(parse("timestamp") === TimestampType)
}
}

// DataType parser accepts certain reserved keywords.
checkDataType(
"Struct<TABLE: string, DATE:boolean>",