Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP][SPARK-29155][SQL] Support special date/timestamp values in the PostgreSQL dialect only #25834

Closed
wants to merge 18 commits into from

Conversation

MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Sep 18, 2019

What changes were proposed in this pull request?

In the PR, I propose to support special timestamp and date values introduced by #25716 and #25708 only when the SQL config spark.sql.dialect is set to PostgreSQL.

Why are the changes needed?

The special values are PostgreSQL specific feature. It should be supported under a config to avoid performance penalties and potential impact on user apps.

Does this PR introduce any user-facing change?

Yes, special date/timestamp values like epoch, now, today, tomorrow, yesterday are supported only in the PostgreSQL dialect.

In the Spark dialect (the default dialect):

spark-sql> set spark.sql.dialect=Spark;
spark.sql.dialect	Spark
spark-sql> select timestamp'today';
Error in query: 
Cannot parse the TIMESTAMP value: today(line 1, pos 7)

== SQL ==
select timestamp'today'
-------^^^

In the PostgreSQL dialect:

spark-sql> set spark.sql.dialect=PostgreSQL;
spark.sql.dialect	PostgreSQL
spark-sql> select timestamp'today';
2019-09-18 00:00:00

How was this patch tested?

Updated existing test suite TimestampFormatterSuite, DateTimeUtilsSuite and Csv/JsonFunctionsSuite.

@MaxGekk
Copy link
Member Author

MaxGekk commented Sep 18, 2019

@HyukjinKwon This is the draft PR for #25716 (review) . I added the config from #25697 . As soon as this PR #25708 for dates be merged, I will put special date values under the config as well.

@SparkQA
Copy link

SparkQA commented Sep 18, 2019

Test build #110929 has finished for PR 25834 at commit 2d47906.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

…ues-under-config

# Conflicts:
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
#	sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala
@SparkQA
Copy link

SparkQA commented Sep 19, 2019

Test build #110973 has finished for PR 25834 at commit b8ed08b.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public class ExecutorPluginContext
  • class ExecutorPluginSource(name: String) extends Source
  • class QuantileDiscretizer(JavaEstimator, HasInputCol, HasOutputCol, HasInputCols, HasOutputCols,
  • implicit class MultipartIdentifierHelper(parts: Seq[String])

@MaxGekk
Copy link
Member Author

MaxGekk commented Sep 19, 2019

jenkins, retest this, please

@SparkQA
Copy link

SparkQA commented Sep 19, 2019

Test build #110983 has finished for PR 25834 at commit b8ed08b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public class ExecutorPluginContext
  • class ExecutorPluginSource(name: String) extends Source
  • class QuantileDiscretizer(JavaEstimator, HasInputCol, HasOutputCol, HasInputCols, HasOutputCols,
  • implicit class MultipartIdentifierHelper(parts: Seq[String])

@HyukjinKwon
Copy link
Member

HyukjinKwon commented Sep 22, 2019

cc @maropu and @cloud-fan

@SparkQA
Copy link

SparkQA commented Sep 22, 2019

Test build #111149 has finished for PR 25834 at commit 3227968.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class JavaClassifierParams(HasRawPredictionCol, JavaPredictorParams):
  • class JavaClassifier(JavaPredictor, JavaClassifierParams):
  • class JavaClassificationModel(JavaPredictionModel, JavaClassifierParams):
  • class JavaProbabilisticClassifierParams(HasProbabilityCol, HasThresholds, JavaClassifierParams):
  • class JavaProbabilisticClassifier(JavaClassifier, JavaProbabilisticClassifierParams):
  • class JavaProbabilisticClassificationModel(JavaClassificationModel,
  • class LinearSVC(JavaClassifier, HasMaxIter, HasRegParam, HasTol,
  • class LinearSVCModel(JavaClassificationModel, JavaMLWritable, JavaMLReadable):
  • class LogisticRegression(JavaProbabilisticClassifier, HasMaxIter, HasRegParam, HasTol,
  • class LogisticRegressionModel(JavaProbabilisticClassificationModel, JavaMLWritable, JavaMLReadable,
  • class DecisionTreeClassifier(JavaProbabilisticClassifier, HasWeightCol,
  • class DecisionTreeClassificationModel(DecisionTreeModel, JavaProbabilisticClassificationModel,
  • class RandomForestClassifier(JavaProbabilisticClassifier, HasSeed, RandomForestParams,
  • class RandomForestClassificationModel(TreeEnsembleModel, JavaProbabilisticClassificationModel,
  • class GBTClassifier(JavaProbabilisticClassifier, GBTClassifierParams, HasCheckpointInterval,
  • class GBTClassificationModel(TreeEnsembleModel, JavaProbabilisticClassificationModel,
  • class NaiveBayes(JavaProbabilisticClassifier, HasThresholds, HasWeightCol,
  • class NaiveBayesModel(JavaProbabilisticClassificationModel, JavaMLWritable, JavaMLReadable):
  • class MultilayerPerceptronClassifier(JavaProbabilisticClassifier, HasMaxIter, HasTol, HasSeed,
  • class MultilayerPerceptronClassificationModel(JavaProbabilisticClassificationModel, JavaMLWritable,
  • class OneVsRestParams(JavaClassifierParams, HasWeightCol):
  • class LinearRegression(JavaPredictor, HasMaxIter, HasRegParam, HasTol, HasElasticNetParam,
  • class LinearRegressionModel(JavaPredictionModel, GeneralJavaMLWritable, JavaMLReadable,
  • class IsotonicRegression(JavaPredictor, HasWeightCol, JavaMLWritable, JavaMLReadable):
  • class IsotonicRegressionModel(JavaPredictionModel, JavaMLWritable, JavaMLReadable):
  • class DecisionTreeRegressor(JavaPredictor, HasWeightCol, DecisionTreeParams, TreeRegressorParams,
  • class DecisionTreeModel(JavaPredictionModel):
  • class RandomForestRegressor(JavaPredictor, HasSeed, RandomForestParams, TreeRegressorParams,
  • class GBTRegressor(JavaPredictor, GBTRegressorParams, HasCheckpointInterval, HasSeed,
  • class AFTSurvivalRegression(JavaPredictor, HasFitIntercept, HasMaxIter, HasTol,
  • class AFTSurvivalRegressionModel(JavaPredictionModel, JavaMLWritable, JavaMLReadable):
  • class GeneralizedLinearRegression(JavaPredictor, HasFitIntercept, HasMaxIter, HasTol, HasRegParam,
  • class GeneralizedLinearRegressionModel(JavaPredictionModel, JavaMLWritable,
  • class JavaPredictorParams(HasLabelCol, HasFeaturesCol, HasPredictionCol):
  • class JavaPredictor(JavaEstimator, JavaPredictorParams):
  • class JavaPredictionModel(JavaModel, JavaPredictorParams):
  • abstract class PartitionTransformExpression extends Expression with Unevaluable
  • case class Years(child: Expression) extends PartitionTransformExpression
  • case class Months(child: Expression) extends PartitionTransformExpression
  • case class Days(child: Expression) extends PartitionTransformExpression
  • case class Hours(child: Expression) extends PartitionTransformExpression
  • case class Bucket(numBuckets: Literal, child: Expression) extends PartitionTransformExpression
  • implicit class OptionsHelper(options: Map[String, String])
  • trait WriteConfigMethods[R]
  • trait CreateTableWriter[T] extends WriteConfigMethods[CreateTableWriter[T]]

@SparkQA
Copy link

SparkQA commented Sep 22, 2019

Test build #111164 has finished for PR 25834 at commit 4f35fad.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 22, 2019

Test build #111165 has finished for PR 25834 at commit 2a86d9c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented Sep 23, 2019

Hi, @MaxGekk , thanks for the work! one question: don't we need some summary documents for these PgSQL-specific behaviour? If we add more and more features for this flag in future, it might be difficult for users to follow them. cc: @gatorsmile @dongjoon-hyun

@cloud-fan
Copy link
Contributor

We need a document page to explain the pgsql dialect. There will be a lot of things and it's too much to put in the SQLConf. Also cc @gengliangwang

@MaxGekk
Copy link
Member Author

MaxGekk commented Sep 23, 2019

Any ideas where I can create such page about supported features of PostgreSQL dialect?

@SparkQA
Copy link

SparkQA commented Sep 23, 2019

Test build #111188 has finished for PR 25834 at commit ea42053.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 23, 2019

Test build #111189 has finished for PR 25834 at commit 53afcdb.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Copy link
Member Author

MaxGekk commented Sep 23, 2019

jenkins, retest this, please

@SparkQA
Copy link

SparkQA commented Sep 23, 2019

Test build #111199 has finished for PR 25834 at commit 53afcdb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 23, 2019

Test build #111221 has finished for PR 25834 at commit 94abaea.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

With hindsight, I think this feature (special datetime strings) doesn't conflict with Spark and seems very intuitive. Do we really need to control it by a flag?

@MaxGekk
Copy link
Member Author

MaxGekk commented Sep 25, 2019

Do we really need to control it by a flag?

I would keep the feature without any flags. For me it makes sense to hide it only if the feature brings any performance penalty but it should not (I do believe but we can recheck this) because condition for branching should be cheap.

@cloud-fan
Copy link
Contributor

Even if pgsql doesn't have this feature, I would accept this feature as an improvement to Spark. Shall we simply add a migration guide instead of protecting it via a flag? cc @gatorsmile @gengliangwang

@gengliangwang
Copy link
Member

This feature doesn't conflict with the existing behavior of Spark. I think we don't need the dialect flag here.

@maropu
Copy link
Member

maropu commented Sep 26, 2019

I think we need a consistent rule for that... for example, the case 2 in #25697 is conflict with the existing behaivours?

@gengliangwang
Copy link
Member

the case 2 in #25697 is conflict with the existing behaivours?

I think so. Previously the result of cast('tru' as boolean) is not true in Spark.

@MaxGekk
Copy link
Member Author

MaxGekk commented Sep 26, 2019

Please, let me know what I should do next 1. either rebase this PR or 2. open another one for updating the SQL migration guide.

@cloud-fan
Copy link
Contributor

I prefer to do 2. This is a nice feature to have in Spark. cast('tru' as boolean) is different as it's less intuitive and looks like pgsql-specific.

@MaxGekk
Copy link
Member Author

MaxGekk commented Sep 27, 2019

Please, take a look at #25948

@MaxGekk MaxGekk deleted the special-values-under-config branch October 15, 2019 19:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
7 participants