Skip to content

[SPARK-45748][SQL] Add a .fromSQL helper function for Literals#43612

Closed
anchovYu wants to merge 2 commits intoapache:masterfrom
anchovYu:literal-serde
Closed

[SPARK-45748][SQL] Add a .fromSQL helper function for Literals#43612
anchovYu wants to merge 2 commits intoapache:masterfrom
anchovYu:literal-serde

Conversation

@anchovYu
Copy link
Contributor

@anchovYu anchovYu commented Nov 1, 2023

What changes were proposed in this pull request?

Two major changes in the PR

  1. Adds a fromSQL helper function for Literal object to parse and analyze the given sql value and data type of the literal, and construct a Literal. Together with .sql, these two methods provide handy function to serialize and deserialize a Literal.
  2. Changes the timestamp (ltz) .sql output to adds the timezone information: s"TIMESTAMP '$toString${SQLConf.get.sessionLocalTimeZone}'". This is to help correctly deserialize the value.

Why are the changes needed?

Provide handy helper functions.

Does this PR introduce any user-facing change?

One minor change for the .sql output for the TIMESTAMP type.

How was this patch tested?

Added new tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Nov 1, 2023
@anchovYu anchovYu changed the title [SQL]WIP [SPARK-45748][SQL] Add a .fromSQL helper function for Literals Nov 1, 2023
@github-actions github-actions bot added the DOCS label Nov 1, 2023
}

val dataType = DataType.fromJson(dataTypeJson)
val analyzedExpr = if (dataType.sameType(CalendarIntervalType)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should check if it contains CalendarIntervalType.

*/
private[sql] def fromSQL(valueSqlStr: String, dataTypeJson: String): Literal = {
def parseAndAnalyze(valueSqlStr: String, dataType: DataType): Expression = {
lazy val parser = new CatalystSqlParser()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
lazy val parser = new CatalystSqlParser()
val parser = new CatalystSqlParser()

"PARSE_FAILURE", valueSqlStr, dataType, Some(e))
}

val analyzer: Analyzer = ResolveDefaultColumns.DefaultColumnAnalyzer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can move DefaultColumnAnalyzer to a more common place now.

throw QueryCompilationErrors.invalidLiteralValueSQLStringForDeserialization(
"ANALYSIS_FAILURE", valueSqlStr, dataType, Some(e))
}
analyzedPlan.collectFirst { case Project(Seq(a: Alias), OneRowRelation()) => a.child }.get
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

analyzedPlan.asInstanceOf[Project].projectList.head.asInstanceOf[Alias].child

* @param dataTypeJson the json format data type of the Literal
* @throws AnalysisException INVALID_LITERAL_VALUE_SQL_STRING_FOR_DESERIALIZATION
*/
private[sql] def fromSQL(valueSqlStr: String, dataTypeJson: String): Literal = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we use DDL string of the type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DDL does not preserves those containsNull fields while json does.

Copy link
Contributor

@beliefer beliefer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For simplicity, shall we split this PR to two?

Comment on lines +585 to +586


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

// )
// )
// )
// )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we retain the line starts with // TODO and delete others?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However the TODO itself is about enabling the commented out test below : )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can change it to
TODO(anchovyu): fix the map value comparison problem

* @throws AnalysisException INVALID_LITERAL_VALUE_SQL_STRING_FOR_DESERIALIZATION
*/
private[sql] def fromSQL(valueSqlStr: String, dataTypeJson: String): Literal = {
def parseAndAnalyze(valueSqlStr: String, dataType: DataType): Expression = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is very similar to

Could we reuse the ResolveDefaultColumns.analyze?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about that too but I found that ResolveDefaultColumns.analyze has 1) more column default specific error and error messages 2) has more limitation on the supported case. I also found it can return a non-Literal value as return, e.g. a Cast.
Instead I think in the future ResolveDefaultColumns could refactor to use this general helper function.

]
}
},
"sqlState" : "42894"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I this a user error? Shouldn't it be an internal error? I.e. we expect it to work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit tricky that in this PR it's a helper function that accepts any string. It could be a user error that user passes in a random string.

@anchovYu anchovYu closed this Nov 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants