-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-50067][SQL] Codegen Support for SchemaOfCsv(by Invoke & RuntimeReplaceable) #48595
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
nullableSchema: DataType, | ||
nameOfCorruptRecord: String, | ||
timeZoneId: Option[String], | ||
variantAllowDuplicateKeys: Boolean) extends Serializable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not related to this PR, as we used the case class
here, which implements Serializable
by default, so we can remove it.
import org.apache.spark.unsafe.types.UTF8String | ||
|
||
case class SchemaOfCsvEvaluator(options: Map[String, String]) { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can share
some objects to avoid creating them every time
, thereby reducing
the pressure on GC
+1, LGTM. Merging to master. |
Thanks @MaxGekk ❤️ |
…lable as false ### What changes were proposed in this pull request? The pr is following up [schema_of_json](#48473), [schema_of_xml](#48594) and [schema_of_csv](#48595), to make returnNullable as false. ### Why are the changes needed? As `cloud-fan`'s comment https://github.com/apache/spark/pull/48594/files#r1860534460, we should follow the original logic, otherwise it's a regression. https://github.com/apache/spark/blob/1a502d32ef5a69739e10b827be4c9063b2a20493/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L846 https://github.com/apache/spark/blob/1a502d32ef5a69739e10b827be4c9063b2a20493/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/xmlExpressions.scala#L166 https://github.com/apache/spark/blob/1a502d32ef5a69739e10b827be4c9063b2a20493/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/csvExpressions.scala#L141 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #48987 from panbingkun/SPARK-50066_FOLLOWUP. Authored-by: panbingkun <panbingkun@apache.org> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
The pr aims to add
Codegen
Support forschema_of_csv
.Why are the changes needed?
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Pass GA & Existed UT (eg: CsvFunctionsSuite#
*schema_of_csv*
)Was this patch authored or co-authored using generative AI tooling?
No.