[SPARK-3421][SQL] Allows arbitrary character in StructField.name #2291
@@ -65,7 +70,7 @@ object DataType extends RegexParsers {
    "false" ^^^ false

  protected lazy val structType: Parser[DataType] =
    "StructType\\([A-zA-z]*\\(".r ~> repsep(structField, ",") <~ "))" ^^ {
I believe this was a typo, so I took the chance to fix it.
ok to test
"""))"""
).mkString

assert(catalyst.types.DataType(structTypeString) === expected)
This is kind of a nit, but I think I'd prefer tests that just round-trip StructFields with various weird characters over tests that depend on the exact output. That would test for the desired behavior but would not have to be rewritten if we ever change the format. (I mostly say this because I just spent the last hour rewriting brittle Parquet tests :) )
OK, some ScalaCheck-style randomly generated input may be helpful here.
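For illustration, the round-trip idea from this thread could be sketched without any test library. This is only a rough, dependency-free stand-in for a ScalaCheck property: `RoundTripCheck`, `randomName`, and `firstFailure` are hypothetical names, and the `encode`/`decode` arguments stand for whatever serializer and parser are under test (they are not the functions touched by this PR).

```scala
import scala.util.Random

// Hypothetical helper: checks that decode(encode(name)) == name for many
// randomly generated names, without asserting on the encoded format itself.
object RoundTripCheck {
  // Characters that tend to break naive parsers, plus a few ordinary ones.
  private val alphabet: Vector[Char] =
    Vector('a', 'Z', '0', '_', ' ', '"', '\\', ',', '(', ')', '.', '-', '\t', '$')

  // Builds a random name of length 0..maxLen drawn from the alphabet above.
  def randomName(rng: Random, maxLen: Int = 12): String = {
    val len = rng.nextInt(maxLen + 1)
    Seq.fill(len)(alphabet(rng.nextInt(alphabet.length))).mkString
  }

  // Returns the first generated name that fails to round-trip, if any.
  def firstFailure(
      encode: String => String,
      decode: String => String,
      trials: Int = 1000,
      seed: Long = 42L): Option[String] = {
    val rng = new Random(seed)
    Iterator.fill(trials)(randomName(rng)).find(n => decode(encode(n)) != n)
  }
}
```

A test would then assert that `RoundTripCheck.firstFailure(encode, decode)` is empty, so it keeps passing if the string format changes as long as the parser changes with it.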
Jenkins, test this please
QA tests have started for PR 2291 at commit
QA tests have finished for PR 2291 at commit
Jenkins, test this please.
QA tests have started for PR 2291 at commit
QA tests have finished for PR 2291 at commit
QA tests have started for PR 2291 at commit
@liancheng Do you plan to fix this in Python?
QA tests have started for PR 2291 at commit
QA tests have finished for PR 2291 at commit
@davies Oh, actually I didn't even realize that this issue also exists in PySpark. So basically I only need to rewrite
QA tests have finished for PR 2291 at commit
QA tests have started for PR 2291 at commit
QA tests have finished for PR 2291 at commit
@liancheng I think the most important part would be creating the Row() class using the field names as attribute names.
The last build failure was caused by streaming suites. But I do need to update the data type parsing logic in Python.
PR #2563 supersedes this one. Closing.
`StructField.toString` now quotes the `name` field and escapes backslashes and double quotes within the string. The `DataType` parser is also updated to parse double-quoted strings as `StructField` names.

UPDATE: The Spark SQL Python binding is also updated.
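
As a rough illustration of the quoting rule described above, the sketch below escapes backslashes and double quotes and wraps the name in double quotes, with an inverse that undoes it. `FieldNameQuoting`, `quote`, and `unquote` are hypothetical names chosen for this example; the actual helpers and parser rules in the patch may differ.

```scala
// Illustrative sketch only: not the code from the pull request.
object FieldNameQuoting {
  // Wrap `name` in double quotes, escaping each backslash and double quote
  // with a leading backslash.
  def quote(name: String): String = {
    val sb = new StringBuilder("\"")
    name.foreach {
      case '\\' => sb.append("\\\\")
      case '"'  => sb.append("\\\"")
      case c    => sb.append(c)
    }
    sb.append('"').toString
  }

  // Inverse of `quote`: strip the surrounding quotes and resolve each escape.
  def unquote(quoted: String): String = {
    require(
      quoted.length >= 2 && quoted.head == '"' && quoted.last == '"',
      s"not a double-quoted string: $quoted")
    val body = quoted.substring(1, quoted.length - 1)
    val out = new StringBuilder
    var i = 0
    while (i < body.length) {
      val c = body.charAt(i)
      if (c == '\\' && i + 1 < body.length) {
        out.append(body.charAt(i + 1)) // keep the escaped character verbatim
        i += 2
      } else {
        out.append(c)
        i += 1
      }
    }
    out.toString
  }
}
```

Under these assumptions, a name such as `a"b` is written as `"a\"b"`, and `unquote(quote(name))` returns the original name for any input, including names containing commas or parentheses.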