Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-3421][SQL] Allows arbitrary character in StructField.name #2291

Closed
wants to merge 5 commits into from

Conversation

liancheng
Copy link
Contributor

StructField.toString now quotes the name field and escapes backslashes and double quotes within the string. The DataType parser is also updated to parse double quoted string as StructField name.

UPDATE Spark SQL Python binding is also updated.

@@ -65,7 +70,7 @@ object DataType extends RegexParsers {
"false" ^^^ false

protected lazy val structType: Parser[DataType] =
"StructType\\([A-zA-z]*\\(".r ~> repsep(structField, ",") <~ "))" ^^ {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this should be a typo, took the chance to fix it.

@liancheng
Copy link
Contributor Author

ok to test

"""))"""
).mkString

assert(catalyst.types.DataType(structTypeString) === expected)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kind of a Nit, but I think I'd prefer tests that just roundtrip StructFields that have various weird characters instead of those that are dependent on the exact output. That would test for the desired behavior but would not have to be rewritten if we ever change the format. (I mostly say this because I just spent the last hour rewriting brittle parquet tests :) )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, some ScalaCheck style random generated input may be helpful here.

@marmbrus
Copy link
Contributor

Jenkins, test this please

@SparkQA
Copy link

SparkQA commented Sep 10, 2014

QA tests have started for PR 2291 at commit f3d8c98.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 10, 2014

QA tests have finished for PR 2291 at commit f3d8c98.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Last(child: Expression) extends PartialAggregate with trees.UnaryNode[Expression]
    • case class LastFunction(expr: Expression, base: AggregateExpression) extends AggregateFunction
    • case class Abs(child: Expression) extends UnaryExpression

@marmbrus
Copy link
Contributor

Jenkins, test this please.

@SparkQA
Copy link

SparkQA commented Sep 10, 2014

QA tests have started for PR 2291 at commit f3d8c98.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 10, 2014

QA tests have finished for PR 2291 at commit f3d8c98.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 11, 2014

QA tests have started for PR 2291 at commit f3d8c98.

  • This patch merges cleanly.

@davies
Copy link
Contributor

davies commented Sep 11, 2014

@liancheng Do you plan to fix this in Python?

@SparkQA
Copy link

SparkQA commented Sep 11, 2014

QA tests have started for PR 2291 at commit bb452c8.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 11, 2014

QA tests have finished for PR 2291 at commit f3d8c98.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Copy link
Contributor Author

@davies Oh, actually I didn't even realize that this issue also exists in PySpark.. So basically I only need to rewrite StructField.__repr__ and quote the field name, right?

@SparkQA
Copy link

SparkQA commented Sep 11, 2014

QA tests have finished for PR 2291 at commit bb452c8.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 11, 2014

QA tests have started for PR 2291 at commit e837e2b.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 11, 2014

QA tests have finished for PR 2291 at commit e837e2b.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@davies
Copy link
Contributor

davies commented Sep 11, 2014

@liancheng I think The most important part would be create Row() class using the name of field as name of attributes.

@liancheng
Copy link
Contributor Author

The last build failure was caused by streaming suites.

But I do need to update the data type parsing logic in Python.

@liancheng
Copy link
Contributor Author

PR #2563 supersedes this one. Closing.

@liancheng liancheng closed this Sep 28, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants