[SPARK-22442][SQL][BRANCH-2.2] ScalaReflection should produce correct field names for special characters #19734

Closed
viirya wants to merge 1 commit into apache:branch-2.2 from viirya:SPARK-22442-2.2

Conversation

viirya (Member) commented Nov 13, 2017

## What changes were proposed in this pull request?

For a case class whose field names contain special characters, e.g.:

```scala
case class MyType(`field.1`: String, `field 2`: String)
```

we can still manipulate the resulting DataFrame/Dataset, but the field names are encoded:

```scala
scala> val df = Seq(MyType("a", "b"), MyType("c", "d")).toDF
df: org.apache.spark.sql.DataFrame = [field$u002E1: string, field$u00202: string]

scala> df.as[MyType].collect
res7: Array[MyType] = Array(MyType(a,b), MyType(c,d))
```

This causes a resolution failure when we try to convert data whose field names are not encoded:

```scala
spark.read.json(path).as[MyType]
...
[info]   org.apache.spark.sql.AnalysisException: cannot resolve '`field$u002E1`' given input columns: [field 2, field.1];
[info]   at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
...
```

We should use the decoded field names in the Dataset schema.
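For reference, the `$u002E` / `$u0020` tokens are Scala's standard name mangling for non-identifier characters, which `scala.reflect.NameTransformer` can reverse. A minimal sketch of the round trip (plain Scala, no Spark required; the object name is just for illustration):

```scala
import scala.reflect.NameTransformer

object NameManglingDemo extends App {
  // Characters that are not valid Java identifier parts are mangled as $uXXXX
  // (four-digit hex code point), e.g. '.' -> $u002E and ' ' -> $u0020.
  println(NameTransformer.encode("field.1"))       // field$u002E1
  println(NameTransformer.encode("field 2"))       // field$u00202

  // The fix amounts to exposing the decoded form in the Dataset schema.
  println(NameTransformer.decode("field$u002E1"))  // field.1
  println(NameTransformer.decode("field$u00202"))  // field 2
}
```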

## How was this patch tested?

Added tests.
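To illustrate the behavior the fix enables, a write/read round trip of this shape should now resolve (a hypothetical sketch, not the PR's actual test code; `RoundTripCheck` and the temp path are made up for the example):

```scala
import org.apache.spark.sql.SparkSession

case class MyType(`field.1`: String, `field 2`: String)

object RoundTripCheck extends App {
  val spark = SparkSession.builder().master("local[2]").appName("SPARK-22442").getOrCreate()
  import spark.implicits._

  val path = java.nio.file.Files.createTempDirectory("spark-22442").resolve("data").toString

  // With the fix, the Dataset schema carries the decoded names "field.1" and
  // "field 2", so the JSON files are written with those keys.
  Seq(MyType("a", "b"), MyType("c", "d")).toDS().write.json(path)

  // Before the fix, this failed with: cannot resolve '`field$u002E1`' ...
  val roundTripped = spark.read.json(path).as[MyType].collect()
  assert(roundTripped.toSet == Set(MyType("a", "b"), MyType("c", "d")))

  spark.stop()
}
```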

viirya (Member, Author) commented Nov 13, 2017

cc @cloud-fan This is the backport of SPARK-22442 to branch-2.2.

SparkQA commented Nov 13, 2017

Test build #83763 has finished for PR 19734 at commit 8d3fd95.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

felixcheung (Member)

OK, thanks. I'm about to tag 2.2.1. Technically this isn't a regression, but I could wait a few hours (I need to wait for the Jenkins build from the branch) if we can merge this ASAP.

felixcheung (Member)

merged to 2.2

asfgit pushed a commit that referenced this pull request Nov 13, 2017

[SPARK-22442][SQL][BRANCH-2.2] ScalaReflection should produce correct field names for special characters

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #19734 from viirya/SPARK-22442-2.2.
viirya (Member, Author) commented Nov 13, 2017

Thanks @felixcheung

viirya closed this Nov 13, 2017
felixcheung (Member)
viirya (Member, Author) commented Nov 13, 2017

@felixcheung Yes. Looking into it.

viirya (Member, Author) commented Nov 13, 2017

`val TermName: TermNameExtractor` is new in Scala 2.11. For 2.10, we should use the deprecated `newTermName`. I will submit a follow-up.
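For context, this is the source incompatibility in question (a minimal sketch; the version split shown in the comments is illustrative):

```scala
import scala.reflect.runtime.universe._

// Scala 2.11+: TermName is available as a constructor/extractor.
val fieldName = TermName("field")

// Scala 2.10 has no TermName constructor; the equivalent call is
// newTermName("field"), which 2.11 keeps only in deprecated form.
// val fieldName = newTermName("field")
```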

MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018

[SPARK-22442][SQL][BRANCH-2.2] ScalaReflection should produce correct field names for special characters

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes apache#19734 from viirya/SPARK-22442-2.2.
viirya deleted the SPARK-22442-2.2 branch December 27, 2023