[SPARK-17160] Properly escape field names in code-generated error messages #15156

JoshRosen · 2016-09-19T22:31:19Z

This patch fixes a corner-case escaping bug: field names containing special characters were interpolated unescaped into error-message string literals in generated Java code, which led to compilation errors.

The fix uses `addReferenceObj` to store the error messages as string fields rather than as inline string constants.
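To illustrate the difference between the two strategies, here is a minimal, self-contained sketch in plain Java. It does not use Spark's actual code generator: the class `ReferenceSketch`, the `references` list, and both helper methods are hypothetical names chosen for this example, and the generated snippets are only printed, not compiled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a code generator; not Spark's CodegenContext.
class ReferenceSketch {
    // Objects the generated code would look up by index at runtime,
    // instead of having them embedded in the source as literals.
    static final List<Object> references = new ArrayList<>();

    // Unsafe: pastes the field name straight into a Java string literal.
    // A quote or backslash in the name corrupts the generated source.
    static String inlineMessage(String fieldName) {
        return "throw new RuntimeException(\"The field " + fieldName + " is null\");";
    }

    // Safer: keep the message as an object and emit only an index lookup,
    // so the field name never appears in the generated source text.
    static String referencedMessage(String fieldName) {
        references.add("The field " + fieldName + " is null");
        int idx = references.size() - 1;
        return "throw new RuntimeException((String) references[" + idx + "]);";
    }

    public static void main(String[] args) {
        String tricky = "a\"b\\c"; // field name containing a quote and a backslash
        System.out.println(inlineMessage(tricky));     // broken string literal
        System.out.println(referencedMessage(tricky)); // no user text in the source
    }
}
```

With the reference-based variant, no escaping logic is needed at all, because the generated source contains only an index and a cast.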

davies · 2016-09-19T22:58:02Z

LGTM

SparkQA · 2016-09-20T00:33:35Z

Test build #65619 has finished for PR 15156 at commit 35b62d5.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

JoshRosen · 2016-09-20T03:20:03Z

I'm going to merge this to master and branch-2.0. Thanks!

…sages

This patch addresses a corner-case escaping bug where field names which contain special characters were unsafely interpolated into error message string literals in generated Java code, leading to compilation errors.

This patch addresses these issues by using `addReferenceObj` to store the error messages as string fields rather than inline string constants.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #15156 from JoshRosen/SPARK-17160.

(cherry picked from commit e719b1c)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>

…gations

## What changes were proposed in this pull request?

If I use the function `regexp_extract` and the regex string contains the escape character `\`, codegen fails because the `\` character is not properly escaped in the generated code.

Example stack trace:

```
/* 059 */ private int maxSteps = 2;
/* 060 */ private int numRows = 0;
/* 061 */ private org.apache.spark.sql.types.StructType keySchema = new org.apache.spark.sql.types.StructType().add("date_format(window#325.start, yyyy-MM-dd HH:mm)", org.apache.spark.sql.types.DataTypes.StringType)
/* 062 */ .add("regexp_extract(source#310.description, ([a-zA-Z]+)\[.*, 1)", org.apache.spark.sql.types.DataTypes.StringType);
/* 063 */ private org.apache.spark.sql.types.StructType valueSchema = new org.apache.spark.sql.types.StructType().add("sum", org.apache.spark.sql.types.DataTypes.LongType);
/* 064 */ private Object emptyVBase;
...
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 62, Column 58: Invalid escape sequence
	at org.codehaus.janino.Scanner.scanLiteralCharacter(Scanner.java:918)
	at org.codehaus.janino.Scanner.produce(Scanner.java:604)
	at org.codehaus.janino.Parser.peekRead(Parser.java:3239)
	at org.codehaus.janino.Parser.parseArguments(Parser.java:3055)
	at org.codehaus.janino.Parser.parseSelector(Parser.java:2914)
	at org.codehaus.janino.Parser.parseUnaryExpression(Parser.java:2617)
	at org.codehaus.janino.Parser.parseMultiplicativeExpression(Parser.java:2573)
	at org.codehaus.janino.Parser.parseAdditiveExpression(Parser.java:2552)
```

In the codegen'd expression, the literal should use `\\` instead of `\`.

A similar problem was solved here: apache#15156.

## How was this patch tested?

Regression test in `DataFrameAggregationSuite`

Author: Burak Yavuz <brkyvz@gmail.com>

Closes apache#16361 from brkyvz/reg-break.
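As a rough illustration of the escaping this commit message calls for, the following plain-Java sketch doubles backslashes and escapes quotes before a name is placed inside a generated string literal. It is not the actual change made in that pull request: `LiteralEscapeSketch` and `escapeForJavaLiteral` are hypothetical names, the field name is a simplified version of the one in the trace, and the helper covers only a few common characters rather than every case a real code generator must handle.

```java
// Hypothetical helper; not part of Spark or Janino.
class LiteralEscapeSketch {
    // Escapes the characters most likely to break a Java string literal.
    static String escapeForJavaLiteral(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break; // \  becomes \\
                case '"':  sb.append("\\\""); break; // "  becomes \"
                case '\n': sb.append("\\n");  break; // newline becomes \n
                case '\r': sb.append("\\r");  break; // carriage return becomes \r
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A schema field name containing a single backslash, as in the trace.
        String name = "regexp_extract(description, ([a-zA-Z]+)\\[.*, 1)";
        // Emit the literal the way a generator might place it in source code.
        System.out.println(".add(\"" + escapeForJavaLiteral(name) + "\", ...)");
    }
}
```

After escaping, the emitted literal contains `\\[` where the raw name had `\[`, which is a valid escape sequence for a Java scanner.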

JoshRosen added 7 commits September 19, 2016 14:15

Add regression test for SPARK-17160

512c114

Fix SPARK-17160 via additional escaping.

39eeed4

Add regression test for similar bug in AssertTrue

1430068

Fix using codegen context references.

39e2e02

Similar fix in PrintToStderr

4534705

Use same fix for GetExternalRowField

17a9fce

Similar change in ValidateExternalType

35b62d5

JoshRosen changed the title ~~[SPARK-17160] Properly escape field names in code generation error messages~~ [SPARK-17160] Properly escape field names in code-generated error messages Sep 19, 2016

asfgit closed this in e719b1c Sep 20, 2016

brkyvz mentioned this pull request Dec 20, 2016

[SPARK-18952] Regex strings not properly escaped in codegen for aggregations #16361

Closed
