Skip to content

[SPARK-45697][BUILD] Fix Unicode escapes in triple quoted strings are deprecated#43603

Closed
panbingkun wants to merge 11 commits intoapache:masterfrom
panbingkun:SPARK-45697
Closed

[SPARK-45697][BUILD] Fix Unicode escapes in triple quoted strings are deprecated#43603
panbingkun wants to merge 11 commits intoapache:masterfrom
panbingkun:SPARK-45697

Conversation

@panbingkun
Copy link
Contributor

@panbingkun panbingkun commented Oct 31, 2023

What changes were proposed in this pull request?

The pr aims to fix Unicode escapes in triple quoted strings are deprecated in scala 2.13.

Why are the changes needed?

Eliminate compile warnings.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  • Pass GA.
  • Manually test:
    ./build/sbt "catalyst/testOnly org.apache.spark.sql.catalyst.parser.PlanParserSuite"
    

Was this patch authored or co-authored using generative AI tooling?

No.

// reduce the cost of migration in subsequent versions.
"-Wconf:cat=deprecation&msg=it will become a keyword in Scala 3:e"
"-Wconf:cat=deprecation&msg=it will become a keyword in Scala 3:e",
"-Wconf:cat=deprecation&msg=Unicode escapes in triple quoted strings are deprecated:e"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will be compile error in Scala 3?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only for test in GA.
I want it to run through GA first and see if there are any other similar issues besides the above modifications.
I will delete it later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will nothing in Scala 3.3.1, neither a warning nor an error.

  • scala 3.3.1
    image

  • scala 2.13.8
    image

List.empty, List.empty, None, None, false)))

// verify with ROW FORMAT DELIMETED
val collectionItemsTerminated1 = '\u0002'
Copy link
Contributor

@LuciferYang LuciferYang Oct 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • '\u0002' corresponds to "STX" (Start of Text), a non-printing character used to mark the beginning of text.
  • '\u0003' corresponds to "ETX" (End of Text), a non-printing character used to mark the end of text.
  • '\u0004' corresponds to "EOT" (End of Transmission), a non-printing character used to mark the end of transmission.
  • '\u0005' corresponds to "ENQ" (Enquiry), a non-printing character used to inquire whether the receiving device is ready to receive data.

I'm not sure if this cases really need to test using non-printing characters as delimiters, can we use printable characters for testing, so we can replace them with the actual corresponding characters.

If indeed we need to test using non-printing characters as delimiters, I think we can add a @nowarn annotation to suppress warnings at a fine granularity like:

    // verify with ROW FORMAT DELIMITED
    @nowarn("cat=deprecation")
    val sqlWithRowFormatDelimiters: String =
      """
        |SELECT TRANSFORM(a, b, c)
        |  ROW FORMAT DELIMITED
        |  FIELDS TERMINATED BY '\t'
        |  COLLECTION ITEMS TERMINATED BY '\u0002'
        |  MAP KEYS TERMINATED BY '\u0003'
        |  LINES TERMINATED BY '\n'
        |  NULL DEFINED AS 'null'
        |  USING 'cat' AS (a, b, c)
        |  ROW FORMAT DELIMITED
        |  FIELDS TERMINATED BY '\t'
        |  COLLECTION ITEMS TERMINATED BY '\u0004'
        |  MAP KEYS TERMINATED BY '\u0005'
        |  LINES TERMINATED BY '\n'
        |  NULL DEFINED AS 'NULL'
        |FROM testData
    """.stripMargin

    assertEqual(
      sqlWithRowFormatDelimiters,

@panbingkun
Copy link
Contributor Author

image image image

|FROM testData
""".stripMargin,
// verify with ROW FORMAT DELIMITED
val sqlWithRowFormatDelimited =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks a bit ugly ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

val sqlWithRowFormatDelimited =
"SELECT TRANSFORM(a, b, c)" + "\n" +
" ROW FORMAT DELIMITED" + "\n" +
""" FIELDS TERMINATED BY '\t'""" + "\n" +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

friendly ping @AngersZhuuuu , do you have any suggestions about this deprecated usage issue

@panbingkun panbingkun marked this pull request as ready for review November 1, 2023 12:05
@panbingkun
Copy link
Contributor Author

image image image

| NULL DEFINED AS 'NULL'
|FROM testData
""".stripMargin,
"SELECT TRANSFORM(a, b, c)\n" +
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although this writing may seem ugly, it seems better than the previous version, haha

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I ask the status of this PR, @panbingkun ?

@panbingkun
Copy link
Contributor Author

panbingkun commented Nov 16, 2023

May I ask the status of this PR, @panbingkun ?

@dongjoon-hyun My modification principle is to eliminate the warning and maintain the original string semantics, but it is indeed ugly;while @LuciferYang 's suggestion is to eliminate the warning, but bypass it with a special annotation.
Which modification method do you think is more appropriate? We would also like to hear your suggestions, many thanks!
(PS: I actually think both are acceptable, but a judgment and decision are needed. 😄)

@LuciferYang
Copy link
Contributor

LuciferYang commented Nov 17, 2023

To clarify: I am more inclined to understand the original intent of the test. If the goal is just to test different separators, we can use the literals of visible separators to eliminate this compile warning. If the original intent was to test invisible separators, then I think we can simply suppress the warning in this test.

@panbingkun
Copy link
Contributor Author

To clarify: I am more inclined to understand the original intent of the test. If the goal is just to test different separators, we can use the literals of visible separators to eliminate this compile warning. If the original intent was to test invisible separators, then I think we can simply suppress the warning in this test.

Okay, let's suppress the warning with annotations first.
I tested it locally and it was okay.


// verify with ROW FORMAT DELIMETED
assertEqual(
@nowarn("cat=deprecation")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fine-grained suppressing this case with @nowarn if fine to me

pom.xml Outdated
reduce the cost of migration in subsequent versions.
-->
<arg>-Wconf:cat=deprecation&amp;msg=it will become a keyword in Scala 3:e</arg>
<arg>-Wconf:cat=deprecation&amp;msg=Unicode escapes in triple quoted strings are deprecated:e</arg>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@panbingkun please revert the change of SparkBuild.scala and pom.xml

Copy link
Contributor Author

@panbingkun panbingkun Nov 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, done.

@github-actions github-actions bot removed the BUILD label Nov 21, 2023
@LuciferYang
Copy link
Contributor

Merged into master for Spark 4.0. Thanks @panbingkun

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants