Skip to content

[SPARK-49240][BUILD] Add scalastyle and checkstyle rules to avoid URL constructors#47762

Closed
dongjoon-hyun wants to merge 2 commits intoapache:masterfrom
dongjoon-hyun:SPARK-49240
Closed

[SPARK-49240][BUILD] Add scalastyle and checkstyle rules to avoid URL constructors#47762
dongjoon-hyun wants to merge 2 commits intoapache:masterfrom
dongjoon-hyun:SPARK-49240

Conversation

@dongjoon-hyun
Copy link
Copy Markdown
Member

@dongjoon-hyun dongjoon-hyun commented Aug 14, 2024

What changes were proposed in this pull request?

This PR aims to add scalastyle and checkstyle rules to avoid URL constructors.

Why are the changes needed?

The java.net.URL class does not itself encode or decode any URL components according to the escaping mechanism defined in RFC2396.

So, from Java 20, all URL constructors are deprecated. We had better use better URI class.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the CIs with newly added rules.

After this PR, there is only two exceptional instances in JettyUtils.scala and UISuite.scala.

  • JettyUtils is tricky instance
  • UISuite test case is supposed to add bad URL which URI prevents with java.net.URISyntaxException. This is an example why URI is better. In this PR, we keep the old, URL class, to keep the test coverage.
$ git grep -C1 'new URL('
core/src/main/scala/org/apache/spark/ui/JettyUtils.scala-        // scalastyle:off URLConstructor
core/src/main/scala/org/apache/spark/ui/JettyUtils.scala:        val newUrl = new URL(requestURL, prefixedDestPath).toString
core/src/main/scala/org/apache/spark/ui/JettyUtils.scala-        // scalastyle:on URLConstructor
--
core/src/test/scala/org/apache/spark/ui/UISuite.scala-      // scalastyle:off URLConstructor
core/src/test/scala/org/apache/spark/ui/UISuite.scala:      val badRequest = new URL(
core/src/test/scala/org/apache/spark/ui/UISuite.scala-        s"http://$localhost:${serverInfo.boundPort}$path/root?bypass&invalid<=foo")

Was this patch authored or co-authored using generative AI tooling?

No.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little tricky to change. So, this PR allows this as the only exception.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

val requestURI = new URI(request.getRequestURL.toString)
val newUri = requestURI.resolve(prefixedDestPath).toString
response.sendRedirect(newUri)

Will there be any problems with this change?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me check. Thanks, @LuciferYang .

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To @LuciferYang , I checked multiple corner cases. And, it's not the same in some cases like the following. I'd prefer to keep the existing logic to avoid any risk here.

$ jshell
|  Welcome to JShell -- Version 21.0.4
|  For an introduction type: /help intro

jshell> new URL(new URL("https://a"), "?index.html")
$1 ==> https://a/?index.html

jshell> new URI("https://a").resolve("?index.html")
$2 ==> https://a?index.html

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, this PR aims to prevent any future addition of new URL( instead of removing everyone instances of new URL(.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@dongjoon-hyun
Copy link
Copy Markdown
Member Author

Could you review this when you have some time, @LuciferYang ?

@dongjoon-hyun dongjoon-hyun force-pushed the SPARK-49240 branch 2 times, most recently from d67ef0d to 5ef3706 Compare August 14, 2024 22:44
// Try a request with bad content in a parameter to make sure the security filter
// is being added to new handlers.
// scalastyle:off URLConstructor
val badRequest = new URL(
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To keep the existing test coverage, we need to keep URL because it doesn't complain while URI throws URISyntaxException.

[info] - add and remove handlers with custom user filter *** FAILED *** (92 milliseconds)
[info]   java.net.URISyntaxException: Illegal character in query at index 47: http://127.0.0.1:54129/test/root?bypass&invalid<=foo

@dongjoon-hyun
Copy link
Copy Markdown
Member Author

Thank you, @HyukjinKwon . I excluded one more instance to keep the existing test coverage.

@@ -351,7 +351,7 @@ private class TestMasterInfo(val ip: String, val dockerId: DockerId, val logFile
def readState(): Unit = {
try {
val masterStream = new InputStreamReader(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not related to this PR, but it seems that masterStream has not been closed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

val requestURI = new URI(request.getRequestURL.toString)
val newUri = requestURI.resolve(prefixedDestPath).toString
response.sendRedirect(newUri)

Will there be any problems with this change?

Comment thread scalastyle-config.xml
]]></customMessage>
</check>

<check customId="URLConstructor" level="error" class="org.scalastyle.file.RegexChecker" enabled="true">
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this rule also be added to dev/checkstyle.xml?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's already added by this PR, here.

  • spark/dev/checkstyle.xml

    Lines 209 to 212 in b1af69e

    <module name="RegexpSinglelineJava">
    <property name="format" value="new URL\("/>
    <property name="message" value="Use URI.toURL or URL.of instead of URL constructors." />
    </module>

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got

@dongjoon-hyun
Copy link
Copy Markdown
Member Author

Could you review this once more, @LuciferYang ?

Copy link
Copy Markdown
Contributor

@LuciferYang LuciferYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@LuciferYang
Copy link
Copy Markdown
Contributor

Merged into master. Thanks @dongjoon-hyun and @HyukjinKwon

@dongjoon-hyun
Copy link
Copy Markdown
Member Author

Thank you, @LuciferYang and @HyukjinKwon !

@dongjoon-hyun dongjoon-hyun deleted the SPARK-49240 branch August 15, 2024 04:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants