
[SPARK-33441][BUILD] Add unused-imports compilation check and remove all unused-imports #30351

Closed

Conversation

LuciferYang
Contributor

@LuciferYang LuciferYang commented Nov 12, 2020

What changes were proposed in this pull request?

This PR adds a new Scala compiler arg to pom.xml to defend against new unused imports:

  • -Ywarn-unused-import for Scala 2.12
  • -Wconf:cat=unused-imports:e for Scala 2.13

The other file changes remove all unused imports in the Spark code.
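As a sketch of how the flag is wired in (the property name `scalac.arg.unused-imports` appears in this PR's diff, but the surrounding pom.xml layout here is illustrative, not the actual file contents):

```xml
<!-- Illustrative sketch only; the real pom.xml sections differ. -->
<properties>
  <!-- Scala 2.12 profile: report unused imports as warnings. -->
  <scalac.arg.unused-imports>-Ywarn-unused-import</scalac.arg.unused-imports>
  <!-- A Scala 2.13 profile would override this property with:
       -Wconf:cat=unused-imports:e -->
</properties>

<!-- Inside the scala-maven-plugin <configuration> block: -->
<args>
  <arg>${scalac.arg.unused-imports}</arg>
</args>
```

Using a profile-overridable property lets the same `<arg>` entry serve both Scala versions.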

Why are the changes needed?

Clean up the code and add a guarantee to defend against new unused imports.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Pass the Jenkins or GitHub Actions tests.

@HyukjinKwon
Member

@LuciferYang, since you're here, can you see if the -Ywarn-unused option works? We now enable -Xfatal-warnings by default, so the warnings will be converted to errors.

@SparkQA

SparkQA commented Nov 12, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35594/

@SparkQA

SparkQA commented Nov 12, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35594/

@LuciferYang
Contributor Author

@HyukjinKwon Let me have a try ~

@LuciferYang
Contributor Author

LuciferYang commented Nov 12, 2020

@HyukjinKwon Looks like -Ywarn-unused works, but there are some configuration differences between pom.xml and SparkBuild.scala: -Xfatal-warnings is only configured in SparkBuild.scala.

After adding -Ywarn-unused to pom.xml, there are some warning messages as follows:

[WARNING] [Warn] /spark/core/src/main/scala-2.12/org/apache/spark/util/TimeStampedHashMap.scala:95: pattern var t in value $anonfun is never used; `t@_' suppresses this warning
[WARNING] [Warn] /spark/core/src/main/scala/org/apache/spark/BarrierTaskContext.scala:24: Unused import
[WARNING] [Warn] /spark/core/src/main/scala/org/apache/spark/ContextCleaner.scala:204: pattern var ie in method keepCleaning is never used; `ie@_' suppresses this warning
[WARNING] [Warn] /spark/core/src/main/scala/org/apache/spark/ContextCleaner.scala:252: parameter value blocking in method doCleanupAccum is never used
[WARNING] [Warn] /spark/core/src/main/scala/org/apache/spark/ExecutorAllocationClient.scala:100: parameter value triggeredByExecutor in method decommissionExecutors is never used
[WARNING] [Warn] /spark/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala:326: private method totalRunningTasksPerResourceProfile in class ExecutorAllocationManager is never used
[WARNING] [Warn] /spark/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala:784: pattern var k in value $anonfun is never used; `k@_' suppresses this warning
[WARNING] [Warn] /spark/core/src/main/scala/org/apache/spark/MapOutputTracker.scala:38: Unused import
[WARNING] [Warn] /spark/core/src/main/scala/org/apache/spark/MapOutputTracker.scala:139: pattern var e in method updateMapOutput is never used; `e@_' suppresses this warning
[WARNING] [Warn] /spark/core/src/main/scala/org/apache/spark/MapOutputTracker.scala:472: pattern var ie in method run is never used; `ie@_' suppresses this warning
[WARNING] [Warn] /spark/core/src/main/scala/org/apache/spark/MapOutputTracker.scala:698: pattern var loc in value $anonfun is never used; `loc@_' suppresses this warning
[WARNING] [Warn] /spark/core/src/main/scala/org/apache/spark/MapOutputTracker.scala:283: parameter value conf in class MapOutputTrackerMasterEndpoint is never used
[WARNING] [Warn] /spark/core/src/main/scala/org/apache/spark/MapOutputTracker.scala:308: parameter value conf in class MapOutputTracker is never used
[WARNING] [Warn] /spark/core/src/main/scala/org/apache/spark/MapOutputTracker.scala:426: parameter value i in value $anonfun is never used
[WARNING] [Warn] /spark/core/src/main/scala/org/apache/spark/MapOutputTracker.scala:519: parameter value x in value $anonfun is never used
[WARNING] [Warn] /spark/core/src/main/scala/org/apache/spark/MapOutputTracker.scala:671: parameter value numReducers in method getLocationsWithLargestOutputs is never used

After adding -Ywarn-unused and -Xfatal-warnings to pom.xml, the [WARNING] messages convert to [ERROR]. Maybe we can use -Ywarn-unused-import to clean up unused imports first?
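For context, the escalation works differently on the two Scala versions; a hedged sketch of the invocations (the file name is made up for the example):

```shell
# Illustrative only; Foo.scala is a placeholder file name.

# Scala 2.12: unused imports are only warnings, so -Xfatal-warnings is
# needed to fail the build -- but it turns ALL warnings into errors.
scalac -Ywarn-unused-import -Xfatal-warnings Foo.scala

# Scala 2.13: -Wconf can escalate just the unused-imports category to
# errors, leaving other warning categories untouched.
scalac "-Wconf:cat=unused-imports:e" Foo.scala
```

This is why the 2.13 flag can be added on its own, while the 2.12 flag only hard-fails builds where -Xfatal-warnings is also set (i.e., the SBT build).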

@HyukjinKwon
Member

HyukjinKwon commented Nov 12, 2020

Yeah, we can add it into pom.xml too. Yes, I think we can try with -Ywarn-unused-import first. How many warnings/errors are there in the whole project?

@SparkQA

SparkQA commented Nov 12, 2020

Test build #130988 has finished for PR 30351 at commit dfeeb13.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@github-actions github-actions bot added the BUILD label Nov 12, 2020
@LuciferYang
Contributor Author

Commit 986ffe5 adds the unused-imports related compilation warning args, but there are some interesting things:

  • one is that the compiler thinks a scala.language.xxx import is unused, such as import scala.language.postfixOps in BarrierTaskContext

  • the other is that the compiler treats imports used only in comments as unused, such as import org.apache.spark.util.collection.OpenHashMap in ResourceAllocator:

import org.apache.spark.util.collection.OpenHashMap

/**
 * Trait used to help executor/worker allocate resources.
 * Please note that this is intended to be used in a single thread.
 */
trait ResourceAllocator {

  protected def resourceName: String
  protected def resourceAddresses: Seq[String]
  protected def slotsPerAddress: Int

  /**
   * Map from an address to its availability, a value > 0 means the address is available,
   * while value of 0 means the address is fully assigned.
   *
   * For task resources ([[org.apache.spark.scheduler.ExecutorResourceInfo]]), this value
   * can be a multiple, such that each address can be allocated up to [[slotsPerAddress]]
   * times.
   *
   * TODO Use [[OpenHashMap]] instead to gain better performance.
   */
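One common way around the comment-only case, sketched here rather than taken from this PR, is to drop the import and use the fully qualified name inside the scaladoc link, so the compiler no longer sees an unused import:

```scala
// Sketch: reference the class by its fully qualified name in the scaladoc
// link instead of importing it, since imports used only in comments are
// flagged as unused by -Ywarn-unused-import.
trait ResourceAllocator {
  /**
   * TODO Use [[org.apache.spark.util.collection.OpenHashMap]] instead to
   * gain better performance.
   */
  protected def resourceName: String
}
```

The trade-off is a longer scaladoc link, but the generated documentation still resolves the reference the same way.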

@LuciferYang
Contributor Author

LuciferYang commented Nov 12, 2020

Seems GitHub Actions compiles with -Xfatal-warnings by default? All GitHub Actions checks failed .... haha, I will do the statistics locally first @HyukjinKwon

@SparkQA

SparkQA commented Nov 12, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35610/

@LuciferYang
Contributor Author

LuciferYang commented Nov 12, 2020

@HyukjinKwon
Compiled with mvn clean install -DskipTests -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive

There are 505 Unused import warnings in total printed with this PR: 291 in src dirs and 214 in test dirs, involving 308 files.

@HyukjinKwon
Member

Yes, the GitHub Actions build uses SBT with -Xfatal-warnings on by default. Can you file a JIRA and do it all in this PR?

@HyukjinKwon
Member

You'll probably have to specify -Xfatal-warnings in https://github.com/apache/spark/blob/master/project/SparkBuild.scala#L219
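For reference, a hedged sketch of what that could look like in SparkBuild.scala (the surrounding options are illustrative, not the actual file contents):

```scala
// Illustrative sketch, not the real SparkBuild.scala contents.
// Adding -Xfatal-warnings to the shared scalacOptions makes every warning,
// including the new unused-import warnings, fail the SBT build.
scalacOptions ++= Seq(
  "-unchecked",
  "-deprecation",
  "-Ywarn-unused-import",
  "-Xfatal-warnings"
)
```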

@LuciferYang
Contributor Author

Yes, the GitHub Actions build uses SBT with -Xfatal-warnings on by default. Can you file a JIRA and do it all in this PR?

OK ~ It doesn't look like a minor change anymore. It's a little late today in my timezone, so I will try to finish the work tomorrow :)

@SparkQA

SparkQA commented Nov 12, 2020

Test build #131006 has finished for PR 30351 at commit 986ffe5.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 12, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35610/

@LuciferYang LuciferYang changed the title [MINOR][CORE] Remove unused imports in core module [SPARK-33441][CORE] Remove unused imports in core module Nov 13, 2020
@SparkQA

SparkQA commented Nov 13, 2020

Test build #131042 has finished for PR 30351 at commit 6fbf7c8.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@LuciferYang
Contributor Author

@LuciferYang, let's leave it open for a few more days in case other people have some more comments.

OK ~

@SparkQA

SparkQA commented Nov 18, 2020

Test build #131279 has finished for PR 30351 at commit f6f8cb7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@LuciferYang
Contributor Author

Commit d1d5f72 merges with master, and commit 2addc8f fixes the newly added unused imports.

@SparkQA

SparkQA commented Nov 18, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35896/

@SparkQA

SparkQA commented Nov 18, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35896/

@SparkQA

SparkQA commented Nov 18, 2020

Test build #131294 has finished for PR 30351 at commit 2addc8f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Okay, this gets conflicted easily. Let's merge this in. @LuciferYang, can you resolve the conflicts?

@SparkQA

SparkQA commented Nov 19, 2020

Test build #131311 has finished for PR 30351 at commit 3498654.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class ExecutorSource(
  • implicit class MetadataColumnsHelper(metadata: Array[MetadataColumn])

@SparkQA

SparkQA commented Nov 19, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35914/

@LuciferYang
Contributor Author

@HyukjinKwon done ~ Commit 3498654 resolves the conflicts and commit ef2ff08 fixes the newly added ones.

@SparkQA

SparkQA commented Nov 19, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35914/

@SparkQA

SparkQA commented Nov 19, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35918/

@SparkQA

SparkQA commented Nov 19, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35918/

@HyukjinKwon
Member

Merged to master.

@SparkQA

SparkQA commented Nov 19, 2020

Test build #131315 has finished for PR 30351 at commit ef2ff08.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@LuciferYang
Contributor Author

Thanks for your review ~ @HyukjinKwon @srowen

Member

@MaxGekk MaxGekk left a comment

Is dev/scalastyle supposed to catch unused imports?

$ ./dev/scalastyle
Scalastyle checks passed.
$ build/sbt -Phadoop-3.2 -Phive-2.3 -Pyarn -Phadoop-cloud -Phive-thriftserver -Pkubernetes -Pmesos -Phive -Pkinesis-asl -Pspark-ganglia-lgpl test:package streaming-kinesis-asl-assembly/assembly
[error] /Users/maximgekk/proj/show-partitions-exec-v2-test/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala:23:105: Unused import
[error] import org.apache.spark.sql.catalyst.analysis.{ResolvedNamespace, ResolvedPartitionSpec, ResolvedTable, ResolvedView}
[error]                                                                                                         ^
[error] /Users/maximgekk/proj/show-partitions-exec-v2-test/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowPartitionsExec.scala:20:29: Unused import
[error] import org.apache.spark.sql.AnalysisException
[error]                             ^
[error] two errors found
[error] (sql / Compile / compileIncremental) Compilation failed
[error] Total time: 32 s, completed Nov 19, 2020 6:30:02 PM

@LuciferYang
Contributor Author

LuciferYang commented Nov 19, 2020

@MaxGekk use org.scalastyle.scalariform.IllegalImportsChecker? EDIT: seems it's not this rule

@LuciferYang
Contributor Author

LuciferYang commented Nov 19, 2020

@MaxGekk seems there is no corresponding rule in scalastyle-rules

@MaxGekk
Member

MaxGekk commented Nov 19, 2020

Just in case, Maven doesn't detect unused imports either:

$ build/mvn -Phadoop-3.2 -Phive-2.3 -Pyarn -Phadoop-cloud -Phive-thriftserver -Pkubernetes -Pmesos -Phive -Pkinesis-asl -Pspark-ganglia-lgpl -DskipTests package
...
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  28:56 min
[INFO] Finished at: 2020-11-19T19:09:35+03:00
[INFO] ------------------------------------------------------------------------

@LuciferYang
Contributor Author

Yes, the GitHub Actions build uses SBT with -Xfatal-warnings on by default.

Just in case, Maven doesn't detect unused imports either:

@MaxGekk mvn compile may only emit some compilation warnings about unused imports now; it seems we need to add -Xfatal-warnings to the Maven pom.xml too. Will this have other negative impacts? @HyukjinKwon

@LuciferYang
Contributor Author

LuciferYang commented Nov 20, 2020

@HyukjinKwon @srowen @MaxGekk do you know how to add -P:silencer:globalFilters=.*deprecated.* to scala-maven-plugin? There are still some differences in compilation options between Maven and SBT:

maven:

<args>
    <arg>-unchecked</arg>
    <arg>-deprecation</arg>
    <arg>-feature</arg>
    <arg>-explaintypes</arg>
    <arg>-Ywarn-unused-import</arg>
    <arg>-target:jvm-1.8</arg>
</args>

sbt:

Seq(
    "-Xfatal-warnings",
    "-deprecation",
    "-P:silencer:globalFilters=.*deprecated.*" // regex to catch deprecation warnings and suppress them
)

It seems we need to add -Xfatal-warnings and -P:silencer:globalFilters=.*deprecated.* to Maven too, but it raises a compile error if we add the option to scala-maven-plugin's <arg> list directly, as follows:

[ERROR] [Error] : bad option: -P:silencer:globalFilters=.*deprecated.*

And adding only -Xfatal-warnings to Maven will transform all warnings into errors.
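For what it's worth, silencer is a compiler plugin, so in Maven it would likely need to be registered through scala-maven-plugin's <compilerPlugins> section rather than passed as a plain <arg>; the coordinates and version below are an assumption based on silencer's published artifacts, not taken from this PR:

```xml
<!-- Assumption: silencer's Maven coordinates and version; verify against
     the version the SBT build uses before adopting this. -->
<configuration>
  <compilerPlugins>
    <compilerPlugin>
      <groupId>com.github.ghik</groupId>
      <artifactId>silencer-plugin_${scala.version}</artifactId>
      <version>1.7.1</version>
    </compilerPlugin>
  </compilerPlugins>
  <args>
    <!-- Once the plugin is on the compiler's plugin classpath,
         the -P:silencer option is no longer a "bad option". -->
    <arg>-P:silencer:globalFilters=.*deprecated.*</arg>
  </args>
</configuration>
```

The "bad option" error above happens because -P: options are only recognized when the named plugin is actually loaded by the compiler.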

@HyukjinKwon
Member

Doing it on the SBT side only is fine. The purpose of doing this is to catch the unused imports in PRs.

@@ -164,6 +164,7 @@
<commons.collections.version>3.2.2</commons.collections.version>
<scala.version>2.12.10</scala.version>
<scala.binary.version>2.12</scala.binary.version>
<scalac.arg.unused-imports>-Ywarn-unused-import</scalac.arg.unused-imports>
Member

Shall we just move it to SparkBuild.scala and let Maven not care about it?

Contributor Author

There should only be some warnings in the Maven build now; it seems that @MaxGekk wants Maven to check it as an error too?

Contributor Author

Is my guess right? @MaxGekk

Member

If it's easy to make both Maven and SBT build throw an error, it's fine. If that's difficult, let's move it to SparkBuild.scala, and make it SBT specific.

Contributor Author

Shall we just move it to SparkBuild.scala and let Maven not care about it?

It may be useful for Maven; it can also help Maven users check this, although these are only warnings in the Maven build.


Member

Thanks @LuciferYang

Contributor Author

@HyukjinKwon Is it necessary for us to add further compiler checks, like unused-locals?

Member

We can give it a shot. Are there a lot of instances of unused-locals to fix? If there are too many, I think we should collect some more feedback from other committers, because it makes it more difficult to maintain the code (e.g., reverting and backporting).

Contributor Author

OK ~ I'll collect the details of issues like unused-locals first, then file a new JIRA for tracking and discussion. Thanks @HyukjinKwon

@LuciferYang LuciferYang deleted the remove-imports-core-module branch June 6, 2022 03:45