Skip to content

[SPARK-18842][TESTS][LAUNCHER] De-duplicate paths in classpaths in commands for local-cluster mode to work around the path length limitation on Windows#16266

Closed
HyukjinKwon wants to merge 2 commits intoapache:masterfrom
HyukjinKwon:disable-local-cluster-tests
Closed

Conversation

@HyukjinKwon
Copy link
Member

@HyukjinKwon HyukjinKwon commented Dec 13, 2016

What changes were proposed in this pull request?

Currently, some tests are being failed and hanging on Windows due to this problem. For the reason in SPARK-18718, some tests using local-cluster mode were disabled on Windows due to the length limitation by paths given to classpaths.

The limitation seems roughly 32K (see the blog in MS and another reference) but in local-cluster mode, executors were being launched as processes with the command such as here in (only) tests.

This length is roughly 40K due to the classpaths given to java command. However, it seems duplicates are almost half of them. So, if we deduplicate the paths, it seems reduced to roughly 20K with the command, here.

Maybe, we should consider as some more paths are added in the future but it seems better than disabling all the tests for now with minimised changes.

Therefore, this PR proposes to deduplicate the paths in classpaths in case of launching executors as processes in local-cluster mode.

How was this patch tested?

Existing tests in ShuffleSuite and BroadcastJoinSuite manually via AppVeyor

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, great to get these back online

@HyukjinKwon
Copy link
Member Author

It seems I underestimated the instances using local-cluster causing failures due to the length limitation by paths and there are more than I expected. For example, there is LocalClusterSparkContext which is extended by some suite cases so I missed them as finding via grep.

Thank you for approving. I will try to double-check.

"SPARK_DIST_CLASSPATH" ->
(fullClasspath in Test).value.files.map(_.getAbsolutePath).mkString(":").stripSuffix(":"),
(fullClasspath in Test).value.files.map(_.getAbsolutePath)
.mkString(File.pathSeparator).stripSuffix(File.pathSeparator),
Copy link
Member Author

@HyukjinKwon HyukjinKwon Dec 13, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a required change because in addToClassPath, it is split with File.pathSeparator. If this is :, then it is not split and ends up with long duplicated paths on Windows.

@HyukjinKwon
Copy link
Member Author

Build started: [TESTS] org.apache.spark.ShuffleSuite PR-16266
Build started: [TESTS] org.apache.spark.sql.execution.joins.BroadcastJoinSuite PR-16266
Diff: master...spark-test:E05665C1-B7BA-4505-9821-0E82EA9AB928

@SparkQA
Copy link

SparkQA commented Dec 13, 2016

Test build #70085 has finished for PR 16266 at commit 41752b8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member Author

Let me try to fix the tests accordingly.

@HyukjinKwon HyukjinKwon changed the title [SPARK-18842][TESTS][LAUNCHER] De-duplicate paths in classpaths in processes for local-cluster mode to work around the path length limitation on Windows [WIP][SPARK-18842][TESTS][LAUNCHER] De-duplicate paths in classpaths in commands for local-cluster mode to work around the path length limitation on Windows Dec 13, 2016
@SparkQA
Copy link

SparkQA commented Dec 14, 2016

Test build #70111 has finished for PR 16266 at commit 8c17f3a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member Author

retest this please

@HyukjinKwon
Copy link
Member Author

Build started: [TESTS] org.apache.spark.ShuffleSuite PR-16266
Build started: [TESTS] org.apache.spark.sql.execution.joins.BroadcastJoinSuite PR-16266
Diff: master...spark-test:CCF7EE3E-8554-4764-BA30-5A06064444A3

@SparkQA
Copy link

SparkQA commented Dec 14, 2016

Test build #70121 has finished for PR 16266 at commit 8c17f3a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon HyukjinKwon changed the title [WIP][SPARK-18842][TESTS][LAUNCHER] De-duplicate paths in classpaths in commands for local-cluster mode to work around the path length limitation on Windows [SPARK-18842][TESTS][LAUNCHER] De-duplicate paths in classpaths in commands for local-cluster mode to work around the path length limitation on Windows Dec 14, 2016
@HyukjinKwon
Copy link
Member Author

@srowen, I think it is ready for a second look.

@HyukjinKwon
Copy link
Member Author

HyukjinKwon commented Dec 14, 2016

(note to myself) the last one in SparkLauncherSuite was not enabled back because it still fails (see the build and diff).

The reason seems the test uses cmd which has a length limitation up to 8K according to the reference above. The built command there is C:\projects\spark\bin\spark-submit.cmd --master local --conf "spark.... which has roughly 16K length whereas the other tests use java directly allowing 32K length.

@srowen
Copy link
Member

srowen commented Dec 14, 2016

Merged to master

@asfgit asfgit closed this in c6b8eb7 Dec 14, 2016
robert3005 pushed a commit to palantir/spark that referenced this pull request Dec 15, 2016
…mmands for local-cluster mode to work around the path length limitation on Windows

## What changes were proposed in this pull request?

Currently, some tests are being failed and hanging on Windows due to this problem. For the reason in SPARK-18718, some tests using `local-cluster` mode were disabled on Windows due to the length limitation by paths given to classpaths.

The limitation seems roughly 32K (see the [blog in MS](https://blogs.msdn.microsoft.com/oldnewthing/20031210-00/?p=41553/) and [another reference](https://support.thoughtworks.com/hc/en-us/articles/213248526-Getting-around-maximum-command-line-length-is-32767-characters-on-Windows)) but in `local-cluster` mode, executors were being launched as processes with the command such as [here](https://gist.github.com/HyukjinKwon/5bc81061c250d4af5a180869b59d42ea) in (only) tests.

This length is roughly 40K due to the classpaths given to `java` command. However, it seems duplicates are almost half of them. So, if we deduplicate the paths, it seems reduced to roughly 20K with the command, [here](https://gist.github.com/HyukjinKwon/dad0c8db897e5e094684a2dc6a417790).

Maybe, we should consider as some more paths are added in the future but it seems better than disabling all the tests for now with minimised changes.

Therefore, this PR proposes to deduplicate the paths in classpaths in case of launching executors as processes in `local-cluster` mode.

## How was this patch tested?

Existing tests in `ShuffleSuite` and `BroadcastJoinSuite` manually via AppVeyor

Author: hyukjinkwon <gurwls223@gmail.com>

Closes apache#16266 from HyukjinKwon/disable-local-cluster-tests.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
…mmands for local-cluster mode to work around the path length limitation on Windows

## What changes were proposed in this pull request?

Currently, some tests are being failed and hanging on Windows due to this problem. For the reason in SPARK-18718, some tests using `local-cluster` mode were disabled on Windows due to the length limitation by paths given to classpaths.

The limitation seems roughly 32K (see the [blog in MS](https://blogs.msdn.microsoft.com/oldnewthing/20031210-00/?p=41553/) and [another reference](https://support.thoughtworks.com/hc/en-us/articles/213248526-Getting-around-maximum-command-line-length-is-32767-characters-on-Windows)) but in `local-cluster` mode, executors were being launched as processes with the command such as [here](https://gist.github.com/HyukjinKwon/5bc81061c250d4af5a180869b59d42ea) in (only) tests.

This length is roughly 40K due to the classpaths given to `java` command. However, it seems duplicates are almost half of them. So, if we deduplicate the paths, it seems reduced to roughly 20K with the command, [here](https://gist.github.com/HyukjinKwon/dad0c8db897e5e094684a2dc6a417790).

Maybe, we should consider as some more paths are added in the future but it seems better than disabling all the tests for now with minimised changes.

Therefore, this PR proposes to deduplicate the paths in classpaths in case of launching executors as processes in `local-cluster` mode.

## How was this patch tested?

Existing tests in `ShuffleSuite` and `BroadcastJoinSuite` manually via AppVeyor

Author: hyukjinkwon <gurwls223@gmail.com>

Closes apache#16266 from HyukjinKwon/disable-local-cluster-tests.
@HyukjinKwon HyukjinKwon deleted the disable-local-cluster-tests branch January 2, 2018 03:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants