fix: spark.executor.cores' default value based on master when counting workers #855

ocworld · 2020-04-20T12:19:03Z

This PR fixes the default value of spark.executor.cores used when counting workers so that it depends on the master.

The default value of spark.executor.cores differs depending on the master.

It is 1 in YARN mode, and all the available cores on the worker in standalone and Mesos coarse-grained modes.
(https://spark.apache.org/docs/latest/configuration.html)
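A minimal sketch of the rule described above, in plain Scala so it runs without Spark. It is hypothetical and not the ClusterUtil.scala code from this PR: the object and method names are invented, a `Map` stands in for `SparkConf`, the caller supplies the worker's core count, and treating `k8s://` masters like YARN (one core per executor) is an assumption based on the Kubernetes scenario in this report.

```scala
// Hypothetical sketch: not the actual ClusterUtil.scala implementation from this PR.
object ExecutorCoresDefaults {

  /** Assumed default of spark.executor.cores for a given spark.master value.
    *
    * @param master       value of spark.master, if known (e.g. "yarn", "spark://host:7077")
    * @param machineCores cores available on one worker machine, supplied by the caller
    */
  def defaultExecutorCores(master: Option[String], machineCores: Int): Int =
    master match {
      // YARN: Spark documents a default of 1 core per executor.
      case Some(m) if m.startsWith("yarn") => 1
      // Kubernetes: ASSUMED to behave like YARN here (1 core per executor).
      case Some(m) if m.startsWith("k8s://") => 1
      // Standalone ("spark://...") and Mesos coarse-grained ("mesos://..."):
      // an executor takes all available cores on the worker. Missing or other
      // masters (including local mode) fall back to the same value in this sketch.
      case _ => machineCores
    }

  /** An explicitly set spark.executor.cores always takes precedence over the default. */
  def executorCores(conf: Map[String, String], machineCores: Int): Int =
    conf.get("spark.executor.cores")
      .map(_.trim.toInt)
      .getOrElse(defaultExecutorCores(conf.get("spark.master"), machineCores))
}
```

For example, `ExecutorCoresDefaults.executorCores(Map("spark.master" -> "k8s://https://host:443"), 8)` returns 1 under these assumptions, while the same call with `"spark://host:7077"` returns 8.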

I tried to use mmlspark with Spark on Kubernetes (k8s). When I did not set spark.executor.cores, an error occurred because the expected number of mmlspark workers did not match the actual number.
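A small illustration of how such a mismatch can arise. This is a hypothetical model, not mmlspark's code: it assumes the expected worker count is one worker per task slot, computed as executors × (executor cores / `spark.task.cpus`), and the cluster sizes in `main` are invented for the example.

```scala
// Hypothetical illustration of why a wrong executor-cores default breaks the worker count.
object WorkerCount {

  /** Task slots per executor: how many tasks one executor can run concurrently. */
  def tasksPerExecutor(executorCores: Int, taskCpus: Int = 1): Int = {
    require(executorCores > 0 && taskCpus > 0, "cores must be positive")
    executorCores / taskCpus
  }

  /** Workers expected if one worker is started per task slot (assumed model). */
  def expectedWorkers(numExecutors: Int, executorCores: Int, taskCpus: Int = 1): Int =
    numExecutors * tasksPerExecutor(executorCores, taskCpus)

  def main(args: Array[String]): Unit = {
    // Assumed scenario: 2 executors on 8-core nodes, spark.executor.cores unset,
    // and the cluster manager actually gives each executor only 1 core.
    val assumed = expectedWorkers(numExecutors = 2, executorCores = 8) // counts machine cores
    val actual  = expectedWorkers(numExecutors = 2, executorCores = 1) // real task slots
    println(s"expected=$assumed actual=$actual") // prints "expected=16 actual=2"
  }
}
```

Under this model, any code that waits for the "expected" number of workers would wait for 16 while only 2 tasks can ever start, which is why the default must follow the master.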


imatiach-msft · 2020-04-21T14:46:45Z

/azp run

azure-pipelines · 2020-04-21T14:46:56Z

Azure Pipelines successfully started running 1 pipeline(s).

imatiach-msft · 2020-04-21T14:48:00Z

will need to test this out a bit since there have been a lot of issues in the past around this specific code section

codecov · 2020-04-21T14:53:39Z

Codecov Report

Merging #855 into master will increase coverage by 0.66%.
The diff coverage is 77.14%.

@@            Coverage Diff             @@
##           master     #855      +/-   ##
==========================================
+ Coverage   84.56%   85.22%   +0.66%     
==========================================
  Files         189      189              
  Lines        8709     8733      +24     
  Branches      544      543       -1     
==========================================
+ Hits         7365     7443      +78     
+ Misses       1344     1290      -54

Impacted Files	Coverage Δ
...om/microsoft/ml/spark/core/utils/ClusterUtil.scala	`69.44% <77.14%> (+2.77%)`	⬆️
...osoft/ml/spark/io/http/PartitionConsolidator.scala	`93.33% <0.00%> (-2.23%)`	⬇️
...a/com/microsoft/ml/spark/io/image/ImageUtils.scala	`88.23% <0.00%> (+9.80%)`	⬆️
...rosoft/ml/spark/core/schema/BinaryFileSchema.scala	`100.00% <0.00%> (+12.50%)`	⬆️
...icrosoft/ml/spark/io/binary/BinaryFileFormat.scala	`97.72% <0.00%> (+14.77%)`	⬆️
...icrosoft/ml/spark/io/binary/BinaryFileReader.scala	`73.91% <0.00%> (+17.39%)`	⬆️
...spark/ml/source/image/PatchedImageFileFormat.scala	`88.88% <0.00%> (+40.74%)`	⬆️
.../microsoft/ml/spark/core/env/StreamUtilities.scala	`85.18% <0.00%> (+59.25%)`	⬆️

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4ae0fe8...aff36e6.

imatiach-msft · 2020-04-23T15:31:36Z

/azp run

azure-pipelines · 2020-04-23T15:31:48Z

Azure Pipelines successfully started running 1 pipeline(s).

imatiach-msft · 2020-04-23T18:40:10Z

/azp run

azure-pipelines · 2020-04-23T18:40:21Z

Azure Pipelines successfully started running 1 pipeline(s).

imatiach-msft · 2020-04-23T18:45:02Z

the build was all green but it's not updating in github, weird, looks like some github bug

ocworld · 2020-04-27T13:48:58Z

@imatiach-msft I wrote a new unit test and pushed it. However, it is hard to run mmlspark's unit tests in my local environment. Please check it :)

imatiach-msft · 2020-04-27T15:04:49Z

/azp run

azure-pipelines · 2020-04-27T15:05:03Z

Azure Pipelines successfully started running 1 pipeline(s).

ocworld · 2020-05-03T03:42:39Z

@imatiach-msft Can I see why the build failed?

imatiach-msft · 2020-05-08T15:02:19Z

/azp run

azure-pipelines · 2020-05-08T15:02:32Z

Azure Pipelines successfully started running 1 pipeline(s).

imatiach-msft · 2020-06-02T04:01:01Z

/azp run

azure-pipelines · 2020-06-02T04:01:11Z

Azure Pipelines successfully started running 1 pipeline(s).

imatiach-msft · 2020-06-02T04:32:23Z

@ocworld the style checker is failing on:

[error] /home/vsts/work/1/s/src/test/scala/com/microsoft/ml/spark/core/utils/VerifyClusterUtil.scala: Header does not match expected text

It looks like you need to add this copyright message at the top of the file VerifyClusterUtil.scala to pass the style checker:

// Copyright (C) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in project root for information.

src/main/scala/com/microsoft/ml/spark/core/utils/ClusterUtil.scala

imatiach-msft · 2020-06-02T05:09:37Z

@ocworld I tried this PR out on a cluster (Azure Databricks) with the Higgs dataset and it didn't change the training time for me (compared to previous runs), so it seems safe to me, at least in that scenario. I think they use Spark standalone mode for Databricks clusters.

ocworld · 2020-06-05T14:41:58Z

@imatiach-msft I added the copyright header to the file.

ocworld · 2020-06-05T14:43:34Z

@imatiach-msft Thanks for testing it.

imatiach-msft · 2020-06-05T14:47:31Z

/azp run

azure-pipelines · 2020-06-05T14:47:41Z

Azure Pipelines successfully started running 1 pipeline(s).

imatiach-msft · 2020-06-05T19:04:52Z

It looks like @mhamilton723 is required to review as a code owner (I can't seem to merge). Can you please review this PR, @mhamilton723?

ocworld · 2020-06-08T05:39:15Z

@AhnLab-OSS

fix default values based on master

c23417b

- https://spark.apache.org/docs/latest/configuration.html

ocworld requested a review from mhamilton723 as a code owner April 20, 2020 12:19

Merge branch 'master' into fix-default-executor-cores

0a9e510

add unittest for clusterutils

2142225

Merge branch 'master' into fix-default-executor-cores

d2c65b7

Merge branch 'master' into fix-default-executor-cores

e399659

imatiach-msft previously approved these changes Jun 2, 2020


src/main/scala/com/microsoft/ml/spark/core/utils/ClusterUtil.scala (outdated, resolved)

Keunhyun Oh added 2 commits June 5, 2020 23:38

Merge branch 'master' into fix-default-executor-cores

639f071

fix comment and add copyright

aff36e6

ocworld dismissed imatiach-msft’s stale review via aff36e6 June 5, 2020 14:41

imatiach-msft approved these changes Jun 5, 2020


mhamilton723 merged commit 64481e9 into microsoft:master Jun 6, 2020

imatiach-msft mentioned this pull request Jun 24, 2020

[LightGBM] Train Lambdamart failed with "org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 1" #879

Closed
