Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: spark.executor.cores' default value based on master when counting workers #855

Merged
merged 7 commits into from
Jun 6, 2020
Merged

fix: spark.executor.cores' default value based on master when counting workers #855

merged 7 commits into from
Jun 6, 2020

Conversation

ocworld
Copy link
Contributor

@ocworld ocworld commented Apr 20, 2020

This pr is fixing spark.executor.cores' default value based on master when counting workers.

The default value of spark.executor.cores is different as what master is.

It is 1 in YARN mode. In other case, It is all the available cores on the worker in standalone and Mesos coarse-grained modes.
(https://spark.apache.org/docs/latest/configuration.html)

Currently, I've tried to use mmlspark based on spark on k8s. When I did not set spark.executor.cores, error was occured. It is because not matched expected num of mmlspark workers with actual value.

@imatiach-msft
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@imatiach-msft
Copy link
Contributor

will need to test this out a bit since there have been a lot of issues in the past around this specific code section

@codecov
Copy link

codecov bot commented Apr 21, 2020

Codecov Report

Merging #855 into master will increase coverage by 0.66%.
The diff coverage is 77.14%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #855      +/-   ##
==========================================
+ Coverage   84.56%   85.22%   +0.66%     
==========================================
  Files         189      189              
  Lines        8709     8733      +24     
  Branches      544      543       -1     
==========================================
+ Hits         7365     7443      +78     
+ Misses       1344     1290      -54     
Impacted Files Coverage Δ
...om/microsoft/ml/spark/core/utils/ClusterUtil.scala 69.44% <77.14%> (+2.77%) ⬆️
...osoft/ml/spark/io/http/PartitionConsolidator.scala 93.33% <0.00%> (-2.23%) ⬇️
...a/com/microsoft/ml/spark/io/image/ImageUtils.scala 88.23% <0.00%> (+9.80%) ⬆️
...rosoft/ml/spark/core/schema/BinaryFileSchema.scala 100.00% <0.00%> (+12.50%) ⬆️
...icrosoft/ml/spark/io/binary/BinaryFileFormat.scala 97.72% <0.00%> (+14.77%) ⬆️
...icrosoft/ml/spark/io/binary/BinaryFileReader.scala 73.91% <0.00%> (+17.39%) ⬆️
...spark/ml/source/image/PatchedImageFileFormat.scala 88.88% <0.00%> (+40.74%) ⬆️
.../microsoft/ml/spark/core/env/StreamUtilities.scala 85.18% <0.00%> (+59.25%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4ae0fe8...aff36e6. Read the comment docs.

@imatiach-msft
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@imatiach-msft
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@imatiach-msft
Copy link
Contributor

the build was all green but it's not updating in github, weird, looks like some github bug

@ocworld
Copy link
Contributor Author

ocworld commented Apr 27, 2020

@imatiach-msft I wrote new unittest and pushed. However, It is hard to tested mmlspark's unittest in my local environment . Please check it :)

@imatiach-msft
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@ocworld
Copy link
Contributor Author

ocworld commented May 3, 2020

@imatiach-msft Can I see why the build failed?

@imatiach-msft
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@imatiach-msft
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@imatiach-msft
Copy link
Contributor

imatiach-msft commented Jun 2, 2020

@ocworld the style checker is failing on:

[error] /home/vsts/work/1/s/src/test/scala/com/microsoft/ml/spark/core/utils/VerifyClusterUtil.scala: Header does not match expected text

It looks like you need to add this copyright message at the top of the file VerifyClusterUtil.scala to pass the style checker:

// Copyright (C) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in project root for information.

@imatiach-msft
Copy link
Contributor

@ocworld I tried this PR out on a cluster (azure databricks) with Higgs dataset and it didn't change the training time for me (compared to previous runs), so it seems safe to me at least in that scenario. I think they use spark standalone mode for databricks clusters.

@ocworld
Copy link
Contributor Author

ocworld commented Jun 5, 2020

@ocworld the style checker is failing on:

[error] /home/vsts/work/1/s/src/test/scala/com/microsoft/ml/spark/core/utils/VerifyClusterUtil.scala: Header does not match expected text

It looks like you need to add this copyright message at the top of the file VerifyClusterUtil.scala to pass the style checker:

// Copyright (C) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in project root for information.

@imatiach-msft Copyright is added in the file

@ocworld
Copy link
Contributor Author

ocworld commented Jun 5, 2020

@ocworld I tried this PR out on a cluster (azure databricks) with Higgs dataset and it didn't change the training time for me (compared to previous runs), so it seems safe to me at least in that scenario. I think they use spark standalone mode for databricks clusters.

@imatiach-msft Thanks for your test

@imatiach-msft
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@imatiach-msft
Copy link
Contributor

it looks like @mhamilton723 is required to review as a code owner (can't seem to merge), can you please review this PR @mhamilton723 ?

@mhamilton723 mhamilton723 merged commit 64481e9 into microsoft:master Jun 6, 2020
@ocworld
Copy link
Contributor Author

ocworld commented Jun 8, 2020

@AhnLab-OSS

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants