
[SPARK-41188][CORE][ML] Set executorEnv OMP_NUM_THREADS to be spark.task.cpus by default for spark executor JVM processes #38699

Closed
WeichenXu123 wants to merge 2 commits into apache:master from WeichenXu123:SPARK-41188

Conversation

@WeichenXu123 (Contributor) commented Nov 18, 2022

Signed-off-by: Weichen Xu weichen.xu@databricks.com

What changes were proposed in this pull request?

Set executorEnv OMP_NUM_THREADS to be spark.task.cpus by default for spark executor JVM processes.
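
A minimal sketch of the idea (an illustration only, not the exact diff in this PR): when SparkContext assembles the executor environment from the `spark.executorEnv.*` settings, default `OMP_NUM_THREADS` to `spark.task.cpus` unless the user has already set it. The helper name below is hypothetical.

```scala
import org.apache.spark.SparkConf
import scala.collection.mutable

// Illustrative helper, not the actual SparkContext code: default
// OMP_NUM_THREADS to spark.task.cpus (which itself defaults to "1")
// unless the user already provided a value via spark.executorEnv.*.
def defaultOmpNumThreads(conf: SparkConf, executorEnvs: mutable.Map[String, String]): Unit = {
  if (!executorEnvs.contains("OMP_NUM_THREADS")) {
    executorEnvs("OMP_NUM_THREADS") = conf.get("spark.task.cpus", "1")
  }
}
```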

Why are the changes needed?

This limits the number of threads OpenBLAS routines spawn to the number of CPU cores allocated to each task (spark.task.cpus), because some Spark ML algorithms call OpenBLAS via netlib-java.
For example, Spark ALS estimator training calls the LAPACK routine `dppsv`, which internally calls the BLAS library. When that library is OpenBLAS, it uses all CPU cores by default. But Spark launches multiple tasks on a worker, each task may call `dppsv` at the same time, and each call creates one thread per CPU core, which causes CPU oversubscription.
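
To make the oversubscription concrete: on a 16-core executor with spark.task.cpus=1, up to 16 tasks run concurrently; if each task's `dppsv` call lets OpenBLAS spawn 16 threads, roughly 16 × 16 = 256 BLAS threads contend for 16 cores, whereas with this default it is 16 × 1 = 16. Users who want a different BLAS thread count can still set it explicitly through the standard spark.executorEnv mechanism; a sketch (the app name and the value "4" are only examples):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Example override: spark.executorEnv.<NAME> sets an environment variable
// on executors; an explicit value takes precedence over the new default.
val conf = new SparkConf()
  .setAppName("blas-thread-override")            // example app name
  .set("spark.executorEnv.OMP_NUM_THREADS", "4") // example value
// Master is assumed to be supplied via spark-submit in this sketch.
val sc = new SparkContext(conf)
```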

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manually.

Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
@LuciferYang (Contributor) left a comment

+1, LGTM

@mridulm (Contributor) commented Nov 18, 2022

If we are setting it in SparkContext, do we want to get rid of this from other places like PythonRunner.compute?

@WeichenXu123 (Contributor, Author)

> If we are setting it in SparkContext, do we want to get rid of this from other places like PythonRunner.compute?

I think we can remove code in PythonRunner.compute
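
For context, the per-task guard in PythonRunner looks roughly like the sketch below (paraphrased, not a verbatim quote of the source; the function name is hypothetical). Once the executor JVM itself carries OMP_NUM_THREADS, Python workers launched by the executor inherit it, so a guard like this becomes redundant.

```scala
import java.util.{Map => JMap}
import org.apache.spark.SparkConf

// Paraphrased sketch of the existing guard: only default OMP_NUM_THREADS
// for the Python worker when the user has not set
// spark.executorEnv.OMP_NUM_THREADS explicitly.
def maybeDefaultOmpForPythonWorker(conf: SparkConf, envVars: JMap[String, String]): Unit = {
  if (conf.getOption("spark.executorEnv.OMP_NUM_THREADS").isEmpty) {
    envVars.put("OMP_NUM_THREADS", conf.get("spark.task.cpus", "1"))
  }
}
```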

Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
WeichenXu123 added a commit that referenced this pull request Nov 19, 2022
[SPARK-41188][CORE][ML] Set executorEnv OMP_NUM_THREADS to be spark.task.cpus by default for spark executor JVM processes

(cherry picked from commit 82a41d8)
WeichenXu123 added a commit that referenced this pull request Nov 19, 2022
[SPARK-41188][CORE][ML] Set executorEnv OMP_NUM_THREADS to be spark.task.cpus by default for spark executor JVM processes

(cherry picked from commit 82a41d8)
@WeichenXu123 (Contributor, Author)

Merged to master / branch-3.3 / branch-3.2

SandishKumarHN pushed a commit to SandishKumarHN/spark that referenced this pull request Dec 12, 2022
[SPARK-41188][CORE][ML] Set executorEnv OMP_NUM_THREADS to be spark.task.cpus by default for spark executor JVM processes

beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 15, 2022
[SPARK-41188][CORE][ML] Set executorEnv OMP_NUM_THREADS to be spark.task.cpus by default for spark executor JVM processes

beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 18, 2022
[SPARK-41188][CORE][ML] Set executorEnv OMP_NUM_THREADS to be spark.task.cpus by default for spark executor JVM processes

@jzhuge (Member) commented Feb 27, 2023

> If we are setting it in SparkContext, do we want to get rid of this from other places like PythonRunner.compute?
>
> I think we can remove code in PythonRunner.compute

Found an issue in YARN (SPARK-42596). Could you double check?

@HyukjinKwon (Member)

Thanks for pointing it out and making a PR. I left a comment in your PR.

sunchao pushed a commit to sunchao/spark that referenced this pull request Jun 2, 2023
[SPARK-41188][CORE][ML] Set executorEnv OMP_NUM_THREADS to be spark.task.cpus by default for spark executor JVM processes
