[SPARK-34115][CORE] Check SPARK_TESTING as lazy val to avoid slowdown #31244

Closed
nob13 wants to merge 3 commits

Contributor

@nob13 nob13 commented Jan 19, 2021

What changes were proposed in this pull request?

Check SPARK_TESTING via a lazy val to avoid a slowdown when there are many environment variables.

Why are the changes needed?

If there are many environment variables, sys.env is very slow. Because Utils.isTesting is called very often during DataFrame optimization, this can slow down evaluation considerably.

An example for triggering the problem can be found in the bug ticket https://issues.apache.org/jira/browse/SPARK-34115
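
To illustrate the idea, here is a minimal sketch rather than the actual patch. The object name `TestingFlagSketch`, the counter `envReads`, and both accessor names are hypothetical and exist only for this example. It relies on the fact that, in the Scala standard library versions I am aware of, every access to `sys.env` copies the JVM environment into a new immutable Map, so the cost grows with the number of variables; a `lazy val` performs that copy only once.

```scala
object TestingFlagSketch {
  // Hypothetical counter, only here to make the number of environment reads visible.
  var envReads = 0

  private def envHasSparkTesting(): Boolean = {
    envReads += 1
    // sys.env builds a new immutable Map from System.getenv() on each access.
    sys.env.contains("SPARK_TESTING")
  }

  // Uncached variant: pays for the environment copy on every call.
  def isTestingUncached: Boolean = envHasSparkTesting()

  // Cached variant: evaluated on first access, then reused for the JVM lifetime.
  lazy val isTestingCached: Boolean = envHasSparkTesting()

  def main(args: Array[String]): Unit = {
    (1 to 1000).foreach(_ => isTestingUncached)
    val afterUncached = envReads
    (1 to 1000).foreach(_ => isTestingCached)
    println(s"uncached reads: $afterUncached, cached reads: ${envReads - afterUncached}")
  }
}
```

Caching looks safe for the environment part because the environment of a running JVM does not change; anything that can change at runtime would need separate handling.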

Does this PR introduce any user-facing change?

No

How was this patch tested?

With the example provided in the ticket.

@nob13 nob13 changed the title from "[WIP][SPARK-34115][CORE] Check SPARK_TESTING as lazy val to avoid slowdown" to "[SPARK-34115][CORE] Check SPARK_TESTING as lazy val to avoid slowdown" on Jan 19, 2021
@HyukjinKwon
Member

cc @dongjoon-hyun and @holdenk FYI. According to the JIRA, this seems to particularly affect K8s, which automatically creates many environment variables.

@github-actions github-actions bot added the CORE label Jan 19, 2021
nob13 and others added 2 commits January 19, 2021 11:59
@HyukjinKwon
Member

Jenkins, ok to test

@SparkQA

SparkQA commented Jan 19, 2021

Test build #134225 has started for PR 31244 at commit b52e427.

@SparkQA

SparkQA commented Jan 19, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38809/

@SparkQA

SparkQA commented Jan 19, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38809/

@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented Jan 19, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38813/

@SparkQA

SparkQA commented Jan 19, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38813/

@SparkQA

SparkQA commented Jan 19, 2021

Test build #134228 has finished for PR 31244 at commit b52e427.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

HyukjinKwon commented Jan 20, 2021

Merged to master, branch-3.1 and branch-3.0.

It is technically an improvement but I think it's really safe to backport.

@HyukjinKwon
Member

Thanks for your contribution @nob13

HyukjinKwon pushed a commit that referenced this pull request Jan 20, 2021
### What changes were proposed in this pull request?
Check SPARK_TESTING as lazy val to avoid slow down when there are many environment variables

### Why are the changes needed?
If there are many environment variables, sys.env is very slow. Because Utils.isTesting is called very often during DataFrame optimization, this can slow down evaluation considerably.

An example for triggering the problem can be found in the bug ticket https://issues.apache.org/jira/browse/SPARK-34115

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
With the example provided in the ticket.

Closes #31244 from nob13/bug/34115.

Lead-authored-by: Norbert Schultz <norbert.schultz@reactivecore.de>
Co-authored-by: Norbert Schultz <noschultz@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(cherry picked from commit c3d8352)
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
HyukjinKwon pushed a commit that referenced this pull request Jan 20, 2021
@maropu
Member

maropu commented Jan 20, 2021

Looks good to me, too.

What changes were proposed in this pull request?
Check SPARK_TESTING as lazy val to avoid slow down when there are many environment variables

nit: Could you update the PR description accordingly?

@dongjoon-hyun
Member

Thank you, @nob13 and @HyukjinKwon and @maropu

skestle pushed a commit to skestle/spark that referenced this pull request Feb 3, 2021