[SPARK-45527][CORE][TESTS][FOLLOWUP] Reduce the number of threads from 1k to 100 in TaskSchedulerImplSuite #45264

Closed


dongjoon-hyun
Member

@dongjoon-hyun dongjoon-hyun commented Feb 26, 2024

What changes were proposed in this pull request?

This PR is a follow-up of #43494 in order to reduce the number of threads of SparkContext from 1k to 100 in the test environment.

Why are the changes needed?

To reduce the test resource requirement. 1000 threads seem to be too many for some CI systems with limited resources.

Warning: [766.327s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 4096k, guardsize: 16k, detached.
Warning: [766.327s][warning][os,thread] Failed to start the native thread for java.lang.Thread "dispatcher-event-loop-840"
*** RUN ABORTED ***
An exception or error caused a run to abort: unable to create native thread: possibly out of memory or process/resource limits reached 
  java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
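
As a rough illustration of why the thread count matters, the sketch below multiplies the per-thread stack and guard sizes reported in the warning above (4096k and 16k) by the two thread counts discussed in this PR. It is only a back-of-the-envelope estimate: the class and method names are hypothetical, it assumes the guard region is reserved in addition to the stack, and it counts reserved address space rather than resident memory. EAGAIN from pthread_create can also come from a per-process or per-user thread limit, which this arithmetic does not model.

```java
// Hypothetical helper, not part of Spark: estimates reserved thread-stack address space.
class ThreadStackEstimate {

    /**
     * Upper bound on the address space reserved for thread stacks, in KiB.
     * Assumes every thread reserves stackSizeKiB plus a separate guard region.
     */
    static long reservedStackKiB(int threads, long stackSizeKiB, long guardSizeKiB) {
        return (long) threads * (stackSizeKiB + guardSizeKiB);
    }

    public static void main(String[] args) {
        long stackSizeKiB = 4096; // "stacksize: 4096k" from the JVM warning
        long guardSizeKiB = 16;   // "guardsize: 16k" from the JVM warning

        // Compare the thread count before (1000) and after (100) this PR.
        for (int threads : new int[] {1000, 100}) {
            long kib = reservedStackKiB(threads, stackSizeKiB, guardSizeKiB);
            System.out.printf("%d threads reserve about %d MiB of stack address space%n",
                    threads, kib / 1024);
        }
    }
}
```

Under these assumptions, going from 1000 to 100 threads cuts the reserved stack address space by a factor of ten.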

Does this PR introduce any user-facing change?

No, this is a test-case update.

How was this patch tested?

Pass the CIs and monitor the Daily Apple Silicon test.

Was this patch authored or co-authored using generative AI tooling?

No.

@dongjoon-hyun
Member Author

cc @wbo4958 and @tgravescs

@dongjoon-hyun changed the title from "[SPARK-45527][CORE][TESTS][FOLLOWUP] Reduce the number of threads from 1k to 100" to "[SPARK-45527][CORE][TESTS][FOLLOWUP] Reduce the number of threads from 1k to 100 in TaskSchedulerImplSuite" on Feb 26, 2024
@tgravescs
Contributor

tgravescs commented Feb 26, 2024

+1 thanks, I missed that it was actually creating that many threads

@dongjoon-hyun
Member Author

dongjoon-hyun commented Feb 26, 2024

Thank you, @tgravescs !
Merged to master.

This PR changes only a single test suite and I verified that it passed locally without any issue.

$ build/sbt "core/testOnly *.TaskSchedulerImplSuite"
...
[info] Run completed in 1 minute, 22 seconds.
[info] Total number of tests run: 236
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 236, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 93 s (01:33), completed Feb 26, 2024, 2:31:33 PM

@dongjoon-hyun dongjoon-hyun deleted the SPARK-45527 branch February 26, 2024 22:32
@wbo4958
Contributor

wbo4958 commented Feb 27, 2024

Thx for your PR

HyukjinKwon added a commit that referenced this pull request Feb 28, 2024
…e, and reduce the resource usage

### What changes were proposed in this pull request?

This PR is a followup of #45272, #45268, #45264 and #45283 that further increases the timeouts and decreases the resources needed during CI.

### Why are the changes needed?

To make the scheduled build pass https://github.com/apache/spark/actions/runs/8054862135/job/22053180441.

At least as far as I can tell, those changes are effective (they make the tests less flaky and fail less often).

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

I manually ran them via the IDE.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45297 from HyukjinKwon/SPARK-47185-SPARK-47181-followup.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
TakawaAkirayo pushed a commit to TakawaAkirayo/spark that referenced this pull request Mar 4, 2024
…m 1k to 100 in `TaskSchedulerImplSuite`

### What changes were proposed in this pull request?

This PR is a follow-up of apache#43494 in order to reduce the number of threads of SparkContext from 1k to 100 in the test environment.

### Why are the changes needed?

To reduce the test resource requirement. 1000 threads seem to be too many for some CI systems with limited resources.
- https://github.com/apache/spark/actions/workflows/build_maven_java21_macos14.yml
  - https://github.com/apache/spark/actions/runs/8054862135/job/22000403549
```
Warning: [766.327s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 4096k, guardsize: 16k, detached.
Warning: [766.327s][warning][os,thread] Failed to start the native thread for java.lang.Thread "dispatcher-event-loop-840"
*** RUN ABORTED ***
An exception or error caused a run to abort: unable to create native thread: possibly out of memory or process/resource limits reached
  java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
```

### Does this PR introduce _any_ user-facing change?

No, this is a test-case update.

### How was this patch tested?

Pass the CIs and monitor the Daily Apple Silicon test.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#45264 from dongjoon-hyun/SPARK-45527.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>