
[SPARK-45527][CORE][TESTS][FOLLOW-UP] Reduce the number of test cases in fraction resource calculation #45272

Closed

Conversation

HyukjinKwon
Member

What changes were proposed in this pull request?

There are two more instances that were mistakenly missed in #45268. This PR fixes both.

Why are the changes needed?

See #45268

Does this PR introduce any user-facing change?

No, test-only.

How was this patch tested?

Manually

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the CORE label Feb 27, 2024
@HyukjinKwon
Member Author

@dongjoon-hyun I am sorry. This PR fixes the leftover.


@dongjoon-hyun dongjoon-hyun left a comment


It's great~

@dongjoon-hyun
Member

[info] TaskSchedulerImplSuite:
[info] - SPARK-32653: Decommissioned host/executor should be considered as inactive (409 milliseconds)
[info] - Scheduler does not always schedule tasks on the same workers (295 milliseconds)
[info] - Scheduler correctly accounts for multiple CPUs per task (26 milliseconds)
[info] - SPARK-18886 - partial offers (isAllFreeResources = false) reset timer before any resources have been rejected (27 milliseconds)
[info] - SPARK-18886 - delay scheduling timer is reset when it accepts all resources offered when isAllFreeResources = true (22 milliseconds)
[info] - SPARK-18886 - task set with no locality requirements should not starve one with them (21 milliseconds)
[info] - SPARK-18886 - partial resource offers (isAllFreeResources = false) reset time if last full resource offer (isAllResources = true) was accepted as well as any following partial resource offers (20 milliseconds)
[info] - SPARK-18886 - partial resource offers (isAllFreeResources = false) do not reset time if any offer was rejected since last full offer was fully accepted (19 milliseconds)
[info] - Scheduler does not crash when tasks are not serializable (23 milliseconds)
[info] - concurrent attempts for the same stage only have one active taskset (21 milliseconds)
[info] - don't schedule more tasks after a taskset is zombie (21 milliseconds)
[info] - if a zombie attempt finishes, continue scheduling tasks for non-zombie attempts (20 milliseconds)
[info] - tasks are not re-scheduled while executor loss reason is pending (24 milliseconds)
OpenJDK 64-Bit Server VM warning: Sharing is only supported for boot loader classes because bootstrap classpath has been appended
[info] - scheduled tasks obey task and stage excludelist (731 milliseconds)
[info] - scheduled tasks obey node and executor excludelists (49 milliseconds)
[info] - abort stage when all executors are excluded and we cannot acquire new executor (38 milliseconds)
[info] - SPARK-22148 abort timer should kick in when task is completely excluded & no new executor can be acquired (32 milliseconds)
[info] - SPARK-22148 try to acquire a new executor when task is unschedulable with 1 executor (29 milliseconds)
[info] - SPARK-22148 abort timer should clear unschedulableTaskSetToExpiryTime for all TaskSets (45 milliseconds)
[info] - SPARK-22148 Ensure we don't abort the taskSet if we haven't been completely excluded (31 milliseconds)
[info] - SPARK-31418 abort timer should kick in when task is completely excluded &allocation manager could not acquire a new executor before the timeout (24 milliseconds)
[info] - Excluded node for entire task set prevents per-task exclusion checks: iteration 0 (50 milliseconds)
[info] - Excluded node for entire task set prevents per-task exclusion checks: iteration 1 (37 milliseconds)
[info] - Excluded node for entire task set prevents per-task exclusion checks: iteration 2 (36 milliseconds)
[info] - Excluded node for entire task set prevents per-task exclusion checks: iteration 3 (37 milliseconds)
[info] - Excluded node for entire task set prevents per-task exclusion checks: iteration 4 (35 milliseconds)
[info] - Excluded node for entire task set prevents per-task exclusion checks: iteration 5 (33 milliseconds)
[info] - Excluded node for entire task set prevents per-task exclusion checks: iteration 6 (44 milliseconds)
[info] - Excluded node for entire task set prevents per-task exclusion checks: iteration 7 (33 milliseconds)
[info] - Excluded node for entire task set prevents per-task exclusion checks: iteration 8 (33 milliseconds)
[info] - Excluded node for entire task set prevents per-task exclusion checks: iteration 9 (31 milliseconds)
[info] - Excluded executor for entire task set prevents per-task exclusion checks: iteration 0 (34 milliseconds)
[info] - Excluded executor for entire task set prevents per-task exclusion checks: iteration 1 (30 milliseconds)
[info] - Excluded executor for entire task set prevents per-task exclusion checks: iteration 2 (31 milliseconds)
[info] - Excluded executor for entire task set prevents per-task exclusion checks: iteration 3 (29 milliseconds)
[info] - Excluded executor for entire task set prevents per-task exclusion checks: iteration 4 (31 milliseconds)
[info] - Excluded executor for entire task set prevents per-task exclusion checks: iteration 5 (30 milliseconds)
[info] - Excluded executor for entire task set prevents per-task exclusion checks: iteration 6 (29 milliseconds)
[info] - Excluded executor for entire task set prevents per-task exclusion checks: iteration 7 (27 milliseconds)
[info] - Excluded executor for entire task set prevents per-task exclusion checks: iteration 8 (39 milliseconds)
[info] - Excluded executor for entire task set prevents per-task exclusion checks: iteration 9 (32 milliseconds)
[info] - abort stage if executor loss results in unschedulability from previously failed tasks (25 milliseconds)
[info] - don't abort if there is an executor available, though it hasn't had scheduled tasks yet (21 milliseconds)
[info] - SPARK-16106 locality levels updated if executor added to existing host (15 milliseconds)
[info] - scheduler checks for executors that can be expired from excludeOnFailure (14 milliseconds)
[info] - if an executor is lost then the state for its running tasks is cleaned up (SPARK-18553) (15 milliseconds)
[info] - if a task finishes with TaskState.LOST its executor is marked as dead (15 milliseconds)
[info] - Locality should be used for bulk offers even with delay scheduling off (15 milliseconds)
[info] - With delay scheduling off, tasks can be run at any locality level immediately (16 milliseconds)
[info] - TaskScheduler should throw IllegalArgumentException when schedulingMode is not supported (19 milliseconds)
[info] - don't schedule for a barrier taskSet if available slots are less than pending tasks (17 milliseconds)
[info] - don't schedule for a barrier taskSet if available slots are less than pending tasks gpus limiting (18 milliseconds)
[info] - schedule tasks for a barrier taskSet if all tasks can be launched together gpus (20 milliseconds)
[info] - schedule tasks for a barrier taskSet if all tasks can be launched together diff ResourceProfile (15 milliseconds)
[info] - schedule tasks for a barrier taskSet if all tasks can be launched together diff ResourceProfile, but not enough gpus (14 milliseconds)
[info] - schedule tasks for a barrier taskSet if all tasks can be launched together (14 milliseconds)
[info] - SPARK-29263: barrier TaskSet can't schedule when higher prio taskset takes the slots (14 milliseconds)
[info] - killAllTaskAttempts shall kill all the running tasks (15 milliseconds)
[info] - mark taskset for a barrier stage as zombie in case a task fails (16 milliseconds)
[info] - Scheduler correctly accounts for GPUs per task (18 milliseconds)
[info] - Scheduler correctly accounts for GPUs per task with fractional amount (17 milliseconds)
[info] - Scheduler works with multiple ResourceProfiles and gpus (16 milliseconds)
[info] - Scheduler works with task resource profiles (19 milliseconds)
[info] - Calculate available tasks slots for task resource profiles (19 milliseconds)
[info] - scheduler should keep the decommission state where host was decommissioned (23 milliseconds)
[info] - test full decommissioning flow (22 milliseconds)
[info] - SPARK-40979: Keep removed executor info due to decommission (19 milliseconds)
[info] - SPARK-24818: test delay scheduling for barrier TaskSetManager (15 milliseconds)
[info] - SPARK-24818: test resource revert of barrier TaskSetManager (17 milliseconds)
[info] - SPARK-37300: TaskSchedulerImpl should ignore task finished event if its task was finished state (21 milliseconds)
[info] - SPARK-39955: executor lost could fail task set if task is running (18 milliseconds)
[info] - SPARK-39955: executor lost should not fail task set if task is launching (16 milliseconds)
[info] - SPARK-45527 default rp with task.gpu.amount=1.0 can restrict 1 barrier tasks run in the same executor (21 milliseconds)
[info] - SPARK-45527 default rp with task.gpu.amount=0.0625 can restrict 16 barrier tasks run in the same executor (31 milliseconds)
[info] - SPARK-45527 default rp with task.gpu.amount=0.5 can restrict 2 barrier tasks run in the same executor (21 milliseconds)
[info] - SPARK-45527 default rp with task.gpu.amount=0.125 can restrict 8 barrier tasks run in the same executor (23 milliseconds)
[info] - SPARK-45527 default rp with task.gpu.amount=0.0714285714285714 can restrict 14 barrier tasks run in the same executor (23 milliseconds)
[info] - SPARK-45527 default rp with task.gpu.amount=0.1428571428571428 can restrict 7  tasks run in the same executor (24 milliseconds)
[info] - SPARK-45527 default rp with task.gpu.amount=0.0714285714285714 can restrict 14  tasks run in the same executor (25 milliseconds)
[info] - SPARK-45527 default rp with task.gpu.amount=0.05 can restrict 20  tasks run in the same executor (25 milliseconds)
[info] - SPARK-45527 default rp with task.gpu.amount=0.0588235294117647 can restrict 17  tasks run in the same executor (26 milliseconds)
[info] - SPARK-45527 default rp with task.gpu.amount=0.25 can restrict 4  tasks run in the same executor (23 milliseconds)
[info] - SPARK-45527 default rp with task.gpu.amount=0.0555555555555555 can restrict 18 barrier tasks run on the different executor (24 milliseconds)
[info] - SPARK-45527 default rp with task.gpu.amount=0.0909090909090909 can restrict 11 barrier tasks run on the different executor (24 milliseconds)
[info] - SPARK-45527 default rp with task.gpu.amount=0.0666666666666666 can restrict 15 barrier tasks run on the different executor (26 milliseconds)
[info] - SPARK-45527 default rp with task.gpu.amount=0.0588235294117647 can restrict 17 barrier tasks run on the different executor (31 milliseconds)
[info] - SPARK-45527 default rp with task.gpu.amount=0.125 can restrict 8 barrier tasks run on the different executor (24 milliseconds)
[info] - SPARK-45527 default rp with task.gpu.amount=0.0714285714285714 can restrict 14  tasks run on the different executor (25 milliseconds)
[info] - SPARK-45527 default rp with task.gpu.amount=0.0666666666666666 can restrict 15  tasks run on the different executor (24 milliseconds)
[info] - SPARK-45527 default rp with task.gpu.amount=0.0588235294117647 can restrict 17  tasks run on the different executor (26 milliseconds)
[info] - SPARK-45527 default rp with task.gpu.amount=0.1666666666666666 can restrict 6  tasks run on the different executor (25 milliseconds)
[info] - SPARK-45527 default rp with task.gpu.amount=0.0625 can restrict 16  tasks run on the different executor (25 milliseconds)
[info] - SPARK-45527 TaskResourceProfile with task.gpu.amount=0.25 can restrict 4 barrier tasks run in the same executor (24 milliseconds)
[info] - SPARK-45527 TaskResourceProfile with task.gpu.amount=0.1 can restrict 10 barrier tasks run in the same executor (33 milliseconds)
[info] - SPARK-45527 TaskResourceProfile with task.gpu.amount=0.05 can restrict 20 barrier tasks run in the same executor (29 milliseconds)
[info] - SPARK-45527 TaskResourceProfile with task.gpu.amount=0.0833333333333333 can restrict 12 barrier tasks run in the same executor (27 milliseconds)
[info] - SPARK-45527 TaskResourceProfile with task.gpu.amount=0.0909090909090909 can restrict 11 barrier tasks run in the same executor (29 milliseconds)
[info] - SPARK-45527 TaskResourceProfile with task.gpu.amount=1.0 can restrict 1  tasks run in the same executor (30 milliseconds)
[info] - SPARK-45527 TaskResourceProfile with task.gpu.amount=0.125 can restrict 8  tasks run in the same executor (26 milliseconds)
[info] - SPARK-45527 TaskResourceProfile with task.gpu.amount=0.1666666666666666 can restrict 6  tasks run in the same executor (26 milliseconds)
[info] - SPARK-45527 TaskResourceProfile with task.gpu.amount=0.0909090909090909 can restrict 11  tasks run in the same executor (28 milliseconds)
[info] - SPARK-45527 TaskResourceProfile with task.gpu.amount=0.0588235294117647 can restrict 17  tasks run in the same executor (29 milliseconds)
[info] - SPARK-45527 TaskResourceProfile with task.gpu.amount=0.05 can restrict 20 barrier tasks run on the different executor (27 milliseconds)
[info] - SPARK-45527 TaskResourceProfile with task.gpu.amount=0.0909090909090909 can restrict 11 barrier tasks run on the different executor (28 milliseconds)
[info] - SPARK-45527 TaskResourceProfile with task.gpu.amount=0.25 can restrict 4 barrier tasks run on the different executor (33 milliseconds)
[info] - SPARK-45527 TaskResourceProfile with task.gpu.amount=0.1111111111111111 can restrict 9 barrier tasks run on the different executor (31 milliseconds)
[info] - SPARK-45527 TaskResourceProfile with task.gpu.amount=0.1 can restrict 10 barrier tasks run on the different executor (32 milliseconds)
[info] - SPARK-45527 TaskResourceProfile with task.gpu.amount=0.0833333333333333 can restrict 12  tasks run on the different executor (32 milliseconds)
[info] - SPARK-45527 TaskResourceProfile with task.gpu.amount=0.05 can restrict 20  tasks run on the different executor (34 milliseconds)
[info] - SPARK-45527 TaskResourceProfile with task.gpu.amount=0.2 can restrict 5  tasks run on the different executor (30 milliseconds)
[info] - SPARK-45527 TaskResourceProfile with task.gpu.amount=0.1666666666666666 can restrict 6  tasks run on the different executor (33 milliseconds)
[info] - SPARK-45527 TaskResourceProfile with task.gpu.amount=0.1 can restrict 10  tasks run on the different executor (36 milliseconds)
[info] - SPARK-45527 TaskResourceProfile: the left multiple gpu resources on 1 executor can assign to other taskset (62 milliseconds)
[info] - SPARK-45527 TaskResourceProfile: the left gpu resources on multiple executors can assign to other taskset (163 milliseconds)
[info] - SPARK-45527 TaskResourceProfile: the left multiple gpu resources on 1 executor can't assign to other taskset due to not enough gpu resource (45 milliseconds)
[info] - SPARK-45527 schedule tasks for a barrier taskSet if all tasks can be launched together (14 milliseconds)
[info] Run completed in 1 minute, 16 seconds.
[info] Total number of tests run: 116
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 116, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 87 s (01:27), completed Feb 26, 2024, 6:53:30 PM

I verified manually. Merged to master.
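As an aside, the pattern in the log above (e.g. `task.gpu.amount=0.0625` restricting 16 tasks per executor) follows directly from the fractional resource arithmetic: with one GPU per executor, the number of task slots is `floor(gpus / task.gpu.amount)`. A minimal sketch of that calculation — the helper name and structure are illustrative only, not Spark's actual implementation:

```python
import math

def gpu_task_slots(executor_gpus: float, task_gpu_amount: float) -> int:
    """Illustrative helper: how many tasks can share an executor's GPUs
    when each task requests a fractional task.gpu.amount."""
    if task_gpu_amount <= 0:
        raise ValueError("task.gpu.amount must be positive")
    return math.floor(executor_gpus / task_gpu_amount)

# Matches the counts in the test log above (1 GPU per executor):
print(gpu_task_slots(1, 0.0625))              # 16
print(gpu_task_slots(1, 0.0714285714285714))  # 14
print(gpu_task_slots(1, 0.5))                 # 2
```

The long decimal fractions in the test names (0.0714285714285714 ≈ 1/14, 0.0588235294117647 ≈ 1/17) are exactly the reciprocals of the task counts being asserted.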

HyukjinKwon added a commit that referenced this pull request Feb 28, 2024
…e, and reduce the resource usage

### What changes were proposed in this pull request?

This PR is a followup of #45272, #45268, #45264 and #45283 that increases timeouts further and decreases the resources needed during CI.

### Why are the changes needed?

To make the scheduled build pass https://github.com/apache/spark/actions/runs/8054862135/job/22053180441.

At least as far as I can tell, those changes are effective (they make the tests less flaky and fail less often).

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

I manually ran them via IDE.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45297 from HyukjinKwon/SPARK-47185-SPARK-47181-followup.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
TakawaAkirayo pushed a commit to TakawaAkirayo/spark that referenced this pull request Mar 4, 2024
… in fraction resource calculation

### What changes were proposed in this pull request?

There are two more instances that were mistakenly missed in apache#45268. This PR fixes both.

### Why are the changes needed?

See apache#45268

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

Manually

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#45272 from HyukjinKwon/SPARK-45527-followup2.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
TakawaAkirayo pushed a commit to TakawaAkirayo/spark that referenced this pull request Mar 4, 2024
…e, and reduce the resource usage
ericm-db pushed a commit to ericm-db/spark that referenced this pull request Mar 5, 2024
… in fraction resource calculation
ericm-db pushed a commit to ericm-db/spark that referenced this pull request Mar 5, 2024
…e, and reduce the resource usage