[FLINK-7956] [flip6] Add support for queued scheduling with slot sharing to SlotPool #5091
Conversation
(Branch force-pushed from 6489c67 to c4fbaf0.)
…ing to SlotPool

This commit adds support for queued scheduling with slot sharing to the SlotPool. The idea of slot sharing is that multiple tasks can run in the same slot. Moreover, queued scheduling means that a slot request need not be completed right away but may be completed at a later point in time. This makes it possible to start new TaskExecutors in case there are no slots left.

The main component responsible for the management of shared slots is the SlotSharingManager. The SlotSharingManager internally maintains a tree-like structure which stores the SlotContext future of the underlying AllocatedSlot. Whenever this future is completed, potentially pending LogicalSlot instantiations are executed and sent to the slot requester. A shared slot is represented by a MultiTaskSlot which can harbour multiple TaskSlots. A TaskSlot is either a MultiTaskSlot or a SingleTaskSlot. In order to represent co-location constraints, we first obtain a root MultiTaskSlot and then allocate a nested MultiTaskSlot in which the co-located tasks are allocated. The corresponding SlotRequestID is assigned to the CoLocationConstraint in order to make the TaskSlot retrievable for other tasks assigned to the same CoLocationConstraint.

Port SchedulerSlotSharingTest, SchedulerIsolatedTasksTest and ScheduleWithCoLocationHintTest to run with SlotPool. Restructure SlotPool components. Add SlotSharingManagerTest, SlotPoolSlotSharingTest and SlotPoolCoLocationTest.

This closes apache#5091.
```java
Consumer<Tuple3<SlotRequestId, SlotSharingGroupId, Throwable>> currentReleaseSlotConsumer = this.releaseSlotConsumer;

if (currentReleaseSlotConsumer != null) {
	currentReleaseSlotConsumer.accept(Tuple3.of(slotRequestId, slotSharingGroupId, cause ));
```
nit: extra whitespace after `cause` in `... cause ));`
Will fix it.
```java
fail("Scheduler accepted dead instance");
}
catch (IllegalArgumentException e) {
	// stimmt  (German: "that's right")
```
😃
```java
assertFalse(i2.isAlive());
assertFalse(i3.isAlive());
}
catch (Exception e) {
```
Better to propagate the exception, but I guess this file was just copy-pasted.
Yup, but I'll remove it.
```java
}

private void progressToNextElement() {
	while(baseIterator.hasNext() && ! currentIterator.hasNext()) {
```
nit: missing space between `while` and `(`; excess space after `!`.
Good catch. Will correct it.
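The loop under review is the standard "advance" step of a flattening iterator. For illustration, here is a minimal self-contained sketch of that idiom with the whitespace nits fixed; the class and element types are hypothetical, not the actual PR code:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of the flatten-iterator idiom that
// progressToNextElement() implements; not the actual PR class.
class FlattenIterator<T> implements Iterator<T> {

	private final Iterator<? extends Iterable<T>> baseIterator;
	private Iterator<T> currentIterator = Collections.emptyIterator();

	FlattenIterator(Iterator<? extends Iterable<T>> baseIterator) {
		this.baseIterator = baseIterator;
		progressToNextElement();
	}

	// Skip exhausted inner iterators until one with remaining elements is found.
	private void progressToNextElement() {
		while (baseIterator.hasNext() && !currentIterator.hasNext()) {
			currentIterator = baseIterator.next().iterator();
		}
	}

	@Override
	public boolean hasNext() {
		return currentIterator.hasNext();
	}

	@Override
	public T next() {
		T result = currentIterator.next();
		progressToNextElement();
		return result;
	}

	public static void main(String[] args) {
		List<List<Integer>> nested = Arrays.asList(
			Arrays.asList(1, 2), Collections.emptyList(), Arrays.asList(3));
		Iterator<Integer> it = new FlattenIterator<>(nested.iterator());
		StringBuilder sb = new StringBuilder();
		while (it.hasNext()) {
			sb.append(it.next());
		}
		System.out.println(sb); // prints 123
	}
}
```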
```java
import org.apache.flink.runtime.jobmaster.SlotContext;
import org.apache.flink.runtime.jobmaster.SlotOwner;
import org.apache.flink.runtime.jobmaster.SlotRequestId;
import org.apache.flink.runtime.instance.SlotSharingGroupId;
```
nit: wrong import order (not sorted lexicographically). The items

```java
import org.apache.flink.runtime.instance.SlotSharingGroupId;
import org.apache.flink.runtime.jobmanager.scheduler.Locality;
```

should appear before `LogicalSlot`.
Will correct it.
```java
 */
@VisibleForTesting
public Collection<MultiTaskSlot> getResolvedRootSlots() {
	ResolvedRootSlotValues vs = resolvedMultiTaskSlotValues;
```
Since this is only for testing, do we really need to cache the instance in `resolvedMultiTaskSlotValues`? It is not obvious that `resolvedMultiTaskSlotValues` is only used for testing. How about just:

```java
public Collection<MultiTaskSlot> getResolvedRootSlots() {
	return new ResolvedRootSlotValues();
}
```
Yes, this is better. Will change it.
```java
private final SlotSharingGroupId slotSharingGroupId;

// needed to release allocated slots after a complete multi task slot hierarchy has been released
```
nit: All fields are commented with non-Javadoc comments. Normally, comments on fields are also written in Javadoc style, e.g., in `SlotPool`. Javadoc comments on fields are displayed by IntelliJ (`Ctrl + J`).
Good point. Will change it.
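For illustration, the difference between the two field-comment styles (the class and fields here are hypothetical, not PR code):

```java
// Hypothetical illustration of the two field-comment styles discussed above.
class FieldCommentStyles {

	// Non-Javadoc comment: ignored by IDE quick documentation.
	static final int NON_JAVADOC_FIELD = 1;

	/** Javadoc comment: displayed by IntelliJ via quick documentation (Ctrl + J). */
	static final int JAVADOC_FIELD = 2;

	public static void main(String[] args) {
		System.out.println(FieldCommentStyles.NON_JAVADOC_FIELD + FieldCommentStyles.JAVADOC_FIELD); // prints 3
	}
}
```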
```java
final MultiTaskSlot resolvedRootNode = unresolvedRootSlots.remove(slotRequestId);

if (resolvedRootNode != null) {
	final Set<MultiTaskSlot> innerCollection = resolvedRootSlots.computeIfAbsent(
```
In theory the signature of this method allows concurrent modifications of `resolvedRootSlots`. Maybe a comment explaining why this cannot happen would be nice; perhaps with `@GuardedBy`.
True. I'll add the necessary synchronization, because it's not guaranteed that how the future is completed won't change in the future.
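A minimal sketch of the locking discipline discussed here, with the `@GuardedBy` annotation stubbed locally so the example is self-contained (the real annotation lives in JSR-305's `javax.annotation.concurrent`; all names are illustrative, not PR code):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Target;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Minimal local stand-in for javax.annotation.concurrent.@GuardedBy.
@Target({ElementType.FIELD, ElementType.METHOD})
@interface GuardedBy {
	String value();
}

// Hypothetical sketch of documenting and enforcing the locking discipline
// for a map like resolvedRootSlots; names are illustrative.
class ResolvedSlots {

	private final Object lock = new Object();

	@GuardedBy("lock")
	private final Map<String, Set<String>> resolvedRootSlots = new HashMap<>();

	void addResolvedSlot(String groupKey, String slotId) {
		// All mutations go through the lock, so future completions running
		// on other threads cannot corrupt the map.
		synchronized (lock) {
			resolvedRootSlots.computeIfAbsent(groupKey, k -> new HashSet<>()).add(slotId);
		}
	}

	int size(String groupKey) {
		synchronized (lock) {
			Set<String> slots = resolvedRootSlots.get(groupKey);
			return slots == null ? 0 : slots.size();
		}
	}
}
```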
```java
// @@ -32,6 +34,20 @@
 */
public interface LogicalSlot {

	Payload TERMINATED_PAYLOAD = new Payload() {

		private final CompletableFuture<?> COMPLETED_TERMINATION_FUTURE = CompletableFuture.completedFuture(null);
```
nit: `COMPLETED_TERMINATION_FUTURE` should be camel-cased because it is not actually a constant (not static).
```java
// sanity check
Preconditions.checkState(!multiTaskSlotFuture.getMultiTaskSlot().contains(task.getJobVertexId()));

final SlotSharingManager.SingleTaskSlot leave = multiTaskSlotFuture.getMultiTaskSlot().allocateSingleTaskSlot(
```
nit: variable name should be `leaf`
Yup, we wouldn't want the single task slot to leave, after all.
```java
	this,
	providerAndOwner));

final SlotSharingManager.MultiTaskSlotLocality multiTaskSlotFuture;
```
The variable name is confusing: `multiTaskSlotFuture` is not of type `Future`.
True, will rename it.
```java
Payload TERMINATED_PAYLOAD = new Payload() {

	private final CompletableFuture<?> COMPLETED_TERMINATION_FUTURE = CompletableFuture.completedFuture(null);
```
Good point. Will fix it.
```java
// lock the co-location constraint once we have obtained the allocated slot
coLocationSlot.getSlotContextFuture().whenComplete(
	(SlotContext slotContext, Throwable throwable) -> {
		if (throwable == null) {
```
Is it acceptable to swallow the `Throwable` here?
I will log the throwable here.
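The agreed fix can be sketched as follows: handle the failure branch of `whenComplete` explicitly instead of ignoring it. This is a hypothetical, self-contained reduction (the real code uses Flink's types and an SLF4J logger; here a string register stands in for the log):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of not swallowing the Throwable in whenComplete.
// LOG stands in for a real SLF4J logger; names are illustrative.
class WhenCompleteLogging {

	static final AtomicReference<String> LOG = new AtomicReference<>("");

	static void lockConstraintOnSuccess(CompletableFuture<String> slotContextFuture) {
		slotContextFuture.whenComplete(
			(slotContext, throwable) -> {
				if (throwable == null) {
					LOG.set("locked co-location constraint to " + slotContext);
				} else {
					// Do not silently drop the failure.
					LOG.set("Failed to lock co-location constraint: " + throwable.getMessage());
				}
			});
	}

	public static void main(String[] args) {
		CompletableFuture<String> failed = new CompletableFuture<>();
		lockConstraintOnSuccess(failed);
		failed.completeExceptionally(new IllegalStateException("no slot"));
		System.out.println(LOG.get()); // prints the failure message
	}
}
```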
```java
multiTaskSlotFuture = slotSharingManager.createRootSlot(
	multiTaskSlotRequestId,
	futureSlot.thenApply(Function.identity()),
```
nit: If you change the signature to

```java
public MultiTaskSlot createRootSlot(
	SlotRequestId slotRequestId,
	CompletableFuture<? extends SlotContext> slotContextFuture,
	SlotRequestId allocatedSlotRequestId)
```

you won't need this trick to satisfy the compiler (PECS rule). Unfortunately, more changes in the code would be needed, e.g., in `MultiTaskSlot`.
Good point. Will change it.
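The PECS point can be demonstrated in isolation: with `? extends` in consumer position, a future of a subtype is accepted directly, so the `thenApply(Function.identity())` upcast becomes unnecessary. The type names below mirror the PR's but the bodies are illustrative stubs, not actual Flink code:

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of the PECS suggestion above.
class PecsSketch {

	interface SlotContext {
		String getAllocationId();
	}

	static class AllocatedSlot implements SlotContext {
		@Override
		public String getAllocationId() {
			return "alloc-1";
		}
	}

	// "? extends SlotContext" accepts futures of any SlotContext subtype.
	static CompletableFuture<String> createRootSlot(CompletableFuture<? extends SlotContext> slotContextFuture) {
		return slotContextFuture.thenApply(SlotContext::getAllocationId);
	}

	public static void main(String[] args) {
		CompletableFuture<AllocatedSlot> futureSlot =
			CompletableFuture.completedFuture(new AllocatedSlot());
		// Compiles without the identity-mapping trick:
		System.out.println(createRootSlot(futureSlot).join()); // prints alloc-1
	}
}
```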
```java
 *
 * <p>The {@link TaskSlot} hierarchy is implemented by {@link MultiTaskSlot} and
 * {@link SingleTaskSlot}. The former class represents inner nodes which can contain
 * a number of other {@link TaskSlot} and the latter class represents the leave nodes.
```
nit: leaf nodes
Good catch.
```java
		Locality locality) {
	Preconditions.checkState(!super.contains(groupId));

	final SingleTaskSlot leave = new SingleTaskSlot(
```
nit: leaf
Will change.
```java
 */
public class SlotSharingManagerTest extends TestLogger {

	private static final SlotSharingGroupId slotSharingGroupId = new SlotSharingGroupId();
```
Should be `SLOT_SHARING_GROUP_ID`, since it is a constant.
True.
```java
@Test
public void testRootSlotRelease() throws ExecutionException, InterruptedException {
	final CompletableFuture<SlotRequestId> slotReleasedFuture = new CompletableFuture<>();
	final TestingAllocatedSlotActions testingAllocatedSlotActions = new TestingAllocatedSlotActions();
```
You already have an instance of `TestingAllocatedSlotActions` defined in line 57.
Indeed. Will change it.
```java
private static final DummySlotOwner slotOwner = new DummySlotOwner();

private static final TestingAllocatedSlotActions allocatedSlotActions = new TestingAllocatedSlotActions();
```
This instance is mutable... it should not be `static`.
Will make it non-static.
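The fixture change being agreed on can be sketched like this: a mutable helper is created fresh per test instead of being shared through a static field. Plain Java is used here to stay self-contained; in the PR this would be a JUnit `@Before` method, and all names are hypothetical:

```java
// Hypothetical sketch of a per-test fixture instead of a shared static one.
class FixtureSketch {

	// Stands in for the mutable TestingAllocatedSlotActions.
	static class MutableActions {
		int releaseCalls = 0;

		void releaseSlot() {
			releaseCalls++;
		}
	}

	private MutableActions allocatedSlotActions;

	// Would be annotated with @Before in JUnit: runs before every test.
	void setUp() {
		allocatedSlotActions = new MutableActions();
	}

	int runTestThatReleasesOnce() {
		setUp(); // fresh instance: no state leaks between tests
		allocatedSlotActions.releaseSlot();
		return allocatedSlotActions.releaseCalls;
	}
}
```

With a static field, the second test run would observe the first run's mutations; with a per-test instance the count stays at 1.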
```java
public ScheduledUnit(
		Execution task,
		JobVertexID jobVertexId,
```
We can get the `JobVertexID` from the `Execution`. Do we need it in the constructor?
Yes, because `task` can be null. Will add the missing `@Nullable` annotation.
```java
		null,
		new FlinkException("Locality constraint is not better fulfilled by allocated slot."));
}
return multiTaskSlotLocality;
```
If `multiTaskSlotLocality != null`, `multiTaskSlotLocality.getLocality() != LOCAL` and `slotAndLocality == null`, it will return a `multiTaskSlotLocality` without `LOCAL` locality. Is this expected?
Yes, because it could not find a slot which is `LOCAL`. The alternative would be to ask the ResourceManager, but it will only give you a slot with `UNKNOWN` locality because it is uncompleted.
The cluster entrypoints start the ResourceManager with the web interface URL. This URL is used to set the correct tracking URL in Yarn when registering the Yarn application. This closes apache#5128.

This commit introduces the SlotContext, which is an abstraction for the SimpleSlot to obtain the relevant slot information for communicating with the TaskManager without relying on the AllocatedSlot, which is now only used by the SlotPool. This closes apache#5088.

Before, logical slots like the SimpleSlot and SharedSlot were associated with the actually allocated slot via the AllocationID. This, however, was sub-optimal because allocated slots can be re-used to fulfill other slot requests (logical slots). Therefore, we should bind the logical slots to the id with the right lifecycle, which is the slot request id. This closes apache#5089.

Not only check for a slot request with the right allocation id, but also check whether we can fulfill other pending slot requests with an unclaimed offered slot before adding it to the list of available slots. This closes apache#5090.
(Branch force-pushed from c4fbaf0 to 3fdb5c6, then to 917fbcb.)
Thanks for the review @GJL and @ifndef-SleePy. Travis passed locally. Merging this PR.
What is the purpose of the change
This commit adds support for queued scheduling with slot sharing to the SlotPool. The idea of slot sharing is that multiple tasks can run in the same slot. Moreover, queued scheduling means that a slot request need not be completed right away but may be completed at a later point in time. This makes it possible to start new TaskExecutors in case there are no slots left.

The main component responsible for the management of shared slots is the SlotSharingManager. The SlotSharingManager internally maintains a tree-like structure which stores the SlotContext future of the underlying AllocatedSlot. Whenever this future is completed, potentially pending LogicalSlot instantiations are executed and sent to the slot requester.

A shared slot is represented by a MultiTaskSlot which can harbour multiple TaskSlots. A TaskSlot is either a MultiTaskSlot or a SingleTaskSlot. In order to represent co-location constraints, we first obtain a root MultiTaskSlot and then allocate a nested MultiTaskSlot in which the co-located tasks are allocated. The corresponding SlotRequestID is assigned to the CoLocationConstraint in order to make the TaskSlot retrievable for other tasks assigned to the same CoLocationConstraint.
This PR also moves the `SlotPool` components to `o.a.f.runtime.jobmaster.slotpool`.

This PR is based on #5090.
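The tree structure described above can be sketched in a few lines: a root MultiTaskSlot holds TaskSlots, which are either further MultiTaskSlots (inner nodes, e.g. for co-location groups) or SingleTaskSlots (leaves). Everything beyond the tree shape, including all futures and request-id bookkeeping, is omitted; this is an illustration, not PR code:

```java
import java.util.ArrayList;
import java.util.List;

// Very reduced sketch of the MultiTaskSlot/SingleTaskSlot hierarchy.
class SlotTreeSketch {

	abstract static class TaskSlot {
		abstract int countLeaves();
	}

	// Leaf node: one task running in the shared slot.
	static class SingleTaskSlot extends TaskSlot {
		@Override
		int countLeaves() {
			return 1;
		}
	}

	// Inner node: may contain further MultiTaskSlots or SingleTaskSlots.
	static class MultiTaskSlot extends TaskSlot {
		final List<TaskSlot> children = new ArrayList<>();

		MultiTaskSlot allocateMultiTaskSlot() {
			MultiTaskSlot child = new MultiTaskSlot();
			children.add(child);
			return child;
		}

		SingleTaskSlot allocateSingleTaskSlot() {
			SingleTaskSlot leaf = new SingleTaskSlot();
			children.add(leaf);
			return leaf;
		}

		@Override
		int countLeaves() {
			int total = 0;
			for (TaskSlot child : children) {
				total += child.countLeaves();
			}
			return total;
		}
	}

	public static void main(String[] args) {
		MultiTaskSlot root = new MultiTaskSlot();
		root.allocateSingleTaskSlot();                           // plain shared task
		MultiTaskSlot coLocated = root.allocateMultiTaskSlot();  // co-location group
		coLocated.allocateSingleTaskSlot();
		coLocated.allocateSingleTaskSlot();
		System.out.println(root.countLeaves()); // prints 3
	}
}
```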
Brief change log

- Add `SlotSharingManager` to manage shared slots
- Adapt `SlotPool` to use `SlotSharingManager`
- Add `SlotPool#allocateMultiTaskSlot` to allocate a shared slot
- Add `SlotPool#allocateCoLocatedMultiTaskSlot` to allocate a co-located slot
- Move `SlotPool` components to `o.a.f.runtime.jobmaster.slotpool`
Verifying this change

- Ported `SchedulerSlotSharingTest`, `SchedulerIsolatedTasksTest` and `ScheduleWithCoLocationHintTest` to run with `SlotPool`
- Added `SlotSharingManagerTest`, `SlotPoolSlotSharingTest` and `SlotPoolCoLocationTest`
Does this pull request potentially affect one of the following parts:

- `@Public(Evolving)`: (no)

Documentation

CC: @GJL