Conversation

Contributor

@zentol zentol commented Jun 27, 2023

based on #22795 for ease of testing.

The Adaptive Scheduler now supports a minimum parallelism per vertex.

  • A job is only run if the minimum slot requirements of all vertices are fulfilled.
  • If the minimum parallelism is raised above the currently available number of slots (and thus implicitly above the current parallelism), the scheduler immediately cancels the job and tries to rescale it. It either restarts the job once it has acquired the additionally required slots, or fails the job once the stabilization timeout kicks in.
    • We could consider adding a timeout here for the cancellation, to give us some time to acquire more slots.
  • If the minimum parallelism is lowered, or raised to a value still at or below the current parallelism, the job just keeps running.
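The decision logic described above can be sketched as a small function. This is an illustrative simplification; the class, method, and enum names are hypothetical and not the actual AdaptiveScheduler implementation:

```java
// Sketch of the rescale decision described above. Illustrative only;
// not the actual AdaptiveScheduler code.
public class RescaleDecisionSketch {

    public enum Action {
        KEEP_RUNNING,      // requirements already satisfied by the running job
        CANCEL_AND_RESCALE // new minimum exceeds the currently available slots
    }

    /**
     * @param newMinParallelism  new lower bound requested for the vertex
     * @param availableSlots     slots currently available to the job
     */
    public static Action decide(int newMinParallelism, int availableSlots) {
        if (newMinParallelism > availableSlots) {
            // Minimum raised above what can currently be provided: cancel,
            // try to acquire slots, then restart, or fail after the
            // stabilization timeout.
            return Action.CANCEL_AND_RESCALE;
        }
        // Minimum lowered, or raised but still satisfiable: keep running.
        return Action.KEEP_RUNNING;
    }

    public static void main(String[] args) {
        System.out.println(decide(8, 4)); // minimum raised above available slots
        System.out.println(decide(2, 4)); // minimum still satisfiable
    }
}
```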

@zentol zentol requested review from dmvk and gyfora June 27, 2023 16:46
Contributor

gyfora commented Jun 27, 2023

I think this behavior may cause serious problems if the new resources cannot be acquired. I think the scheduler should wait until the resources are acquired before taking any action, so as not to cause downtime.

I am thinking of the case where you simply want to increase the parallelism to a given new (larger) value. In that case you would set the min and max to the same new value, and you would expect the scaling to happen only when resources are available.

It's not enough in these cases to simply raise the max, because that may lead to multiple incremental restarts as the resources become available, which can cause even more downtime.

What do you think @zentol ?

cc @mxm

Contributor Author

zentol commented Jun 27, 2023

> I think the scheduler should wait until the resources are acquired before taking any action to not cause downtime.

I may have misremembered what your requirements were. It shouldn't be difficult to solve this such that we don't cancel the job immediately.

Contributor

gyfora commented Jun 27, 2023

We were thinking of using this in the context of autoscaling, or in general scaling jobs up or down to a target parallelism. Theoretically there could be circumstances where, if the job requirements state a minimum that is higher than the current parallelism, we may want to cancel the job (eventually).

But from a purely practical perspective, I think most people would rather have the job running with the current (smaller) parallelism than not have it running at all in cases where the resources cannot be acquired. This is a rather common situation in cloud environments due to various resource quotas, changing circumstances, etc.

We could also make this configurable with a timeout to cancel where a negative/infinite timeout would mean that we actually wait forever for the resources without cancelling.

private static Map<SlotSharingGroupId, Integer> getMaxParallelismForSlotSharingGroups(
        Iterable<JobInformation.VertexInformation> vertices) {
    return getPerSlotSharingGroups(
            vertices, JobInformation.VertexInformation::getParallelism, Math::max);
}
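For illustration, here is a self-contained sketch of what this kind of per-slot-sharing-group aggregation does. Note that getPerSlotSharingGroups and VertexInfo are reimplemented here from context with simplified types; this is not the Flink source:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Standalone sketch: fold a per-vertex value into one value per slot sharing
// group, as the excerpt above does with Math::max. Simplified stand-in types.
public class PerGroupAggregationSketch {

    record VertexInfo(String slotSharingGroup, int parallelism) {}

    static Map<String, Integer> getPerSlotSharingGroups(
            Iterable<VertexInfo> vertices,
            Function<VertexInfo, Integer> getter,
            BinaryOperator<Integer> combiner) {
        Map<String, Integer> result = new HashMap<>();
        for (VertexInfo vertex : vertices) {
            // merge() applies the combiner when the group already has a value.
            result.merge(vertex.slotSharingGroup(), getter.apply(vertex), combiner);
        }
        return result;
    }

    public static void main(String[] args) {
        List<VertexInfo> vertices = List.of(
                new VertexInfo("groupA", 4),
                new VertexInfo("groupA", 2),
                new VertexInfo("groupB", 3));
        // Max parallelism per group: groupA -> 4, groupB -> 3
        System.out.println(getPerSlotSharingGroups(
                vertices, VertexInfo::parallelism, Math::max));
    }
}
```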
Contributor
isn't this supposed to be getMaxParallelism here?

Contributor Author

nope, but the confusion is understandable. "Max parallelism" is an overloaded term, referring both to the maximum parallelism a job can ever run at (== the number of key groups) and to the upper bound on the parallelism that the job can run at.

Outside of validation purposes the actual max parallelism isn't relevant for scaling.

We should rename things to explicitly refer to lower/upper parallelism bound.
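To illustrate the distinction, here is a hedged sketch (the class and method names are illustrative, not Flink's API): the key-group count is a hard cap that matters for validation, while scaling itself only moves within the configured lower/upper parallelism bounds:

```java
// Illustrative sketch of the two meanings of "max parallelism" discussed above:
// - maxParallelism: number of key groups, a hard cap fixed for the job
// - upperBound: the configured parallelism ceiling used for scaling decisions
// Names and logic are illustrative, not the actual Flink implementation.
public class ParallelismBoundsSketch {

    /** Clamp a desired parallelism into [lowerBound, min(upperBound, maxParallelism)]. */
    static int effectiveParallelism(
            int desired, int lowerBound, int upperBound, int maxParallelism) {
        // Validation: the upper bound can never exceed the number of key groups.
        int cap = Math.min(upperBound, maxParallelism);
        return Math.max(lowerBound, Math.min(desired, cap));
    }

    public static void main(String[] args) {
        // maxParallelism (key groups) = 128 is not the limiting factor here;
        // the configured upperBound = 8 is.
        System.out.println(effectiveParallelism(10, 2, 8, 128)); // 8
        // Scaling only moves within [2, 8].
        System.out.println(effectiveParallelism(1, 2, 8, 128));  // 2
    }
}
```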

Contributor

@gyfora gyfora left a comment


I think one test case may still be missing if I understand correctly. Otherwise the PR looks good to me!

newJobResourceRequirements);
assertThat(scheduler.getState()).isSameAs(originalState);
},
singleThreadMainThreadExecutor))
Contributor
I think we may be missing a test case verifying that the job actually scales up once the resources for the new, higher minimum parallelism become available. Maybe that could be added here.

@zentol zentol merged commit 38f4d13 into apache:master Jul 5, 2023
                        - vertexInformation.getMinParallelism()),
        (metaInfo1, metaInfo2) ->
                new SlotSharingGroupMetaInfo(
                        Math.min(metaInfo1.getMinLowerBound(), metaInfo2.minLowerBound),

Hi Schepler! Could you explain why this uses Math.min rather than Math.max(metaInfo1.getMinLowerBound(), metaInfo2.minLowerBound) here?
