
YUNIKORN-462: Streamline core to shim update on allocation change #58

Closed · wants to merge 2 commits

Conversation

@manirajv06 (Contributor)

What is this PR for?

SI changes to remove the ReSyncSchedulerCache plugin and add new fields to the AllocationRelease message.

What type of PR is it?

  • Improvement

Todos

  • Task

What is the Jira issue?

https://issues.apache.org/jira/browse/YUNIKORN-462

How should this be tested?

Screenshots (if appropriate)

Questions:

  • The license files need updating.
  • There are breaking changes for older versions.
  • It needs documentation.

@manirajv06 (Contributor, Author)

In addition to the changes discussed offline, I have also made changes to release allocations synchronously in a separate flow, as I did not want to disturb the other use cases (trigger points) that are fine with async communication. This kind of synchronous allocation release is required only for the processAllocationReleases() method, because the call to ForgetPod() happens only in this flow, through the ReSyncSchedulerCache plugin.

Once the overall approach makes sense, I will work on unit tests.

@wilfred-s (Contributor) left a comment:


Not sure about the AllocationRelease message changes.

Comment on lines 564 to 568
// AllocationKey from AllocationAsk
string allocationkey = 6;
Contributor:

This should already be covered in the message as the UUID. Otherwise we currently would not have the possibility to release a pod that is not allocated yet (which is what an ask is).

Contributor (Author):

AssumePod/ForgetPod expect an allocationKey to do their cache add/remove operations. Currently, the ReSyncSchedulerCache plugin wraps the AllocationKey and a few more fields in an AssumedAllocation/ForgotAllocation message on the core side and sends it to the shim, which eventually passes it to the AssumePod/ForgetPod methods. Now, with the removal of the ReSyncSchedulerCache plugin in this PR, I added AllocationKey to the AllocationRelease message, since AllocationRelease is passed to the shim (as AllocationResponse) through event processing.

Comment on lines 566 to 570
// update cache, default is false
bool updateCache = 7;
Contributor:

Can you explain why we would ever not want to update the cache? Would that not cause issues with an out-of-sync cache?
The cache should be idempotent and update when needed (a sketch follows the list):

  • delete can never fail: if the entry does not exist, the result is the same as a successful delete
  • for creates and updates we should do a create-or-update call, with the end result being a consistent cache representing the correct object
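
A minimal Go sketch of these idempotent semantics; the types and names here are hypothetical, not the actual shim cache code:

package cache

import "sync"

type podInfo struct {
    allocationKey string
    nodeID        string
}

type podCache struct {
    sync.Mutex
    assumed map[string]*podInfo // keyed by allocationKey
}

// Forget never fails: deleting a missing entry leaves the cache in the
// same end state as deleting an existing one.
func (c *podCache) Forget(allocationKey string) {
    c.Lock()
    defer c.Unlock()
    delete(c.assumed, allocationKey) // no-op if the key is absent
}

// Assume is a create-or-update: the end result is always one entry that
// represents the correct object, whether or not one existed before.
func (c *podCache) Assume(info *podInfo) {
    c.Lock()
    defer c.Unlock()
    c.assumed[info.allocationKey] = info // overwrites any stale entry
}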

Contributor (Author):

context#processAllocationReleases is the only place where the ReSyncSchedulerCache plugin is used to call ForgetPod on the shim side for cache updates, in addition to sending the RMReleaseAllocationEvent. The RMReleaseAllocationEvent uses the AllocationRelease message, and that message is used in a few more places on the core side (for example, context#handleRMUpdateApplicationEvent, node decommissioning, etc.) in addition to context#processAllocationReleases. Hence, to tell the shim whether or not to do the cache operation while processing the UpdateAllocation callback method, I used the "updateCache" field.

@wilfred-s (Contributor)

Sorry that it has taken so long to get back to you :-(
Way too long an update, but it sets the direction for the other two repos involved.

We have one change in mind: remove the ReSyncSchedulerCache call. The call is made twice from the core to the shim:

  • for new allocations
  • for removed allocations

First, new allocations: the message we send is an RMNewAllocationsEvent with all allocations that are new. That event is currently async, which is why we first call a sync cache update with partial info. The change is that the event will become a sync call. We have enough information in the current Allocations array: it is part of the event we send, and we can also call the assume of the pod inside the cache on the shim side by pulling that key from the allocation. We always call assume pod for every new allocation.

Simple change on the SI side: we can remove the AssumedAllocation message and leverage the existing information for the AllocationKey.
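
As a hedged sketch of what the shim side could look like after this change (illustrative types, not the real generated SI code):

package shim

// illustrative subset of the allocation fields carried by the event
type Allocation struct {
    AllocationKey string
    NodeID        string
}

// the new-allocations event, delivered synchronously after the change
type NewAllocationsEvent struct {
    Allocations []*Allocation
}

type assumer interface {
    Assume(allocationKey, nodeID string)
}

// assume the pod for every new allocation by pulling the key straight
// from the allocation in the event; no separate AssumedAllocation needed
func onNewAllocations(event *NewAllocationsEvent, cache assumer) {
    for _, alloc := range event.Allocations {
        cache.Assume(alloc.AllocationKey, alloc.NodeID)
    }
}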

On the remove side we have 4 locations where we send an RMReleaseAllocationEvent. Only in one location, as you pointed out, do we also call the sync of the cache. The sync of the cache triggers the forget of an assumed pod. Looking only at the paths that send events back, the cache sync is part of one of these calls:

  1. handleRMUpdateApplicationEvent handles the removal of an application. Does not call the cache sync.
  2. updateNode handles the node removal. Does not call the cache sync.
  3. schedule triggers the release of a placeholder. Does not call the cache sync.
  4. processAllocationReleases processes release requests sent by the shim. This calls the cache sync.

The termination type for calls 1, 2 and 4 is STOPPED_BY_RM; for call 3 it is PLACEHOLDER_REPLACED.
Every single allocation is assumed, as per the description above. So we should also forget a pod (remove it from the assumed-pod list) when we remove the pod, without exception. If we do not, we could leak the entry in the assumed-pod cache structure. There should be no difference in the communication between core and shim for any of these cases.

Simple change on the SI side: we can remove the ForgotAllocation message and add the allocationKey to the AllocationRelease message (check the case for the new field!).
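
An illustrative Go shape of the message after this change; the actual generated SI code and full field set may differ:

package si

// termination types named in the discussion
type TerminationType int32

const (
    StoppedByRM         TerminationType = iota // STOPPED_BY_RM
    PlaceholderReplaced                        // PLACEHOLDER_REPLACED
)

type AllocationRelease struct {
    PartitionName   string
    ApplicationID   string
    UUID            string
    TerminationType TerminationType
    Message         string
    // new field: lets the shim forget the assumed pod directly on release
    AllocationKey string
}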

The core sends the events synchronously. The shim collapses the assume call into the event processing for new allocations and returns as soon as possible, forking off the long-running tasks. The shim collapses the forget call into the event processing for the remove. If there are any special cases where the assumption of a removed pod should not be forgotten, the shim must implement them; the core should not be the one that decides this.
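
A rough sketch of that pattern on the release path, again with hypothetical names:

package shim

type release struct {
    allocationKey string
}

type forgetter interface {
    Forget(allocationKey string)
}

type shimContext struct {
    cache forgetter
}

// the forget is collapsed into the event processing; the long-running
// cleanup is forked so the core's synchronous call returns quickly
func (s *shimContext) onReleaseAllocations(releases []*release) {
    for _, rel := range releases {
        // forget without exception, so the assumed-pod cache cannot leak
        s.cache.Forget(rel.allocationKey)
    }
    go s.cleanupReleasedPods(releases) // long-running work, off the hot path
}

func (s *shimContext) cleanupReleasedPods(releases []*release) {
    // delete the pods, return resources, etc.
}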

After this we also need to completely remove the ReSyncSchedulerCacheArgs message and the ReSyncSchedulerCache call from the interface.
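
A before/after sketch of that interface change; the exact shape here is illustrative, not the real SI definition:

package api

type AllocationResponse struct{}
type ReSyncSchedulerCacheArgs struct{}

// before: the core drives shim cache updates through a dedicated call
type callbackBefore interface {
    UpdateAllocation(response *AllocationResponse) error
    ReSyncSchedulerCache(args *ReSyncSchedulerCacheArgs) error
}

// after: the dedicated call and its args message are gone; cache updates
// ride along with the now-synchronous allocation events
type callbackAfter interface {
    UpdateAllocation(response *AllocationResponse) error
}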

@manirajv06 (Contributor, Author)

Thanks for your detailed explanation and for giving me a clear picture.

I have taken care of SI and Core accordingly. As discussed, once YUNIKORN-876 goes in, we can work on the shim. In summary: we should clear the cache on the shim side for all these 4 calls. Since we need to clear the cache in all 4 calls, should we then make all these calls sync?

@wilfred-s (Contributor) left a comment:

LGTM

@wilfred-s closed this in e19a1b0 on Feb 18, 2022.