[YUNIKORN-1440][FOLLOWUP] Remove expired apps from queue during cleanup #471

Closed
wants to merge 2 commits into from

bgrams commented Dec 9, 2022

What is this PR for?

Completed applications are stored permanently in two places: the PartitionContext (PC) and the application Queue. #463 ensured that expired applications are removed from the PC. This PR makes the cleanup loop remove them from the Queue as well.
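
The intent can be illustrated with a minimal Go sketch. It is not the code from this PR: the `partition`, `queue`, and `app` types, the `cleanupExpired` method, and the `demoPartition` helper are all hypothetical, simplified stand-ins, and the locking a real scheduler would need is omitted.

```go
package main

import (
	"fmt"
	"time"
)

// app is a hypothetical, simplified completed application.
type app struct {
	id        string
	queuePath string
	expiresAt time.Time
}

// queue is a hypothetical stand-in that tracks completed apps by ID.
type queue struct {
	completed map[string]*app
}

// partition is a hypothetical stand-in for the partition context.
type partition struct {
	completed map[string]*app
	queues    map[string]*queue
}

// cleanupExpired removes every completed app whose expiry time has been
// reached from the partition and, if the queue still exists, from the queue.
// It returns the IDs that were removed.
func (p *partition) cleanupExpired(now time.Time) []string {
	var removed []string
	for id, a := range p.completed {
		if now.Before(a.expiresAt) {
			continue // not expired yet
		}
		delete(p.completed, id) // deleting during range is safe in Go
		if q, ok := p.queues[a.queuePath]; ok {
			delete(q.completed, id)
		}
		removed = append(removed, id)
	}
	return removed
}

// demoPartition builds a partition with one expired and one live app,
// both registered in the partition and in the same queue.
func demoPartition() (*partition, time.Time) {
	now := time.Date(2022, time.December, 9, 12, 0, 0, 0, time.UTC)
	oldApp := &app{id: "app-old", queuePath: "root.default", expiresAt: now.Add(-time.Hour)}
	newApp := &app{id: "app-new", queuePath: "root.default", expiresAt: now.Add(time.Hour)}
	q := &queue{completed: map[string]*app{oldApp.id: oldApp, newApp.id: newApp}}
	p := &partition{
		completed: map[string]*app{oldApp.id: oldApp, newApp.id: newApp},
		queues:    map[string]*queue{"root.default": q},
	}
	return p, now
}

func main() {
	p, now := demoPartition()
	removed := p.cleanupExpired(now)
	fmt.Printf("removed=%v partition=%d queue=%d\n",
		removed, len(p.completed), len(p.queues["root.default"].completed))
}
```

The existence check on the queue lookup matters in this sketch: a queue may already be gone by the time the application expires, and the partition entry should still be cleaned up.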

What type of PR is it?

  • Bug Fix
  • Improvement
  • Feature
  • Documentation
  • Hot Fix
  • Refactoring

Todos

  • Task

What is the Jira issue?

https://issues.apache.org/jira/browse/YUNIKORN-1440

How should this be tested?

Tests added

Screenshots (if appropriate)

Questions:

  • The license files need an update.
  • There are breaking changes for older versions.
  • It needs documentation.

codecov bot commented Dec 9, 2022

Codecov Report

Merging #471 (33cf961) into master (020bbd8) will increase coverage by 0.00%.
The diff coverage is 78.94%.

@@           Coverage Diff           @@
##           master     #471   +/-   ##
=======================================
  Coverage   72.88%   72.89%           
=======================================
  Files          67       67           
  Lines       10057    10074   +17     
=======================================
+ Hits         7330     7343   +13     
- Misses       2482     2486    +4     
  Partials      245      245           
| Impacted Files | Coverage Δ | |
|---|---|---|
| pkg/scheduler/objects/queue.go | 70.01% <0.00%> (-0.32%) | ⬇️ |
| pkg/scheduler/partition.go | 78.02% <100.00%> (+0.28%) | ⬆️ |

craigcondit left a comment

+1. Would like to have Wilfred look this over as well before commit.

wilfred-s commented Dec 12, 2022

As I discussed with Brandon offline: YUNIKORN-800 introduced this leak and did not take into account dynamic queues and application ID reuse (e.g. K8s CronJob).
Completed applications should no longer be linked to the queue. If we want them to be accessible by queue (path), we should allow a filter on the partition applications call.

With a dynamic queue, the queue is removed long before the completed application expires. Our main use cases have always used dynamic queues, which is one of the reasons we did not see this issue. This PR is a workaround: it fixes the leak but nothing else. We should fix this properly, addressing all the issues introduced in YUNIKORN-800 and the leak in one go.
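
One possible reading of the suggested filter, sketched in Go under stated assumptions: completed applications live only in the partition and record their queue path as a plain string, so a lookup by path still works after a dynamic queue has been removed. `completedApp` and `filterByQueue` are hypothetical names for illustration, not part of the YuniKorn API.

```go
package main

import (
	"fmt"
	"sort"
)

// completedApp is a hypothetical record kept only by the partition.
// The queue is referenced by path, not by pointer, so nothing here
// keeps a removed queue alive or depends on it still existing.
type completedApp struct {
	ID        string
	QueuePath string
}

// filterByQueue returns the sorted IDs of completed apps that ran in
// the queue with the given path.
func filterByQueue(apps map[string]completedApp, queuePath string) []string {
	ids := make([]string, 0)
	for id, a := range apps {
		if a.QueuePath == queuePath {
			ids = append(ids, id)
		}
	}
	sort.Strings(ids) // map iteration order is random; sort for stable output
	return ids
}

func main() {
	apps := map[string]completedApp{
		"app-1": {ID: "app-1", QueuePath: "root.dynamic"},
		"app-2": {ID: "app-2", QueuePath: "root.default"},
		"app-3": {ID: "app-3", QueuePath: "root.dynamic"},
	}
	fmt.Println(filterByQueue(apps, "root.dynamic"))
}
```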

bgrams commented Dec 12, 2022

Yep, thanks Wilfred. We're fine running this patch as a temp fix until 1.2.

I'll close the PR. Happy to pitch in on the broader fix as needed.

bgrams closed this Dec 12, 2022
bgrams deleted the remove-expired-apps-from-queue branch December 13, 2022 19:06