CleanUp Async Jobs after mgmt server maintenance #8394
Codecov Report
Additional details and impacted files
@@ Coverage Diff @@
## main #8394 +/- ##
============================================
- Coverage 30.88% 30.75% -0.13%
+ Complexity 34079 33941 -138
============================================
Files 5341 5341
Lines 374861 374922 +61
Branches 54518 54529 +11
============================================
- Hits 115769 115323 -446
- Misses 243825 244347 +522
+ Partials 15267 15252 -15
@blueorangutan package
@harikrishna-patnala a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Code LGTM
framework/jobs/src/main/java/org/apache/cloudstack/framework/jobs/impl/AsyncJobManagerImpl.java
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8111
@blueorangutan test
@rohityadavcloud a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests
Moving this to 4.19.1 milestone for now cc @rohityadavcloud
framework/jobs/src/main/java/org/apache/cloudstack/framework/jobs/impl/AsyncJobManagerImpl.java
[SF] Trillian test result (tid-8653)
CLGTM, didn't test it
…reated from snapshots, and some code improvements
@blueorangutan package
@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8205
…created from snapshots
@blueorangutan package
@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8228
@blueorangutan test
@sureshanaparti a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests
[SF] Trillian test result (tid-8754)
LGTM, tested the changes manually.
Stopped the management server during the following scenarios:
- while an attach volume operation is in progress
- while create volume from snapshot is in progress
- while VM deployment along with a new network is in progress
After the management server is started, the results for the three scenarios, in the same order, are:
- Volume is marked Ready and the removed column date is kept as NULL
- Volume is marked Allocated and the removed column date is populated; the entries in the volume_details table are removed for the snapshot
- VM state is Stopped, the volume is set to Allocated and the network state is Implementing
@shwstppr can we get this in 4.19.0.0? Thanks.
@blueorangutan package
@shwstppr a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
@blueorangutan package
@shwstppr a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8370
@blueorangutan test matrix
@shwstppr a [SL] Trillian-Jenkins matrix job (centos7 mgmt + xenserver71, rocky8 mgmt + vmware67u3, centos7 mgmt + kvmcentos7) has been kicked to run smoke tests
[SF] Trillian test result (tid-8875)
[SF] Trillian test result (tid-8873)
The error seen here while creating the network is not related to the changes in this PR. The changes here reset/clean up any Volume, VM and Network (in Implementing state) resources for the pending jobs on MS start, and the PR is good to go. cc @shwstppr
Description
This PR moves resources that are stuck in a transitional state out of that state during async job cleanup.
Problem:
During maintenance of a management server, async job cleanup is initiated either by the other servers in the cluster or by the same server after it restarts. However, this cleanup leaves the resources of the interrupted jobs in a transitional state. The only recovery option currently available is to make direct database changes.
Solution:
This PR resolves this by moving Volume, Virtual Machine, and Network resources out of their transitional states during cleanup. Failed operations can then be retried without manual database modifications.
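As a rough illustration of the idea only (this is not the code in this PR), the cleanup can be thought of as a lookup from a transitional state to the stable state a resource should fall back to. The sketch below covers volumes only. The class name, the simplified `VolumeState` enum and the mapping are assumptions made for illustration; the mapping mirrors the manually tested outcomes reported in this conversation (an interrupted attach leaves the volume Ready, an interrupted create from snapshot leaves it Allocated).

```java
import java.util.EnumMap;
import java.util.Map;

public class AsyncJobCleanupSketch {

    // Hypothetical, simplified set of volume states; the real CloudStack enum has more values.
    enum VolumeState { ALLOCATED, CREATING, READY, ATTACHING }

    // Assumed mapping: transitional state -> stable state to fall back to during cleanup.
    private static final Map<VolumeState, VolumeState> ROLLBACK = new EnumMap<>(VolumeState.class);
    static {
        ROLLBACK.put(VolumeState.CREATING, VolumeState.ALLOCATED);
        ROLLBACK.put(VolumeState.ATTACHING, VolumeState.READY);
    }

    // Returns the state a volume should have after cleanup of its interrupted job.
    // Stable states are returned unchanged.
    static VolumeState stateAfterCleanup(VolumeState current) {
        return ROLLBACK.getOrDefault(current, current);
    }

    public static void main(String[] args) {
        // Print the fallback for every state in the sketch.
        for (VolumeState state : VolumeState.values()) {
            System.out.println(state + " -> " + stateAfterCleanup(state));
        }
    }
}
```

Keeping the fallback in a single table makes it easy to see which states are treated as transitional; any state not listed is left untouched, so the cleanup is safe to run on resources that were never stuck.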
How Has This Been Tested?
Tested manually and with unit tests