Fix: Select another pod if all hosts in the pod become unavailable #8085
Codecov Report
@@            Coverage Diff             @@
##               4.18    #8085      +/-   ##
============================================
+ Coverage     13.02%   13.10%   +0.07%
- Complexity     9032     9123      +91
============================================
  Files          2720     2720
  Lines        257080   257598     +518
  Branches      40088    40158      +70
============================================
+ Hits          33476    33748     +272
- Misses       219400   219587     +187
- Partials       4204     4263      +59
... and 15 files with indirect coverage changes
engine/orchestration/src/main/java/com/cloud/vm/VirtualMachineManagerImpl.java (outdated, resolved)
engine/orchestration/src/main/java/com/cloud/vm/VirtualMachineManagerImpl.java (resolved)
@blueorangutan package
@vishesh92 a [SF] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 7350
@blueorangutan test matrix
@rohityadavcloud a [SF] Trillian-Jenkins matrix job (centos7 mgmt + xenserver71, rocky8 mgmt + vmware67u3, centos7 mgmt + kvmcentos7) has been kicked to run smoke tests
engine/orchestration/src/main/java/com/cloud/vm/VirtualMachineManagerImpl.java (outdated, resolved)
[SF] Trillian test result (tid-7955)
[SF] Trillian test result (tid-7953)
@blueorangutan package
@vishesh92 a [SF] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 7412
engine/orchestration/src/main/java/com/cloud/vm/VirtualMachineManagerImpl.java (outdated, resolved)
Code LGTM
@blueorangutan test
@DaanHoogland a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests
[SF] Trillian test result (tid-8070)
Force-pushed from b0057f7 to cad6412.
@blueorangutan package
@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Force-pushed from cad6412 to db22bdf.
@blueorangutan package
@vishesh92 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 7506
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 7508
engine/orchestration/src/main/java/com/cloud/vm/VirtualMachineManagerImpl.java (outdated, resolved)
@blueorangutan package
@vishesh92 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 7520
clgtm, I'll be testing it manually to simulate the right conditions.
Force-pushed from 8af2720 to 07339be.
Force-pushed from 07339be to a425b07.
@blueorangutan package
@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 7611
@blueorangutan test alma9 kvm-alma9
@DaanHoogland a [SL] Trillian-Jenkins test job (alma9 mgmt + kvm-alma9) has been kicked to run smoke tests
[SF] Trillian test result (tid-8220)
Not sure if the errors are related.
@blueorangutan test alma9 kvm-alma9
@DaanHoogland a [SL] Trillian-Jenkins test job (alma9 mgmt + kvm-alma9) has been kicked to run smoke tests
JFYI @DaanHoogland @vishesh92 I had some issues using Alma Linux (due to a repo/mirror issue), but OL8/OL9 seem to work fine with the backend CI/CD.
@blueorangutan test alma9 kvm-alma9
This didn't work 🤯, so I started it manually.
@DaanHoogland [SL] unsupported parameters provided. Supported mgmt server os are:
Results:
These errors are all over the place at the moment and are not specific to this issue.
Tested according to the spec in the description.
Description
If deploying a VM fails, we reset the failed VM's host_id to null but not its pod_id. The retry therefore stays in the original pod and fails when that pod lacks capacity, even if another pod has enough.
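To make the failure mode above concrete, here is a minimal Java sketch. VmInstance, ToyPlanner and DeployFailureCleanup are hypothetical, simplified stand-ins, not CloudStack classes, and capacity is reduced to a single integer per pod. The key assumption is that a planner which sees a non-null pod id only looks inside that pod.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical, simplified stand-in for a vm_instance row; not CloudStack's real VM class. */
class VmInstance {
    Long hostId; // null means "no host assigned"
    Long podId;  // null means "not pinned to a pod"

    VmInstance(Long hostId, Long podId) {
        this.hostId = hostId;
        this.podId = podId;
    }
}

/** Hypothetical planner: a VM that already carries a pod id is only placed in that pod. */
class ToyPlanner {
    private final Map<Long, Integer> freeCapacityByPod; // pod id -> free capacity units (assumed model)

    ToyPlanner(Map<Long, Integer> freeCapacityByPod) {
        this.freeCapacityByPod = freeCapacityByPod;
    }

    Optional<Long> choosePod(VmInstance vm, int required) {
        if (vm.podId != null) {
            // Pinned: only the previously chosen pod is considered.
            Integer free = freeCapacityByPod.get(vm.podId);
            if (free != null && free >= required) {
                return Optional.of(vm.podId);
            }
            return Optional.empty();
        }
        // Not pinned: any pod with enough free capacity will do.
        return freeCapacityByPod.entrySet().stream()
                .filter(e -> e.getValue() >= required)
                .map(Map.Entry::getKey)
                .findFirst();
    }
}

/** Cleanup after a failed deployment attempt, modelled before and after the described change. */
class DeployFailureCleanup {
    // Before: only host_id is cleared, so the VM stays pinned to its old pod.
    static void resetBeforeFix(VmInstance vm) {
        vm.hostId = null;
    }

    // After: pod_id is cleared as well, so the next attempt may pick another pod.
    static void resetAfterFix(VmInstance vm) {
        vm.hostId = null;
        vm.podId = null;
    }
}
```

With two pods where only the second has free capacity, the "before" cleanup leaves the VM pinned to the first pod and choosePod returns empty, while the "after" cleanup lets the planner pick the second pod.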
How Has This Been Tested?
How did you try to break this feature and the system with this change?
This needs an environment with 2 pods to reproduce the issue and test the fix.
cloudstack/server/src/main/java/com/cloud/capacity/CapacityManagerImpl.java, line 383 in 9df580c
On the cloud database, run:
SELECT id, state, pod_id, host_id, last_host_id FROM vm_instance ORDER BY id DESC LIMIT 1;
UPDATE host_pod_ref SET allocation_state = 'Disabled' WHERE id = <pod id>;
Set hostHasCpuCapability = false in the debugger to throw an error in the first run.
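The scenario these steps exercise (two pods, with every host in the first pod failing on the first run) can be modelled as a small simulation. This is a hypothetical sketch, not CloudStack code: ToyHost and PodFallbackDeployer are illustrative names, an unavailable host stands in for one that fails the capability check, and the exclusion set only loosely mirrors the idea of avoiding a pod. Pods are tried in ascending id order.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical host model (not a CloudStack class); available == false simulates a failing host. */
class ToyHost {
    final long id;
    final boolean available;

    ToyHost(long id, boolean available) {
        this.id = id;
        this.available = available;
    }
}

/** Hypothetical deployer that falls back to another pod when every host in a pod is unavailable. */
class PodFallbackDeployer {
    // Pod id -> hosts in that pod; a TreeMap so pods are tried in ascending id order.
    private final Map<Long, List<ToyHost>> hostsByPod;

    PodFallbackDeployer(Map<Long, List<ToyHost>> hostsByPod) {
        this.hostsByPod = new TreeMap<>(hostsByPod);
    }

    /**
     * Returns the first available host in the first non-excluded pod that has one.
     * Any pod found to have no available host is added to the (mutable) excludedPods
     * set, so the caller can reuse it across attempts. Returns empty if no pod qualifies.
     */
    Optional<ToyHost> deploy(Set<Long> excludedPods) {
        for (Map.Entry<Long, List<ToyHost>> entry : hostsByPod.entrySet()) {
            Long podId = entry.getKey();
            if (excludedPods.contains(podId)) {
                continue; // already known to be unusable
            }
            Optional<ToyHost> host = entry.getValue().stream()
                    .filter(h -> h.available)
                    .findFirst();
            if (host.isPresent()) {
                return host;
            }
            // Every host in this pod is unavailable: avoid the pod and try the next one.
            excludedPods.add(podId);
        }
        return Optional.empty();
    }
}
```

Passing the exclusion set in from the caller lets repeated attempts accumulate avoided pods; whether the actual fix tracks pods this way is an assumption of this sketch.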