New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix reorder/list pools when cluster details are not set, while deploying vm / attaching volume #8373
Fix reorder/list pools when cluster details are not set, while deploying vm / attaching volume #8373
Conversation
@blueorangutan package |
@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #8373 +/- ##
=========================================
Coverage 30.85% 30.85%
- Complexity 34048 34054 +6
=========================================
Files 5341 5341
Lines 374861 374871 +10
Branches 54518 54521 +3
=========================================
+ Hits 115659 115676 +17
Misses 243973 243973
+ Partials 15229 15222 -7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code lgtm
need manual testing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - refactoring the sql query to evoke based on passed parameter (when not null) makes it super clear why NPE could happen and why it makes the code more defensive.
6dbe9f2
to
2f35795
Compare
@blueorangutan package |
@shwstppr a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
Packaging result [SF]: ✖️ el7 ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 8113 |
@blueorangutan package |
@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
…ing vm / attaching volume
2f35795
to
4634aaf
Compare
@blueorangutan package |
@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
Packaging result [SF]: ✖️ el7 ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 8115 |
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8119 |
@blueorangutan test |
@weizhouapache a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests |
[SF] Trillian test result (tid-8655)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, Tested manually by creating a zone wide storage and creating and attaching a data volume to a running vm
Before the fix there was sql exception
2024-01-03 04:16:06,397 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-15:ctx-61717f63 job-64/job-65) (logid:fc47afe3) Done executing com.cloud.vm.VmWorkAttachVolume for job-65
2024-01-03 04:16:06,398 INFO [o.a.c.f.j.i.AsyncJobMonitor] (Work-Job-Executor-15:ctx-61717f63 job-64/job-65) (logid:fc47afe3) Remove job-65 from job monitoring
2024-01-03 04:16:06,414 ERROR [c.c.a.ApiAsyncJobDispatcher] (API-Job-Executor-11:ctx-24c62e2f job-64) (logid:fc47afe3) Unexpected exception while executing org.apache.cloudstack.api.command.admin.volume.AttachVolumeCmdByAdmin
com.cloud.utils.exception.CloudRuntimeException: Caught: SELECT pool.id, SUM(IF(vol.state='Ready' AND vol.account_id = ?, 1, 0)) FROM `cloud`.`storage_pool` pool LEFT JOIN `cloud`.`volumes` vol ON pool.id = vol.pool_id WHERE pool.data_center_id = ? AND pool.pod_id = ? AND pool.cluster_id = ? GROUP BY pool.id ORDER BY 2 ASC
at com.cloud.storage.dao.VolumeDaoImpl.listPoolIdsByVolumeCount(VolumeDaoImpl.java:613)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
After the fix there was no sql error observed and the volume got attached
2024-01-03 06:50:22,550 DEBUG [c.c.s.StorageManagerImpl] (Work-Job-Executor-4:ctx-48decd5c job-50/job-51 ctx-5d9c474b) (logid:8aea532a) Checking pool: 2 for storage allocation , maxSize : (3.9980 TB) 4395899027456, totalAllocatedSize : (22.65 GB) 24318968832, askingSize : (5.00 GB) 5368709120, allocated disable threshold: 0.95
2024-01-03 06:50:22,550 DEBUG [o.a.c.s.a.AbstractStoragePoolAllocator] (Work-Job-Executor-4:ctx-48decd5c job-50/job-51 ctx-5d9c474b) (logid:8aea532a) ZoneWideStoragePoolAllocator is returning [1] suitable storage pools [[{"name":"test","uuid":"4fe5cbe5-ab56-37a4-8727-9d9795164ca7"}]].
2024-01-03 06:50:22,552 DEBUG [o.a.c.e.o.VolumeOrchestrator] (Work-Job-Executor-4:ctx-48decd5c job-50/job-51 ctx-5d9c474b) (logid:8aea532a) VM [VM instance {"id":3,"instanceName":"i-2-3-VM","type":"User","uuid":"922e5659-38fe-4fb7-b10e-9df3ffadb1dc"}] does not have a preferred storage pool or it cannot be used. Volume Orchestrator will use the available Storage Pool [{"name":"test","uuid":"4fe5cbe5-ab56-37a4-8727-9d9795164ca7"}], which was discovered using Storage Pool Allocator [ZoneWideStoragePoolAllocator].
2024-01-03 06:50:22,818 DEBUG [c.c.a.t.Request] (Work-Job-Executor-4:ctx-48decd5c job-50/job-51 ctx-5d9c474b) (logid:8aea532a) Seq 1-5166191722547380285: Received: { Ans: , MgmtId: 32988620391433, via: 1(ref-trl-6114-k-M7-kiran-chavala-kvm1), Ver: v1, Flags: 10, { AttachAnswer } }
2024-01-03 06:50:22,827 DEBUG [c.c.s.VolumeApiServiceImpl] (Work-Job-Executor-4:ctx-48decd5c job-50/job-51 ctx-5d9c474b) (logid:8aea532a) Volume: vb successfully attached to VM: i-2-3-VM
2024-01-03 06:50:22,834 DEBUG [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-4:ctx-48decd5c job-50/job-51 ctx-5d9c474b) (logid:8aea532a) Done executing VM work job: com.cloud.vm.VmWorkAttachVolume{"volumeId":8,"userId":2,"accountId":2,"vmId":3,"handlerName":"VolumeApiServiceImpl"}
cc @shwstppr this is a bug fix should we merge it, or is main frozen for others? |
@rohityadavcloud a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests |
[SF] Trillian test result (tid-8717)
|
@rohityadavcloud it can be merged if it is ready. We will need to cut a new RC so we can accept bug fixes now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code LGTM
Packages are 2 weeks old, returning more errors - |
@shwstppr a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8195 |
@blueorangutan test |
@sureshanaparti a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests |
[SF] Trillian test result (tid-8722)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Looks good to include in 4.19RC2. Moving to the milestone and merging |
…ing vm / attaching volume (apache#8373) This PR fixes reorder/list pools when cluster details are not set, while deploying vm / attaching volume. Problem: Attach volume to a VM fails, on infra with zone-wide pools & vm.allocation.algorithm=userdispersing as the cluster details are not set (passed as null) while reordering / listing pools by volumes. Solution: Ignore cluster details when not set, while reordering / listing pools by volumes.
Description
This PR fixes reorder/list pools when cluster details are not set, while deploying vm / attaching volume.
Problem:
Attach volume to a VM fails, on infra with zone-wide pools & vm.allocation.algorithm=userdispersing as the cluster details are not set (passed as null) while reordering / listing pools by volumes.
Solution:
Ignore cluster details when not set, while reordering / listing pools by volumes.
Types of changes
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
Bug Severity
Screenshots (if appropriate):
How Has This Been Tested?
Manually tested Deploy VM, and Attach Volume operations on infra with zone-wide pools & vm.allocation.algorithm=userdispersing
How did you try to break this feature and the system with this change?