Fix UUID for child datastores in all cases #8057

Merged
merged 1 commit into from Oct 18, 2023

Conversation

harikrishna-patnala
Contributor

@harikrishna-patnala harikrishna-patnala commented Oct 9, 2023

Description

This PR fixes issue #7999.

When a storage pool datastore cluster is put into maintenance mode, the cloud.uuid of its child datastores can be overwritten with a UUID without hyphens ('-'), which breaks the sync storage pool operation.

In my case this happens when any hosts in the cluster do not have access to the storage pool.

This PR makes sure that the UUID is never changed to a UUID without hyphens.
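
The difference between the two UUID spellings described above can be illustrated with a short Python sketch. This is not the code changed by this PR (the fix itself is in CloudStack's Java code); the function name `to_hyphenated_uuid` and the sample values are hypothetical and only show why a hyphen-less UUID no longer matches a stored hyphenated one in a plain string comparison.

```python
import uuid


def to_hyphenated_uuid(value: str) -> str:
    """Return the canonical 8-4-4-4-12 form of a UUID string.

    Accepts input with or without hyphens; raises ValueError if the
    input is not 32 hexadecimal digits once hyphens are removed.
    """
    return str(uuid.UUID(value))


# Hypothetical values, for illustration only.
stored = "b7a4f2c0-9d3e-4c1a-8f6e-2d5b0a1c3e4f"
reported = "b7a4f2c09d3e4c1a8f6e2d5b0a1c3e4f"

# Both spellings name the same UUID, but a plain string comparison treats them as different.
same_identity = to_hyphenated_uuid(reported) == stored
same_string = reported == stored
```

Here `same_identity` is true while `same_string` is false, which is the kind of mismatch that can make a lookup by UUID fail.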

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

With the fix, cloud.uuid is unchanged before and after putting the storage pool into maintenance mode:
image

Before the fix, this UUID changed to a UUID without hyphens.

How Has This Been Tested?

  1. Prepare a datastore cluster with 2 child datastores
  2. Add the datastore cluster as primary storage in CloudStack
  3. Put the datastore cluster in maintenance mode or restart the management server (as part of the fix, make sure the cloud.uuid of each child storage pool does not change; the UUID must keep its hyphens)
  4. Add or remove a child datastore in vCenter
  5. Try the sync storage pool operation on the datastore cluster => succeeded.
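
The check in step 3 could be automated along the following lines. This is a minimal sketch, assuming the child pool UUIDs have already been collected into `{pool name: uuid}` dictionaries before and after the operation; how they are collected is left out, and the function name `find_uuid_regressions` and the pool names are hypothetical.

```python
import re

# Canonical hyphenated UUID layout: 8-4-4-4-12 hexadecimal digits.
HYPHENATED_UUID = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def find_uuid_regressions(before: dict, after: dict) -> list:
    """Compare {pool name: uuid} snapshots and describe every problem found."""
    problems = []
    for name, old_uuid in before.items():
        new_uuid = after.get(name)
        if new_uuid is None:
            problems.append(f"{name}: missing after the operation")
        elif not HYPHENATED_UUID.fullmatch(new_uuid):
            problems.append(f"{name}: UUID is not hyphenated ({new_uuid})")
        elif new_uuid != old_uuid:
            problems.append(f"{name}: UUID changed from {old_uuid} to {new_uuid}")
    return problems


# Hypothetical snapshots, for illustration only.
before = {
    "child-ds-1": "b7a4f2c0-9d3e-4c1a-8f6e-2d5b0a1c3e4f",
    "child-ds-2": "0f1e2d3c-4b5a-4978-8695-a4b3c2d1e0f9",
}
after = {
    "child-ds-1": "b7a4f2c0-9d3e-4c1a-8f6e-2d5b0a1c3e4f",
    "child-ds-2": "0f1e2d3c4b5a49788695a4b3c2d1e0f9",
}

problems = find_uuid_regressions(before, after)
```

With these sample snapshots, `problems` contains a single entry for `child-ds-2`, whose UUID lost its hyphens; an empty list would mean the UUIDs survived the operation unchanged.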

How did you try to break this feature and the system with this change?

@harikrishna-patnala
Contributor Author

@blueorangutan package

@harikrishna-patnala harikrishna-patnala added this to the 4.18.2.0 milestone Oct 9, 2023
@blueorangutan

@harikrishna-patnala a [SF] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 7277

@codecov

codecov bot commented Oct 9, 2023

Codecov Report

Merging #8057 (d58c14d) into 4.18 (29c7b31) will increase coverage by 0.04%.
Report is 71 commits behind head on 4.18.
The diff coverage is 18.27%.

@@             Coverage Diff              @@
##               4.18    #8057      +/-   ##
============================================
+ Coverage     13.02%   13.06%   +0.04%     
- Complexity     9032     9108      +76     
============================================
  Files          2720     2720              
  Lines        257080   257537     +457     
  Branches      40088    40156      +68     
============================================
+ Hits          33476    33658     +182     
- Misses       219400   219649     +249     
- Partials       4204     4230      +26     
Files Coverage Δ
...hestration/service/VolumeOrchestrationService.java 100.00% <ø> (ø)
.../main/java/com/cloud/network/IpAddressManager.java 100.00% <100.00%> (ø)
...ava/com/cloud/network/as/AutoScaleVmProfileVO.java 80.20% <100.00%> (+11.66%) ⬆️
...java/com/cloud/upgrade/DatabaseUpgradeChecker.java 40.89% <100.00%> (+0.64%) ⬆️
...va/com/cloud/upgrade/DatabaseVersionHierarchy.java 85.10% <100.00%> (+1.01%) ⬆️
.../api/command/admin/ratelimit/ResetApiLimitCmd.java 0.00% <ø> (ø)
...oud/hypervisor/kvm/resource/LibvirtConnection.java 0.00% <ø> (ø)
.../hypervisor/kvm/storage/ScaleIOStorageAdaptor.java 10.44% <100.00%> (ø)
...ava/com/cloud/api/commands/StopNetScalerVMCmd.java 0.00% <ø> (ø)
...tungsten/api/command/ListTungstenFabricTagCmd.java 0.00% <ø> (ø)
... and 58 more

... and 7 files with indirect coverage changes


Contributor

@DaanHoogland DaanHoogland left a comment


clgtm

Member

@rohityadavcloud rohityadavcloud left a comment


LGTM - didn't test it though

@rohityadavcloud
Member

@blueorangutan alma8 vmware-70u3

@DaanHoogland
Contributor

@blueorangutan alma8 vmware-70u3

yeah, i make that mistake all the time

@DaanHoogland
Contributor

@blueorangutan test alma8 vmware-70u3

@blueorangutan

@DaanHoogland a [SF] Trillian-Jenkins test job (alma8 mgmt + vmware-70u3) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-7905)
Environment: vmware-70u3 (x2), Advanced Networking with Mgmt server a8
Total time taken: 56705 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8057-t7905-vmware-70u3.zip
Smoke tests completed. 107 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_01_deploy_vm_on_specific_host Error 3603.23 test_vm_deployment_planner.py
test_02_deploy_vm_on_specific_cluster Error 1.28 test_vm_deployment_planner.py
test_03_deploy_vm_on_specific_pod Error 2.33 test_vm_deployment_planner.py
test_04_deploy_vm_on_host_override_pod_and_cluster Error 2.35 test_vm_deployment_planner.py
test_05_deploy_vm_on_cluster_override_pod Error 1.29 test_vm_deployment_planner.py

@DaanHoogland
Contributor

@blueorangutan test rocky8 vmware-67u3

@blueorangutan

@DaanHoogland a [SF] Trillian-Jenkins test job (rocky8 mgmt + vmware-67u3) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-7914)
Environment: vmware-67u3 (x2), Advanced Networking with Mgmt server r8
Total time taken: 146853 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8057-t7914-vmware-67u3.zip
Smoke tests completed. 104 look OK, 4 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_02_list_cpvm_vm Failure 0.04 test_ssvm.py
test_04_cpvm_internals Failure 0.04 test_ssvm.py
test_06_stop_cpvm Failure 0.04 test_ssvm.py
test_07_reboot_ssvm Failure 100.18 test_ssvm.py
test_11_destroy_ssvm Failure 920.72 test_ssvm.py
test_08_arping_in_ssvm Failure 5.23 test_diagnostics.py
test_01_invalid_upgrade_kubernetes_cluster Failure 3608.63 test_kubernetes_clusters.py
test_02_upgrade_kubernetes_cluster Failure 3611.06 test_kubernetes_clusters.py
test_03_deploy_and_scale_kubernetes_cluster Failure 0.07 test_kubernetes_clusters.py
test_04_autoscale_kubernetes_cluster Failure 0.06 test_kubernetes_clusters.py
test_05_basic_lifecycle_kubernetes_cluster Failure 0.05 test_kubernetes_clusters.py
test_06_delete_kubernetes_cluster Failure 0.05 test_kubernetes_clusters.py
test_07_deploy_kubernetes_ha_cluster Failure 0.05 test_kubernetes_clusters.py
test_08_upgrade_kubernetes_ha_cluster Failure 0.05 test_kubernetes_clusters.py
test_09_delete_kubernetes_ha_cluster Failure 0.06 test_kubernetes_clusters.py
test_10_vpc_tier_kubernetes_cluster Failure 1081.21 test_kubernetes_clusters.py
ContextSuite context=TestKubernetesCluster>:teardown Error 1175.93 test_kubernetes_clusters.py
test_01_scale_up_verify Failure 576.63 test_vm_autoscaling.py

@harikrishna-patnala
Contributor Author

@blueorangutan test rocky8 vmware-67u3

@blueorangutan

@harikrishna-patnala a [SF] Trillian-Jenkins test job (rocky8 mgmt + vmware-67u3) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-7951)
Environment: vmware-67u3 (x2), Advanced Networking with Mgmt server r8
Total time taken: 61003 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8057-t7951-vmware-67u3.zip
Smoke tests completed. 107 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_01_deploy_vm_on_specific_host Error 12.59 test_vm_deployment_planner.py
test_02_deploy_vm_on_specific_cluster Error 3601.71 test_vm_deployment_planner.py
test_03_deploy_vm_on_specific_pod Error 1.36 test_vm_deployment_planner.py
test_04_deploy_vm_on_host_override_pod_and_cluster Error 2.37 test_vm_deployment_planner.py
test_05_deploy_vm_on_cluster_override_pod Error 1.33 test_vm_deployment_planner.py

@rohityadavcloud
Member

@harikrishna-patnala can you review the failures? Are they due to this PR, or can we merge it?

@harikrishna-patnala
Contributor Author

This can be merged @rohityadavcloud. Those seem to be intermittent failures; the PR changes are purely related to datastore clusters.

@rohityadavcloud rohityadavcloud merged commit 76ab621 into apache:4.18 Oct 18, 2023
47 of 50 checks passed
@rohityadavcloud rohityadavcloud deleted the FixDatastoreClusterUUID branch October 18, 2023 07:30
DaanHoogland added a commit that referenced this pull request Oct 18, 2023
* 4.18:
  Fix UUID for child datastores in all cases (#8057)
shwstppr pushed a commit to shapeblue/cloudstack that referenced this pull request Dec 27, 2023
(cherry picked from commit 76ab621)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
4 participants