Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add lock mechanism considering template id, pool id, host id in PowerFlex Storage #8233

Conversation

harikrishna-patnala
Copy link
Contributor

@harikrishna-patnala harikrishna-patnala commented Nov 14, 2023

Description

Observed a failure to start new virtual machine with PowerFlex storage. Traced it to concurrent VM starts using the same template and the same host to copy. Second mapping attempt failed.

While creating the volume clone from the seeded template in primary storage, adding a lock with the string containing IDs of template, storage pool and destination host avoids the situation of concurrent mapping attempts with the same host.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

@harikrishna-patnala
Copy link
Contributor Author

@blueorangutan package

@harikrishna-patnala harikrishna-patnala changed the title Add lock mechanism considering template id, pool id, host id Add lock mechanism considering template id, pool id, host id in PowerFlex Storage Nov 14, 2023
Copy link
Contributor

@DaanHoogland DaanHoogland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clgtm

@DaanHoogland
Copy link
Contributor

@harikrishna-patnala can you look at the locking used?

@harikrishna-patnala harikrishna-patnala force-pushed the fixPowerflexSDCfailureWhenMultipleTemplateCopies branch from 0d4d51a to e2fc4ff Compare November 20, 2023 03:19
@harikrishna-patnala
Copy link
Contributor Author

@blueorangutan package

Copy link

codecov bot commented Nov 20, 2023

Codecov Report

Attention: 17 lines in your changes are missing coverage. Please review.

Comparison is base (1b56a8e) 13.10% compared to head (e2fc4ff) 13.10%.

Files Patch % Lines
...e/cloudstack/storage/volume/VolumeServiceImpl.java 0.00% 17 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               4.18    #8233      +/-   ##
============================================
- Coverage     13.10%   13.10%   -0.01%     
  Complexity     9122     9122              
============================================
  Files          2720     2720              
  Lines        257638   257653      +15     
  Branches      40168    40171       +3     
============================================
  Hits          33753    33753              
- Misses       219619   219634      +15     
  Partials       4266     4266              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@harikrishna-patnala
Copy link
Contributor Author

@blueorangutan package

@blueorangutan
Copy link

@harikrishna-patnala a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 7800

@rohityadavcloud rohityadavcloud added this to the 4.18.2.0 milestone Nov 20, 2023
@rohityadavcloud
Copy link
Member

@blueorangutan test

@blueorangutan
Copy link

@rohityadavcloud a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan
Copy link

[SF] Trillian test result (tid-8374)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 39367 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8233-t8374-kvm-centos7.zip
Smoke tests completed. 108 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_02_list_cpvm_vm Failure 0.04 test_ssvm.py
test_04_cpvm_internals Failure 0.04 test_ssvm.py

@harikrishna-patnala
Copy link
Contributor Author

@blueorangutan package

@blueorangutan
Copy link

@harikrishna-patnala a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 7923

@borisstoyanov
Copy link
Contributor

@blueorangutan test

1 similar comment
@rohityadavcloud
Copy link
Member

@blueorangutan test

@blueorangutan
Copy link

@rohityadavcloud a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

Copy link
Contributor

@borisstoyanov borisstoyanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, lets wait for latest test and I guess we could merge this

@blueorangutan
Copy link

[SF] Trillian test result (tid-8480)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 44561 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8233-t8480-kvm-centos7.zip
Smoke tests completed. 107 look OK, 2 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_08_arping_in_ssvm Failure 5.19 test_diagnostics.py
test_01_redundant_vpc_site2site_vpn Failure 382.21 test_vpc_vpn.py

@rohityadavcloud
Copy link
Member

@blueorangutan test

@blueorangutan
Copy link

@rohityadavcloud a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@DaanHoogland
Copy link
Contributor

@blueorangutan test

I don't think re-running the tests without rebuild will add value here.

@JoaoJandre are your conserns met?

@blueorangutan
Copy link

@DaanHoogland [SL] unsupported parameters provided. Supported mgmt server os are: centos7, centos6, suse15, alma8, ubuntu18, ubuntu22, ubuntu20, rocky8, alma9. Supported hypervisors are: kvm-centos6, kvm-centos7, kvm-rocky8, kvm-alma8, kvm-alma9, kvm-ubuntu18, kvm-ubuntu20, kvm-ubuntu22, kvm-suse15, vmware-55u3, vmware-60u2, vmware-65u2, vmware-67u3, vmware-70u1, vmware-70u2, vmware-70u3, vmware-80, vmware-80u1, xenserver-65sp1, xenserver-71, xenserver-74, xcpng74, xcpng76, xcpng80, xcpng81, xcpng82

@harikrishna-patnala
Copy link
Contributor Author

@DaanHoogland , just to answer for question from my side, @JoaoJandre mentioned about missing lock() method call, which I've added already.

Copy link
Contributor

@JoaoJandre JoaoJandre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CLGTM, didn't test it.

@blueorangutan
Copy link

[SF] Trillian test result (tid-8496)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 43093 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8233-t8496-kvm-centos7.zip
Smoke tests completed. 107 look OK, 2 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_08_migrate_vm Error 45.92 test_vm_life_cycle.py
test_hostha_enable_ha_when_host_in_maintenance Error 302.80 test_hostha_kvm.py

@rohityadavcloud
Copy link
Member

Last two smoketests rounds had different intermittent failures, LGTM, merging this.

@rohityadavcloud rohityadavcloud merged commit 7eb3636 into apache:4.18 Dec 8, 2023
25 of 27 checks passed
@rohityadavcloud rohityadavcloud deleted the fixPowerflexSDCfailureWhenMultipleTemplateCopies branch December 8, 2023 07:51
dhslove pushed a commit to ablecloud-team/ablestack-cloud that referenced this pull request Dec 11, 2023
…Flex Storage (apache#8233)

Observed a failure to start new virtual machine with PowerFlex storage. Traced it to concurrent VM starts using the same template and the same host to copy. Second mapping attempt failed.

While creating the volume clone from the seeded template in primary storage, adding a lock with the string containing IDs of template, storage pool and destination host avoids the situation of concurrent mapping attempts with the same host.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants