Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix deploy as is VM start after template deletion #8115

Merged
merged 3 commits into from Nov 14, 2023

Conversation

harikrishna-patnala
Copy link
Contributor

@harikrishna-patnala harikrishna-patnala commented Oct 18, 2023

Description

This PR fixes the issue #7970

Fix is to keep the template deploy as is details even after the template is deleted. VM start need to know whether the vapp property is password or not which are saved in template_deploy_as_is_details table.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

How Has This Been Tested?

  1. Registered a VMware template with "Read VM settings from OVA" enabled
  2. Created a VM with it
  3. Force delete the template
  4. Stop and start the VM => succeeded

@harikrishna-patnala harikrishna-patnala added this to the 4.18.2.0 milestone Oct 18, 2023
@harikrishna-patnala
Copy link
Contributor Author

@blueorangutan package

@blueorangutan
Copy link

@harikrishna-patnala a [SF] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

Copy link
Member

@weizhouapache weizhouapache left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code lgtm

I am fine with this change, although this will lead to some records in the database.
I think this is safest way to do, unless we can 100% figure out what details needs to be kept after template is removed.

We probably need to look into the template details as well

            // Remove template details
            templateDetailsDao.removeDetails(template.getId());

@DaanHoogland
Copy link
Contributor

I think the best way is to copy template details to the ivm_instance_details on deploy

@weizhouapache
Copy link
Member

I think the best way is to copy template details to the ivm_instance_details on deploy

@DaanHoogland
it may require some code changes (e.g. read property in vm deploy-as-is details instead of template deploy-as-is details )

@DaanHoogland
Copy link
Contributor

I think the best way is to copy template details to the ivm_instance_details on deploy

@DaanHoogland it may require some code changes (e.g. read property in vm deploy-as-is details instead of template deploy-as-is details )

yes, but that would solve the DB remnance. I am not demanding we change this, just suggesting what could be a better design choice.

@harikrishna-patnala
Copy link
Contributor Author

template_deploy_as_is_details looks like this
image

The corresponding values in user_vm_deploy_as_is_details looks like this
image

So I dont think we can directly copy the template_deploy_as_is_details to user_vm_deploy_as_is_details as it is.
If we have to do that, we need a logic to differentiate the properties entries of templates and their values in the same table "user_vm_deploy_as_is_details", which I'm not sure if it is possible. May be @nvazquez can help here.

Currently we are parsing everything from user_vm_deploy_as_is_details table and passing them to the resource.
https://github.com/shapeblue/cloudstack/blob/4a0636374931e4e6b562def76acb9f8673a7f22a/engine/storage/src/main/java/org/apache/cloudstack/storage/image/deployasis/DeployAsIsHelperImpl.java#L317-L324

To simplify all these, I thought it will be better to keep the template_deploy_as_is_details even after deletion.
Or else like I mentioned in the issue ticket, we need another column in user_vm_deploy_as_is details table which is "is_password", because thats the only field we are looking in template details table.

cc @DaanHoogland @weizhouapache

@harikrishna-patnala
Copy link
Contributor Author

@blueorangutan package

@blueorangutan
Copy link

@harikrishna-patnala a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 7443

@codecov
Copy link

codecov bot commented Oct 20, 2023

Codecov Report

Merging #8115 (482a80b) into 4.18 (29c7b31) will increase coverage by 0.04%.
Report is 91 commits behind head on 4.18.
The diff coverage is 18.80%.

@@             Coverage Diff              @@
##               4.18    #8115      +/-   ##
============================================
+ Coverage     13.02%   13.06%   +0.04%     
- Complexity     9032     9108      +76     
============================================
  Files          2720     2720              
  Lines        257080   257547     +467     
  Branches      40088    40152      +64     
============================================
+ Hits          33476    33655     +179     
- Misses       219400   219664     +264     
- Partials       4204     4228      +24     
Files Coverage Δ
...hestration/service/VolumeOrchestrationService.java 100.00% <ø> (ø)
.../main/java/com/cloud/network/IpAddressManager.java 100.00% <100.00%> (ø)
...ava/com/cloud/network/as/AutoScaleVmProfileVO.java 80.20% <100.00%> (+11.66%) ⬆️
...java/com/cloud/upgrade/DatabaseUpgradeChecker.java 40.89% <100.00%> (+0.64%) ⬆️
...va/com/cloud/upgrade/DatabaseVersionHierarchy.java 85.10% <100.00%> (+1.01%) ⬆️
.../storage/vmsnapshot/StorageVMSnapshotStrategy.java 25.39% <ø> (+0.95%) ⬆️
.../api/command/admin/ratelimit/ResetApiLimitCmd.java 0.00% <ø> (ø)
...oud/hypervisor/kvm/resource/LibvirtConnection.java 0.00% <ø> (ø)
.../hypervisor/kvm/storage/ScaleIOStorageAdaptor.java 10.44% <100.00%> (ø)
...ava/com/cloud/api/commands/StopNetScalerVMCmd.java 0.00% <ø> (ø)
... and 62 more

... and 8 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Contributor

@DaanHoogland DaanHoogland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some style critique but clgtm

@DaanHoogland
Copy link
Contributor

@blueorangutan test

@blueorangutan
Copy link

@DaanHoogland a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan
Copy link

[SF] Trillian test result (tid-8040)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 39447 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8115-t8040-kvm-centos7.zip
Smoke tests completed. 108 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

Copy link
Contributor

@shwstppr shwstppr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM

@shwstppr
Copy link
Contributor

@harikrishna-patnala are you making any more changes for review comments?

@harikrishna-patnala
Copy link
Contributor Author

@blueorangutan package

@blueorangutan
Copy link

@harikrishna-patnala a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@rohityadavcloud
Copy link
Member

@blueorangutan package

@blueorangutan
Copy link

@rohityadavcloud a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@DaanHoogland
Copy link
Contributor

@blueorangutan test rocky8 xenserver-71

@blueorangutan
Copy link

@DaanHoogland a [SL] Trillian-Jenkins test job (rocky8 mgmt + xenserver-71) has been kicked to run smoke tests

@DaanHoogland
Copy link
Contributor

@blueorangutan test alma9 vmware-70u3

@blueorangutan
Copy link

@DaanHoogland a [SL] Trillian-Jenkins test job (alma9 mgmt + vmware-70u3) has been kicked to run smoke tests

@blueorangutan
Copy link

[SF] Trillian test result (tid-8148)
Environment: xenserver-71 (x2), Advanced Networking with Mgmt server r8
Total time taken: 40361 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8115-t8148-xenserver-71.zip
Smoke tests completed. 102 look OK, 7 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_03_ping_in_ssvm_success Error 1.13 test_diagnostics.py
test_04_ping_in_ssvm_failure Error 1.11 test_diagnostics.py
test_08_arping_in_ssvm Error 1.13 test_diagnostics.py
test_09_arping_in_cpvm Error 1.11 test_diagnostics.py
test_11_traceroute_in_ssvm Error 1.11 test_diagnostics.py
test_15_retrieve_ssvm_default_files Error 1.12 test_diagnostics.py
test_16_retrieve_ssvm_single_file Error 1.12 test_diagnostics.py
test_01_add_primary_storage_disabled_host Error 53.29 test_primary_storage.py
test_01_non_strict_host_anti_affinity Failure 142.91 test_nonstrict_affinity_group.py
test_02_non_strict_host_affinity Error 103.00 test_nonstrict_affinity_group.py
test_02_upgrade_kubernetes_cluster Failure 579.93 test_kubernetes_clusters.py
test_03_deploy_and_scale_kubernetes_cluster Failure 268.03 test_kubernetes_clusters.py
test_07_deploy_kubernetes_ha_cluster Failure 57.76 test_kubernetes_clusters.py
test_08_upgrade_kubernetes_ha_cluster Failure 47.29 test_kubernetes_clusters.py
test_09_delete_kubernetes_ha_cluster Failure 42.16 test_kubernetes_clusters.py
test_10_vpc_tier_kubernetes_cluster Failure 55.16 test_kubernetes_clusters.py
test_10_vpc_tier_kubernetes_cluster Error 55.17 test_kubernetes_clusters.py
ContextSuite context=TestKubernetesCluster>:teardown Error 128.26 test_kubernetes_clusters.py
test_01_sys_vm_start Failure 0.09 test_secondary_storage.py
test_01_deploy_vm_on_specific_host Error 0.09 test_vm_deployment_planner.py
test_02_deploy_vm_on_specific_cluster Error 1.32 test_vm_deployment_planner.py
test_03_deploy_vm_on_specific_pod Error 2.33 test_vm_deployment_planner.py
test_04_deploy_vm_on_host_override_pod_and_cluster Error 0.12 test_vm_deployment_planner.py
test_05_deploy_vm_on_cluster_override_pod Error 2.32 test_vm_deployment_planner.py
test_08_migrate_vm Error 0.07 test_vm_life_cycle.py

@blueorangutan
Copy link

[SF] Trillian Build Failed (tid-8277)

Copy link
Collaborator

@rajujith rajujith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Instance starts fine with this fix although the template (vAPP) is deleted, it was failing earlier. Also observed is that the data is retained in the table template_deploy_as_is_details even after the template deletion as required by this fix. As mentioned already here this has side effects.

@shwstppr
Copy link
Contributor

@blueorangutan package

@blueorangutan
Copy link

@shwstppr a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result [SF]: ✖️ el7 ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 7702

@shwstppr
Copy link
Contributor

@blueorangutan package

@blueorangutan
Copy link

@shwstppr a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 7719

@DaanHoogland
Copy link
Contributor

@blueorangutan test alma9 vmware-70u3

@blueorangutan
Copy link

@DaanHoogland a [SL] Trillian-Jenkins test job (alma9 mgmt + vmware-70u3) has been kicked to run smoke tests

Copy link
Member

@weizhouapache weizhouapache left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code lgtm

@blueorangutan
Copy link

[SF] Trillian test result (tid-8308)
Environment: vmware-70u3 (x2), Advanced Networking with Mgmt server a9
Total time taken: 46304 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8115-t8308-vmware-70u3.zip
Smoke tests completed. 103 look OK, 6 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_01_add_primary_storage_disabled_host Error 35.79 test_primary_storage.py
test_01_non_strict_host_anti_affinity Failure 149.25 test_nonstrict_affinity_group.py
test_02_non_strict_host_affinity Error 103.05 test_nonstrict_affinity_group.py
test_02_upgrade_kubernetes_cluster Failure 582.53 test_kubernetes_clusters.py
test_03_deploy_and_scale_kubernetes_cluster Failure 291.50 test_kubernetes_clusters.py
test_07_deploy_kubernetes_ha_cluster Failure 92.41 test_kubernetes_clusters.py
test_08_upgrade_kubernetes_ha_cluster Failure 57.53 test_kubernetes_clusters.py
test_09_delete_kubernetes_ha_cluster Failure 57.53 test_kubernetes_clusters.py
test_10_vpc_tier_kubernetes_cluster Error 3.48 test_kubernetes_clusters.py
ContextSuite context=TestKubernetesCluster>:teardown Error 82.69 test_kubernetes_clusters.py
test_01_sys_vm_start Failure 0.09 test_secondary_storage.py
test_01_deploy_vm_on_specific_host Error 0.08 test_vm_deployment_planner.py
test_04_deploy_vm_on_host_override_pod_and_cluster Error 0.11 test_vm_deployment_planner.py
test_03_live_migrate_VM_with_two_data_disks Error 65.96 test_vm_life_cycle.py
test_08_migrate_vm Error 0.06 test_vm_life_cycle.py

@harikrishna-patnala
Copy link
Contributor Author

can this PR be merged, I see the above failures on other PRs also

@DaanHoogland
Copy link
Contributor

can this PR be merged, I see the above failures on other PRs also

yes, we shouldn't ignore them, but I am going through them and will add any cleanup or added robustness in #8196

@DaanHoogland DaanHoogland merged commit b7835d0 into apache:4.18 Nov 14, 2023
27 checks passed
@DaanHoogland DaanHoogland deleted the FixVMstartAfterTemplateDeletion branch November 14, 2023 08:32
DaanHoogland added a commit that referenced this pull request Nov 14, 2023
* 4.18:
  Fix deploy as is VM start after template deletion (#8115)
dhslove pushed a commit to ablecloud-team/ablestack-cloud that referenced this pull request Nov 16, 2023
dhslove pushed a commit to ablecloud-team/ablestack-cloud that referenced this pull request Nov 16, 2023
* 4.18:
  Fix deploy as is VM start after template deletion (apache#8115)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

After force deleting a deploy-as-is template, the VMs using it cannot start
7 participants