CLOUDSTACK-10004 : On deletion, Vmware volume snapshots are left behind with message 'the snapshot has child, can't delete it on the storage' #2188

Merged
merged 1 commit into from Sep 1, 2017

niteshsarda
Contributor

On VMware, deleted snapshots are not removed from storage, resulting in unexpected storage consumption.

Steps to reproduce this issue:

  1. In a VMware setup, create a snapshot of a volume, say Snap1.
  2. After Snap1 has been created successfully, create a new snapshot of the same volume, say Snap2.
  3. While Snap2 is in the BackingUp state, delete Snap1.
  4. Snap1 disappears from the web UI, but the files associated with Snap1 still persist on secondary storage, even after the cleanup job has run.
  5. In the snapshot_store_ref table in the DB, Snap1 remains in the Ready state instead of Destroyed.
  6. In the snapshots table, the status of Snap1 is Destroyed, but the removed column stays null and is never set to the date of snapshot removal.
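
The failure mode in the steps above can be modelled with a small, self-contained sketch. It is not CloudStack code: `SnapshotRef`, `deleteOnStorage`, and the string states are hypothetical stand-ins, and the rule that a snapshot with a child is left on storage is an assumption inferred from the error message in the title.

```java
import java.util.Arrays;
import java.util.List;

class SnapshotChainDemo {
    // Hypothetical, simplified stand-in for a row in snapshot_store_ref.
    static final class SnapshotRef {
        final long id;
        final Long parentId; // null when the snapshot has no parent
        String state = "Ready";

        SnapshotRef(long id, Long parentId) {
            this.id = id;
            this.parentId = parentId;
        }
    }

    // Assumed rule: a snapshot that another snapshot names as its parent
    // is not removed from storage and keeps its current state.
    static boolean deleteOnStorage(SnapshotRef target, List<SnapshotRef> all) {
        for (SnapshotRef other : all) {
            if (other.parentId != null && other.parentId == target.id) {
                return false; // "the snapshot has child, can't delete it on the storage"
            }
        }
        target.state = "Destroyed";
        return true;
    }

    public static void main(String[] args) {
        // Snap2 was created while Snap1 existed, so it recorded Snap1 as its parent.
        SnapshotRef snap1 = new SnapshotRef(1L, null);
        SnapshotRef snap2 = new SnapshotRef(2L, 1L);
        List<SnapshotRef> all = Arrays.asList(snap1, snap2);

        boolean deleted = deleteOnStorage(snap1, all);
        System.out.println("deleted=" + deleted + ", state=" + snap1.state);
    }
}
```

In this model Snap1 stays in the Ready state because Snap2 points at it, which matches what steps 5 and 6 describe for the snapshot_store_ref table.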

Fix for this issue:

  1. On VMware, no snapshot chain is maintained; instead, a full snapshot is taken every time.
  2. It therefore makes sense not to assign a parent snapshot ID to the snapshot. Every snapshot is then independent and can be deleted successfully whenever required.
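
A minimal sketch of the idea in the two points above, under stated assumptions: `HypervisorType` here is a hypothetical two-value stand-in for CloudStack's enum, and `takesFullSnapshots` and `parentFor` are illustrative helpers, not methods from the actual patch.

```java
import java.util.Optional;

class ParentAssignmentSketch {
    // Hypothetical stand-in for the real hypervisor type enum.
    enum HypervisorType { XenServer, VMware }

    // Assumption taken from the description: VMware takes a full snapshot every time.
    static boolean takesFullSnapshots(HypervisorType type) {
        return type == HypervisorType.VMware;
    }

    // Returns the parent snapshot ID to record for a new snapshot,
    // or empty when the new snapshot should stand on its own.
    static Optional<Long> parentFor(HypervisorType type, Long latestReadySnapshotId) {
        if (latestReadySnapshotId == null || takesFullSnapshots(type)) {
            return Optional.empty();
        }
        return Optional.of(latestReadySnapshotId);
    }

    public static void main(String[] args) {
        System.out.println("VMware parent: " + parentFor(HypervisorType.VMware, 1L));
        System.out.println("XenServer parent: " + parentFor(HypervisorType.XenServer, 1L));
    }
}
```

With no parent recorded for VMware snapshots, nothing ever counts as a child of Snap1, so deleting it is not blocked.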

Screenshot of DB before applying fix :
db screenshot before fix

Screenshot of DB after applying fix :
db screenshot after fix

@SudharmaJain
Contributor

@niteshsarda Please check the Travis failure.

@@ -171,7 +171,10 @@ public DataObject create(DataObject obj, DataStore dataStore) {
ss.setVolumeId(snapshot.getVolumeId());
SnapshotDataStoreVO snapshotDataStoreVO = snapshotDataStoreDao.findParent(dataStore.getRole(), dataStore.getId(), snapshot.getVolumeId());
if (snapshotDataStoreVO != null) {
ss.setParentSnapshotId(snapshotDataStoreVO.getSnapshotId());
Contributor

@niteshsarda The findParent method is not returning the correct result. Let's fix that rather than making the decision later. Also, we should not hardcode VMware; it is better to use the hypervisorType enum. Differential snapshots are only applicable to XenServer, so the check should use XenServer, not VMware.

@niteshsarda
Contributor Author

@SudharmaJain: I have implemented the changes as per your suggestion. Please check the new fix.

@SudharmaJain
Contributor

@niteshsarda We are making multiple queries on the snapshot_store_ref table. I think we can handle this with a single query. Also, SearchBuilder can be used in place of the prepared statement.

@niteshsarda
Contributor Author

@SudharmaJain: I have removed the multiple query calls and implemented a SearchBuilder-based method to fetch the details. Please review the latest code.

@niteshsarda
Contributor Author

@SudharmaJain: The Travis failure was intermittent. After the latest push, all checks are passing. Could you please review the latest code?

@SudharmaJain
Contributor

LGTM code changes.

@SowjanyaPatha
Contributor

LGTM for Testing.
2188-after

@niteshsarda
Contributor Author

niteshsarda commented Aug 17, 2017

tag:mergeready

Member

@sateesh-chodapuneedi sateesh-chodapuneedi left a comment

LGTM, minor change might be needed w.r.t. filtering by hypervisor type requiring snapshot dependencies.

sc.setParameters("store_role", role.toString());
sc.setParameters("state", ObjectInDataStoreStateMachine.State.Ready.name());
sc.setParameters("store_id", storeId);
sc.setJoinParameters("snapshotVOSearch", "hypervisorType", Hypervisor.HypervisorType.XenServer);

@niteshsarda Can you confirm whether XenServer is the only supported hypervisor that has a valid parent snapshot ID for a given snapshot ID?
Instead of hardcoding the hypervisor type, could we fetch the hypervisor type categorically?

Contributor Author

@sateesh-chodapuneedi: Verified that the parent snapshot ID is only required for XenServer.
Also, as per your suggestion, I changed the code to fetch the hypervisor type categorically.

Please review the latest code.
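
For readers following the thread, here is a rough in-memory sketch of the lookup being discussed: a single pass that filters by store role, store ID, volume ID, Ready state, and hypervisor category. It does not use CloudStack's SearchBuilder. `Row`, `CHAINED_HYPERVISORS`, and `findParent` as written here are hypothetical, "categorically" is read as "membership in a set of hypervisors that keep snapshot chains" (one possible interpretation), and picking the highest snapshot ID as the parent is an assumption.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.EnumSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

class FindParentSketch {
    // Hypothetical stand-ins for the real enums.
    enum HypervisorType { XenServer, VMware }
    enum State { Ready, Destroyed }

    // Hypothetical flattened view of a snapshot_store_ref row joined with its snapshots row.
    static final class Row {
        final long snapshotId;
        final long volumeId;
        final long storeId;
        final String storeRole;
        final State state;
        final HypervisorType hypervisorType;

        Row(long snapshotId, long volumeId, long storeId, String storeRole,
            State state, HypervisorType hypervisorType) {
            this.snapshotId = snapshotId;
            this.volumeId = volumeId;
            this.storeId = storeId;
            this.storeRole = storeRole;
            this.state = state;
            this.hypervisorType = hypervisorType;
        }
    }

    // Category of hypervisors assumed to need a parent snapshot ID (per this thread: XenServer only).
    static final Set<HypervisorType> CHAINED_HYPERVISORS = EnumSet.of(HypervisorType.XenServer);

    // One pass over the rows with all conditions combined, instead of several separate lookups.
    static Optional<Row> findParent(List<Row> rows, String role, long storeId, long volumeId) {
        return rows.stream()
                .filter(r -> r.storeRole.equals(role))
                .filter(r -> r.storeId == storeId)
                .filter(r -> r.volumeId == volumeId)
                .filter(r -> r.state == State.Ready)
                .filter(r -> CHAINED_HYPERVISORS.contains(r.hypervisorType))
                .max(Comparator.comparingLong((Row r) -> r.snapshotId));
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row(1, 10, 5, "Image", State.Ready, HypervisorType.XenServer),
                new Row(2, 20, 5, "Image", State.Ready, HypervisorType.VMware));
        System.out.println("XenServer volume has parent: " + findParent(rows, "Image", 5, 10).isPresent());
        System.out.println("VMware volume has parent: " + findParent(rows, "Image", 5, 20).isPresent());
    }
}
```

Keeping the category in one set means adding another chain-keeping hypervisor later would be a one-line change rather than a new branch in the query.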

…nd with message 'the snapshot has child, can't delete it on the storage'
@rohityadavcloud
Member

@blueorangutan package

@blueorangutan

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos6 ✔centos7 ✖debian. JID-1056

@rohityadavcloud
Member

@blueorangutan test centos7 vmware-55u3

@blueorangutan

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + vmware-55u3) has been kicked to run smoke tests

@cloudmonger

ACS CI BVT Run

Summary:
Build Number 1173
Hypervisor xenserver
NetworkType Advanced
Passed=115
Failed=6
Skipped=40

Link to logs Folder (search by build_no): https://www.dropbox.com/sh/r2si930m8xxzavs/AAAzNrnoF1fC3auFrvsKo_8-a?dl=0

Failed tests:

  • test_router_dnsservice.py
      • test_router_dns_guestipquery Failed
  • test_volumes.py
      • test_06_download_detached_volume Failing since 2 runs
  • test_routers_network_ops.py
      • test_01_isolate_network_FW_PF_default_routes_egress_true Failing since 37 runs
      • test_02_isolate_network_FW_PF_default_routes_egress_false Failing since 164 runs
      • test_01_RVR_Network_FW_PF_SSH_default_routes_egress_true Failing since 159 runs
      • test_02_RVR_Network_FW_PF_SSH_default_routes_egress_false Failing since 159 runs

Skipped tests:
test_vm_nic_adapter_vmxnet3
test_01_verify_libvirt
test_02_verify_libvirt_after_restart
test_03_verify_libvirt_attach_disk
test_04_verify_guest_lspci
test_05_change_vm_ostype_restart
test_06_verify_guest_lspci_again
test_disable_oobm_ha_state_ineligible
test_ha_kvm_host_degraded
test_ha_kvm_host_fencing
test_ha_kvm_host_recovering
test_hostha_configure_default_driver
test_hostha_enable_ha_when_host_disabled
test_hostha_enable_ha_when_host_disconected
test_hostha_enable_ha_when_host_in_maintenance
test_remove_ha_provider_not_possible
test_configure_ha_provider_invalid
test_configure_ha_provider_valid
test_ha_configure_enabledisable_across_clusterzones
test_ha_disable_feature_invalid
test_ha_enable_feature_invalid
test_ha_list_providers
test_ha_multiple_mgmt_server_ownership
test_ha_verify_fsm_available
test_ha_verify_fsm_degraded
test_ha_verify_fsm_fenced
test_ha_verify_fsm_recovering
test_hostha_configure_default_driver
test_hostha_configure_invalid_provider
test_hostha_disable_feature_valid
test_hostha_enable_feature_valid
test_hostha_enable_feature_without_setting_provider
test_list_ha_for_host
test_list_ha_for_host_invalid
test_list_ha_for_host_valid
test_static_role_account_acls
test_11_ss_nfs_version_on_ssvm
test_nested_virtualization_vmware
test_3d_gpu_support
test_deploy_vgpu_enabled_vm

Passed test suites:
test_deploy_vm_with_userdata.py
test_affinity_groups_projects.py
test_portable_publicip.py
test_vm_snapshots.py
test_over_provisioning.py
test_global_settings.py
test_scale_vm.py
test_service_offerings.py
test_routers_iptables_default_policy.py
test_loadbalance.py
test_routers.py
test_reset_vm_on_reboot.py
test_deploy_vms_with_varied_deploymentplanners.py
test_network.py
test_router_dns.py
test_outofbandmanagement_nestedplugin.py
test_portforwardingrules.py
test_non_contigiousvlan.py
test_login.py
test_deploy_vm_iso.py
test_list_ids_parameter.py
test_public_ip_range.py
test_multipleips_per_nic.py
test_metrics_api.py
test_regions.py
test_affinity_groups.py
test_network_acl.py
test_pvlan.py
test_nic.py
test_deploy_vm_root_resize.py
test_resource_detail.py
test_secondary_storage.py
test_vm_life_cycle.py
test_disk_offerings.py

@blueorangutan

Trillian test result (tid-1471)
Environment: vmware-55u3 (x2), Advanced Networking with Mgmt server 7
Total time taken: 48183 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr2188-t1471-vmware-55u3.zip
Intermittent failure detected: /marvin/tests/smoke/test_iso.py
Intermittent failure detected: /marvin/tests/smoke/test_privategw_acl.py
Intermittent failure detected: /marvin/tests/smoke/test_routers_network_ops.py
Intermittent failure detected: /marvin/tests/smoke/test_volumes.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_vpn.py
Test completed. 57 look OK, 5 have error(s)

| Test | Result | Time (s) | Test File |
| --- | --- | --- | --- |
| test_01_vpc_remote_access_vpn | Failure | 161.15 | test_vpc_vpn.py |
| test_01_create_volume | Failure | 194.95 | test_volumes.py |
| test_02_RVR_Network_FW_PF_SSH_default_routes_egress_false | Failure | 528.69 | test_routers_network_ops.py |
| test_01_RVR_Network_FW_PF_SSH_default_routes_egress_true | Failure | 517.27 | test_routers_network_ops.py |
| test_04_rvpc_privategw_static_routes | Failure | 832.29 | test_privategw_acl.py |
| test_05_iso_permissions | Failure | 0.17 | test_iso.py |
| test_02_edit_iso | Failure | 0.04 | test_iso.py |
| test_08_resize_volume | Skipped | 5.09 | test_volumes.py |
| test_07_resize_fail | Skipped | 10.19 | test_volumes.py |
| test_09_copy_delete_template | Skipped | 0.01 | test_templates.py |
| test_06_copy_template | Skipped | 0.00 | test_templates.py |
| test_static_role_account_acls | Skipped | 0.01 | test_staticroles.py |
| test_11_ss_nfs_version_on_ssvm | Skipped | 0.02 | test_ssvm.py |
| test_01_scale_vm | Skipped | 66.46 | test_scale_vm.py |
| test_01_primary_storage_iscsi | Skipped | 0.03 | test_primary_storage.py |
| test_vm_nic_adapter_vmxnet3 | Skipped | 0.00 | test_nic_adapter_type.py |
| test_06_copy_iso | Skipped | 0.00 | test_iso.py |
| test_list_ha_for_host_valid | Skipped | 0.02 | test_hostha_simulator.py |
| test_list_ha_for_host_invalid | Skipped | 0.02 | test_hostha_simulator.py |
| test_list_ha_for_host | Skipped | 0.02 | test_hostha_simulator.py |
| test_hostha_enable_feature_without_setting_provider | Skipped | 0.02 | test_hostha_simulator.py |
| test_hostha_enable_feature_valid | Skipped | 0.02 | test_hostha_simulator.py |
| test_hostha_disable_feature_valid | Skipped | 0.02 | test_hostha_simulator.py |
| test_hostha_configure_invalid_provider | Skipped | 0.02 | test_hostha_simulator.py |
| test_hostha_configure_default_driver | Skipped | 0.02 | test_hostha_simulator.py |
| test_ha_verify_fsm_recovering | Skipped | 0.02 | test_hostha_simulator.py |
| test_ha_verify_fsm_fenced | Skipped | 0.02 | test_hostha_simulator.py |
| test_ha_verify_fsm_degraded | Skipped | 0.02 | test_hostha_simulator.py |
| test_ha_verify_fsm_available | Skipped | 0.02 | test_hostha_simulator.py |
| test_ha_multiple_mgmt_server_ownership | Skipped | 0.02 | test_hostha_simulator.py |
| test_ha_list_providers | Skipped | 0.02 | test_hostha_simulator.py |
| test_ha_enable_feature_invalid | Skipped | 0.02 | test_hostha_simulator.py |
| test_ha_disable_feature_invalid | Skipped | 0.02 | test_hostha_simulator.py |
| test_ha_configure_enabledisable_across_clusterzones | Skipped | 0.02 | test_hostha_simulator.py |
| test_configure_ha_provider_valid | Skipped | 0.02 | test_hostha_simulator.py |
| test_configure_ha_provider_invalid | Skipped | 0.02 | test_hostha_simulator.py |
| test_remove_ha_provider_not_possible | Skipped | 0.02 | test_hostha_kvm.py |
| test_hostha_enable_ha_when_host_in_maintenance | Skipped | 0.04 | test_hostha_kvm.py |
| test_hostha_enable_ha_when_host_disconected | Skipped | 0.02 | test_hostha_kvm.py |
| test_hostha_enable_ha_when_host_disabled | Skipped | 0.02 | test_hostha_kvm.py |
| test_hostha_configure_default_driver | Skipped | 0.02 | test_hostha_kvm.py |
| test_ha_kvm_host_recovering | Skipped | 0.04 | test_hostha_kvm.py |
| test_ha_kvm_host_fencing | Skipped | 0.02 | test_hostha_kvm.py |
| test_ha_kvm_host_degraded | Skipped | 0.02 | test_hostha_kvm.py |
| test_disable_oobm_ha_state_ineligible | Skipped | 0.03 | test_hostha_kvm.py |
| test_06_verify_guest_lspci_again | Skipped | 0.00 | test_deploy_virtio_scsi_vm.py |
| test_05_change_vm_ostype_restart | Skipped | 0.00 | test_deploy_virtio_scsi_vm.py |
| test_04_verify_guest_lspci | Skipped | 0.00 | test_deploy_virtio_scsi_vm.py |
| test_03_verify_libvirt_attach_disk | Skipped | 0.00 | test_deploy_virtio_scsi_vm.py |
| test_02_verify_libvirt_after_restart | Skipped | 0.00 | test_deploy_virtio_scsi_vm.py |
| test_01_verify_libvirt | Skipped | 0.00 | test_deploy_virtio_scsi_vm.py |
| test_deploy_vgpu_enabled_vm | Skipped | 0.75 | test_deploy_vgpu_enabled_vm.py |

@rohityadavcloud
Member

LGTM, failures are known issues.

@rohityadavcloud rohityadavcloud merged commit 74fe9e3 into apache:master Sep 1, 2017