CLOUDSTACK-9280: Allow system VM volumes to be expunged if no system VMs are remaining. #1406

Closed
wants to merge 1 commit into from

ProjectMoon

This pull request is our proposed fix for https://issues.apache.org/jira/browse/CLOUDSTACK-9280. I added a new, special SSVM endpoint that accepts any command sent to it and reports success without actually doing anything. This endpoint is used only in a very specific scenario, when all of the following hold:

• The volume's VM is in the Destroyed or Expunging state, but the volume still lingers.
• The volume's VM is a system VM (SSVM or console proxy).
• No secondary storage VMs exist in the volume's zone.
This necessitated a small change to VolumeObject which allows it to find removed VMs (findByIdIncludingRemoved). The main part of the work is in the DefaultEndpointSelector.
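To make the intended behavior concrete, here is a minimal sketch of a no-op endpoint and the narrow condition under which it would be chosen. Everything in it is a hypothetical, simplified stand-in: `VmState`, `VmType`, `Answer`, `Endpoint`, `NoOpSsvmEndpoint` and `EndpointSelectorSketch` are not CloudStack's actual classes, commands are plain strings, and the fallback branch is far simpler than what DefaultEndpointSelector really does.

```java
import java.util.List;

// Hypothetical, simplified stand-ins for CloudStack types (not the real API).
enum VmState { RUNNING, STOPPED, DESTROYED, EXPUNGING }
enum VmType { USER, SECONDARY_STORAGE_VM, CONSOLE_PROXY }

final class Answer {
    final boolean success;
    final String details;

    Answer(boolean success, String details) {
        this.success = success;
        this.details = details;
    }
}

interface Endpoint {
    Answer sendMessage(String command);
}

// Accepts any command and reports success without doing any work.
final class NoOpSsvmEndpoint implements Endpoint {
    @Override
    public Answer sendMessage(String command) {
        return new Answer(true, "no-op: " + command);
    }
}

final class EndpointSelectorSketch {
    // Returns the no-op endpoint only when all three conditions from the PR hold.
    static Endpoint select(VmState state, VmType type, List<Endpoint> zoneSsvms) {
        boolean vmGone = state == VmState.DESTROYED || state == VmState.EXPUNGING;
        boolean systemVm = type == VmType.SECONDARY_STORAGE_VM || type == VmType.CONSOLE_PROXY;
        boolean noSsvms = zoneSsvms == null || zoneSsvms.isEmpty();
        if (vmGone && systemVm && noSsvms) {
            return new NoOpSsvmEndpoint();
        }
        // Simplified fallback: first SSVM in the zone, or null if there is none.
        return noSsvms ? null : zoneSsvms.get(0);
    }
}

class EndpointSelectorDemo {
    public static void main(String[] args) {
        Endpoint ep = EndpointSelectorSketch.select(VmState.EXPUNGING, VmType.CONSOLE_PROXY, List.of());
        System.out.println(ep.sendMessage("DeleteCommand").details); // prints "no-op: DeleteCommand"
    }
}
```

Keeping the no-op path behind all three checks is what limits its reach in this sketch: a user VM volume, or any volume in a zone that still has an SSVM, goes through normal selection.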

We would like a thorough review of this PR, as well as suggestions for which tests to create or run. I'm not sure whether the scope of this fix will lead to unintended behavior changes in other scenarios.

ProjectMoon changed the title from "Allow system VM volumes to be expunged if no system VMs are remaining." to "CLOUDSTACK-9280: Allow system VM volumes to be expunged if no system VMs are remaining." on Feb 8, 2016
@wido
Contributor

wido commented Feb 10, 2016

How have you been able to test this one?

@ProjectMoon
Author

I tested this manually in a running environment: deploy a zone, delete the hosts in the zone, make sure no system VMs remain, and then wait for the cleanup thread to run. Without this fix, the volumes are not deleted and errors are logged about missing endpoints or the SSVM being down; with it, the volumes are expunged. Without the fix, the zone also cannot be deleted.

@wido Part of this pull request is figuring out which unit tests we can add. In my opinion this change is quite invasive, so we want to make sure all possible cases are covered.
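One way to decide which cases to cover is to enumerate every combination of the three conditions from the PR description and check which ones should route to the no-op endpoint. The sketch below is a hypothetical, self-contained model: `State`, `Kind`, `usesNoOpEndpoint` and `NoOpEndpointCases` are illustrative names, not CloudStack's, and the real VM state enum has more states than the four modeled here. Each combination could become one explicit unit test against DefaultEndpointSelector.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, reduced model of the inputs that matter for the no-op path.
enum State { RUNNING, STOPPED, DESTROYED, EXPUNGING }
enum Kind { USER, SSVM, CONSOLE_PROXY }

final class NoOpEndpointCases {
    // All three conditions from the PR description must hold.
    static boolean usesNoOpEndpoint(State state, Kind kind, int ssvmsInZone) {
        boolean vmGone = state == State.DESTROYED || state == State.EXPUNGING;
        boolean systemVm = kind == Kind.SSVM || kind == Kind.CONSOLE_PROXY;
        return vmGone && systemVm && ssvmsInZone == 0;
    }

    // Enumerates every combination so each one can become an explicit test case.
    static List<String> triggeringCases() {
        List<String> hits = new ArrayList<>();
        for (State s : State.values()) {
            for (Kind k : Kind.values()) {
                for (int n : new int[] {0, 1}) {
                    if (usesNoOpEndpoint(s, k, n)) {
                        hits.add(s + "/" + k + "/" + n);
                    }
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // 4 states x 3 kinds x 2 SSVM counts = 24 combinations in total.
        List<String> hits = triggeringCases();
        System.out.println(hits.size() + " of 24 combinations use the no-op endpoint: " + hits);
    }
}
```

In this model only 4 of the 24 combinations take the no-op path; the remaining 20 are the regression cases that should keep their existing behavior.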

        (vm.getState() == State.Expunging || vm.getState() == State.Destroyed)) {
    List<SecondaryStorageVmVO> ssvms = ssvmDao.listByZoneId(Role.templateProcessor, volume.getDataCenterId());
    if (ssvms == null || ssvms.isEmpty()) {
Member


@ProjectMoon As a suggestion, the conditional (ssvms == null || ssvms.isEmpty()) can be replaced with Apache Commons Collections' CollectionUtils.isEmpty method (http://commons.apache.org/proper/commons-collections/javadocs/api-release/org/apache/commons/collections4/CollectionUtils.html#isEmpty%28java.util.Collection%29), which is a null-safe check for whether the list is empty.
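In CloudStack itself the check would become a call such as `CollectionUtils.isEmpty(ssvms)` from Apache Commons Collections. So that this snippet compiles with only the JDK, it defines a local `isEmpty` helper with the same documented semantics (true for `null` or for an empty collection); `NullSafeEmptyCheck` and the sample SSVM name are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

final class NullSafeEmptyCheck {
    // Mirrors CollectionUtils.isEmpty: true for null or for an empty collection.
    static boolean isEmpty(Collection<?> coll) {
        return coll == null || coll.isEmpty();
    }

    public static void main(String[] args) {
        List<String> missing = null;              // e.g. a DAO that returned null
        List<String> none = new ArrayList<>();    // zone with no SSVMs
        List<String> one = List.of("s-1-VM");     // hypothetical SSVM name
        System.out.println(isEmpty(missing) + " " + isEmpty(none) + " " + isEmpty(one)); // prints "true true false"
    }
}
```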

@ProjectMoon
Author

@GabrielBrascher I've switched to CollectionUtils.isEmpty. Any ideas on how to make this unit-testable?

@@ -92,7 +92,7 @@ public static VolumeObject getVolumeObject(DataStore dataStore, VolumeVO volumeV
     public String getAttachedVmName() {
         Long vmId = volumeVO.getInstanceId();
         if (vmId != null) {
-            VMInstanceVO vm = vmInstanceDao.findById(vmId);
+            VMInstanceVO vm = vmInstanceDao.findByIdIncludingRemoved(vmId);
Member


@ProjectMoon I couldn't find the "findByIdIncludingRemoved" method
(https://github.com/apache/cloudstack/blob/master/engine/schema/src/com/cloud/vm/dao/VMInstanceDao.java).

Am I missing something?

Member


Sorry @ProjectMoon, my mistake: it's fine, the method is implemented in GenericDao.
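As we understand GenericDao's semantics, CloudStack soft-deletes many rows by setting a `removed` timestamp; `findById` skips such rows, while `findByIdIncludingRemoved` still returns them. That would explain why `getAttachedVmName()` needs the latter once the system VM has been removed. The following is a hypothetical in-memory model of that behavior (`VmRow` and `SoftDeleteDao` are illustrative names, not CloudStack classes), not the real GenericDaoBase implementation.

```java
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

// Hypothetical row type: 'removed' stays null until the VM is soft-deleted.
final class VmRow {
    final long id;
    final String name;
    Date removed;

    VmRow(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Hypothetical soft-deleting DAO; not CloudStack's GenericDaoBase.
final class SoftDeleteDao {
    private final Map<Long, VmRow> rows = new HashMap<>();

    void persist(VmRow row) {
        rows.put(row.id, row);
    }

    // Soft delete: keep the row, but stamp it as removed.
    void remove(long id) {
        VmRow row = rows.get(id);
        if (row != null) {
            row.removed = new Date();
        }
    }

    // Like findById: hides rows that have been marked removed.
    VmRow findById(long id) {
        VmRow row = rows.get(id);
        return (row == null || row.removed != null) ? null : row;
    }

    // Like findByIdIncludingRemoved: returns the row whether or not it was removed.
    VmRow findByIdIncludingRemoved(long id) {
        return rows.get(id);
    }

    public static void main(String[] args) {
        SoftDeleteDao dao = new SoftDeleteDao();
        dao.persist(new VmRow(42L, "v-42-VM")); // hypothetical console proxy name
        dao.remove(42L);
        System.out.println(dao.findById(42L));                      // prints "null"
        System.out.println(dao.findByIdIncludingRemoved(42L).name); // prints "v-42-VM"
    }
}
```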

@bvbharatk
Contributor

ACS CI BVT Run

Summary:
Build Number 88
Hypervisor xenserver
NetworkType Advanced
Passed=102
Failed=14
Skipped=4

The following tests have known issues:
integration.smoke.test_iso.TestISO.test_07_list_default_iso
integration.smoke.test_privategw_acl.py
integration.smoke.test_vm_snapshots.TestSnapshots.test_01_test_vm_volume_snapshot
integration.smoke.test_vpc_vpn.TestVpcSite2SiteVpn.test_vpc_site2site_vpn
integration.smoke.test_iso.TestISO.test_04_extract_Iso

Link to logs folder (search by build_no): https://www.dropbox.com/sh/yj3wnzbceo9uef2/AAB6u-Iap-xztdm6jHX9SjPja?dl=0

Failed tests:

  • integration.smoke.test_vpc_vpn.TestVpcRemoteAccessVpn
    • test_vpc_remote_access_vpn
    • test_vpc_site2site_vpn
  • integration.smoke.test_vm_snapshots.TestSnapshots
    • test_01_test_vm_volume_snapshot
  • integration.smoke.test_privategw_acl.TestPrivateGwACL
    • test_02_vpc_privategw_static_routes
    • test_03_rvpc_privategw_static_routes
  • integration.smoke.test_service_offerings.TestCreateServiceOffering
    • test_04_change_offering_small
  • <nose.suite.ContextSuite context=TestNiciraContoller>:setup
  • integration.smoke.test_primary_storage.TestPrimaryStorageServices
    • test_01_primary_storage_iscsi
  • integration.smoke.test_internal_lb.TestInternalLb
    • test02_internallb_haproxy_stats_on_all_interfaces
    • test_01_internallb_roundrobin_1VPC_3VM_HTTP_port80
  • <nose.suite.ContextSuite context=TestDeployVM>:setup
  • integration.smoke.test_templates.TestCreateTemplate
    • test_04_extract_template
  • integration.smoke.test_iso.TestCreateIso
    • test_04_extract_Iso
    • test_07_list_default_iso

Skipped tests:
test_vm_nic_adapter_vmxnet3
test_deploy_vgpu_enabled_vm
test_06_copy_template
test_06_copy_iso

Passed test suites:
integration.smoke.test_deploy_vm_with_userdata.TestDeployVmWithUserData
integration.smoke.test_affinity_groups_projects.TestDeployVmWithAffinityGroup
integration.smoke.test_portable_publicip.TestPortablePublicIPAcquire
integration.smoke.test_over_provisioning.TestUpdateOverProvision
integration.smoke.test_global_settings.TestUpdateConfigWithScope
integration.smoke.test_guest_vlan_range.TestDedicateGuestVlanRange
integration.smoke.test_scale_vm.TestScaleVm
integration.smoke.test_loadbalance.TestLoadBalance
integration.smoke.test_routers.TestRouterServices
integration.smoke.test_reset_vm_on_reboot.TestResetVmOnReboot
integration.smoke.test_snapshots.TestSnapshotRootDisk
integration.smoke.test_deploy_vms_with_varied_deploymentplanners.TestDeployVmWithVariedPlanners
integration.smoke.test_network.TestDeleteAccount
integration.smoke.test_non_contigiousvlan.TestUpdatePhysicalNetwork
integration.smoke.test_deploy_vm_iso.TestDeployVMFromISO
integration.smoke.test_public_ip_range.TestDedicatePublicIPRange
integration.smoke.test_multipleips_per_nic.TestDeployVM
integration.smoke.test_regions.TestRegions
integration.smoke.test_affinity_groups.TestDeployVmWithAffinityGroup
integration.smoke.test_network_acl.TestNetworkACL
integration.smoke.test_pvlan.TestPVLAN
integration.smoke.test_volumes.TestCreateVolume
integration.smoke.test_ssvm.TestSSVMs
integration.smoke.test_nic.TestNic
integration.smoke.test_deploy_vm_root_resize.TestDeployVM
integration.smoke.test_resource_detail.TestResourceDetail
integration.smoke.test_secondary_storage.TestSecStorageServices
integration.smoke.test_disk_offerings.TestCreateDiskOffering

@ProjectMoon
Author

Is this output from the new automated system, or were these tests run manually? At first glance, some of the failed tests don't seem related to this change (though some definitely do). Since the change is quite invasive, I suppose anything is possible.

This commit adds a special SSVM endpoint which simply returns true for
all operations sent to it, without actually doing anything. This
allows destroyed volumes of system VMs to be expunged when there
are no hosts (and thus no system VMs) left to handle volume
destruction.
@swill
Contributor

swill commented Apr 13, 2016

Another one against 4.6. How should we handle this?

@ProjectMoon
Author

The current workflow, as I understand it, is that the oldest supported release branch is for bug fixes (which are then forward-merged), and any new features go on master.

@swill
Contributor

swill commented Apr 13, 2016

Yes exactly, but the oldest supported branch right now is 4.7.

@rohityadavcloud
Member

@ProjectMoon Please open the PR against 4.7. That said, this enhancement is feature-ish, so I would say open it against master. Thanks.

@swill
Contributor

swill commented May 22, 2016

@ProjectMoon, can you reopen this against at least 4.7, or maybe even master? Thanks.

@ProjectMoon
Author

I have opened #1559 against master.

6 participants