Prevent starting a VM in destroyed state (or any state but Stopped) #5165

Merged: 3 commits, Jul 9, 2021

Conversation

Pearl1594
Contributor

Description

Attempting to start a destroyed VM (using the bulk action support in the UI) reports a successful operation even though the VM does not actually start. This PR fixes that false-positive result.

The VM remains in the Destroyed state, but a success response is returned.
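
The kind of guard this PR describes can be sketched as below. This is a minimal, self-contained illustration and not the actual CloudStack code: the class `VmStartGuard`, the reduced `State` enum, and the use of `IllegalStateException` are all assumptions made for the example. The real method may also treat some states, such as an already running VM, differently.

```java
// Hypothetical sketch: names are illustrative, not CloudStack's real classes.
final class VmStartGuard {

    /** Reduced set of VM states, assumed for illustration only. */
    enum State { Stopped, Starting, Running, Destroyed }

    /**
     * Rejects a start request unless the VM is currently Stopped,
     * so the caller cannot report success for a VM that never started.
     */
    static void ensureStartable(String vmName, State state) {
        if (state != State.Stopped) {
            throw new IllegalStateException(
                String.format("Cannot start %s in %s state", vmName, state));
        }
    }

    public static void main(String[] args) {
        // A stopped VM passes the check silently.
        ensureStartable("i-2-3-VM", State.Stopped);

        // A destroyed VM is rejected with an explicit error.
        try {
            ensureStartable("i-2-3-VM", State.Destroyed);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing loudly here is what lets the API layer return an error instead of the false-positive success described above.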

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Feature/Enhancement Scale or Bug Severity

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

Tried starting an instance in the Destroyed state:

2021-06-29 12:57:51,609 WARN  [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-5:ctx-da4601fa job-52/job-54 ctx-cc08e96e) (logid:08e57fcb) VM VM instance {"id": "3", "name": "i-2-3-VM", "uuid": "4213c3d9-72a8-4201-96c4-003e2c2f845b", "type"="User"} is not in a state to be started: Destroyed
2021-06-29 12:57:51,610 INFO  [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-5:ctx-da4601fa job-52/job-54 ctx-cc08e96e) (logid:08e57fcb) Caught CloudRuntimeException, returning job failed com.cloud.utils.exception.CloudRuntimeException: Cannot start VM: VM instance {"id": "3", "name": "i-2-3-VM", "uuid": "4213c3d9-72a8-4201-96c4-003e2c2f845b", "type"="User"} in Destroyed state
2021-06-29 12:57:51,613 DEBUG [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-5:ctx-da4601fa job-52/job-54 ctx-cc08e96e) (logid:08e57fcb) Done executing VM work job: com.cloud.vm.VmWorkStart{"dcId":0,"rawParams":{"VmPassword":"rO0ABXQADnNhdmVkX3Bhc3N3b3Jk"},"userId":2,"accountId":2,"vmId":3,"handlerName":"VirtualMachineManagerImpl"}
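
The second log line shows the exception being caught and turned into a failed job. A rough sketch of that pattern follows, under the assumption that a job wrapper catches runtime exceptions from the work item. `JobFailureSketch`, `JobResult`, `Status`, and `runJob` are hypothetical names and do not reflect CloudStack's job framework API.

```java
// Hypothetical sketch of converting an exception into a failed job result.
final class JobFailureSketch {

    enum Status { SUCCEEDED, FAILED }

    /** Minimal result holder: a status plus an optional failure detail. */
    static final class JobResult {
        final Status status;
        final String detail;

        JobResult(Status status, String detail) {
            this.status = status;
            this.detail = detail;
        }
    }

    /** Runs the work item and reports FAILED if it throws a runtime exception. */
    static JobResult runJob(Runnable work) {
        try {
            work.run();
            return new JobResult(Status.SUCCEEDED, null);
        } catch (RuntimeException e) {
            // The failure reason travels with the result instead of being lost.
            return new JobResult(Status.FAILED, e.getMessage());
        }
    }

    public static void main(String[] args) {
        JobResult result = runJob(() -> {
            throw new IllegalStateException("Cannot start VM in Destroyed state");
        });
        System.out.println(result.status + ": " + result.detail);
    }
}
```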

@Pearl1594
Contributor Author

@blueorangutan package

@blueorangutan

@Pearl1594 a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔️ centos7 ✖️ centos8 ✔️ debian. SL-JID 422

@Pearl1594
Contributor Author

@blueorangutan package

@blueorangutan

@Pearl1594 a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔️ centos7 ✔️ centos8 ✔️ debian. SL-JID 424

@Pearl1594
Contributor Author

@blueorangutan test

@blueorangutan

@Pearl1594 a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@Pearl1594
Contributor Author

@blueorangutan package

@blueorangutan

@Pearl1594 a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔️ centos7 ✔️ centos8 ✔️ debian. SL-JID 425

@blueorangutan

Trillian Build Failed (tid-1159)

@Pearl1594
Contributor Author

@blueorangutan test

@blueorangutan

@Pearl1594 a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

Comment on lines 884 to 885
s_logger.warn("VM " + vm + " is not in a state to be started: " + state);
throw new CloudRuntimeException(String.format("Cannot start VM: %s in %s state", vm, state));
Contributor

We could unify these messages in a String var (with String.format).

Contributor

Suggested change
- s_logger.warn("VM " + vm + " is not in a state to be started: " + state);
- throw new CloudRuntimeException(String.format("Cannot start VM: %s in %s state", vm, state));
+ String msg = String.format("Cannot start VM: %s in %s state", vm, state);
+ s_logger.warn(msg);
+ throw new CloudRuntimeException(msg);
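
The reviewer's idea of building the message once and reusing it for both the log entry and the exception can be tried in isolation with the sketch below. It is only an approximation: `java.util.logging.Logger` and `IllegalStateException` stand in for the project's `s_logger` and `CloudRuntimeException`, and the class and method names are made up for the example.

```java
import java.util.logging.Logger;

// Hypothetical stand-alone version of the "single message variable" suggestion.
final class UnifiedMessageExample {

    private static final Logger LOGGER =
        Logger.getLogger(UnifiedMessageExample.class.getName());

    /** Logs and throws with the same text, so log and API error cannot drift apart. */
    static void rejectStart(String vm, String state) {
        String msg = String.format("Cannot start VM: %s in %s state", vm, state);
        LOGGER.warning(msg);
        throw new IllegalStateException(msg);
    }

    public static void main(String[] args) {
        try {
            rejectStart("i-2-3-VM", "Destroyed");
        } catch (IllegalStateException e) {
            // The exception carries exactly the text that was logged.
            System.out.println(e.getMessage());
        }
    }
}
```

Keeping one `msg` variable also means a later wording change only has to be made in one place.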

@blueorangutan

Trillian test result (tid-1160)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 51927 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr5165-t1160-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_kubernetes_clusters.py
Intermittent failure detected: /marvin/tests/smoke/test_router_dns.py
Intermittent failure detected: /marvin/tests/smoke/test_routers_network_ops.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_redundant.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_vpn.py
Smoke tests completed. 84 look OK, 4 have error(s)
Only failed tests results shown below:

| Test | Result | Time (s) | Test File |
| --- | --- | --- | --- |
| test_08_deploy_and_upgrade_kubernetes_ha_cluster | Failure | 224.01 | test_kubernetes_clusters.py |
| test_01_RVR_Network_FW_PF_SSH_default_routes_egress_true | Failure | 347.60 | test_routers_network_ops.py |
| test_01_create_redundant_VPC_2tiers_4VMs_4IPs_4PF_ACL | Failure | 537.61 | test_vpc_redundant.py |
| test_01_create_redundant_VPC_2tiers_4VMs_4IPs_4PF_ACL | Error | 537.63 | test_vpc_redundant.py |
| test_03_create_redundant_VPC_1tier_2VMs_2IPs_2PF_ACL_reboot_routers | Failure | 474.42 | test_vpc_redundant.py |
| test_01_vpc_site2site_vpn_multiple_options | Failure | 380.14 | test_vpc_vpn.py |

@Pearl1594
Contributor Author

@blueorangutan package

@blueorangutan

@Pearl1594 a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔️ centos7 ✔️ centos8 ✔️ debian. SL-JID 438

@Pearl1594
Contributor Author

@blueorangutan test

@blueorangutan

@Pearl1594 a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-1179)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 37293 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr5165-t1179-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_deploy_vm_with_userdata.py
Intermittent failure detected: /marvin/tests/smoke/test_kubernetes_clusters.py
Intermittent failure detected: /marvin/tests/smoke/test_kubernetes_supported_versions.py
Intermittent failure detected: /marvin/tests/smoke/test_list_ids_parameter.py
Intermittent failure detected: /marvin/tests/smoke/test_loadbalance.py
Intermittent failure detected: /marvin/tests/smoke/test_metrics_api.py
Intermittent failure detected: /marvin/tests/smoke/test_multipleips_per_nic.py
Intermittent failure detected: /marvin/tests/smoke/test_nested_virtualization.py
Intermittent failure detected: /marvin/tests/smoke/test_network_acl.py
Intermittent failure detected: /marvin/tests/smoke/test_network.py
Intermittent failure detected: /marvin/tests/smoke/test_password_server.py
Intermittent failure detected: /marvin/tests/smoke/test_portforwardingrules.py
Intermittent failure detected: /marvin/tests/smoke/test_privategw_acl.py
Intermittent failure detected: /marvin/tests/smoke/test_projects.py
Intermittent failure detected: /marvin/tests/smoke/test_reset_vm_on_reboot.py
Intermittent failure detected: /marvin/tests/smoke/test_resource_accounting.py
Intermittent failure detected: /marvin/tests/smoke/test_router_dhcphosts.py
Smoke tests completed. 72 look OK, 16 have error(s)
Only failed tests results shown below:

| Test | Result | Time (s) | Test File |
| --- | --- | --- | --- |
| ContextSuite context=TestLoadBalance>:setup | Error | 0.00 | test_loadbalance.py |
| test_list_vms_metrics | Error | 0.24 | test_metrics_api.py |
| ContextSuite context=TestNetworkACL>:setup | Error | 0.00 | test_network_acl.py |
| test_delete_account | Error | 0.46 | test_network.py |
| test_delete_network_while_vm_on_it | Error | 1.12 | test_network.py |
| test_deploy_vm_l2network | Error | 1.11 | test_network.py |
| test_l2network_restart | Error | 2.19 | test_network.py |
| ContextSuite context=TestPortForwarding>:setup | Error | 3.41 | test_network.py |
| ContextSuite context=TestPublicIP>:setup | Error | 1.25 | test_network.py |
| test_reboot_router | Failure | 0.04 | test_network.py |
| test_releaseIP | Error | 0.44 | test_network.py |
| ContextSuite context=TestRouterRules>:setup | Error | 0.49 | test_network.py |
| test_01_invalid_upgrade_kubernetes_cluster | Failure | 0.01 | test_kubernetes_clusters.py |
| test_02_deploy_and_upgrade_kubernetes_cluster | Failure | 0.00 | test_kubernetes_clusters.py |
| test_03_deploy_and_scale_kubernetes_cluster | Failure | 0.00 | test_kubernetes_clusters.py |
| test_04_basic_lifecycle_kubernetes_cluster | Failure | 0.00 | test_kubernetes_clusters.py |
| test_05_delete_kubernetes_cluster | Failure | 0.00 | test_kubernetes_clusters.py |
| test_06_deploy_invalid_kubernetes_ha_cluster | Failure | 0.00 | test_kubernetes_clusters.py |
| test_07_deploy_kubernetes_ha_cluster | Failure | 0.00 | test_kubernetes_clusters.py |
| test_08_deploy_and_upgrade_kubernetes_ha_cluster | Failure | 0.00 | test_kubernetes_clusters.py |
| test_09_delete_kubernetes_ha_cluster | Failure | 0.00 | test_kubernetes_clusters.py |
| test_01_add_delete_kubernetes_supported_version | Error | 1802.89 | test_kubernetes_supported_versions.py |
| ContextSuite context=TestListIdsParams>:setup | Error | 0.00 | test_list_ids_parameter.py |
| test_nic_secondaryip_add_remove | Failure | 0.04 | test_multipleips_per_nic.py |
| ContextSuite context=TestNestedVirtualization>:setup | Error | 0.00 | test_nested_virtualization.py |
| ContextSuite context=TestIsolatedNetworksPasswdServer>:setup | Error | 0.00 | test_password_server.py |
| ContextSuite context=TestPortForwardingRules>:setup | Error | 0.00 | test_portforwardingrules.py |
| ContextSuite context=TestPrivateGwACL>:setup | Error | 0.00 | test_privategw_acl.py |
| test_09_project_suspend | Error | 1.08 | test_projects.py |
| test_10_project_activation | Error | 1.06 | test_projects.py |
| ContextSuite context=TestResetVmOnReboot>:setup | Error | 0.00 | test_reset_vm_on_reboot.py |
| ContextSuite context=TestRAMCPUResourceAccounting>:setup | Error | 0.00 | test_resource_accounting.py |
| ContextSuite context=TestRouterDHCPHosts>:setup | Error | 0.00 | test_router_dhcphosts.py |
| ContextSuite context=TestRouterDHCPOpts>:setup | Error | 0.00 | test_router_dhcphosts.py |

@Pearl1594
Contributor Author

@blueorangutan test

@blueorangutan

@Pearl1594 a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-1190)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 43413 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr5165-t1190-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_internal_lb.py
Intermittent failure detected: /marvin/tests/smoke/test_routers_network_ops.py
Intermittent failure detected: /marvin/tests/smoke/test_snapshots.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_redundant.py
Smoke tests completed. 86 look OK, 2 have error(s)
Only failed tests results shown below:

| Test | Result | Time (s) | Test File |
| --- | --- | --- | --- |
| test_02_RVR_Network_FW_PF_SSH_default_routes_egress_false | Failure | 336.32 | test_routers_network_ops.py |
| test_03_create_redundant_VPC_1tier_2VMs_2IPs_2PF_ACL_reboot_routers | Failure | 523.40 | test_vpc_redundant.py |
| test_05_rvpc_multi_tiers | Failure | 507.98 | test_vpc_redundant.py |

Contributor

@DaanHoogland DaanHoogland left a comment

CLGTM

@rohityadavcloud rohityadavcloud added this to the 4.16.0.0 milestone Jul 5, 2021
Contributor

@GutoVeronezi GutoVeronezi left a comment

CLGTM; I raised a last point about toString.

@Pearl1594
Contributor Author

@blueorangutan package

@blueorangutan

@Pearl1594 a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian. SL-JID 475

@Pearl1594
Contributor Author

@blueorangutan test

@blueorangutan

@Pearl1594 a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-1205)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 33120 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr5165-t1205-kvm-centos7.zip
Smoke tests completed. 88 look OK, 0 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File

return null;
String msg = String.format("Cannot start %s in %s state", vm, state);
s_logger.warn(msg);
throw new CloudRuntimeException(msg);
Contributor

code LGTM

Contributor

@vladimirpetrov vladimirpetrov left a comment

Functionality tested - LGTM.

Member

@GabrielBrascher GabrielBrascher left a comment

Code LGTM

@sureshanaparti
Contributor

Merging this based on LGTMs, manual and smoke tests.

@sureshanaparti sureshanaparti merged commit 3fd9250 into apache:main Jul 9, 2021
8 participants