Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLOUDSTACK-9796 - Fix NPE in VirtualMachineManagerImpl.java #1956

Merged
merged 1 commit into from Apr 22, 2017

Conversation

nathanejohnson
Copy link
Member

This checks the work variable for NULL in all cases where it is
used. Fixes CLOUDSTACK-9796.

@GabrielBrascher
Copy link
Member

GabrielBrascher commented Feb 21, 2017

Thanks for the code that prevents NULL pointer exception.
Based on code review and that all checks have passed LGTM.

Copy link
Member

@rafaelweingartner rafaelweingartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nathanejohnson I have a question here before we proceed.

_workDao.updateStep(work, step);
Step previousStep = null;
if (work != null) {
previousStep = work.getStep();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can " work.getStep()" return null?
I see that you add a check at line 757 previousStep != null. Why would we need that check there, and not need it here (line750)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rafaelweingartner if work is null, previousStep will stay null. Maybe not the clearest way to handle this, but this prevents a null work from being passed down below. In other words, if work is null, previousStep will be guaranteed null, and if previousStep is not null, then work is guaranteed to be not null.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, now I think I am starting to get it.
But I am still not sure about some things here, would you mind continue discussing?

If the work is not null, you get the previous step (let’s assume it is not null) and call the method _workDao.updateStep(work, step). After this, you call stateTransitTo(vm, event, hostId). Why do we need to call _workDao.updateStep(work, previousStep) again at line 758 that is executed when the method finishes? The previousStep continues to be the same.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of this code I didn't write, but I can make some guesses:

_workDao.updateStep(work, previousStep) line is in the finally block, which will execute even if an exception is thrown in stateTransitTo (like NoTransitException for instance). So if stateTransitTo a) returns a false, or b) throw an exception, then result will be false, and line 758 will run. So if something happens that the state isn't transitioned, someone wanted the work reverted to its previous step value. Sort of a rollback maybe?

In the case of the VM hung in starting, my desired side effect is I want stateTransitTo to be called and set the state to Stopped , i.e., Event.AgentReportStopped -> State.Stopped . The work has already expired at this point, so it is null. I was trying to preserve the same behavior as before when work was not null.

Sorry if this wasn't very clear.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, and to your point earlier, getStep() generally shouldn't ever return a null I don't think , because the step column in the op_it_work table is marked not null.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not asking to make a distinction between exception or not. What I tried to say is that, if the intent/purpose of the finally block was only to revert the step to a previous state when exceptions occur, we could do that using a catch block. I think the finally here is meant to revert the state of work step even if an exception does not happen, for instance when stateTransitTo returns false.

I think you already answered my doubt; when you said that the previousStep is most likely never to be null. I thought we could have cases where previousStep == null, and then if the stateTransitTo returns false, with the newly added check at line 757, we would not update the step back to null for these cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think something like:

if (!result && work != null) {

would be better? Even if work.getStep() did return a null, that should have the same effect as before. Maybe it would be more readable too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think this is more readable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks for the input

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thank you

This checks the work variable for NULL in all cases where it is
used.  Fixes CLOUDSTACK-9796.
Copy link
Member

@rafaelweingartner rafaelweingartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

_workDao.updateStep(work, step);
Step previousStep = null;
if (work != null) {
previousStep = work.getStep();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thank you

@kiwiflyer
Copy link
Contributor

LGTM by Rafael in commit review above (for acspr).

@borisstoyanov
Copy link
Contributor

@nathanejohnson thanks for this bug fix
@blueorangutan package

@blueorangutan
Copy link

@borisstoyanov a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result: ✖centos6 ✔centos7 ✔debian. JID-571

@borisstoyanov
Copy link
Contributor

@blueorangutan package

@blueorangutan
Copy link

@borisstoyanov a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-572

@borisstoyanov
Copy link
Contributor

@blueorangutan test

@blueorangutan
Copy link

@borisstoyanov a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan
Copy link

Trillian test result (tid-936)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 32211 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr1956-t936-kvm-centos7.zip
Intermitten failure detected: /marvin/tests/smoke/test_privategw_acl.py
Intermitten failure detected: /marvin/tests/smoke/test_snapshots.py
Intermitten failure detected: /marvin/tests/smoke/test_vpc_redundant.py
Test completed. 45 look ok, 3 have error(s)

Test Result Time (s) Test File
test_01_create_redundant_VPC_2tiers_4VMs_4IPs_4PF_ACL Failure 368.96 test_vpc_redundant.py
test_04_rvpc_privategw_static_routes Failure 315.79 test_privategw_acl.py
test_02_list_snapshots_with_removed_data_store Error 0.04 test_snapshots.py
test_01_vpc_site2site_vpn Success 160.02 test_vpc_vpn.py
test_01_vpc_remote_access_vpn Success 71.29 test_vpc_vpn.py
test_01_redundant_vpc_site2site_vpn Success 245.60 test_vpc_vpn.py
test_02_VPC_default_routes Success 270.06 test_vpc_router_nics.py
test_01_VPC_nics_after_destroy Success 533.80 test_vpc_router_nics.py
test_05_rvpc_multi_tiers Success 513.29 test_vpc_redundant.py
test_04_rvpc_network_garbage_collector_nics Success 1400.07 test_vpc_redundant.py
test_03_create_redundant_VPC_1tier_2VMs_2IPs_2PF_ACL_reboot_routers Success 548.99 test_vpc_redundant.py
test_02_redundant_VPC_default_routes Success 762.46 test_vpc_redundant.py
test_09_delete_detached_volume Success 156.53 test_volumes.py
test_08_resize_volume Success 156.59 test_volumes.py
test_07_resize_fail Success 161.50 test_volumes.py
test_06_download_detached_volume Success 156.55 test_volumes.py
test_05_detach_volume Success 150.77 test_volumes.py
test_04_delete_attached_volume Success 151.25 test_volumes.py
test_03_download_attached_volume Success 151.34 test_volumes.py
test_02_attach_volume Success 124.22 test_volumes.py
test_01_create_volume Success 711.43 test_volumes.py
test_deploy_vm_multiple Success 252.67 test_vm_life_cycle.py
test_deploy_vm Success 0.03 test_vm_life_cycle.py
test_advZoneVirtualRouter Success 0.02 test_vm_life_cycle.py
test_10_attachAndDetach_iso Success 26.67 test_vm_life_cycle.py
test_09_expunge_vm Success 125.25 test_vm_life_cycle.py
test_08_migrate_vm Success 40.97 test_vm_life_cycle.py
test_07_restore_vm Success 0.13 test_vm_life_cycle.py
test_06_destroy_vm Success 125.94 test_vm_life_cycle.py
test_03_reboot_vm Success 125.87 test_vm_life_cycle.py
test_02_start_vm Success 10.18 test_vm_life_cycle.py
test_01_stop_vm Success 40.34 test_vm_life_cycle.py
test_CreateTemplateWithDuplicateName Success 35.45 test_templates.py
test_08_list_system_templates Success 0.03 test_templates.py
test_07_list_public_templates Success 0.04 test_templates.py
test_05_template_permissions Success 0.06 test_templates.py
test_04_extract_template Success 5.16 test_templates.py
test_03_delete_template Success 5.11 test_templates.py
test_02_edit_template Success 90.13 test_templates.py
test_01_create_template Success 35.40 test_templates.py
test_10_destroy_cpvm Success 161.70 test_ssvm.py
test_09_destroy_ssvm Success 163.80 test_ssvm.py
test_08_reboot_cpvm Success 101.56 test_ssvm.py
test_07_reboot_ssvm Success 103.64 test_ssvm.py
test_06_stop_cpvm Success 161.80 test_ssvm.py
test_05_stop_ssvm Success 193.85 test_ssvm.py
test_04_cpvm_internals Success 1.20 test_ssvm.py
test_03_ssvm_internals Success 3.23 test_ssvm.py
test_02_list_cpvm_vm Success 0.13 test_ssvm.py
test_01_list_sec_storage_vm Success 0.16 test_ssvm.py
test_01_snapshot_root_disk Success 11.11 test_snapshots.py
test_04_change_offering_small Success 239.64 test_service_offerings.py
test_03_delete_service_offering Success 0.04 test_service_offerings.py
test_02_edit_service_offering Success 0.06 test_service_offerings.py
test_01_create_service_offering Success 0.11 test_service_offerings.py
test_02_sys_template_ready Success 0.13 test_secondary_storage.py
test_01_sys_vm_start Success 0.20 test_secondary_storage.py
test_09_reboot_router Success 35.32 test_routers.py
test_08_start_router Success 30.30 test_routers.py
test_07_stop_router Success 10.18 test_routers.py
test_06_router_advanced Success 0.06 test_routers.py
test_05_router_basic Success 0.04 test_routers.py
test_04_restart_network_wo_cleanup Success 5.73 test_routers.py
test_03_restart_network_cleanup Success 60.54 test_routers.py
test_02_router_internal_adv Success 1.04 test_routers.py
test_01_router_internal_basic Success 0.58 test_routers.py
test_router_dns_guestipquery Success 76.75 test_router_dns.py
test_router_dns_externalipquery Success 0.07 test_router_dns.py
test_router_dhcphosts Success 276.93 test_router_dhcphosts.py
test_router_dhcp_opts Success 21.93 test_router_dhcphosts.py
test_01_updatevolumedetail Success 0.08 test_resource_detail.py
test_01_reset_vm_on_reboot Success 146.04 test_reset_vm_on_reboot.py
test_createRegion Success 0.04 test_regions.py
test_create_pvlan_network Success 5.23 test_pvlan.py
test_dedicatePublicIpRange Success 0.42 test_public_ip_range.py
test_03_vpc_privategw_restart_vpc_cleanup Success 500.84 test_privategw_acl.py
test_02_vpc_privategw_static_routes Success 385.90 test_privategw_acl.py
test_01_vpc_privategw_acl Success 87.22 test_privategw_acl.py
test_01_primary_storage_nfs Success 35.94 test_primary_storage.py
test_createPortablePublicIPRange Success 15.20 test_portable_publicip.py
test_createPortablePublicIPAcquire Success 15.44 test_portable_publicip.py
test_isolate_network_password_server Success 89.33 test_password_server.py
test_UpdateStorageOverProvisioningFactor Success 0.14 test_over_provisioning.py
test_oobm_zchange_password Success 30.72 test_outofbandmanagement.py
test_oobm_multiple_mgmt_server_ownership Success 16.34 test_outofbandmanagement.py
test_oobm_issue_power_status Success 10.26 test_outofbandmanagement.py
test_oobm_issue_power_soft Success 15.50 test_outofbandmanagement.py
test_oobm_issue_power_reset Success 15.32 test_outofbandmanagement.py
test_oobm_issue_power_on Success 15.33 test_outofbandmanagement.py
test_oobm_issue_power_off Success 15.35 test_outofbandmanagement.py
test_oobm_issue_power_cycle Success 15.32 test_outofbandmanagement.py
test_oobm_enabledisable_across_clusterzones Success 72.60 test_outofbandmanagement.py
test_oobm_enable_feature_valid Success 5.16 test_outofbandmanagement.py
test_oobm_enable_feature_invalid Success 0.10 test_outofbandmanagement.py
test_oobm_disable_feature_valid Success 5.19 test_outofbandmanagement.py
test_oobm_disable_feature_invalid Success 0.11 test_outofbandmanagement.py
test_oobm_configure_invalid_driver Success 0.08 test_outofbandmanagement.py
test_oobm_configure_default_driver Success 0.08 test_outofbandmanagement.py
test_oobm_background_powerstate_sync Success 23.50 test_outofbandmanagement.py
test_extendPhysicalNetworkVlan Success 15.36 test_non_contigiousvlan.py
test_01_nic Success 419.16 test_nic.py
test_releaseIP Success 247.88 test_network.py
test_reboot_router Success 408.62 test_network.py
test_public_ip_user_account Success 10.26 test_network.py
test_public_ip_admin_account Success 40.29 test_network.py
test_network_rules_acquired_public_ip_3_Load_Balancer_Rule Success 67.00 test_network.py
test_network_rules_acquired_public_ip_2_nat_rule Success 61.79 test_network.py
test_network_rules_acquired_public_ip_1_static_nat_rule Success 124.14 test_network.py
test_delete_account Success 287.94 test_network.py
test_02_port_fwd_on_non_src_nat Success 55.68 test_network.py
test_01_port_fwd_on_src_nat Success 111.78 test_network.py
test_nic_secondaryip_add_remove Success 227.78 test_multipleips_per_nic.py
login_test_saml_user Success 19.16 test_login.py
test_assign_and_removal_lb Success 133.46 test_loadbalance.py
test_02_create_lb_rule_non_nat Success 187.48 test_loadbalance.py
test_01_create_lb_rule_src_nat Success 217.81 test_loadbalance.py
test_03_list_snapshots Success 0.06 test_list_ids_parameter.py
test_02_list_templates Success 0.04 test_list_ids_parameter.py
test_01_list_volumes Success 0.03 test_list_ids_parameter.py
test_07_list_default_iso Success 0.07 test_iso.py
test_05_iso_permissions Success 0.07 test_iso.py
test_04_extract_Iso Success 5.19 test_iso.py
test_03_delete_iso Success 95.20 test_iso.py
test_02_edit_iso Success 0.06 test_iso.py
test_01_create_iso Success 21.16 test_iso.py
test_04_rvpc_internallb_haproxy_stats_on_all_interfaces Success 208.61 test_internal_lb.py
test_03_vpc_internallb_haproxy_stats_on_all_interfaces Success 157.65 test_internal_lb.py
test_02_internallb_roundrobin_1RVPC_3VM_HTTP_port80 Success 531.73 test_internal_lb.py
test_01_internallb_roundrobin_1VPC_3VM_HTTP_port80 Success 441.42 test_internal_lb.py
test_dedicateGuestVlanRange Success 10.33 test_guest_vlan_range.py
test_UpdateConfigParamWithScope Success 0.14 test_global_settings.py
test_rolepermission_lifecycle_update Success 6.27 test_dynamicroles.py
test_rolepermission_lifecycle_list Success 6.04 test_dynamicroles.py
test_rolepermission_lifecycle_delete Success 5.86 test_dynamicroles.py
test_rolepermission_lifecycle_create Success 5.95 test_dynamicroles.py
test_rolepermission_lifecycle_concurrent_updates Success 6.03 test_dynamicroles.py
test_role_lifecycle_update_role_inuse Success 5.94 test_dynamicroles.py
test_role_lifecycle_update Success 10.99 test_dynamicroles.py
test_role_lifecycle_list Success 5.93 test_dynamicroles.py
test_role_lifecycle_delete Success 10.94 test_dynamicroles.py
test_role_lifecycle_create Success 5.91 test_dynamicroles.py
test_role_inuse_deletion Success 5.89 test_dynamicroles.py
test_role_account_acls_multiple_mgmt_servers Success 8.42 test_dynamicroles.py
test_role_account_acls Success 8.23 test_dynamicroles.py
test_default_role_deletion Success 6.15 test_dynamicroles.py
test_04_create_fat_type_disk_offering Success 0.07 test_disk_offerings.py
test_03_delete_disk_offering Success 0.05 test_disk_offerings.py
test_02_edit_disk_offering Success 0.06 test_disk_offerings.py
test_02_create_sparse_type_disk_offering Success 0.07 test_disk_offerings.py
test_01_create_disk_offering Success 0.11 test_disk_offerings.py
test_deployvm_userdispersing Success 20.58 test_deploy_vms_with_varied_deploymentplanners.py
test_deployvm_userconcentrated Success 20.58 test_deploy_vms_with_varied_deploymentplanners.py
test_deployvm_firstfit Success 70.75 test_deploy_vms_with_varied_deploymentplanners.py
test_deployvm_userdata_post Success 10.45 test_deploy_vm_with_userdata.py
test_deployvm_userdata Success 50.96 test_deploy_vm_with_userdata.py
test_02_deploy_vm_root_resize Success 5.97 test_deploy_vm_root_resize.py
test_01_deploy_vm_root_resize Success 5.98 test_deploy_vm_root_resize.py
test_00_deploy_vm_root_resize Success 217.70 test_deploy_vm_root_resize.py
test_deploy_vm_from_iso Success 207.45 test_deploy_vm_iso.py
test_DeployVmAntiAffinityGroup Success 65.94 test_affinity_groups.py
test_03_delete_vm_snapshots Skipped 0.00 test_vm_snapshots.py
test_02_revert_vm_snapshots Skipped 0.00 test_vm_snapshots.py
test_01_test_vm_volume_snapshot Skipped 0.00 test_vm_snapshots.py
test_01_create_vm_snapshots Skipped 0.00 test_vm_snapshots.py
test_06_copy_template Skipped 0.00 test_templates.py
test_static_role_account_acls Skipped 0.02 test_staticroles.py
test_01_scale_vm Skipped 0.00 test_scale_vm.py
test_01_primary_storage_iscsi Skipped 0.04 test_primary_storage.py
test_06_copy_iso Skipped 0.00 test_iso.py
test_deploy_vgpu_enabled_vm Skipped 0.01 test_deploy_vgpu_enabled_vm.py

Copy link
Contributor

@borisstoyanov borisstoyanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM based on test results and code review

@GabrielBrascher
Copy link
Member

Great, this one looks ready to merge.
Just for caution. Are those failures false positives?

test_01_create_redundant_VPC_2tiers_4VMs_4IPs_4PF_ACL 	Failure 	368.96 	test_vpc_redundant.py
test_04_rvpc_privategw_static_routes 	Failure 	315.79 	test_privategw_acl.py
test_02_list_snapshots_with_removed_data_store 	Error 	0.04 	test_snapshots.py

@borisstoyanov
Copy link
Contributor

borisstoyanov commented Mar 9, 2017

@GabrielBrascher privategw_acl and vpc_redundant test failures have been a pain for a while, couldn't managed to fix those tests. If someone could spend time and fix them would be really good.

While the test_snapshot is already fixed, in fact we just merged the fix in Trillian as well and next run would pass.

False positives should also be addressed with negative assertion within tests. Having a "Failure" test result should be considered as Failed test case, just my 2 cents..

@GabrielBrascher
Copy link
Member

Got it @borisstoyanov. Thanks!

@rohityadavcloud
Copy link
Member

LGTM. /cc @karuturi

@karuturi karuturi merged commit f2951d9 into apache:4.9 Apr 22, 2017
@karuturi karuturi modified the milestone: 4.10.0.0 Apr 27, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants