Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLOUDSTACK-9055: fix NPE in updating Redundant State of VPC networks #1073

Merged
merged 1 commit into from
Nov 20, 2015

Conversation

ustcweizhou
Copy link
Contributor

This issue happened when the KVM nodes is down.It might also happen when the cloudstack-agent is killed unexpectedly.

@DaanHoogland
Copy link
Contributor

code review done: LGTM

@wilderrodrigues
Copy link
Contributor

Hi @ustcweizhou,

Okay, I understood. Have you prepared a test to cover it? I mean, we need a test that deploys a DC, creates a redundant VPC, adds 1 tier, deploys 1 VM, destroys/stop/disconnect the host then when the router check runs we don't get a NPE.

In case writing an integration test for that is too much, perhaps would be nice to write some unit tests.

Do you have a test environment? If so, could you please run the following tests against KVM?

  • requires HW
    • component/test_vpc_redundant.py
  • doesn't require HW (but just run against KVM, it's fine)
  • component/test_vpc_offerings.py

Cheers,
Wilder

@remibergsma
Copy link
Contributor

LGTM, based on a set of tests that I run on this branch (which I rebased myself first):

nosetests --with-marvin --marvin-config=${marvinCfg} -s -a tags=advanced,required_hardware=true \
component/test_vpc_redundant.py \
component/test_routers_iptables_default_policy.py \
component/test_routers_network_ops.py \
component/test_vpc_router_nics.py \
smoke/test_loadbalance.py \
smoke/test_internal_lb.py \
smoke/test_ssvm.py \
smoke/test_network.py

Result:

Create a redundant VPC with two networks with two VMs in each network ... === TestName: test_01_create_redundant_VPC_2tiers_4VMs_4IPs_4PF_ACL | Status : SUCCESS ===
ok
Create a redundant VPC with two networks with two VMs in each network and check default routes ... === TestName: test_02_redundant_VPC_default_routes | Status : SUCCESS ===
ok
Test iptables default INPUT/FORWARD policy on RouterVM ... === TestName: test_02_routervm_iptables_policies | Status : SUCCESS ===
ok
Test iptables default INPUT/FORWARD policies on VPC router ... === TestName: test_01_single_VPC_iptables_policies | Status : SUCCESS ===
ok
Test redundant router internals ... === TestName: test_01_isolate_network_FW_PF_default_routes_egress_true | Status : SUCCESS ===
ok
Test redundant router internals ... === TestName: test_02_isolate_network_FW_PF_default_routes_egress_false | Status : SUCCESS ===
ok
Test redundant router internals ... === TestName: test_01_RVR_Network_FW_PF_SSH_default_routes_egress_true | Status : SUCCESS ===
ok
Test redundant router internals ... === TestName: test_02_RVR_Network_FW_PF_SSH_default_routes_egress_false | Status : SUCCESS ===
ok
Create a VPC with two networks with one VM in each network and test nics after destroy ... === TestName: test_01_VPC_nics_after_destroy | Status : SUCCESS ===
ok
Create a VPC with two networks with one VM in each network and test default routes ... === TestName: test_02_VPC_default_routes | Status : SUCCESS ===
ok
Check the password file in the Router VM ... === TestName: test_isolate_network_password_server | Status : SUCCESS ===
ok
Check that the /etc/dhcphosts.txt doesn't contain duplicate IPs ... === TestName: test_router_dhcphosts | Status : SUCCESS ===
ok
Test to create Load balancing rule with source NAT ... === TestName: test_01_create_lb_rule_src_nat | Status : SUCCESS ===
ok
Test to create Load balancing rule with non source NAT ... === TestName: test_02_create_lb_rule_non_nat | Status : SUCCESS ===
ok
Test for assign & removing load balancing rule ... === TestName: test_assign_and_removal_lb | Status : SUCCESS ===
ok
Test to verify access to loadbalancer haproxy admin stats page ... === TestName: test02_internallb_haproxy_stats_on_all_interfaces | Status : SUCCESS ===
ok
Test create, assign, remove of an Internal LB with roundrobin http traffic to 3 vm's ... === TestName: test_01_internallb_roundrobin_1VPC_3VM_HTTP_port80 | Status : SUCCESS ===
ok
Test SSVM Internals ... === TestName: test_03_ssvm_internals | Status : SUCCESS ===
ok
Test CPVM Internals ... === TestName: test_04_cpvm_internals | Status : SUCCESS ===
ok
Test stop SSVM ... === TestName: test_05_stop_ssvm | Status : SUCCESS ===
ok
Test stop CPVM ... === TestName: test_06_stop_cpvm | Status : SUCCESS ===
ok
Test reboot SSVM ... === TestName: test_07_reboot_ssvm | Status : SUCCESS ===
ok
Test reboot CPVM ... === TestName: test_08_reboot_cpvm | Status : SUCCESS ===
ok
Test destroy SSVM ... === TestName: test_09_destroy_ssvm | Status : SUCCESS ===
ok
Test destroy CPVM ... === TestName: test_10_destroy_cpvm | Status : SUCCESS ===
ok
Test Remote Access VPN in VPC ... === TestName: test_vpc_remote_access_vpn | Status : SUCCESS ===
ok
Test VPN in VPC ... === TestName: test_vpc_site2site_vpn | Status : SUCCESS ===
ok
Test for port forwarding on source NAT ... === TestName: test_01_port_fwd_on_src_nat | Status : SUCCESS ===
ok
Test for port forwarding on non source NAT ... === TestName: test_02_port_fwd_on_non_src_nat | Status : SUCCESS ===
ok
Test for reboot router ... === TestName: test_reboot_router | Status : SUCCESS ===
ok
Test for Router rules for network rules on acquired public IP ... === TestName: test_network_rules_acquired_public_ip_1_static_nat_rule | Status : SUCCESS ===
ok
Test for Router rules for network rules on acquired public IP ... === TestName: test_network_rules_acquired_public_ip_2_nat_rule | Status : SUCCESS ===
ok
Test for Router rules for network rules on acquired public IP ... === TestName: test_network_rules_acquired_public_ip_3_Load_Balancer_Rule | Status : SUCCESS ===
ok

----------------------------------------------------------------------
Ran 33 tests in 15321.119s

OK

And:

nosetests --with-marvin --marvin-config=${marvinCfg} -s -a tags=advanced,required_hardware=false \
smoke/test_routers.py \
smoke/test_network_acl.py \
smoke/test_privategw_acl.py \
smoke/test_reset_vm_on_reboot.py \
smoke/test_vm_life_cycle.py \
smoke/test_vpc_vpn.py \
smoke/test_service_offerings.py \
component/test_vpc_offerings.py \
component/test_vpc_routers.py

Result:

Test router internal advanced zone ... === TestName: test_02_router_internal_adv | Status : SUCCESS ===
ok
Test restart network ... === TestName: test_03_restart_network_cleanup | Status : SUCCESS ===
ok
Test router basic setup ... === TestName: test_05_router_basic | Status : SUCCESS ===
ok
Test router advanced setup ... === TestName: test_06_router_advanced | Status : SUCCESS ===
ok
Test stop router ... === TestName: test_07_stop_router | Status : SUCCESS ===
ok
Test start router ... === TestName: test_08_start_router | Status : SUCCESS ===
ok
Test reboot router ... === TestName: test_09_reboot_router | Status : SUCCESS ===
ok
test_privategw_acl (integration.smoke.test_privategw_acl.TestPrivateGwACL) ... === TestName: test_privategw_acl | Status : SUCCESS ===
ok
Test reset virtual machine on reboot ... === TestName: test_01_reset_vm_on_reboot | Status : SUCCESS ===
ok
Test advanced zone virtual router ... === TestName: test_advZoneVirtualRouter | Status : SUCCESS ===
ok
Test Deploy Virtual Machine ... === TestName: test_deploy_vm | Status : SUCCESS ===
ok
Test Multiple Deploy Virtual Machine ... === TestName: test_deploy_vm_multiple | Status : SUCCESS ===
ok
Test Stop Virtual Machine ... === TestName: test_01_stop_vm | Status : SUCCESS ===
ok
Test Start Virtual Machine ... === TestName: test_02_start_vm | Status : SUCCESS ===
ok
Test Reboot Virtual Machine ... === TestName: test_03_reboot_vm | Status : SUCCESS ===
ok
Test destroy Virtual Machine ... === TestName: test_06_destroy_vm | Status : SUCCESS ===
ok
Test recover Virtual Machine ... === TestName: test_07_restore_vm | Status : SUCCESS ===
ok
Test migrate VM ... === TestName: test_08_migrate_vm | Status : SUCCESS ===
ok
Test destroy(expunge) Virtual Machine ... === TestName: test_09_expunge_vm | Status : SUCCESS ===
ok
Test to create service offering ... === TestName: test_01_create_service_offering | Status : SUCCESS ===
ok
Test to update existing service offering ... === TestName: test_02_edit_service_offering | Status : SUCCESS ===
ok
Test to delete service offering ... === TestName: test_03_delete_service_offering | Status : SUCCESS ===
ok
Test for delete account ... === TestName: test_delete_account | Status : SUCCESS ===
ok
Test for Associate/Disassociate public IP address for admin account ... === TestName: test_public_ip_admin_account | Status : SUCCESS ===
ok
Test for Associate/Disassociate public IP address for user account ... === TestName: test_public_ip_user_account | Status : SUCCESS ===
ok
Test for release public IP address ... === TestName: test_releaseIP | Status : SUCCESS ===
ok
Test create VPC offering ... === TestName: test_01_create_vpc_offering | Status : SUCCESS ===
ok
Test VPC offering without load balancing service ... === TestName: test_03_vpc_off_without_lb | Status : SUCCESS ===
ok
Test VPC offering without static NAT service ... === TestName: test_04_vpc_off_without_static_nat | Status : SUCCESS ===
ok
Test VPC offering without port forwarding service ... === TestName: test_05_vpc_off_without_pf | Status : SUCCESS ===
ok
Test VPC offering with invalid services ... === TestName: test_06_vpc_off_invalid_services | Status : SUCCESS ===
ok
Test update VPC offering ... === TestName: test_07_update_vpc_off | Status : SUCCESS ===
ok
Test list VPC offering ... === TestName: test_08_list_vpc_off | Status : SUCCESS ===
ok
test_09_create_redundant_vpc_offering (integration.component.test_vpc_offerings.TestVPCOffering) ... === TestName: test_09_create_redundant_vpc_offering | Status : SUCCESS ===
ok
Test start/stop of router after addition of one guest network ... === TestName: test_01_start_stop_router_after_addition_of_one_guest_network | Status : SUCCESS ===
ok
Test reboot of router after addition of one guest network ... === TestName: test_02_reboot_router_after_addition_of_one_guest_network | Status : SUCCESS ===
ok
Test to change service offering of router after addition of one guest network ... === TestName: test_04_chg_srv_off_router_after_addition_of_one_guest_network | Status : SUCCESS ===
ok
Test destroy of router after addition of one guest network ... === TestName: test_05_destroy_router_after_addition_of_one_guest_network | Status : SUCCESS ===
ok
Test to stop and start router after creation of VPC ... === TestName: test_01_stop_start_router_after_creating_vpc | Status : SUCCESS ===
ok
Test to reboot the router after creating a VPC ... === TestName: test_02_reboot_router_after_creating_vpc | Status : SUCCESS ===
ok
Tests to change service offering of the Router after ... === TestName: test_04_change_service_offerring_vpc | Status : SUCCESS ===
ok
Test to destroy the router after creating a VPC ... === TestName: test_05_destroy_router_after_creating_vpc | Status : SUCCESS ===
ok

----------------------------------------------------------------------
Ran 42 tests in 8219.441s

OK

These test may not cover your change, all they do is show you didn't break them. Someone else needs to review the code. Please respond to the comments by @wilderrodrigues.

@ustcweizhou
Copy link
Contributor Author

@wilderrodrigues @remibergsma @DaanHoogland
this issue indeed happened in my testing environment. After investigation, I found the node was shut down unexpectedly. If you ask me if I have reproduced it, th answer is no. I would not spent much time on this issue. The issue in the code is obvious and the fix is harmless, from my point of view.
The issue is not critical at all. If you guys think this PR is not good, I will close this PR.

@DaanHoogland
Copy link
Contributor

let's get this merged. I need no intergration tests for this improvement. a disect of updateRoutersRedundantState to make the bit that is fixed and maybe some other bits unit testable would be nice but not related to this PR. LGTM

@asfgit asfgit merged commit 66fc7c6 into apache:4.6 Nov 20, 2015
asfgit pushed a commit that referenced this pull request Nov 20, 2015
CLOUDSTACK-9055: fix NPE in updating Redundant State of VPC networksThis issue happened when the KVM nodes is down.It might also happen when the cloudstack-agent is killed unexpectedly.

* pr/1073:
  CLOUDSTACK-9055: fix NPE in updating Redundant State of VPC networks

Signed-off-by: Remi Bergsma <github@remi.nl>
@ustcweizhou
Copy link
Contributor Author

today one of my nodes was rebooted unexpectedly, I got the same error

2015-12-02 09:59:53,224 ERROR c.c.n.r.VirtualNetworkApplianceManagerImpl Fail to complete the RvRStatusUpdateTask!
java.lang.NullPointerException
at com.cloud.network.router.VirtualNetworkApplianceManagerImpl.updateRoutersRedundantState(VirtualNetworkApplianceManagerImpl.java:1019)
at com.cloud.network.router.VirtualNetworkApplianceManagerImpl$RvRStatusUpdateTask.runInContext(VirtualNetworkApplianceManagerImpl.java:1201)
at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants