Router aggregate timeout does not seem to be honored #2978

Closed
wido opened this issue Oct 29, 2018 · 3 comments

@wido
Contributor

wido commented Oct 29, 2018

ISSUE TYPE
  • Bug Report
COMPONENT NAME
Virtual Router
CLOUDSTACK VERSION
4.11.1
CONFIGURATION
router.aggregation.command.each.timeout = 6000
OS / ENVIRONMENT

Basic Networking

SUMMARY

Router gets killed on Start due to timeout before configuration has completed

STEPS TO REPRODUCE
Deploy a Virtual Router with ~600 DHCP entries
EXPECTED RESULTS
VR should deploy properly
ACTUAL RESULTS
Timeout was reached

The story is that during an upgrade from 4.10 to 4.11.1 we (PCextreme) encountered a problem where Virtual Routers would not start.

During their Start and configuration they ran into a timeout which caused the VR to get killed.

For example we saw in the logs:

2018-10-29 06:38:07,041 DEBUG [resource.virtualnetwork.VirtualRoutingResource] (agentRequest-Handler-6:null) (logid:ded92662) Aggregate action timeout in seconds is 665
2018-10-29 06:38:07,041 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-6:null) (logid:ded92662) Creating file in VR, with ip: 169.254.3.223, file: VR-d09aa357-27e3-4176-a283-9a7afedbae27.cfg
2018-10-29 06:38:07,464 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-6:null) (logid:ded92662) Executing: /usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh vr_cfg.sh 169.254.3.223 -c /var/cache/cloud/VR-d09aa357-27e3-4176-a283-9a7afedbae27.cfg 
2018-10-29 06:38:07,466 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-6:null) (logid:ded92662) Executing while with timeout : 665700

So in this case the timeout was 665 seconds, about 11 minutes.

We tried to increase router.aggregation.command.each.timeout both on the Management Server side and in agent.properties, but that did not seem to have any effect.

For each DHCP entry a ~1 second timeout seems to be calculated. This VR has 609 DHCP entries:

root@r-32727-VM:~# wc -l /etc/dhcphosts.txt 
609 /etc/dhcphosts.txt
root@r-32727-VM:~#
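As a rough sanity check of the "~1 second per DHCP entry" observation, the numbers above can be compared in a few lines of Python. This is only an illustration under the assumption that the timeout scales linearly with the entry count; it is not the actual CloudStack calculation, and the function names are hypothetical.

```python
def count_entries(text: str) -> int:
    """Count non-empty lines, similar to `wc -l` on a file without blank lines."""
    return sum(1 for line in text.splitlines() if line.strip())


def estimate_aggregate_timeout(entry_count: int, seconds_per_entry: float = 1.0) -> float:
    """Hypothetical estimate: every entry contributes a fixed number of seconds.

    ASSUMPTION: linear scaling at ~1 s per entry, taken from the observation
    in this issue, not from the CloudStack source.
    """
    if entry_count < 0:
        raise ValueError("entry_count must be non-negative")
    return entry_count * seconds_per_entry


if __name__ == "__main__":
    entries = 609             # from `wc -l /etc/dhcphosts.txt` above
    observed_timeout_s = 665  # from "Aggregate action timeout in seconds is 665"

    estimate = estimate_aggregate_timeout(entries)
    print(f"estimated: {estimate:.0f} s, observed in log: {observed_timeout_s} s")
    print(f"difference: {observed_timeout_s - estimate:.0f} s")
```

The estimate lands close to the logged value, which fits the idea that the timeout grows with the number of DHCP entries; where the remaining seconds come from is not visible in these logs.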

10 minutes is a long time, and that would need improving as well, but apart from that the VR just would not start.

My colleague created PR #2977 as this fixed the issue for us. So we need to investigate whether his fix is the proper one or whether the (default) timeout should be increased.

@wido
Contributor Author

wido commented Oct 29, 2018

See #2979, I was too fast with typing :)

rohityadavcloud added this to the 4.11.2.0 milestone Oct 29, 2018
@rohityadavcloud
Member

The fix shared by @RPDiep LGTM @wido

@rohityadavcloud
Member

rohityadavcloud commented Oct 29, 2018

#2979 has been merged, closing this issue. /cc @wido please re-open if additional work needs to be done.
