VR stop/start/reboot commands failed #2862
Comments
The apache2 service is not started even after a VPC restart. I don't know if it is related to the problem described...
The apache2 error log is clean. How should apache2 be started? By systemd or by some cloud scripts? |
We also hit a similar issue on VMware 5.5. The first error was
full log
|
@resmo can you confirm if you tested 4.11.1.0 or the 4.11.2.0 RC2? I'm unable to reproduce this at least on KVM with 4.11.2.0 RC2, will test against vmware 5.5 next. |
@izenk apache2 should be started by the post cloud-init script. Can you share output of systemctl status apache2, what error do you see? I could not reproduce the issue with either stop+start or rebooting of the VPC VR, it came up OK for me in case of CentOS7 KVM host. For the broken VR, can you share its /var/log/cloud.log file? |
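A minimal sketch of how the requested diagnostics could be gathered in one go on the VR. It assumes Python 3 is available there; the helper names `tail`, `service_status`, and `collect` are hypothetical and are not part of CloudStack. Only the `systemctl status apache2` command and the `/var/log/cloud.log` path come from the comment above.

```python
import subprocess
from collections import deque


def tail(path, n=200):
    """Return the last `n` lines of a text file as one string."""
    with open(path, "r", errors="replace") as fh:
        return "".join(deque(fh, maxlen=n))


def service_status(unit):
    """Return the output of `systemctl status <unit>`, or a short error note."""
    try:
        proc = subprocess.run(
            ["systemctl", "status", unit, "--no-pager"],
            capture_output=True, text=True, timeout=30,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
        return f"could not run systemctl: {exc}"
    # systemctl exits non-zero for stopped or failed units; the text is still useful.
    return proc.stdout + proc.stderr


def collect(unit="apache2", log_path="/var/log/cloud.log", n=200):
    """Gather the service status and the end of the VR log into a dict."""
    try:
        log = tail(log_path, n)
    except OSError as exc:
        log = f"could not read {log_path}: {exc}"
    return {"status": service_status(unit), "log": log}
```

Calling `collect()` on the broken VR and pasting both values into the issue would cover what is being asked for here.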
@rhtyd it's 4.11.1.0 |
Thanks @resmo. @resmo @izenk can you describe your VPC (or isolated network) setups, do you have private gateway and/or additional services (vpn etc) or rules etc? In my two test setups (CentOS7 KVM and VMware 5.5 based), I could not reproduce the issue against latest 4.11.2.0 rc2 (latest 4.11 branch). About the apache2 process blocking VRs from starting, we've fixed that in 4.11.1.0 already, I could no longer reproduce that on vmware. /cc @PaulAngus |
@resmo we'll need the /var/log/cloud.log from the VR when this happens as well as the management server logs when you reproduce the next time. |
@resmo we'll need additional logs from an env/VRs where you're seeing the error to debug/investigate and fix this issue. |
@rhtyd
cloud.log from VR is also attached
This is the last line before I got exception above. |
Thanks @izenk let me have a look |
@izenk can you describe your KVM setup: are you using bridge-based networking or openvswitch with vlan/vxlan/etc, and is it a normal advanced zone (or one with security groups)? Reading the cloud.log from the VR, the first problem I see is the following, which implies either that the cloudstack kvm agent failed to create the file in the VR or that the VR had already processed this file:
After the restart of the VR, the following suggests that the nic device was not found, which caused several other operations, such as reconfiguring the vpn, to fail as well:
Let me try to investigate the code and see if I reproduce it, meanwhile please answer my questions. |
@rhtyd VPC restart with "Clean up" option does the job. |
I could not reproduce a similar error with stop+start. However, when rebooting a VR I could see similar log statements, but ultimately the VR was able to start and I could not reproduce the error. There have been several VR related fixes in 4.11.2-rc2 compared to 4.11.1, so it's likely the bug is no longer reproducible there. Please check and increase your router aggregation timeout and see if that helps. |
Given that you have a workaround for this issue and I could not reproduce it with the latest 4.11 branch, I'll move this to the 4.11.3.0 milestone to avoid further blocking rc3. |
@rhtyd ok. When I click "Start", the VR starts booting and I can connect by VNC. After connecting by VNC (while the tested VR is in the "Starting" state) I can see only 3 interfaces: Note: I also increased the "router aggregation timeout" (router.aggregation.command.each.timeout), which doesn't help. It seems to have no effect at all: the error about finalizing the operation appeared after the same amount of time as before increasing router.aggregation.command.each.timeout. |
@rhtyd the result is:
10.9.2.206 is the public IP (SNAT) on eth0 of the VR in the "Starting" state. More details: |
@rhtyd rule: from all lookup Table_eth1, and in this table there is no route for 169.254.x.x, so everything goes through the default route via the public 10.9.x.x |
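The effect described in the comment above can be illustrated with a small model of Linux policy routing: rules are consulted in priority order, and the first table containing a matching route decides the next hop. This is only a sketch. The addresses, the gateway `10.9.0.1`, the rule priority `100`, and the helper names are hypothetical; only the rule `from all lookup Table_eth1` and the missing `169.254.x.x` route come from the report. It is not a description of what the fix in CloudStack actually changes.

```python
import ipaddress


def lookup(table, dst):
    """Return the most specific route in `table` matching `dst`, or None."""
    addr = ipaddress.ip_address(dst)
    matches = [r for r in table if addr in ipaddress.ip_network(r["prefix"])]
    return max(
        matches,
        key=lambda r: ipaddress.ip_network(r["prefix"]).prefixlen,
        default=None,
    )


def resolve(rules, tables, dst):
    """Walk the rules by priority; the first table with a matching route wins."""
    for rule in sorted(rules, key=lambda r: r["priority"]):
        route = lookup(tables[rule["table"]], dst)
        if route is not None:
            return rule["table"], route
    return None, None


# Hypothetical state of the VR: the per-interface table is consulted before
# "main" and only holds a default route via the public network.
RULES = [
    {"priority": 100, "table": "Table_eth1"},   # from all lookup Table_eth1
    {"priority": 32766, "table": "main"},       # from all lookup main
]
TABLES = {
    "Table_eth1": [{"prefix": "0.0.0.0/0", "via": "10.9.0.1"}],
    "main": [{"prefix": "169.254.0.0/16", "via": None}],
}

# Link-local traffic never reaches "main": the default route in Table_eth1 matches first.
print(resolve(RULES, TABLES, "169.254.1.10"))
# ('Table_eth1', {'prefix': '0.0.0.0/0', 'via': '10.9.0.1'})

# With a link-local route present in the same table, the more specific prefix wins.
FIXED = dict(TABLES)
FIXED["Table_eth1"] = TABLES["Table_eth1"] + [{"prefix": "169.254.0.0/16", "via": None}]
print(resolve(RULES, FIXED, "169.254.1.10"))
# ('Table_eth1', {'prefix': '169.254.0.0/16', 'via': None})
```

In this model, replies to the link-local address leave through the public default route, which would match a VR that boots but stays unreachable from the host while in the "Starting" state.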
@izenk can you confirm if you tested 4.11.1 or 4.11.2-rc2? A routing related bug was fixed which is in 4.11.2-rc2. |
@rhtyd no, can't |
@izenk the routing issue you've shared has been fixed and is in 4.11.2-rc2: https://github.com/apache/cloudstack/pull/2791/files |
@rhtyd got it. Thanks |
Can you test with 4.11.2.0 @izenk ? This probably can be closed. |
Any update on this one? I believe I was observing a similar situation... |
@andrijapanicsb were you using a VPC with public IPs in multiple IP ranges or a private gateway? |
Can you test against 4.11.2.0 or 4.11.3.0 @izenk ? |
I'm unable to reproduce with 4.11.3.0 or master, kindly reopen with details @izenk if you're still facing this issue. |
ISSUE TYPE
COMPONENT NAME
CLOUDSTACK VERSION
CONFIGURATION
KVM, Advanced networking
OS / ENVIRONMENT
CentOS 7.4
SUMMARY
The Stop/Start/Reboot commands leave the VR inaccessible. The only workaround is to restart the VPC with the "Clean up" option.
STEPS TO REPRODUCE
Create VPC
Create multiple tiers (I have 6)
Create VMs in every tier (typically 3 VMs per tier)
Try VR operations: STOP/START or REBOOT
EXPECTED RESULTS
ACTUAL RESULTS