Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VR stop/start/reboot commands failed #2862

Closed
izenk opened this issue Sep 25, 2018 · 27 comments
Closed

VR stop/start/reboot commands failed #2862

izenk opened this issue Sep 25, 2018 · 27 comments

Comments

@izenk
Copy link

izenk commented Sep 25, 2018

ISSUE TYPE
  • Bug Report
COMPONENT NAME
VR
CLOUDSTACK VERSION
4.11.1
CONFIGURATION

KVM, Advanced networking

OS / ENVIRONMENT

Centos 7.4

SUMMARY

Stop/Start/Reboot command leads to VR inaccessibility. Only case is to restart VPC with "Clean up" options

STEPS TO REPRODUCE

Create VPC
Create multiple tiers (I have 6)
Create vms in every tier (in general 3 vm per tier)
Try VR operations: STOP/START or REBOOT

EXPECTED RESULTS
VR is restarted/ stopped/started
ACTUAL RESULTS
Resource [Host:4] is unreachable: Host 4: Unable to start instance due to Unable to start VM:38950734-d29e-49b3-b5f7-75f9b20a4207 due to error in finalizeStart, not retrying
@izenk
Copy link
Author

izenk commented Sep 25, 2018

apache2 service is not started even after VPC restart. Don't now if it is related to described problem...
There only logs related to apache2 is:

root@r-190-VM:~# grep apache /var/log/cloud.log
Tue Sep 25 13:59:41 UTC 2018 Setting up apache web server for VPC
2018-09-25 13:59:52,686  CsHelper.py copy:264 Copied /etc/apache2/vhost.template to /etc/apache2/sites-enabled/vhost-172.16.0.17.conf
2018-09-25 13:59:52,690  CsFile.py commit:66 Wrote edited file /etc/apache2/sites-enabled/vhost-172.16.0.17.conf
2018-09-25 13:59:54,456  CsHelper.py copy:264 Copied /etc/apache2/vhost.template to /etc/apache2/sites-enabled/vhost-172.16.0.1.conf
2018-09-25 13:59:54,461  CsFile.py commit:66 Wrote edited file /etc/apache2/sites-enabled/vhost-172.16.0.1.conf
2018-09-25 13:59:55,773  CsHelper.py copy:264 Copied /etc/apache2/vhost.template to /etc/apache2/sites-enabled/vhost-172.16.0.33.conf
2018-09-25 13:59:55,776  CsFile.py commit:66 Wrote edited file /etc/apache2/sites-enabled/vhost-172.16.0.33.conf
2018-09-25 13:59:57,076  CsHelper.py copy:264 Copied /etc/apache2/vhost.template to /etc/apache2/sites-enabled/vhost-172.16.0.49.conf
2018-09-25 13:59:57,080  CsFile.py commit:66 Wrote edited file /etc/apache2/sites-enabled/vhost-172.16.0.49.conf
2018-09-25 13:59:58,459  CsHelper.py copy:264 Copied /etc/apache2/vhost.template to /etc/apache2/sites-enabled/vhost-172.16.0.65.conf
2018-09-25 13:59:58,463  CsFile.py commit:66 Wrote edited file /etc/apache2/sites-enabled/vhost-172.16.0.65.conf
2018-09-25 13:59:59,926  CsHelper.py copy:264 Copied /etc/apache2/vhost.template to /etc/apache2/sites-enabled/vhost-172.16.0.81.conf
2018-09-25 13:59:59,930  CsFile.py commit:66 Wrote edited file /etc/apache2/sites-enabled/vhost-172.16.0.81.conf

apache2 error log is clean.

How apache2 should be started? By systemd or by some cloud scripts?

@rohityadavcloud rohityadavcloud self-assigned this Oct 2, 2018
@resmo
Copy link
Member

resmo commented Oct 3, 2018

we also hit a similar issue. vmware 5.5.

first error was

 2018-10-01 14:40:18,605 ERROR [network.router.VirtualNetworkApplianceManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) null
2018-10-01 14:40:18,605 WARN  [network.router.VirtualNetworkApplianceManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Command: com.cloud.agent.api.routing.GroupCommand failed while starting virtual router

full log

[root@ma-cloud ~]# grep -r ctx-dae56f41 /var/log/cloudstack/management/management-server.log2018-10-01 14:29:22,735 DEBUG [cloud.vm.VmWorkJobHandlerProxy] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Execute VM work job: com.cloud.vm.VmWorkStart{"dcId":0,"rawParams":{"RestartNetwork":"rO0ABXNyABFqYXZhLmxhbmcuQm9vbGVhbs0gcoDVnPruAgABWgAFdmFsdWV4cAE"},"userId":2,"accountId":2,"vmId":1603,"handler
[root@ma-cloud ~]# grep -r ctx-dae56f41 /var/log/cloudstack/management/management-server.log
2018-10-01 14:29:22,735 DEBUG [cloud.vm.VmWorkJobHandlerProxy] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Execute VM work job: com.cloud.vm.VmWorkStart{"dcId":0,"rawParams":{"RestartNetwork":"rO0ABXNyABFqYXZhLmxhbmcuQm9vbGVhbs0gcoDVnPruAgABWgAFdmFsdWV4cAE"},"userId":2,"accountId":2,"vmId":1603,"handlerName":"VirtualMachineManagerImpl"}
2018-10-01 14:29:22,746 DEBUG [cloud.capacity.CapacityManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) VM state transitted from :Stopped to Starting with event: StartRequestedvm's original host id: 15 new host id: null host id before state transition: null
2018-10-01 14:29:22,746 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Successfully transitioned to start state for VM[DomainRouter|r-1603-VM] reservation id = d29a30cf-d4c8-405f-80c1-e3266242454d
2018-10-01 14:29:22,749 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Deploy avoids pods: null, clusters: null, hosts: null
2018-10-01 14:29:22,752 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Root volume is ready, need to place VM in volume's cluster
2018-10-01 14:29:22,752 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Vol[2022|vm=1603|ROOT] is READY, changing deployment plan to use this pool's dcId: 1 , podId: null , and clusterId: null
2018-10-01 14:29:22,763 DEBUG [cloud.deploy.DeploymentPlanningManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) DeploymentPlanner allocation algorithm: null
2018-10-01 14:29:22,764 DEBUG [cloud.deploy.DeploymentPlanningManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Trying to allocate a host and storage pools from dc:1, pod:null,cluster:null, requested cpu: 1000, requested ram: 536870912
2018-10-01 14:29:22,764 DEBUG [cloud.deploy.DeploymentPlanningManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Is ROOT volume READY (pool already allocated)?: Yes
2018-10-01 14:29:22,764 DEBUG [cloud.deploy.DeploymentPlanningManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Deploy avoids pods: [], clusters: [], hosts: []
2018-10-01 14:29:22,765 DEBUG [cloud.deploy.DeploymentPlanningManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) This VM has last host_id specified, trying to choose the same host: 15
2018-10-01 14:29:22,774 DEBUG [cloud.capacity.CapacityManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Host: 15 has cpu capability (cpu:24, speed:1900) to support requested CPU: 2 and requested speed: 500
2018-10-01 14:29:22,774 DEBUG [cloud.capacity.CapacityManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Checking if host: 15 has enough capacity for requested CPU: 1000 and requested RAM: 536870912 , cpuOverprovisioningFactor: 8000.0
2018-10-01 14:29:22,776 DEBUG [cloud.capacity.CapacityManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Hosts's actual total CPU: 45600 and CPU after applying overprovisioning: 364800000
2018-10-01 14:29:22,776 DEBUG [cloud.capacity.CapacityManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) We need to allocate to the last host again, so checking if there is enough reserved capacity
2018-10-01 14:29:22,776 DEBUG [cloud.capacity.CapacityManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Reserved CPU: 1800 , Requested CPU: 1000
2018-10-01 14:29:22,776 DEBUG [cloud.capacity.CapacityManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Reserved RAM: 2147483648 , Requested RAM: 536870912
2018-10-01 14:29:22,776 DEBUG [cloud.capacity.CapacityManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Host has enough CPU and RAM available
2018-10-01 14:29:22,776 DEBUG [cloud.capacity.CapacityManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) STATS: Can alloc CPU from host: 15, used: 133400, reserved: 1800, actual total: 45600, total with overprovisioning: 364800000; requested cpu:1000,alloc_from_last_host?:true ,considerReservedCapacity?: true
2018-10-01 14:29:22,776 DEBUG [cloud.capacity.CapacityManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) STATS: Can alloc MEM from host: 15, used: 84188069888, reserved: 2147483648, total: 412170485760; requested mem: 536870912,alloc_from_last_host?:true ,considerReservedCapacity?: true
2018-10-01 14:29:22,776 DEBUG [cloud.deploy.DeploymentPlanningManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) The last host of this VM is UP and has enough capacity
2018-10-01 14:29:22,776 DEBUG [cloud.deploy.DeploymentPlanningManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Now checking for suitable pools under zone: 1, pod: 1, cluster: 3
2018-10-01 14:29:22,780 DEBUG [cloud.deploy.DeploymentPlanningManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Checking suitable pools for volume (Id, Type): (2022,ROOT)
2018-10-01 14:29:22,780 DEBUG [cloud.deploy.DeploymentPlanningManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Volume has pool already allocated, checking if pool can be reused, poolId: 10
2018-10-01 14:29:22,784 DEBUG [cloud.deploy.DeploymentPlanningManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Planner need not allocate a pool for this volume since its READY
2018-10-01 14:29:22,784 DEBUG [cloud.deploy.DeploymentPlanningManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Trying to find a potenial host and associated storage pools from the suitable host/pool lists for this VM
2018-10-01 14:29:22,784 DEBUG [cloud.deploy.DeploymentPlanningManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Checking if host: 15 can access any suitable storage pool for volume: ROOT
2018-10-01 14:29:22,785 DEBUG [cloud.deploy.DeploymentPlanningManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Host: 15 can access pool: 10
2018-10-01 14:29:22,786 DEBUG [cloud.deploy.DeploymentPlanningManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Found a potential host id: 15 name: ix-mapod01-esx03.stxt.media.int and associated storage pools for this VM
2018-10-01 14:29:22,786 DEBUG [cloud.deploy.DeploymentPlanningManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Returning Deployment Destination: Dest[Zone(Id)-Pod(Id)-Cluster(Id)-Host(Id)-Storage(Volume(Id|Type-->Pool(Id))] : Dest[Zone(1)-Pod(1)-Cluster(3)-Host(15)-Storage()]
2018-10-01 14:29:22,786 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Deployment found  - P0=VM[DomainRouter|r-1603-VM], P0=Dest[Zone(Id)-Pod(Id)-Cluster(Id)-Host(Id)-Storage(Volume(Id|Type-->Pool(Id))] : Dest[Zone(1)-Pod(1)-Cluster(3)-Host(15)-Storage()]
2018-10-01 14:29:22,854 DEBUG [cloud.capacity.CapacityManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) VM state transitted from :Starting to Starting with event: OperationRetryvm's original host id: 15 new host id: 15 host id before state transition: null
2018-10-01 14:29:22,854 DEBUG [cloud.capacity.CapacityManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) VM starting again on the last host it was stopped on
2018-10-01 14:29:22,862 DEBUG [cloud.capacity.CapacityManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Hosts's actual total CPU: 45600 and CPU after applying overprovisioning: 364800000
2018-10-01 14:29:22,862 DEBUG [cloud.capacity.CapacityManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) We are allocating VM, increasing the used capacity of this host:15
2018-10-01 14:29:22,862 DEBUG [cloud.capacity.CapacityManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Current Used CPU: 133400 , Free CPU:364664800 ,Requested CPU: 1000
2018-10-01 14:29:22,862 DEBUG [cloud.capacity.CapacityManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Current Used RAM: 84188069888 , Free RAM:325834932224 ,Requested RAM: 536870912
2018-10-01 14:29:22,862 DEBUG [cloud.capacity.CapacityManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) We are allocating VM to the last host again, so adjusting the reserved capacity if it is not less than required
2018-10-01 14:29:22,862 DEBUG [cloud.capacity.CapacityManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Reserved CPU: 1800 , Requested CPU: 1000
2018-10-01 14:29:22,862 DEBUG [cloud.capacity.CapacityManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Reserved RAM: 2147483648 , Requested RAM: 536870912
2018-10-01 14:29:22,862 DEBUG [cloud.capacity.CapacityManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) CPU STATS after allocation: for host: 15, old used: 133400, old reserved: 1800, actual total: 45600, total with overprovisioning: 364800000; new used:134400, reserved:800; requested cpu:1000,alloc_from_last:true
2018-10-01 14:29:22,862 DEBUG [cloud.capacity.CapacityManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) RAM STATS after allocation: for host: 15, old used: 84188069888, old reserved: 2147483648, total: 412170485760; new used: 84724940800, reserved: 1610612736; requested mem: 536870912,alloc_from_last:true
2018-10-01 14:29:22,916 DEBUG [dc.dao.DataCenterIpAddressDaoImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Releasing ip address for instance=2233
2018-10-01 14:29:22,925 DEBUG [network.guru.PodBasedNetworkGuru] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Allocated a nic NicProfile[2233-1603-d29a30cf-d4c8-405f-80c1-e3266242454d-10.100.10.194-null for VM[DomainRouter|r-1603-VM]
2018-10-01 14:29:22,959 DEBUG [cloud.network.NetworkModelImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Service SecurityGroup is not supported in the network id=386
2018-10-01 14:29:22,971 DEBUG [engine.orchestration.NetworkOrchestrator] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Asking VpcVirtualRouter to prepare for Nic[2235-1603-d29a30cf-d4c8-405f-80c1-e3266242454d-10.23.1.1]
2018-10-01 14:29:22,973 DEBUG [cloud.network.NetworkModelImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Service SecurityGroup is not supported in the network id=386
2018-10-01 14:29:22,979 DEBUG [engine.orchestration.VolumeOrchestrator] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) No need to recreate the volume: Vol[2022|vm=1603|ROOT], since it already has a pool assigned: 10, adding disk to VM
2018-10-01 14:29:22,998 DEBUG [network.router.VpcVirtualNetworkApplianceManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Removing nic NicProfile[2234-1603-null-185.27.109.94-vlan://96 of type Public from the nics passed on vm start. The nic will be plugged later
2018-10-01 14:29:22,998 DEBUG [network.router.VpcVirtualNetworkApplianceManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Removing nic NicProfile[2235-1603-d29a30cf-d4c8-405f-80c1-e3266242454d-10.23.1.1-vlan://1994 of type Guest from the nics passed on vm start. The nic will be plugged later
2018-10-01 14:29:23,002 INFO  [network.router.VirtualNetworkApplianceManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Check if we need to add management server explicit route to DomR. pod cidr: 10.100.10.0/24, pod gateway: 10.100.10.1, management host: 10.100.10.100
2018-10-01 14:29:23,002 INFO  [network.router.VirtualNetworkApplianceManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Add management server explicit route to DomR.
2018-10-01 14:29:23,004 DEBUG [network.router.VirtualNetworkApplianceManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Boot Args for VM[DomainRouter|r-1603-VM]:  vpccidr=10.23.0.0/16 domain=ma-cloud.local dns1=193.218.103.253 dns2=193.218.104.190 privategateway=None template=domP name=r-1603-VM eth0ip=10.100.10.194 eth0mask=255.255.255.0 mgmtcidr=10.100.10.0/24 localgw=10.100.10.1 type=vpcrouter disable_rp_filter=true extra_pubnics=1
2018-10-01 14:29:23,076 DEBUG [cloud.network.NetworkModelImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Service SecurityGroup is not supported in the network id=386
2018-10-01 14:29:23,087 DEBUG [cloud.network.NetworkModelImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Service SecurityGroup is not supported in the network id=386
2018-10-01 14:29:23,097 DEBUG [cloud.network.NetworkModelImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Service SecurityGroup is not supported in the network id=386
2018-10-01 14:29:23,101 DEBUG [network.router.VpcVirtualNetworkApplianceManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Found 0 static routes to apply as a part of vpc route VM[DomainRouter|r-1603-VM] start
2018-10-01 14:29:23,107 DEBUG [network.router.VpcVirtualNetworkApplianceManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Found 1 ip(s) to apply as a part of domR VM[DomainRouter|r-1603-VM] start.
2018-10-01 14:29:23,119 DEBUG [network.router.VirtualNetworkApplianceManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Resending ipAssoc, port forwarding, load balancing rules as a part of Virtual router start
2018-10-01 14:29:23,124 DEBUG [network.router.VirtualNetworkApplianceManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Found 0 firewall Egress rule(s) to apply as a part of domR VM[DomainRouter|r-1603-VM] start.
2018-10-01 14:29:23,128 DEBUG [network.router.VirtualNetworkApplianceManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Found 0 static nat(s) to apply as a part of domR VM[DomainRouter|r-1603-VM] start.
2018-10-01 14:29:23,128 DEBUG [network.router.VirtualNetworkApplianceManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Found 0 firewall Ingress rule(s) to apply as a part of domR VM[DomainRouter|r-1603-VM] start.
2018-10-01 14:29:23,128 DEBUG [network.router.VirtualNetworkApplianceManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Found 1 port forwarding rule(s) to apply as a part of domR VM[DomainRouter|r-1603-VM] start.
2018-10-01 14:29:23,131 DEBUG [network.router.VirtualNetworkApplianceManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Found 0 static nat rule(s) to apply as a part of domR VM[DomainRouter|r-1603-VM] start.
2018-10-01 14:29:23,131 DEBUG [network.router.VirtualNetworkApplianceManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Found 0 vpn(s) to apply as a part of domR VM[DomainRouter|r-1603-VM] start.
2018-10-01 14:29:23,132 DEBUG [network.router.VirtualNetworkApplianceManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Found 0 load balancing rule(s) to apply as a part of domR VM[DomainRouter|r-1603-VM] start.
2018-10-01 14:29:23,135 DEBUG [network.router.VpcVirtualNetworkApplianceManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Found 3 network ACLs to apply as a part of VPC VR VM[DomainRouter|r-1603-VM] start for guest network id=386
2018-10-01 14:29:23,143 DEBUG [cloud.network.NetworkModelImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Service SecurityGroup is not supported in the network id=386
2018-10-01 14:29:23,148 DEBUG [network.router.VirtualNetworkApplianceManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Reapplying dhcp entries as a part of domR VM[DomainRouter|r-1603-VM] start...
2018-10-01 14:29:23,152 DEBUG [network.router.CommandSetupHelper] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Creating dhcp entry for vm VM[User|i-321-1585-VM] on domR VM[DomainRouter|r-1603-VM].
2018-10-01 14:29:23,156 DEBUG [network.router.VirtualNetworkApplianceManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Reapplying vm data (userData and metaData) entries as a part of domR VM[DomainRouter|r-1603-VM] start...
2018-10-01 14:29:23,159 DEBUG [network.router.CommandSetupHelper] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Creating user data entry for vm VM[User|i-321-1585-VM] on domR VM[DomainRouter|r-1603-VM]
2018-10-01 14:29:23,172 DEBUG [agent.transport.Request] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Seq 15-8691947280825083095: Sending  { Cmd , MgmtId: 345050582394, via: 15(ix-mapod01-esx03.stxt.media.int), Ver: v1, Flags: 100111, [{"com.cloud.agent.api.StartCommand":{"vm":{"id":1603,"name":"r-1603-VM","bootloader":"HVM","type":"DomainRouter","cpus":2,"minSpeed":0,"maxSpeed":500,"minRam":268435456,"maxRam":536870912,"hostName":"r-1603-VM","arch":"x86_64","os":"Debian GNU/Linux 7(64-bit)","platformEmulator":"debian7_64Guest","bootArgs":" vpccidr=10.23.0.0/16 domain=ma-cloud.local dns1=193.218.103.253 dns2=193.218.104.190 privategateway=None template=domP name=r-1603-VM eth0ip=10.100.10.194 eth0mask=255.255.255.0 mgmtcidr=10.100.10.0/24 localgw=10.100.10.1 type=vpcrouter disable_rp_filter=true extra_pubnics=1 nic_macs=02:00:59:6a:13:2b","enableHA":true,"limitCpuUse":false,"enableDynamicallyScaleVm":false,"vncPassword":"JVjfXTAacEF0Vbi5ac_71A","params":{"cpuOvercommitRatio":"8000.0","vmware.reserve.mem":"false","vmware.reserve.cpu":"false","memoryOvercommitRatio":"2.0","nicAdapter":"E1000"},"uuid":"19ab7474-3185-4014-9943-6a7df2e083be","disks":[{"data":{"org.apache.cloudstack.storage.to.VolumeObjectTO":{"uuid":"8fe5a473-f5f5-445f-9f22-9486d932c02f","volumeType":"ROOT","dataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"12fa9615-b85d-3ede-ae6d-562ec6b67227","id":10,"poolType":"VMFS","host":"VMFS datastore: /MA_CLOUD_ZUERICH/Perf01-ix-stor-003","path":"/MA_CLOUD_ZUERICH/Perf01-ix-stor-003","port":0,"url":"VMFS://VMFS datastore: /MA_CLOUD_ZUERICH/Perf01-ix-stor-003/MA_CLOUD_ZUERICH/Perf01-ix-stor-003/?ROLE=Primary&STOREUUID=12fa9615-b85d-3ede-ae6d-562ec6b67227","isManaged":false}},"name":"ROOT-1603","size":1887436800,"path":"ROOT-1603","volumeId":2022,"vmName":"r-1603-VM","accountId":321,"chainInfo":"{\"diskDeviceBusName\":\"scsi0:0\",\"diskChain\":[\"[Perf01-ix-stor-003] r-1603-VM/ROOT-1603.vmdk\"]}","format":"OVA","provisioningType":"THIN","id":2022,"deviceId":0,"bytesReadRate":0,"bytesWriteRate":0,"iopsReadRate":0,"iopsWriteRate":0,"cacheMode":"NONE","hypervisorType":"VMware"}},"diskSeq":0,"path":"ROOT-1603","type":"ROOT","_details":{"storageHost":"VMFS datastore: /MA_CLOUD_ZUERICH/Perf01-ix-stor-003","managed":"false","storagePort":"0","volumeSize":"1887436800"}}],"nics":[{"deviceId":0,"networkRateMbps":-1,"defaultNic":false,"pxeDisable":true,"nicUuid":"5a7ce139-1e3c-4e4b-9a9e-ad4cb45dd3a8","details":{"MacAddressChanges":"true","PromiscuousMode":"false","ForgedTransmits":"true"},"uuid":"ee622fdf-299d-4857-ab16-b562378544d3","ip":"10.100.10.194","netmask":"255.255.255.0","gateway":"10.100.10.1","mac":"02:00:59:6a:13:2b","broadcastType":"Native","type":"Control","isSecurityGroupEnabled":false,"name":"vSwitch0"}],"guestOsDetails":{}},"hostIp":"10.100.10.13","executeInSequence":true,"wait":0}},{"com.cloud.agent.api.check.CheckSshCommand":{"ip":"10.100.10.194","port":3922,"interval":6,"retries":100,"name":"r-1603-VM","wait":0}},{"com.cloud.agent.api.GetDomRVersionCmd":{"accessDetails":{"router.name":"r-1603-VM","router.ip":"10.100.10.194"},"wait":0}},{"com.cloud.agent.api.PlugNicCommand":{"nic":{"deviceId":1,"networkRateMbps":10000,"defaultNic":true,"pxeDisable":true,"nicUuid":"74995950-ed9b-47f8-b65b-938e1bebb88f","uuid":"b7d7d43a-2b8e-4149-bff5-818a484f0010","ip":"185.27.109.94","netmask":"255.255.255.0","gateway":"185.27.109.1","mac":"1e:00:8e:00:02:25","broadcastType":"Vlan","type":"Public","broadcastUri":"vlan://96","isolationUri":"vlan://96","isSecurityGroupEnabled":false,"name":"vSwitch3,,vmwaresvs"},"instanceName":"r-1603-VM","vmType":"DomainRouter","details":{"cpuOvercommitRatio":"8000.0","vmware.reserve.mem":"false","vmware.reserve.cpu":"false","memoryOvercommitRatio":"2.0","nicAdapter":"E1000"},"wait":0}},{"com.cloud.agent.api.routing.IpAssocVpcCommand":{"ipAddresses":[{"accountId":321,"publicIp":"185.27.109.94","sourceNat":true,"add":true,"oneToOneNat":false,"firstIP":true,"broadcastUri":"vlan://96","vlanGateway":"185.27.109.1","vlanNetmask":"255.255.255.0","vifMacAddress":"1e:00:8e:00:02:25","networkRate":10000,"trafficType":"Public","networkName":"vSwitch3,,vmwaresvs","newNic":false}],"accessDetails":{"router.name":"r-1603-VM","router.guest.ip":"185.27.109.94","router.ip":"10.100.10.194","zone.network.type":"Advanced"},"wait":0}},{"com.cloud.agent.api.routing.SetSourceNatCommand":{"ipAddress":{"accountId":321,"publicIp":"185.27.109.94","sourceNat":true,"add":true,"oneToOneNat":false,"firstIP":true,"broadcastUri":"vlan://96","vlanGateway":"185.27.109.1","vlanNetmask":"255.255.255.0","vifMacAddress":"1e:00:8e:00:02:25","networkRate":10000,"trafficType":"Public","networkName":"vSwitch3,,vmwaresvs","newNic":false},"add":true,"accessDetails":{"router.name":"r-1603-VM","router.ip":"10.100.10.194","zone.network.type":"Advanced"},"wait":0}},{"com.cloud.agent.api.PlugNicCommand":{"nic":{"deviceId":2,"networkRateMbps":10000,"defaultNic":false,"pxeDisable":true,"nicUuid":"0ab4e974-413f-4286-9226-079456ea9c53","uuid":"f0aed162-a496-47c2-853f-077f3ea3c345","ip":"10.23.1.1","netmask":"255.255.255.0","mac":"02:00:4d:44:00:05","broadcastType":"Vlan","type":"Guest","broadcastUri":"vlan://1994","isolationUri":"vlan://1994","isSecurityGroupEnabled":false,"name":"vSwitch1,,vmwaresvs"},"instanceName":"r-1603-VM","vmType":"DomainRouter","details":{"cpuOvercommitRatio":"8000.0","vmware.reserve.mem":"false","vmware.reserve.cpu":"false","memoryOvercommitRatio":"2.0","nicAdapter":"E1000"},"wait":0}},{"com.cloud.agent.api.SetupGuestNetworkCommand":{"dhcpRange":"10.23.1.1","networkDomain":"ma-cloud.local","isRedundant":false,"add":true,"nic":{"deviceId":2,"networkRateMbps":10000,"defaultNic":false,"pxeDisable":true,"nicUuid":"0ab4e974-413f-4286-9226-079456ea9c53","uuid":"f0aed162-a496-47c2-853f-077f3ea3c345","ip":"10.23.1.1","netmask":"255.255.255.0","mac":"02:00:4d:44:00:05","broadcastType":"Vlan","type":"Guest","broadcastUri":"vlan://1994","isolationUri":"vlan://1994","isSecurityGroupEnabled":false,"name":"vSwitch1,,vmwaresvs"},"accessDetails":{"router.name":"r-1603-VM","guest.network.gateway":"10.23.1.1","router.guest.ip":"10.23.1.1","router.ip":"10.100.10.194","guest.bridge":"10.23.1.255","guest.vlan.tag":"1994"},"wait":0}},{"com.cloud.agent.api.routing.AggregationControlCommand":{"action":"Start","accessDetails":{"router.name":"r-1603-VM","router.ip":"10.100.10.194","router.guest.ip":"10.23.1.1"},"wait":0}},{"com.cloud.agent.api.routing.IpAssocVpcCommand":{"ipAddresses":[{"accountId":321,"publicIp":"185.27.109.95","sourceNat":true,"add":true,"oneToOneNat":false,"firstIP":true,"broadcastUri":"vlan://96","vlanGateway":"185.27.109.1","vlanNetmask":"255.255.255.0","vifMacAddress":"1e:00:8e:00:02:25","networkRate":10000,"trafficType":"Public","networkName":"vSwitch3,,vmwaresvs","newNic":false}],"accessDetails":{"router.name":"r-1603-VM","router.guest.ip":"185.27.109.94","router.ip":"10.100.10.194","zone.network.type":"Advanced"},"wait":0}},{"com.cloud.agent.api.routing.SetPortForwardingRulesVpcCommand":{"rules":[{"dstIp":"10.23.1.111","dstPortRange":[22,22],"id":2538,"srcIp":"185.27.109.95","protocol":"tcp","srcPortRange":[22,22],"revoked":false,"alreadyAdded":true,"purpose":"PortForwarding","defaultEgressPolicy":false}],"accessDetails":{"router.name":"r-1603-VM","router.guest.ip":"10.23.1.1","router.ip":"10.100.10.194","zone.network.type":"Advanced"},"wait":0}},{"com.cloud.agent.api.routing.SetNetworkACLCommand":{"rules":[{"id":0,"vlanTag":"1994","protocol":"tcp","portRange":[443,443],"revoked":false,"alreadyAdded":true,"cidrList":["0.0.0.0/0"],"trafficType":"Egress","action":"ACCEPT","number":1},{"id":0,"vlanTag":"1994","protocol":"tcp","portRange":[22,22],"revoked":false,"alreadyAdded":true,"cidrList":["0.0.0.0/0"],"trafficType":"Ingress","action":"ACCEPT","number":102},{"id":0,"vlanTag":"1994","protocol":"all","revoked":false,"alreadyAdded":true,"cidrList":["0.0.0.0/0"],"trafficType":"Egress","action":"ACCEPT","number":9999}],"nic":{"deviceId":2,"networkRateMbps":10000,"defaultNic":false,"pxeDisable":true,"nicUuid":"0ab4e974-413f-4286-9226-079456ea9c53","uuid":"f0aed162-a496-47c2-853f-077f3ea3c345","ip":"10.23.1.1","netmask":"255.255.255.0","mac":"02:00:4d:44:00:05","broadcastType":"Vlan","type":"Guest","broadcastUri":"vlan://1994","isolationUri":"vlan://1994","isSecurityGroupEnabled":false,"name":"vSwitch1,,vmwaresvs"},"accessDetails":{"router.name":"r-1603-VM","router.guest.ip":"10.23.1.1","router.ip":"10.100.10.194","zone.network.type":"Advanced","guest.vlan.tag":"1994"},"wait":0}},{"com.cloud.agent.api.routing.DhcpEntryCommand":{"vmMac":"02:00:18:d1:00:01","vmIpAddress":"10.23.1.111","vmName":"vm-1","defaultRouter":"10.23.1.1","duid":"00:03:00:01:02:00:18:d1:00:01","isDefault":true,"executeInSequence":true,"accessDetails":{"router.name":"r-1603-VM","router.guest.ip":"10.23.1.1","router.ip":"10.100.10.194","zone.network.type":"Advanced"},"wait":0}},{"com.cloud.agent.api.routing.VmDataCommand":{"vmIpAddress":"10.23.1.111","vmName":"vm-1","executeInSequence":true,"accessDetails":{"router.name":"r-1603-VM","router.guest.ip":"10.23.1.1","router.ip":"10.100.10.194","zone.network.type":"Advanced"},"wait":0}},{"com.cloud.agent.api.routing.AggregationControlCommand":{"action":"Finish","accessDetails":{"router.name":"r-1603-VM","router.ip":"10.100.10.194","router.guest.ip":"10.23.1.1"},"wait":0}},{}] }
2018-10-01 14:29:23,173 DEBUG [agent.transport.Request] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Seq 15-8691947280825083095: Executing:  { Cmd , MgmtId: 345050582394, via: 15(ix-mapod01-esx03.stxt.media.int), Ver: v1, Flags: 100111, [{"com.cloud.agent.api.StartCommand":{"vm":{"id":1603,"name":"r-1603-VM","bootloader":"HVM","type":"DomainRouter","cpus":2,"minSpeed":0,"maxSpeed":500,"minRam":268435456,"maxRam":536870912,"hostName":"r-1603-VM","arch":"x86_64","os":"Debian GNU/Linux 7(64-bit)","platformEmulator":"debian7_64Guest","bootArgs":" vpccidr=10.23.0.0/16 domain=ma-cloud.local dns1=193.218.103.253 dns2=193.218.104.190 privategateway=None template=domP name=r-1603-VM eth0ip=10.100.10.194 eth0mask=255.255.255.0 mgmtcidr=10.100.10.0/24 localgw=10.100.10.1 type=vpcrouter disable_rp_filter=true extra_pubnics=1 nic_macs=02:00:59:6a:13:2b","enableHA":true,"limitCpuUse":false,"enableDynamicallyScaleVm":false,"vncPassword":"JVjfXTAacEF0Vbi5ac_71A","params":{"cpuOvercommitRatio":"8000.0","vmware.reserve.mem":"false","vmware.reserve.cpu":"false","memoryOvercommitRatio":"2.0","nicAdapter":"E1000"},"uuid":"19ab7474-3185-4014-9943-6a7df2e083be","disks":[{"data":{"org.apache.cloudstack.storage.to.VolumeObjectTO":{"uuid":"8fe5a473-f5f5-445f-9f22-9486d932c02f","volumeType":"ROOT","dataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"12fa9615-b85d-3ede-ae6d-562ec6b67227","id":10,"poolType":"VMFS","host":"VMFS datastore: /MA_CLOUD_ZUERICH/Perf01-ix-stor-003","path":"/MA_CLOUD_ZUERICH/Perf01-ix-stor-003","port":0,"url":"VMFS://VMFS datastore: /MA_CLOUD_ZUERICH/Perf01-ix-stor-003/MA_CLOUD_ZUERICH/Perf01-ix-stor-003/?ROLE=Primary&STOREUUID=12fa9615-b85d-3ede-ae6d-562ec6b67227","isManaged":false}},"name":"ROOT-1603","size":1887436800,"path":"ROOT-1603","volumeId":2022,"vmName":"r-1603-VM","accountId":321,"chainInfo":"{\"diskDeviceBusName\":\"scsi0:0\",\"diskChain\":[\"[Perf01-ix-stor-003] r-1603-VM/ROOT-1603.vmdk\"]}","format":"OVA","provisioningType":"THIN","id":2022,"deviceId":0,"bytesReadRate":0,"bytesWriteRate":0,"iopsReadRate":0,"iopsWriteRate":0,"cacheMode":"NONE","hypervisorType":"VMware"}},"diskSeq":0,"path":"ROOT-1603","type":"ROOT","_details":{"storageHost":"VMFS datastore: /MA_CLOUD_ZUERICH/Perf01-ix-stor-003","managed":"false","storagePort":"0","volumeSize":"1887436800"}}],"nics":[{"deviceId":0,"networkRateMbps":-1,"defaultNic":false,"pxeDisable":true,"nicUuid":"5a7ce139-1e3c-4e4b-9a9e-ad4cb45dd3a8","details":{"MacAddressChanges":"true","PromiscuousMode":"false","ForgedTransmits":"true"},"uuid":"ee622fdf-299d-4857-ab16-b562378544d3","ip":"10.100.10.194","netmask":"255.255.255.0","gateway":"10.100.10.1","mac":"02:00:59:6a:13:2b","broadcastType":"Native","type":"Control","isSecurityGroupEnabled":false,"name":"vSwitch0"}],"guestOsDetails":{}},"hostIp":"10.100.10.13","executeInSequence":true,"wait":0}},{"com.cloud.agent.api.check.CheckSshCommand":{"ip":"10.100.10.194","port":3922,"interval":6,"retries":100,"name":"r-1603-VM","wait":0}},{"com.cloud.agent.api.GetDomRVersionCmd":{"accessDetails":{"router.name":"r-1603-VM","router.ip":"10.100.10.194"},"wait":0}},{"com.cloud.agent.api.PlugNicCommand":{"nic":{"deviceId":1,"networkRateMbps":10000,"defaultNic":true,"pxeDisable":true,"nicUuid":"74995950-ed9b-47f8-b65b-938e1bebb88f","uuid":"b7d7d43a-2b8e-4149-bff5-818a484f0010","ip":"185.27.109.94","netmask":"255.255.255.0","gateway":"185.27.109.1","mac":"1e:00:8e:00:02:25","broadcastType":"Vlan","type":"Public","broadcastUri":"vlan://96","isolationUri":"vlan://96","isSecurityGroupEnabled":false,"name":"vSwitch3,,vmwaresvs"},"instanceName":"r-1603-VM","vmType":"DomainRouter","details":{"cpuOvercommitRatio":"8000.0","vmware.reserve.mem":"false","vmware.reserve.cpu":"false","memoryOvercommitRatio":"2.0","nicAdapter":"E1000"},"wait":0}},{"com.cloud.agent.api.routing.IpAssocVpcCommand":{"ipAddresses":[{"accountId":321,"publicIp":"185.27.109.94","sourceNat":true,"add":true,"oneToOneNat":false,"firstIP":true,"broadcastUri":"vlan://96","vlanGateway":"185.27.109.1","vlanNetmask":"255.255.255.0","vifMacAddress":"1e:00:8e:00:02:25","networkRate":10000,"trafficType":"Public","networkName":"vSwitch3,,vmwaresvs","newNic":false}],"accessDetails":{"router.name":"r-1603-VM","router.guest.ip":"185.27.109.94","router.ip":"10.100.10.194","zone.network.type":"Advanced"},"wait":0}},{"com.cloud.agent.api.routing.SetSourceNatCommand":{"ipAddress":{"accountId":321,"publicIp":"185.27.109.94","sourceNat":true,"add":true,"oneToOneNat":false,"firstIP":true,"broadcastUri":"vlan://96","vlanGateway":"185.27.109.1","vlanNetmask":"255.255.255.0","vifMacAddress":"1e:00:8e:00:02:25","networkRate":10000,"trafficType":"Public","networkName":"vSwitch3,,vmwaresvs","newNic":false},"add":true,"accessDetails":{"router.name":"r-1603-VM","router.ip":"10.100.10.194","zone.network.type":"Advanced"},"wait":0}},{"com.cloud.agent.api.PlugNicCommand":{"nic":{"deviceId":2,"networkRateMbps":10000,"defaultNic":false,"pxeDisable":true,"nicUuid":"0ab4e974-413f-4286-9226-079456ea9c53","uuid":"f0aed162-a496-47c2-853f-077f3ea3c345","ip":"10.23.1.1","netmask":"255.255.255.0","mac":"02:00:4d:44:00:05","broadcastType":"Vlan","type":"Guest","broadcastUri":"vlan://1994","isolationUri":"vlan://1994","isSecurityGroupEnabled":false,"name":"vSwitch1,,vmwaresvs"},"instanceName":"r-1603-VM","vmType":"DomainRouter","details":{"cpuOvercommitRatio":"8000.0","vmware.reserve.mem":"false","vmware.reserve.cpu":"false","memoryOvercommitRatio":"2.0","nicAdapter":"E1000"},"wait":0}},{"com.cloud.agent.api.SetupGuestNetworkCommand":{"dhcpRange":"10.23.1.1","networkDomain":"ma-cloud.local","isRedundant":false,"add":true,"nic":{"deviceId":2,"networkRateMbps":10000,"defaultNic":false,"pxeDisable":true,"nicUuid":"0ab4e974-413f-4286-9226-079456ea9c53","uuid":"f0aed162-a496-47c2-853f-077f3ea3c345","ip":"10.23.1.1","netmask":"255.255.255.0","mac":"02:00:4d:44:00:05","broadcastType":"Vlan","type":"Guest","broadcastUri":"vlan://1994","isolationUri":"vlan://1994","isSecurityGroupEnabled":false,"name":"vSwitch1,,vmwaresvs"},"accessDetails":{"router.name":"r-1603-VM","guest.network.gateway":"10.23.1.1","router.guest.ip":"10.23.1.1","router.ip":"10.100.10.194","guest.bridge":"10.23.1.255","guest.vlan.tag":"1994"},"wait":0}},{"com.cloud.agent.api.routing.AggregationControlCommand":{"action":"Start","accessDetails":{"router.name":"r-1603-VM","router.ip":"10.100.10.194","router.guest.ip":"10.23.1.1"},"wait":0}},{"com.cloud.agent.api.routing.IpAssocVpcCommand":{"ipAddresses":[{"accountId":321,"publicIp":"185.27.109.95","sourceNat":true,"add":true,"oneToOneNat":false,"firstIP":true,"broadcastUri":"vlan://96","vlanGateway":"185.27.109.1","vlanNetmask":"255.255.255.0","vifMacAddress":"1e:00:8e:00:02:25","networkRate":10000,"trafficType":"Public","networkName":"vSwitch3,,vmwaresvs","newNic":false}],"accessDetails":{"router.name":"r-1603-VM","router.guest.ip":"185.27.109.94","router.ip":"10.100.10.194","zone.network.type":"Advanced"},"wait":0}},{"com.cloud.agent.api.routing.SetPortForwardingRulesVpcCommand":{"rules":[{"dstIp":"10.23.1.111","dstPortRange":[22,22],"id":2538,"srcIp":"185.27.109.95","protocol":"tcp","srcPortRange":[22,22],"revoked":false,"alreadyAdded":true,"purpose":"PortForwarding","defaultEgressPolicy":false}],"accessDetails":{"router.name":"r-1603-VM","router.guest.ip":"10.23.1.1","router.ip":"10.100.10.194","zone.network.type":"Advanced"},"wait":0}},{"com.cloud.agent.api.routing.SetNetworkACLCommand":{"rules":[{"id":0,"vlanTag":"1994","protocol":"tcp","portRange":[443,443],"revoked":false,"alreadyAdded":true,"cidrList":["0.0.0.0/0"],"trafficType":"Egress","action":"ACCEPT","number":1},{"id":0,"vlanTag":"1994","protocol":"tcp","portRange":[22,22],"revoked":false,"alreadyAdded":true,"cidrList":["0.0.0.0/0"],"trafficType":"Ingress","action":"ACCEPT","number":102},{"id":0,"vlanTag":"1994","protocol":"all","revoked":false,"alreadyAdded":true,"cidrList":["0.0.0.0/0"],"trafficType":"Egress","action":"ACCEPT","number":9999}],"nic":{"deviceId":2,"networkRateMbps":10000,"defaultNic":false,"pxeDisable":true,"nicUuid":"0ab4e974-413f-4286-9226-079456ea9c53","uuid":"f0aed162-a496-47c2-853f-077f3ea3c345","ip":"10.23.1.1","netmask":"255.255.255.0","mac":"02:00:4d:44:00:05","broadcastType":"Vlan","type":"Guest","broadcastUri":"vlan://1994","isolationUri":"vlan://1994","isSecurityGroupEnabled":false,"name":"vSwitch1,,vmwaresvs"},"accessDetails":{"router.name":"r-1603-VM","router.guest.ip":"10.23.1.1","router.ip":"10.100.10.194","zone.network.type":"Advanced","guest.vlan.tag":"1994"},"wait":0}},{"com.cloud.agent.api.routing.DhcpEntryCommand":{"vmMac":"02:00:18:d1:00:01","vmIpAddress":"10.23.1.111","vmName":"vm-1","defaultRouter":"10.23.1.1","duid":"00:03:00:01:02:00:18:d1:00:01","isDefault":true,"executeInSequence":true,"accessDetails":{"router.name":"r-1603-VM","router.guest.ip":"10.23.1.1","router.ip":"10.100.10.194","zone.network.type":"Advanced"},"wait":0}},{"com.cloud.agent.api.routing.VmDataCommand":{"vmIpAddress":"10.23.1.111","vmName":"vm-1","executeInSequence":true,"accessDetails":{"router.name":"r-1603-VM","router.guest.ip":"10.23.1.1","router.ip":"10.100.10.194","zone.network.type":"Advanced"},"wait":0}},{"com.cloud.agent.api.routing.AggregationControlCommand":{"action":"Finish","accessDetails":{"router.name":"r-1603-VM","router.ip":"10.100.10.194","router.guest.ip":"10.23.1.1"},"wait":0}},{}] }
2018-10-01 14:40:18,591 DEBUG [agent.transport.Request] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Seq 15-8691947280825083095: Received:  { Ans: , MgmtId: 345050582394, via: 15(ix-mapod01-esx03.stxt.media.int), Ver: v1, Flags: 110, { StartAnswer, CheckSshAnswer, GetDomRVersionAnswer, PlugNicAnswer, GroupAnswer } }
2018-10-01 14:40:18,599 WARN  [apache.cloudstack.alerts] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) AlertType:: 9 | dataCenterId:: 1 | podId:: 1 | clusterId:: null | message:: Command: com.cloud.agent.api.routing.GroupCommand failed while starting virtual router
2018-10-01 14:40:18,605 ERROR [network.router.VirtualNetworkApplianceManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) null
2018-10-01 14:40:18,605 WARN  [network.router.VirtualNetworkApplianceManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Command: com.cloud.agent.api.routing.GroupCommand failed while starting virtual router
2018-10-01 14:40:18,605 INFO  [cloud.vm.VirtualMachineManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) The guru did not like the answers so stopping VM[DomainRouter|r-1603-VM]
2018-10-01 14:40:18,609 DEBUG [agent.transport.Request] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Seq 15-8691947280825083166: Sending  { Cmd , MgmtId: 345050582394, via: 15(ix-mapod01-esx03.stxt.media.int), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"checkBeforeCleanup":false,"controlIp":"10.100.10.137","forceStop":false,"volumesToDisconnect":[],"vmName":"r-1603-VM","executeInSequence":true,"wait":0}}] }
2018-10-01 14:40:18,609 DEBUG [agent.transport.Request] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Seq 15-8691947280825083166: Executing:  { Cmd , MgmtId: 345050582394, via: 15(ix-mapod01-esx03.stxt.media.int), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"checkBeforeCleanup":false,"controlIp":"10.100.10.137","forceStop":false,"volumesToDisconnect":[],"vmName":"r-1603-VM","executeInSequence":true,"wait":0}}] }
2018-10-01 14:40:34,038 DEBUG [agent.transport.Request] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Seq 15-8691947280825083166: Received:  { Ans: , MgmtId: 345050582394, via: 15(ix-mapod01-esx03.stxt.media.int), Ver: v1, Flags: 10, { StopAnswer } }
2018-10-01 14:40:34,038 DEBUG [agent.manager.AgentManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Details from executing class com.cloud.agent.api.StopCommand: Stop VM r-1603-VM Succeed
2018-10-01 14:40:34,038 ERROR [cloud.vm.VirtualMachineManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Failed to start instance VM[DomainRouter|r-1603-VM]
2018-10-01 14:40:34,044 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Cleaning up resources for the vm VM[DomainRouter|r-1603-VM] in Starting state
2018-10-01 14:40:34,050 DEBUG [agent.transport.Request] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Seq 15-8691947280825083168: Sending  { Cmd , MgmtId: 345050582394, via: 15(ix-mapod01-esx03.stxt.media.int), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"checkBeforeCleanup":false,"controlIp":"10.100.10.137","forceStop":false,"volumesToDisconnect":[],"vmName":"r-1603-VM","executeInSequence":true,"wait":0}}] }
2018-10-01 14:40:34,050 DEBUG [agent.transport.Request] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Seq 15-8691947280825083168: Executing:  { Cmd , MgmtId: 345050582394, via: 15(ix-mapod01-esx03.stxt.media.int), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"checkBeforeCleanup":false,"controlIp":"10.100.10.137","forceStop":false,"volumesToDisconnect":[],"vmName":"r-1603-VM","executeInSequence":true,"wait":0}}] }
2018-10-01 14:40:34,131 DEBUG [agent.transport.Request] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Seq 15-8691947280825083168: Received:  { Ans: , MgmtId: 345050582394, via: 15(ix-mapod01-esx03.stxt.media.int), Ver: v1, Flags: 10, { StopAnswer } }
2018-10-01 14:40:34,136 DEBUG [network.router.VirtualNetworkApplianceManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Successfully updated user statistics as a part of domR VM[DomainRouter|r-1603-VM] reboot/stop
2018-10-01 14:40:34,155 DEBUG [dc.dao.DataCenterIpAddressDaoImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Releasing ip address for reservationId=d29a30cf-d4c8-405f-80c1-e3266242454d, instance=2233
2018-10-01 14:40:34,155 DEBUG [network.guru.PodBasedNetworkGuru] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Released nic: NicProfile[2233-1603-null-null-null
2018-10-01 14:40:34,155 DEBUG [network.guru.ControlNetworkGuru] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Released nic: NicProfile[2233-1603-null-null-null
2018-10-01 14:40:34,169 DEBUG [cloud.network.NetworkModelImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Service SecurityGroup is not supported in the network id=386
2018-10-01 14:40:34,176 DEBUG [engine.orchestration.NetworkOrchestrator] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Asking VpcVirtualRouter to release NicProfile[2235-1603-d29a30cf-d4c8-405f-80c1-e3266242454d-10.23.1.1-null
2018-10-01 14:40:34,176 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Successfully released network resources for the vm VM[DomainRouter|r-1603-VM]
2018-10-01 14:40:34,176 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Successfully cleanued up resources for the vm VM[DomainRouter|r-1603-VM] in Starting state
2018-10-01 14:40:34,187 DEBUG [cloud.capacity.CapacityManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) VM state transitted from :Starting to Stopped with event: OperationFailedvm's original host id: 15 new host id: null host id before state transition: 15
2018-10-01 14:40:34,194 DEBUG [cloud.capacity.CapacityManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Hosts's actual total CPU: 45600 and CPU after applying overprovisioning: 364800000
2018-10-01 14:40:34,194 DEBUG [cloud.capacity.CapacityManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Hosts's actual total RAM: 206085242880 and RAM after applying overprovisioning: 412170485760
2018-10-01 14:40:34,194 DEBUG [cloud.capacity.CapacityManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) release cpu from host: 15, old used: 136200,reserved: 1800, actual total: 45600, total with overprovisioning: 364800000; new used: 135200,reserved:1800; movedfromreserved: false,moveToReserveredfalse
2018-10-01 14:40:34,194 DEBUG [cloud.capacity.CapacityManagerImpl] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) release mem from host: 15, old used: 86872424448,reserved: 2147483648, total: 412170485760; new used: 86335553536,reserved:2147483648; movedfromreserved: false,moveToReserveredfalse
2018-10-01 14:40:34,199 ERROR [cloud.vm.VmWorkJobHandlerProxy] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Invocation exception, caused by: com.cloud.exception.AgentUnavailableException: Resource [Host:15] is unreachable: Host 15: Unable to start instance due to Unable to start  VM:19ab7474-3185-4014-9943-6a7df2e083be due to error in finalizeStart, not retrying
2018-10-01 14:40:34,200 INFO  [cloud.vm.VmWorkJobHandlerProxy] (Work-Job-Executor-35:ctx-29f6e97a job-32465/job-32466 ctx-dae56f41) Rethrow exception com.cloud.exception.AgentUnavailableException: Resource [Host:15] is unreachable: Host 15: Unable to start instance due to Unable to start  VM:19ab7474-3185-4014-9943-6a7df2e083be due to error in finalizeStart, not retrying

@rohityadavcloud
Copy link
Member

@resmo can you confirm if you tested 4.11.1.0 or the 4.11.2.0 RC2? I'm unable to reproduce this at least on KVM with 4.11.2.0 RC2, will test against vmware 5.5 next.

@rohityadavcloud
Copy link
Member

@izenk apache2 should be started by the post cloud-init script. Can you share output of systemctl status apache2, what error do you see? I could not reproduce the issue with either stop+start or rebooting of the VPC VR, it came up OK for me in case of CentOS7 KVM host. For the broken VR, can you share its /var/log/cloud.log file?

@resmo
Copy link
Member

resmo commented Oct 5, 2018

@rhtyd its 4.11.1.0

@rohityadavcloud
Copy link
Member

Thanks @resmo. @resmo @izenk can you describe your VPC (or isolated network) setups, do you have private gateway and/or additional services (vpn etc) or rules etc? In my two test setups (CentOS7 KVM and VMware 5.5 based), I could not reproduce the issue against latest 4.11.2.0 rc2 (latest 4.11 branch). About the apache2 process blocking VRs from starting, we've fixed that in 4.11.1.0 already, I could no longer reproduce that on vmware. /cc @PaulAngus

@rohityadavcloud
Copy link
Member

@resmo we'll need the /var/log/cloud.log from the VR when this happens as well as the management server logs when you reproduce the next time.

@rohityadavcloud
Copy link
Member

@resmo we'll need additional logs from an env/VRs where you're seeing the error to debug/investigate and fix this issue.

@rohityadavcloud
Copy link
Member

Unless, VR logs may be available we're unable to investigate and fix this problem @izenk @resmo thanks.

@izenk
Copy link
Author

izenk commented Oct 16, 2018

@rhtyd
I reproduced the problem.
Management log:

2018-10-16 11:23:19,102 ERROR [c.c.n.r.VirtualNetworkApplianceManagerImpl] (Work-Job-Executor-147:ctx-edf7f753 job-3221/job-3222 ctx-ea
279b8a) (logid:e2c57742) process hasn't exited
2018-10-16 11:23:19,102 WARN  [c.c.n.r.VirtualNetworkApplianceManagerImpl] (Work-Job-Executor-147:ctx-edf7f753 job-3221/job-3222 ctx-ea
279b8a) (logid:e2c57742) Command: com.cloud.agent.api.Command failed while starting virtual router
2018-10-16 11:23:20,014 DEBUG [o.a.c.m.PrometheusExporterServerImpl] (Thread-26:null) (logid:) Prometheus exporter received client request from: 127.0.0.1
2018-10-16 11:23:20,210 WARN  [o.a.c.f.j.i.AsyncJobMonitor] (Timer-1:ctx-7e4bc9be) (logid:9c2c84b7) Task (job-3221) has been pending for 721 seconds
2018-10-16 11:23:20,210 WARN  [o.a.c.f.j.i.AsyncJobMonitor] (Timer-1:ctx-7e4bc9be) (logid:9c2c84b7) Task (job-3222) has been pending for 720 seconds
2018-10-16 11:23:20,554 DEBUG [o.a.c.s.SecondaryStorageManagerImpl] (secstorage-1:ctx-0e1cf9c6) (logid:377fefe1) Zone 1 is ready to launch secondary storage VM
2018-10-16 11:23:21,482 INFO  [o.a.c.f.j.i.AsyncJobManagerImpl] (AsyncJobMgr-Heartbeat-1:ctx-b70ac705) (logid:c2946d52) Begin cleanup expired async-jobs
2018-10-16 11:23:21,487 INFO  [o.a.c.f.j.i.AsyncJobManagerImpl] (AsyncJobMgr-Heartbeat-1:ctx-b70ac705) (logid:c2946d52) End cleanup expired async-jobs
2018-10-16 11:23:21,648 WARN  [c.c.a.d.ParamGenericValidationWorker] (qtp1595953398-787:ctx-149b3355 ctx-c8a9574f) (logid:9aa313e4) Received unknown parameters for command queryAsyncJobResult. Unknown parameters : projectid
2018-10-16 11:23:21,678 ERROR [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-147:ctx-edf7f753 job-3221/job-3222 ctx-ea279b8a) (logid:e2c57742) Failed to start instance VM[DomainRouter|r-191-VM]
com.cloud.utils.exception.ExecutionException: Unable to start  VM:0832cd60-327d-416f-8264-cbf5df37d921 due to error in finalizeStart, not retrying
        at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1223)
        at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4930)
        at sun.reflect.GeneratedMethodAccessor536.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107)
        at com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:5093)
        at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:102)
        at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:581)
        at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
        at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
        at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:529)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

cloud.log from VR is also attached
some notes: attached log finishes on line:

2018-10-16 09:13:20,611  CsHelper.py execute:188 Executing: mount

This is the last line before I got exception above.
What I see it's that VR didn't get Link Local network. It got Link Local IP, but there was no connection from host to VR through Link Local IP. I can enter VR through VNC from host only.

cloud.log

@rohityadavcloud
Copy link
Member

Thanks @izenk let me have a look

@rohityadavcloud
Copy link
Member

@izenk can you describe your KVM setup, are you using bridge based networking or openvswitch with vlan/vxlan/etc, is it normal adv zone (or with security groups)? The process hasn't exited in the logs implies the VR programming did not complete quickly or the script failed internally in the VR.

Reading the cloud.log from the VR, the first problem I see is which implies either failure of the cloudstack kvm agent to create file in the VR or VR already processed this file:

2018-09-25 14:14:04,139  configure.py main:1047 Unable to find and process databag for file: ip_associations.json.a1f78335-796f-495e-8e95-b606587b6a51, for json

After restart of the VR, the following suggests that the nic device was not found due to which several other failures such as failure to reconfigure vpn also failed:

2018-10-16 07:24:46,274  CsAddress.py waitfordevice:250 Device eth2 cannot be configured - device was not found
 ...
 2018-10-16 09:13:20,590  configure.py main:1047 Unable to find and process databag for file: ip_associations.json.d9dbda9d-e218-4c3c-a3c2-864428771b3d, for json

Let me try to investigate the code and see if I reproduce it, meanwhile please answer my questions.

@izenk
Copy link
Author

izenk commented Oct 17, 2018

@rhtyd
KVM with bridge based networking
Advanced Zone without security groups
During "Start" VR stucks in "Starting" state. If I connect to it through vnc I see, that public network is working (it has internet connection).

VPC restart with "Clean up" option does the job.

@rohityadavcloud
Copy link
Member

I could not reproduce the similar error with stop+start. However, with rebooting a VR I could see similar log statements, but ultimately the VR was able to start and I could not reproduce the error. There have been several VR related fixes in 4.11.2-rc2 compared to 4.11.1 so it's likely the bug is not reproducible. Please check and increase your router aggregation timeout and see if that helps?
@izenk can you test against 4.11.2-rc2 using the latest 4.11.2 systemvmtemplate? See the voting thread on dev@ ML.

@rohityadavcloud
Copy link
Member

Given, you've a workaround for this issue and I could not reproduce this with latest 4.11 branch I'll move this to 4.11.3.0 milestone to avoid further blocking of rc3.

@rohityadavcloud rohityadavcloud modified the milestones: 4.11.2.0, 4.11.3.0 Oct 17, 2018
@izenk
Copy link
Author

izenk commented Oct 17, 2018

@rhtyd ok
Im trying to debug issue myself. So may be some guidelines from your side..

When I click "Start" - VR starts boot. I can connect by VNC.
On my installation VR (normally) has 4 interfaces:
loopback
eth0: linklocal (169.254.x.x)
eth1: public net (10.9.x.x)
eth2: vpc tier net (172.16.x.x)

After connecting by VNC (while tested VR is in "Starting" state) I can see only 3 interafaces:
loopback
eth0: linklocal net is not working, I can arping 169.254.0.1 (what is on host(cloud0)) - so it's visible on link level, but can't ping 169.254.0.1
eth1: public net is working normally
VR doesn't have vpc tier network at all.

Note: Also I increased "router aggregation timeout" (router.aggregation.command.each.timeout) - what doesn't help. It seems even has no any effect: error about finalizing operation appeared after the same time as before increasing outer.aggregation.command.each.timeout.

@izenk
Copy link
Author

izenk commented Oct 17, 2018

@rhtyd
I tried to ping 169.254.0.1 from "Starting" VR
At the same time, on the KVM host I ran
tcpdump -nn -i any icmp

an result is:

11:23:43.492824 IP 10.9.2.206 > 169.254.0.1: ICMP echo request, id 1760, seq 0, length 64
11:23:43.492843 IP 10.9.2.206 > 169.254.0.1: ICMP echo request, id 1760, seq 0, length 64
11:23:43.492846 ethertype IPv4, IP 10.9.2.206 > 169.254.0.1: ICMP echo request, id 1760, seq 0, length 64
11:23:43.492849 ethertype IPv4, IP 10.9.2.206 > 169.254.0.1: ICMP echo request, id 1760, seq 0, length 64
11:23:44.493918 IP 10.9.2.206 > 169.254.0.1: ICMP echo request, id 1760, seq 1, length 64
11:23:44.493936 IP 10.9.2.206 > 169.254.0.1: ICMP echo request, id 1760, seq 1, length 64
11:23:44.493939 ethertype IPv4, IP 10.9.2.206 > 169.254.0.1: ICMP echo request, id 1760, seq 1, length 64
11:23:44.493942 ethertype IPv4, IP 10.9.2.206 > 169.254.0.1: ICMP echo request, id 1760, seq 1, length 64
11:23:45.495031 IP 10.9.2.206 > 169.254.0.1: ICMP echo request, id 1760, seq 2, length 64
11:23:45.495056 IP 10.9.2.206 > 169.254.0.1: ICMP echo request, id 1760, seq 2, length 64
11:23:45.495060 ethertype IPv4, IP 10.9.2.206 > 169.254.0.1: ICMP echo request, id 1760, seq 2, length 64
11:23:45.495063 ethertype IPv4, IP 10.9.2.206 > 169.254.0.1: ICMP echo request, id 1760, seq 2, length 64

10.9.2.206 is public IP(SNAT) on eth0 on "Starting" VR
iptables does not have any lines related to 10.9.2.206 and even eth0
so, from my side it looks like a routing problem (taken into account that arping is wokring)
screen shot 2018-10-17 at 13 51 54

More details:
if I start ping from KVM host while VR is starting, I can see a small amount if time when ping is successful, but then it starts failing. Looks likes some configuration is going during VR boot and that breaks linklocal connectivity.

@izenk
Copy link
Author

izenk commented Oct 17, 2018

@rhtyd
Looks like I found the reason (not the root, but near)

screen shot 2018-10-17 at 16 33 16

rule: from all lookup Table_eth1 and in this table there is no route for 169.254.x.x, so everything goes through default via public 10.9.x.x

@rohityadavcloud
Copy link
Member

@izenk can you confirm if you tested 4.11.1 or 4.11.2-rc2? A routing related bug was fixed which is in 4.11.2-rc2.

@izenk
Copy link
Author

izenk commented Oct 17, 2018

@rhtyd no, can't
I have only 4.11.1 which is 'production'

@rohityadavcloud
Copy link
Member

@izenk the routing issue you've shared has been fixed and is in 4.11.2-rc2: https://github.com/apache/cloudstack/pull/2791/files
You may wait for 4.11.2.0 release or try to manually apply changes in the VR scripts in your test VR.

@izenk
Copy link
Author

izenk commented Oct 18, 2018

@rhtyd got it. Thanks

@rohityadavcloud rohityadavcloud removed this from the 4.11.3.0 milestone Oct 24, 2018
@rohityadavcloud rohityadavcloud removed their assignment Oct 24, 2018
@rohityadavcloud
Copy link
Member

Can you test with 4.11.2.0 @izenk ? This probably can be closed.

@andrijapanicsb
Copy link
Contributor

Any update on this one ? I believe I was observing similar situation...

@ustcweizhou
Copy link
Contributor

@andrijapanicsb were you using vpc with public ips in multiple ip ranges or private gateway ?

@rohityadavcloud rohityadavcloud added this to the 4.13.0.0 milestone May 27, 2019
@rohityadavcloud
Copy link
Member

Can you test against 4.11.2.0 or 4.11.3.0 @izenk ?

@rohityadavcloud
Copy link
Member

I'm unable to reproduce with 4.11.3.0 or master, kindly reopen with details @izenk if you're still facing this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants