Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error while stopping Vm #8401

Closed
0sorkon opened this issue Dec 22, 2023 · 9 comments
Closed

Error while stopping Vm #8401

0sorkon opened this issue Dec 22, 2023 · 9 comments

Comments

@0sorkon
Copy link

0sorkon commented Dec 22, 2023

ISSUE TYPE
  • Bug Report
  • Other
COMPONENT NAME
vm, kvm, libvirt
CLOUDSTACK VERSION
4.18.1.0
CONFIGURATION

advanced networking, shared linstore storage

OS / ENVIRONMENT

Ubuntu 22.04 host

SUMMARY

When stoping VM from interface, after some time error showing:

Stop instance
(VM-b74797c3-e835-435e-9791-0d1256b7e2d5) Unable to stop VM instance {"id":310,"instanceName":"i-2-310-VM","type":"User","uuid":"b74797c3-e835-435e-9791-0d1256b7e2d5"}

if HA is not enabled, vm stops with error, if HA enabled true error showing and vm restarts. If the forse option is enabled when shutting down the VM, the VM also shuts down with an error.

full log from pressing the Stop instance button until the error occurs:

Dec 22 12:42:29 cloud java[21477]: WARN  [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-5:) (logid:265164ab) Couldn't retrieve free memory, returning -1.
Dec 22 12:42:29 cloud java[21477]: WARN  [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-5:) (logid:265164ab) Couldn't retrieve free memory, returning -1.
Dec 22 12:42:32 cloud kernel: [827081.293050] drbd cs-6408916c-cfec-4aae-b5c3-aa01e4fa4027: Preparing cluster-wide state change 3695816892 (0->-1 3/2)
Dec 22 12:42:32 cloud kernel: [827081.293471] drbd cs-6408916c-cfec-4aae-b5c3-aa01e4fa4027: State change 3695816892: primary_nodes=0, weak_nodes=0
Dec 22 12:42:32 cloud kernel: [827081.293476] drbd cs-6408916c-cfec-4aae-b5c3-aa01e4fa4027: Committing cluster-wide state change 3695816892 (0ms)
Dec 22 12:42:32 cloud kernel: [827081.293495] drbd cs-6408916c-cfec-4aae-b5c3-aa01e4fa4027: role( Primary -> Secondary )
Dec 22 12:42:32 cloud systemd-networkd[1937]: vnet50: Link DOWN
Dec 22 12:42:32 cloud systemd-networkd[1937]: vnet50: Lost carrier
Dec 22 12:42:32 cloud kernel: [827081.318739] cloudbr0: port 14(vnet50) entered disabled state
Dec 22 12:42:32 cloud Keepalived_vrrp[3760]: Interface vnet50 deleted
Dec 22 12:42:32 cloud kernel: [827081.358594] device vnet50 left promiscuous mode
Dec 22 12:42:32 cloud kernel: [827081.358601] cloudbr0: port 14(vnet50) entered disabled state
Dec 22 12:42:32 cloud Keepalived_vrrp[3760]: Unknown interface #013 deleted
Dec 22 12:42:32 cloud systemd[1]: machine-qemu\x2d51\x2di\x2d2\x2d310\x2dVM.scope: Deactivated successfully.
Dec 22 12:42:32 cloud systemd[1]: machine-qemu\x2d51\x2di\x2d2\x2d310\x2dVM.scope: Consumed 17.020s CPU time.
Dec 22 12:42:32 cloud kernel: [827081.686316] drbd cs-6408916c-cfec-4aae-b5c3-aa01e4fa4027: Preparing cluster-wide state change 24032427 (0->-1 3/1)
Dec 22 12:42:32 cloud kernel: [827081.686551] drbd cs-6408916c-cfec-4aae-b5c3-aa01e4fa4027: State change 24032427: primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFFC
Dec 22 12:42:32 cloud kernel: [827081.686556] drbd cs-6408916c-cfec-4aae-b5c3-aa01e4fa4027: Committing cluster-wide state change 24032427 (0ms)
Dec 22 12:42:32 cloud kernel: [827081.686562] drbd cs-6408916c-cfec-4aae-b5c3-aa01e4fa4027: role( Secondary -> Primary )
Dec 22 12:42:32 cloud kernel: [827081.687590] drbd cs-6408916c-cfec-4aae-b5c3-aa01e4fa4027: Preparing cluster-wide state change 852045333 (0->-1 3/2)
Dec 22 12:42:32 cloud kernel: [827081.687993] drbd cs-6408916c-cfec-4aae-b5c3-aa01e4fa4027: State change 852045333: primary_nodes=0, weak_nodes=0
Dec 22 12:42:32 cloud kernel: [827081.687999] drbd cs-6408916c-cfec-4aae-b5c3-aa01e4fa4027: Committing cluster-wide state change 852045333 (0ms)
Dec 22 12:42:32 cloud kernel: [827081.688025] drbd cs-6408916c-cfec-4aae-b5c3-aa01e4fa4027: role( Primary -> Secondary )
Dec 22 12:42:32 cloud kernel: [827081.692714] drbd cs-6408916c-cfec-4aae-b5c3-aa01e4fa4027: Preparing cluster-wide state change 3776904071 (0->-1 3/1)
Dec 22 12:42:32 cloud kernel: [827081.692870] drbd cs-6408916c-cfec-4aae-b5c3-aa01e4fa4027: State change 3776904071: primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFFC
Dec 22 12:42:32 cloud kernel: [827081.692874] drbd cs-6408916c-cfec-4aae-b5c3-aa01e4fa4027: Committing cluster-wide state change 3776904071 (0ms)
Dec 22 12:42:32 cloud kernel: [827081.692879] drbd cs-6408916c-cfec-4aae-b5c3-aa01e4fa4027: role( Secondary -> Primary )
Dec 22 12:42:32 cloud kernel: [827081.693080] drbd cs-6408916c-cfec-4aae-b5c3-aa01e4fa4027: Preparing cluster-wide state change 1229137601 (0->-1 3/2)
Dec 22 12:42:32 cloud kernel: [827081.693387] drbd cs-6408916c-cfec-4aae-b5c3-aa01e4fa4027: State change 1229137601: primary_nodes=0, weak_nodes=0
Dec 22 12:42:32 cloud kernel: [827081.693390] drbd cs-6408916c-cfec-4aae-b5c3-aa01e4fa4027: Committing cluster-wide state change 1229137601 (0ms)
Dec 22 12:42:32 cloud kernel: [827081.693405] drbd cs-6408916c-cfec-4aae-b5c3-aa01e4fa4027: role( Primary -> Secondary )
Dec 22 12:42:33 cloud java[21477]: libvirt: QEMU Driver error : Domain not found: no domain with matching uuid 'b74797c3-e835-435e-9791-0d1256b7e2d5' (i-2-310-VM)
Dec 22 12:42:33 cloud java[21477]: WARN  [kvm.resource.LibvirtKvmAgentHook] (agentRequest-Handler-1:) (logid:290d5f86) Groovy script '/etc/cloudstack/agent/hooks/libvirt-vm-state-change.groovy' is not available. Transformations will not be applied.
Dec 22 12:42:33 cloud java[21477]: WARN  [kvm.resource.LibvirtKvmAgentHook] (agentRequest-Handler-1:) (logid:290d5f86) Groovy scripting engine is not initialized. Data transformation skipped.
Dec 22 12:43:30 cloud java[21477]: WARN  [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-5:) (logid:fd9227f8) Couldn't retrieve free memory, returning -1.
Dec 22 12:43:30 cloud java[21477]: WARN  [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-5:) (logid:fd9227f8) Couldn't retrieve free memory, returning -1.
Dec 22 12:44:32 cloud java[21477]: WARN  [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-3:) (logid:d5104f22) Couldn't retrieve free memory, returning -1.
Dec 22 12:44:32 cloud java[21477]: WARN  [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-3:) (logid:d5104f22) Couldn't retrieve free memory, returning -1.
Dec 22 12:44:44 cloud java[21477]: WARN  [cloud.agent.Agent] (agentRequest-Handler-1:) (logid:290d5f86) Caught:
Dec 22 12:44:44 cloud java[21477]: javax.ws.rs.ProcessingException: java.net.ConnectException: Connection timed out (Connection timed out)
Dec 22 12:44:44 cloud java[21477]: #011at org.glassfish.jersey.client.internal.HttpUrlConnector.apply(HttpUrlConnector.java:284)
Dec 22 12:44:44 cloud java[21477]: #011at org.glassfish.jersey.client.ClientRuntime.invoke(ClientRuntime.java:278)
Dec 22 12:44:44 cloud java[21477]: #011at org.glassfish.jersey.client.JerseyInvocation.lambda$invoke$0(JerseyInvocation.java:753)
Dec 22 12:44:44 cloud java[21477]: #011at org.glassfish.jersey.internal.Errors.process(Errors.java:316)
Dec 22 12:44:44 cloud java[21477]: #011at org.glassfish.jersey.internal.Errors.process(Errors.java:298)
Dec 22 12:44:44 cloud java[21477]: #011at org.glassfish.jersey.internal.Errors.process(Errors.java:229)
Dec 22 12:44:44 cloud java[21477]: #011at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:414)
Dec 22 12:44:44 cloud java[21477]: #011at org.glassfish.jersey.client.JerseyInvocation.invoke(JerseyInvocation.java:752)
Dec 22 12:44:44 cloud java[21477]: #011at org.glassfish.jersey.client.JerseyInvocation$Builder.method(JerseyInvocation.java:419)
Dec 22 12:44:44 cloud java[21477]: #011at org.glassfish.jersey.client.JerseyInvocation$Builder.get(JerseyInvocation.java:319)
Dec 22 12:44:44 cloud java[21477]: #011at com.linbit.linstor.api.ApiClient.invokeAPI(ApiClient.java:703)
Dec 22 12:44:44 cloud java[21477]: #011at com.linbit.linstor.api.DevelopersApi.viewResources(DevelopersApi.java:5212)
Dec 22 12:44:44 cloud java[21477]: #011at com.cloud.hypervisor.kvm.storage.LinstorStorageAdaptor.disconnectPhysicalDiskByPath(LinstorStorageAdaptor.java:338)
Dec 22 12:44:44 cloud java[21477]: #011at com.cloud.hypervisor.kvm.storage.KVMStoragePoolManager.disconnectPhysicalDiskByPath(KVMStoragePoolManager.java:205)
Dec 22 12:44:44 cloud java[21477]: #011at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.cleanupDisk(LibvirtComputingResource.java:3247)
Dec 22 12:44:44 cloud java[21477]: #011at com.cloud.hypervisor.kvm.resource.wrapper.LibvirtStopCommandWrapper.execute(LibvirtStopCommandWrapper.java:101)
Dec 22 12:44:44 cloud java[21477]: #011at com.cloud.hypervisor.kvm.resource.wrapper.LibvirtStopCommandWrapper.execute(LibvirtStopCommandWrapper.java:49)
Dec 22 12:44:44 cloud java[21477]: #011at com.cloud.hypervisor.kvm.resource.wrapper.LibvirtRequestWrapper.execute(LibvirtRequestWrapper.java:78)
Dec 22 12:44:44 cloud java[21477]: #011at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.executeRequest(LibvirtComputingResource.java:1848)
Dec 22 12:44:44 cloud java[21477]: #011at com.cloud.agent.Agent.processRequest(Agent.java:662)
Dec 22 12:44:44 cloud java[21477]: #011at com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:1082)
Dec 22 12:44:44 cloud java[21477]: #011at com.cloud.utils.nio.Task.call(Task.java:83)
Dec 22 12:44:44 cloud java[21477]: #011at com.cloud.utils.nio.Task.call(Task.java:29)
Dec 22 12:44:44 cloud java[21477]: #011at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
Dec 22 12:44:44 cloud java[21477]: #011at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
Dec 22 12:44:44 cloud java[21477]: #011at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
Dec 22 12:44:44 cloud java[21477]: #011at java.base/java.lang.Thread.run(Thread.java:829)
Dec 22 12:44:44 cloud java[21477]: Caused by: java.net.ConnectException: Connection timed out (Connection timed out)
Dec 22 12:44:44 cloud java[21477]: #011at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
Dec 22 12:44:44 cloud java[21477]: #011at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:412)
Dec 22 12:44:44 cloud java[21477]: #011at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:255)
Dec 22 12:44:44 cloud java[21477]: #011at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:237)
Dec 22 12:44:44 cloud java[21477]: #011at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
Dec 22 12:44:44 cloud java[21477]: #011at java.base/java.net.Socket.connect(Socket.java:609)
Dec 22 12:44:44 cloud java[21477]: #011at java.base/sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:305)
Dec 22 12:44:44 cloud java[21477]: #011at java.base/sun.net.NetworkClient.doConnect(NetworkClient.java:177)
Dec 22 12:44:44 cloud java[21477]: #011at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:509)
Dec 22 12:44:44 cloud java[21477]: #011at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:604)
Dec 22 12:44:44 cloud java[21477]: #011at java.base/sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:266)
Dec 22 12:44:44 cloud java[21477]: #011at java.base/sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:373)
Dec 22 12:44:44 cloud java[21477]: #011at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:207)
Dec 22 12:44:44 cloud java[21477]: #011at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1187)
Dec 22 12:44:44 cloud java[21477]: #011at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1081)
Dec 22 12:44:44 cloud java[21477]: #011at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:193)
Dec 22 12:44:44 cloud java[21477]: #011at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1592)
Dec 22 12:44:44 cloud java[21477]: #011at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1520)
Dec 22 12:44:44 cloud java[21477]: #011at java.base/java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:527)
Dec 22 12:44:44 cloud java[21477]: #011at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:334)
Dec 22 12:44:44 cloud java[21477]: #011at org.glassfish.jersey.client.internal.HttpUrlConnector._apply(HttpUrlConnector.java:390)
Dec 22 12:44:44 cloud java[21477]: #011at org.glassfish.jersey.client.internal.HttpUrlConnector.apply(HttpUrlConnector.java:282)
Dec 22 12:44:44 cloud java[21477]: #011... 26 more
STEPS TO REPRODUCE

EXPECTED RESULTS
The VM is shutting down normally
ACTUAL RESULTS
The VM terminates with an error either. If HA in offering is enabled, it terminates with an error and starts again
@weizhouapache
Copy link
Member

cc @rp-

@rp-
Copy link
Contributor

rp- commented Dec 22, 2023

From the exception, I only see that the resource list timed out.
Are there any error reports from the controller at that time? Maybe slow LVM or some other backend storage problem?

@0sorkon
Copy link
Author

0sorkon commented Dec 22, 2023

No, there are no suspicious entries or errors. Just normal resource transition messages from primary to secondary.

@rp-
Copy link
Contributor

rp- commented Dec 22, 2023

And does this happen all the time? or only sometimes?

@0sorkon
Copy link
Author

0sorkon commented Dec 22, 2023

It started yesterday and shows up 9 out of 10 times

@rp-
Copy link
Contributor

rp- commented Dec 22, 2023 via email

@0sorkon
Copy link
Author

0sorkon commented Dec 22, 2023

No, all looks normal. Every linstor command executes as usual. Looks like everything is ok with storage, drbd and linstor

@0sorkon
Copy link
Author

0sorkon commented Dec 26, 2023

can be considered solved. The problem went away after restarting cloudstack-agent service. However, the true cause of the problem remains unidentified

@DaanHoogland
Copy link
Contributor

@0sorkon closing this, please reopen if new information becomes available.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants