Virtual machines stuck in "Starting" state for extended periods of time during provisioning #3104

Closed
dubauski opened this issue Dec 19, 2018 · 9 comments

@dubauski

dubauski commented Dec 19, 2018

ISSUE TYPE
Bug Report

COMPONENT NAME
API

CLOUDSTACK VERSION
4.9.2.0

CONFIGURATION
N/A

OS / ENVIRONMENT
N/A

SUMMARY

When requesting multiple virtual machines, some of them stay in the "Starting" state for a very long time without switching to the "Running" state.

STEPS TO REPRODUCE

  1. Create a VPC.
  2. Create a guest network.
  3. Using the deployVirtualMachine REST method, request a virtual machine to be created in the guest network.
  4. Without waiting for completion, repeat step 3 twenty more times to request more virtual machines. The requests are issued in parallel from separate threads (not sequentially).
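The reproduction steps above can be sketched as a small Python script. This is a minimal sketch, not a tested tool: it assumes API-key authentication using CloudStack's documented request-signing scheme (URL-encode each value, sort the parameters by field name, lowercase the query, HMAC-SHA1 it with the secret key, then Base64- and URL-encode the digest). The endpoint URL, the credentials and the zone, template, service offering and network IDs are hypothetical placeholders. Each of the 21 requests is submitted from its own worker thread, so they reach the management server concurrently.

```python
import base64
import hashlib
import hmac
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import quote
from urllib.request import urlopen

# Hypothetical endpoint and credentials: replace with your own.
API_URL = "http://cloudstack.example.com:8080/client/api"
API_KEY = "your-api-key"
SECRET_KEY = "your-secret-key"


def signed_query(params, api_key, secret_key):
    """Build a signed CloudStack query string (HMAC-SHA1 over the sorted, lowercased query)."""
    all_params = dict(params, apiKey=api_key, response="json")
    # URL-encode every value (spaces become %20) and sort by lowercased field name.
    pairs = sorted(
        ((k, quote(str(v), safe="")) for k, v in all_params.items()),
        key=lambda kv: kv[0].lower(),
    )
    query = "&".join(f"{k}={v}" for k, v in pairs)
    digest = hmac.new(secret_key.encode(), query.lower().encode(), hashlib.sha1).digest()
    signature = quote(base64.b64encode(digest).decode(), safe="")
    return f"{query}&signature={signature}"


def send(params):
    """Perform one signed GET request and return the decoded JSON response."""
    with urlopen(f"{API_URL}?{signed_query(params, API_KEY, SECRET_KEY)}") as resp:
        return json.load(resp)


def deploy_in_parallel(sender, base_params, count=21):
    """Issue `count` deployVirtualMachine calls at once, one per worker thread."""
    params = dict(base_params, command="deployVirtualMachine")
    with ThreadPoolExecutor(max_workers=count) as pool:
        futures = [pool.submit(sender, params) for _ in range(count)]
        # deployVirtualMachine is asynchronous: each response carries a job id.
        return [f.result() for f in futures]


# Example (hypothetical IDs; this performs real API calls, so it is left commented out):
# deploy_in_parallel(send, {"zoneid": "ZONE-UUID", "templateid": "TEMPLATE-UUID",
#                           "serviceofferingid": "OFFERING-UUID", "networkids": "NETWORK-UUID"})
```

Passing the sender in as a parameter keeps the concurrency logic separate from the HTTP call, so the same function can be exercised against a stub before pointing it at a real management server.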

EXPECTED RESULTS
All of the requested virtual machines are created successfully and in a timely manner.

ACTUAL RESULTS
Some virtual machines are provisioned successfully and reach the "Running" state (usually in under 1 minute), while others remain in the "Starting" state for excessive periods of time (often longer than 5 minutes).
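To put numbers on the stuck VMs, one option is to poll listVirtualMachines and flag instances that have been in the "Starting" state longer than a threshold (5 minutes here, matching the report). The sketch below only processes an already-decoded JSON response. It assumes the response layout `listvirtualmachinesresponse.virtualmachine[]` and a `created` timestamp such as `2018-12-19T10:00:00+0000`, and it uses `created` as an approximation of when provisioning began, since that field records creation time rather than the moment the VM entered "Starting". The sample VMs and times are made up.

```python
from datetime import datetime, timedelta, timezone

# Assumed CloudStack timestamp format, e.g. "2018-12-19T10:00:00+0000".
TS_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def stuck_vms(response, now, threshold=timedelta(minutes=5)):
    """Return (name, age) for VMs still 'Starting' longer than `threshold`.

    `response` is the decoded JSON of listVirtualMachines; the VM's `created`
    time is used as an approximation of when it started provisioning.
    """
    vms = response.get("listvirtualmachinesresponse", {}).get("virtualmachine", [])
    result = []
    for vm in vms:
        if vm.get("state") != "Starting":
            continue
        age = now - datetime.strptime(vm["created"], TS_FORMAT)
        if age > threshold:
            result.append((vm["name"], age))
    return result


if __name__ == "__main__":
    # Hypothetical sample response.
    sample = {"listvirtualmachinesresponse": {"count": 3, "virtualmachine": [
        {"name": "vm-01", "state": "Running", "created": "2018-12-19T10:00:00+0000"},
        {"name": "vm-02", "state": "Starting", "created": "2018-12-19T10:00:10+0000"},
        {"name": "vm-03", "state": "Starting", "created": "2018-12-19T10:06:30+0000"},
    ]}}
    now = datetime(2018, 12, 19, 10, 7, tzinfo=timezone.utc)
    for name, age in stuck_vms(sample, now):
        print(name, age)
```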

@rohityadavcloud
Member

Can you try using 4.11.2.0?

@dubauski
Author

Can you confirm that this is a supported use case: performing concurrent VM creation requests via the CloudStack API?

Is this a known issue that has been fixed in 4.11.2.0?

@aleskxyz
Contributor

I think I hit the same bug on 4.9.3.
Is your cloud still usable after this issue occurs? For example, can you reboot existing VMs?
I have filed a bug in JIRA about this problem:
https://issues.apache.org/jira/browse/CLOUDSTACK-10401

@rohityadavcloud
Member

There were a variety of issues fixed in latest 4.11 release. Can you test against 4.11.2.0? @dubauski @geekza

@aleskxyz
Contributor

I'm going to upgrade CloudStack to 4.11.2.
However, because of this problem I can't register new systemvm templates. Is registering them a required step of the upgrade, or can I do it later?

@aleskxyz
Contributor

I have upgraded my CloudStack to 4.11.2.0, but nothing is fixed! I see exactly the same behavior.

@aleskxyz
Contributor

@rhtyd @dubauski
I found these records in my database that may be preventing CloudStack from working:

MySQL [cloud]> select * from op_ha_work;
+----+-------------+-----------+---------+---------+----------------+---------+---------------------+-------+---------------------+-----------+-------------+---------+
| id | instance_id | type | vm_type | state | mgmt_server_id | host_id | created | tried | taken | step | time_to_try | updated |
+----+-------------+-----------+---------+---------+----------------+---------+---------------------+-------+---------------------+-----------+-------------+---------+
| 16 | 546 | Migration | User | Running | 345051240548 | 91 | 2019-01-20 12:10:07 | 2 | 2019-01-23 10:21:21 | Migrating | 1511890258 | 12 |
| 37 | 631 | Migration | User | Running | 345051240548 | 91 | 2019-01-20 12:10:07 | 1 | 2019-01-23 09:38:26 | Done | 1511890258 | 4 |
| 40 | 633 | Migration | User | Running | 345051240548 | 91 | 2019-01-20 12:10:07 | 1 | 2019-01-23 10:21:21 | Migrating | 1511890258 | 3 |
| 43 | 671 | Migration | User | Running | 345051240548 | 91 | 2019-01-20 12:10:07 | 2 | 2019-01-23 10:24:23 | Migrating | 1511952155 | 4 |
| 46 | 691 | Migration | User | Running | NULL | 91 | 2019-01-20 12:10:07 | 2 | NULL | Migrating | 1511952155 | 7 |
| 48 | 631 | Migration | User | Running | 345051240548 | 94 | 2019-01-23 09:38:26 | 0 | 2019-01-23 10:21:21 | Migrating | 1511949518 | 10 |
| 51 | 546 | Migration | User | Running | 345051240548 | 94 | 2019-01-23 09:38:26 | 0 | 2019-01-23 10:21:23 | Migrating | 1511949518 | 16 |
+----+-------------+-----------+---------+---------+----------------+---------+---------------------+-------+---------------------+-----------+-------------+---------+
7 rows in set (0.00 sec)

MySQL [cloud]> select * from op_lock;
+----------------+--------------+-------------+------------+---------------------+---------+
| key | mac | ip | thread | acquired_on | waiters |
+----------------+--------------+-------------+------------+---------------------+---------+
| vm_instance546 | 345051240548 | HA-Worker-4 | 1198190978 | 2019-01-23 10:21:21 | 0 |
| vm_instance631 | 345051240548 | HA-Worker-1 | 2092627389 | 2019-01-23 10:21:21 | 1 |
| vm_instance633 | 345051240548 | HA-Worker-0 | 770823 | 2019-01-23 10:21:21 | 0 |
+----------------+--------------+-------------+------------+---------------------+---------+
3 rows in set (0.00 sec)

MySQL [cloud]> select * from vm_work_job;
+-------+----------+----------+----------------+
| id | step | vm_type | vm_instance_id |
+-------+----------+----------+----------------+
| 57262 | Prepare | Instance | 691 |
| 57268 | Starting | Instance | 748 |
| 57396 | Filed | Instance | 691 |
| 57399 | Filed | Instance | 546 |
| 57402 | Filed | Instance | 631 |
| 57405 | Filed | Instance | 671 |
| 57408 | Filed | Instance | 633 |
+-------+----------+----------+----------------+
7 rows in set (0.01 sec)
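The three result sets above point at the same VM instances. A short Python sketch can cross-reference them using rows copied from that output: it extracts the instance IDs encoded in the `op_lock` keys (`vm_instance<id>`), keeps the `op_ha_work` entries whose step is not `Done`, and collects the instances with `vm_work_job` entries, then reports the instances present in all three. Reading those instances as the likely deadlocked ones is an interpretation of the data, not something CloudStack reports itself.

```python
# Rows copied from the query output above (only the columns used here).
OP_LOCK_KEYS = ["vm_instance546", "vm_instance631", "vm_instance633"]

# (id, instance_id, step) from op_ha_work
OP_HA_WORK = [
    (16, 546, "Migrating"), (37, 631, "Done"), (40, 633, "Migrating"),
    (43, 671, "Migrating"), (46, 691, "Migrating"), (48, 631, "Migrating"),
    (51, 546, "Migrating"),
]

# (id, step, vm_instance_id) from vm_work_job
VM_WORK_JOB = [
    (57262, "Prepare", 691), (57268, "Starting", 748), (57396, "Filed", 691),
    (57399, "Filed", 546), (57402, "Filed", 631), (57405, "Filed", 671),
    (57408, "Filed", 633),
]


def locked_instances(lock_keys, prefix="vm_instance"):
    """Extract VM instance IDs from op_lock keys such as 'vm_instance546'."""
    return {int(k[len(prefix):]) for k in lock_keys if k.startswith(prefix)}


def suspect_instances(lock_keys, ha_work, work_jobs):
    """Instances holding a lock that also have unfinished HA work and a queued work job."""
    locked = locked_instances(lock_keys)
    pending_ha = {inst for _, inst, step in ha_work if step != "Done"}
    queued = {inst for _, _, inst in work_jobs}
    return sorted(locked & pending_ha & queued)


if __name__ == "__main__":
    print(suspect_instances(OP_LOCK_KEYS, OP_HA_WORK, VM_WORK_JOB))  # → [546, 631, 633]
```

In this snapshot, instances 671 and 691 also have unfinished HA work and queued jobs but hold no lock, and instance 748 has a work job in the `Starting` step only.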

@aleskxyz
Contributor

aleskxyz commented Jan 23, 2019

Finally, I have resolved this issue!
I confirm that CloudStack 4.9 and 4.11 are affected by this bug.
I have deleted all records in these tables:

  • op_ha_work
  • op_lock
  • vm_work_job

I don't know the best way to determine which records should be deleted. It would be good if someone could refine this workaround.
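A narrower variant of this workaround would restrict the deletions to the affected instances (for example 546, 631 and 633, which hold rows in `op_lock` in the output above) instead of emptying the tables. The Python sketch below only builds parameterized DELETE statements, in the `%s` placeholder style accepted by MySQL DB-API drivers such as PyMySQL; it does not execute them. It has not been checked against CloudStack's foreign keys or against other tables that may reference these rows (such as `async_job`), so it is an assumption-laden starting point, and a database backup should come before running anything similar.

```python
def cleanup_statements(instance_ids):
    """Build parameterized DELETEs for the given VM instance IDs (not executed here).

    Column names follow the query output above: op_ha_work.instance_id,
    op_lock.`key` ('vm_instance<id>') and vm_work_job.vm_instance_id.
    """
    ids = sorted(set(instance_ids))
    if not ids:
        return []  # avoid emitting an invalid "IN ()" clause
    placeholders = ", ".join(["%s"] * len(ids))
    lock_keys = [f"vm_instance{i}" for i in ids]
    return [
        (f"DELETE FROM op_ha_work WHERE instance_id IN ({placeholders})", ids),
        (f"DELETE FROM op_lock WHERE `key` IN ({placeholders})", lock_keys),
        (f"DELETE FROM vm_work_job WHERE vm_instance_id IN ({placeholders})", ids),
    ]


if __name__ == "__main__":
    for sql, args in cleanup_statements([546, 631, 633]):
        print(sql, args)
```

Each `(sql, args)` pair is meant to be passed to a driver's `cursor.execute(sql, args)`, which keeps the IDs out of the SQL text itself.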

@rohityadavcloud rohityadavcloud added this to the 4.13.0.0 milestone May 27, 2019
@rohityadavcloud
Member

Duplicate of #3025.
Will try to fix this as part of #3025.
