Split azure_rm_virtualmachine integration tests to prevent timeouts #57250

Merged
merged 18 commits into from Jun 13, 2019

4 participants
@samdoran
Member

commented May 31, 2019

SUMMARY

The integration tests for azure_rm_virtualmachine take a long time to run and sometimes time out. They are the only tests in their current group, so the timeouts can't be fixed by moving other tests out of the group. The only way to reduce overall execution time is to parallelize the tasks.

I have split the tasks in the tests into groups that can run in parallel using this technique. There may be room for further parallelization, or we may need to break this test up so it can be split into separate groups.
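As a rough analogy for the approach, here is a minimal Python sketch of running one task set per host concurrently, with the host name selecting which task set runs. It is not the Ansible implementation: the functions are hypothetical stand-ins for task files, and the host names are borrowed from the timing output posted later in this thread.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for per-host task files; real tests would talk to Azure.
def public_ip_tasks():
    return "public_ip done"


def dual_nic_tasks():
    return "dual_nic done"


def no_public_ip_tasks():
    return "no_public_ip done"


# One entry per inventory host; the host name selects the task set to run.
TASKS_BY_HOST = {
    "azure_test_public_ip": public_ip_tasks,
    "azure_test_dual_nic": dual_nic_tasks,
    "azure_test_no_public_ip": no_public_ip_tasks,
}


def run_all(tasks_by_host):
    """Run every host's task set concurrently and collect results by host."""
    with ThreadPoolExecutor(max_workers=len(tasks_by_host)) as pool:
        futures = {host: pool.submit(fn) for host, fn in tasks_by_host.items()}
        # Waiting on each future means the total time is set by the slowest host.
        return {host: future.result() for host, future in futures.items()}


results = run_all(TASKS_BY_HOST)
```

Because every task set starts at the same time, the overall run should take roughly as long as the slowest set rather than the sum of all sets.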

ISSUE TYPE
  • Bugfix Pull Request
COMPONENT NAME

test/integration/targets/azure_rm_virtualmachine

ADDITIONAL INFORMATION

Alternative implementation of #44875.

@ansibot


@samdoran samdoran force-pushed the samdoran:split-azure_rm_virtualmachine-tests branch from 4d10b45 to 4e72764 May 31, 2019

@ansibot

Contributor

commented May 31, 2019

The test ansible-test sanity --test yamllint [explain] failed with 1 error:

test/integration/targets/azure_rm_virtualmachine/tasks/virtualmachine.yml:268:1: empty-lines too many blank lines (4 > 3)


@MyronFanQiu

Contributor

commented Jun 3, 2019

@samdoran Thanks for the PR! It is very helpful and can make the process for new PRs smoother. When you finish the PR, could you please remove the [WIP] label and let me know? I will then move the review process forward.

@samdoran

Member Author

commented Jun 3, 2019

In my local testing, this seems promising. Test seems to complete in about 20 minutes now.

Monday 03 June 2019  15:26:08 +0000 (0:00:06.528)       0:19:55.896 ***********
===============================================================================
TEST | Delete dual NIC VM --------------------------------------------- 273.14s
TEST | Create virtual machine with two NICs --------------------------- 140.49s
TEST | Create minimal VM with defaults -------------------------------- 103.45s
TEST | Delete VM ------------------------------------------------------- 99.05s
TEST | Resize VM ------------------------------------------------------- 66.81s
TEST | Enable boot diagnostics on an existing VM for the first time without specifying a storage account -- 55.92s
TEST | Start the virtual machine --------------------------------------- 54.83s
SETUP | Create storage accounts ---------------------------------------- 43.72s
SETUP | Create NIC for single nic VM ----------------------------------- 34.86s
TEST | Generalize VM --------------------------------------------------- 34.54s
TEST | Create virtual network ------------------------------------------ 32.95s
TEST | Delete VM ------------------------------------------------------- 28.00s
TEST | Assert that autocreated resources were deleted ------------------ 23.27s
TEST | Deallocate the virtual machine ---------------------------------- 20.15s
SETUP | Create virtual network ----------------------------------------- 17.45s
TEST | Re-enable boot diagnostics on an existing VM where it was previously configured -- 15.14s
TEARDOWN | Destroy security groups ------------------------------------- 14.85s
TEST | Create NICs for dual nic VM ------------------------------------- 14.27s
TEARDOWN | Destroy virtual network ------------------------------------- 12.71s
TEARDOWN | Destroy subnet ---------------------------------------------- 12.47s
Playbook run took 0 days, 0 hours, 19 minutes, 55 seconds

@samdoran samdoran changed the title [WIP] Split azure_rm_virtualmachine integration tests to prevent timeouts Split azure_rm_virtualmachine integration tests to prevent timeouts Jun 3, 2019

@ansibot ansibot added community_review and removed WIP labels Jun 3, 2019

@ansibot ansibot removed the needs_triage label Jun 3, 2019

@samdoran samdoran changed the title Split azure_rm_virtualmachine integration tests to prevent timeouts [WIP] Split azure_rm_virtualmachine integration tests to prevent timeouts Jun 3, 2019

@samdoran

Member Author

commented Jun 3, 2019

I still have some refactoring to do after looking through it to address the above comments.

@ansibot ansibot added the WIP label Jun 3, 2019

@samdoran

Member Author

commented Jun 5, 2019

My latest changes still have the tests running in about twenty minutes. Not sure there is much more room for improvement but I think this is good enough for now.

Wednesday 05 June 2019  15:04:30 +0000 (0:00:06.579)       0:20:25.759 ********
===============================================================================
SET3 | Assert that autocreated resources were deleted ---------------- 417.49s
SET1 | Create virtual machine with a single NIC and no boot diagnostics - 115.21s
SET1 | Resize VM ------------------------------------------------------ 66.42s
SET1 | Start the virtual machine -------------------------------------- 55.79s
SET2 | Create NICs for dual nic VM ------------------------------------ 51.85s
SETUP | Create storage accounts ---------------------------------------- 43.96s
SET1 | Delete VM ------------------------------------------------------ 40.44s
SET1 | Disable boot diagnostics and change the storage account at the same time -- 35.69s
SET1 | Re-enable boot diagnostics on an existing VM where it was previously configured -- 35.57s
SET1 | Restart the virtual machine ------------------------------------ 34.92s
SET2 | Generalize VM -------------------------------------------------- 34.60s
SET1 | Create NIC for single nic VM ----------------------------------- 34.10s
SET1 | Change the boot diagnostics storage account while enabled ------ 28.52s
SET1 | Deallocate the virtual machine --------------------------------- 24.33s
SET3 | Delete VM ------------------------------------------------------ 22.39s
SET1 | Create security group ------------------------------------------ 16.97s
SET2 | Create virtual network ----------------------------------------- 16.72s
SET1 | Destroy subnet ------------------------------------------------- 13.96s
SET1 | Destroy virtual network ---------------------------------------- 12.94s
SET1 | Destroy security group ----------------------------------------- 12.93s
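Timing summaries in this format can be totalled per label to see where the time goes. The sketch below assumes each timing line looks like `LABEL | task name ---- 12.34s`, as in the output above; the function name and regular expression are illustrative and not part of any Ansible tooling.

```python
import re
from collections import defaultdict

# Matches lines such as "SET1 | Resize VM ------ 66.42s".
LINE_RE = re.compile(r"^(?P<name>.+?) -+ +(?P<secs>\d+(?:\.\d+)?)s$")


def seconds_by_prefix(lines):
    """Sum task durations per 'PREFIX | ' label; unlabeled tasks go under ''."""
    totals = defaultdict(float)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip banners, timestamps, and blank lines
        name = match.group("name")
        prefix = name.split(" | ", 1)[0] if " | " in name else ""
        totals[prefix] += float(match.group("secs"))
    return dict(totals)


# A few lines copied from the summary above.
sample = [
    "===============================================================================",
    "SET3 | Assert that autocreated resources were deleted ---------------- 417.49s",
    "SET1 | Create virtual machine with a single NIC and no boot diagnostics - 115.21s",
    "SET1 | Resize VM ------------------------------------------------------ 66.42s",
    "SET2 | Create NICs for dual nic VM ------------------------------------ 51.85s",
    "SET1 | Re-enable boot diagnostics on an existing VM where it was previously configured -- 35.57s",
]
totals = seconds_by_prefix(sample)
```

Note that these are per-task totals. When the sets overlap in time, their sum will be larger than the wall-clock duration of the run.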

samdoran added some commits Jun 3, 2019

Simplify inventory
Put everything in a single file
Pare down setup and teardown tasks
Only two setup and teardown tasks are actually shared across test VMs. The bulk of the tasks are really only needed for the first test VM.
Further refine tests
- give hosts meaningful names and rename task files to match
- move set_fact task to host_vars in inventory
- move network tasks to task file that can be reused only when needed

@samdoran samdoran changed the title [WIP] Split azure_rm_virtualmachine integration tests to prevent timeouts Split azure_rm_virtualmachine integration tests to prevent timeouts Jun 5, 2019

@ansibot ansibot removed the WIP label Jun 5, 2019

@samdoran samdoran changed the title Split azure_rm_virtualmachine integration tests to prevent timeouts [WIP] Split azure_rm_virtualmachine integration tests to prevent timeouts Jun 5, 2019

@ansibot ansibot added the WIP label Jun 5, 2019

samdoran added some commits Jun 5, 2019

Make storage account and availability set unique per host
This prevents race conditions on shared resources from causing intermittent failures
Rename hosts and split up tasks further
Most tasks now fire and forget VM deletion since that takes the longest by far. ansible-test will clean up everything in the resource group, so there is no risk of leaving behind assets in our CI infrastructure that would incur costs.

Rename hosts and task files to better reflect the tasks.
@samdoran

Member Author

commented Jun 6, 2019

I have split the tests out a bit more. The longest running set of tasks takes about 25 minutes. It waits for VM deletion and then removes all other resources created for that set of tasks.

The other task files use fire-and-forget VM deletion and therefore do not actually clean up other resources created for that set of tasks. This makes those tasks complete much more quickly, but we rely on ansible-test to do the final cleanup of the test resource groups.
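The fire-and-forget idea can be illustrated with a short Python sketch: the slow operation is started in the background and the caller moves on without waiting for it. The function names and the 0.2 second delay are hypothetical. In a playbook, this pattern is usually expressed with `async` and `poll: 0`, although the task definitions are not shown in this thread, so that is an assumption about how it was done here.

```python
import threading
import time


def slow_delete(vm_name, finished):
    """Hypothetical stand-in for a VM deletion that takes a while."""
    time.sleep(0.2)  # pretend the cloud API is doing the work
    finished.set()


def delete_without_waiting(vm_name):
    """Start the deletion in the background and return immediately."""
    finished = threading.Event()
    worker = threading.Thread(
        target=slow_delete, args=(vm_name, finished), daemon=True
    )
    worker.start()
    return finished  # the caller may ignore this or check it later


started = time.monotonic()
finished = delete_without_waiting("testvm")
returned_after = time.monotonic() - started  # time spent before moving on
```

The trade-off matches the one described above: the caller finishes sooner, but something else has to confirm that the resources are eventually gone.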

# azure_test_public_ip
===============================================================================
Deallocate the virtual machine ---------------------------------------- 658.40s
Create virtual machine with a single NIC and no boot diagnostics ------ 168.83s
Delete VM ------------------------------------------------------------- 128.86s
Re-enable boot diagnostics on an existing VM where it was previously configured - 102.80s
Resize VM -------------------------------------------------------------- 67.58s
Enable boot diagnostics on an existing VM for the first time without specifying a storage account -- 61.65s
Start the virtual machine ---------------------------------------------- 54.89s
Change the boot diagnostics storage account while enabled -------------- 35.81s
Disable boot diagnostics and change the storage account at the same time -- 35.71s
Restart the virtual machine -------------------------------------------- 35.02s
Create NIC for single nic VM ------------------------------------------- 34.53s
SETUP | Create virtual network ----------------------------------------- 28.28s
SETUP | Create storage account ----------------------------------------- 23.43s
Create security group -------------------------------------------------- 17.37s
Destroy security group ------------------------------------------------- 12.82s
Destroy virtual network ------------------------------------------------ 12.70s
Destroy subnet --------------------------------------------------------- 12.00s
Create public ip -------------------------------------------------------- 7.57s
SETUP | Add subnet ------------------------------------------------------ 6.00s
Should be idempotent with a single NIC ---------------------------------- 3.81s
Playbook run took 0 days, 0 hours, 25 minutes, 28 seconds
# azure_test_no_public_ip
===============================================================================
Create virtual machine without public ip address and with boot diagnostics enabled - 206.30s
SETUP | Create storage account ----------------------------------------- 22.13s
SETUP | Create virtual network ----------------------------------------- 17.65s
SETUP | Create subnet --------------------------------------------------- 6.41s
SETUP | Create availability set ----------------------------------------- 2.89s
Delete VM with no public ip --------------------------------------------- 0.41s
Ensure VM was created properly ------------------------------------------ 0.05s
include_tasks ----------------------------------------------------------- 0.04s
Include tasks based on inventory hostname ------------------------------- 0.04s
Playbook run took 0 days, 0 hours, 4 minutes, 15 seconds
# azure_test_minimal_invalid
===============================================================================
Create minimal VM with defaults --------------------------------------- 252.09s
Delete VM ------------------------------------------------------------- 139.80s
SETUP | Create storage account ----------------------------------------- 23.00s
SETUP | Create virtual network ----------------------------------------- 18.01s
SETUP | Add subnet ------------------------------------------------------ 6.92s
SETUP | Create availability set ----------------------------------------- 3.13s
Query NIC --------------------------------------------------------------- 1.86s
Query NSG --------------------------------------------------------------- 1.86s
Query public IP --------------------------------------------------------- 1.85s
Assert error finding missing custom image (dict style) ------------------ 1.61s
Assert error finding missing custom image ------------------------------- 1.48s
Assert error thrown with invalid image dict ----------------------------- 1.46s
Assert error thrown with invalid image type ----------------------------- 1.38s
Assert that autocreated resources were deleted -------------------------- 0.07s
Include tasks based on inventory hostname ------------------------------- 0.06s
include_tasks ----------------------------------------------------------- 0.04s
Playbook run took 0 days, 0 hours, 7 minutes, 34 seconds
# azure_test_dual_nic
Thursday 06 June 2019  15:30:36 +0000 (0:00:00.526)       0:05:37.771 *********
===============================================================================
Create virtual machine with two NICs ---------------------------------- 137.34s
Create NICs for dual NIC VM in secondary resource group ---------------- 84.09s
Generalize VM ---------------------------------------------------------- 34.87s
SETUP | Create storage account ----------------------------------------- 22.02s
SETUP | Create virtual network ----------------------------------------- 17.63s
Create virtual network in secondary resource group --------------------- 16.97s
SETUP | Create subnet --------------------------------------------------- 5.86s
Add subnet in secondary resource group ---------------------------------- 5.77s
Should be idempotent with a dual NICs ----------------------------------- 3.62s
SETUP | Create availability set ----------------------------------------- 3.04s
Retrieve facts by tags -------------------------------------------------- 1.99s
Retrieve VM facts (filtering by name) ----------------------------------- 1.88s
Gather facts and check if machine is generalized ------------------------ 1.74s
Delete dual NIC VM ------------------------------------------------------ 0.53s
Ensure facts module returned the second VM ------------------------------ 0.06s
Include tasks based on inventory hostname ------------------------------- 0.05s
Ensure power state is generalized --------------------------------------- 0.05s
Ensure nothing changed -------------------------------------------------- 0.04s
Ensure VM was created properly ------------------------------------------ 0.04s
Assert that facts module returned the second VM ------------------------- 0.04s
Playbook run took 0 days, 0 hours, 5 minutes, 37 seconds
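To put the four "Playbook run took" figures above in perspective, this small sketch compares running the hosts one after another with running them all at once. It assumes ideal parallelism with no contention between hosts, which a real CI run may not achieve.

```python
def to_seconds(minutes, seconds):
    return minutes * 60 + seconds


# "Playbook run took" figures reported above, one per host.
durations = {
    "azure_test_public_ip": to_seconds(25, 28),
    "azure_test_no_public_ip": to_seconds(4, 15),
    "azure_test_minimal_invalid": to_seconds(7, 34),
    "azure_test_dual_nic": to_seconds(5, 37),
}

serial_total = sum(durations.values())    # one host after another
parallel_total = max(durations.values())  # all hosts at once, ideal case
slowest_host = max(durations, key=durations.get)


def fmt(total):
    """Format a number of seconds as minutes and seconds."""
    return f"{total // 60}m{total % 60:02d}s"


print(fmt(serial_total), fmt(parallel_total), slowest_host)
```

Under that assumption the run is bounded by azure_test_public_ip, so further gains have to come from shortening or splitting that host's task set.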
@ansibot

Contributor

commented Jun 6, 2019

The test ansible-test sanity --test yamllint [explain] failed with 1 error:

test/integration/targets/azure_rm_virtualmachine/tasks/azure_test_public_ip.yml:353:1: empty-lines too many blank lines (1 > 0)


@ansibot ansibot added the ci_verified label Jun 6, 2019

@samdoran

Member Author

commented Jun 6, 2019

Here are the results from splitting out the deallocation tests:

# azure_test_deallocate
Thursday 06 June 2019  17:19:17 +0000 (0:00:00.455)       0:16:27.024 *********
===============================================================================
Deallocate the virtual machine ---------------------------------------- 616.08s
Create minimal VM with defaults --------------------------------------- 221.34s
Start the virtual machine ---------------------------------------------- 54.78s
Restart the virtual machine -------------------------------------------- 34.63s
SETUP | Create virtual network ----------------------------------------- 27.86s
SETUP | Create storage account ----------------------------------------- 22.34s
SETUP | Create subnet --------------------------------------------------- 6.41s
SETUP | Create availability set ----------------------------------------- 2.80s
Delete VM --------------------------------------------------------------- 0.46s
Include tasks based on inventory hostname ------------------------------- 0.06s
Ensure VM was deallocated ----------------------------------------------- 0.05s
Ensue VM was restarted -------------------------------------------------- 0.04s
include_tasks ----------------------------------------------------------- 0.04s
Ensure VM was started --------------------------------------------------- 0.04s
Playbook run took 0 days, 0 hours, 16 minutes, 26 seconds

@ansibot ansibot removed the ci_verified label Jun 6, 2019

@samdoran

Member Author

commented Jun 6, 2019

After moving those tests out, the first set of tasks is down to about twenty-one minutes from twenty-six.

# azure_test_public_ip
===============================================================================
Delete VM ------------------------------------------------------------- 644.34s
Create virtual machine with a single NIC and no boot diagnostics ------ 168.54s
Resize VM -------------------------------------------------------------- 67.81s
Re-enable boot diagnostics on an existing VM where it was previously configured -- 66.20s
Enable boot diagnostics on an existing VM for the first time without specifying a storage account -- 55.28s
Disable boot diagnostics and change the storage account at the same time -- 35.98s
Change the boot diagnostics storage account while enabled -------------- 35.72s
Create NIC for single nic VM ------------------------------------------- 34.56s
Create security group -------------------------------------------------- 23.85s
SETUP | Create storage account ----------------------------------------- 22.00s
SETUP | Create virtual network ----------------------------------------- 17.19s
Destroy virtual network ------------------------------------------------ 12.66s
Destroy security group ------------------------------------------------- 12.57s
Destroy subnet --------------------------------------------------------- 12.37s
Create public ip -------------------------------------------------------- 9.79s
SETUP | Create subnet --------------------------------------------------- 5.88s
Destroy storage account ------------------------------------------------- 4.04s
Should be idempotent with a single NIC ---------------------------------- 3.70s
SETUP | Create availability set ----------------------------------------- 2.74s
Destroy availability set ------------------------------------------------ 2.20s
Playbook run took 0 days, 0 hours, 20 minutes, 48 seconds
Run command: /usr/bin/python test/runner/versions.py

samdoran added some commits Jun 7, 2019

Further segment and refine tests
- use different networks per host
- use better names for parameters rather than using vm_name for everything
- split out invalid tests into its own thread
- have deallocate test destroy its resources
Specify virtual_network to avoid deletion failures
If the virtual_network is not specified, the default for the resource group is used. This is problematic because a thread may try to delete a subnet that another VM is still using while that VM's tests are running. Specifying the network avoids this because each VM uses its own network.
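One way to guarantee that each host gets its own non-overlapping network is to carve fixed-size blocks out of a larger range. The sketch below uses Python's standard `ipaddress` module. The base range `10.42.0.0/16`, the `/24` block size, and the helper name are assumptions chosen for illustration, not values taken from the test files.

```python
import ipaddress


def networks_per_host(hosts, base="10.42.0.0/16", prefix=24):
    """Give each host its own non-overlapping network carved from `base`."""
    blocks = ipaddress.ip_network(base).subnets(new_prefix=prefix)
    # Each call to next() hands out the next unused block.
    return {host: next(blocks) for host in hosts}


hosts = [
    "azure_test_public_ip",
    "azure_test_no_public_ip",
    "azure_test_dual_nic",
]
plan = networks_per_host(hosts)

for host, network in plan.items():
    print(host, network)
```

With a separate network per host, deleting one host's subnet cannot affect a VM that belongs to another host.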

@samdoran samdoran changed the title [WIP] Split azure_rm_virtualmachine integration tests to prevent timeouts Split azure_rm_virtualmachine integration tests to prevent timeouts Jun 7, 2019

@ansibot ansibot removed the WIP label Jun 7, 2019

@samdoran

Member Author

commented Jun 10, 2019

Rerunning tests again just to test reliability.

@samdoran

Member Author

commented Jun 10, 2019

The tests completed in 19 and 23 minutes. A good improvement.

@mattclay mattclay merged commit 99fd782 into ansible:devel Jun 13, 2019

1 check passed

Shippable Run 126846 status is SUCCESS.