@GutoVeronezi
Contributor

Description

ACS has an API to scale VMs, scaleVirtualMachine; however, dynamic scaling of VMs is limited to XenServer and VMware.

This PR intends to support live scaling of VMs on KVM.

When we are using a custom service offering, KVM's XML only receives the boot memory; with these changes it will also receive the max values defined in the service offering. If it is an unconstrained service offering, the max values will be those of the vm.serviceoffering.cpu.cores.max and vm.serviceoffering.ram.size.max settings. If they are 0, the max values will be the capacities of the host or last host.
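The fallback order described above can be sketched roughly as follows. This is only an illustration of the description, not the code in this PR: the function name `resolve_max_value` and its parameters are hypothetical, and the sketch assumes a constrained offering exposes its own max while an unconstrained one does not.

```python
from typing import Optional


def resolve_max_value(offering_max: Optional[int],
                      global_setting_max: int,
                      host_capacity: int) -> int:
    """Pick the max CPU cores or max RAM to put in the VM definition.

    offering_max: max defined in a constrained custom offering,
        or None for an unconstrained offering (assumption).
    global_setting_max: value of vm.serviceoffering.cpu.cores.max or
        vm.serviceoffering.ram.size.max.
    host_capacity: capacity of the host (or last host) of the VM.
    """
    # Constrained custom offering: use the offering's own max.
    if offering_max is not None:
        return offering_max
    # Unconstrained offering: use the global setting when it is set.
    if global_setting_max > 0:
        return global_setting_max
    # Setting is 0: fall back to the host (or last host) capacity.
    return host_capacity


print(resolve_max_value(256, 512, 2048))   # constrained offering
print(resolve_max_value(None, 512, 2048))  # unconstrained, setting defined
print(resolve_max_value(None, 0, 2048))    # unconstrained, setting is 0
```

The same function would be applied once for CPU cores and once for RAM, each with its own setting and host capacity.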

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

How Has This Been Tested?

I created unit tests for the code that is being introduced here. Moreover, it has been tested locally in a test lab. I created multiple VMs with multiple definitions and tested the API with them:

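| VM STATE | INIT CPU CORES | CPU CORES UP TO | INIT RAM SIZE | RAM SIZE UP TO | SERVICE OFFERING | RAM SERVICE OFFERING | CPU SERVICE OFFERING | MAX RAM CONFIG | MAX CPU CONFIG | EXPECTED RESULT |
|---|---|---|---|---|---|---|---|---|---|---|
| RUNNING | 1 | 3 | 64 | 128 | CONSTRAINED | 256 | 2 | 512 | 4 | FAIL |
| RUNNING | 1 | 2 | 64 | 257 | CONSTRAINED | 256 | 2 | 512 | 4 | FAIL |
| RUNNING | 1 | 2 | 128 | 64 | CONSTRAINED | 256 | 2 | 512 | 4 | FAIL |
| RUNNING | 2 | 1 | 64 | 64 | CONSTRAINED | 256 | 2 | 512 | 4 | FAIL |
| STOPPED | 1 | 3 | 64 | 128 | CONSTRAINED | 256 | 2 | 512 | 4 | FAIL |
| STOPPED | 1 | 2 | 64 | 257 | CONSTRAINED | 256 | 2 | 512 | 4 | FAIL |
| RUNNING | 1 | 2 | 64 | 128 | CONSTRAINED | 256 | 2 | 512 | 4 | SUCCESS |
| RUNNING | 1 | 1 | 64 | 128 | CONSTRAINED | 256 | 2 | 512 | 4 | SUCCESS |
| RUNNING | 1 | 2 | 64 | 64 | CONSTRAINED | 256 | 2 | 512 | 4 | SUCCESS |
| STOPPED | 1 | 2 | 128 | 64 | CONSTRAINED | 256 | 2 | 512 | 4 | SUCCESS |
| STOPPED | 2 | 1 | 64 | 64 | CONSTRAINED | 256 | 2 | 512 | 4 | SUCCESS |
| STOPPED | 1 | 1 | 64 | 128 | CONSTRAINED | 256 | 2 | 512 | 4 | SUCCESS |
| STOPPED | 1 | 1 | 128 | 64 | CONSTRAINED | 256 | 2 | 512 | 4 | SUCCESS |
| RUNNING | 1 | 2 | 128 | 64 | UNCONSTRAINED |  |  | 512 | 4 | FAIL |
| RUNNING | 2 | 1 | 64 | 64 | UNCONSTRAINED |  |  | 512 | 4 | FAIL |
| STOPPED | 1 | 2 | 128 | 64 | UNCONSTRAINED |  |  | 512 | 4 | SUCCESS |
| STOPPED | 2 | 1 | 64 | 64 | UNCONSTRAINED |  |  | 512 | 4 | SUCCESS |
| RUNNING | 1 | 3 | 64 | 128 | UNCONSTRAINED |  |  | 512 | 4 | FAIL |
| STOPPED | 1 | 1 | 64 | 4000 | UNCONSTRAINED |  |  | 512 | 4 | FAIL |
| RUNNING | 1 | 2 | 64 | 128 | UNCONSTRAINED |  |  | 512 | 4 | SUCCESS |
| RUNNING | 1 | 1 | 64 | 128 | UNCONSTRAINED |  |  | 512 | 4 | SUCCESS |
| RUNNING | 1 | 2 | 64 | 64 | UNCONSTRAINED |  |  | 512 | 4 | SUCCESS |
| RUNNING | 1 | 2 | 64 | 257 | UNCONSTRAINED |  |  | 512 | 4 | SUCCESS |
| RUNNING | 1 | 1 | 64 | 700 | UNCONSTRAINED |  |  | 0 | 0 | SUCCESS |
| STOPPED | 1 | 2 | 64 | 128 | UNCONSTRAINED |  |  | 512 | 4 | SUCCESS |
| STOPPED | 1 | 1 | 64 | 128 | UNCONSTRAINED |  |  | 512 | 4 | SUCCESS |
| STOPPED | 1 | 2 | 64 | 64 | UNCONSTRAINED |  |  | 512 | 4 | SUCCESS |
| STOPPED | 1 | 2 | 64 | 257 | UNCONSTRAINED |  |  | 512 | 4 | SUCCESS |
| STOPPED | 1 | 1 | 64 | 700 | UNCONSTRAINED |  |  | 512 | 4 | FAIL |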

@DaanHoogland
Contributor

@wido @weizhouapache @ravening @GabrielBrascher, can you guys look at this please?

@DaanHoogland DaanHoogland added this to the 4.16.0.0 milestone Mar 29, 2021
@GutoVeronezi GutoVeronezi marked this pull request as draft April 1, 2021 13:07
@GutoVeronezi GutoVeronezi changed the title Support vm dynamic scalling with kvm Support vm dynamic scaling with kvm Apr 7, 2021
@GutoVeronezi GutoVeronezi deleted the support-vm-dynamic-scalling-with-kvm branch April 7, 2021 13:45
@GutoVeronezi GutoVeronezi restored the support-vm-dynamic-scalling-with-kvm branch April 7, 2021 13:46
@GutoVeronezi GutoVeronezi reopened this Apr 7, 2021
@GutoVeronezi GutoVeronezi marked this pull request as ready for review April 7, 2021 15:24
@wido wido requested a review from GabrielBrascher April 7, 2021 15:43
@wido
Contributor

wido commented Apr 7, 2021

PR #4341 added the ability to configure the rootDiskSize in an offering.

When we scale the VM, we should also scale the root disk's size if it is bigger in the new offering.

This PR does not seem to do that. We probably want that added as well.

I really like this PR though. Very much welcome!

@GutoVeronezi
Contributor Author

@wido We already have the API resizeVolume, which is designed to scale the root disk as well as data disks, and it works with VMs in stopped or running states.

This API scaleVirtualMachine was first designed to scale computing resources (memory, cpuNumber and cpuSpeed). I think we should not mix these two concepts here.
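To make the separation concrete, here is a rough sketch of how the two calls could be kept apart on the client side. It only builds request parameters and sends nothing; the helper names are hypothetical, and the parameter names (id, serviceofferingid, details, size) are taken from my reading of the public API docs, so treat them as assumptions to verify against your ACS version.

```python
from typing import Dict, Optional


def scale_vm_params(vm_id: str,
                    service_offering_id: str,
                    cpu_number: Optional[int] = None,
                    cpu_speed: Optional[int] = None,
                    memory: Optional[int] = None) -> Dict[str, str]:
    """Build parameters for scaleVirtualMachine (compute resources only)."""
    params = {
        "command": "scaleVirtualMachine",
        "id": vm_id,
        "serviceofferingid": service_offering_id,
    }
    # Custom offerings take their values through the details map;
    # only the values that were actually given are included.
    custom = {"cpuNumber": cpu_number, "cpuSpeed": cpu_speed, "memory": memory}
    for key, value in custom.items():
        if value is not None:
            params[f"details[0].{key}"] = str(value)
    return params


def resize_volume_params(volume_id: str, size_gb: int) -> Dict[str, str]:
    """Build parameters for resizeVolume (root or data disk)."""
    return {"command": "resizeVolume", "id": volume_id, "size": str(size_gb)}


# Placeholder IDs, for illustration only.
print(scale_vm_params("vm-uuid", "offering-uuid", cpu_number=2, memory=128))
print(resize_volume_params("volume-uuid", 20))
```

A client that wants both changes would issue the two requests separately, which keeps compute scaling and disk resizing independent.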

@wido
Contributor

wido commented Apr 8, 2021

> @wido We already have the API resizeVolume, which is designed to scale the root disk as well as data disks, and it works with VMs in stopped or running states.
>
> This API scaleVirtualMachine was first designed to scale computing resources (memory, cpuNumber and cpuSpeed). I think we should not mix these two concepts here.

After reading the code and comments from @GabrielBrascher I understand this code only works with custom service offerings, correct?

If so, please disregard my comments.

Otherwise, with PR #4341 merged we now have the option to make the root disk size a condition of the service offering. Thus when changing the offering one needs to make sure that the root disk is also resized according to the new offering.

@GutoVeronezi GutoVeronezi force-pushed the support-vm-dynamic-scalling-with-kvm branch from 3006d61 to a087de9 Compare April 19, 2021 22:48
@GutoVeronezi GutoVeronezi force-pushed the support-vm-dynamic-scalling-with-kvm branch from a087de9 to 3ac82ba Compare April 27, 2021 18:31
@GutoVeronezi
Contributor Author

Can anyone review this?

@DaanHoogland
Contributor

@blueorangutan package

@blueorangutan

@DaanHoogland a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian. SL-JID 838

@DaanHoogland
Contributor

@blueorangutan test

@blueorangutan

@DaanHoogland a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-1601)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 48404 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4878-t1601-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_diagnostics.py
Intermittent failure detected: /marvin/tests/smoke/test_domain_service_offerings.py
Intermittent failure detected: /marvin/tests/smoke/test_internal_lb.py
Intermittent failure detected: /marvin/tests/smoke/test_kubernetes_clusters.py
Intermittent failure detected: /marvin/tests/smoke/test_primary_storage.py
Intermittent failure detected: /marvin/tests/smoke/test_service_offerings.py
Intermittent failure detected: /marvin/tests/smoke/test_snapshots.py
Intermittent failure detected: /marvin/tests/smoke/test_usage.py
Intermittent failure detected: /marvin/tests/smoke/test_vm_deployment_planner.py
Intermittent failure detected: /marvin/tests/smoke/test_vm_life_cycle.py
Intermittent failure detected: /marvin/tests/smoke/test_volumes.py
Smoke tests completed. 80 look OK, 8 have error(s)
Only failed tests results shown below:

| Test | Result | Time (s) | Test File |
|---|---|---|---|
| test_01_add_primary_storage_disabled_host | Error | 0.14 | test_primary_storage.py |
| test_03_migration_options_storage_tags | Error | 0.01 | test_primary_storage.py |
| test_01_internallb_roundrobin_1VPC_3VM_HTTP_port80 | Failure | 448.37 | test_internal_lb.py |
| test_02_internallb_roundrobin_1RVPC_3VM_HTTP_port80 | Failure | 542.12 | test_internal_lb.py |
| test_01_invalid_upgrade_kubernetes_cluster | Failure | 670.76 | test_kubernetes_clusters.py |
| test_02_deploy_and_upgrade_kubernetes_cluster | Failure | 78.29 | test_kubernetes_clusters.py |
| test_03_deploy_and_scale_kubernetes_cluster | Failure | 66.51 | test_kubernetes_clusters.py |
| test_04_basic_lifecycle_kubernetes_cluster | Failure | 59.36 | test_kubernetes_clusters.py |
| test_05_delete_kubernetes_cluster | Failure | 76.61 | test_kubernetes_clusters.py |
| test_07_deploy_kubernetes_ha_cluster | Failure | 0.21 | test_kubernetes_clusters.py |
| test_08_deploy_and_upgrade_kubernetes_ha_cluster | Failure | 3.22 | test_kubernetes_clusters.py |
| test_09_delete_kubernetes_ha_cluster | Failure | 0.24 | test_kubernetes_clusters.py |
| ContextSuite context=TestKubernetesCluster>:teardown | Error | 173.56 | test_kubernetes_clusters.py |
| ContextSuite context=TestCpuCapServiceOfferings>:setup | Error | 0.00 | test_service_offerings.py |
| test_02_list_snapshots_with_removed_data_store | Error | 0.27 | test_snapshots.py |
| ContextSuite context=TestSnapshotUsage>:setup | Error | 150.40 | test_usage.py |
| test_01_deploy_vm_on_specific_host | Error | 1.13 | test_vm_deployment_planner.py |
| test_04_deploy_vm_on_host_override_pod_and_cluster | Error | 1.65 | test_vm_deployment_planner.py |
| test_01_secure_vm_migration | Error | 85.92 | test_vm_life_cycle.py |
| test_02_unsecure_vm_migration | Error | 225.57 | test_vm_life_cycle.py |
| test_04_nonsecured_to_secured_vm_migration | Error | 148.24 | test_vm_life_cycle.py |
| test_08_migrate_vm | Error | 0.05 | test_vm_life_cycle.py |

@GutoVeronezi GutoVeronezi force-pushed the support-vm-dynamic-scalling-with-kvm branch from e5ab57c to 4cde5e3 Compare August 17, 2021 12:17
Member

@GabrielBrascher GabrielBrascher left a comment

@GutoVeronezi overall it looks good. I've raised a few points to change regarding the source code headers, so that they meet the Apache license requirements for Apache projects.

@@ -0,0 +1,43 @@
/*
* Copyright 2021 The Apache Software Foundation.
Member

Please check the Copyright

@@ -0,0 +1,246 @@
/*
* Copyright 2021 The Apache Software Foundation.
Member

Please check the Copyright

@@ -0,0 +1,105 @@
/*
* Copyright 2021 The Apache Software Foundation.
Member

Please check the Copyright

@@ -0,0 +1,45 @@
/*
* Copyright 2021 The Apache Software Foundation.
Member

@GutoVeronezi I think that these source code headers need to be updated to remove the Copyright line, as we discussed in another PR (sorry, but I forgot which one xD)

@GabrielBrascher
Member

GabrielBrascher commented Aug 19, 2021

Manual tests LGTM.
I will "officially" approve when the license headers are updated 👍 .

@GutoVeronezi
Contributor Author

@GabrielBrascher done, thanks.

Member

@GabrielBrascher GabrielBrascher left a comment

Thanks for the PR, @GutoVeronezi.
LGTM, based on manual tests and code review.

@DaanHoogland
Contributor

@blueorangutan package

@blueorangutan

@DaanHoogland a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 952

@DaanHoogland
Contributor

@blueorangutan test

@blueorangutan

@DaanHoogland a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-1736)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 33654 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4878-t1736-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_snapshots.py
Smoke tests completed. 89 look OK, 0 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File
