Improve migration of external VMware VMs into KVM cluster #8815

Open
wants to merge 33 commits into
base: 4.19

Conversation

sureshanaparti
Contributor

@sureshanaparti sureshanaparti commented Mar 20, 2024

Description

This PR improves the performance of migrating (importing) external VMware VMs into a KVM cluster.

The improved migration process still uses 'virt-v2v' to import the VMware VMs, but it now uses the exported OVF of the VMware VM as the source for the conversion to a KVM instance, instead of the cloned VM (the earlier migration accessed vCenter with 'virt-v2v' from the KVM host). For this:

  • When no conversion host is specified, CloudStack tries to pick a host that supports conversion, if one exists; otherwise it uses a random Up & Enabled host as the conversion host. It first checks whether the host supports conversion (with virt-v2v, and nbdkit for Ubuntu hosts).
  • CloudStack then exports the OVF template files of the source VMware VM (or of a cloned VM: for powered-on VMs, it creates a clone of the source VM on VMware) to the temporary conversion location. This location is on primary or secondary storage; only NFS pools are supported, and secondary storage is the default. Any cloned VM and OVF template files are cleaned up after the migration, or if any error occurs.
  • CloudStack then delegates the conversion to a KVM host. The KVM host uses the exported OVF on the temporary conversion location as the source for the migration, via the 'virt-v2v' ova input. It creates the temporary disks and XML on the temporary conversion location; these are later moved to the destination storage pools to import them. Errors during the virt-v2v conversion, due to an unsupported guest OS or other reasons, are handled, and the conversion operation fails.
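
The host selection described in the first step can be sketched roughly as follows. This is a minimal illustration, not CloudStack's code: the `Host` class, its fields, and both function names are hypothetical, and it assumes that a capable host must also be Up & Enabled to be picked.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    # Hypothetical, simplified view of a KVM host (not a CloudStack class).
    name: str
    state: str = "Up"
    resource_state: str = "Enabled"
    is_ubuntu: bool = False
    has_virt_v2v: bool = False
    has_nbdkit: bool = False


def supports_conversion(host: Host) -> bool:
    """virt-v2v is always required; nbdkit is also required on Ubuntu hosts."""
    if not host.has_virt_v2v:
        return False
    return host.has_nbdkit if host.is_ubuntu else True


def pick_conversion_host(hosts: List[Host], requested: Optional[Host] = None) -> Host:
    """Use the requested host if given, else a capable host, else any Up & Enabled host."""
    if requested is not None:
        return requested
    usable = [h for h in hosts if h.state == "Up" and h.resource_state == "Enabled"]
    if not usable:
        raise RuntimeError("no Up & Enabled KVM host available for conversion")
    capable = [h for h in usable if supports_conversion(h)]
    # Fall back to a random usable host when none advertises conversion support.
    return capable[0] if capable else random.choice(usable)
```

In this sketch the fallback host may turn out not to support conversion; per the description above, that is caught by the support check before the migration is attempted.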

Noticed a 70-90% decrease in migration time with this approach (some earlier Linux VM migrations with 3-5 GB disks that took 30-35 mins now take 2-3 mins).
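
As a quick check of the example figures, the smallest and largest reduction can be computed from the quoted timings. This is only arithmetic on the numbers above, not a new measurement; pairing 30 mins with 3 mins and 35 mins with 2 mins is an assumption about which runs are compared.

```python
def percent_decrease(before_min: float, after_min: float) -> float:
    """Relative reduction in duration, as a percentage of the original duration."""
    return (before_min - after_min) / before_min * 100


# Shortest old run against longest new run gives the smallest reduction.
smallest = percent_decrease(30, 3)
# Longest old run against shortest new run gives the largest reduction.
largest = percent_decrease(35, 2)

print(f"{smallest:.1f}% to {largest:.1f}%")
```

Under that pairing, the quoted example sits at or slightly above the upper end of the 70-90% range.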

The set of supported VMs on vCenter is unchanged: Stopped & Running Linux VMs and Stopped Windows VMs are supported as before. Additionally, Stopped VMs on standalone hosts are supported (they can be imported by specifying the host's default datacenter name, ha-datacenter, along with the host and credentials).

Note: All 'virt-v2v' limitations still apply. CloudStack does not check guest OS compatibility with the virt-v2v library; see https://access.redhat.com/articles/1351473.

Doc PR: apache/cloudstack-documentation#388

Other improvements in the migration process:

  • MS checks the host for instance conversion support (with virt-v2v, and nbdkit for Ubuntu hosts) and for Windows guest conversion support (virtio-win package) before attempting the migration (using CheckConvertInstanceCommand / CheckConvertInstanceAnswer).
  • MS checks whether ovftool is available on the host to export the OVF from vCenter directly; if not, MS is used to export the OVF.
  • Auto-selects any existing host with instance conversion capability if no conversion host is specified (uses the detail 'host.instance.conversion').
  • New parameter forcemstoimportvmfiles in the importVM API, to force MS to import the VM files/OVF if required. Added UI support for it.
  • New detail 'host.instance.conversion' with value 'true' is added to host_details table if the host supports instance conversion (needs agent restart after installing virt-v2v).
  • New host response parameter 'instanceconversionsupported' - true, indicating the instance conversion support.
  • UI shows Instance Conversion Supported - true in Host details if host supports conversion (with virt-v2v, and nbdkit for Ubuntu host).
  • UI shows Supported KVM hosts (with virt-v2v) for conversion on Import VM dialog.
  • Support for parallel import/download of OVF disk files on MS or KVM Host, using threads that are configurable using the global settings: threads.on.ms.to.import.vmware.vm.files or threads.on.kvm.host.to.import.vmware.vm.files respectively.
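
The parallel import of OVF disk files can be pictured as a bounded thread pool. The sketch below is an illustration only, not the PR's code: `import_disk_files`, `fake_fetch`, and the `threads` argument are hypothetical names, with `threads` standing in for the value of one of the global settings named above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def import_disk_files(
    disk_files: List[str],
    fetch: Callable[[str], int],
    threads: int,
) -> Dict[str, int]:
    """Fetch every disk file with at most `threads` workers; return bytes per file."""
    workers = max(1, threads)  # guard against a zero or negative setting
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps the input order and re-raises a worker's exception here.
        sizes = list(pool.map(fetch, disk_files))
    return dict(zip(disk_files, sizes))


def fake_fetch(name: str) -> int:
    # Stand-in for a real download; returns a made-up size derived from the name.
    return len(name) * 1024


result = import_disk_files(["disk-0.vmdk", "disk-1.vmdk"], fake_fetch, threads=2)
```

A pool bounded by a setting keeps a VM with many disks from opening an unbounded number of simultaneous transfers against the source.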

Also, fixes: #8632

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

Host details (Instance Conversion Supported for KVM hosts with virt-v2v):

HostDetails-InstanceConversionSupported

Import VM from VMware (Supported KVM hosts with virt-v2v):

VmwareImport-KVMHostsSupported

Flag - to Force MS to import/download VM files (OVF):

VMwareMigrationToKVM_ForceMSFlag_Updated

Threads config - to import/download VM files

ThreadsConfig_VMwareMigration_Updated

How Has This Been Tested?

Manually tested importing external VMware VMs with single and multiple disks (on different datastores) into a KVM cluster in CloudStack.

How did you try to break this feature and the system with this change?

@sureshanaparti
Contributor Author

@blueorangutan package

@sureshanaparti sureshanaparti self-assigned this Mar 20, 2024
@sureshanaparti sureshanaparti added this to the 4.19.1.0 milestone Mar 20, 2024
@blueorangutan

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.


codecov bot commented Mar 20, 2024

Codecov Report

Attention: Patch coverage is 26.35135% with 109 lines in your changes are missing coverage. Please review.

Project coverage is 30.97%. Comparing base (308ed13) to head (3ebee84).
Report is 4 commits behind head on 4.19.

Files                                                  Patch %  Lines
...ain/java/com/cloud/hypervisor/guru/VMwareGuru.java  0.00%    66 Missing ⚠️
.../storage/datastore/db/PrimaryDataStoreDaoImpl.java  0.00%    14 Missing ⚠️
.../apache/cloudstack/vm/UnmanagedVMsManagerImpl.java  66.66%   7 Missing and 5 partials ⚠️
...tack/engine/orchestration/NetworkOrchestrator.java  0.00%    7 Missing ⚠️
.../wrapper/LibvirtConvertInstanceCommandWrapper.java  70.58%   3 Missing and 2 partials ⚠️
.../java/com/cloud/hypervisor/HypervisorGuruBase.java  0.00%    2 Missing ⚠️
...m/cloud/hypervisor/vmware/mo/VirtualMachineMO.java  0.00%    2 Missing ⚠️
...va/com/cloud/agent/api/ConvertInstanceCommand.java  66.66%   1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               4.19    #8815      +/-   ##
============================================
+ Coverage     30.92%   30.97%   +0.05%     
- Complexity    34290    34372      +82     
============================================
  Files          5355     5355              
  Lines        376634   376724      +90     
  Branches      54808    54823      +15     
============================================
+ Hits         116480   116705     +225     
+ Misses       244820   244661     -159     
- Partials      15334    15358      +24     
Flag Coverage Δ
simulator-marvin-tests 24.85% <0.00%> (+0.10%) ⬆️
uitests 4.39% <ø> (ø)
unit-tests 16.56% <26.35%> (-0.01%) ⬇️
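
The headline figures in this report can be cross-checked against the per-file counts with a few lines of arithmetic. This is a sketch for the reader, not Codecov output: the total of 148 changed lines is inferred from the reported percentage, not stated in the report, and counting partials as uncovered is an assumption.

```python
# Missing plus partial line counts per file, in the order listed in the report.
uncovered_per_file = [66, 14, 7 + 5, 7, 3 + 2, 2, 2, 1]
uncovered = sum(uncovered_per_file)

reported_patch_pct = 26.35135  # patch coverage quoted in the report

# Infer the number of changed lines from the uncovered share (an inference).
total_changed = round(uncovered / (1 - reported_patch_pct / 100))
covered = total_changed - uncovered

recomputed_pct = covered / total_changed * 100
print(uncovered, total_changed, covered, round(recomputed_pct, 5))
```

The per-file counts add up to the 109 uncovered lines quoted in the report, and the recomputed percentage matches the reported patch coverage to five decimal places.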


@sureshanaparti sureshanaparti changed the title from "Improve migration of VMware VMs into KVM cluster" to "Improve migration of external VMware VMs into KVM cluster" Mar 20, 2024
@blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8985

@sureshanaparti
Contributor Author

@blueorangutan test matrix

@blueorangutan

@sureshanaparti a [SL] Trillian-Jenkins matrix job (centos7 mgmt + xenserver71, rocky8 mgmt + vmware67u3, centos7 mgmt + kvmcentos7) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-9541)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 41640 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8815-t9541-kvm-centos7.zip
Smoke tests completed. 129 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

@blueorangutan

[SF] Trillian test result (tid-9539)
Environment: xenserver-71 (x2), Advanced Networking with Mgmt server 7
Total time taken: 45973 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8815-t9539-xenserver-71.zip
Smoke tests completed. 128 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_02_trigger_shutdown Failure 336.63 test_safe_shutdown.py

@blueorangutan

[SF] Trillian test result (tid-9540)
Environment: vmware-67u3 (x2), Advanced Networking with Mgmt server r8
Total time taken: 46712 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8815-t9540-vmware-67u3.zip
Smoke tests completed. 129 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

Contributor

@nvazquez nvazquez left a comment


Code LGTM

@sureshanaparti
Contributor Author

@blueorangutan package

@blueorangutan

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 9003

@sureshanaparti
Contributor Author

@blueorangutan test matrix

@blueorangutan

@sureshanaparti a [SL] Trillian-Jenkins matrix job (centos7 mgmt + xenserver71, rocky8 mgmt + vmware67u3, centos7 mgmt + kvmcentos7) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-9553)
Environment: xenserver-71 (x2), Advanced Networking with Mgmt server 7
Total time taken: 41092 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8815-t9553-xenserver-71.zip
Smoke tests completed. 128 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_02_trigger_shutdown Failure 336.82 test_safe_shutdown.py

@blueorangutan

[SF] Trillian test result (tid-9554)
Environment: vmware-67u3 (x2), Advanced Networking with Mgmt server r8
Total time taken: 47384 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8815-t9554-vmware-67u3.zip
Smoke tests completed. 128 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_02_balanced_drs_algorithm Failure 128.96 test_cluster_drs.py

Contributor

@DaanHoogland DaanHoogland left a comment


clgtm

…om VMware, instead of OVA file

 - this would further increase the migration performance (as it reduces the time for OVA preparation / archiving of the VM files into a single file)
…g OVF from MS, and other changes below.

- Skip clone for powered off VMs
- Fixes to support standalone host (with its default datacenter)
- Some code improvements
@sureshanaparti sureshanaparti force-pushed the vmware-to-kvm-migration-improvements branch from 449c158 to 3cb750d Compare June 24, 2024 20:13
@sureshanaparti
Contributor Author

@blueorangutan package

@blueorangutan

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 10100

@sureshanaparti
Contributor Author

@blueorangutan test matrix

@blueorangutan

@sureshanaparti a [SL] Trillian-Jenkins matrix job (centos7 mgmt + xenserver71, rocky8 mgmt + vmware67u3, centos7 mgmt + kvmcentos7) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-10603)
Environment: xenserver-71 (x2), Advanced Networking with Mgmt server 7
Total time taken: 42055 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8815-t10603-xenserver-71.zip
Smoke tests completed. 131 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

@blueorangutan

[SF] Trillian test result (tid-10605)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 46262 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8815-t10605-kvm-centos7.zip
Smoke tests completed. 131 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

@blueorangutan

[SF] Trillian test result (tid-10604)
Environment: vmware-67u3 (x2), Advanced Networking with Mgmt server r8
Total time taken: 79916 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8815-t10604-vmware-67u3.zip
Smoke tests completed. 128 look OK, 3 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test                                                Result   Time (s)  Test File
test_01_restore_vm                                  Error    3618.07   test_restore_vm.py
test_02_restore_vm_allocated_root                   Error    6.73      test_restore_vm.py
test_01_deploy_vm_on_specific_host                  Error    13.64     test_vm_deployment_planner.py
test_02_deploy_vm_on_specific_cluster               Error    3602.65   test_vm_deployment_planner.py
test_03_deploy_vm_on_specific_pod                   Error    4.44      test_vm_deployment_planner.py
test_04_deploy_vm_on_host_override_pod_and_cluster  Error    2.47      test_vm_deployment_planner.py
test_05_deploy_vm_on_cluster_override_pod           Error    2.36      test_vm_deployment_planner.py
test_09_expunge_vm                                  Failure  425.69    test_vm_life_cycle.py

@sureshanaparti
Contributor Author

@blueorangutan test rocky8 vmware-70u3

@blueorangutan

@sureshanaparti a [SL] Trillian-Jenkins test job (rocky8 mgmt + vmware-70u3) has been kicked to run smoke tests


This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.
