CLOUDSTACK-9428: Fix for CLOUDSTACK-9211 - Improve performance of 3D GPU support in cloud-plugin-hypervisor-vmware #1605

Merged
merged 2 commits into from Sep 11, 2016

nvazquez
Contributor

@nvazquez nvazquez commented Jul 5, 2016

JIRA TICKET: https://issues.apache.org/jira/browse/CLOUDSTACK-9428

Introduction

PR #1310 added support on VMware for passing the vRAM size to enable 3D GPU. We found that its performance can be improved by removing an extra API call, as described below.

Improvement

On VMware, VmwareResource manages the execution of StartCommand. Before sending the power-on command to the ESXi hypervisor, the VM is configured by calling the reconfigVMTask web method on the vSphere client's VimPortType web service.
We found that this method was called twice when a vRAM size was passed, because the video card change was put into a new VM config spec that edited only the video card and required an extra call to reconfigVMTask.

We propose removing the extra web service call by adding the video card change to the VM's existing config spec. This way the video card is properly configured (when a vRAM size is passed) in the same reconfigure call, which improves performance.
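
To make the difference concrete, here is a minimal sketch of the two flows. `SketchConfigSpec`, `SketchVm` and both `startWith...` methods are hypothetical stand-ins written for illustration; they are not the vSphere SDK types or the actual `VmwareResource` code, and the string entries only mimic device changes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a VM config spec; not the real vim25 class.
class SketchConfigSpec {
    final List<String> deviceChanges = new ArrayList<>();
}

// Hypothetical stand-in for a VM handle that counts reconfigure round trips.
class SketchVm {
    int reconfigureCalls = 0;
    final List<String> appliedChanges = new ArrayList<>();

    // Mimics one reconfigVMTask call to vSphere.
    void reconfigure(SketchConfigSpec spec) {
        reconfigureCalls++;
        appliedChanges.addAll(spec.deviceChanges);
    }
}

class SingleReconfigSketch {
    // Before: the video card change travels in its own spec, costing a second call.
    static void startWithSeparateVideoCardSpec(SketchVm vm, long vramKb) {
        SketchConfigSpec vmSpec = new SketchConfigSpec();
        vmSpec.deviceChanges.add("base-devices");
        vm.reconfigure(vmSpec);

        SketchConfigSpec videoSpec = new SketchConfigSpec();
        videoSpec.deviceChanges.add("video-card vram=" + vramKb);
        vm.reconfigure(videoSpec);
    }

    // After: the video card change is appended to the spec that is sent anyway.
    static void startWithMergedSpec(SketchVm vm, long vramKb) {
        SketchConfigSpec vmSpec = new SketchConfigSpec();
        vmSpec.deviceChanges.add("base-devices");
        vmSpec.deviceChanges.add("video-card vram=" + vramKb);
        vm.reconfigure(vmSpec);
    }

    public static void main(String[] args) {
        SketchVm before = new SketchVm();
        startWithSeparateVideoCardSpec(before, 131072);
        SketchVm after = new SketchVm();
        startWithMergedSpec(after, 131072);
        System.out.println("calls before: " + before.reconfigureCalls
                + ", after: " + after.reconfigureCalls);
    }
}
```

Both flows apply the same changes; only the number of round trips differs (two versus one in this sketch).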

Use case (passing vRAM size)

  • Deploy a new VM, let its id be X
  • Stop VM
  • Execute SQL, where X is vm's id and Z is vRAM size (in kB):
INSERT INTO cloud.user_vm_details (vm_id, name, value) VALUES (X, 'mks.enable3d', 'true');
INSERT INTO cloud.user_vm_details (vm_id, name, value) VALUES (X, 'mks.use3dRenderer', 'automatic');
INSERT INTO cloud.user_vm_details (vm_id, name, value) VALUES (X, 'svga.autodetect', 'false');
INSERT INTO cloud.user_vm_details (vm_id, name, value) VALUES (X, 'svga.vramSize', Z);
  • Start VM

@rohityadavcloud
Member

In general, a good idea. I've not tested it though. @nvazquez can you re-kick Jenkins by force-pushing (push -f) to the PR?

 */
-protected void postVideoCardMemoryConfigBeforeStart(VirtualMachineMO vmMo, VirtualMachineTO vmSpec) {
+protected void videoCardMemoryConfig(VirtualMachineMO vmMo, VirtualMachineTO vmSpec, VirtualMachineConfigSpec vmConfigSpec) {
 String paramVRamSize = "svga.vramSize";

👍 Thanks @nvazquez for this PR. It does indeed remove an additional API call.
I'd suggest a few minor changes though.
Please move the string constant "svga.vramSize" to VmDetailConstants, which is the placeholder for VM detail constants.

Contributor Author

Done, thanks!

@nvazquez
Contributor Author

nvazquez commented Jul 6, 2016

Thanks @rhtyd @sateesh-chodapuneedi for your reviews! I pushed new changes based on @sateesh-chodapuneedi's comments, and Jenkins now passes.

@sateesh-chodapuneedi
Member

@nvazquez Thanks.
I will try this in my setup and share the results.

@serg38

serg38 commented Aug 4, 2016

Ping for review -- @sateesh-chodapuneedi, @rhtyd, @koushik-das

@nvazquez nvazquez force-pushed the fixVram branch 2 times, most recently from e78a69a to d21c485 Compare August 5, 2016 20:30
@rohityadavcloud
Member

@blueorangutan package

@blueorangutan

@rhtyd a Trillian-Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✖centos6 ✔centos7 ✔debian repo: http://packages.shapeblue.com/cloudstack/pr/1605

@serg38

serg38 commented Aug 16, 2016

@rafaelweingartner @swill Will you be able to review this one?

@sateesh-chodapuneedi
Member

LGTM 👍

@nvazquez
Contributor Author

Thanks @sateesh-chodapuneedi!
You mentioned before that you would test this in your env. Can you please share your test results?

-VirtualMachineConfigSpec changeVideoCardSpecs = new VirtualMachineConfigSpec();
-changeVideoCardSpecs.getDeviceChange().add(arrayVideoCardConfigSpecs);
-return changeVideoCardSpecs;
+vmConfigSpec.getDeviceChange().add(arrayVideoCardConfigSpecs);
Contributor

Is it possible for getDeviceChange to return null?

I don't think so. The device change is made locally in the spec structure, and an exception occurs during VM reconfiguration on the hypervisor if anything in the config is wrong.
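
For context on the null question: classes generated by JAXB usually expose list properties through a getter that creates the list on first access, so the getter itself never returns null. The sketch below shows that getter shape on a hypothetical class, `LazyListSpecSketch`. It assumes the vSphere bindings are generated this way; it is not a copy of their source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class mimicking the getter shape JAXB's xjc emits for list properties.
class LazyListSpecSketch {
    private List<String> deviceChange; // starts out null

    List<String> getDeviceChange() {
        // The list is created lazily, so callers always get a live, non-null list.
        if (deviceChange == null) {
            deviceChange = new ArrayList<>();
        }
        return deviceChange;
    }

    public static void main(String[] args) {
        LazyListSpecSketch spec = new LazyListSpecSketch();
        spec.getDeviceChange().add("video-card-edit"); // safe on a fresh object
        System.out.println(spec.getDeviceChange().size());
    }
}
```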

@jburwell
Contributor

@nvazquez could you please create Marvin test case to verify the use case in the PR description?

@serg38

serg38 commented Aug 23, 2016

@jburwell Support for passing vGPU parameters is already implemented via PR1310, which has a test case. This PR is only a performance optimization that removes an extra reconfigVMTask call, which saves about 5 seconds during VM startup.

@rafaelweingartner
Member

@nvazquez,
Very nice proposal this one. I have only very small suggestions, which are the following:

  • The method “videoCardConfig” would be better named “configureVideoCard”. I treat methods as actions, so they normally start with a verb. This is cosmetic, but I think it makes the code easier to follow.
  • Method “modifyVmVideoCardVRamSize” does not need the “throws Exception” clause.
  • Would you mind creating a test case for “modifyVmVideoCardVRamSize”? It is a very simple integration test case and pretty easy to do with mocks. If you need any help, just send me an email.
  • If you remove “throws Exception” from “modifyVmVideoCardVRamSize”, you can also remove it from “setNewVRamSizeVmVideoCard”.
  • Would you mind creating test cases for the method “setNewVRamSizeVmVideoCard”?
  • In method “videoCardConfig”, if you remove “throws Exception” from the methods above, you can also remove the “catch Exception” block.
  • And also, what about test cases for the method “videoCardConfig”?
  • What about changing the verb of the “configSpecVideoCardNewVRamSize” method to its complete form, “configure”?
  • And finally, what about a test case for “configSpecVideoCardNewVRamSize”?

@nvazquez great work, my suggestions are mostly aesthetics, but I think they can help improve this PR.

@jburwell
Contributor

@serg38 looking through PR #1310, I don't see any Marvin test cases exercising this behavior.

@nvazquez please add Marvin tests to exercise specifying vGPU parameters when creating a VM and updating its configuration.

@nvazquez
Contributor Author

@rafaelweingartner Thanks for your review! I'll work on your suggestions!
@jburwell Sure, I'll work on it

@nvazquez
Contributor Author

Thanks @rafaelweingartner for your review! I've been working on it and pushed changes based on your comments!
The only thing I need to mention is that actually I can't remove throws Exception from setNewVRamSizeVmVideoCard method, as getAllDeviceList method throws Exception and that's the reason for catch block on configureVideoCard. Do you agree with this?

@rafaelweingartner
Member

rafaelweingartner commented Aug 24, 2016

@nvazquez, I am sorry, my bad; last time I read the code, I overlooked the method com.cloud.hypervisor.vmware.mo.VirtualMachineMO.getAllDeviceList(). Having said that, how do you feel about moving the try/catch block into “getAllDeviceList”? Then this method would not need to throw a checked exception; it could re-throw a runtime exception such as CloudRuntimeException.
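
A minimal sketch of that suggestion, under stated assumptions: `DeviceSourceSketch` and `UncheckedWrapSketch` are hypothetical names, the device list is made up, and plain `RuntimeException` stands in for whichever runtime exception type the project prefers.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a helper whose lookup throws a checked exception.
class DeviceSourceSketch {
    private final boolean fail;

    DeviceSourceSketch(boolean fail) {
        this.fail = fail;
    }

    List<String> getAllDeviceList() throws Exception {
        if (fail) {
            throw new Exception("lookup failed");
        }
        return Arrays.asList("video-card", "nic");
    }
}

class UncheckedWrapSketch {
    // The try/catch lives in one place, so callers need no "throws Exception".
    static List<String> listDevices(DeviceSourceSketch source) {
        try {
            return source.getAllDeviceList();
        } catch (Exception e) {
            // RuntimeException is a placeholder; the cause is kept for diagnostics.
            throw new RuntimeException("Unable to list VM devices", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(listDevices(new DeviceSourceSketch(false)));
    }
}
```

The trade-off is that callers are no longer forced by the compiler to handle the failure, so the wrapping method should preserve the original cause, as done here.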

From my understanding the test method “testStartVm3dgpuEnabled” is used to test the method “configureVideoCard”, right? What about changing that method name to “testConfigureVideoCard”?

Moreover, don't you think that the method “configureVideoCard” needs at least two test cases? One is already written; the other would test the condition in which the call “vmSpec.getDetails().containsKey(VmDetailConstants.SVGA_VRAM_SIZE)” returns false, so that the method “setNewVRamSizeVmVideoCard” is never called.

The same happens for the method “modifyVmVideoCardVRamSize”, I think it has another condition “videoCard.getVideoRamSizeInKB().longValue() == svgaVmramSize” to be tested.
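
To illustrate why each condition deserves its own test case, here is a simplified sketch that folds both checks into one pure function. `VideoCardBranchesSketch` and `vramChangeFor` are hypothetical, and the assumption that an equal size means "nothing to change" is mine, not taken from the PR code.

```java
import java.util.HashMap;
import java.util.Map;

class VideoCardBranchesSketch {
    static final String SVGA_VRAM_SIZE = "svga.vramSize";

    // Returns the vRAM size (in kB) to apply, or null when no device change is needed.
    static Long vramChangeFor(Map<String, String> details, long currentVramKb) {
        if (!details.containsKey(SVGA_VRAM_SIZE)) {
            return null; // branch 1: detail absent, video card left untouched
        }
        long requested = Long.parseLong(details.get(SVGA_VRAM_SIZE));
        if (requested == currentVramKb) {
            return null; // branch 2: already at the requested size (assumed no-op)
        }
        return requested; // branch 3: a change must be added to the config spec
    }

    public static void main(String[] args) {
        Map<String, String> details = new HashMap<>();
        System.out.println(vramChangeFor(details, 4096));  // detail absent
        details.put(SVGA_VRAM_SIZE, "4096");
        System.out.println(vramChangeFor(details, 4096));  // same size
        details.put(SVGA_VRAM_SIZE, "131072");
        System.out.println(vramChangeFor(details, 4096));  // different size
    }
}
```

Three inputs cover the three branches; a test suite with only the "different size" case would leave the two early returns unverified.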

And finally, for the test method “testConfigureSpecVideoCardNewVRamSize”, what about using “Mockito.InOrder” to ensure that the methods are called in the proper order? If someone changes the order of the calls in the future, that can affect the method's behavior.
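
The idea behind an order-sensitive check can be sketched without any test library by recording calls on a hand-written fake. Everything below is hypothetical (class names, the three steps and their order); in a Mockito-based test the same assertion would be expressed with `Mockito.inOrder(...)` and `inOrder.verify(...)` instead of comparing a recorded list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical recording fake: a plain-Java stand-in for what Mockito's InOrder verifies.
class RecordingVideoCardSketch {
    final List<String> calls = new ArrayList<>();

    void setVramSize(long kb) { calls.add("setVramSize"); }
    void disableAutoDetect() { calls.add("disableAutoDetect"); }
    void queueEdit() { calls.add("queueEdit"); }
}

class CallOrderSketch {
    // Hypothetical method under test; the steps and their order are illustrative only.
    static void configure(RecordingVideoCardSketch card, long kb) {
        card.setVramSize(kb);
        card.disableAutoDetect();
        card.queueEdit();
    }

    public static void main(String[] args) {
        RecordingVideoCardSketch card = new RecordingVideoCardSketch();
        configure(card, 131072);
        // The recorded sequence is what an order-sensitive test would assert on.
        System.out.println(card.calls);
    }
}
```

If a later change swaps two of the calls, a test comparing the recorded sequence against the expected one fails, which is the protection the comment above asks for.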

I also have one small comment about the title of the PR and the commit message. “Improve performance” seems a little too vague for a title. The idea is very well explained on the PR, but the title does not reflect much. What about, “improve the performance of cloud-plugin-hypervisor-vmware” or something like that.

@serg38

serg38 commented Aug 25, 2016

LGTM for testing. Vmware ESX 5.5 and 6.0 hypervisors, advanced networking, RHEL 6 management servers

[root@ussarlabcsmgt41 ~]# cat /tmp//MarvinLogs/test_volumes_930LZ3/results.txt|grep -v ok
test DeployVM in anti-affinity groups for project ... === TestName: test_DeployVmAntiAffinityGroup_in_project | Status : SUCCESS ===
test DeployVM in anti-affinity groups ... === TestName: test_DeployVmAntiAffinityGroup | Status : SUCCESS ===
Test Deploy Virtual Machine ... SKIP: Skipping test because suitable hypervisor/host not present
Test Deploy Virtual Machine from ISO ... === TestName: test_deploy_vm_from_iso | Status : SUCCESS ===
Test deploy virtual machine with root resize ... === TestName: test_00_deploy_vm_root_resize | Status : SUCCESS ===
Test proper failure to deploy virtual machine with rootdisksize of 0 ... === TestName: test_01_deploy_vm_root_resize | Status : SUCCESS ===
Test proper failure to deploy virtual machine with rootdisksize less than template size ... === TestName: test_02_deploy_vm_root_resize | Status : SUCCESS ===
Test to deploy vm with a first fit offering ... === TestName: test_deployvm_firstfit | Status : SUCCESS ===
Test deploy VMs using user concentrated planner ... === TestName: test_deployvm_userconcentrated | Status : SUCCESS ===
Test deploy VMs using user dispersion planner ... === TestName: test_deployvm_userdispersing | Status : SUCCESS ===
Test userdata as GET, size > 2k ... === TestName: test_deployvm_userdata | Status : SUCCESS ===
Test userdata as POST, size > 2k ... === TestName: test_deployvm_userdata_post | Status : SUCCESS ===
Test to create disk offering ... === TestName: test_01_create_disk_offering | Status : SUCCESS ===
Test to create a sparse type disk offering ... === TestName: test_02_create_sparse_type_disk_offering | Status : SUCCESS ===
Test to create a sparse type disk offering ... === TestName: test_04_create_fat_type_disk_offering | Status : SUCCESS ===
Test to update existing disk offering ... === TestName: test_02_edit_disk_offering | Status : SUCCESS ===
Test to delete disk offering ... === TestName: test_03_delete_disk_offering | Status : SUCCESS ===
Test to ensure 4 default roles cannot be deleted ... SKIP: Dynamic Role-Based API checker not enabled, skipping test
Test to check role, role permissions and account life cycles ... SKIP: Dynamic Role-Based API checker not enabled, skipping test
Test for role-rule enforcement in case of multiple mgmt servers ... SKIP: Dynamic Role-Based API checker not enabled, skipping test
Test to ensure role in use cannot be deleted ... SKIP: Dynamic Role-Based API checker not enabled, skipping test
Tests normal lifecycle operations for roles ... SKIP: Dynamic Role-Based API checker not enabled, skipping test
Tests role update ... SKIP: Dynamic Role-Based API checker not enabled, skipping test
Tests that default four roles exist ... SKIP: Dynamic Role-Based API checker not enabled, skipping test
Tests role update ... SKIP: Dynamic Role-Based API checker not enabled, skipping test
Tests role update when role is in use by an account ... SKIP: Dynamic Role-Based API checker not enabled, skipping test
Tests concurrent order updation of role permission ... SKIP: Dynamic Role-Based API checker not enabled, skipping test
Tests creation of role permission ... SKIP: Dynamic Role-Based API checker not enabled, skipping test
Tests deletion of role permission ... SKIP: Dynamic Role-Based API checker not enabled, skipping test
Tests listing of default role's permission ... SKIP: Dynamic Role-Based API checker not enabled, skipping test
Tests order updation of role permission ... SKIP: Dynamic Role-Based API checker not enabled, skipping test
test update configuration setting at zone level scope ... === TestName: test_UpdateConfigParamWithScope | Status : SUCCESS ===
Test guest vlan range dedication ... === TestName: test_dedicateGuestVlanRange | Status : SUCCESS ===
Test create public & private ISO ... === TestName: test_01_create_iso | Status : SUCCESS ===
Test Edit ISO ... === TestName: test_02_edit_iso | Status : SUCCESS ===
Test delete ISO ... === TestName: test_03_delete_iso | Status : SUCCESS ===
Test for extract ISO ... === TestName: test_04_extract_Iso | Status : SUCCESS ===
Update & Test for ISO permissions ... === TestName: test_05_iso_permissions | Status : SUCCESS ===
Test for copy ISO from one zone to another ... SKIP: Not enough zones available to perform copy template
Test delete ISO ... === TestName: test_07_list_default_iso | Status : SUCCESS ===
Test listing Volumes using 'ids' parameter ... === TestName: test_01_list_volumes | Status : SUCCESS ===
Test listing Templates using 'ids' parameter ... === TestName: test_02_list_templates | Status : SUCCESS ===
Test listing Snapshots using 'ids' parameter ... === TestName: test_03_list_snapshots | Status : SUCCESS ===
Test to create Load balancing rule with source NAT ... === TestName: test_01_create_lb_rule_src_nat | Status : SUCCESS ===
Test to create Load balancing rule with non source NAT ... === TestName: test_02_create_lb_rule_non_nat | Status : SUCCESS ===
Test for assign & removing load balancing rule ... === TestName: test_assign_and_removal_lb | Status : SUCCESS ===
Tests that SAML users are not allowed CloudStack local log in ... === TestName: login_test_saml_user | Status : SUCCESS ===
Test network ACL lists and items in VPC ... === TestName: test_network_acl | Status : SUCCESS ===
Register a template for VMware with nicAdapter vmxnet3 ... SKIP: VCenter API Integration Remaining
Test to add and update added nic to a virtual machine ... === TestName: test_01_nic | Status : SUCCESS ===
Test to update a physical network and extend its vlan ... === TestName: test_extendPhysicalNetworkVlan | Status : SUCCESS ===
test update configuration setting at storage scope ... === TestName: test_UpdateStorageOverProvisioningFactor | Status : SUCCESS ===
Test for create region ... === TestName: test_createRegion | Status : SUCCESS ===
Test reset virtual machine on reboot ... === TestName: test_01_reset_vm_on_reboot | Status : SUCCESS ===
Test volume detail ... === TestName: test_01_updatevolumedetail | Status : SUCCESS ===
Test scale virtual machine ... SKIP: Skipping scale VM operation because VMware tools are not installed on the VM
Test to create service offering ... === TestName: test_01_create_service_offering | Status : SUCCESS ===
Test to update existing service offering ... === TestName: test_02_edit_service_offering | Status : SUCCESS ===
Test to delete service offering ... === TestName: test_03_delete_service_offering | Status : SUCCESS ===
Test to change service to a small capacity ... === TestName: test_04_change_offering_small | Status : SUCCESS ===
Test List secondary storage VMs ... === TestName: test_01_list_sec_storage_vm | Status : SUCCESS ===
Test List console proxy VMs ... === TestName: test_02_list_cpvm_vm | Status : SUCCESS ===
Test SSVM Internals ... === TestName: test_03_ssvm_internals | Status : SUCCESS ===
Test CPVM Internals ... === TestName: test_04_cpvm_internals | Status : SUCCESS ===
Test stop SSVM ... === TestName: test_05_stop_ssvm | Status : SUCCESS ===
Test stop CPVM ... === TestName: test_06_stop_cpvm | Status : SUCCESS ===
Test reboot SSVM ... === TestName: test_07_reboot_ssvm | Status : SUCCESS ===
Test reboot CPVM ... === TestName: test_08_reboot_cpvm | Status : SUCCESS ===
Test destroy SSVM ... === TestName: test_09_destroy_ssvm | Status : SUCCESS ===
Test destroy CPVM ... === TestName: test_10_destroy_cpvm | Status : SUCCESS ===
Tests allowed APIs for common account types ... === TestName: test_static_role_account_acls | Status : SUCCESS ===
Test create public & private template ... === TestName: test_01_create_template | Status : SUCCESS ===
Test when createTemplate is used to create templates having the same name all of them get ... === TestName: test_CreateTemplateWithDuplicateName | Status : SUCCESS ===
Test Edit template ... === TestName: test_02_edit_template | Status : SUCCESS ===
Test delete template ... === TestName: test_03_delete_template | Status : SUCCESS ===
Test for extract template ... === TestName: test_04_extract_template | Status : SUCCESS ===
Update & Test for template permissions ... === TestName: test_05_template_permissions | Status : SUCCESS ===
Test for copy template from one zone to another ... SKIP: Not enough zones available to perform copy template
Test only public templates are visible to normal user ... === TestName: test_07_list_public_templates | Status : SUCCESS ===
Test System templates are not visible to normal user ... === TestName: test_08_list_system_templates | Status : SUCCESS ===
Check events in usage_events table when VM creation fails ... === TestName: test_01_positive_tests_usage | Status : SUCCESS ===
Test advanced zone virtual router ... === TestName: test_advZoneVirtualRouter | Status : SUCCESS ===
Tests for basic zone virtual router ... === TestName: test_basicZoneVirtualRouter | Status : SUCCESS ===
Test Deploy Virtual Machine ... === TestName: test_deploy_vm | Status : SUCCESS ===
Test Multiple Deploy Virtual Machine ... === TestName: test_deploy_vm_multiple | Status : SUCCESS ===
Test Stop Virtual Machine ... === TestName: test_01_stop_vm | Status : SUCCESS ===
Test Start Virtual Machine ... === TestName: test_02_start_vm | Status : SUCCESS ===
Test Reboot Virtual Machine ... === TestName: test_03_reboot_vm | Status : SUCCESS ===
Test destroy Virtual Machine ... === TestName: test_06_destroy_vm | Status : SUCCESS ===
Test recover Virtual Machine ... === TestName: test_07_restore_vm | Status : SUCCESS ===
Test migrate VM ... === TestName: test_08_migrate_vm | Status : SUCCESS ===
Test destroy(expunge) Virtual Machine ... === TestName: test_09_expunge_vm | Status : SUCCESS ===
Test for attach and detach ISO to virtual machine ... === TestName: test_10_attachAndDetach_iso | Status : SUCCESS ===
Test Volume creation for all Disk Offerings (incl. custom) ... === TestName: test_01_create_volume | Status : SUCCESS ===
Attach a created Volume to a Running VM ... === TestName: test_02_attach_volume | Status : SUCCESS ===
Download a Volume attached to a VM ... === TestName: test_03_download_attached_volume | Status : SUCCESS ===
Delete a Volume attached to a VM ... === TestName: test_04_delete_attached_volume | Status : SUCCESS ===
Detach a Volume attached to a VM ... === TestName: test_05_detach_volume | Status : SUCCESS ===
Download a Volume unattached to an VM ... === TestName: test_06_download_detached_volume | Status : SUCCESS ===
Test resize (negative) non-existent volume ... SKIP: Resize Volume is unsupported on VmWare and Hyper-V
Test resize a volume ... SKIP: Resize Volume is unsupported on VmWare and Hyper-V
Delete a Volume unattached to an VM ... === TestName: test_09_delete_detached_volume | Status : SUCCESS ===


Ran 104 tests in 11634.840s

OK (SKIP=21)

@nvazquez nvazquez changed the title from “CLOUDSTACK-9428: Fix for CLOUDSTACK-9211 - Improve performance” to “CLOUDSTACK-9428: Fix for CLOUDSTACK-9211 - Improve performance of 3D GPU support in cloud-plugin-hypervisor-vmware” on Aug 25, 2016
@nvazquez
Contributor Author

Thanks @rafaelweingartner! I pushed new changes

@rafaelweingartner
Member

@nvazquez great.
What about squashing the commits now?

LGTM for the code, giving my reviews.

@nvazquez
Contributor Author

Done, thanks @rafaelweingartner! I'll start working on adding Marvin tests for this PR as @jburwell suggested.

@nvazquez
Contributor Author

nvazquez commented Sep 1, 2016

@jburwell I added a Marvin test; these are the results on a VMware env:

[root@ussarlabcsmgt41 cloudstack]# cat /tmp//MarvinLogs/test_deploy_vgpu_enabled_vm_OG63Q9/results.txt
# 1. Register a template for VMware with nicAdapter vmxnet3 and 3D GPU details ... === TestName: test_3d_gpu_support | Status : SUCCESS ===
ok
Test Deploy Virtual Machine ... SKIP: This test case is written specifically                    for XenServer hypervisor

----------------------------------------------------------------------
Ran 2 tests in 739.974s

OK (SKIP=1)

@serg38

serg38 commented Sep 9, 2016

@rhtyd @jburwell @swill @koushik-das @rafaelweingartner @wido This PR has enough of everything. Can one of the committers merge it?

@asfgit asfgit merged commit 2de5b0d into apache:master Sep 11, 2016
asfgit pushed a commit that referenced this pull request Sep 11, 2016
CLOUDSTACK-9428: Fix for CLOUDSTACK-9211 - Improve performance of 3D GPU support in cloud-plugin-hypervisor-vmware

JIRA TICKET: https://issues.apache.org/jira/browse/CLOUDSTACK-9428

* pr/1605:
  CLOUDSTACK-9428: Add marvin test
  CLOUDSTACK-9428: Fix for CLOUDSTACK-9211 - Improve performance

Signed-off-by: Rafael Weingärtner <rafael@apache.org>
@rafaelweingartner
Member

rafaelweingartner commented Sep 11, 2016

@serg38 merged based on reviews, and tests (marvin and unit ones).

@nvazquez
Contributor Author

Thanks @rafaelweingartner!
