Keep iotune section in the VM's XML after live migration #3171

@GabrielBrascher
Member

GabrielBrascher commented Feb 8, 2019

Description

When a KVM VM is live migrated between local storage pools, the VM loses the
<iotune> ... </iotune> section of its XML and therefore runs without any I/O limits.

<iotune>
    <read_iops_sec>5000</read_iops_sec>
    <write_iops_sec>5000</write_iops_sec>
</iotune>

This commit removes the piece of code that deletes the <iotune> ... </iotune> section in the XML.

@rhtyd @mike-tutkowski @wido @DaanHoogland @nvazquez @kiwiflyer @rafaelweingartner do any of you guys know a reason for keeping the conditional that I removed? It looks like it can be removed without breaking anything.
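
The removed conditional itself is not shown on this page. As a rough, hypothetical sketch of the behaviour described above (not the CloudStack code), the Java snippet below filters the child elements of each <disk> by tag name. The class name `DiskChildFilter`, the method `dropDiskChildren`, and the sample image path are assumptions made for illustration. Passing "iotune" in the set of names to drop mimics the reported bug; passing an empty set mimics the fix.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical illustration only; this is not the CloudStack implementation.
final class DiskChildFilter {

    /** Removes every direct child of each <disk> whose tag name is in namesToDrop. */
    static String dropDiskChildren(String domainXml, Set<String> namesToDrop) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(domainXml.getBytes(StandardCharsets.UTF_8)));
            NodeList disks = doc.getElementsByTagName("disk");
            for (int i = 0; i < disks.getLength(); i++) {
                Node disk = disks.item(i);
                NodeList children = disk.getChildNodes();
                // Walk backwards: the child list is live and shrinks on removal.
                for (int j = children.getLength() - 1; j >= 0; j--) {
                    Node child = children.item(j);
                    if (namesToDrop.contains(child.getNodeName())) {
                        disk.removeChild(child);
                    }
                }
            }
            // Serialize the modified document back to a string.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process domain XML", e);
        }
    }

    public static void main(String[] args) {
        // Sample XML with a made-up image path.
        String xml = "<domain><devices><disk type='file' device='disk'>"
                + "<source file='/var/lib/libvirt/images/example'/>"
                + "<iotune><read_iops_sec>5000</read_iops_sec></iotune>"
                + "</disk></devices></domain>";
        // Behaviour described as the bug: the iotune element is dropped.
        System.out.println(dropDiskChildren(xml, Collections.singleton("iotune")));
        // Behaviour after the fix: nothing is dropped, so iotune survives.
        System.out.println(dropDiskChildren(xml, Collections.<String>emptySet()));
    }
}
```

The backwards loop is a deliberate choice: `getChildNodes()` returns a live list, so removing nodes while iterating forwards would skip siblings.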

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Screenshots (if appropriate):

How Has This Been Tested?

Bug:

  • Start a VM running on KVM, with local storage and service offering limiting the VM's IOPS;
  • verify that the VM's XML file contains <iotune> ... </iotune>;
  • live migrate to another host (with local storage);
  • verify that the <iotune> ... </iotune> section has been removed from the VM's XML.

With the fix:

  • Start a VM running on KVM, with local storage and service offering limiting the VM's IOPS;
  • verify that the VM's XML file contains <iotune> ... </iotune>;
  • live migrate to another host (with local storage);
  • verify that the <iotune> ... </iotune> section is still present in the VM's XML.
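
The "verify" steps above can also be checked programmatically. The sketch below is a hypothetical helper (the class `IotuneCheck` and the method `countDisksWithIotune` are not part of CloudStack or its test suite) that counts the <disk> elements carrying an <iotune> section in a domain XML string, for example one captured before and one after the migration. It counts rather than requiring every disk to have the section, on the assumption that some disks (such as CD-ROM devices) may legitimately have none.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not part of CloudStack or its test suite.
final class IotuneCheck {

    /** Counts the <disk> elements that contain an <iotune> element. */
    static int countDisksWithIotune(String domainXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(domainXml.getBytes(StandardCharsets.UTF_8)));
            NodeList disks = doc.getElementsByTagName("disk");
            int count = 0;
            for (int i = 0; i < disks.getLength(); i++) {
                Element disk = (Element) disks.item(i);
                if (disk.getElementsByTagName("iotune").getLength() > 0) {
                    count++;
                }
            }
            return count;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse domain XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample XML standing in for the domain XML before and after migration.
        String before = "<domain><devices><disk type='file' device='disk'>"
                + "<iotune><read_iops_sec>5000</read_iops_sec></iotune>"
                + "</disk></devices></domain>";
        String after = "<domain><devices><disk type='file' device='disk'>"
                + "</disk></devices></domain>";
        int b = countDisksWithIotune(before);
        int a = countDisksWithIotune(after);
        System.out.println(a == b ? "iotune kept" : "iotune lost: " + b + " -> " + a);
    }
}
```

With the sample strings in `main`, the counts differ, which corresponds to the buggy behaviour; with the fix applied, the two counts would be expected to match.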

@GabrielBrascher GabrielBrascher added this to the 4.12.0.0 milestone Feb 8, 2019

@GabrielBrascher GabrielBrascher self-assigned this Feb 8, 2019

@GabrielBrascher

Member Author

GabrielBrascher commented Feb 11, 2019

@blueorangutan package

@blueorangutan

blueorangutan commented Feb 11, 2019

@GabrielBrascher a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

blueorangutan commented Feb 11, 2019

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-2603

@GabrielBrascher

Member Author

GabrielBrascher commented Feb 11, 2019

@blueorangutan

blueorangutan commented Feb 11, 2019

@GabrielBrascher a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@wido wido self-requested a review Feb 11, 2019

@wido

wido approved these changes Feb 11, 2019

Contributor

wido left a comment

LGTM

I still don't see why we remove this section when using local storage (path is set).

@ustcweizhou

Contributor

ustcweizhou commented Feb 11, 2019

code LGTM

@blueorangutan

blueorangutan commented Feb 11, 2019

Trillian test result (tid-3397)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 32920 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr3171-t3397-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_accounts.py
Intermittent failure detected: /marvin/tests/smoke/test_iso.py
Intermittent failure detected: /marvin/tests/smoke/test_templates.py
Intermittent failure detected: /marvin/tests/smoke/test_volumes.py
Intermittent failure detected: /marvin/tests/smoke/test_host_maintenance.py
Smoke tests completed. 66 look OK, 4 have error(s)
Only failed tests results shown below:

Test | Result | Time (s) | Test File
--- | --- | --- | ---
ContextSuite context=TestTemplateHierarchy>:setup | Error | 1519.22 | test_accounts.py
test_04_extract_Iso | Failure | 1.09 | test_iso.py
test_04_extract_template | Failure | 1.08 | test_templates.py
test_06_download_detached_volume | Failure | 11.46 | test_volumes.py

@GabrielBrascher

Member Author

GabrielBrascher commented Feb 11, 2019

@blueorangutan package

@blueorangutan

blueorangutan commented Feb 11, 2019

@GabrielBrascher a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

blueorangutan commented Feb 11, 2019

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-2604

@GabrielBrascher

Member Author

GabrielBrascher commented Feb 11, 2019

@blueorangutan

blueorangutan commented Feb 11, 2019

@GabrielBrascher a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

Keep iotune section in the VM's XML after live migration
When live migrating a KVM VM among local storages, the VM loses the
<iotune> section on its XML, therefore, having no IO limitations.

This commit removes the piece of code that deletes the <iotune> section
in the XML.

@GabrielBrascher GabrielBrascher force-pushed the PCextreme:iotune-removed-from-xml-on-kvm-migration branch from 8439b0b to 3bd8202 Feb 11, 2019

wido and others added some commits Feb 11, 2019

Add test for replaceStorage in LibvirtMigrateCommandWrapper
Signed-off-by: Wido den Hollander <wido@widodh.nl>
@blueorangutan

blueorangutan commented Feb 11, 2019

Trillian test result (tid-3398)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 31980 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr3171-t3398-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_accounts.py
Intermittent failure detected: /marvin/tests/smoke/test_iso.py
Intermittent failure detected: /marvin/tests/smoke/test_templates.py
Intermittent failure detected: /marvin/tests/smoke/test_volumes.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_redundant.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_vpn.py
Intermittent failure detected: /marvin/tests/smoke/test_host_maintenance.py
Smoke tests completed. 64 look OK, 6 have error(s)
Only failed tests results shown below:

Test | Result | Time (s) | Test File
--- | --- | --- | ---
ContextSuite context=TestTemplateHierarchy>:setup | Error | 1519.95 | test_accounts.py
test_04_extract_Iso | Failure | 1.11 | test_iso.py
test_04_extract_template | Failure | 1.09 | test_templates.py
test_06_download_detached_volume | Failure | 10.42 | test_volumes.py
test_04_rvpc_network_garbage_collector_nics | Failure | 274.57 | test_vpc_redundant.py
test_02_cancel_host_maintenace_with_migration_jobs | Error | 4.39 | test_host_maintenance.py

@GabrielBrascher

Member Author

GabrielBrascher commented Feb 11, 2019

@blueorangutan package

@blueorangutan

blueorangutan commented Feb 11, 2019

@GabrielBrascher a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

blueorangutan commented Feb 11, 2019

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-2607

@GabrielBrascher

Member Author

GabrielBrascher commented Feb 11, 2019

@blueorangutan

blueorangutan commented Feb 11, 2019

@GabrielBrascher a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan

blueorangutan commented Feb 12, 2019

Trillian test result (tid-3402)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 28145 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr3171-t3402-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_accounts.py
Intermittent failure detected: /marvin/tests/smoke/test_iso.py
Intermittent failure detected: /marvin/tests/smoke/test_templates.py
Intermittent failure detected: /marvin/tests/smoke/test_volumes.py
Smoke tests completed. 66 look OK, 4 have error(s)
Only failed tests results shown below:

Test | Result | Time (s) | Test File
--- | --- | --- | ---
ContextSuite context=TestTemplateHierarchy>:setup | Error | 1517.14 | test_accounts.py
test_04_extract_Iso | Failure | 1.05 | test_iso.py
test_04_extract_template | Failure | 1.05 | test_templates.py
test_06_download_detached_volume | Failure | 10.29 | test_volumes.py

* <li>The value of the 'type' of the driver of the disk (ex. qcow2, raw)
* <li>The source of the disk needs an attribute that is either 'file' or 'dev' as well as its corresponding value.
* </ul>
*/

@DaanHoogland

DaanHoogland Feb 12, 2019

Contributor

the code-change makes sense but this javadoc doesn't seem to match the code below. Do these two references to 'type' in any way have to do with the element 'auth'? I can see an explanation of 'source' but there are two 'type's, I think only the 'driver' version is valid, no?

@GabrielBrascher

GabrielBrascher Feb 12, 2019

Author Member

Do these two references to 'type' in any way have to do with the element 'auth'?

No, they are related to the disk 'type' and the driver 'type' only.

This documentation might be a bit confusing; I only reformatted it from a plain comment into a Javadoc. However, it matches the code, which reads one 'type' from the disk (diskNodeAttributes.getNamedItem("type")) and another 'type' from the driver (driverNodeAttributes.getNamedItem("type")).

An XML for a disk with the type file and with the driver qcow2 would be structured as follows:

<disk type='file' device='disk'>
    <driver name='qemu' type='qcow2' cache='none'/>
    ...
</disk>

The 'auth' element belongs to the disk when the VM volume is stored in a Ceph cluster; however, 'auth' itself has no 'type' attribute (only its nested 'secret' element does). An auth section looks as follows:

<disk type='network' device='disk'>
    <driver name='qemu' type='raw' cache='none'/>
    <auth username='username'>
        <secret type='ceph' uuid='ab123bbe-e1a0-3911-9928-3154566ca1c7'/>
    </auth>
    ...
</disk>

Thanks for reviewing @DaanHoogland. Did I answer your question?
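
To make the two 'type' attributes concrete, here is a minimal Java sketch using the standard DOM API. It is not the CloudStack code: the class `DiskTypeReader` and the method `readTypes` are hypothetical, and only the variable names `diskNodeAttributes` and `driverNodeAttributes` and the `getNamedItem("type")` calls are taken from the comment above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not part of CloudStack.
final class DiskTypeReader {

    /** Returns {diskType, driverType} for the first <disk> element in the XML. */
    static String[] readTypes(String diskXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(diskXml.getBytes(StandardCharsets.UTF_8)));

            // 'type' of the disk itself, e.g. file or network.
            Node diskNode = doc.getElementsByTagName("disk").item(0);
            NamedNodeMap diskNodeAttributes = diskNode.getAttributes();
            String diskType = diskNodeAttributes.getNamedItem("type").getNodeValue();

            // 'type' of the driver child element, e.g. qcow2 or raw.
            String driverType = null;
            NodeList children = diskNode.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if ("driver".equals(child.getNodeName())) {
                    NamedNodeMap driverNodeAttributes = child.getAttributes();
                    driverType = driverNodeAttributes.getNamedItem("type").getNodeValue();
                }
            }
            return new String[] {diskType, driverType};
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read disk XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<disk type='file' device='disk'>"
                + "<driver name='qemu' type='qcow2' cache='none'/>"
                + "</disk>";
        String[] types = readTypes(xml);
        // Prints: disk type=file, driver type=qcow2
        System.out.println("disk type=" + types[0] + ", driver type=" + types[1]);
    }
}
```

Note that the sketch never reads a 'type' from 'auth'; in the Ceph example the only other 'type' attribute sits on the nested 'secret' element.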

@DaanHoogland

DaanHoogland Feb 12, 2019

Contributor

So is it correct that the 'auth' element should be abandoned?

@GabrielBrascher

GabrielBrascher Feb 12, 2019

Author Member

That is a good question @DaanHoogland. Because VMs with volumes on Ceph are not migrated through this execution flow, we have not experienced problems with that conditional so far. However, I also do not know why that conditional is there.

@DaanHoogland

DaanHoogland Feb 12, 2019

Contributor

ok, i would expect it in the javadoc but am aware this might expand the scope of the PR ;)

@GabrielBrascher

Member Author

GabrielBrascher commented Feb 12, 2019

@blueorangutan package

@blueorangutan

blueorangutan commented Feb 12, 2019

@GabrielBrascher a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

blueorangutan commented Feb 12, 2019

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-2608

@GabrielBrascher

Member Author

GabrielBrascher commented Feb 12, 2019

@blueorangutan

blueorangutan commented Feb 12, 2019

@GabrielBrascher a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@wido

Contributor

wido commented Feb 12, 2019

I just tested this PR again and it works as expected.

After migration the XML contains:

virsh dumpxml i-2-1890-VM

    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none' discard='unmap'/>
      <source file='/var/lib/libvirt/images/26ccd31c-4c68-4749-95ef-7f526e7232ba'/>
      <backingStore/>
      <target dev='sda' bus='scsi'/>
      <iotune>
        <write_iops_sec>500</write_iops_sec>
        <write_iops_sec_max>5000</write_iops_sec_max>
        <write_iops_sec_max_length>60</write_iops_sec_max_length>
      </iotune>
      <serial>8e86e7dd8ccc4f7fb0cb</serial>
      <alias name='scsi0-0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>

[Screenshot from 2019-02-12 15-48-12]

@blueorangutan

blueorangutan commented Feb 12, 2019

Trillian test result (tid-3403)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 30343 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr3171-t3403-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_accounts.py
Intermittent failure detected: /marvin/tests/smoke/test_iso.py
Intermittent failure detected: /marvin/tests/smoke/test_templates.py
Intermittent failure detected: /marvin/tests/smoke/test_volumes.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_vpn.py
Smoke tests completed. 66 look OK, 4 have error(s)
Only failed tests results shown below:

Test | Result | Time (s) | Test File
--- | --- | --- | ---
ContextSuite context=TestTemplateHierarchy>:setup | Error | 1519.67 | test_accounts.py
test_04_extract_Iso | Failure | 1.11 | test_iso.py
test_04_extract_template | Failure | 1.07 | test_templates.py
test_06_download_detached_volume | Failure | 9.34 | test_volumes.py

@GabrielBrascher

Member Author

GabrielBrascher commented Feb 13, 2019

Considering the 3 LGTMs, that all checks have passed, and that this PR is a blocker for 4.12 RC2, I am merging it. The failed Trillian tests do not look related to this change; additionally, #3173 shows the same failures, which are clearly not caused by that PR either.

@GabrielBrascher GabrielBrascher merged commit 709845f into apache:master Feb 13, 2019

2 checks passed

Jenkins This pull request looks good
continuous-integration/travis-ci/pr The Travis CI build passed

kiwiflyer added a commit to myENA/cloudstack that referenced this pull request Feb 20, 2019

Merge pull request #40 from apache/master
Keep iotune section in the VM's XML after live migration (apache#3171)

kiwiflyer added a commit to myENA/cloudstack that referenced this pull request Feb 20, 2019
