Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add some cases to check migration over unix socket #3256

Merged
merged 1 commit into from
Feb 24, 2021

Conversation

Yingshun
Copy link
Contributor

@Yingshun Yingshun commented Jan 19, 2021

This PR adds below cases:
1) Migrate vm over unix socket
2) Migrate vm over unix socket - libvirt tunnelled(--tunnelled)
3) Migrate vm over unix socket - enable multifd(--parallel)
4) Migrate vm with copy storage over unix socket - one disk
5) Migrate vm with copy storage over unix socket - multiple disks
6) Abort migration over unix socket and migrate again
7) Abort migration with copy storage over unix socket, and migrate again
8) Migrate vm with copy storage over unix socket - multiple disks
- enable multifd(--parallel)

depends on:
avocado-framework/avocado-vt#2891
avocado-framework/avocado-vt#2892
avocado-framework/avocado-vt#2894
avocado-framework/avocado-vt#2915

Signed-off-by: Yingshun Cui yicui@redhat.com

@Yingshun
Copy link
Contributor Author

Yingshun commented Jan 19, 2021

Test results:

# avocado run --vt-type libvirt --vt-machine-type q35 migrate_over_unix                                   
JOB ID     : fb1ba034f01896defb8590480bf53341b430a812                               
JOB LOG    : /root/avocado/job-results/job-2021-01-21T02.07-fb1ba03/job.log         
 (01/11) type_specific.io-github-autotest-libvirt.virsh.migrate_over_unix.positive_testing.without_copy_storage.migrate_uri.p2p_live_migration.with_postcopy: -          
PASS (140.53 s)
 (02/11) type_specific.io-github-autotest-libvirt.virsh.migrate_over_unix.positive_testing.without_copy_storage.migrate_uri.p2p_live_migration.without_postcopy: \      PASS (143.72 s)
 (03/11) type_specific.io-github-autotest-libvirt.virsh.migrate_over_unix.positive_testing.without_copy_storage.migrate_uri.live_migration.without_postcopy: PASS (145.00 s)
 (04/11) type_specific.io-github-autotest-libvirt.virsh.migrate_over_unix.positive_testing.without_copy_storage.multifd.p2p_live_migration.without_postcopy: PASS (143.13 s)
 (05/11) type_specific.io-github-autotest-libvirt.virsh.migrate_over_unix.positive_testing.without_copy_storage.tunnelled.p2p_live_migration.without_postcopy: PASS (138.48 s)
 (06/11) type_specific.io-github-autotest-libvirt.virsh.migrate_over_unix.positive_testing.with_copy_storage.default.single_disk.p2p_live_migration.without_postcopy: PASS (83.75 s)
 (07/11) type_specific.io-github-autotest-libvirt.virsh.migrate_over_unix.positive_testing.with_copy_storage.default.single_disk.live_migration.without_postcopy: PASS (84.86 s)
 (08/11) type_specific.io-github-autotest-libvirt.virsh.migrate_over_unix.positive_testing.with_copy_storage.default.multi_disks.p2p_live_migration.without_postcopy: PASS (101.48 s)
 (09/11) type_specific.io-github-autotest-libvirt.virsh.migrate_over_unix.positive_testing.with_copy_storage.multifd.multi_disks.p2p_live_migration.without_postcopy: PASS (99.98 s)
 (10/11) type_specific.io-github-autotest-libvirt.virsh.migrate_over_unix.negative_test.domjobabort.normal_migration.p2p_live_migration.without_postcopy: PASS (156.96 s)
 (11/11) type_specific.io-github-autotest-libvirt.virsh.migrate_over_unix.negative_test.domjobabort.storage_migration.p2p_live_migration.without_postcopy: PASS (202.84 s)
RESULTS    : PASS 11 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
JOB TIME   : 1442.34 s
(.libvirt-ci-venv-ci-runtest-hniqXO) [root@dell-per630-fc-01 ~]# 
(.libvirt-ci-venv-ci-runtest-hniqXO) [root@dell-per630-fc-01 ~]# 
[0] 0:bash*                      

- positive_testing:
status_error = "no"
variants:
- basic:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as per explanaiton:https://libvirt.org/migration.html#scenarionativepeer2peer
-basic change to -Hypervisor-native makes more meaningful.


Network data transports ¶
There are two options for the data transport used during migration, either the hypervisor's own native transport, or tunnelled over a libvirtd connection.

@chunfuwen
Copy link
Contributor

@smitterl , would you mind joining in VM migration related case review ?

@Yingshun Yingshun marked this pull request as draft January 21, 2021 03:10
@Yingshun Yingshun marked this pull request as ready for review January 21, 2021 07:32
@chunfuwen
Copy link
Contributor

chunfuwen commented Jan 22, 2021

@fangge1212 , we may need align with some migration terminology at the beginning point of feature cases design
Here is a good reference :https://libvirt.org/migration.html .
Apparently , we can figure out sereval pairs of terminology from the same scope, be careful not to mess them up :
1)Migration protocol :
-- Hypervisor-native
-- libvirt-tunnel
2)Communication control paths/flow
--direct
--p2p
3) Memory and storage migration
-- live-migration (memory only)
-- block-migration(storage and memory only)

@Yingshun
Copy link
Contributor Author

@fangge1212 , we may need align with some migration terminology at the beginning point of feature cases design
Here is a good reference :https://libvirt.org/migration.html .
Apparently , we can figure out sereval pairs of terminology from the same scope, be careful not to mess them up :
1)Migration protocol :
-- Hypervisor-native
-- libvirt-tunnel
2)Communication control paths/flow
--direct
--p2p
3) Memory and storage migration
-- live-migration (memory only)
-- block-migration(storage and memory only)
@chunfuwen your questions are about test plan, I think it would be great if you email to her directly.
There may be some conversation to discuss them or her may miss github information...

@fangge1212
Copy link
Contributor

@fangge1212 , we may need align with some migration terminology at the beginning point of feature cases design
Here is a good reference :https://libvirt.org/migration.html .
Apparently , we can figure out sereval pairs of terminology from the same scope, be careful not to mess them up :
1)Migration protocol :
-- Hypervisor-native
This corresponds to "without virsh option --tunnel"

-- libvirt-tunnel
This corresponds to "with virsh option --tunnel"

2)Communication control paths/flow
--direct
"virsh option --direct" corresponds to "unmanaged direct", which is not supported by qemu-kvm

--p2p
"virsh option --p2p" corresponds to "managed peer to peer", but we can just call it p2p migration

without either --direct or --p2p:
This corresponds to "managed direct", we don't have to point this out as it is default behaviour

Note:
managed means migration control flow is managed by libvirt.
unmanaged means migration control flow is managed by hypervisor itself.

  1. Memory and storage migration
    -- live-migration (memory only)
    For "migration with memory only", we call it "live migration";
    If --p2p is used, we call it "p2p live migration"
    -- block-migration(storage and memory only)
    For "migration with both storage and memory", we call it "live migration with copy storage"
  1. live/offline/non-live migration
    live migration:
    "live migration" corresponds to "virsh option --live", normally we will use --live when do migration, as apps/services in guest will be affected with live migration(just a short downtime).

non-live migration:
"non-live migration" corresponds to "without virsh option --live", guest will be paused when non-live migration starts and resumed when non-live migration completes, so apps/services in guest will be affected. This is not common usage, we just test it in one single test case. In most cases, we test live migration.

offline migration:
"offline migration" corresponds to "virsh option --offline", it means only guest defination(vm xml) is migrated to dest host(it is equivalent to "define a same vm on dest host").

@dzhengfy dzhengfy self-requested a review January 29, 2021 01:45
:param params: Dictionary with the test parameters
:param env: Dictionary with test environment.
"""
def update_disk(vm, params):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please consider moving helper functions to top level, ie. same level as 'run' function, to a) avoid using variables that are not passed, b) improve readability

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g. update_disk updates a value that's declared further down the code after it's definition while nothing in the functions signature indicates that that value is being updated

assumes list declared in

is already filled with values from

vm_did_list.append(target_dev)

after the function has been called in

the signature being

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sharing your option! I understand what you mean but in other hand, it's convenient to implement the operations. And we have both of them in our existing code.
Can I say it's a matter of coding style? It looks like we don't have a limitation about it.
There are pluses and minuses to all the solutions, I'm ok with both of them. But if you think it blocks this PR, we'd better to have a discussion about this problem with whole team. How do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chunfuwen What do you think? I think in context of improving code quality we should insist on having non-nested functions.

@Yingshun You have to admit that the signature e.g. of update_disk is wrong - it uses a local variable 'vm_did_list' which is not shown in the signature. I think for consistency if we accept nested run functions at least the signature should either be empty and variables just consumed because they are available to scope, or the signature has to be correct, listing all parameters or accepting a dictionary holding all parameters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dzhengfy @kylazhang @chloerh @kamilvarga Would you mind sharing your ideas?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my point of view it is better to move the helper functions to the top level as suggested by @smitterl especially due to first argument: "a) avoid using variables that are not passed". There might be also impact on readability, but not so significant.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 with kamil and smitterl 's comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smitterl @kamilvarga redefined such lists of images in this function. Would you please help review again?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chunfuwen Would you mind adding some details to the review comments so that we can understand you clearly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smitterl @kamilvarga @chunfuwen , thanks for your comments. Your opinion is valuable and a good point. In the past, we do not have a strict rule for this, so you may see both writings exist in many python files. I will create an issue to specifically discuss about this and hope we can reach an agreement finally, then we will follow the conclusion in the following pull requests.
Regarding this PR, it implemented many critical and high test cases and feature owner is waiting for its run in jenkins, so we decide to merge it now to speed up our tests. Thank you.


remote_session.close()

def check_socket(params):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s. above

disks_uri_port = params.get("disks_uri_port", "44444")
migrate_again = "yes" == params.get("migrate_again")
func_params_exists = "yes" == params.get("func_params_exists", "no")
func_name = params.get("action_during_mig")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency, better name variable same as config parameter (like in all the other cases): action_during_mig = params.get("action_during_mig").

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variant func_name's used in almost all of the scripts under src/migration, so let's keep it. Is it ok?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chunfuwen What do you think?

Copy link
Contributor

@smitterl smitterl Feb 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Yingshun To be honest, my preference would be to rather change it in all other occurrences, too. When I read the code further down I don't remember what func_name is intended to be used for. I mean you wouldn't want to name "disks_uri_port" "int_as_string_param_1" either (I admit I'm exaggerating a bit :P). I mean, we already have this in our review guide IIUC:

variable name issue:
     name should be meaningful

https://github.com/autotest/tp-libvirt/blob/master/tp-libvirt_review_comment_summary.rst

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my memory, 'func_name' is used in migrate_options_shared.py as a common name to accommodate different function names in different cases. But in this py file, 'func_name' is used to always read the parameter of 'action_during_mig', so I think maybe we do not need to use a common name any more here. But I think that is only better to change, not a must.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below are the files we used 'func_name', if we need another name, I prefer to do it separately.
migrate_bandwidth.py
migrate_gluster.py
migrate_mem.py
migrate_network.py
migrate_options_shared.py
migrate_over_unix.py
migrate_storage.py
sriov_migrate.py

libvirt/tests/src/migration/migrate_over_unix.py Outdated Show resolved Hide resolved
libvirt/tests/src/migration/migrate_over_unix.py Outdated Show resolved Hide resolved
vm_session_after_mig.cmd(
"echo mytest >> /mnt/%s1/mytest" % did)
finally:
logging.info("Recover test environment")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is logging this message necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I often check this message when debugging migration jobs so let's keep it.

"test%s.img" % str(idx))
libvirt.create_local_disk("file", disk_format=disk_format,
path=disk_path, size="500M")
target_dev = 'vd' + chr(idx + 96)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using magic number here. Might be worth creating function and moving to avocado-vt

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about changing it to target_dev = 'vd' + chr(idx + ord('a') - 1)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Yingshun I believe it's still cryptical and should really live in a reusable function but I wouldn't block approval on it.

libvirt/tests/src/migration/migrate_over_unix.py Outdated Show resolved Hide resolved
disk_path,
new_disk_dict, False)
if result.exit_status:
raise test.fail("Attach device %s failed." % target_dev)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.


local_image_list.append(disk_path)

disk_cmd = ("qemu-img create -f %s %s 500M" %
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to consider creating an analog to libvirt.create_local_disk that you used above - e.g. libvirt.create_remote_disk. This would improve readability and helps avoid duplication.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I know, only storage migration cases need to create disks on remote host. and we have some functions to execute command lines on a remote host, such as utils_misc.cmd_status_output(), remote.run_remote_cmd() and session.cmd/cmd_status_output() and so on. So I just generate a command line and send it to utils_misc.cmd_status_output(). We may run different commands on a remote host and if they're not quite common, I'd rather call the above functions than define each one individually. Can you please tell me what you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Yingshun I'm sorry, I'm not sure I understand. Not having the _cmds but rather an equivalent create_remote_disk or more general create_disk would make your code more readable besides being useful. Something that reads like

libvirt.create_disk("file", disk_format=disk_format,
                             path=disk_path, size="500M", session=session)

This way you can also use it in a local guest or even in a remote guest. For safe refactoring, libvirt.create_local_disk could either delegate to create_disk - the code would have to run cmd_status_output on session and process.run or process.getstatusoutput for local access without session - or create_disk delegates simply to create_local_disk if session==None.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, please check avocado-framework/avocado-vt#2915.

@Yingshun Yingshun force-pushed the migration_unix branch 2 times, most recently from eefb6a2 to 7434248 Compare February 2, 2021 09:34
@Yingshun
Copy link
Contributor Author

Yingshun commented Feb 2, 2021

@smitterl I've updated the code, please help review again. Thanks!

Copy link
Contributor

@smitterl smitterl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Yingshun I mainly have some more questions. Everything is relative and I agree that having a discussion helps to understand what expectations there are to the test code.

@Yingshun Yingshun force-pushed the migration_unix branch 2 times, most recently from fe538ed to 9e241e4 Compare February 5, 2021 01:23
disk_path,
new_disk_dict, False)
if result.exit_status:
raise test.error("Attach device %s failed." % target_dev)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please include detailed error message for better debugging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

if result.exit_status:
raise test.error("Attach device %s failed." % target_dev)

local_image_list.append(disk_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To ensure this image can be removed in most cases, we'd better put this line after line 68.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right, updated.


def check_socket(params):
"""
Update interfaces for guest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct the docstring , please

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

disks_uri_port = params.get("disks_uri_port", "44444")
migrate_again = "yes" == params.get("migrate_again")
func_params_exists = "yes" == params.get("func_params_exists", "no")
func_name = params.get("action_during_mig")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my memory, 'func_name' is used in migrate_options_shared.py as a common name to accommodate different function names in different cases. But in this py file, 'func_name' is used to always read the parameter of 'action_during_mig', so I think maybe we do not need to use a common name any more here. But I think that is only better to change, not a must.

This PR adds below cases:
1) Migrate vm over unix socket
2) Migrate vm over unix socket - libvirt tunnelled(--tunnelled)
3) Migrate vm over unix socket - enable multifd(--parallel)
4) Migrate vm with copy storage over unix socket - one disk
5) Migrate vm with copy storage over unix socket - multiple disks
6) Abort migration over unix socket and migrate again
7) Abort migration with copy storage over unix socket, and migrate again
8) Migrate vm with copy storage over unix socket - multiple disks
    - enable multifd(--parallel)

Signed-off-by: Yingshun Cui <yicui@redhat.com>
mig_result = None
local_image_list = []
remote_image_list = []
vm_did_list = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Above assignments seems useless because in Line 193, they will be reassigned by update_disk() function which already declares local variables with same names. Maybe you can set them to None or pass them to update_disk(), I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an exception is raised before calling update_disk() there will be a problem. And local_image_list and remote_image_list will be checked in 'finally' block as below:

for source_file in local_image_list:
libvirt.delete_local_disk("file", path=source_file)
for img in remote_image_list:
remote.run_remote_cmd("rm -rf %s" % img, params)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is ok regarding your words. But I do not think it is good practice. However, it is not a must to change.

mig_result = None
local_image_list = []
remote_image_list = []
vm_did_list = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is ok regarding your words. But I do not think it is good practice. However, it is not a must to change.

Copy link
Contributor

@chunfuwen chunfuwen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
critical_case depend on The PR has dependency on other PRs high_case
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants