Migrate docker container from ubuntu 20.04 to 22.04 #5877
Conversation
@rveerama1 Thank you for looking into this. I am glad to see that you are learning from previous attempts. In general, I'd suggest you focus on the x86_64 worker first, and gradually move to other workers, say the aarch64, amd, and windows workers. On the other hand, I believe most of the debugging work can be done offline, either on your own machine or an equivalent Azure VM instance. I'd only use the CI pipeline to validate once a certain worker is working locally.
Instead of using this PR, let's have the discussion and track the progress in issue #5878.
Sure.
Some progress with the tests: just migrating Ubuntu from 20.04 to 22.04 and keeping the
In previous attempts
Regarding the build failure on AArch64, it was also seen in #5487. There is an issue rust-lang/rust#89626 in the rust-lang community tracking the problem, but it seems it has not been resolved yet. Some workarounds were mentioned in the discussion.
Yes, I noticed it.
Let's see if we can find a good workaround.
@rveerama1, for the AArch64 build error, can you try updating the
I can reproduce the error, and the change works.
Done, thank you.
Some updates:
A guess from @likebreath: "I think the reason why test_vfio was failing is that the Cloud Hypervisor binary compiled on the new container image (Ubuntu 22.04) has different dynamic binaries from what's provided by Ubuntu 20.04 (the focal guest image)." I modified test_vfio to use the Jammy image (Ubuntu 22.04) and the tests are passing.
Now some tests are failing
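The dynamic-linking mismatch described above can be checked mechanically: a binary built on Ubuntu 22.04 may reference GLIBC symbol versions newer than what the focal (20.04) guest ships. A minimal sketch of that check, assuming `objdump -T` output has already been captured (the sample below is fabricated for illustration; focal ships glibc 2.31):

```python
import re

def max_glibc_version(objdump_output: str) -> tuple:
    """Return the highest GLIBC_x.y symbol version referenced in
    `objdump -T <binary>` output, or (0, 0) if none are found."""
    versions = [tuple(int(g) for g in m.groups())
                for m in re.finditer(r"GLIBC_(\d+)\.(\d+)", objdump_output)]
    return max(versions) if versions else (0, 0)

# Fabricated `objdump -T` excerpt for a binary built on Ubuntu 22.04.
sample = """
0000000000000000  DF *UND*  0000000000000000  GLIBC_2.34  pthread_create
0000000000000000  DF *UND*  0000000000000000  GLIBC_2.17  memcpy
"""

FOCAL_GLIBC = (2, 31)  # glibc version shipped with Ubuntu 20.04 (focal)
needed = max_glibc_version(sample)
print(needed, needed > FOCAL_GLIBC)  # (2, 34) True: would fail on a focal guest
```

If the highest required version exceeds the guest's glibc, the binary cannot run there, which matches the symptom of switching the guest image to Jammy making the test pass.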
No new changes, just rebased against main.
@rveerama1 Branch indexing will run the CI from time to time even if the PR is not updated and is a draft, and that wastes resources. Let's keep the PR closed unless you want to run the CI. Please reopen as needed. Thank you.
Of course, you will still have the CI log history as a reference for debugging offline: https://cloud-hypervisor-jenkins.westus.cloudapp.azure.com/blue/organizations/jenkins/cloud-hypervisor/activity/?branch=PR-5877
I noticed many changes related to integration tests recently, and thought of checking whether they will introduce new issues or fix something on all workers. Anyway, I am looking into
I need some help investigating this further. So far
It was stuck at https://github.com/cloud-hypervisor/cloud-hypervisor/blob/main/vmm/src/lib.rs#L1721 : vm.start_dirty_log() never returned.
I don't know exactly what could be the reason it gets stuck there on Ubuntu 22.04. Some sample logs from Ubuntu 20.04:
It does proceed further with the dirty memory migration. Any help, suggestions, or insights @likebreath @rbradford @sboeuf ?
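One generic way to confirm offline that a specific call such as `vm.start_dirty_log()` is the one that never returns is to wrap the suspect call with a watchdog timeout. This is only a sketch of the debugging pattern (the real call is a Rust API inside the VMM; the functions below are hypothetical stand-ins):

```python
import threading
import time

def call_with_timeout(fn, timeout_s):
    """Run fn in a worker thread; return its result, or None if it
    did not finish within timeout_s (i.e. a suspected hang)."""
    result = {}
    worker = threading.Thread(target=lambda: result.update(value=fn()),
                              daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        return None  # the call is still blocked: suspected hang
    return result["value"]

print(call_with_timeout(lambda: "dirty log started", 1.0))  # returns normally
print(call_with_timeout(lambda: time.sleep(10), 0.2))       # None: hang detected
```

In the real setup the equivalent would be a timeout around the migration step plus a thread dump (e.g. gdb attach) once the watchdog fires, to see where the blocked thread is sitting.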
Marking as WIP doesn't trigger CI.
@rveerama1 Good progress. It is now clear that the problem is migrating the vhost_user device (that is using an ovs-dpdk backend) while sending the request. Note that we recently upgraded the vhost crate version, so please rebase before further debugging.
Ok, I will check and update.
Force-pushed d533d47 to 42247d1
The ARM worker is fixed as well.
Force-pushed 42247d1 to 498b78a
Disabled those two tests. Now CI looks fine.
Force-pushed 498b78a to 2ecf516
We will need to run this change more times to see how stable our tests are on the migrated docker container. Also, we will need to test the bare-metal workers.
Since we are cutting a new release tomorrow, let's do these tests after the release. Marking this PR as DNM to avoid unnecessary CI runs before then.
Force-pushed 2ecf516 to 0976355
It seems SPDK needs to be rebuilt every time? It was picking up the right shared library (libjson-c.so.5) and the initialization of nvmf_tgt succeeded. But now the container is picking up the old libs again.
So with clean SPDK builds, it works properly.
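The stale-library symptom can be spotted by parsing `ldd` output for the nvmf_tgt binary and checking whether the library it needs actually resolves. A small sketch (the `ldd` excerpt below is fabricated; paths are illustrative only):

```python
def resolved_lib_path(ldd_output: str, libname: str):
    """Return the path `ldd` resolved for libname,
    or None if the library is missing ('not found')."""
    for line in ldd_output.splitlines():
        if line.strip().startswith(libname) and "=>" in line:
            target = line.split("=>", 1)[1].strip().split()[0]
            return None if target == "not" else target
    return None

# Fabricated `ldd nvmf_tgt` excerpt: the container no longer provides the
# libjson-c version the freshly built binary was linked against.
sample = """
    libjson-c.so.5 => not found
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0000000000)
"""
print(resolved_lib_path(sample, "libjson-c.so.5"))  # None: stale container libs
print(resolved_lib_path(sample, "libc.so.6"))       # resolves normally
```

A check like this run inside the container before the tests would distinguish "binary needs a rebuild" from "container image has the wrong libraries".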
Force-pushed 7ee76b4 to 0976355
Force-pushed aecfa5e to 2d90980
Force-pushed 2d90980 to a824f5b
The following tests have been temporarily disabled:

1. Live upgrade/migration test with ovs-dpdk (cloud-hypervisor#5532);
2. Disk hotplug tests on windows guests (cloud-hypervisor#6037);

This patch has been tested with PR cloud-hypervisor#6048.

Signed-off-by: Ravi kumar Veeramally <ravikumar.veeramally@intel.com>
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
Tested-by: Bo Chen <chen.bo@intel.com>
Force-pushed a824f5b to de5492e
This patch has been tested with #6048. Details: #6048 (comment). I updated the container tag and added details about the tests being disabled in the commit message. I think we are ready to land this PR. Thank you for the good work @rveerama1.
The migration has been attempted a few times already in the following PRs: #5072, #5449, #5456, #5487, and was dropped for various reasons. One main reason was the failure of the test named
live_migration::live_migration_sequential::test_live_migration_ovs_dpdk
(#5532). After analyzing those PRs, I would like to approach this issue step by step.
@likebreath @rbradford @michael2012z Any comments or suggestions?