-
Notifications
You must be signed in to change notification settings - Fork 239
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vt_console:Skip waiting screen when login via console #3948
Conversation
51ee139
to
c611112
Compare
To skip the UEFI guest reset process and prevent the forever waiting Signed-off-by: Haijiao Zhao <haizhao@redhat.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
if not is_responsive: | ||
raise ConsoleNotResponsiveError("Console is not responsive.") | ||
self._console.read_nonblocking() | ||
latest_output = "\n".join(self._console.get_output().splitlines()[-50:]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here , we get output starting from [-50:], can we ensure it is the same in different arches, such as s390x , arm etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. Thank you for helping check this on s390. I'll get help to check on arm too.
@smitterl Can you help check this from s390x perspective? |
Run arm acceptance test, got no failure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
IIUC this is trying to handle a very specific UEFI boot up screen that is only shown on first boot resp. when there's no nvram file. If so, in #3849 the proposed solution is to handle that file. Would this also be helpful here? @chloerh It is used to avoid the system reset in these tests: autotest/tp-libvirt#5465 |
if reset_str in latest_output: | ||
time.sleep(1) | ||
continue | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chloerh Sorry I don't understand what you are achieving here. IIUC, the code at login reads the last lines in the console output and waits as long as the UEFI system reset string is shown? When is this else branch relevant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic is:
line 87- 89
first check the latest output, if 'reset' is currently on the screen, we'll redo the loop after sleep(1) aiming to skip the 'reset' screen.
line 90-101
After we cannot find 'reset' in the latest output, it means:
- the 'reset' has passed
- or it never appeared
if it's the 1st: we will stop the loop and try the login on line 103
if it's the 2nd: we're not sure if the 'reset' is going to appear or not. But we know that if it will appear, it will be at the beginning part of the output, so I set the limit to 500 lines. If no 'reset' after 500 lines, the 'reset' will not appear at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much for explaining. I have run some tests on s390x to confirm there's no negative impact for our systems that don't use that system reset.
May I ask why the current implementation can't be used to wait for the message to disappear?
This looks like quite a specific solution that only applies to UEFI on x86_64 with a specific (qemu?)firmware. I wonder if there could be a general solution that would allow for waiting for any message to disappear first (make the line configurable). However, from #3948 (comment) I understand you're considering an alternative solution, so I'll wait for the outcome of that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smitterl Thanks so much for checking this pr with 390x tests!
Actually, the current implementation was added to workaround the same issue 2523255
Unfortunately, it reduced the number of failed cases but not to 0. Therefore we made another workaround to extend internal timeout of login (10s to 20s) which was configured in our ci.
But lately we found some login timeout errors and they could be workaround by time.sleep(x). I have even merged one with passt tests, but I don't think after 2 ways of workaround, adding a 3rd workaround is the best way to deal with this issue.
This looks like quite a specific solution that only applies to UEFI on x86_64 with a specific (qemu?)firmware.
About this, I'm not sure about that, but the 'reset' happens quite often after we do xml.sync(). Do you mean that it doesn't happen to other archs? If so, I could check with arm qes to get the info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the reset should happen only on x86_64 AFAIK but it could make the code clearer and reusable in case other setups will have a similar timeout, I don't know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found that filesystem_device_unprivileged.two_fs..two_guests
test cases reproduce the 'infinite wait for login' with reset message on UEFI guests. But with this patch they pass. I don't really understand how but I will approve this request.
JOB ID : fbb46d18520a1a07a1fc72907fcc27e9f9b63e5b
JOB LOG : /var/log/avocado/job-results/job-2024-07-24T08.08-fbb46d1/job.log
(1/1) type_specific.io-github-autotest-libvirt.virtual_devices.filesystem_device_unprivileged.two_fs.with_shm.two_guests.coldplug: PASS (109.02 s)
RESULTS : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
JOB HTML : /var/log/avocado/job-results/job-2024-07-24T08.08-fbb46d1/results.html
JOB TIME : 110.63 s
From s390x perspective: The reset string won't be shown, so I am worried that taking away the |
I found it could be helpful to the problem I'm trying to fix with the current pr. The problem is what's being described in the comment below. I also think we need to have a discussion with the team before we make the change to Thank you for bringing this up, I believe this could solve from the root.
|
Moreover, xml.sync() is commonly and widely used in many job features, such as virtual disk , detach and attach related cases. Therefore, since this is specific issue, so I am still concerned about necessity of this PR change. |
Additionally, Can we consider to address this issue in specific tp-libvirt file? such as autotest/tp-libvirt#5465. If reset/reboot happen in some specific cases, we need reconsider whether the test case manipulate VM life cycle in correct way. |
|
vmxml.sync() support pass in "--keep-nvram", did we ever try to use vmxml.sync("--keep-nvram") ? |
Yes, we're going to set --keep-nvram as default option, which is another issue to be discussed. |
Why this issue appear to our testing? Most possibility is that we rudely virsh undefine --nram firstly, then define VM wth the same image. This is not correct way from user's perspective. |
Althrough after we use --keep-nvram in sync(), we will not happen to "reset" message in most cases when booting the guest, current PR still can solve some specific cases when --nvram is used which leads to "reset" message and this PR will not impact --keep-nvram cases. So I will merge it and see if any improvement for cases' stability. |
To skip the UEFI guest reset process and prevent the forever waiting
Test with previously failed cases: