nixos/lib/test-driver: use QMP API to watch for VM state #257535
8b4c520 to f8f646c
Triggering eval after it was fixed on master. @ofborg eval
f8f646c to 65275f9
@tfc I would like to get this merged as-is so that downstream consumers can start building on top of it. Once we understand better how we want to drive this, we can incrementally improve the new APIs. I also want to discuss the new APIs with @nikstur and @ElvishJerricco.
Thank you for the in-depth review, @tfc! I will address all of this.
f4e1371 to 9c749b7
@ofborg test login
I think I fucked up and didn't rebase everything, so I will rerun the test.
On Fri, 20 Oct 2023 at 13:39, Jacek Galowicz wrote:
… @ofborg test login
Now that we have a QMP client, we can wire it up in the test driver. For now, it is almost completely useless because it needs a constantly running "event loop", especially for event listening. In the next commits, we will gradually enable more and more use cases.
9c749b7 to f94876a
@ofborg test login
@RaitoBezarius, this PR introduced a regression on non-NixOS machines!
(Sorry, the Traceback is not exactly from this test run, but the messages are the same)
@tfc, regardless of the cause of the problem on non-NixOS machines, I think it would be nice to have an option to disable QMP.
It would be nice to know the root cause of the failure on non-NixOS machines, though. That's not normal.
Agreed. But I have already confirmed this behavior on 2 non-NixOS machines. It's currently a blocker for our team, since we depend on running the test driver and passing arguments to it in non-interactive mode. @RaitoBezarius, I should've noted that driverInteractive works just fine on non-NixOS:
Understandable. I guess you can revert this commit in the meantime, but we really need a root cause analysis or more logs on what's going on. I don't have any non-NixOS systems that can run a VM test, so unfortunately I cannot reproduce anything you have sent so far. It would be helpful to have full system details such as the OS, whether the sandbox is enabled, and the output of nix-info.
@AleXoundOS I am unable to reproduce your problem on Debian 12 using
Tested on Fedora and it works too! The problem is indeed specific to Arch Linux. Both machines which fail are Arch Linux in my case.
(I just arrived at this PR after brief confusion, so I'm going to leave this here for posterity.) When qemu fails to run, e.g. because you specified fatally wrong
Description of changes
Following nix-community/lanzaboote#213, I realized that our test framework is terrible when it comes to boot-level crashes or hangs.
This is because the framework has no way to distinguish between a legitimate long computation that happens to show a random panic on the screen and an actual boot-level panic that prevents any further progress.
Who is responsible for knowing this? Physical computers usually have a concept of POST codes, which communicate the current state of the machine, e.g. panicked, running, shut down, memory failure, etc.
In VMs, there's no reason we could not have the same, or even better.
This is what we attempt to bring by wiring up QMP, the rich API that QEMU exposes and that can keep us informed of the current VM state: https://www.qemu.org/docs/master/interop/qemu-qmp-ref.html#qapidoc-81.
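For readers unfamiliar with QMP, here is a minimal sketch of the wire protocol. It is not the client added by this PR. QMP is JSON over a socket: QEMU sends a greeting, the client answers with qmp_capabilities, and only then can it issue commands such as query-status. Asynchronous events may be interleaved with command replies. The helper names (connect, qmp_send, qmp_recv, wait_for_return, qmp_query_status) are hypothetical, and the sketch assumes one JSON message per line.

```python
import json
import socket


def connect(path):
    """Connect to a QMP Unix socket (path is whatever was given to QEMU's -qmp option)."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(path)
    return sock


def qmp_send(stream, command, arguments=None):
    """Write one QMP command as a single JSON line."""
    message = {"execute": command}
    if arguments is not None:
        message["arguments"] = arguments
    stream.write(json.dumps(message).encode() + b"\n")
    stream.flush()


def qmp_recv(stream):
    """Read one JSON message (assumption: exactly one message per line)."""
    line = stream.readline()
    if not line:
        raise ConnectionError("QMP socket closed")
    return json.loads(line)


def wait_for_return(stream, events):
    """Block until the reply to the last command arrives.

    Events that show up in between are collected in `events`.
    """
    while True:
        message = qmp_recv(stream)
        if "event" in message:
            events.append(message)
        elif "error" in message:
            raise RuntimeError(message["error"].get("desc", "QMP error"))
        elif "return" in message:
            return message["return"]


def qmp_query_status(sock):
    """Perform the handshake, then return (status, events seen so far)."""
    stream = sock.makefile("rwb")
    events = []
    greeting = qmp_recv(stream)
    if "QMP" not in greeting:
        raise RuntimeError("unexpected QMP greeting")
    qmp_send(stream, "qmp_capabilities")  # leave capabilities negotiation mode
    wait_for_return(stream, events)
    qmp_send(stream, "query-status")
    status = wait_for_return(stream, events)
    return status, events
```

Against a real VM, this would be called as qmp_query_status(connect(path)). The returned status dictionary carries a "status" string such as "running" or "paused".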
The only challenge of this PR is that the QEMU QMP API is inherently asynchronous, and our code is very synchronous. No problem, we will:
The aim of this PR is to provide just enough infrastructure to showcase a fix for the issue mentioned above.
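One possible way to bridge an asynchronous event stream into synchronous test code is sketched below. It is an illustration under stated assumptions, not the approach this PR takes. A background thread pushes events onto a queue, and a synchronous wait_until polls its condition while failing fast if a fatal event arrives. All names (pump_events, start_pump, wait_until, VMCrashed, FATAL_EVENTS) are hypothetical. Treating GUEST_PANICKED as fatal is an assumption, and QEMU only emits that event when the guest is able to report a panic, for example through a pvpanic device.

```python
import json
import queue
import threading
import time

# Assumption: which QMP events should abort a test immediately.
FATAL_EVENTS = {"GUEST_PANICKED"}


class VMCrashed(Exception):
    """Raised when a fatal QMP event is seen while waiting."""


def pump_events(lines, events):
    """Background reader: forward every QMP event found in `lines` to the queue."""
    for line in lines:
        message = json.loads(line)
        if "event" in message:
            events.put(message)


def start_pump(lines):
    """Start the reader thread and return the queue it feeds."""
    events = queue.Queue()
    thread = threading.Thread(target=pump_events, args=(lines, events), daemon=True)
    thread.start()
    return events


def wait_until(condition, events, timeout=5.0, poll=0.1):
    """Synchronously wait for `condition()`, aborting early on a fatal event."""
    deadline = time.monotonic() + timeout
    while True:
        # Drain everything that arrived since the last poll.
        while True:
            try:
                event = events.get_nowait()
            except queue.Empty:
                break
            if event["event"] in FATAL_EVENTS:
                raise VMCrashed(event["event"])
        if condition():
            return True
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met before timeout")
        time.sleep(poll)
```

Here `lines` can be any iterable of JSON lines, such as the file object wrapping a QMP socket. The point of the design is that a test waiting on a guest that has already panicked fails right away instead of running into its full timeout.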
Things done
sandbox = true set in nix.conf? (See Nix manual)
nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
./result/bin/