unable to run HVM domain with current repo code #1486

Closed
marmarek opened this Issue Dec 4, 2015 · 8 comments

marmarek commented Dec 4, 2015

On Thu, Dec 03, 2015 at 01:49:29PM -0800, Eric Shelton wrote:

I know that 3.1-rc1 has not been released, but I have been making use of
what is currently in the repo to firm up Skylake support (mainly, getting the
integrated graphics to work).

Although I have not noticed any problems running the PV-based AppVMs, there
is definitely a problem with HVMs. For example, if I take an image for a
Win7 HVM from a 3.0 machine, and try using it, it works initially, but
eventually /var/log/xen/console/guest-win7-dm.log reports:

...
vga s->lfb_addr = f0000000 s->lfb_end = f1000000
packet error
packet error
read error -1 on /local/domain/12/device/vbd/51712 at offset 3364040704, num bytes 4096
read error -1 on /local/domain/12/device/vbd/51712 at offset 3365826560, num bytes 4096
...

It then just goes on and on with a bunch of read errors, and soon enough
write errors too. Win 7 does not manage to get past the initial splash
screen.

This happens every time. I think I also encountered it when trying to
install Ubuntu 15.04 to a fresh HVM. I get the same results with both the
3.19.8 and 4.1.13 kernels.

This is about (not yet released) R3.1-rc1.

marmarek added this to the Release 3.1 milestone Dec 4, 2015

marmarek Dec 5, 2015

Member

I've also seen errors like:

track_dirty_vram(f0000000, fd5) failed (-1, 12)

This one is in the VGA emulation code, but errno=12 is ENOMEM, so it may be related.
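
For reference, the errno reading above can be checked with a few lines of C. This is only a sketch: the helper names `is_out_of_memory` and `describe_failure` are hypothetical, and it assumes the Linux errno numbering, where ENOMEM is 12.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: true when an errno value means "out of memory". */
int is_out_of_memory(int err)
{
    return err == ENOMEM;
}

/* Print the two numbers from a "failed (-1, 12)" style message together
 * with the system's description of the errno value. */
void describe_failure(int ret, int err)
{
    printf("ret=%d errno=%d (%s)\n", ret, err, strerror(err));
}
```

Calling `describe_failure(-1, 12)` prints the platform's own text for errno 12, which is a quick way to confirm the ENOMEM interpretation on the machine in question.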

marmarek Dec 5, 2015

Member

It looks like a problem with some driver in the stubdomain (blkfront? netfront?). With debugging enabled, I got:

ASSERTION FAILED: !(!inuse[ref]) at gnttab.c:42.

This is here: http://xenbits.xen.org/gitweb/?p=mini-os.git;a=blob;f=gnttab.c;h=f395d12ab6e18ff60ca4b2dc5933342941d4290e;hb=HEAD#l42
So basically some grant entry is released twice.
There is also a stack trace. It leads to netfront.c:127 (network_rx), called from netfront_receive.
There is exactly one commit touching that code between the 4.4 and 4.6 releases:
http://xenbits.xen.org/gitweb/?p=mini-os.git;a=commit;h=7c8f348390652a67e9356eec9cd2b0f76a9c7c72
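
To illustrate what "released twice" means here, the following is a minimal, self-contained sketch of a grant-reference free list with in-use tracking. It is not the mini-os code: the names `gnt_init`, `gnt_get`, `gnt_put`, `free_list` and the table size are all hypothetical, and the sketch returns an error where the real debug build would hit the assertion.

```c
#include <stdbool.h>

#define NR_GRANT_ENTRIES 8          /* hypothetical, tiny table for illustration */

typedef unsigned int grant_ref_t;

/* free_list[0] holds the head of the free list; 0 also means "list empty". */
static grant_ref_t free_list[NR_GRANT_ENTRIES];
static bool inuse[NR_GRANT_ENTRIES];

/* Put every reference except the reserved entry 0 on the free list. */
void gnt_init(void)
{
    free_list[0] = 0;
    for (grant_ref_t ref = 1; ref < NR_GRANT_ENTRIES; ref++) {
        inuse[ref] = false;
        free_list[ref] = free_list[0];
        free_list[0] = ref;
    }
}

/* Hand out a free reference, or 0 if none is left. */
grant_ref_t gnt_get(void)
{
    grant_ref_t ref = free_list[0];
    if (ref == 0)
        return 0;
    free_list[0] = free_list[ref];
    inuse[ref] = true;
    return ref;
}

/* Release a reference. Returns 0 on success, or -1 if the reference is
 * invalid or not currently in use (a double release). */
int gnt_put(grant_ref_t ref)
{
    if (ref == 0 || ref >= NR_GRANT_ENTRIES || !inuse[ref])
        return -1;
    inuse[ref] = false;
    free_list[ref] = free_list[0];
    free_list[0] = ref;
    return 0;
}
```

In this sketch, dropping the `inuse` check would let a second `gnt_put(ref)` make the entry point at itself, so `gnt_get()` would keep handing out the same reference. That is the kind of corruption an in-use check is meant to catch early.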

marmarek Dec 5, 2015

Member

Reverting this commit fixes the issue.

marmarek added a commit to marmarek/old-qubes-vmm-xen that referenced this issue Dec 6, 2015

stubdom: revert commit breaking netfront driver in stubdomain
Makes it working for now this way. The issue was reported upstream and
when the proper patch will be released, this one should be replaced.

Fixes QubesOS/qubes-issues#1486

esheltone Dec 15, 2015

Reverting the commit seems to have worked, as I have been using Windows in an HVM ever since R3.1-rc1 got released. Did you want a proper fix accepted upstream before closing this issue, such as what was discussed in the above upstream report?

marmarek Mar 23, 2016

Member

On Wed, Mar 23, 2016 at 02:49:18PM -0700, sarahn wrote:

Can someone else also try the patch from http://lists.xenproject.org/archives/html/xen-devel/2016-03/msg03080.html ?

Do you have any test case for it?

Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

sarahn Mar 23, 2016

On Wed, Mar 23, 2016 at 02:49:18PM -0700, sarahn wrote:
Can someone else also try the patch from http://lists.xenproject.org/archives/html/xen-devel/2016-03/msg03080.html ?
Do you have any test case for it?

test.c for mini-os doesn't show a difference with or without this patch, if that's what you're asking. Currently the netfront thread exits immediately, but even putting in "while (!do_shutdown) schedule();" won't lead to any complaints, even if network traffic is received. I'm not sure how to trigger the bug with test.c.

I'm not a Qubes user, but this issue report came up for me when searching for the same error messages while booting nested Xen.
