dosemu2 legacy hang with ubuntu 18.04 kernel 4.15.0-136 #1404

Closed
bpranoto opened this issue Feb 26, 2021 · 51 comments
Labels
kvm kvm-related problems

Comments

@bpranoto

Describe the bug
I don't think this is a dosemu2 bug, but I think it's better to report it here.

After the kernel upgrade to 4.15.0-136, dosemu2-legacy hangs.
This is on Ubuntu 18.04 64-bit; it also happens on Ubuntu 16.04 with the same kernel version.

It runs fine after downgrading to the previous kernel version, 4.15.0-135.

To Reproduce
If you use Ubuntu with kernel version 4.15, upgrade to the latest kernel at this time (4.15.0-136).
Run dosemu2; the dosemu2 window appears but hangs afterward.
boot.log looks normal.

Attach the log
It is located in ~/.dosemu/boot.log
To make the log more useful, you may need to enable some logging flags.
See description of -D option in man dosemu.bin.

A regression?
No.
boot.log

@stsp
Member

stsp commented Feb 26, 2021

Does $_cpu_vm="emulated" fix it?

@bpranoto
Author

With the settings
$_cpu_vm="emulated"
$_cpu_vm_dpmi = "auto"
dosemu comes up, but the log shows a lot of errors.

Here is the log with -De
boot.log

@stsp
Member

stsp commented Feb 26, 2021

Please file another ticket with a test case
for these error messages.

As for the hang - please attach the -D9+gQDdi# log.

@bpranoto
Author

freezing log with -D9+gQDdi#
boot.log

@stsp
Member

stsp commented Feb 27, 2021

OK, what I see is that it goes to KVM and
never returns. I added more logging in case
the loop is inside dosemu's kvm impl, which
is unlikely.
Anyway, this kernel is too old.
How about trying 5.x?

In the meantime, dosemu's KVM impl is
quirky: it uses ring0 "monitor" to save registers
on interrupt. It should instead use kvm_sync_regs.
Maybe that will avoid most of kvm pitfalls.
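
For readers unfamiliar with the interface mentioned here: with sync regs, the kernel mirrors guest registers into the shared `struct kvm_run` page, so user space can read and modify them without a ring0 helper or extra ioctls. Below is a minimal sketch, assuming x86 and UAPI headers from kernel 4.17 or later. The helper name `set_guest_rip` is hypothetical and is not taken from dosemu2.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper (not dosemu2 code): change the guest instruction
 * pointer through the shared kvm_run page instead of KVM_SET_REGS.
 * Returns 1 if the x86 sync-regs definitions were available at build
 * time, 0 otherwise (older headers, or a non-x86 build).
 */
int set_guest_rip(struct kvm_run *run, unsigned long long rip)
{
    assert(run != NULL);
#ifdef KVM_SYNC_X86_REGS
    /* Ask the kernel to copy the general-purpose registers into
     * run->s.regs each time KVM_RUN returns to user space. */
    run->kvm_valid_regs |= KVM_SYNC_X86_REGS;

    /* Edit the shared copy and mark it dirty, so the kernel loads it
     * back into the vCPU on the next KVM_RUN. */
    run->s.regs.regs.rip = rip;
    run->kvm_dirty_regs |= KVM_SYNC_X86_REGS;
    return 1;
#else
    (void)run;
    (void)rip;
    return 0;
#endif
}
```

Here `run` would be the mmap()ed `struct kvm_run` of a vCPU. The caller is assumed to have confirmed at runtime that the kernel actually supports the capability before relying on these fields.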

@bpranoto
Author

kernel 5.x is fine, we have some computers with it at work.

kernel 4.15 is also usually fine. It only happens with the Feb 25, 2021 update.

I guess one of the patches listed at https://ubuntu.com/security/notices/USN-4749-1 is the cause. Some of them are related to the VM thing

@stsp
Member

stsp commented Feb 27, 2021

> kernel 5.x is fine, we have some computers with it at work.

In this case please file that with Ubuntu.

> Some of them are related to the VM thing

They only seem to mention xen.
Not sure if that's related.

@bpranoto
Author

Just reported the bug to ubuntu:

https://bugs.launchpad.net/ubuntu/+bug/1917138

@haenschen

I have also been experiencing this since yesterday on various 16.04 LTS systems; I added a note to that report. Thanks for finding it, @bpranoto!

@bpranoto
Author

@haenschen, you're welcome. Frankly, I am pessimistic that it will get attention from the Ubuntu developers...

@stsp
Member

stsp commented Feb 28, 2021

In #1408 I reworked the kvm code.
Please see if that helps.
It doesn't mean something was fixed.
Just different code can probably miss
the bug you see here.

@stsp stsp added the kvm kvm-related problems label Feb 28, 2021
@bpranoto
Author

It also failed. It comes up, shows the initial screen, then freezes.

image

Besides that, I notice that the setting $_X_font = "vga11x19" was ignored.

On the second try, it ran better: the autoexec was executed, but then it froze again. Here is the screenshot of the second try:
image

The boot.log for the 2nd run is too big (92.5 MB), so I zipped it.
bootlog.zip

@bpranoto
Copy link
Author

Sorry, the 2nd screenshot was not captured correctly. I ran it again, but it behaved like the first time.

@stsp
Member

stsp commented Feb 28, 2021

There is no evidence in the log that
it's the code from #1408.
I'll think about how to make builds more descriptive.

@stsp
Member

stsp commented Feb 28, 2021

I think you cheated.
I tried the local deb build and it identifies
itself properly.
So I really suspect you haven't tried #1408

@bpranoto
Author

bpranoto commented Mar 1, 2021

Oh, I am sorry, I see that #1408 is a pull request. I didn't pay attention to the number 14.08 yesterday. And I don't know how to download a pull request.

Besides that, I can no longer compile dosemu on my laptop because the udev library version in Ubuntu 18.04 doesn't meet the minimum requirement for compiling.

@bpranoto
Author

bpranoto commented Mar 1, 2021 via email

@stsp
Member

stsp commented Mar 1, 2021

Try debuild -i -us -uc -b -d which IIRC
would ignore unsatisfied build deps.

@bpranoto
Author

bpranoto commented Mar 1, 2021

There is an error on make deb:

/home/bambang/master/dosemu2/dosemu2/build/../src/base/emu-i386/kvm.c:990:28: error: ‘KVM_SYNC_X86_REGS’ undeclared (first use in this function); did you mean ‘KVM_SET_ONE_REG’?
     run->kvm_dirty_regs |= KVM_SYNC_X86_REGS;
                            ^~~~~~~~~~~~~~~~~
                            KVM_SET_ONE_REG

Here is what I did, from my bash history. I started from a fresh git clone:

rm -rf dosemu2
git clone https://github.com/dosemu2/dosemu2.git
cd dosemu2
git branch
git branch -r
git fetch
git branch
git checkout
git checkout kvm_syn
git pull
git branch
make deb

Did I miss some step?

@stsp
Member

stsp commented Mar 1, 2021

See this:
#1408 (comment)
Bart is saying that this code won't
compile on bionic pristine (which is
what you're seeing), but you can
update kernel to 5.3 (4.17 is minimum
for that code).
So I guess I won't port that patch to
legacy branch.

@bpranoto
Author

bpranoto commented Mar 1, 2021

Unfortunately, I don't have kernel v5 right now. Sadly, this means there is nothing we can do with the 4.15.0-136 problem... :(

@stsp
Member

stsp commented Mar 1, 2021

Perhaps precisely finding the first
broken and first fixed versions may
be helpful. We can then try to see
all the patches from these 2 versions
and find something interesting.

@stsp
Member

stsp commented Mar 1, 2021

> Sadly, this means there is nothing we can do with the 4.15.0-136 problem...

Quite the opposite, this means we can
perfectly fix it. :)
I updated the patches.
Now instead of the compilation error
it will just disable KVM, effectively fixing
the problem.

@bpranoto
Author

bpranoto commented Mar 1, 2021

Did you mean there will be no kvm support for kernel 4.15?

kvm works perfectly with 4.15.0-135, only the 4.15.0-136 gives the problem..

@stsp
Member

stsp commented Mar 1, 2021

50Mb diff, no way.
Try to find the exact version it was
fixed in. Although I suppose it won't
be among 4.15.0-x. :(

> Did you mean there will be no kvm support for kernel 4.15?

Well, I can of course not port that change
to legacy. But given that kvm doesn't work
there anyway with "most recent" kernels,
that was the plan.

@haenschen

It works with 4.15.0-133 for sure.

@stsp
Member

stsp commented Mar 1, 2021

You need to find where it works, with
the number above 136 (as on 136 it was
broken). Then maybe we can see a
"fix KVM" in the log.

@andrewbird
Member

-136 is Ubuntu's latest security fix to the 4.15.0 kernel, see https://packages.ubuntu.com/bionic/main/linux-image-4.15.0-136-generic. There is currently no -137 or above.

@stsp
Member

stsp commented Mar 1, 2021

Do they (canonical) have a git repo where the one
can at least do bisect?

@andrewbird
Member

I don't know of one. I did look at the source package; it has the diffs between -136 and vanilla 4.15.0, and I guess that's the 50 MB you mentioned earlier. The changelog mentions a couple of x86-related kvm changes that occurred in -136: http://changelogs.ubuntu.com/changelogs/pool/main/l/linux/linux_4.15.0-136.140/changelog

@guilhermepiccoli

guilhermepiccoli commented Mar 1, 2021

> Do they (canonical) have a git repo where the one
> can at least do bisect?

Indeed, Canonical has git repos for all kernels! A good first step would be to go to the generic git index and search there:
https://kernel.ubuntu.com/git/

For this specific release (Bionic / 18.04), the git repository is: git://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic
You can check the tags to determine what is between releases.

Anyway, this was just informative - the kernel debugging should continue on Launchpad, where it is better exposed to the community, and in the end it seems to be a kernel issue (I'm preparing a test package, I think I know what's going on).
Cheers,

Guilherme

@stsp
Member

stsp commented Mar 1, 2021

That sounds great!
You can as well use our dosemu2-legacy
package from here:
https://code.launchpad.net/~dosemu2/+archive/ubuntu/ppa/+packages
for your tests.

Thanks!

@bpranoto
Author

bpranoto commented Mar 2, 2021

I am very happy, I just tried the special 4.15.0-137 Guilherme prepared for us. It works perfectly! :)

Can we close this report now?

@stsp
Member

stsp commented Mar 2, 2021

I think Guilherme would prefer if you write
that to the ubuntu bug report.
Otherwise he may have problems delivering
an update.

Thanks!

@bpranoto
Author

bpranoto commented Mar 2, 2021

I already did.

@stsp
Member

stsp commented Mar 2, 2021

Ok, the mails from LP seem to be not that
fast. :) Now it has arrived.

@bpranoto bpranoto closed this as completed Mar 2, 2021
@guilhermepiccoli

Thanks bpranoto and stsp! Regarding GitHub, you can indeed close, it's up to you - I appreciate that this is reported here, people can find that in a web search plus the Launchpad.
About the kernel, I'll need a 3rd test, with the fix. The kernel I prepared contains a revert, which makes my hypothesis strong...now the fix will prove that and we can have that released.
I'll comment details over there soon, I just need to finish the backport of the fix.
Cheers!

@stsp
Member

stsp commented Mar 3, 2021

@guilhermepiccoli Thanks for your work
on this, and the fix!
Unfortunately this represents a problem here.
My plan was to drop the KVM support for
bionic, both because it was broken and because
it lacks the KVM_CAP_SYNC_REGS extension,
which is what we are going to use. The patch
is already pending: #1408.

Now you kinda ruined my plan to drop kvm
from bionic, but the alternative is too poor.
So while you are at it, would you consider
to backport this:
torvalds/linux@01643c5#diff-57de5203cc2796c8dfb7cc4c308a2e3b01dde41795d495746034446d5d366c56

If you do that, then we can keep supporting kvm
on bionic. And if not - we can neither support it
nor drop, which is kinda unfortunate situation.

@stsp stsp reopened this Mar 3, 2021
@stsp
Member

stsp commented Mar 3, 2021

@rkrcmar What do you think about porting
the KVM_CAP_SYNC_REGS patch to stable?
Otherwise Bionic will probably not have it in.
Patch looks rather small, and it's a new functionality
so should not affect existing users?

torvalds/linux@01643c5

@guilhermepiccoli

> @guilhermepiccoli Thanks for your work
> on this, and the fix!
> Unfortunately this represents a problem here.
> My plan was to drop the KVM support for
> bionic, both because it was broken and because
> it lacks the KVM_CAP_SYNC_REGS extension,
> which is what we are going to use. The patch
> is already pending: #1408.
>
> Now you kinda ruined my plan to drop kvm
> from bionic, but the alternative is too poor.
> So while you are at it, would you consider
> to backport this:
> torvalds/linux@01643c5#diff-57de5203cc2796c8dfb7cc4c308a2e3b01dde41795d495746034446d5d366c56
>
> If you do that, then we can keep supporting kvm
> on bionic. And if not - we can neither support it
> nor drop, which is kinda unfortunate situation.

My apologies for having fixed the Ubuntu kernel; if I had known it would affect the decision
about keeping the KVM support, I'd never have done that! heheh

Jokes aside, I understand your problem, but I'm afraid the commit you mentioned
is not really a candidate for either Linux stable or Ubuntu, since it's not a fix. It is an
improvement per my understanding, right?

So, how about performing a check in the dosemu2 code: if the commit is present you
use its faster API, or else you fall back to the more conservative/slow way of accessing the
registers - would that work?

If you really want to try getting this patch into Ubuntu kernels, I'd suggest opening a Launchpad bug
explaining why you think that'd be a good addition (loop me in there, in case you open the bug!)
and it may get included in Focal kernel (5.4) and so on, so Bionic users could use it through the HWE kernel[0].

Cheers,

Guilherme

[0] https://wiki.ubuntu.com/Kernel/LTSEnablementStack

@stsp
Member

stsp commented Mar 3, 2021

> since it's not a fix. It is an
> improvement per my understanding, right?

Yes, it's not a fix at all.
It's just a new functionality.
But seeing a 50 MB diff between
-136 and 4.15.0, I can bet anything
that it's not all fixes.

> use its faster API, or else you fall back to the more conservative/slow way of accessing the
> registers - would that work?

This would be quite difficult, as the new
API allowed us to completely remove half
of our KVM code. Instead I'll be reverting
the aforementioned patches from the
legacy branch of dosemu2 if no better
solution is found. But reverting such
big patches will sooner or later lead to
merge conflicts.

> and it may get included in Focal kernel (5.4) and so on

But it's already there.
In fact, the patch I was pointing to is
dated v4.17-rc1, so today it should be
everywhere but Bionic.

I would rather think that the HWE kernels are
a good solution too.
@bpranoto @haenschen any reason you
do not use the HWE kernels? According to
the above URL, all you need to do is:
sudo apt-get install --install-recommends linux-generic-hwe-18.04 xserver-xorg-hwe-18.04

@bpranoto
Author

bpranoto commented Mar 4, 2021

> @bpranoto @haenschen any reason you do not use the HWE kernels?

No special reason, it's just because I didn't realize that it exists. As long as the version I currently use works, I never bother to modify my configuration.

A little background why I use dosemu, please skip if you are not interested.

I use dosemu to keep my old accounting programs running. The programs reside on a Netware server, and at this moment the only practical way for a DOS program to access Netware is dosemu. Even Microsoft dropped IPX and Netware support since Windows 7.

On Linux, the ipx and ncpfs kernel modules have also been dropped since kernel 4.18 IIRC. I had a difficult time when we needed to set up a new computer with Ubuntu 20.04 because of it: our dosemu program could not access the Netware server. Fortunately, there are people who provide installable ipx and ncpfs kernel modules. In case anybody is interested, here are the links:

  1. ipx kernel module: https://github.com/pasis/ipx
  2. ncpfs: https://github.com/EnzephaloN/ncpfs_dkms

So it is okay for me to use the newer kernel modules. As for the existing old office computers which use dosemu, I can simply version freeze their kernel and dosemu version.

I have been rewriting my software using a modern GUI (fpc+lazarus) with Linux as the server. However, it is still far from complete, as my good old DOS applications are very wide and complicated: they are not just accounting programs but more of an ERP which is used for daily operations and is still being developed when the need arises (change of tax regulation, workflow, etc).

Thank you very much for keeping dosemu evolving. It is very valuable to me.

@stsp
Member

stsp commented Mar 4, 2021

If you really need an in-kernel ipx, and
instead of saying so 3 years ago in lkml
you decided to stick with old distros, you
deserve a lot of punishment.
But I don't think it's the case.
Have you tried ipxodi+vlm?
I suppose that should give
you ncp under dos.

@bpranoto
Author

bpranoto commented Mar 4, 2021

I use the ipx built in support + vlm. And I just realized that some months ago (not 3 years ago)

@stsp
Member

stsp commented Mar 4, 2021

So use ipxodi?
What's the problem?

@bpranoto
Author

bpranoto commented Mar 4, 2021

The problem is that I never thought of ipxodi until you mentioned it. Besides that, with the help of https://github.com/pasis/ipx it's not a problem any more....

@guilhermepiccoli

>> since it's not a fix. It is an
>> improvement per my understanding, right?
>
> Yes, it's not a fix at all.
> It's just a new functionality.
> But seeing a 50 MB diff between
> -136 and 4.15.0, I can bet anything
> that it's not all fixes.

Take a look at the diff on linux-stable, like from v4.14 to the latest tag in 4.14.y hehe
Bet it'll be as large as 50M.

>> use its faster API, or else you fall back to the more conservative/slow way of accessing the
>> registers - would that work?
>
> This would be quite difficult, as the new
> API allowed us to completely remove half
> of our KVM code. Instead I'll be reverting
> the aforementioned patches from the
> legacy branch of dosemu2 if no better
> solution is found. But reverting such
> big patches will sooner or later lead to
> merge conflicts.

>> and it may get included in Focal kernel (5.4) and so on
>
> But it's already there.
> In fact, the patch I was pointing to is
> dated v4.17-rc1, so today it should be
> everywhere but Bionic.

Oh yeah, I just checked quickly and didn't notice it's an old improvement. So, HWE seems a perfect solution. You could resort to the uname() syscall and allow KVM mode only if the kernel version is 4.17+, and also, if you package dosemu2 in Debian format (for Debian/Ubuntu consumption) you could do package checks about the kernel version.

@stsp
Member

stsp commented Mar 4, 2021

> Take a look at the diff on linux-stable, like from v4.14 to the latest tag in 4.14.y hehe

Yes but I am pretty sure it's back-ports,
not fixes. So while I realize you may be
reluctant to back-port that on your own,
back-porting it to linux-stable should be
an option. Unfortunately this will mean
that either I do that myself or it won't
happen. :)

> you could do package checks about the kernel version.

Do you suggest I add it to Depends or
Recommends? I can add it to Recommends,
but that may not help. And if I add it to Depends,
I will likely break the user's system, as the
procedure of installing the HWE kernel involves
also installing the recommends for linux-generic-hwe-18.04
and xserver-xorg-hwe-18.04.
But I have now added the Build-Depends on
linux-headers-5.4.0-37-generic
and the build was successful, so I suppose
at least now there will be the HWE-enabled builds.

@guilhermepiccoli

> Yes but I am pretty sure it's back-ports,
> not fixes. So while I realize you may be
> reluctant to back-port that on your own,
> back-porting it to linux-stable should be
> an option. Unfortunately this will mean
> that either I do that myself or it won't
> happen. :)

Take a look at this document: https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
I'm not reluctant, I'm just following the generic rule that stable kernels inherit fixes, not features. Improvements
are subtle and only potentially added; e.g., if you have a 4-line patch that removes an unneeded lock, it's not properly a fix,
but if it speeds up a hot path by 3x, it might be backported - usually Ubuntu kernels have more flexibility than
linux-stable.

> Do you suggest I add it to Depends or
> Recommends? I can add it to Recommends,
> but that may not help. And if I add it to Depends,
> I will likely break the user's system, as the
> procedure of installing the HWE kernel involves
> also installing the recommends for linux-generic-hwe-18.04
> and xserver-xorg-hwe-18.04.
> But I have now added the Build-Depends on
> linux-headers-5.4.0-37-generic
> and the build was successful, so I suppose
> at least now there will be the HWE-enabled builds.

I guess it's "cheap" just to add a check on code based on uname() - disallow/block the KVM mode if
users are running kernel < 4.17. This way, Bionic users can either run in non-KVM mode if they want to
stick with 4.15 kernel, or they can update to HWE and use KVM mode. Also, that'd hold for any distro,
not debian-based only.
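
A minimal sketch of such a uname()-based gate follows. It is only an illustration, not dosemu2 code: the function names `release_at_least` and `running_kernel_at_least` are hypothetical, and the 4.17 threshold is the minimum version quoted earlier in this thread. Note that a version comparison is coarser than a capability query, because distributions may backport features into older kernel versions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/utsname.h>

/*
 * Hypothetical helper: returns 1 if a release string such as
 * "4.15.0-136-generic" is at least want_major.want_minor, else 0.
 * Unparseable strings are treated as "too old".
 */
int release_at_least(const char *release, int want_major, int want_minor)
{
    int major = 0, minor = 0;

    assert(release != NULL);
    if (sscanf(release, "%d.%d", &major, &minor) != 2)
        return 0;
    return major > want_major ||
           (major == want_major && minor >= want_minor);
}

/* Applies the same comparison to the running kernel via uname(2). */
int running_kernel_at_least(int want_major, int want_minor)
{
    struct utsname u;

    if (uname(&u) != 0)
        return 0;               /* be conservative if uname() fails */
    return release_at_least(u.release, want_major, want_minor);
}
```

A caller could then enable KVM mode only when `running_kernel_at_least(4, 17)` returns 1 and fall back to a non-KVM mode otherwise.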

@stsp
Member

stsp commented Mar 4, 2021

> Take a look at this document:

Yes, this one is pretty clear.
But I was after that one:
https://lwn.net/Articles/700530/
Though now I am not sure if
"long-term stable" and "stable"
are the same thing. :)
In either case, it's hard to believe
in 50 MB of trivial fixes...

> I guess it's "cheap" just to add a check on code

Yes, the check is of course there.
When we query the needed KVM extension,
it will tell us it's not there. And with Build-Depend'ing
the HWE headers, such check will at least
be compiled in, rather than out because of
an unsatisfied ifdef.
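
The runtime query described above can be sketched roughly as follows. This is not dosemu2's actual code: the function name `kvm_sync_regs_available` is hypothetical, and the sketch assumes the KVM device node is `/dev/kvm`. The `#ifdef` shows why the build headers matter: with pre-4.17 UAPI headers `KVM_SYNC_X86_REGS` is undeclared, so the check can only be compiled out.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/kvm.h>

/*
 * Hypothetical probe: returns 1 if the running kernel reports x86
 * sync-regs support for the general-purpose registers, 0 if it does
 * not (or the headers are too old), -1 if /dev/kvm cannot be opened.
 */
int kvm_sync_regs_available(void)
{
#ifdef KVM_SYNC_X86_REGS
    int fd = open("/dev/kvm", O_RDWR);
    int mask;

    if (fd < 0)
        return -1;
    /* On x86 the result is a bitmask of the supported register sets;
     * kernels without the feature report 0. */
    mask = ioctl(fd, KVM_CHECK_EXTENSION, KVM_CAP_SYNC_REGS);
    close(fd);
    if (mask <= 0)
        return 0;
    return (mask & KVM_SYNC_X86_REGS) ? 1 : 0;
#else
    /* Built against headers that predate the x86 sync-regs definitions. */
    return 0;
#endif
}
```

Because the probe asks the running kernel rather than comparing version numbers, it would also give the right answer on a kernel that carries the feature as a backport.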

> Also, that'd hold for any distro,
> not debian-based only.

We also support fedora, but it doesn't
seem to have LTS. Quite the opposite,
COPR periodically asks me to log in and
disable old builds. So as a matter of fact,
the problem seems specific to Ubuntu.

@stsp
Member

stsp commented Mar 4, 2021

As @andrewbird noted, Build-Depending on
the HWE headers doesn't lead us anywhere
because they do not propagate to /usr/include/linux.
Is there some package to update the
UAPI headers to HWE?
Otherwise this HWE idea doesn't work.

stsp added a commit that referenced this issue Mar 4, 2021
@stsp stsp closed this as completed Apr 24, 2021