Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kvm syn #1408

Draft
wants to merge 11 commits into
base: devel
Choose a base branch
from
Draft

Kvm syn #1408

wants to merge 11 commits into from

Conversation

stsp
Copy link
Member

@stsp stsp commented Feb 27, 2021

This converts kvm to sync_regs API.
But the remaining bit is to handle exceptions
w/o kvmmon.

Bart, would you like to review this, and maybe
complete the removal of kvmmon?

@stsp
Copy link
Member Author

stsp commented Feb 28, 2021

I finished the conversion.
It appears that KVM does not provide
any API for intercepting exceptions, so
the HLTs are still there in some form.
But it all is now much, much simpler.

@stsp
Copy link
Member Author

stsp commented Feb 28, 2021

PVI cannot be enabled because then
we can't record the CLI addresses to
black-list them later.

@bartoldeman
Copy link
Contributor

My only concern is that this raises the minimal kernel version, though I am not sure by how much and if it affects currently supported LTS distributions.

I'll do a proper review when I have time (more likely tomorrow).

@stsp
Copy link
Member Author

stsp commented Feb 28, 2021

Well, we only support bionic with
"dosemu2-legacy" build. Even xenial
is dropped.

@bartoldeman
Copy link
Contributor

torvalds/linux@01643c5#diff-57de5203cc2796c8dfb7cc4c308a2e3b01dde41795d495746034446d5d366c56
it's in 4.17-rc1 and up

Bionic was released with 4.15 but 5.3 is available as well. Debian stable is at 4.19.
Forget about RHEL/CentOS 7 (kernel 3.10) though 8 looks ok (4.18).

So that seems ok, now "just" need to review the code.

Indeed PVI doesn't help much, though I have thought about setting CPL=IOPL=3 for DPMI and maybe even vm86. It'd would change the port i/o handling to be for KVM (instead of a GPF) and some other things but still keeps enough protections (because CPL=3 still).

@stsp
Copy link
Member Author

stsp commented Mar 1, 2021

t'd would change the port i/o handling to be for KVM (instead of a GPF)

Would be an interesting experiment, but
I am sceptical it will gain much. Maybe
KVM provides some hints to simplify the
IO decoding? But we already have the IO
decoders in, and also I wonder what
kind of hints are those?
There are really some very bad things
happening, and kvm is now very slow
for me. I doubt that small simplifications
in IO decoding can cure that.
But at least simplifying the code (as this
patch does) may be a good start.

@stsp
Copy link
Member Author

stsp commented Mar 1, 2021

Well if they fix KVM on bionic, then
Houston, we have a problem.
Obviously ppl won't give up KVM on
dosemu2-legacy then. I can prevent
this patch from going to legacy branch,
but its too big and will inevitably mean
the inability to backport any kvm fixes +
clashes on merges.

@stsp
Copy link
Member Author

stsp commented Apr 18, 2021

The job exceeded the maximum time limit for jobs, and has been terminated.

Seems hangs not fixed?

@andrewbird
Copy link
Member

Does it need a rebase?

@stsp
Copy link
Member Author

stsp commented Apr 18, 2021

Even worse after rebase:

200
$ ./ci_prereq.sh
201
 
202
 More info: https://launchpad.net/~dosemu2/+archive/ubuntu/ppa
203
Press [ENTER] to continue or Ctrl-c to cancel adding it.
204
205
206
No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.
207
Check the details on how to adjust your build configuration on: https://docs.travis-ci.com/user/common-build-problems/#build-times-out-because-no-output-was-received
208
209
The build has been terminated

@stsp
Copy link
Member Author

stsp commented Apr 18, 2021

No, its not, its a bad rebase. :)

@stsp
Copy link
Member Author

stsp commented Apr 18, 2021

Only many FS tests fail.
KVM tests pass!
Very strange.

@stsp
Copy link
Member Author

stsp commented Apr 18, 2021

Locally these tests pass.
I think the problem is not in KVM,
but eg in xattrs?

@stsp
Copy link
Member Author

stsp commented Apr 19, 2021

Moreover, the same FS tests that
fail on travies, passes on github, within
this same PR!
There is something still broken with
CI for sure?

@andrewbird
Copy link
Member

Is it possible with this branch that kvm just got a whole lot slower, as there seem to be many tests that are failing with timeout?

@stsp
Copy link
Member Author

stsp commented Apr 19, 2021

Mm, yes, seems to have got twice
slower.

@andrewbird
Copy link
Member

andrewbird commented Apr 19, 2021

Even more than that in some cases, the 'PP-DOS-GIT MFS DOSv3 share open set file attrs DOSv2' test took ~1 sec on GA (emulated) and ~12 secs on Travis(kvm)

@stsp
Copy link
Member Author

stsp commented Apr 19, 2021

Can't confirm that weirdness:

$ time test/test_dos.py PPDOSGITTestCase.test_mfs_ds3_share_open_setfattrs
Test PP-DOS-GIT  MFS DOSv3 share open set file attrs DOSv2                       ... ok (  3.44s)

----------------------------------------------------------------------
Ran 1 test in 3.445s

OK

real	0m3,642s
user	0m0,302s
sys	0m0,136s

... on kvm_sync and:

$ time test/test_dos.py PPDOSGITTestCase.test_mfs_ds3_share_open_setfattrs
Test PP-DOS-GIT  MFS DOSv3 share open set file attrs DOSv2                       ... ok (  2.66s)

----------------------------------------------------------------------
Ran 1 test in 2.663s

OK

real	0m2,864s
user	0m0,297s
sys	0m0,105s

... on devel, so it only became slightly
slower here, locally.

@andrewbird
Copy link
Member

Interestingly when I run that test here I see a kvm error, just like those I saw a while back.

ERROR: KVM_EXIT_FAIL_ENTRY: hardware_entry_failure_reason = 0x80000021

@andrewbird
Copy link
Member

I'll retest on devel to see if the error is specific to kvm_syn

@andrewbird
Copy link
Member

test passes on current devel

$ test/test_dos.py PPDOSGITTestCase.test_mfs_ds3_share_open_setfattrs
Test PP-DOS-GIT  MFS DOSv3 share open set file attrs DOSv2                       ... ok (  3.57s)

----------------------------------------------------------------------
Ran 1 test in 3.587s

OK

@stsp
Copy link
Member Author

stsp commented Apr 19, 2021

Please post output of cpuid command
(probably gzipped).

@stsp
Copy link
Member Author

stsp commented Jun 18, 2021

Andrew, could you please check
that this branch now works properly?
If so, I may eventually find some time
to defeat the slowdown problem.

Compare with old copies.
This should speed up the things.
@stsp
Copy link
Member Author

stsp commented Jun 19, 2021

Seems no travis check any more?

@andrewbird
Copy link
Member

andrewbird commented Jun 19, 2021

No Travis for anything but the weekly cron at the moment. I need to split out the kvm tests from the others and just run those. I'll try out the kvm_syn branch as soon as my devel test run is complete.

@andrewbird
Copy link
Member

So kvm_syn seems a lot better now than before (as far as i remember), as the kvm tests pass. I do see random GPF failures t1.zip with kvm_syn, and overall it's slower than devel.

devel: 1139 seconds
kvm_syn: 1306 seconds

@stsp
Copy link
Member Author

stsp commented Jun 19, 2021

What are the failures, are they
specific to a particular test?

Also good to know its not 30 times
slower but just a few percents slower.
This is what I see on AMD as well,
so at least we are on a same boat now.
I think the slowdowns can be cured
with the use of a task-gate instead
of the kvm_syn.

@andrewbird
Copy link
Member

No, it's unrepeatable and seems random. I run the test again 5 times and it doesn't fail. In both the log files attached above the error seems similar, which to me looks like an attempt to execute zeroed memory.

@stsp
Copy link
Member Author

stsp commented Jul 7, 2021

Not sure if there is still a reason
behind this development after 71c0037.
It appears KVM is inherently non-atomic,
and the one may want to restart it during
the exception delivery. So making it
atomic wrt ring0 monitor, looks like a
non-goal now.
And SYNC_REGS API is slow.
This was probably a false start.

@andrewbird
Copy link
Member

What happened with the Core 2 Duo kernel problem, was there any agreement over the seemingly bad patch in 5.10 (and its backport)?

@stsp
Copy link
Member Author

stsp commented Jul 7, 2021

Yes, absolutely.
And I was "dragged" into that
agreement, rather than proposing
it, because I though this whole
thing should be atomic.
Now the problem is that the
problematic patch went to -stable,
and so - on all old kernels in the
world.

@andrewbird
Copy link
Member

Thanks. I just read the kvm list and can see your back and forth with the kvm kernel devs, so it's still rumbling on...

@stsp
Copy link
Member Author

stsp commented Jul 7, 2021

Its a minor details only.
As soon as I applied the patch that
works only on old kernels, this became
an official regression. So the ball is on
their side now anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants