Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Xen singlestep support #150

Open
wants to merge 21 commits into
base: master
Choose a base branch
from
Open

Xen singlestep support #150

wants to merge 21 commits into from

Conversation

Wenzel
Copy link
Owner

@Wenzel Wenzel commented Dec 20, 2020

Continuing #113

@Wenzel
Copy link
Owner Author

Wenzel commented Dec 20, 2020

This PR brings singlestep support for Xen, and implements the reply_event method, which was missing.

I'm still having an issue to listen on multiple VCPU, I only receive events from VCPU 0

// enable singlestep monitoring
// it will only intercept events when explicitely requested using
// xc_domain_debug_control()
// TODO: call get_vcpu_count()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That comment is obsolete afais, right?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually no.
I copy pasted the get_vcpu_count() implementation here, because if i created the Xen struct below and tried to call get_vcpu_count() afterwards, it had an issue with xc being moved above when constructing the Xen struct.

As I didn't have any solution, I used this.

I suggest we should make a function get_vcpu_count(xc: &Xc) and call it from the trait and the Xen driver init function

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just refactored it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I don't see the problem. This works fine for me. On a side note, this function has become too large imho, we should split it.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem was that I copy pasted a chunk of code, so there was duplication.
I we update the definition of get_vcpu_count one day, the other one becomes obselete, etc.

This works fine for me.

Github won't diplay your link: Non-Image content-type returned

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh.
I don't know I was seeing having a mutable Xen struct as a bad idea.
it's not, since it's restricted on the init function anyway.

I refactored and implemented your changes, thank you

@codecov-io
Copy link

codecov-io commented Dec 20, 2020

Codecov Report

Merging #150 (f3ea0ae) into master (b61e80f) will decrease coverage by 0.11%.
The diff coverage is 0.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #150      +/-   ##
==========================================
- Coverage   13.95%   13.84%   -0.12%     
==========================================
  Files           5        5              
  Lines         480      484       +4     
  Branches       71       71              
==========================================
  Hits           67       67              
- Misses        396      400       +4     
  Partials       17       17              
Flag Coverage Δ
unittests 13.84% <0.00%> (-0.12%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/api.rs 0.00% <0.00%> (ø)
src/driver/kvm.rs 22.00% <0.00%> (-0.08%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b61e80f...0d957f0. Read the comment docs.

@Wenzel
Copy link
Owner Author

Wenzel commented Dec 20, 2020

I can confirm that it traps events for all VCPUs

Capture d’écran de 2020-12-20 13-04-54

@Wenzel
Copy link
Owner Author

Wenzel commented Dec 20, 2020

One issue that needs a fix is that the singlestep-events example leave the domain frozen.
The cause should be that there is a singlestep event left in the ring that requires processing, but we already left.

I thought that closing the monitoring and unbinding would clear the queue, but that's not the case apparently.

Looking at Libvmi xen_events_destroy():
https://github.com/libvmi/libvmi/blob/master/libvmi/driver/xen/xen_events.c#L3253

They do a finall pass of listening for events, if any, to clear the queue.
We should do the same

match self.xev.xenevtchn_pending()? {
-1 => {
// no event channel port is pending
// TODO: Err
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why you haven't implemented this yet? Feels to me like it should be done in this PR.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this means creating custom error types, and I'm not feeling confident / knowleadgable enough on the topic to implement this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the overall error handling in this library is not very good. I once tried to refactor it, but got discouraged after realizing how much I would need to change.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point we should create a custom error type (or perhaps even multiple, one for each driver) and implement error wrapping instead of error boxing. This is the way. ;)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm totally in favor of something like this.
I'm simply lacking the time to invest in learning state of the art Rust, and to reflect the changes in libmicrovmi.

But my heart breaks every time I call unwrap() in libmicrovmi simply because I'm a noob in error handling.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:D

}
};
let back_ring_ptr = &mut self.back_ring;
match RING_HAS_UNCONSUMED_REQUESTS!(back_ring_ptr) != 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pattern matching is overkill for a simple boolean check. Also, didn't we come to the conclusion that all those macros could have been simple functions?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, didn't we come to the conclusion that all those macros could have been simple functions?

Probably, but my focus right now is to make it work.
I'm sure we can improve afterwards.

Comment on lines 418 to 423
let mut rsp: vm_event_response_t =
unsafe { mem::MaybeUninit::<vm_event_response_t>::zeroed().assume_init() };
rsp.reason = req.reason;
rsp.version = VM_EVENT_INTERFACE_VERSION;
rsp.vcpu_id = req.vcpu_id;
rsp.flags = req.flags & add_flags;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let mut rsp: vm_event_response_t =
unsafe { mem::MaybeUninit::<vm_event_response_t>::zeroed().assume_init() };
rsp.reason = req.reason;
rsp.version = VM_EVENT_INTERFACE_VERSION;
rsp.vcpu_id = req.vcpu_id;
rsp.flags = req.flags & add_flags;
let mut rsp = vm_event_response_t {
reason: req.reason,
version: VM_EVENT_INTERFACE_VERSION,
vcpu_id: req.vcpu_id,
flags: req.flags & add_flags,
..Default::default()
};

Comment on lines 454 to 457
let op: u32 = match enabled {
false => XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_OFF,
true => XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_ON,
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let op: u32 = match enabled {
false => XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_OFF,
true => XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_ON,
};
let op: u32 = if enabled {
XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_ON
} else {
XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_OFF
};

@Wenzel
Copy link
Owner Author

Wenzel commented Dec 20, 2020

the latest commits implemented your fixes @rageagainsthepc

I also implemented a cleanup of the ring when we close the Xen driver.
And even I can clearly see that we catch and replying to a remaining singlestep event:
Capture d’écran de 2020-12-20 13-58-05

My domain is still frozen.
if you have ideas.

@rageagainsthepc
Copy link
Collaborator

My domain is still frozen.
if you have ideas.

The cleanup code looks good to me. 🤔

@Wenzel
Copy link
Owner Author

Wenzel commented Dec 20, 2020

I add debug output in xenevtchn
Capture d’écran de 2020-12-20 15-07-04

@Wenzel
Copy link
Owner Author

Wenzel commented Dec 29, 2020

So, latest update.

the domain survives now, but only if it has 1 VCPU.
with 2 VCPU it freezes, which means that the second VCPU is missing a singlestep shutdown API call somehow, or it misses an event notification, idk.

I added a -c argument to singlestep-events example, to only catch a limited amount of events.

@Wenzel
Copy link
Owner Author

Wenzel commented Dec 29, 2020

Here is the xen-access.c source that I tweaked
https://gist.github.com/Wenzel/c9fc0ab4f6d4e878e39dcca0fa2a79f4

compile with gcc xen-access.c -o xen-access -lxenctrl -lxenevtchn
run with sudo ./xen-access <dom_id> singlestep

@rageagainsthepc
Copy link
Collaborator

rageagainsthepc commented Dec 29, 2020

Here is the xen-access.c source that I tweaked

What of it? Do you use it as a "control group" to see what we did differently? Does it work without a problem?

@Wenzel
Copy link
Owner Author

Wenzel commented Dec 29, 2020

Yes, it is a small example, catching one singlestep and quit, with prints on any Xen API calls.

Yes, it works.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants