Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sigalrm support #106

Merged
merged 10 commits into from
Jul 29, 2018
Merged

Sigalrm support #106

merged 10 commits into from
Jul 29, 2018

Conversation

devietti
Copy link
Collaborator

@devietti devietti commented Jul 28, 2018

Here's my prototype of alarm support. There are two issues that I know of:

  • I wasn't sure how to kill the tracee cleanly when we need to (this is why the alarm-nohandler test is failing)
  • I seem to be reading tracee memory incorrectly.

These are also called out via commit comments.

devietti and others added 6 commits July 7, 2018 21:35
…fo_t and ucontext_t) and 2) avoid concurrency issues between main thread and signal handler. This test currently fails, so I'm isolating it on its own branch.
…TIMER_VIRTUAL and ITIMER_PROF. No expected output yet. #95
@devietti devietti self-assigned this Jul 28, 2018
@devietti devietti requested a review from gatoWololo July 28, 2018 05:46
//int retVal = syscall(SYS_tgkill, t.getPid(), t.getPid(), SIGALRM);
gs.log.writeToLog(Importance::info, "alarm pre-hook, requesting alarm in %u second(s)\n", t.arg1());
/*
To determine how to handle alarm(), we need to know what kind of SIGALRM
Copy link
Collaborator Author

@devietti devietti Jul 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gatoWololo This is the main description of how alarm support works.

case DEFAULT_HANDLER: {
// if tracee doesn't have a SIGALRM handler, we need to kill the tracee per `man 7 signal`
gs.log.writeToLog(Importance::info, "tracee has default SIGALRM handler, sending SIGKILL to pid %u\n", t.getPid());
int retVal = syscall(SYS_tgkill, t.getPid(), t.getPid(), SIGKILL);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gatoWololo This is where I know the tracee should get torn down (SIGALRM with default handler terminates the tracee). Sending this SIGKILL causes dettrace to throw an exception later on:

terminate called after throwing an instance of 'std::runtime_error'
  what():  Ptrace failed with error: No such process

There is probably some canonical way to shut down the tracee I don't know.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the tracer dies, the tracee is automatically killed, so I just exit from dettrace. That will kill everything though, which might be too much?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think we want to just kill this tracee. E.g., I saw an alarm call from configure, (I think) testing for the presence/behavior of the system call. Not sure what kind of signal handler it had, though, but if it had a simple alarm call without a handler, some subprocess of configure should terminate without tearing down the whole dettrace job.

That said, if it's hard to cleanly kill just one tracee then I don't think it's worth adding just for this corner case of alarm, which may not even arise in practice.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thought: maybe injecting an exit into the tracee is the right approach here? The tracee won't exit for the right reason, which is maybe an issue, but probably not.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this exit-injection in 288757d

// struct sigaction incorrectly? Or maybe there's one definition in the
// program and another one here?

//if (sa.sa_flags & SA_RESETHAND) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gatoWololo Here's where I'm having some trouble reading tracee memory. The sa.sa_flags field seems to sometimes have a garbage value of -1 (all 1's), when it should only have a few bits set. strace sees the field correctly, so there's something I'm doing wrong. Maybe the ptracer::readFromTracee() call is incorrect?

@gatoWololo
Copy link
Collaborator

@devietti
I had a couple of questions. I'm confused about the need to check for the signal handler, and only inject the signal and pause in some cases. Why don't we just send the signal and pause every time?

Similarly, if the semantics of an alarm with no signal handler dictate the program should be killed, why don't we just let the alarm through and have the OS kill the tracee?

Based on this output:

[3]INTER [Pid 1] Intercepted pause
[4]INFO    pause pre-hook
[4]INFO    tracee has a custom SIGALRM handler, sending SIGALRM to pid 13412
[4]INFO    pause value before post-hook: -514
[4]INFO    pause post-hook
[4]INFO    pause returned -514
[4]INFO    pause returned with value: 0
[3]INTER [1] Tracer: Received signal: 14. Forwading signal to tracee.

It looks like we're not actually using the pause() to inject the alarm? Instead the pause fails with 514 and we just inject the alarm afterwards? Which I'm not sure is getting delivered deterministically?

@devietti
Copy link
Collaborator Author

I'll respond to your questions below. In general, I may have been too consumed by corner cases :-/. I'm not sure if these actually come up, but once I was in the midst of things it seemed easiest to add the extra functionality now instead of having to return to this later on.

Why don't we just send the signal and pause every time?

Because if the SIGALRM is ignored, then alarm should be a NOP, so I'm not sending the signal (and therefore not using a pause either). Without a real signal being sent, there's no risk of some random subsequent system call getting interrupted.

Similarly, if the semantics of an alarm with no signal handler dictate the program should be killed, why don't we just let the alarm through and have the OS kill the tracee?

We could, but I want to make sure the tracee exits at a deterministic point.

It looks like we're not actually using the pause() to inject the alarm? Instead the pause fails with 514 and we just inject the alarm afterwards? Which I'm not sure is getting delivered deterministically?

I don't fully understand the pause returning -514 (ERESTARTNOHAND). It does that whether there is a SIGALRM handler or not (both the alarm-handler and alarm-nohandler tests). So I'm assuming this is "ok". The SIGALRM does seem to be arriving deterministically, per the alarm-handler test. If there are other/better ways to test determinism that would be good to add, though.

@gatoWololo
Copy link
Collaborator

If there are other/better ways to test determinism that would be good to add, though.
Right, it's interesting that's difficult to test if it's actually working but if you see it working deterministically, we will assume it's correct.

We can come back to it someday post-deadline.

Because if the SIGALRM is ignored, then alarm should be a NOP
Huh, I figured the program would get stuck forever: Since it would mask it's own alarm and never return.

Sounds good.

@gatoWololo gatoWololo merged commit 006ccb8 into master Jul 29, 2018
@devietti devietti deleted the sigalrm branch July 29, 2018 14:08
This was referenced Jul 29, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants