CRETE hangs indefinitely if QEMU terminates before first trace is dumped #10

moralismercatus · 2016-12-03T00:25:41Z

Consider the following scenario:

1. vm-node starts QEMU.
2. vm-node transmits the seed to QEMU.
3. QEMU terminates/crashes before a trace is dumped.
4. vm-node, via fault tolerance, logs the crash, restarts QEMU and proceeds as normal.

At this point, there are no more test cases and no traces from which to generate new ones. Dispatch should recognize that there is nothing more to do and terminate gracefully; instead, it hangs indefinitely.

The reason for this is that the guard DispatchFSM_::is_target_expired returns false if the first trace has not yet been received.
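To illustrate the failure mode, here is a minimal sketch of a guard with this behaviour. It is not the actual CRETE code: `DispatchState`, its fields, and the checks performed after the first trace are assumptions made for illustration; only the early `return false` reflects what is described above.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical, simplified view of the dispatch state. These names are
// illustrative and are not taken from the CRETE sources.
struct DispatchState {
    bool first_trace_received = false;
    std::chrono::seconds elapsed{0};
    std::chrono::seconds time_limit{0};
    std::size_t pending_tests = 0;
    std::size_t pending_traces = 0;
};

// Models the reported behaviour: until a first trace has arrived, the
// guard never reports expiry, no matter how much time has passed or how
// little work remains.
bool is_target_expired(const DispatchState& s) {
    if (!s.first_trace_received) {
        return false;  // the early exit that leads to the hang
    }
    // Assumed expiry conditions once a trace has been seen.
    const bool out_of_time = s.elapsed >= s.time_limit;
    const bool out_of_work = s.pending_tests == 0 && s.pending_traces == 0;
    return out_of_time || out_of_work;
}
```

In the scenario above, QEMU crashes before any trace is dumped, so `first_trace_received` stays false and a guard shaped like this can never fire.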

A simple fix may be to start checking whether the target is expired once the first test case, rather than the first trace, has been received. If the first test case is not a reliable indicator that a VM instance has started (because the first test may originate from a seed), then a reliable indicator should be the reception of guest data by dispatch. We can be sure of this because a VM instance must have been started in order for vm-node to get the data from the guest OS.
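A rough sketch of the second variant, keyed on guest data rather than the first trace, might look like the following. The struct, its fields, and the function name are hypothetical and only show the shape of the condition; they do not come from the CRETE sources.

```cpp
#include <cstddef>

// Hypothetical progress record kept by dispatch (illustrative names only).
struct DispatchProgress {
    bool guest_data_received = false;  // set when vm-node forwards guest data
    std::size_t pending_tests = 0;
    std::size_t pending_traces = 0;
};

// Start evaluating expiry as soon as guest data has been seen, which
// implies that a VM instance was started at least once.
bool is_target_expired_after_guest_data(const DispatchProgress& p) {
    if (!p.guest_data_received) {
        return false;  // no VM has demonstrably started yet
    }
    // Nothing left to run and nothing from which to generate new tests.
    return p.pending_tests == 0 && p.pending_traces == 0;
}
```

With this condition, the crash scenario would end cleanly provided guest data reached dispatch before QEMU terminated.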

A more thorough fix would be to re-evaluate how CRETE determines when testing has completed.

likebreath · 2017-02-22T00:31:14Z

The crete-dispatch timer is now started once the first node connects to dispatch, so dispatch will no longer hang indefinitely; instead, CRETE terminates the current test once the timeout is reached. Please refer to the relevant commit, 4763919.
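For readers following along, the idea of starting the timer at the first node connection can be sketched as below. This is not the code from the commit: `DispatchTimer` and its methods are hypothetical, and time points are passed in explicitly so the behaviour is easy to check.

```cpp
#include <chrono>

// Hypothetical timer that starts on the first node connection rather
// than on the first received trace (illustrative names only).
class DispatchTimer {
public:
    using Clock = std::chrono::steady_clock;

    explicit DispatchTimer(std::chrono::seconds limit) : limit_(limit) {}

    // Called whenever a node connects; only the first call starts the timer.
    void on_node_connected(Clock::time_point now) {
        if (!started_) {
            started_ = true;
            start_ = now;
        }
    }

    // True once the limit has elapsed since the first connection.
    // Before any node has connected, the timer cannot expire.
    bool timed_out(Clock::time_point now) const {
        return started_ && (now - start_) >= limit_;
    }

private:
    std::chrono::seconds limit_;
    bool started_ = false;
    Clock::time_point start_{};
};
```

Because the start event no longer depends on a trace arriving, a QEMU crash before the first trace cannot prevent the timeout from being reached.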

moralismercatus · 2017-02-24T01:26:04Z

@likebreath I originally changed the timer to start when the first trace was received because copying over the VM image in distributed mode can be time-consuming. Have you resolved this issue, or is it no longer a concern?

likebreath · 2017-02-24T01:34:44Z

@moralismercatus Yes, you are right. You introduced that change when you worked on the regression framework. I just undid it, because the indefinite hang is more serious than the image-copying issue. Certainly, a better solution that resolves both the indefinite hang and the image-copying issue is needed. This is why I am not closing this issue. Let me know your thoughts.

moralismercatus · 2017-02-24T01:50:53Z

From my perspective, the indefinite hang is the more important problem. I just wanted to make sure you were aware of the implications.

Since this change has been merged to master, maybe we should close this issue and open another, e.g., "Timer runs while VM image is being copied".

likebreath · 2017-02-24T01:53:05Z

My motivation to comment on this issue was also to make sure you are aware of this change. Let's open another issue.

moralismercatus added bug enhancement labels Dec 3, 2016

likebreath mentioned this issue Jan 2, 2017

[vm-node] Dead lock caused by VMNode::reset() while executing QemuFSM_::connect_vm() #14

Closed

likebreath closed this as completed Feb 24, 2017

likebreath mentioned this issue Feb 24, 2017

Dispatch timer runs while VM image is being copied #20

Open
