CRETE hangs indefinitely if QEMU terminates before first trace is dumped #10
The timer of crete-dispatch now starts as soon as the first node connects to dispatch, so dispatch will no longer hang indefinitely; instead, CRETE terminates the current test once the timeout is reached. Please refer to the relevant commit 4763919.
@likebreath The reason I originally changed the timer to start when the first trace was received is that copying the VM image in distributed mode can be time-consuming. Have you resolved this issue, or is it no longer a concern?
@moralismercatus Yes, you are right. You introduced that change as part of your work on the regression framework. I just undid it, because the indefinite-hanging issue is more serious than the image-copying issue. A better solution that resolves both the indefinite hanging and the image-copying delay is certainly needed, which is why I am not closing this issue. Let me know your thoughts.
From my perspective, indefinite hanging is the more important problem. I just wanted to make sure you were aware of the implications. Since this change has been merged to master, maybe we should close this issue and open another, e.g., "Timer runs while VM image is being copied".
My motivation for commenting on this issue was also to make sure you were aware of this change. Let's open another issue.
Consider the following scenario:
At this point, there are no more test cases and no traces from which to generate new test cases. This is the logical point at which dispatch should recognize that there is nothing more to do and terminate gracefully; instead, dispatch hangs indefinitely.
The reason for this is that the guard DispatchFSM_::is_target_expired returns false if the first trace has not yet been received.
A simple fix may be to use the first test case, instead of the first trace, as the event that starts the check for whether the target has expired. The first test case may not reliably indicate that a VM instance has started, because it may originate from a seed. In that case, a reliable indicator is the reception of guest data by dispatch: a VM instance must have been started for vm-node to obtain data from the guest OS.
A more thorough fix would be to re-evaluate how CRETE determines when testing has completed.