Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve io waiting, ensure timeout thread runs immediately #225

Merged
merged 5 commits into from
Nov 23, 2015

Conversation

wfleming
Copy link
Contributor

  1. Yesterday's patch for IO wait worked, but this technique is more performant (and I'm deleting redundant code I missed yesterday, and adding a spec.)
  2. Due to how the timeout thread was instantiated, it might not be scheduled soon, and so actual timeout could be well after what was desired. By running the thread immediately, this should improve this behavior.

This is working for me locally, and I have managed to make things timeout correctly locally. It should also be noted that timeout behavior was already specced as well, and those specs were running fine. So I think the missed timeout I saw in production this morning was something different, possibly an isolated problem, but almost certainly not a consequence of the other recent changes. This should tighten things up overall. I'll continue to dig into potential ways timeouts could be delayed or missed as well.

👓 @codeclimate/review

@@ -60,7 +60,7 @@ def run(options = [])
duration = timeout
@listener.timed_out(container_data(duration: duration))
else
sleep 0.01 until !t_out.alive? && !t_err.alive?
t_out.join || t_err.join
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for one of these to finish before the other? Imagining if stderr closes but stdout is still flushing? Is that possible?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be safest / simplest to just join both

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, you're right. That was my intent but I messed up my boolean logic.

Will Fleming added 2 commits November 23, 2015 12:29
This is an optimization of the fix from 320eda4.

The previous code in a tight loop using `sleep` seemed like it thrashed
CPU context switches. Hence why it could potentially take quite a lot
longer, as evidenced by the spec change.

This should have the same behavior: `Thread#value` is a blocking call
that waits until the thread has finished. But this avoids the `sleep` on
the main thread which resulted in thread thrashing.

I have also devised a spec that actually tests this behavior reasonably
well. It's a bit hacky in some ways.
Just creating the thread doesn't ensure it gets run immediately. This
schedules the thread immediately, so timeouts should be more prompt
@pbrisbin
Copy link
Contributor

This appears to now just remove the Thread#join calls -- what waits on the threads without them?

@wfleming
Copy link
Contributor Author

@pbrisbin Gordon just reverted to the sleep usage since it appeared to be more reliable, if slower. The join was causing problems for us. At this point the only changes here are really cleanup & a spec, which I'm fine with. I do think we need to look more closely at the threading behavior still, but the current state of the PR should be behavior neutral as far as that goes.

@@ -75,13 +75,8 @@ def run(options = [])
)
ensure
t_timeout.kill if t_timeout
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💄, but looking at this now I think it would be nicer to read as [t_timeout, t_out, t_err].each { |t| t.kill if t }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think doing that would be an issue if something raised an error before one of them was initialized (which is the purpose of the if t_out right?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good point, but, that being the case, I'm not sure the existing code is correct either: t_out.kill if t_out will also raise an exception if t_out never got defined. We'd want to either use defined?(t_tout), or switch to instance vars that have the nil-if-undefined behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm that might be right. I'll use the array and see what happens

@timed_out = true
reap_running_container
end
end.run
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious why this call was absent before but present now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is from @wfleming's original change - it's explained in the PR description

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 thanks

In testing, I was seeing the `sleep timeout` be skipped. Doing multiple
smaller sleeps seems to be more reliable.
@gdiggs
Copy link
Contributor

gdiggs commented Nov 23, 2015

Would like support to mercy merge this as it could help fix the waiting status

@listener.timed_out(container_data(duration: duration))
else
sleep 0.01 until !t_out.alive? && !t_err.alive?
sleep 1 until !t_out.alive? && !t_err.alive?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment lost to the diff:

I wondered this last time it came though: doesn't this basically just re-implement Thread#join?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wfleming had a reason to back that change out - will, what was that?

jpignata added a commit that referenced this pull request Nov 23, 2015
improve io waiting, ensure timeout thread runs immediately
@jpignata jpignata merged commit beb515b into master Nov 23, 2015
@jpignata jpignata deleted the will/timeout branch November 23, 2015 18:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants