Be more lenient with streams in the `pending_send` queue. #261

Merged
merged 3 commits into carllerche:master May 10, 2018

@goffrie
Contributor

goffrie commented Apr 20, 2018

The `is_peer_reset()` check doesn't quite cover all the cases where we call
`clear_queue`, such as when we call `recv_err`. Instead of trying to make the
check more precise, let's gracefully handle spurious entries in the queue.

This fixes the issue I mentioned at #258 (comment).
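
The lenient handling can be sketched with a toy model (hypothetical types, not h2's actual internals): a `pending_send` queue where popping a stream that has nothing queued simply skips the entry instead of treating it as a logic error.

```rust
use std::collections::VecDeque;

// Toy stand-in for a stream entry; the real h2 `Stream` is far richer.
struct Stream {
    id: u32,
    pending_frames: VecDeque<&'static str>,
}

// Pop the next frame to send. Before this change, a stream sitting in the
// queue with no queued frames was treated as an invariant violation (a
// panic). Now such "dangling" entries are skipped gracefully.
fn pop_frame(pending_send: &mut VecDeque<Stream>) -> Option<(u32, &'static str)> {
    while let Some(mut stream) = pending_send.pop_front() {
        if let Some(frame) = stream.pending_frames.pop_front() {
            let id = stream.id;
            // The stream may still have more frames queued; requeue it.
            if !stream.pending_frames.is_empty() {
                pending_send.push_back(stream);
            }
            return Some((id, frame));
        }
        // Spurious entry (e.g. cleared by `recv_err`): ignore it and move on.
    }
    None
}

fn main() {
    let mut q = VecDeque::new();
    // A dangling entry followed by a real one.
    q.push_back(Stream { id: 1, pending_frames: VecDeque::new() });
    q.push_back(Stream { id: 3, pending_frames: VecDeque::from(["HEADERS"]) });
    assert_eq!(pop_frame(&mut q), Some((3, "HEADERS")));
    assert_eq!(pop_frame(&mut q), None);
}
```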

@goffrie goffrie referenced this pull request Apr 20, 2018

Closed

Fuzzing with honggfuzz-rs #263

@hawkw hawkw requested a review from carllerche Apr 20, 2018

@hawkw

Collaborator

hawkw commented Apr 20, 2018

At a glance, this seems right to me --- it's similar to the first thing I tried while working on #247, which appeared to work correctly. However, I'd like to hear from @carllerche before approving it.

@carllerche

I'm not necessarily against this change, but it is a step away from trying to be explicit about state transitions and tracking the current expected state with assertions (usually `debug_assert`).

I don't know if anybody has thoughts on that change in strategy.

```rust
// in clear_queue(). Instead of doing O(N) traversal through queue
// to remove, lets just ignore the stream here.
trace!("removing dangling stream from pending_send");
counts.transition_after(stream, is_counted, is_pending_reset);
```

@carllerche

carllerche Apr 23, 2018

Owner

Could you explain why this needs to be called here? I don't think this was called before (and I could believe it was another bug).

@goffrie

goffrie Apr 25, 2018

Contributor

I agree that it was not called before. I think that it should be, otherwise we could leak streams - it's possible that the pending_send queue here was holding onto the last reference to the stream.
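
The leak goffrie describes can be illustrated with a toy refcount model (hypothetical names, not h2's real types): if the queue held the last reference to the stream and the entry is dropped without notifying the counter, the connection's stream count never reaches zero.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Toy stand-in for h2's connection-level stream counter.
struct Counts {
    num_streams: Cell<usize>,
}

// Toy stand-in for per-stream state shared via refcounting.
struct StreamState {
    closed: Cell<bool>,
}

impl Counts {
    // Analogue of `transition_after`: run after a stream leaves a queue so
    // that a fully-closed stream is deducted from the connection's counts.
    fn transition_after(&self, stream: &Rc<StreamState>) {
        if stream.closed.get() {
            self.num_streams.set(self.num_streams.get() - 1);
        }
    }
}

fn main() {
    let counts = Counts { num_streams: Cell::new(1) };
    let stream = Rc::new(StreamState { closed: Cell::new(true) });

    // pending_send happens to hold the *last* clone of the stream.
    let mut pending_send = vec![Rc::clone(&stream)];
    drop(stream);

    // Popping the dangling entry without calling transition_after would
    // leave num_streams stuck at 1 forever; calling it releases the count.
    let dangling = pending_send.pop().unwrap();
    counts.transition_after(&dangling);
    assert_eq!(counts.num_streams.get(), 0);
}
```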

@carllerche

carllerche Apr 26, 2018

Owner

I think it is highly likely that this change is correct, but i'd like to see if we can figure out a failing test case.

@seanmonstar

Collaborator

seanmonstar commented Apr 24, 2018

The change looks fine, though I'm also curious about two things:

  1. Why is `counts.transition_after` called? It wasn't before; does it need to be?
  2. While I'm always happy to reduce crashes, having "spurious" streams worries me that there's a bug somewhere. It would probably be nice to at least have a `debug_assert` to catch such bugs?
@carllerche

Owner

carllerche commented Apr 26, 2018

I believe that this change is probably the right direction. Panicking in production is no good. It would be good to figure out how to catch any bugs that are let through though. At the very least a debug_assert is good.

@goffrie

Contributor

goffrie commented Apr 27, 2018

I added a debug_assert!, though I'm not sure what precisely you had in mind - this one's pretty general. (Notably, asserting is_reset() doesn't work because the stream can be in the Closed(EndStream) state and still get cleared.)
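
The distinction can be sketched with a simplified state enum (hypothetical, mirroring only the shape of h2's state, not its exact definitions): a stream can be `Closed(EndStream)` and still get cleared, so the assertion has to accept any closed state rather than just reset ones.

```rust
// Simplified sketch of why a stream is closed; h2's real `State` is richer.
#[derive(Debug, PartialEq)]
enum Cause {
    EndStream, // closed normally via an END_STREAM flag
    Proto,     // closed by a RST_STREAM / protocol error
}

#[derive(Debug, PartialEq)]
enum State {
    Open,
    Closed(Cause),
}

impl State {
    // The narrow check: only reset streams qualify.
    fn is_reset(&self) -> bool {
        matches!(self, State::Closed(Cause::Proto))
    }
    // The broader check: any closed stream qualifies.
    fn is_closed(&self) -> bool {
        matches!(self, State::Closed(_))
    }
}

fn main() {
    let cleared = State::Closed(Cause::EndStream);
    // `debug_assert!(cleared.is_reset())` would fire here even though the
    // situation is legitimate; the broader closed check is the one that holds.
    assert!(!cleared.is_reset());
    debug_assert!(cleared.is_closed());
    assert!(State::Closed(Cause::Proto).is_reset());
}
```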

@carllerche

Owner

carllerche commented May 4, 2018

So I spent a bunch of time trying to reproduce the issue described in the comment.

All the paths that I tracked down via recv_err would not hit the issue.

recv_err is called either because of a connection-level issue, in which case the code switches to GOAWAY and no longer goes through prioritize, or when sending an explicit reset, in which case there is always a frame to pop (the reset frame).

That said, I think that the consensus is that this is still a good change even if it does not fix any bugs.

Edit: This is referencing streams still being in pending_send even though there is no more data to send. I will dig into the counts a bit more now.

@goffrie

Contributor

goffrie commented May 4, 2018

If you run the fuzzer in #263 you can definitely run into panics in this code if this patch isn't applied.

@carllerche

Owner

carllerche commented May 4, 2018

OK, I will try that.

@carllerche

Owner

carllerche commented May 5, 2018

Thanks to the fuzzer (#263), I have identified the logic path that results in the panic.

Unfortunately, it relies on a very specific pattern of the underlying I/O's write fn returning Ready / NotReady, which means that writing a robust test is not really possible. Any change to the h2 implementation could break the test.

So, given that, I think the best course of action is just going to be to include the fuzzing run and call it a day.
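
Why such a test would be brittle can be sketched with a toy scripted writer (hypothetical types; the real code uses the futures 0.1 `Async`/`Poll` machinery): the poll results replay a fixed Ready/NotReady pattern, and the bug only triggers when that script matches the exact interleaving the implementation happens to produce, so any internal change to write scheduling invalidates the test.

```rust
// Toy stand-in for a non-blocking write result.
#[derive(Debug, PartialEq)]
enum Poll {
    Ready(usize), // wrote n bytes
    NotReady,     // would block; try again later
}

// A writer that replays a fixed script of outcomes:
// true = Ready, false = NotReady. Once the script runs out, always Ready.
struct ScriptedWriter {
    script: std::vec::IntoIter<bool>,
}

impl ScriptedWriter {
    fn poll_write(&mut self, buf: &[u8]) -> Poll {
        match self.script.next() {
            Some(true) | None => Poll::Ready(buf.len()),
            Some(false) => Poll::NotReady,
        }
    }
}

fn main() {
    // A test built on this only exercises the buggy path if this exact
    // sequence lines up with the implementation's internal write order.
    let mut w = ScriptedWriter {
        script: vec![true, false, true].into_iter(),
    };
    assert_eq!(w.poll_write(b"abc"), Poll::Ready(3));
    assert_eq!(w.poll_write(b"abc"), Poll::NotReady);
    assert_eq!(w.poll_write(b"abc"), Poll::Ready(3));
}
```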

I still need to verify the `counts.transition_after(stream, is_counted, is_pending_reset);` line. I believe that this does fix a bug, but I would like to confirm this first. This will be my next focus.

@goffrie Thanks for your patience. My assumption is that you are currently using a fork that includes this patch, so you are not currently blocked. If this assumption is incorrect, please let me know and we can merge the PR before I complete the work stated above.

@goffrie

Contributor

goffrie commented May 5, 2018

@carllerche carllerche referenced this pull request May 6, 2018

Merged

Misc bug fixes #273

@carllerche

Owner

carllerche commented May 6, 2018

So, I haven't reproduced the issue with the missing count decrement, but I did find and fix a number of other related bugs (#273).

carllerche added a commit that referenced this pull request May 6, 2018

Assert that stream state is released
This patch adds a debug level assertion checking that state associated
with streams is successfully released. This is done by ensuring that no
stream state remains when the connection inner state is dropped. This
does not happen until all handles to the connection drops, which should
result in all streams being dropped.

This also pulls in a fix for one bug that is exposed with this change.
The fix is first proposed as part of #261.
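
A minimal sketch of that kind of drop-time assertion (hypothetical struct names, not h2's real types): when the connection's inner state is dropped, every stream should already have been released, and a debug-only assertion catches any leak at that moment.

```rust
use std::collections::HashMap;

// Toy stand-in for the connection's inner state.
struct ConnectionInner {
    streams: HashMap<u32, String>, // stream id -> leftover state
}

impl ConnectionInner {
    fn release(&mut self, id: u32) {
        self.streams.remove(&id);
    }
}

impl Drop for ConnectionInner {
    fn drop(&mut self) {
        // Debug builds catch leaked stream state the moment the last
        // handle to the connection goes away; release builds skip the check.
        debug_assert!(
            self.streams.is_empty(),
            "stream state leaked: {:?}",
            self.streams.keys().collect::<Vec<_>>()
        );
    }
}

fn main() {
    let mut conn = ConnectionInner { streams: HashMap::new() };
    conn.streams.insert(1, "half-closed".to_string());
    conn.release(1);
    // Dropping here passes the assertion because all streams were released.
}
```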
@carllerche

Owner

carllerche commented May 6, 2018

Ok! I finally got an assertion that fails and is fixed by adding transition_after where you have added it.

I pulled in that specific change as part of 44c9005.

Once I'm done with all that work, this PR can be rebased on top of that.

carllerche added a commit that referenced this pull request May 9, 2018

Misc bug fixes related to stream state (#273)
This patch includes two new significant debug assertions:

* Assert stream counts are zero when the connection finalizes.
* Assert all stream state has been released when the connection is 
  dropped.

These two assertions were added in an effort to test the fix provided
by #261. In doing so, many related bugs have been discovered and fixed.
The details related to these bugs can be found in #273.
@carllerche

Owner

carllerche commented May 9, 2018

Alright, I have rebased this change.

@carllerche

This patch fixes a bug that is only exposed with a particular sequence of write calls returning Ready vs. NotReady.

Fuzzing catches this but there is no explicit test that does.

@carllerche carllerche merged commit 571bb14 into carllerche:master May 10, 2018

1 check passed: continuous-integration/travis-ci/pr (The Travis CI build passed)