New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[h2] scheduler: fix stream starvation #204
Conversation
lib/scheduler.ml
Outdated
* won't be flushed until a few write operations later. *) | ||
then ( | ||
update_t_last p_node i.t; | ||
update_t i_node written; | ||
if subtree_is_active |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this case https://github.com/anmonteiro/ocaml-h2/pull/199/files#r1093672918 is fixed here: we functionally remove + add the i_node
back to the children of the node being traversed.
lib/scheduler.ml
Outdated
Stream | ||
{ descriptor | ||
; t_last = 0 | ||
; t = 0 | ||
; t_last |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverting this doesn't seem to have an effect on the failing example from #162.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was my reasoning, maybe I'm misunderstanding something:
- If we set
t = 0
then younger streams will have a lower priority on creation than older streams which means that they get an unfair extra "budget" of data. So they should havet
set to their parent'st_last
- For that reason,
t_last
should be inherited from the parent and not set to0
in case new streams are added beforet_last
was ever updated (if that's possible)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I agree for t
. But t_last
only matters if this stream has children under them. So that one should start at 0, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you're right for t_last
. Priorities are not global so they should start at 0.
There's still something missing, which is updating the |
I’ve tried this branch on my project where Firefox (and Firefox only) randomly infinit loops when loading a webpage using h2, but it doesn’t seem to have fixed that issue sadly. Should i try the original #199 PR instead? |
else | ||
(* If we haven't written anything, check if any of our children | ||
have. *) | ||
let written, subtree_is_active' = traverse p_node in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the depth-first traversal of the node's children
written, subtree_is_active) | ||
and traverse : type a. a node -> int * bool = | ||
fun p_node -> | ||
let rec loop remaining_children = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
loop
traverses a level of the tree to find data to send.
EDIT: I was not in the loop :) @anmonteiro informed me that stacking #204 and #205 solves your issue. |
The issue was that in some situations, the scheduler would not flush stream B if there was a stream A with higher priority even if stream A had no data to send. This would not be an issue if streams were constantly sending data, but if all streams stopped sending, frames might be stuck in the scheduler. The fix to this is to traverse the children until a stream that needs to send data is found. Also fixes a possibly broken test. After the changes to the scheduler, a frame that wasn't flushed before is now flushed.
e43922e
to
a0d3cea
Compare
Update priority queue after updating `t` value refactor: use `update_children` instead of `remove_child` we already know what the children are after removing, this should be a bit faster fix: update the children of original queue being traversed fix: update t_last on every pop, update_t only if we haven't thrown the node away fix: traverse children depth-wise too fix: update_t when first pushing a stream refactor: clean up schedule / traverse functions fix: set initial `t` value after reprioritizing otherwise it'd get `t_last` from the wrong parent
d2ee482
to
507dfdb
Compare
@quernd thanks for the help on this one |
…2-async (0.10.0) CHANGES: - hpack: fix a case where hpack would raise an array out of bounds exception ([anmonteiro/ocaml-h2#183](anmonteiro/ocaml-h2#183)) ([@jonathanjameswatson](https://github.com/jonathanjameswatson)) - h2: (client) handle multiple RST_STREAM frames ([anmonteiro/ocaml-h2#184](anmonteiro/ocaml-h2#184)) ([@jonathanjameswatson](https://github.com/jonathanjameswatson)) - h2: (client) Fix a race condition with `~flush_headers_immediately:false` and empty request bodies ([anmonteiro/ocaml-h2#186](anmonteiro/ocaml-h2#186)) - h2: Make `H2.Reqd.error_code` part of the public interface ([anmonteiro/ocaml-h2#188](anmonteiro/ocaml-h2#188)) - h2: Add `~request_method` argument to `H2.Method.body_length` ([anmonteiro/ocaml-h2#190](anmonteiro/ocaml-h2#190)) ([@jonathanjameswatson](https://github.com/jonathanjameswatson)) - h2: Don't send any frames on a stream after an `RST_STREAM` frame ([anmonteiro/ocaml-h2#187](anmonteiro/ocaml-h2#187), [anmonteiro/ocaml-h2#194](anmonteiro/ocaml-h2#194)) - h2: call error handler on the client if the remote peer closes the commmunication channel ([anmonteiro/ocaml-h2#177](anmonteiro/ocaml-h2#177), [anmonteiro/ocaml-h2#196](anmonteiro/ocaml-h2#194)) - h2: when reprioritizing a stream, respect its new priority (accounts for inferred default priority when a dependent stream is not in the tree ([RFC7540§5.3.1](https://www.rfc-editor.org/rfc/rfc7540.html#section-5.3.1))) ([anmonteiro/ocaml-h2#200](anmonteiro/ocaml-h2#200)) - h2: don't remove parent streams from the scheduler if they have children ([anmonteiro/ocaml-h2#201](anmonteiro/ocaml-h2#201)) - h2: don't schedule streams as dependencies of others marked for removal ([anmonteiro/ocaml-h2#205](anmonteiro/ocaml-h2#205)) - h2: revise scheduling algorithm to avoid starvation ([anmonteiro/ocaml-h2#199](anmonteiro/ocaml-h2#199), [anmonteiro/ocaml-h2#204](anmonteiro/ocaml-h2#204), reported in [anmonteiro/ocaml-h2#162](anmonteiro/ocaml-h2#162), thanks [@quernd](https://github.com/quernd)) - h2-eio: adapt to the next gluten-eio version ([anmonteiro/ocaml-h2#210](anmonteiro/ocaml-h2#210))
#199 + a few fixes