Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[h2] scheduler: fix stream starvation #204

Merged
merged 3 commits into from Feb 4, 2023
Merged

[h2] scheduler: fix stream starvation #204

merged 3 commits into from Feb 4, 2023

Conversation

anmonteiro
Copy link
Owner

#199 + a few fixes

lib/scheduler.ml Outdated
* won't be flushed until a few write operations later. *)
then (
update_t_last p_node i.t;
update_t i_node written;
if subtree_is_active
Copy link
Owner Author

@anmonteiro anmonteiro Feb 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this case https://github.com/anmonteiro/ocaml-h2/pull/199/files#r1093672918 is fixed here: we functionally remove + add the i_node back to the children of the node being traversed.

lib/scheduler.ml Outdated
Stream
{ descriptor
; t_last = 0
; t = 0
; t_last
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm now doubting that my comment here is wrong.

@quernd why are new streams created with the parent's t_last?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverting this doesn't seem to have an effect on the failing example from #162.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was my reasoning, maybe I'm misunderstanding something:

  • If we set t = 0 then younger streams will have a lower priority on creation than older streams which means that they get an unfair extra "budget" of data. So they should have t set to their parent's t_last
  • For that reason, t_last should be inherited from the parent and not set to 0 in case new streams are added before t_last was ever updated (if that's possible)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I agree for t. But t_last only matters if this stream has children under them. So that one should start at 0, I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you're right for t_last. Priorities are not global so they should start at 0.

@anmonteiro
Copy link
Owner Author

anmonteiro commented Feb 2, 2023

There's still something missing, which is updating the t value recursively for parents, if a subtree has just written.

kit-ty-kate added a commit to kit-ty-kate/exn.st that referenced this pull request Feb 2, 2023
@kit-ty-kate
Copy link

kit-ty-kate commented Feb 2, 2023

I’ve tried this branch on my project where Firefox (and Firefox only) randomly infinit loops when loading a webpage using h2, but it doesn’t seem to have fixed that issue sadly. Should i try the original #199 PR instead?

else
(* If we haven't written anything, check if any of our children
have. *)
let written, subtree_is_active' = traverse p_node in
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the depth-first traversal of the node's children

written, subtree_is_active)
and traverse : type a. a node -> int * bool =
fun p_node ->
let rec loop remaining_children =
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

loop traverses a level of the tree to find data to send.

@quernd
Copy link
Contributor

quernd commented Feb 3, 2023

I’ve tried this branch on my project where Firefox (and Firefox only) randomly infinit loops when loading a webpage using h2, but it doesn’t seem to have fixed that issue sadly. Should i try the original #199 PR instead?

From what I can tell, both PRs intend to solve the same issue, but this one is more correct. I'd be surprised if the original PR fixes your issue while this one doesn't, but feel free to give it a try. It's a bit unfortunate your issue occurs randomly.

My specific use case that this fixes is many streams multiplexed over the same connection, is that also your setup? If not, maybe it's a different issue.

EDIT: I was not in the loop :) @anmonteiro informed me that stacking #204 and #205 solves your issue.

The issue was that in some situations, the scheduler would not flush
stream B if there was a stream A with higher priority even if stream
A had no data to send. This would not be an issue if streams were
constantly sending data, but if all streams stopped sending, frames
might be stuck in the scheduler.

The fix to this is to traverse the children until a stream that needs
to send data is found.

Also fixes a possibly broken test. After the changes to the scheduler,
a frame that wasn't flushed before is now flushed.
Update priority queue after updating `t` value

refactor: use `update_children` instead of `remove_child`

we already know what the children are after removing, this should be a
bit faster

fix: update the children of original queue being traversed

fix: update t_last on every pop, update_t only if we haven't thrown the
node away

fix: traverse children depth-wise too

fix: update_t when first pushing a stream

refactor: clean up schedule / traverse functions

fix: set initial `t` value after reprioritizing

otherwise it'd get `t_last` from the wrong parent
@anmonteiro
Copy link
Owner Author

@quernd thanks for the help on this one

@anmonteiro anmonteiro merged commit fceaad2 into master Feb 4, 2023
@anmonteiro anmonteiro deleted the fix-scheduling branch February 4, 2023 07:51
anmonteiro added a commit to anmonteiro/opam-repository that referenced this pull request Mar 17, 2023
…2-async (0.10.0)

CHANGES:

- hpack: fix a case where hpack would raise an array out of bounds exception
  ([anmonteiro/ocaml-h2#183](anmonteiro/ocaml-h2#183))
  ([@jonathanjameswatson](https://github.com/jonathanjameswatson))
- h2: (client) handle multiple RST_STREAM frames
  ([anmonteiro/ocaml-h2#184](anmonteiro/ocaml-h2#184))
  ([@jonathanjameswatson](https://github.com/jonathanjameswatson))
- h2: (client) Fix a race condition with `~flush_headers_immediately:false` and
  empty request bodies
  ([anmonteiro/ocaml-h2#186](anmonteiro/ocaml-h2#186))
- h2: Make `H2.Reqd.error_code` part of the public interface
  ([anmonteiro/ocaml-h2#188](anmonteiro/ocaml-h2#188))
- h2: Add `~request_method` argument to `H2.Method.body_length`
  ([anmonteiro/ocaml-h2#190](anmonteiro/ocaml-h2#190))
  ([@jonathanjameswatson](https://github.com/jonathanjameswatson))
- h2: Don't send any frames on a stream after an `RST_STREAM` frame
  ([anmonteiro/ocaml-h2#187](anmonteiro/ocaml-h2#187),
  [anmonteiro/ocaml-h2#194](anmonteiro/ocaml-h2#194))
- h2: call error handler on the client if the remote peer closes the
  commmunication channel
  ([anmonteiro/ocaml-h2#177](anmonteiro/ocaml-h2#177),
  [anmonteiro/ocaml-h2#196](anmonteiro/ocaml-h2#194))
- h2: when reprioritizing a stream, respect its new priority (accounts for
  inferred default priority when a dependent stream is not in the tree
  ([RFC7540§5.3.1](https://www.rfc-editor.org/rfc/rfc7540.html#section-5.3.1)))
  ([anmonteiro/ocaml-h2#200](anmonteiro/ocaml-h2#200))
- h2: don't remove parent streams from the scheduler if they have children
  ([anmonteiro/ocaml-h2#201](anmonteiro/ocaml-h2#201))
- h2: don't schedule streams as dependencies of others marked for removal
  ([anmonteiro/ocaml-h2#205](anmonteiro/ocaml-h2#205))
- h2: revise scheduling algorithm to avoid starvation
  ([anmonteiro/ocaml-h2#199](anmonteiro/ocaml-h2#199),
  [anmonteiro/ocaml-h2#204](anmonteiro/ocaml-h2#204), reported in
  [anmonteiro/ocaml-h2#162](anmonteiro/ocaml-h2#162), thanks
  [@quernd](https://github.com/quernd))
- h2-eio: adapt to the next gluten-eio version
  ([anmonteiro/ocaml-h2#210](anmonteiro/ocaml-h2#210))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants