Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect ping/pong flow-control abilities properly #9642

Closed

Conversation

stefwalter
Copy link
Contributor

There is a rare situation, where cockpit-ws is of an older version
than a cockpit-bridge/cockpit.js pair (always shipped in same package).

In this situation, if flow control is used for a data heavy sideband
channel, there is the likelyhood that the channel will slow and
hang due to flow control. This is because cockpit-ws is not new
enough to know to respond to "ping" messages with "pong" messages.

To combat this, we expect to first see a "ping" message in a channel
before throttling senders for that channel. To facilitate this we
send a "ping" message as one of the first control messages in a
channel.

We still follow the general mechanism of sending a "ping" message
every N sequence bytes ... it's just now that we also do that when
N == 0.

This is a very small code change, with a very big change to the tests.
In order to facilitate backporting these patches I have stayed away
from refactoring the tests.

{
self->priv->out_sequence = 0;
self->priv->out_window = CHANNEL_FLOW_WINDOW;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, this change is unrelated.

@allisonkarlitskaya
Copy link
Member

I don't understand what's going on here. Shouldn't we rather send a ping at N==0 and expect to see the first pong in response, before applying backpressure? Or do we expect both sides to now start out by sending 'ping' to each other? That would certainly be a bigger change in behaviour, at least for one-directional flows.

@stefwalter
Copy link
Contributor Author

I don't understand what's going on here. Shouldn't we rather send a ping at N==0 and expect to see the first pong in response, before applying backpressure?

We apply back-pressure when the window is full.

However we turn that off completely, if we never see a channel "ping" from the other side.

Or do we expect both sides to now start out by sending 'ping' to each other?

Yes, both sides do ping each other at sequence % N == 0 as before.

@allisonkarlitskaya
Copy link
Member

And there's no such thing as a single-directional channel?

@stefwalter
Copy link
Contributor Author

And there's no such thing as a single-directional channel?

Not according to the protocol, no. Obviously many types channels send data in only or mostly in one direction, but the protocol does not declare or restrict this.

@stefwalter
Copy link
Contributor Author

I just imagine a situation where someone is using ping for diagnostic purposes (ie: not flow control) and can therefore bring down an entire channel by accident.

The only requirement that flow control places on "ping" is that they're replied to with "pong" and the same contents. I can think of no scenario where a channel would go down by accident, if pings are used for diagnostic purposes.

@stefwalter
Copy link
Contributor Author

stefwalter commented Jul 12, 2018

In addition, nothing previously sent "ping" as a "channel" control message.

But I'm happy to adapt my pull request. Do you have specific feedback on what I would change it to?

@allisonkarlitskaya
Copy link
Member

I imagine (admittedly) a pretty contrived situation where someone opens a JavaScript console or something and sends ping to debug a channel, which causes the other end to assume that this channel is now flow-control-capable on the other end.

Feel free to ignore this feedback, or we can chat about it tomorrow.

@allisonkarlitskaya
Copy link
Member

summary of IRC: It's better to do this with a new field sent as part of the open command which indicates that the opposite side will respond to pings with pongs (ie: supports flow control). That should also make the patch much smaller because it will reduce the thrashing on the tests.

@martinpitt
Copy link
Member

Might this be related to the test hang on rhel-7-5, see issue #9662? This sounds rather plausible, given that rhel-7-5 uses an old cockpit-bridge and -ws.

@stefwalter
Copy link
Contributor Author

Did fix up based on discussion with @allisonkarlitskaya

Carrying a "flow-control" option in perpetuity just for this purpose is not a beautiful solution. But it's acceptable. In addition we can change the default for this later.

But why do I see "Rebase failed" everywhere here?

Copy link
Member

@allisonkarlitskaya allisonkarlitskaya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really quite a lot nicer. Thanks!

/*
* The default here, can change from FALSE to TRUE over time once we assume that all
* cockpit-ws participants have been upgraded sufficiently. The default when we're
* on the channel creation side is to handle flow control.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably once we make this decision, we can just revert this patch, in fact. I can't imagine we'll ever have someone sending "{"flow-control": false}".

@@ -395,6 +395,8 @@ cockpit_channel_response_prepare (CockpitChannel *channel)
const gchar *payload;
JsonObject *open;

COCKPIT_CHANNEL_CLASS (cockpit_channel_response_parent_class)->prepare (channel);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add the previously-missing chain-up as a separate commit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

In the CockpitChannelResponse implementation we need to call
the base class prepare virtual function from the overridden one.
There is a rare situation, where cockpit-ws is of an older version
than a cockpit-bridge/cockpit.js pair (always shipped in same package).

In this situation, if flow control is used for a data heavy sideband
channel, there is the likelyhood that the channel will slow and
hang due to flow control. This is because cockpit-ws is not new
enough to know to respond to "ping" messages with "pong" messages.

To combat this, we expect to first see a "ping" message in a channel
before throttling senders for that channel. To facilitate this we
send a "ping" message as one of the first control messages in a
channel.

We still follow the general mechanism of sending a "ping" message
every N sequence bytes ... it's just now that we also do that when
N == 0.

This is a very small code change, with a very big change to the tests.
In order to facilitate backporting these patches I have stayed away
from refactoring the tests.
@@ -841,6 +841,7 @@ function Channel(options) {
else
delete command.binary;

command["flow-control"] = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to understand -- the binary cockpit-bridge and thus the protocol default to "off" -- but the JavaScript API defaults to on, so that as soon as one uses a newer cockpit-system they get flow control enabled for everything. This would still work with new cockpit-system and old cockpit-bridge as unknown options get ignored. Did I understand that correctly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct.

Copy link
Member

@martinpitt martinpitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@allisonkarlitskaya is out for a few days, and you addressed her feedback already. Looks good to me, thanks! I adjust the commit message on merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants