Reduce frequency of keep-alives on non-nominated pairs #464
If we'd ignore trickle ICE, then the spec would actually say to stop sending keep-alives on all but the nominated candidate pair: https://datatracker.ietf.org/doc/html/rfc8445#section-8.3
I think what I would want is a behaviour such as "move to completed if we haven't received a new candidate in a while" but still allow new candidates to be added in case they come through. I am not sure if time is the best trigger for this condition. Perhaps what we need is an implementation of the "end of candidates" indication, plus triggering an ICE restart if we learn a new candidate after we have sent "end of candidates"?
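A loose sketch of how that "end of candidates" plus restart-trigger idea could fit together; the types and method names below are hypothetical, not an existing str0m API:

```rust
// Loose sketch of the "end of candidates" idea above; names and the restart
// trigger are hypothetical, not an existing str0m API.
#[derive(PartialEq)]
enum GatheringState {
    Gathering,
    EndOfCandidates,
}

struct IceAgent {
    gathering: GatheringState,
    needs_restart: bool,
}

impl IceAgent {
    /// The remote signalled "end of candidates": the agent can now move
    /// towards Completed instead of waiting on a timer.
    fn end_of_candidates(&mut self) {
        self.gathering = GatheringState::EndOfCandidates;
    }

    /// A candidate arriving after "end of candidates" triggers an ICE restart
    /// instead of being added to the current session.
    fn add_remote_candidate(&mut self, _candidate: &str) {
        if self.gathering == GatheringState::EndOfCandidates {
            self.needs_restart = true;
        } else {
            // Normal trickle ICE path: add the candidate to the checklist.
        }
    }
}

fn main() {
    let mut agent = IceAgent { gathering: GatheringState::Gathering, needs_restart: false };
    agent.add_remote_candidate("host 10.0.0.1");
    agent.end_of_candidates();
    agent.add_remote_candidate("srflx 1.2.3.4");
    assert!(agent.needs_restart);
}
```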
I think I figured out something that solves this for now. Upon every nomination, I'll invalidate all other candidate pairs with a priority less than or equal to the nominated one's. This means we can still find a better pair but we won't keep the others around. This removes the ability to fall back in case the current pair stops working, but in that case we just discard the connection and form a new one (kind of like an ICE restart).
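For illustration, a minimal sketch of that pruning rule, assuming a made-up pair type rather than str0m's actual internals:

```rust
// Illustrative sketch of "invalidate everything at or below the nominated
// priority"; the pair type below is made up, not str0m's internal one.
#[derive(Debug)]
struct CandidatePair {
    prio: u64,
    valid: bool,
}

/// On nomination, invalidate every other pair whose priority is less than or
/// equal to the nominated pair's, so only strictly better pairs stay around.
fn prune_on_nomination(pairs: &mut [CandidatePair], nominated: usize) {
    let nominated_prio = pairs[nominated].prio;
    for (idx, pair) in pairs.iter_mut().enumerate() {
        if idx != nominated && pair.prio <= nominated_prio {
            pair.valid = false;
        }
    }
}

fn main() {
    let mut pairs = vec![
        CandidatePair { prio: 100, valid: true },
        CandidatePair { prio: 200, valid: true }, // nominated
        CandidatePair { prio: 300, valid: true },
    ];
    prune_on_nomination(&mut pairs, 1);
    // The prio 100 pair is invalidated; prio 300 survives as a potential upgrade.
    println!("{:?}", pairs);
}
```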
Yeah. So for WebRTC the goal is to maximize the chances of connectivity. The frequency of STUN requests to the fallbacks could potentially be reduced, but it's not right to remove them altogether.
This is something we have a deliberate stance on: https://github.com/algesten/str0m/blob/main/docs/ice.md#nomination-doesnt-stop-gathering
And also see: https://github.com/algesten/str0m/blob/main/docs/ice.md#why-aend-of-candidates
Ultimately it seems your application is less concerned with connectivity and more about not being too noisy. That's slightly at odds with the goals of str0m. https://datatracker.ietf.org/doc/html/rfc5245#section-5.7.3
If we made this runtime configurable, you could potentially lower the limit to 1-2 after successful connect.
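A sketch of what such a runtime knob could look like; the config struct and limit logic below are purely hypothetical, not something str0m exposes today:

```rust
// Hypothetical shape of a runtime-configurable pair limit; str0m does not
// expose this today, it is only what the suggestion above could look like.
struct IceConfig {
    /// Maximum number of candidate pairs to keep alive.
    max_candidate_pairs: usize,
}

/// Keep only the highest-priority pairs, up to the configured limit.
fn apply_limit(pair_priorities: &mut Vec<u64>, cfg: &IceConfig) {
    pair_priorities.sort_unstable_by(|a, b| b.cmp(a));
    pair_priorities.truncate(cfg.max_candidate_pairs);
}

fn main() {
    let mut pairs = vec![300, 100, 200, 400];
    // After a successful connect, the application lowers the limit to 2.
    let cfg = IceConfig { max_candidate_pairs: 2 };
    apply_limit(&mut pairs, &cfg);
    assert_eq!(pairs, vec![400, 300]);
}
```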
I imagine many applications will run into that. We "only" generate a handful of candidates (8-10 per peer) and if most of them are active, str0m will generate multiple megabytes of traffic per minute just for keep-alives. On mobile devices, that is not exactly ideal. The quoted section sounds interesting, I think that may be another good angle to solve this. We would also need to be somewhat smart about which ones are dropped. Also, it would be important that this only happens once we've nominated a pair.
Did you measure this?
That is not quite how I'd put it. Connectivity is super important actually but once we've found a pair, we'd rather have that connection fail and make a new one via the signaling layer instead of falling back :) It appears to me that this is a trade-off that most apps will want to make at some point: Have lots of candidates first to maximize connectivity, then prune them to save bandwidth, battery and reduce noise.
Yeah. Sorry. Clumsy way of putting it.
Noise aside, this presupposes there is a lot of bandwidth being used here. My gut feeling is that it can't be that much – hence I asked whether you measured it.
We (non-scientifically) measured the total data usage of the Android app when it was just idling for about 5 minutes, at which point the only thing that should be happening is keep-alives, plus some book-keeping with the TURN servers which admittedly we didn't exclude from that measurement. I'd have to go back and actually tally up all the binding requests to see how much it really is if I remove all other traffic! Multiple megabytes might have been a bit of a stretch :) It shouldn't be too difficult to implement some counters that sum up all traffic generated by our own TURN client and str0m. I'll implement that tomorrow and see what numbers we get.
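A rough sketch of such counters, assuming they can be bumped wherever a datagram is handed to the socket; all names here are made up for illustration:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Rough sketch of the counters mentioned above: one tally per traffic source,
// incremented wherever a datagram is handed to the socket. Names are illustrative.
static STR0M_TX_BYTES: AtomicU64 = AtomicU64::new(0);
static TURN_TX_BYTES: AtomicU64 = AtomicU64::new(0);

fn record_str0m_tx(payload: &[u8]) {
    STR0M_TX_BYTES.fetch_add(payload.len() as u64, Ordering::Relaxed);
}

fn record_turn_tx(payload: &[u8]) {
    TURN_TX_BYTES.fetch_add(payload.len() as u64, Ordering::Relaxed);
}

fn main() {
    record_str0m_tx(&[0u8; 120]); // e.g. a STUN binding request going out
    record_turn_tx(&[0u8; 60]);   // e.g. a TURN refresh
    println!(
        "str0m: {} bytes, turn: {} bytes",
        STR0M_TX_BYTES.load(Ordering::Relaxed),
        TURN_TX_BYTES.load(Ordering::Relaxed),
    );
}
```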
Excellent! Thanks! I love to get some hard data on that. |
Here are some early logs:
This is already with some of the optimisations applied that we discussed, i.e. invalidating a lot of candidate pairs. Something is still off because there is still more than one pair being tested. The above logs are with 5 pairs, I think.
The above comes down to 80kb a minute for keep-alives. During that time, we are sending 28 unique messages (14 requests & 14 responses), so I think in total it means we are keeping 14 candidate pairs alive. If my math doesn't fail me, that is ~6kb per pair per minute.
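A quick back-of-the-envelope check of that math, using the rounded numbers from the comment above:

```rust
fn main() {
    // Numbers from the comment above, rounded: ~80 kB of keep-alive traffic
    // per minute, spread over 14 candidate pairs.
    let total_kb_per_minute = 80.0_f64;
    let pairs = 14.0_f64;
    println!("~{:.1} kB per pair per minute", total_kb_per_minute / pairs); // ~5.7, i.e. roughly 6
}
```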
This sounds within expected parameters. It will further be improved by having a resolution to #490 – where the user can reduce the frequency if they so wish. Notice that 6kb per pair per minute is nothing for the WebRTC use case.
Can we close this in favor of #490?
Yeah we resolved this by invalidating all other candidate pairs :) |
Currently, I believe str0m sends keep-alives on all valid candidate pairs at the same rate. Sending frequent keep-alives is useful to detect network partitions. However, I think that is only relevant for the currently nominated pair. Could we reduce the keep-alives on other pairs to something like 5 or 10 seconds?
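A sketch of the split being asked for here; the 10-second backup interval is the number from this issue, while the 2-second nominated interval is only an assumed placeholder:

```rust
use std::time::Duration;

// Sketch of the requested split; the 10 s backup interval is the number from
// this issue, the 2 s nominated interval is just an assumed placeholder.
const NOMINATED_KEEPALIVE: Duration = Duration::from_secs(2);
const BACKUP_KEEPALIVE: Duration = Duration::from_secs(10);

fn keepalive_interval(is_nominated: bool) -> Duration {
    if is_nominated {
        // The nominated pair carries media, so probe it often to detect a
        // network partition quickly.
        NOMINATED_KEEPALIVE
    } else {
        // Backup pairs only need their NAT/firewall bindings kept open, so a
        // much slower cadence is enough.
        BACKUP_KEEPALIVE
    }
}

fn main() {
    assert_eq!(keepalive_interval(true), Duration::from_secs(2));
    assert_eq!(keepalive_interval(false), Duration::from_secs(10));
}
```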