str0m locally nominates host candidate, remote peer nominates srflx candidate #525

Closed

thomaseizinger opened this issue Jun 10, 2024 · 19 comments

@thomaseizinger (Collaborator)

We've been running into a bug caused by invalidating remote candidates. I think it might be worth investigating, but I also want to say that I've changed my mind on how I want roaming to work, and we will adopt ICE restarts soon.

It might also not be a bug, but it is somewhat unexpected behaviour: the remote and local agents have different views on which candidates are nominated when it comes to host vs. srflx candidates.

A STUN binding request sent from a host candidate will arrive at the remote's srflx candidate and produce a response carrying the observed address. When processing the response, str0m uses the transaction ID to find the original candidate. This is where the state disparity starts:

  1. The remote thinks it is talking to our srflx candidate.
  2. We think we are using our host candidate.

As long as candidates don't get invalidated, this is somewhat wrong but works fine, because a srflx candidate's base is a host candidate's address. Once I send all my "unused" candidates to the other party to invalidate them, the connection breaks, because I include my srflx candidate, which is the one the other party nominated.
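
To illustrate (a minimal sketch with hypothetical types, not str0m's internals): the two views disagree on the candidate, but agree on the socket to send from, because the srflx candidate's base is the host address. That is why this keeps working right up until invalidation.

```rust
use std::net::{Ipv4Addr, SocketAddr};

/// Hypothetical, simplified candidate model -- not str0m's types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Candidate {
    Host { addr: SocketAddr },
    /// Server-reflexive: the observed public address plus the local base.
    Srflx { addr: SocketAddr, base: SocketAddr },
}

impl Candidate {
    /// The local socket actually used to send from.
    fn send_socket(&self) -> SocketAddr {
        match *self {
            Candidate::Host { addr } => addr,
            Candidate::Srflx { base, .. } => base,
        }
    }
}

fn main() {
    let host_addr = SocketAddr::from((Ipv4Addr::new(10, 0, 0, 2), 5000));
    let observed = SocketAddr::from((Ipv4Addr::new(203, 0, 113, 7), 6000));

    // What we nominated vs. what the remote nominated.
    let our_view = Candidate::Host { addr: host_addr };
    let remote_view = Candidate::Srflx { addr: observed, base: host_addr };

    // Different candidates, same send socket -- until the srflx candidate
    // is invalidated, nobody notices the disparity.
    assert_ne!(our_view, remote_view);
    assert_eq!(our_view.send_socket(), remote_view.send_socket());
}
```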

@thomaseizinger (Collaborator, Author)

I think the tricky part here is working out what we want to happen. Matching the transaction IDs for the response is obviously what we need to do. We also already record the candidate on which we received the response: it is stored in valid_idx, but this index isn't used for anything.

Should we be nominating that one instead of the candidate we originally sent from?

@algesten (Owner)

The way the candidates are ordered (calculating prio) is determined by the spec. That prio order should be enough to get connectivity. I can see how the invalidate-remote-candidate call can upset that in the situation you describe.

If you are not continuing down the invalidate-remote route, I would rather drop the API than try to fix it. Does that make sense?

@thomaseizinger (Collaborator, Author)

> If you are not continuing down the invalidate-remote route, I would rather drop the API than try to fix it. Does that make sense?

Yeah, I can understand that. Still, I think there is currently a bug that might surface again in other situations: str0m thinks it is sending via the host candidate when in reality it is connected via its server-reflexive candidate.

@thomaseizinger (Collaborator, Author)

> str0m thinks it is sending via the host candidate when in reality it is connected via its server-reflexive candidate.

In particular, the PairId that gets nominated will point to the host candidate but has its valid_idx set to the server-reflexive candidate. That valid_idx is never used for anything, which also appears a bit dodgy, because the comment references some "valid candidate" logic that I can't find anywhere.

@algesten (Owner)

Googling around a bit, I think the behavior is correct. The spec says (https://www.rfc-editor.org/rfc/rfc5245#section-5.7.3):

> Since an agent cannot send requests directly from a reflexive
> candidate, but only from its base, the agent next goes through the
> sorted list of candidate pairs. For each pair where the local
> candidate is server reflexive, the server reflexive candidate MUST be
> replaced by its base.
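
Read literally, that rule amounts to something like this (a sketch with hypothetical types, not str0m's actual pair handling):

```rust
use std::net::{Ipv4Addr, SocketAddr};

// Before connectivity checks, a server-reflexive *local* candidate in a
// pair is swapped for its base, since requests can only be sent from the base.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Local {
    Host { addr: SocketAddr },
    Srflx { addr: SocketAddr, base: SocketAddr },
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct Pair {
    local: Local,
    remote: SocketAddr,
}

fn replace_srflx_with_base(pairs: &mut [Pair]) {
    for pair in pairs {
        if let Local::Srflx { base, .. } = pair.local {
            pair.local = Local::Host { addr: base };
        }
    }
    // The RFC then prunes the identical pairs this replacement can create.
}

fn main() {
    let base = SocketAddr::from((Ipv4Addr::new(10, 0, 0, 2), 5000));
    let observed = SocketAddr::from((Ipv4Addr::new(203, 0, 113, 7), 6000));
    let remote = SocketAddr::from((Ipv4Addr::new(198, 51, 100, 3), 7000));

    let mut pairs = [Pair { local: Local::Srflx { addr: observed, base }, remote }];
    replace_srflx_with_base(&mut pairs);
    assert_eq!(pairs[0].local, Local::Host { addr: base });
}
```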

In some respects, I don't think server-reflexive candidates should be added to the local ICE agent at all. It should only know about the host candidates.

Maybe we split the valid_idx issue off into a separate one?

@thomaseizinger (Collaborator, Author)

> Since an agent cannot send requests directly from a reflexive
> candidate, but only from its base, the agent next goes through the
> sorted list of candidate pairs. For each pair where the local
> candidate is server reflexive, the server reflexive candidate MUST be
> replaced by its base.

> In some respects, I don't think server-reflexive candidates should be added to the local ICE agent at all. It should only know about the host candidates.

But instead only send them as a candidate to the remote peer?

@algesten (Owner)

> But instead only send them as a candidate to the remote peer?

Yes!

@thomaseizinger (Collaborator, Author)

> > But instead only send them as a candidate to the remote peer?
>
> Yes!

I hadn't really considered this, but I think it could be a solution too; let me try.

If I never add the server-reflexive candidate locally, only the host candidate can ever be nominated, so it would be unambiguous.
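
Something like this asymmetry (sketched with hypothetical stand-in types, not str0m's actual API):

```rust
// Hypothetical stand-ins for illustration only; not str0m's types or API.
#[derive(Clone, Debug)]
struct Candidate(String);

#[derive(Default)]
struct IceAgent {
    local: Vec<Candidate>,
}

impl IceAgent {
    fn add_local_candidate(&mut self, c: Candidate) {
        self.local.push(c);
    }
}

#[derive(Default)]
struct Signalling {
    outbox: Vec<Candidate>,
}

impl Signalling {
    fn send_candidate(&mut self, c: Candidate) {
        self.outbox.push(c);
    }
}

fn main() {
    let host = Candidate("host 10.0.0.2:5000 udp".into());
    let srflx = Candidate("srflx 203.0.113.7:6000 udp".into());

    let mut agent = IceAgent::default();
    let mut signalling = Signalling::default();

    // Locally: host only, so only the host candidate can ever be nominated.
    agent.add_local_candidate(host.clone());

    // To the remote: both, so it can still reach us through the NAT mapping.
    signalling.send_candidate(host);
    signalling.send_candidate(srflx);

    assert_eq!(agent.local.len(), 1);
    assert_eq!(signalling.outbox.len(), 2);
}
```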

@algesten (Owner)

It probably works.

I think with WebRTC we are adding them locally as well; however, the treatment of them w.r.t. the spec is at least directionally correct. Though we are not doing the literal "replace by its base", so there could be some inefficiency here.

@thomaseizinger (Collaborator, Author)

> In some respects, I don't think server-reflexive candidates should be added to the local ICE agent at all.

Should the agent reject adding those as local candidates? Or do you think that goes too far?

@algesten (Owner)

I was just pondering that.

I wonder if there is a case for adding only server-reflexive and no host candidates. I think on balance we should allow it. Looking at form_pairs, it seems we at least attempt to avoid forming redundant pairs if the bases are equivalent.

@thomaseizinger (Collaborator, Author)

> I wonder if there is a case for adding only server-reflexive and no host candidates.

Did you ever consider an API where candidates are created on an agent? Like:

```rust
impl IceAgent {
    fn new_host_candidate(&mut self, addr: SocketAddr, protocol: Protocol) -> Candidate;
    fn new_relay_candidate(&mut self, addr: SocketAddr, protocol: Protocol) -> Candidate;
    fn new_server_reflexive_candidate(&mut self, addr: SocketAddr, base: SocketAddr, protocol: Protocol) -> Candidate;
}
```

That would give us more control over where they are stored internally, and, for example, we could emit a warning if somebody adds a server-reflexive candidate without also adding a host candidate for the same base.
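
For instance, the srflx constructor could do a check along these lines (a hypothetical, simplified sketch; the names are illustrative):

```rust
use std::net::SocketAddr;

// Hypothetical, simplified model of the check the proposed API would enable.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Host,
    ServerReflexive { base: SocketAddr },
}

#[derive(Debug, Clone, Copy)]
struct Candidate {
    addr: SocketAddr,
    kind: Kind,
}

/// Warn when a srflx candidate arrives without a host candidate for its base.
fn warn_if_missing_host_base(existing: &[Candidate], new: &Candidate) {
    if let Kind::ServerReflexive { base } = new.kind {
        let has_host = existing
            .iter()
            .any(|c| c.kind == Kind::Host && c.addr == base);
        if !has_host {
            eprintln!(
                "warning: srflx candidate {} has no host candidate for base {base}",
                new.addr
            );
        }
    }
}

fn main() {
    let base: SocketAddr = "10.0.0.2:5000".parse().unwrap();
    let srflx = Candidate {
        addr: "203.0.113.7:6000".parse().unwrap(),
        kind: Kind::ServerReflexive { base },
    };
    // No host candidate registered yet, so this prints the warning.
    warn_if_missing_host_base(&[], &srflx);
}
```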

@algesten (Owner)

I guess I never thought much about the ICE agent in isolation from the RTC stuff: Rtc::add_local_candidate() vs Rtc::add_remote_candidate().

@algesten (Owner)

Maybe rather than invalidate_candidate, you want invalidate_base?

@thomaseizinger (Collaborator, Author)

> Maybe rather than invalidate_candidate, you want invalidate_base?

I'll try and implement ICE restarts first and see where that leads us!

@thomaseizinger (Collaborator, Author)

Closing this as a non-issue. Should we add a warning to add_local_candidate if we encounter a server-reflexive candidate?

@thomaseizinger closed this as not planned on Jun 11, 2024.
@algesten (Owner)

> Should we add a warning to add_local_candidate if we encounter a server-reflexive candidate?

I think the current behavior is correct. For the Rtc case, it would be quite common to throw in any local candidate. form_pairs is where we ensure that we don't form redundant pairs: if you add both the host and the server-reflexive candidate, you'll end up with only the higher-prio one as a pair.

If you add one, start communicating, and later add the other, the nominated pair swaps out and it still works, because the abstraction isn't telling you which exact candidate the agent is using, but rather giving instructions on how to communicate ("send from this address to that address").
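
That pruning amounts to something like this (a hypothetical, simplified sketch; form_pairs holds the real logic): keep only the highest-prio pair for each (local base, remote) combination.

```rust
use std::net::{Ipv4Addr, SocketAddr};

// Hypothetical, simplified model of redundant-pair pruning. For a host
// candidate the base is its own address; for a srflx candidate it is the
// host address behind it.
#[derive(Debug, Clone, Copy)]
struct Pair {
    local_base: SocketAddr,
    remote: SocketAddr,
    prio: u64,
}

/// Keep only the highest-prio pair per (local base, remote address).
fn prune_redundant(mut pairs: Vec<Pair>) -> Vec<Pair> {
    pairs.sort_by(|a, b| b.prio.cmp(&a.prio));
    let mut out: Vec<Pair> = Vec::new();
    for p in pairs {
        let redundant = out
            .iter()
            .any(|q| q.local_base == p.local_base && q.remote == p.remote);
        if !redundant {
            out.push(p);
        }
    }
    out
}

fn main() {
    let base = SocketAddr::from((Ipv4Addr::new(10, 0, 0, 2), 5000));
    let remote = SocketAddr::from((Ipv4Addr::new(198, 51, 100, 3), 7000));

    // Host and srflx candidates share the same base, so only the
    // higher-prio pair survives.
    let pairs = vec![
        Pair { local_base: base, remote, prio: 200 }, // via host candidate
        Pair { local_base: base, remote, prio: 100 }, // via srflx candidate
    ];
    assert_eq!(prune_redundant(pairs).len(), 1);
}
```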

I'd like to document this better. Given the journey you guys have been on, there is clearly something missing in communicating how the agent works. Maybe it's just better docs on NominatedSend, but it could also be docs on the whole module explaining its operation.

@algesten (Owner)

Also: there has been a lot of rediscovering my own thoughts here. I simply don't recall how I arrived at where it is today. Getting old, I suppose :(

@thomaseizinger (Collaborator, Author)

> I'd like to document this better. Given the journey you guys have been on, there is clearly something missing in communicating how the agent works. Maybe it's just better docs on NominatedSend, but it could also be docs on the whole module explaining its operation.

Some of this is definitely my fault for just trying to use IceAgent without learning about ICE properly first :)
For example, the whole idea of selectively invalidating candidates was a pretty wild goose chase :D
I thought I could be clever and avoid ICE restarts, but in reality there are so many combinations of network changes that they are way too hard to cover if you try to selectively "patch" your local state. Ultimately, you need to start gathering new candidates and redo the selection process to end up with a working connection.

Some peer-to-peer ICE docs or an example probably wouldn't hurt though.
