stale DTLS handshake / earlyStopRetransmission = true #335

Closed
boaks opened this issue Jun 23, 2017 · 9 comments
@boaks
Contributor

boaks commented Jun 23, 2017

What happens with handshakes where the client breaks down within a "flight" (stops sending data)?
With earlyStopRetransmission = false, the server's flight retransmission counter overflows and the handshake then fails. With earlyStopRetransmission = true, I'm not sure what happens.
Is there a unit test for that case?

@boaks boaks added the question label Jun 23, 2017
@sbernard31
Contributor

There is a MaxRetransmission; what do you mean by the retransmission "counter of the server overflows"?

@boaks
Contributor Author

boaks commented Jun 23, 2017

With earlyStopRetransmission = true and a client that sends only the first message of a flight and then dies, this MaxRetransmission counter doesn't take effect, does it? It works with earlyStopRetransmission = false.
Yesterday I started reporting handshake errors (see #334), and I think I didn't catch all of them.

@boaks
Contributor Author

boaks commented Jun 23, 2017

There is a MaxRetransmission, what do you mean by retransmission "counter of the server overflows" ?

I meant: when the retransmission counter reaches this maximum. This works in many cases, but with earlyStopRetransmission = true and a client that breaks down within a flight, I doubt that this mechanism still works.
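To make the scenario concrete, here is a minimal, counter-only model of one flight's retransmission state. It is a sketch under assumptions, not Scandium's code: the class FlightRetransmissionModel, its Outcome values and the callback names are hypothetical, and timers, records and sockets are left out. It shows why, with earlyStopRetransmission = true, a client that sends only the first message of its next flight and then dies never drives the counter up to MaxRetransmission.

```java
/**
 * Hypothetical, counter-only model of one DTLS flight's retransmission state.
 * NOT Scandium's implementation: all names are made up for illustration.
 */
final class FlightRetransmissionModel {

    /** What eventually happened to the flight. */
    enum Outcome { PENDING, STOPPED_EARLY, MAX_RETRANSMISSIONS_REACHED }

    private final int maxRetransmissions;
    private final boolean earlyStopRetransmission;
    private int retransmissions = 0;
    private Outcome outcome = Outcome.PENDING;

    FlightRetransmissionModel(int maxRetransmissions, boolean earlyStopRetransmission) {
        this.maxRetransmissions = maxRetransmissions;
        this.earlyStopRetransmission = earlyStopRetransmission;
    }

    /** The first record of the peer's next flight has arrived. */
    void onPeerMessageReceived() {
        if (earlyStopRetransmission && outcome == Outcome.PENDING) {
            // Retransmission stops here, so the counter can never reach its limit.
            outcome = Outcome.STOPPED_EARLY;
        }
    }

    /** The retransmission timer has fired. */
    void onTimeout() {
        if (outcome != Outcome.PENDING) {
            return; // flight is no longer retransmitted
        }
        if (retransmissions >= maxRetransmissions) {
            // Whether this should also raise a handshake error is the open question.
            outcome = Outcome.MAX_RETRANSMISSIONS_REACHED;
        } else {
            retransmissions++; // resend the whole flight
        }
    }

    Outcome outcome() {
        return outcome;
    }

    int retransmissions() {
        return retransmissions;
    }

    public static void main(String[] args) {
        for (boolean earlyStop : new boolean[] {true, false}) {
            FlightRetransmissionModel flight = new FlightRetransmissionModel(4, earlyStop);
            flight.onPeerMessageReceived(); // client sends the 1st message of its flight...
            for (int i = 0; i < 10; i++) {
                flight.onTimeout();         // ...and then dies
            }
            System.out.println("earlyStop=" + earlyStop + " -> " + flight.outcome()
                    + " after " + flight.retransmissions() + " retransmissions");
        }
    }
}
```

In this model the early-stop flight ends in STOPPED_EARLY with zero retransmissions and nothing further happens, which is the "stale ongoing handshake" discussed below.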

@sbernard31
Contributor

With earlyStopRetransmission = true and a client, which only sent the 1. message of a flight and then dies, this MaxRetransmission counter doesn't work.

I think so, and that seems OK to me. Retransmission aims to compensate for packet loss, and in this case we are not facing packet loss.

You seem to say that with earlyStopRetransmission = false, a handshake error is raised once the maximum number of retransmissions is reached? I didn't see that in the code; it seems to me we just stop retransmitting.

Anyway, I'm not sure we want to do that.

@boaks
Contributor Author

boaks commented Jun 23, 2017

Anyway, I'm not sure we want to do that ?

I think that any unsuccessful outcome of a handshake should be reported somehow. I'm currently collecting the cases.
The described case, "earlyStopRetransmission = true and broken within a flight", currently results in a "stale ongoing handshake". I think there should be a timeout for such "stale ongoing handshakes", which could then be reported as an error.
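One possible shape for such a timeout, as a minimal sketch: each ongoing handshake records when it last received a message, and a periodic sweep drops and reports the ones that have been silent too long. StaleHandshakeSweeper, its methods and the use of a plain peer string as key are all hypothetical; time is passed in explicitly so the logic stays deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/**
 * Hypothetical timeout for "stale ongoing handshakes". Not part of Scandium;
 * names and structure are illustrative only.
 */
final class StaleHandshakeSweeper {

    private final long timeoutMillis;
    private final Consumer<String> onStale; // e.g. report a handshake error for this peer
    private final Map<String, Long> lastActivity = new HashMap<>();

    StaleHandshakeSweeper(long timeoutMillis, Consumer<String> onStale) {
        this.timeoutMillis = timeoutMillis;
        this.onStale = onStale;
    }

    /** A handshake message from this peer was received at nowMillis. */
    void onHandshakeMessage(String peer, long nowMillis) {
        lastActivity.put(peer, nowMillis);
    }

    /** The handshake with this peer finished (successfully or with a reported error). */
    void onHandshakeFinished(String peer) {
        lastActivity.remove(peer);
    }

    /** Drops and reports every handshake silent for at least timeoutMillis; returns the count. */
    int sweep(long nowMillis) {
        List<String> stale = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastActivity.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() >= timeoutMillis) {
                stale.add(entry.getKey());
                it.remove();
            }
        }
        // Notify after iterating, so a callback may safely call back into the sweeper.
        stale.forEach(onStale);
        return stale.size();
    }

    int pending() {
        return lastActivity.size();
    }
}
```

In a real connector, sweep() would presumably be driven by something like a ScheduledExecutorService, and the onStale callback is where a handshake error could be reported.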

@sbernard31
Contributor

In practice, the "stale ongoing handshake" stays in the LRU cache (ConnectionStore) and is evicted once the cache is full, so I think there is no memory leak.
Ideally, we could remove this kind of incomplete handshake earlier, but the spirit of our LRU cache is to keep it full most of the time.

I think there are two use cases:

  1. Stale ongoing handshake on the server side (the client breaks down within a "flight"). In that case:
  • we have no behavioral issue, as explained above;
  • we have no log about it; maybe just adding an EvictionListener that logs when the evicted handshake is incomplete is enough?
  2. Stale ongoing handshake on the client side (the server breaks down within a "flight"). In that case the handshake never ends, the application data is never sent, and the upper layer never finds out. Am I right?
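The EvictionListener idea for the server-side case can be sketched with a LinkedHashMap in access order, whose removeEldestEntry hook sees each entry just before it is evicted. This is an illustration under assumptions, not Scandium's ConnectionStore: LruConnectionCache, Connection and EvictionListener are hypothetical stand-ins, and the listener here only logs connections whose handshake never completed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical LRU connection cache that notifies a listener on eviction.
 * Connection and EvictionListener are illustrative, not Scandium types.
 */
final class LruConnectionCache {

    /** Minimal stand-in for a DTLS connection entry. */
    static final class Connection {
        final String peer;
        final boolean handshakeComplete;

        Connection(String peer, boolean handshakeComplete) {
            this.peer = peer;
            this.handshakeComplete = handshakeComplete;
        }
    }

    /** Called with every entry the cache evicts to make room. */
    interface EvictionListener {
        void onEviction(Connection evicted);
    }

    private final Map<String, Connection> connections;

    LruConnectionCache(int capacity, EvictionListener listener) {
        // accessOrder = true makes iteration order least-recently-used first.
        this.connections = new LinkedHashMap<String, Connection>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Connection> eldest) {
                if (size() > capacity) {
                    listener.onEviction(eldest.getValue());
                    return true; // evict the least recently used connection
                }
                return false;
            }
        };
    }

    void put(Connection connection) {
        connections.put(connection.peer, connection);
    }

    Connection get(String peer) {
        return connections.get(peer); // a hit refreshes the entry's LRU position
    }

    int size() {
        return connections.size();
    }

    public static void main(String[] args) {
        LruConnectionCache cache = new LruConnectionCache(2, evicted -> {
            if (!evicted.handshakeComplete) {
                System.out.println("Evicted stale ongoing handshake with " + evicted.peer);
            }
        });
        cache.put(new Connection("client-1", false)); // handshake never finished
        cache.put(new Connection("client-2", true));
        cache.put(new Connection("client-3", true));  // evicts client-1 and logs it
    }
}
```

Logging only on eviction keeps the cache full as described above, but it only covers the server-side case; the client-side case still needs some signal to the upper layer.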

@boaks
Contributor Author

boaks commented Jun 27, 2017

In my opinion, the resulting issue is in the CoAP layer: without either an error or a sent callback, the exchange is never cleaned up.

@sbernard31
Contributor

sbernard31 commented Jun 27, 2017

We agree, that's more or less what I suspected in 2).

This reminds me of an old discussion, #74.

The choice was: "it is up to the application layer to cancel a request if it does not receive a response in time".
But on reflection, this could be problematic with notify and "magical blockwise".
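The approach from #74, where the application cancels a request that gets no response in time, might look like this for a single plain request, sketched with CompletableFuture.orTimeout (Java 9+). sendOverStaleHandshake() is hypothetical and models a request stuck behind a stale handshake, so neither a response nor an error ever arrives from the lower layers.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Sketch of an application-level response deadline (not a Californium API). */
final class RequestTimeoutSketch {

    /** Hypothetical send whose DTLS handshake went stale: the future is never completed. */
    static CompletableFuture<String> sendOverStaleHandshake() {
        return new CompletableFuture<>();
    }

    /** Completes exceptionally with TimeoutException if no response arrives in time. */
    static CompletableFuture<String> withResponseTimeout(CompletableFuture<String> response,
                                                         long timeoutMillis) {
        return response.orTimeout(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    static String awaitOutcome(long timeoutMillis) {
        try {
            return withResponseTimeout(sendOverStaleHandshake(), timeoutMillis).join();
        } catch (CompletionException e) {
            // join() wraps the TimeoutException; here the app would cancel and clean up the exchange.
            return e.getCause() instanceof TimeoutException ? "timed out" : "failed: " + e.getCause();
        }
    }

    public static void main(String[] args) {
        System.out.println(awaitOutcome(200));
    }
}
```

A single per-request deadline like this fits plain request/response; it is less clear how it would apply where a request yields many responses (notifications) or is split into several block exchanges, which is the concern raised above.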

@boaks
Contributor Author

boaks commented Jun 28, 2017

As @sbernard31 rightly points out, in both cases the handshake just goes stale without signalling an error. I'm closing this issue in order to create a more general one.
