add unit test for a client which crash during an handshake #60

sbernard31 · 2015-11-27T15:01:49Z

We have some issue with the last version of scandium when we try to start a fresh new handshake while the first one is still ongoing. (This could happen if the device crash during the handshake then restart a new handshake on the same IP)
In this case, the client will not be able to rehandshake as long as the session in stored.

I just create the test case corresponding to the issue, this will allow to start discussion.
This is the current exchange :

 ---> CLIENT_HELLO ( no cookie + PSK)
<--- HELLO_VERY_REQUEST (cookie)
---> CLIENT_HELLO (cookie +  PSK)
<-- SERVER_HELLO ( PSK), SERVER HELLO DONE  
***** crash / restart *****
---> CLIENT_HELLO (no cookie +RPK)
// Discarded by the server handshaker duplicate message
<--- retransmission SERVER_HELLO ( PSK), SERVER HELLO DONE  
// X NPE at client side (field pskStore is null at ClientHandshaker.java:499) 
etc etc  ...

There is several issue :

The pskStore NPE.
A CLIENT_HELLO with no cookie is handle by the handshaker, this expose us to IP spoofing attack during handshake.
We discard the new CLIENT_HELLO, so we are unable to start a fresh new handshake.

Possible solution :

test if pskStore is null or always create an empty pskStore.
always send HELLO_VERY_REQUEST if we receive a CLIENT_HELLO without cookie (even if an hanshake is ongoing)
The hardest one ... Does we restart a fresh new handshake earh time we receive a CLIENT_HELLO with cookie ? If we do that we could restart handshake because of duplicated/badly ordered message? I don't know if we could do something better ?

(Do not merger this one, while we don't provide fix for this issue)

sophokles73 · 2015-11-27T15:45:10Z

Hey @sbernard31,

couldn't you have brought this up last week, i.e. before we released 1.0.0? ;-)

My thoughts:
2) I thought we were already doing what you propose

always send HELLO_VERY_REQUEST if we receive a CLIENT_HELLO without cookie (even if an hanshake is ongoing)

If not, this seems to be a regression we introduced with the improved session resumption behavior :-(

I think the pskStore NPE is not much of an issue since it is a follow-up problem of the Handshaker erroneously handling a CLIENT_HELLO w/o cookie.
Is this a real world problem or just of an academic nature? In your example the client seems to have changed its cipher preference from PSK to RPK after the crash. Why would it do so in the first place? If, however, this is a situation we need to be able to handle, I think we need to do better duplicate detection, i.e. if the client sends another CLIENT_HELLO while the ServerHandshaker is still valid on the server, then we will need to compare the raw bytes of the new CLIENT_HELLO with the bytes from the originally received bytes (which we keep in the handshaker). We could then discover whether it is a real duplicate or the start of a new handshake ...

sbernard31 · 2015-11-27T16:45:08Z

bad timing I say that to myself when I see your mail this morning. :p

I just take a look at the code and it seems the code which is responsible of this behavior was present long before the session resumption behavior introduction. (see processHandshakeRecord)

     Connection connection = connectionStore.get(peerAddress);
     if (connection != null && connection.getOngoingHandshake() != null) {
         // we are already in an ongoing handshake
         // simply delegate the processing of the record to the handshaker
         flight = connection.getOngoingHandshake().processMessage(record);
     } else { ....

You're right this is a strange use case but this shows that we are not really bullet proof. The good behavoir should be to send a ALERT message to the server to stop the handshake at we can not handle this kind of SERVER_HELLO. Maybe we could catch any Exception raise in DTLSConnector.processHandshakeRecord and do a terminateOngoingHandshake.
Changing from PSK to RPK was not the real use case, I want to check (when I write the test I don't do that intentionally :p, but as it show an issue, I let it). In fact, I just want to test a crash on an ongoing handshake. I made the test without changing the cipher suite and there is another behavior but this doesn't works too.

jvermillard · 2015-11-27T17:01:07Z

we found those issues testing eclipse-wakaama/wakaama#61
I reproduced several time this issue: some packet loss made scandium session stuck.
The only way to make it reworking was to restart the server.
Also tinydtls is not behaving so well and also suffer issues on some handshake message loss.

Also for restarting an handshake with a new credential type, it's something unusual but we can have real life while migrating device from one security mode to another one: we can be suffering reboot and rollback (for example our devices have automatic upgrade rollback if something wrong happen like multiple reboot or user app freaking out which can happen easly..).

sbernard31 · 2015-11-27T17:50:00Z

the commit above fix the 1) issue.

sophokles73 · 2015-11-30T08:49:54Z

@sbernard31 ,

you are right, I remember that I stumbled about this code section several weeks ago, wondering whether this is the right way to handle an incoming Handshake message. I was playing around with it for a while but couldn't get it right at the time being :-( If only I had been a little more persistent ...
Anyways, now we have a good reason to finally get this solved ... I have checked out the handshake_failed branch and will think about it today. I'll keep you posted ...

sbernard31 · 2015-11-30T13:47:50Z

On second thought, the 2) issue is maybe not a real one. Because if some body is able to do IP spoofing during handshake, it will be able to abort the current handshake with something else than a CLIENT_HELLO. So this is not so important.

sbernard31 · 2015-11-30T17:35:26Z

Ok I propose a fix for this one. I use the random field of CLIENT_HELLO to know if this is retransmission or a new one. (This should fix 2 and 3)

sophokles73 · 2015-12-01T10:43:03Z

I have re-factored the code so that all test cases now succeed (incl. the one added by @sbernard31) to verify that a client start over with a new handshake after crashing ...

Let me split up the changes into multiple commits in order to make them easier to swallow and create another pull request for you to review, ok?

@sbernard31: since this has now become a noteworthy issue/fix, would you mind opening a bugzilla issue for it as well which I can refer to in the commit messages of the PR?

sbernard31 · 2015-12-01T11:14:01Z

The bugzilla issue is opened : https://bugs.eclipse.org/bugs/show_bug.cgi?id=483371

sophokles73 · 2015-12-01T11:39:38Z

Thanks, Simon :-)

sophokles73 · 2015-12-01T14:37:05Z

I have created PR #61 containing Simon's unit test and the fix. Please review and test if this works for you ...

sophokles73 · 2016-01-15T15:25:38Z

This has been merged as part of #61

add unit test for a client which crash during an handshake

a9aa2ce

handling unexpected exception during handshake

5c037f5

handle new CLIENT_HELLO while handshaking

d4b0949

sophokles73 closed this Jan 15, 2016

sophokles73 deleted the hanshake_failed branch January 15, 2016 15:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add unit test for a client which crash during an handshake #60

add unit test for a client which crash during an handshake #60

sbernard31 commented Nov 27, 2015

sophokles73 commented Nov 27, 2015

sbernard31 commented Nov 27, 2015

jvermillard commented Nov 27, 2015

sbernard31 commented Nov 27, 2015

sophokles73 commented Nov 30, 2015

sbernard31 commented Nov 30, 2015

sbernard31 commented Nov 30, 2015

sophokles73 commented Dec 1, 2015

sbernard31 commented Dec 1, 2015

sophokles73 commented Dec 1, 2015

sophokles73 commented Dec 1, 2015

sophokles73 commented Jan 15, 2016

add unit test for a client which crash during an handshake #60

add unit test for a client which crash during an handshake #60

Conversation

sbernard31 commented Nov 27, 2015

sophokles73 commented Nov 27, 2015

sbernard31 commented Nov 27, 2015

jvermillard commented Nov 27, 2015

sbernard31 commented Nov 27, 2015

sophokles73 commented Nov 30, 2015

sbernard31 commented Nov 30, 2015

sbernard31 commented Nov 30, 2015

sophokles73 commented Dec 1, 2015

sbernard31 commented Dec 1, 2015

sophokles73 commented Dec 1, 2015

sophokles73 commented Dec 1, 2015

sophokles73 commented Jan 15, 2016