Skip to content
This repository has been archived by the owner on Aug 30, 2022. It is now read-only.

add unit test for a client which crash during an handshake #60

Closed
wants to merge 3 commits into from

Conversation

sbernard31
Copy link
Contributor

We have some issue with the last version of scandium when we try to start a fresh new handshake while the first one is still ongoing. (This could happen if the device crash during the handshake then restart a new handshake on the same IP)
In this case, the client will not be able to rehandshake as long as the session in stored.

I just create the test case corresponding to the issue, this will allow to start discussion.
This is the current exchange :

 ---> CLIENT_HELLO ( no cookie + PSK)
<--- HELLO_VERY_REQUEST (cookie)
---> CLIENT_HELLO (cookie +  PSK)
<-- SERVER_HELLO ( PSK), SERVER HELLO DONE  
***** crash / restart *****
---> CLIENT_HELLO (no cookie +RPK)
// Discarded by the server handshaker duplicate message
<--- retransmission SERVER_HELLO ( PSK), SERVER HELLO DONE  
// X NPE at client side (field pskStore is null at ClientHandshaker.java:499) 
etc etc  ...

There is several issue :

  1. The pskStore NPE.
  2. A CLIENT_HELLO with no cookie is handle by the handshaker, this expose us to IP spoofing attack during handshake.
  3. We discard the new CLIENT_HELLO, so we are unable to start a fresh new handshake.

Possible solution :

  1. test if pskStore is null or always create an empty pskStore.
  2. always send HELLO_VERY_REQUEST if we receive a CLIENT_HELLO without cookie (even if an hanshake is ongoing)
  3. The hardest one ... Does we restart a fresh new handshake earh time we receive a CLIENT_HELLO with cookie ? If we do that we could restart handshake because of duplicated/badly ordered message? I don't know if we could do something better ?

(Do not merger this one, while we don't provide fix for this issue)

@sophokles73
Copy link
Contributor

Hey @sbernard31,

couldn't you have brought this up last week, i.e. before we released 1.0.0? ;-)

My thoughts:
2) I thought we were already doing what you propose

  1. always send HELLO_VERY_REQUEST if we receive a CLIENT_HELLO without cookie (even if an hanshake is ongoing)

If not, this seems to be a regression we introduced with the improved session resumption behavior :-(

  1. I think the pskStore NPE is not much of an issue since it is a follow-up problem of the Handshaker erroneously handling a CLIENT_HELLO w/o cookie.
  2. Is this a real world problem or just of an academic nature? In your example the client seems to have changed its cipher preference from PSK to RPK after the crash. Why would it do so in the first place? If, however, this is a situation we need to be able to handle, I think we need to do better duplicate detection, i.e. if the client sends another CLIENT_HELLO while the ServerHandshaker is still valid on the server, then we will need to compare the raw bytes of the new CLIENT_HELLO with the bytes from the originally received bytes (which we keep in the handshaker). We could then discover whether it is a real duplicate or the start of a new handshake ...

@sbernard31
Copy link
Contributor Author

bad timing I say that to myself when I see your mail this morning. :p

  1. I just take a look at the code and it seems the code which is responsible of this behavior was present long before the session resumption behavior introduction. (see processHandshakeRecord)
     Connection connection = connectionStore.get(peerAddress);
     if (connection != null && connection.getOngoingHandshake() != null) {
         // we are already in an ongoing handshake
         // simply delegate the processing of the record to the handshaker
         flight = connection.getOngoingHandshake().processMessage(record);
     } else { ....
  1. You're right this is a strange use case but this shows that we are not really bullet proof. The good behavoir should be to send a ALERT message to the server to stop the handshake at we can not handle this kind of SERVER_HELLO. Maybe we could catch any Exception raise in DTLSConnector.processHandshakeRecord and do a terminateOngoingHandshake.

  2. Changing from PSK to RPK was not the real use case, I want to check (when I write the test I don't do that intentionally :p, but as it show an issue, I let it). In fact, I just want to test a crash on an ongoing handshake. I made the test without changing the cipher suite and there is another behavior but this doesn't works too.

@jvermillard
Copy link
Contributor

we found those issues testing eclipse-wakaama/wakaama#61
I reproduced several time this issue: some packet loss made scandium session stuck.
The only way to make it reworking was to restart the server.
Also tinydtls is not behaving so well and also suffer issues on some handshake message loss.

Also for restarting an handshake with a new credential type, it's something unusual but we can have real life while migrating device from one security mode to another one: we can be suffering reboot and rollback (for example our devices have automatic upgrade rollback if something wrong happen like multiple reboot or user app freaking out which can happen easly..).

@sbernard31
Copy link
Contributor Author

the commit above fix the 1) issue.

@sophokles73
Copy link
Contributor

@sbernard31 ,

you are right, I remember that I stumbled about this code section several weeks ago, wondering whether this is the right way to handle an incoming Handshake message. I was playing around with it for a while but couldn't get it right at the time being :-( If only I had been a little more persistent ...
Anyways, now we have a good reason to finally get this solved ... I have checked out the handshake_failed branch and will think about it today. I'll keep you posted ...

@sbernard31
Copy link
Contributor Author

On second thought, the 2) issue is maybe not a real one. Because if some body is able to do IP spoofing during handshake, it will be able to abort the current handshake with something else than a CLIENT_HELLO. So this is not so important.

@sbernard31
Copy link
Contributor Author

Ok I propose a fix for this one. I use the random field of CLIENT_HELLO to know if this is retransmission or a new one. (This should fix 2 and 3)

@sophokles73
Copy link
Contributor

I have re-factored the code so that all test cases now succeed (incl. the one added by @sbernard31) to verify that a client start over with a new handshake after crashing ...

Let me split up the changes into multiple commits in order to make them easier to swallow and create another pull request for you to review, ok?

@sbernard31: since this has now become a noteworthy issue/fix, would you mind opening a bugzilla issue for it as well which I can refer to in the commit messages of the PR?

@sbernard31
Copy link
Contributor Author

The bugzilla issue is opened : https://bugs.eclipse.org/bugs/show_bug.cgi?id=483371

@sophokles73
Copy link
Contributor

Thanks, Simon :-)

@sophokles73
Copy link
Contributor

I have created PR #61 containing Simon's unit test and the fix. Please review and test if this works for you ...

@sophokles73
Copy link
Contributor

This has been merged as part of #61

@sophokles73 sophokles73 deleted the hanshake_failed branch January 15, 2016 15:27
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants