Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DTLS Full Handshake failure #964

Closed
francopo opened this issue Jan 27, 2021 · 29 comments
Closed

DTLS Full Handshake failure #964

francopo opened this issue Jan 27, 2021 · 29 comments
Labels
bug Dysfunctionnal behavior client Impact LWM2M client server Impact LWM2M server

Comments

@francopo
Copy link

I'm testing client/server DTLS communication with leshan-server-demo and leshan-client-demo v1.3.0 (californium 2.6.0). Leshan client fails to keep connecting with Leshan server after a period of time due to an error of DTLS Full Handshake initiated by client : FAILED (Received 'fatal alert/HANDSHAKE_FAILURE'). Leshan server restart is required to allow the client to connect again. Please advise on how to solve this issue.

client log

2021-01-27 07:30:21.143 [RegistrationEngine#0] INFO  o.e.l.c.e.DefaultRegistrationEngine - Trying to update registration to coaps://219.x.x.x:5684 (response timeout 120000ms)...
2021-01-27 07:30:21.143 [RegistrationEngine#0] INFO  o.e.l.c.c.CaliforniumEndpointsManager - Clear DTLS session for server coaps://219.x.x.x:5684
2021-01-27 07:30:21.144 [DTLS-Timer-0.0.0.0/0.0.0.0:58070#1] INFO  o.e.l.client.demo.LeshanClientDemo - DTLS Full Handshake initiated by client : STARTED ...
2021-01-27 07:30:21.144 [DTLS-Timer-0.0.0.0/0.0.0.0:58070#1] INFO  o.e.c.scandium.dtls.DTLSFlight - Prepare flight 1, using max. datagram size 1436, max. fragment size 16384 [mhm=false, mr=true]
2021-01-27 07:30:21.144 [DTLS-Timer-0.0.0.0/0.0.0.0:58070#1] INFO  o.e.c.scandium.dtls.DTLSFlight - Effective max. datagram size 1436
2021-01-27 07:30:21.147 [DTLS-Timer-0.0.0.0/0.0.0.0:58070#1] INFO  o.e.c.scandium.dtls.DTLSFlight - Prepare flight 3, using max. datagram size 1436, max. fragment size 16384 [mhm=false, mr=true]
2021-01-27 07:30:21.148 [DTLS-Timer-0.0.0.0/0.0.0.0:58070#1] INFO  o.e.c.scandium.dtls.DTLSFlight - Effective max. datagram size 1436
2021-01-27 07:30:21.154 [DTLS-Timer-0.0.0.0/0.0.0.0:58070#1] INFO  o.e.l.client.demo.LeshanClientDemo - DTLS Full Handshake initiated by client : FAILED (Received 'fatal alert/HANDSHAKE_FAILURE')
2021-01-27 07:30:21.155 [RegistrationEngine#0] INFO  o.e.l.c.e.DefaultRegistrationEngine - Unable to send update request : Received 'fatal alert/HANDSHAKE_FAILURE'
2021-01-27 07:30:21.155 [RegistrationEngine#0] INFO  o.e.l.c.e.DefaultRegistrationEngine - Trying to register to coaps://219.x.x.x:5684 ...
2021-01-27 07:30:21.156 [DTLS-Timer-0.0.0.0/0.0.0.0:58070#1] INFO  o.e.l.client.demo.LeshanClientDemo - DTLS Full Handshake initiated by client : STARTED ...
2021-01-27 07:30:21.156 [DTLS-Timer-0.0.0.0/0.0.0.0:58070#1] INFO  o.e.c.scandium.dtls.DTLSFlight - Prepare flight 1, using max. datagram size 1436, max. fragment size 16384 [mhm=false, mr=true]
2021-01-27 07:30:21.156 [DTLS-Timer-0.0.0.0/0.0.0.0:58070#1] INFO  o.e.c.scandium.dtls.DTLSFlight - Effective max. datagram size 1436
2021-01-27 07:30:21.160 [DTLS-Timer-0.0.0.0/0.0.0.0:58070#1] INFO  o.e.c.scandium.dtls.DTLSFlight - Prepare flight 3, using max. datagram size 1436, max. fragment size 16384 [mhm=false, mr=true]
2021-01-27 07:30:21.160 [DTLS-Timer-0.0.0.0/0.0.0.0:58070#1] INFO  o.e.c.scandium.dtls.DTLSFlight - Effective max. datagram size 1436
2021-01-27 07:30:21.165 [DTLS-Timer-0.0.0.0/0.0.0.0:58070#1] INFO  o.e.l.client.demo.LeshanClientDemo - DTLS Full Handshake initiated by client : FAILED (Received 'fatal alert/HANDSHAKE_FAILURE')
2021-01-27 07:30:21.165 [RegistrationEngine#0] INFO  o.e.l.c.e.DefaultRegistrationEngine - Unable to send register request : Received 'fatal alert/HANDSHAKE_FAILURE'
2021-01-27 07:30:21.165 [RegistrationEngine#0] INFO  o.e.l.c.e.DefaultRegistrationEngine - Try to register to coaps://219.x.x.x:5684 again in 600s...
2021-01-27 07:30:31.125 [DTLS-Timer-0.0.0.0/0.0.0.0:58070#1] INFO  o.e.l.client.demo.LeshanClientDemo - DTLS Full Handshake initiated by server : STARTED ...
2021-01-27 07:30:31.133 [DTLS-Timer-0.0.0.0/0.0.0.0:58070#1] WARN  o.e.c.scandium.dtls.ServerHandshaker - Cannot process handshake message from peer [/219.x.x.x:5684] due to [null]
java.lang.NullPointerException: null
	at org.eclipse.californium.scandium.dtls.CertificateRequest.addCerticiateAuthorities(CertificateRequest.java:421)
	at org.eclipse.californium.scandium.dtls.ServerHandshaker.createCertificateRequest(ServerHandshaker.java:576)
	at org.eclipse.californium.scandium.dtls.ServerHandshaker.receivedClientHello(ServerHandshaker.java:449)
	at org.eclipse.californium.scandium.dtls.ServerHandshaker.doProcessMessage(ServerHandshaker.java:225)
	at org.eclipse.californium.scandium.dtls.Handshaker.processNextHandshakeMessages(Handshaker.java:854)
	at org.eclipse.californium.scandium.dtls.Handshaker.processNextMessages(Handshaker.java:702)
	at org.eclipse.californium.scandium.dtls.Handshaker.processMessage(Handshaker.java:660)
	at org.eclipse.californium.scandium.DTLSConnector.startNewHandshake(DTLSConnector.java:2098)
	at org.eclipse.californium.scandium.DTLSConnector.processClientHello(DTLSConnector.java:1989)
	at org.eclipse.californium.scandium.DTLSConnector.access$1200(DTLSConnector.java:231)
	at org.eclipse.californium.scandium.DTLSConnector$14.run(DTLSConnector.java:1929)
	at org.eclipse.californium.elements.util.SerialExecutor$1.run(SerialExecutor.java:289)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)

server log

2021-01-27 07:29:28.137 [DTLS-Worker-0.0.0.0/0.0.0.0:5684#20] INFO  o.e.c.scandium.dtls.DTLSFlight - Effective max. datagram size 1436
2021-01-27 07:29:28.141 [CoapServer(main)#4] INFO  o.e.l.server.demo.LeshanServerDemo - Accepted registration from endpoint payphonekioskPg001 at /218.x.x.x:58070
2021-01-27 07:30:31.116 [DTLS-Worker-0.0.0.0/0.0.0.0:5684#21] INFO  o.e.c.scandium.dtls.DTLSFlight - Prepare flight 1, using max. datagram size 1436, max. fragment size 16384 [mhm=false, mr=true]
2021-01-27 07:30:31.116 [DTLS-Worker-0.0.0.0/0.0.0.0:5684#21] INFO  o.e.c.scandium.dtls.DTLSFlight - Effective max. datagram size 1436
2021-01-27 07:30:31.122 [DTLS-Worker-0.0.0.0/0.0.0.0:5684#16] INFO  o.e.c.scandium.dtls.DTLSFlight - Prepare flight 3, using max. datagram size 1436, max. fragment size 16384 [mhm=false, mr=true]
2021-01-27 07:30:31.122 [DTLS-Worker-0.0.0.0/0.0.0.0:5684#16] INFO  o.e.c.scandium.dtls.DTLSFlight - Effective max. datagram size 1436
2021-01-27 07:30:31.138 [qtp1003693033-32] WARN  o.e.l.s.demo.servlet.ClientServlet - Unexpected exception
org.eclipse.leshan.core.request.exception.SendFailedException: Request CON-GET    MID=17697, Token=8C5BA5578EC0D4A7, OptionSet={"Uri-Path":["26241","0"]}, Received 'fatal alert/INTERNAL_ERROR' no payload cannot be sent
        at org.eclipse.leshan.core.californium.CoapSyncRequestObserver.onSendError(CoapSyncRequestObserver.java:106)
        at org.eclipse.californium.core.coap.Message.setSendError(Message.java:1063)
        at org.eclipse.californium.core.coap.Request.setSendError(Request.java:1132)
        at org.eclipse.californium.core.network.CoapEndpoint$SendingCallback.onError(CoapEndpoint.java:1531)
        at org.eclipse.californium.elements.RawData.onError(RawData.java:303)
        at org.eclipse.californium.scandium.DTLSConnector$2.handshakeFailed(DTLSConnector.java:521)
        at org.eclipse.californium.scandium.dtls.Handshaker.handshakeFailed(Handshaker.java:1882)
        at org.eclipse.californium.scandium.DTLSConnector.processAlertRecord(DTLSConnector.java:1732)
        at org.eclipse.californium.scandium.DTLSConnector.processRecord(DTLSConnector.java:1503)
        at org.eclipse.californium.scandium.DTLSConnector$12.run(DTLSConnector.java:1365)
        at org.eclipse.californium.elements.util.SerialExecutor$1.run(SerialExecutor.java:289)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.eclipse.californium.scandium.dtls.HandshakeException: Received 'fatal alert/INTERNAL_ERROR'
        at org.eclipse.californium.scandium.DTLSConnector.processAlertRecord(DTLSConnector.java:1722)
        ... 6 common frames omitted
2021-01-27 07:31:31.111 [DTLS-Worker-0.0.0.0/0.0.0.0:5684#23] INFO  o.e.c.scandium.dtls.DTLSFlight - Prepare flight 1, using max. datagram size 1436, max. fragment size 16384 [mhm=false, mr=true]
2021-01-27 07:31:31.111 [DTLS-Worker-0.0.0.0/0.0.0.0:5684#23] INFO  o.e.c.scandium.dtls.DTLSFlight - Effective max. datagram size 1436
2021-01-27 07:31:31.114 [DTLS-Worker-0.0.0.0/0.0.0.0:5684#14] INFO  o.e.c.scandium.dtls.DTLSFlight - Prepare flight 3, using max. datagram size 1436, max. fragment size 16384 [mhm=false, mr=true]
2021-01-27 07:31:31.114 [DTLS-Worker-0.0.0.0/0.0.0.0:5684#14] INFO  o.e.c.scandium.dtls.DTLSFlight - Effective max. datagram size 1436
@sbernard31 sbernard31 added the question Any question about leshan label Jan 27, 2021
@sbernard31
Copy link
Contributor

sbernard31 commented Jan 27, 2021

If we look at client side :

C1) It seems for some unknown reason that the Registration Update timeout ?
This is a misinterpretation, this is just the log saying that the device try to do a registration update
==> Trying to update registration to coaps://219.x.x.x:5684 (response timeout 120000ms)...

C2) So client consider that maybe DTLS connection is lost at server side and clear its DTLS state
-f -r is used at client side, so for each update we do a new FULL handshake.
==> Clear DTLS session for server coaps://219.x.x.x:5684

C3) Then try to establish a new DTLS connection.
==> DTLS Full Handshake initiated by client : STARTED

C4) without success because receiving a FATAL ALERT
==> DTLS Full Handshake initiated by client : FAILED (Received 'fatal alert/HANDSHAKE_FAILURE')

C5) server initiate an handshake and this generate an NPE :
==> DTLS Full Handshake initiated by server : STARTED ...
==> Cannot process handshake message from peer [/219.x.x.x:5684] due to [null] java.lang.NullPointerException: null

About C1) I would understand why registration update timeout ? are you behind a NAT ? C1 is normal behavior.
C2 and C3 are normal behavior.
About C4) This is pretty strange because at server side we see nothing about this handshake failure, I feel you log is not complete or ?
I will investigate C5)

If we look at server side :

S1) Try to send a request to a client but it has no connection ...
S2) The server initiated handshake failed because of C5) NPE.

About S1) I can not explain why connection is lost.
About S2) this sounds "normal" regarding the issue at C5.

So many missing puzzle piece to understand what happened.
We need more complete logs or more context to better understand.
I think we need first focus on explaining C1 and maybe also try to get some logs at server side about C4).

@francopo
Copy link
Author

francopo commented Jan 27, 2021

If we look at client side :

C1) It seems for some unknown reason that the Registration Update timeout ?
==> Trying to update registration to coaps://219.x.x.x:5684 (response timeout 120000ms)...

C2) So client consider that maybe DTLS connection is lost at server side and clear its DTLS state
==> Clear DTLS session for server coaps://219.x.x.x:5684

C3) Then try to establish a new DTLS connection.
==> DTLS Full Handshake initiated by client : STARTED

C4) without success because receiving a FATAL ALERT
==> DTLS Full Handshake initiated by client : FAILED (Received 'fatal alert/HANDSHAKE_FAILURE')

C5) server initiate an handshake and this generate an NPE :
==> DTLS Full Handshake initiated by server : STARTED ...
==> Cannot process handshake message from peer [/219.x.x.x:5684] due to [null] java.lang.NullPointerException: null

About C1) I would understand why registration update timeout ? are you behind a NAT ?

Those Trying to update registration messages look normal. Registration update schedules to run in an interval of 53s.
Yeah, the client is behind a NAT.
Other than this, I did add -r -f into the command line arguments when starting my modified version of leshan-client-demo to force full rehandshake. And the authorizer in leshan-server-demo was customized to trust all X509 secured clients and bypass common name checking.

client log

2021-01-27 07:28:35.064 [RegistrationEngine#0] INFO  o.e.l.c.e.DefaultRegistrationEngine - Trying to update registration to coaps://219.x.x.x:5684 (response timeout 120000ms)...
2021-01-27 07:28:35.064 [RegistrationEngine#0] INFO  o.e.l.c.c.CaliforniumEndpointsManager - Clear DTLS session for server coaps://219.x.x.x:5684
2021-01-27 07:28:35.065 [DTLS-Timer-0.0.0.0/0.0.0.0:58070#1] INFO  o.e.l.client.demo.LeshanClientDemo - DTLS Full Handshake initiated by client : STARTED ...
2021-01-27 07:28:35.065 [DTLS-Timer-0.0.0.0/0.0.0.0:58070#1] INFO  o.e.c.scandium.dtls.DTLSFlight - Prepare flight 1, using max. datagram size 1436, max. fragment size 16384 [mhm=false, mr=true]
2021-01-27 07:28:35.065 [DTLS-Timer-0.0.0.0/0.0.0.0:58070#1] INFO  o.e.c.scandium.dtls.DTLSFlight - Effective max. datagram size 1436
2021-01-27 07:28:35.069 [DTLS-Timer-0.0.0.0/0.0.0.0:58070#1] INFO  o.e.c.scandium.dtls.DTLSFlight - Prepare flight 3, using max. datagram size 1436, max. fragment size 16384 [mhm=false, mr=true]
2021-01-27 07:28:35.069 [DTLS-Timer-0.0.0.0/0.0.0.0:58070#1] INFO  o.e.c.scandium.dtls.DTLSFlight - Effective max. datagram size 1436
2021-01-27 07:28:35.075 [DTLS-Timer-0.0.0.0/0.0.0.0:58070#1] INFO  o.e.c.scandium.dtls.ClientHandshaker - Start certificate verification.
2021-01-27 07:28:35.075 [DTLS-Timer-0.0.0.0/0.0.0.0:58070#1] INFO  o.e.c.scandium.dtls.ClientHandshaker - Process result of certificate verification.
2021-01-27 07:28:35.090 [DTLS-Timer-0.0.0.0/0.0.0.0:58070#1] INFO  o.e.c.scandium.dtls.DTLSFlight - Prepare flight 5, using max. datagram size 1436, max. fragment size 16384 [mhm=false, mr=true]
2021-01-27 07:28:35.090 [DTLS-Timer-0.0.0.0/0.0.0.0:58070#1] INFO  o.e.c.scandium.dtls.DTLSFlight - Effective max. datagram size 1436
2021-01-27 07:28:35.099 [DTLS-Timer-0.0.0.0/0.0.0.0:58070#1] INFO  o.e.l.client.demo.LeshanClientDemo - DTLS Full Handshake initiated by client : SUCCEED
2021-01-27 07:28:35.103 [RegistrationEngine#0] INFO  o.e.l.c.e.DefaultRegistrationEngine - Registration update succeed.
2021-01-27 07:28:35.103 [RegistrationEngine#0] INFO  o.e.l.c.e.DefaultRegistrationEngine - Next registration update to coaps://219.x.x.x:5684 in 53s...

2021-01-27 07:29:28.103 [RegistrationEngine#0] INFO  o.e.l.c.e.DefaultRegistrationEngine - Trying to update registration to coaps://219.x.x.x:5684 (response timeout 120000ms)...
2021-01-27 07:29:28.103 [RegistrationEngine#0] INFO  o.e.l.c.c.CaliforniumEndpointsManager - Clear DTLS session for server coaps://219.x.x.x:5684
2021-01-27 07:29:28.104 [DTLS-Timer-0.0.0.0/0.0.0.0:58070#1] INFO  o.e.l.client.demo.LeshanClientDemo - DTLS Full Handshake initiated by client : STARTED ...
2021-01-27 07:29:28.104 [DTLS-Timer-0.0.0.0/0.0.0.0:58070#1] INFO  o.e.c.scandium.dtls.DTLSFlight - Prepare flight 1, using max. datagram size 1436, max. fragment size 16384 [mhm=false, mr=true]
2021-01-27 07:29:28.104 [DTLS-Timer-0.0.0.0/0.0.0.0:58070#1] INFO  o.e.c.scandium.dtls.DTLSFlight - Effective max. datagram size 1436
2021-01-27 07:29:28.107 [DTLS-Timer-0.0.0.0/0.0.0.0:58070#1] INFO  o.e.c.scandium.dtls.DTLSFlight - Prepare flight 3, using max. datagram size 1436, max. fragment size 16384 [mhm=false, mr=true]
2021-01-27 07:29:28.107 [DTLS-Timer-0.0.0.0/0.0.0.0:58070#1] INFO  o.e.c.scandium.dtls.DTLSFlight - Effective max. datagram size 1436
2021-01-27 07:29:28.113 [DTLS-Timer-0.0.0.0/0.0.0.0:58070#1] INFO  o.e.c.scandium.dtls.ClientHandshaker - Start certificate verification.
2021-01-27 07:29:28.114 [DTLS-Timer-0.0.0.0/0.0.0.0:58070#1] INFO  o.e.c.scandium.dtls.ClientHandshaker - Process result of certificate verification.
2021-01-27 07:29:28.130 [DTLS-Timer-0.0.0.0/0.0.0.0:58070#1] INFO  o.e.c.scandium.dtls.DTLSFlight - Prepare flight 5, using max. datagram size 1436, max. fragment size 16384 [mhm=false, mr=true]
2021-01-27 07:29:28.130 [DTLS-Timer-0.0.0.0/0.0.0.0:58070#1] INFO  o.e.c.scandium.dtls.DTLSFlight - Effective max. datagram size 1436
2021-01-27 07:29:28.139 [DTLS-Timer-0.0.0.0/0.0.0.0:58070#1] INFO  o.e.l.client.demo.LeshanClientDemo - DTLS Full Handshake initiated by client : SUCCEED
2021-01-27 07:29:28.143 [RegistrationEngine#0] INFO  o.e.l.c.e.DefaultRegistrationEngine - Registration update succeed.
2021-01-27 07:29:28.143 [RegistrationEngine#0] INFO  o.e.l.c.e.DefaultRegistrationEngine - Next registration update to coaps://219.x.x.x:5684 in 53s...

2021-01-27 07:30:21.143 [RegistrationEngine#0] INFO  o.e.l.c.e.DefaultRegistrationEngine - Trying to update registration to coaps://219.x.x.x:5684 (response timeout 120000ms)...

customized authorizer in server

        builder.setAuthorizer(new Authorizer() {
            @Override
            public Registration isAuthorized(UplinkRequest<?> uplinkRequest, Registration registration,
                Identity identity) {
                if (identity.isSecure() && identity.isX509()) {
                    LOG.info("Accepted registration from endpoint {} at {}", registration.getEndpoint(), identity.getPeerAddress().toString());
                    return registration;
                } else {
                    LOG.info("Declined registration from endpoint {} at {}", registration.getEndpoint(), identity.getPeerAddress().toString());
                    return null;
                }
            }
        });

C2 and C3 are normal behavior.
About C4) This is pretty strange because at server side we see nothing about this handshake failure, I feel you log is not complete or ?

Not much information from the INFO log can explain C4).

I developed another tiny program to poll for the device shadow data from HTTP API endpoints exposed in leshan-server-demo periodically. Would this trigger a HandshakeException if the connection is lost?

BTW, will enable DEBUG log in server to capture more useful information.

I will investigate C5)

If we look at server side :

S1) Try to send a request to a client but it has no connection ...
S2) The server intiated handshake failed because of C5) NPE.

About S1) I can not explain why connection is lost.

There should be no network disconnection at that moment. Can the client log above prove this?

About S2) this sounds "normal" regarding the issue at C5.

So many missing puzzle piece to understand what happened.
We need more complete logs or more context to better understand.
I think we need first focus on explaining C1 and maybe also try to get some logs at server side about C4).

@sbernard31
Copy link
Contributor

If you are behind a NAT this could explain C1)
Using -r should avoid the timeout.

About C5 this is a bug, I will fix it in 1.x branch soon.

@francopo
Copy link
Author

@sbernard31 Attached please find the log files for your investigation. Is it likely the fix for C5 in server side can make client and server reconnect again without server restart? 🙇

client_log.txt
server_log.txt

@sbernard31
Copy link
Contributor

sbernard31 commented Jan 28, 2021

This logs at server side could explain C4)

2021-01-28 12:12:20.157 [DTLS-Worker-0.0.0.0/0.0.0.0:5684#14] DEBUG o.e.c.scandium.dtls.ServerHandshaker - handshake failed dtls-con: CID=79FFCC21FF68
org.eclipse.californium.scandium.dtls.HandshakeException: Client proposed unsupported cipher suites only
	at org.eclipse.californium.scandium.dtls.ServerHandshaker.negotiateCipherSuite(ServerHandshaker.java:784)
	at org.eclipse.californium.scandium.dtls.ServerHandshaker.createServerHello(ServerHandshaker.java:491)
	at org.eclipse.californium.scandium.dtls.ServerHandshaker.receivedClientHello(ServerHandshaker.java:443)
	at org.eclipse.californium.scandium.dtls.ServerHandshaker.doProcessMessage(ServerHandshaker.java:225)
	at org.eclipse.californium.scandium.dtls.Handshaker.processNextHandshakeMessages(Handshaker.java:854)
	at org.eclipse.californium.scandium.dtls.Handshaker.processNextMessages(Handshaker.java:702)
	at org.eclipse.californium.scandium.dtls.Handshaker.processMessage(Handshaker.java:660)
	at org.eclipse.californium.scandium.DTLSConnector.startNewHandshake(DTLSConnector.java:2098)
	at org.eclipse.californium.scandium.DTLSConnector.processClientHello(DTLSConnector.java:19

But I can not understand how its possible. Now a wireshark capture could help to understand.

authorizer in leshan-server-demo was customized to trust all X509 secured clients and bypass common name checking.

This is OK for a demo, but for production this NOT SAFE AT ALL because if you get credential of 1 device using x509, you can usurp identity all others devices which also use x509.

Is it likely the fix for C5 in server side can make client and server reconnect again without server restart?

It could but not sure. In all case we need to fix/understand the C4 issue too.

@sbernard31
Copy link
Contributor

I pushed a fix for C5 in 1.x : 36f5756
Could you try it ?

@sbernard31
Copy link
Contributor

sbernard31 commented Jan 28, 2021

Other than this, I did add -r -f into the command line arguments when starting my modified version of leshan-client-demo to force full rehandshake.

@francopo did you modify cipher suite to use ?

@francopo
Copy link
Author

'''
2021-01-28 12:11:27.104 [DTLS-Worker-0.0.0.0/0.0.0.0:5684#9] DEBUG o.e.c.scandium.dtls.ServerHandshaker - Negotiated cipher suite [TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256] with peer [/218.103.196.205:59953]
'''

Other than this, I did add -r -f into the command line arguments when starting my modified version of leshan-client-demo to force full rehandshake.

@francopo did you modify cipher suite to use ?

No. Keep using recommended cipher suites.

2021-01-28 12:11:27.104 [DTLS-Worker-0.0.0.0/0.0.0.0:5684#9] DEBUG o.e.c.scandium.dtls.ServerHandshaker - Negotiated cipher suite [TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256] with peer [/218.x.x.x:59953]

@francopo
Copy link
Author

'''
2021-01-28 12:11:27.104 [DTLS-Worker-0.0.0.0/0.0.0.0:5684#9] DEBUG o.e.c.scandium.dtls.ServerHandshaker - Negotiated cipher suite [TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256] with peer [/218.103.196.205:59953]
'''

Other than this, I did add -r -f into the command line arguments when starting my modified version of leshan-client-demo to force full rehandshake.

@francopo did you modify cipher suite to use ?

No. Keep using recommended cipher suites.

2021-01-28 12:11:27.104 [DTLS-Worker-0.0.0.0/0.0.0.0:5684#9] DEBUG o.e.c.scandium.dtls.ServerHandshaker - Negotiated cipher suite [TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256] with peer [/218.x.x.x:59953]

Regarding to the reconnection problem, client was scheduled to update registration but it received 'fatal alert/HANDSHAKE_FAILURE' from server.

2021-01-28 12:22:20.162 [RegistrationEngine#0] INFO  o.e.l.c.e.DefaultRegistrationEngine - Trying to register to coaps://219.76.98.202:5684 ...
2021-01-28 12:22:20.163 [DTLS-Timer-0.0.0.0/0.0.0.0:59953#1] INFO  o.e.l.client.demo.LeshanClientDemo - DTLS Full Handshake initiated by client : STARTED ...
2021-01-28 12:22:20.163 [DTLS-Timer-0.0.0.0/0.0.0.0:59953#1] INFO  o.e.c.scandium.dtls.DTLSFlight - Prepare flight 1, using max. datagram size 1436, max. fragment size 16384 [mhm=false, mr=true]
2021-01-28 12:22:20.163 [DTLS-Timer-0.0.0.0/0.0.0.0:59953#1] INFO  o.e.c.scandium.dtls.DTLSFlight - Effective max. datagram size 1436
2021-01-28 12:22:20.168 [DTLS-Timer-0.0.0.0/0.0.0.0:59953#1] INFO  o.e.c.scandium.dtls.DTLSFlight - Prepare flight 3, using max. datagram size 1436, max. fragment size 16384 [mhm=false, mr=true]
2021-01-28 12:22:20.168 [DTLS-Timer-0.0.0.0/0.0.0.0:59953#1] INFO  o.e.c.scandium.dtls.DTLSFlight - Effective max. datagram size 1436
2021-01-28 12:22:20.173 [DTLS-Timer-0.0.0.0/0.0.0.0:59953#1] INFO  o.e.l.client.demo.LeshanClientDemo - DTLS Full Handshake initiated by client : FAILED (Received 'fatal alert/HANDSHAKE_FAILURE')
2021-01-28 12:22:20.173 [RegistrationEngine#0] INFO  o.e.l.c.e.DefaultRegistrationEngine - Unable to send register request : Received 'fatal alert/HANDSHAKE_FAILURE'
2021-01-28 12:22:20.173 [RegistrationEngine#0] INFO  o.e.l.c.e.DefaultRegistrationEngine - Try to register to coaps://219.76.98.202:5684 again in 600s...

Server aborted the DTLS handshake because client proposed unsupported cipher suites only. It's a bit weird here.

2021-01-28 12:22:20.169 [DTLS-Worker-0.0.0.0/0.0.0.0:5684#12] DEBUG o.e.c.s.dtls.InMemoryConnectionStore - connection: missing connection for /218.103.196.205:59953!
2021-01-28 12:22:20.169 [DTLS-Worker-0.0.0.0/0.0.0.0:5684#12] DEBUG o.e.c.s.dtls.InMemoryConnectionStore - connection: add CID=291ACF112F29 (size 927)
2021-01-28 12:22:20.169 [DTLS-Worker-0.0.0.0/0.0.0.0:5684#12] DEBUG o.e.c.s.dtls.InMemoryConnectionStore - connection: CID=291ACF112F29 - /218.103.196.205:59953 added!
2021-01-28 12:22:20.169 [DTLS-Worker-0.0.0.0/0.0.0.0:5684#12] DEBUG o.e.c.scandium.dtls.ServerHandshaker - Processing CLIENT_HELLO (1) message from peer [/218.103.196.205:59953], seqn: [1]
2021-01-28 12:22:20.169 [DTLS-Worker-0.0.0.0/0.0.0.0:5684#12] DEBUG o.e.c.scandium.dtls.ServerHandshaker - handshake started dtls-con: CID=291ACF112F29, /218.103.196.205:59953, is alive
2021-01-28 12:22:20.169 [DTLS-Worker-0.0.0.0/0.0.0.0:5684#12] DEBUG o.e.c.scandium.dtls.Connection - Handshake with [/218.103.196.205:59953] has been started
2021-01-28 12:22:20.169 [DTLS-Worker-0.0.0.0/0.0.0.0:5684#12] DEBUG o.e.c.scandium.DTLSConnector - Aborting handshake with peer [/218.103.196.205:59953]: Client proposed unsupported cipher suites only
2021-01-28 12:22:20.169 [DTLS-Worker-0.0.0.0/0.0.0.0:5684#12] DEBUG o.e.c.s.dtls.InMemoryConnectionStore - connection: remove dtls-con: CID=291ACF112F29, /218.103.196.205:59953, ongoing handshake E81C9C317D7D (size 926)
2021-01-28 12:22:20.169 [DTLS-Worker-0.0.0.0/0.0.0.0:5684#12] DEBUG o.e.c.scandium.dtls.ServerHandshaker - handshake failed dtls-con: CID=291ACF112F29
org.eclipse.californium.scandium.dtls.HandshakeException: Client proposed unsupported cipher suites only
	at org.eclipse.californium.scandium.dtls.ServerHandshaker.negotiateCipherSuite(ServerHandshaker.java:784)
	at org.eclipse.californium.scandium.dtls.ServerHandshaker.createServerHello(ServerHandshaker.java:491)
	at org.eclipse.californium.scandium.dtls.ServerHandshaker.receivedClientHello(ServerHandshaker.java:443)
	at org.eclipse.californium.scandium.dtls.ServerHandshaker.doProcessMessage(ServerHandshaker.java:225)
	at org.eclipse.californium.scandium.dtls.Handshaker.processNextHandshakeMessages(Handshaker.java:854)
	at org.eclipse.californium.scandium.dtls.Handshaker.processNextMessages(Handshaker.java:702)
	at org.eclipse.californium.scandium.dtls.Handshaker.processMessage(Handshaker.java:660)
	at org.eclipse.californium.scandium.DTLSConnector.startNewHandshake(DTLSConnector.java:2098)
	at org.eclipse.californium.scandium.DTLSConnector.processClientHello(DTLSConnector.java:1989)
	at org.eclipse.californium.scandium.DTLSConnector.access$1200(DTLSConnector.java:231)
	at org.eclipse.californium.scandium.DTLSConnector$14.run(DTLSConnector.java:1929)
	at org.eclipse.californium.elements.util.SerialExecutor$1.run(SerialExecutor.java:289)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)

@sbernard31
Copy link
Contributor

sbernard31 commented Jan 29, 2021

Server aborted the DTLS handshake because client proposed unsupported cipher suites only. It's a bit weird here.

I agree that's why I would like to see a wireshark capture. To be able to check the DTLS handshake.
(see how to provide an IP capture)

I tried to reproduce this on my side without success all if OK for me.
Do you succeed to reproduce this with a "not modified" version of leshan-client-demo?
Did you try the fix above for C5) ?

@boaks did you already face something like this ?

@boaks
Copy link

boaks commented Jan 29, 2021

That issue is somehow confusing, I'm not sure, if I understand it.

The first comment contains a "client.log" which has a ServerHandshakerin the stacktrace. That looks like an role exchange. The leshan-client seems to chose a ECDSA cipher suite, but has no (role-exchange) sever-certificate. I'm not sure, what exactly caused that. Maybe, you can provide more details about the client's setup.

The last comment:

Client proposed unsupported cipher suites only

The current Californium master (3.0.0-SNAPSHOT) contains already:

Client proposed unsupported cipher suites or parameters only.

It's not only the cipher suite, it's also the other parameters.

As @sbernard31 already wrote, wireshark captuers helps here a lot. (See Logs and IP-Capturing )

@boaks
Copy link

boaks commented Jan 29, 2021

java.lang.NullPointerException: null
	at org.eclipse.californium.scandium.dtls.CertificateRequest.addCerticiateAuthorities(CertificateRequest.java:421)
	at org.eclipse.californium.scandium.dtls.ServerHandshaker.createCertificateRequest(ServerHandshaker.java:576)
	at org.eclipse.californium.scandium.dtls.ServerHandshaker.receivedClientHello(ServerHandshaker.java:449)

That seems to be caused by returning null from getAcceptedIssuers() of the certificate verifier.

	/**
	 * Return an array of certificate authority certificates which are trusted
	 * for authenticating peers.
	 * 
	 * The javadoc of previous versions (2.1.0 and before) permits to use
	 * {@code null}. This causes a failure, please adapt to use an empty array.
	 * 
	 * @return a non-null (possibly empty) array of acceptable CA issuer
	 *         certificates.
	 */
	X509Certificate[] getAcceptedIssuers();

@sbernard31
Copy link
Contributor

sbernard31 commented Jan 29, 2021

That seems to be caused by returning null from getAcceptedIssuers() of the certificate verifier.

@boaks this is already fixed. see #964 (comment)

The leshan-client seems to chose a ECDSA cipher suite, but has no (role-exchange) sever-certificate.

I'm not sure to get your point, is it related to the fixed issue above ☝️ ?

That issue is somehow confusing, I'm not sure, if I understand it.

#964 (comment) is a description of my understanding.

Currently, we try to understand the issue before the role exchange. Issue related to Client proposed unsupported cipher suites only
This is surprising because a first handshake probably succeed to do the register request but another one failed for update request with this error at server side ? 🤔

@boaks
Copy link

boaks commented Jan 29, 2021

I didn't read the whole thread too carefully ... I just jumped to the first and the last.
If the null is fixed, great.

Which detail caused the handshake in the last example fails, is for me unclear and requires either a ip-capture or extended logging messages.

@sbernard31
Copy link
Contributor

I did some test on my side and I not able to reproduce :/

2021-01-29 11:00:38,158 INFO LeshanClient - Starting Leshan client ...
2021-01-29 11:00:38,388 INFO CaliforniumEndpointsManager - New endpoint created for server coaps://localhost:5684 at coaps://0.0.0.0:45301
2021-01-29 11:00:38,390 INFO LeshanClient - Leshan client[endpoint:client-x509] started.
2021-01-29 11:00:38,395 INFO DefaultRegistrationEngine - Trying to register to coaps://localhost:5684 ...
2021-01-29 11:00:38,450 INFO LeshanClientDemo - DTLS Full Handshake initiated by client : STARTED ...
2021-01-29 11:00:38,529 INFO LeshanClientDemo - DTLS Full Handshake initiated by client : SUCCEED
2021-01-29 11:00:38,557 INFO DefaultRegistrationEngine - Registered with location '/rd/L1gaFAMVsP'.
2021-01-29 11:00:38,557 INFO DefaultRegistrationEngine - Next registration update to coaps://localhost:5684 in 53s...
2021-01-29 11:01:31,559 INFO DefaultRegistrationEngine - Trying to update registration to coaps://localhost:5684 (response timeout 120000ms)...
2021-01-29 11:01:31,561 INFO CaliforniumEndpointsManager - Clear DTLS session for server coaps://localhost:5684
2021-01-29 11:01:31,566 INFO LeshanClientDemo - DTLS Full Handshake initiated by client : STARTED ...
2021-01-29 11:01:31,610 INFO LeshanClientDemo - DTLS Full Handshake initiated by client : SUCCEED
2021-01-29 11:01:31,616 INFO DefaultRegistrationEngine - Registration update succeed.
2021-01-29 11:01:31,617 INFO DefaultRegistrationEngine - Next registration update to coaps://localhost:5684 in 53s...
======> Connection lost at server side, which will trigger a role exchange to be able to read a resource  <=====
2021-01-29 11:01:44,900 INFO LeshanClientDemo - DTLS Full Handshake initiated by server : STARTED ...
2021-01-29 11:01:45,027 INFO LeshanClientDemo - DTLS Full Handshake initiated by server : SUCCEED
2021-01-29 11:01:45,034 INFO MyDevice - Read on Device resource /3/0/0
2021-01-29 11:02:24,618 INFO DefaultRegistrationEngine - Trying to update registration to coaps://localhost:5684 (response timeout 120000ms)...
2021-01-29 11:02:24,619 INFO CaliforniumEndpointsManager - Clear DTLS session for server coaps://localhost:5684
2021-01-29 11:02:24,622 INFO LeshanClientDemo - DTLS Full Handshake initiated by client : STARTED ...
2021-01-29 11:02:24,677 INFO LeshanClientDemo - DTLS Full Handshake initiated by client : SUCCEED
2021-01-29 11:02:24,705 INFO DefaultRegistrationEngine - Registration update succeed.

@sbernard31
Copy link
Contributor

Thx @boaks, this was mainly in case you face something like this in your recent tests.
Let's see if the IP capture helps to understand this.
@francopo maybe you can activate log in TRACE for :

  • org.eclipse.californium.scandium (or org.eclipse.californium)
  • org.eclipse.leshan

@boaks
Copy link

boaks commented Jan 29, 2021

No, I wasn't faced by such issues. It depends heavily on the exact dtls-configuration, so I'm not sure, what caused it. For the future, I add a detailed logging for the client and server hello in that case (for Californium 3.0).

@francopo
Copy link
Author

Thx @boaks, this was mainly in case you face something like this in your recent tests.
Let's see if the IP capture helps to understand this.
@francopo maybe you can activate log in TRACE for :

  • org.eclipse.californium.scandium (or org.eclipse.californium)
  • org.eclipse.leshan

@sbernard31, 36f575 can fix the NPE. 👏

Please download issue-964-20210131.zip, which contains client (unmodified version) TRACE log and wireshark capture for these 2 handshake events.

  1. Abnormal DTLS handshake @ 21:48:58

    • A new client started connecting with the server.
    • Client did answer HELLO_VERIFY_REQUEST with a CLIENT_HELLO using expected set of cipher suites. Server then aborted the handshake by sending a Alert (Level: Fatal, Description: Handshake Failure).
      abnormal_client_wireshark
  2. Normal DTLS handshake @ 22:32:40

    • Since no client can't complete successful handshake with the server, leshan server restart is required.

Please check if the information inside can help.

@boaks
Copy link

boaks commented Feb 1, 2021

I checked the wireshark captures, and I can see a difference in the client_hello's.

What makes me (just) wonder, are the used x509 certificates. They look strange to me.
I hope, I can spend some time into the "cipher suite negotiation" failure message this week, but I have also other tasks.

My "wild guess", your x509 do not only looks strange, it may make the lwm2m-server also behave strange. But, that's just a wild guess,

@boaks
Copy link

boaks commented Feb 1, 2021

If you want, you may also add your own logging lines in

DefaultCipherSuiteSelector

to see more details about the negotiation.

@boaks
Copy link

boaks commented Feb 1, 2021

Not strictly related to this issue:
The x509 certificate uses "ecdsa-with-SHA1" as signature algorithm.
Please check the IETF TLS mailing lists. AFAIK, this is going to be "deprecated".
If possible, replace it by "ecdsa-with-SHA256".

@boaks
Copy link

boaks commented Feb 1, 2021

DefaultCipherSuiteSelector.checkCertificateSupport

stacks on false, once failed ;-(.

I prepare a fix.

@sbernard31
FMPOV, with that fix, I would release Californium 2.6.1. WDYT?

@boaks
Copy link

boaks commented Feb 1, 2021

See Californium, PR 1528, Move certificate supported check to local field. If possible, please cherry-pick it and test it, if it fixes this issue.

@sbernard31
Copy link
Contributor

FMPOV, with that fix, I would release Californium 2.6.1. WDYT?

It sounds good. I will also probably release a 1.3.1 with it and the Leshan fixed above.
But we should wait @francopo feedback before :)

@francopo
Copy link
Author

francopo commented Feb 2, 2021

See Californium, PR 1528, Move certificate supported check to local field. If possible, please cherry-pick it and test it, if it fixes this issue.

@boaks , it works. No more DTLS Full Handshake initiated by client : FAILED (Received 'fatal alert/HANDSHAKE_FAILURE') now. Thank you very much.

@boaks
Copy link

boaks commented Feb 2, 2021

Great!

Thank you also for reporting that nasty bug!

@sbernard31 sbernard31 added bug Dysfunctionnal behavior client Impact LWM2M client server Impact LWM2M server and removed question Any question about leshan labels Feb 2, 2021
@sbernard31
Copy link
Contributor

@francopo thx you for taking time to reporting this and help us to fix it. 🙏 !

@sbernard31
Copy link
Contributor

The NPE fix and integration of californium 2.6.1 (which fix the bug above) are now available in master and 1.x branch.
A release 1.3.1 will be available soon.

@sbernard31
Copy link
Contributor

@francopo : release 1.3.1 is out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Dysfunctionnal behavior client Impact LWM2M client server Impact LWM2M server
Projects
None yet
Development

No branches or pull requests

3 participants