Simon edited this page Feb 17, 2023 · 9 revisions

(work in progress)

Since LWM2M v1.1.x, CoAP over TCP has been part of the LWM2M specification. Here we try to summarize our understanding of this topic. Please do not hesitate to open an issue if something seems unclear, incomplete, or even wrong to you.

ℹ️ Note that #1047 is an exploration issue about supporting CoAP over TCP in Leshan.

Why/When should I use CoAP over TCP

CoAP over UDP (or CoAP over DTLS over UDP) is the recommended binding for constrained-node networks, mainly because of its low code footprint and small over-the-wire message size. For those same reasons, you should consider it your primary choice in most use cases.

But there are some reasons that could lead you to accept the costs of CoAP over TCP (larger code size, more round trips, increased RAM requirements, and larger packet sizes).

The known reasons are:

  1. Your network blocks UDP traffic.
  2. You face some NAT traversal issues with UDP.
    Using LWM2M Queue Mode can help with UDP, but the "Always-Reachable" use case can be very difficult to achieve over UDP when NAT is involved.
  3. You plan to transfer large payloads.

Some reasons that are more rarely put forward:

  • Compared to DTLS, TLS is more popular, and it is easier to find a mature library.
  • coap+tcp doesn't use Message IDs (MID), so there is no message rate limit as in CoAP over UDP (see rfc7252§4.5 - Message Deduplication).
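To give an order of magnitude for this limit: in CoAP over UDP, a 16-bit Message ID must not be reused with the same endpoint within EXCHANGE_LIFETIME (defined in RFC 7252 §4.8.2). The sketch below only does that arithmetic with the default transmission parameters of RFC 7252 §4.8; it reflects our reading of the RFC, and deployments that tune these parameters will get different numbers.

```python
# Default CoAP transmission parameters (RFC 7252 §4.8).
ACK_TIMEOUT = 2.0              # seconds
ACK_RANDOM_FACTOR = 1.5
MAX_RETRANSMIT = 4
MAX_LATENCY = 100.0            # seconds
PROCESSING_DELAY = ACK_TIMEOUT  # RFC 7252 uses ACK_TIMEOUT as PROCESSING_DELAY

MID_SPACE = 2 ** 16            # the Message ID is a 16-bit field


def max_transmit_span() -> float:
    """Time from the first transmission of a CON message to its last retransmission."""
    return ACK_TIMEOUT * (2 ** MAX_RETRANSMIT - 1) * ACK_RANDOM_FACTOR


def exchange_lifetime() -> float:
    """How long a Message ID must not be reused with the same endpoint."""
    return max_transmit_span() + 2 * MAX_LATENCY + PROCESSING_DELAY


def max_message_rate() -> float:
    """Upper bound on new messages per second towards one endpoint before MIDs run out."""
    return MID_SPACE / exchange_lifetime()


if __name__ == "__main__":
    print(f"EXCHANGE_LIFETIME = {exchange_lifetime():.0f} s")
    print(f"max rate ~ {max_message_rate():.0f} messages/s")
```

With the defaults, EXCHANGE_LIFETIME is 247 s, which caps a sender at roughly 265 new messages per second towards a single peer. CoAP over TCP carries no Message ID, so this particular cap does not apply.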

Sources:

TCP Connection lifecycle in LWM2M

The LWM2M specification doesn't specify anything about this. The only thing specified is:

"The LwM2M Server expects that the LwM2M Client is reachable via the TCP binding at any time."

(from LWM2M-v1.1.1@core§6.2.1.2. Behavior with Current Transport Binding and Modes)

This sentence mainly concerns the default LWM2M use case and does not apply to the Queue Mode use case.
Note that the Queue Mode chapter of the LWM2M specification is strongly oriented towards CoAP over UDP/DTLS.

Let's try to go deeper into these two use cases (Always-Reachable and Queue Mode).
ℹ️ Note that these are just some thoughts; we don't know what the LWM2M specification authors expect.

Always-Reachable use case

In that case, the client should be reachable for the whole LWM2M registration lifetime. A TCP connection should therefore stay open during the registration lifetime, even through long periods of inactivity.

What is not clear is:

  • How should the client behave if the connection is closed by a remote peer (server/middlebox)? I guess it should try to re-establish the connection as soon as possible, but should it register again?
  • Is it allowed for a server to initiate a TCP connection?
  • How does it work with CoAP Observe knowing that "Responses MUST be returned over the same connection as the originating request" (from rfc8323§4.3-Message Transmission)

How to deal with NAT:

A NAT binding expires after a certain period of inactivity, so if you want to keep your connection always on, you need to send data periodically. TCP is better than UDP in that case because:

According to HomeGateway, the mean for TCP and UDP NAT binding timeouts is 386 minutes (TCP) and 160 seconds (UDP). Shorter timeout values require keepalive messages to be sent more frequently. Hence, the use of CoAP over TCP requires less-frequent transmission of keepalive messages.

(source: RFC8323 - CoAP (Constrained Application Protocol) over TCP, TLS, and WebSockets § 1-Introduction)
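To make "less-frequent" concrete, here is a small calculation based on the two mean timeouts quoted above. The 50% safety margin is a hypothetical choice, and since these are mean values, many NATs will time out sooner, so a real deployment would likely need shorter intervals than this suggests.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Mean NAT binding timeouts quoted in RFC 8323 §1 (HomeGateway figures).
MEAN_TCP_NAT_TIMEOUT_S = 386 * 60
MEAN_UDP_NAT_TIMEOUT_S = 160

# Hypothetical safety margin: send a keepalive once half the timeout has elapsed.
SAFETY_FACTOR = 0.5


def keepalive_interval_s(nat_timeout_s: float, safety_factor: float = SAFETY_FACTOR) -> float:
    """Idle time after which a keepalive should be sent."""
    return nat_timeout_s * safety_factor


def keepalives_per_day(nat_timeout_s: float, safety_factor: float = SAFETY_FACTOR) -> float:
    """Number of keepalives an otherwise idle connection sends per day."""
    return SECONDS_PER_DAY / keepalive_interval_s(nat_timeout_s, safety_factor)


if __name__ == "__main__":
    for name, timeout in (("TCP", MEAN_TCP_NAT_TIMEOUT_S), ("UDP", MEAN_UDP_NAT_TIMEOUT_S)):
        print(f"{name}: every {keepalive_interval_s(timeout):.0f} s, "
              f"~{keepalives_per_day(timeout):.1f} per day")
```

Under these assumptions, an idle TCP connection needs about 7.5 keepalives per day versus 1080 for UDP, roughly 145 times fewer.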

A pure LWM2M solution could be to send a Registration Update request periodically to maintain the NAT binding and refresh the registration lifetime. But a more efficient solution might be to send CoAP keepalive messages:

In CoAP over reliable transports, Empty messages (Code 0.00) can always be sent and MUST be ignored by the recipient. This provides a basic keepalive function.
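As an illustration of how small this keepalive is, here is a sketch of the RFC 8323 §3.2 message framing (a Len/TKL byte, an optional Extended Length field, the Code, the Token, then options and payload). An Empty message with code 0.00, no token and no options is just the two bytes `00 00`. The function names are hypothetical, and options are expected to be pre-encoded by the caller.

```python
import socket
import struct


def encode_tcp_message(code_class: int, code_detail: int, token: bytes = b"", body: bytes = b"") -> bytes:
    """Frame one CoAP message for TCP/TLS as described in RFC 8323 §3.2.

    `body` is the already-encoded options plus payload (payload marker included);
    this sketch does not encode options itself.
    """
    if len(token) > 8:
        raise ValueError("a CoAP token is at most 8 bytes")
    length = len(body)
    # Len nibble: 0-12 inline; 13/14/15 announce a 1/2/4-byte Extended Length field.
    if length < 13:
        len_nibble, extended = length, b""
    elif length < 269:
        len_nibble, extended = 13, struct.pack("!B", length - 13)
    elif length < 65805:
        len_nibble, extended = 14, struct.pack("!H", length - 269)
    else:
        len_nibble, extended = 15, struct.pack("!I", length - 65805)
    code = (code_class << 5) | code_detail
    return bytes([(len_nibble << 4) | len(token)]) + extended + bytes([code]) + token + body


# Empty message, code 0.00: recipients MUST ignore it, so it works as a keepalive.
EMPTY_MESSAGE = encode_tcp_message(0, 0)


def send_keepalive(sock: socket.socket) -> None:
    """Write one Empty message on an established CoAP-over-TCP connection."""
    sock.sendall(EMPTY_MESSAGE)
```

Because the recipient must ignore this message, it never gets an answer: it can refresh NAT bindings, but by itself it does not confirm that the peer is still there. The Ping/Pong signaling messages (7.02/7.03) are the CoAP-level tool for that.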

Queue Mode use case

In that case, the client is not reachable most of the time; it is only "on" sporadically.

Regarding TCP, you have 2 strategies:

  • either you use a long-lived connection (with an application-layer heartbeat if NAT is involved)
  • or you use multiple short-lived connections, one for each awake period.

See RFC9006 - TCP Usage Guidance in the Internet of Things (IoT) § 4.3.TCP Connection Lifetime

What is not clear is:

  • for a long-lived connection, if the server tries to send data while the device is "sleeping", this could lead the server to close the connection unintentionally.
  • multiple short-lived connections seem incompatible with CoAP Observe, knowing that "Responses MUST be returned over the same connection as the originating request" (from rfc8323§4.3-Message Transmission)
  • should the TCP connection state be used to guess whether the device is sleeping or awake?
  • does the CoAP MAX_TRANSMIT_WAIT time-out still make sense for CoAP over TCP?
  • is there a better strategy to let the client know when to go back to sleep? Maybe using Ping/Pong with the Custody Option instead of the MAX_TRANSMIT_WAIT time-out?
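The Ping/Pong idea above could look like the sketch below: a Ping (signaling code 7.02) carrying the Custody Option (option number 2, empty value), and a check that the peer answers with a Pong (7.03) echoing the same token (RFC 8323 §5.4). As we read RFC 8323 §5.4.1, Custody in a Ping asks the peer to delay its Pong until it has finished processing the messages received before the Ping, which is why it might serve as a "nothing more pending" signal before going back to sleep. This is an assumption; the LWM2M specification does not describe it. Function names are hypothetical and the receive logic is deliberately naive.

```python
import socket

# Signaling codes from RFC 8323 §5 (class 7).
PING_CODE = (7 << 5) | 2   # 7.02 Ping
PONG_CODE = (7 << 5) | 3   # 7.03 Pong
CUSTODY_OPTION = 2         # Custody Option number for Ping/Pong

# Size of the Extended Length field for each 4-bit Len value (RFC 8323 §3.2).
_EXT_LEN_SIZE = {13: 1, 14: 2, 15: 4}


def _frame_small(code: int, token: bytes, body: bytes) -> bytes:
    """Frame a message whose options+payload fit in the 4-bit Len field (< 13 bytes)."""
    if len(body) >= 13 or len(token) > 8:
        raise ValueError("sketch only supports short bodies and tokens of at most 8 bytes")
    return bytes([(len(body) << 4) | len(token), code]) + token + body


def encode_ping(token: bytes, custody: bool = True) -> bytes:
    """Build a Ping; an empty option with delta 2 and length 0 encodes as the byte 0x20."""
    body = bytes([CUSTODY_OPTION << 4]) if custody else b""
    return _frame_small(PING_CODE, token, body)


def is_pong_for(data: bytes, token: bytes) -> bool:
    """True if `data` starts with a Pong echoing `token`."""
    if len(data) < 2:
        return False
    len_nibble, tkl = data[0] >> 4, data[0] & 0x0F
    code_index = 1 + _EXT_LEN_SIZE.get(len_nibble, 0)
    token_start = code_index + 1
    if len(data) < token_start + tkl:
        return False
    return data[code_index] == PONG_CODE and data[token_start:token_start + tkl] == token


def check_alive(sock: socket.socket, token: bytes, timeout_s: float) -> bool:
    """Send a Ping and wait up to `timeout_s` for the matching Pong.

    Simplification: assumes the Pong arrives in a single recv() and that no other
    message is interleaved, which a real implementation cannot assume.
    """
    sock.settimeout(timeout_s)
    try:
        sock.sendall(encode_ping(token))
        data = sock.recv(64)
    except OSError:  # covers socket.timeout and connection resets
        return False
    return is_pong_for(data, token)
```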

How to deal with NAT:

This is pretty much like the Always-Reachable use case for long-lived connections: see How to deal with NAT above ☝️.

For short-lived connections, NAT traversal should not be an issue because there is no long inactivity period.

Dealing with half-open connections

An established connection is said to be "half-open" if one of the TCP peers has closed or aborted the connection at its end without the knowledge of the other.
... However, half-open connections are expected to be unusual.

(source: rfc9293 - § 3.5.1 - Half-Open Connections and Other Anomalies)

As long as the remaining peer doesn't send any data, the connection may remain in the half-open state and consume memory resources. Even if this is unusual, most implementations need to deal with it.

Is an LWM2M peer concerned by this?

Half-open connection at server side

The server probably has to deal with a lot of clients. Keeping half-open connections forever could lead to memory issues.

As long as an LWM2M registration exists for a given client, there is probably no need for the server to test whether the connection is still fully open. If a long-lived connection is used in the Queue Mode scenario, testing whether the client is still alive during its "sleep" time could even lead to closing a TCP connection that should not be closed.

On registration expiration, I guess a server could close the corresponding TCP connection, so there is no half-open connection to deal with.
The client should close the connection on de-registration after receiving the server response. If, for some reason, the client doesn't close the connection, the server should probably close it after an arbitrary amount of time.

More generally, an idle TCP connection not linked to an LWM2M registration should probably not be kept alive for a long time.
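The server-side policy described above (leave connections backing a registration alone, close the others after some idle time) could be tracked with something like the following. `IdleConnectionReaper`, its methods, and the 30 s value used in the example are hypothetical; actually closing the sockets is left to the caller. For simplicity, the sketch applies the same grace period after de-registration and after expiration, whereas a server could also close immediately on expiration.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class TrackedConnection:
    last_activity: float                   # clock value of the last observed traffic
    registration_id: Optional[str] = None  # set while an LWM2M registration uses it


class IdleConnectionReaper:
    """Hypothetical helper: selects idle TCP connections not backing a registration."""

    def __init__(self, max_idle_s: float, clock: Callable[[], float] = time.monotonic):
        self._max_idle_s = max_idle_s
        self._clock = clock
        self._connections: Dict[str, TrackedConnection] = {}

    def on_activity(self, conn_id: str) -> None:
        now = self._clock()
        conn = self._connections.get(conn_id)
        if conn is None:
            self._connections[conn_id] = TrackedConnection(last_activity=now)
        else:
            conn.last_activity = now

    def on_registered(self, conn_id: str, registration_id: str) -> None:
        self.on_activity(conn_id)
        self._connections[conn_id].registration_id = registration_id

    def on_registration_removed(self, conn_id: str) -> None:
        # De-registration or expiration: the grace period starts now.
        conn = self._connections.get(conn_id)
        if conn is not None:
            conn.registration_id = None
            conn.last_activity = self._clock()

    def connections_to_close(self) -> List[str]:
        """Return (and forget) connections idle too long without a registration."""
        now = self._clock()
        expired = [
            conn_id
            for conn_id, conn in self._connections.items()
            if conn.registration_id is None and now - conn.last_activity > self._max_idle_s
        ]
        for conn_id in expired:
            del self._connections[conn_id]
        return expired
```

A server would call `connections_to_close()` periodically, for example from a scheduled task, and close the returned connections.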

Half-open connection at client side

The client deals with only a few servers (the multi-server use case is not so common), so memory is not an issue.

The issue with half-open connections on the client side is rather about making sure the client stays reachable by the server. If the connection is closed by the peer (server/middlebox), the client knows that the server can no longer send requests to it, so it should probably try to reconnect later. With a half-open connection, however, the client doesn't know that the server cannot reach it anymore.

To mitigate this, a client could test whether its connection is still OK, probably by sending data on the idle connection. This could be:

  1. LWM2M registration update,
  2. CoAP Ping message,
  3. CoAP Keep-Alive,
  4. Optional TCP Keep-Alive (which is more of a detect-alive feature)
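For the last option, most platforms let you enable TCP keep-alive per socket. Below is a sketch assuming a Linux-like socket API; option availability differs between operating systems (hence the `hasattr()` checks), and the default values are arbitrary examples, not recommendations.

```python
import socket


def enable_tcp_keepalive(sock: socket.socket, idle_s: int = 60, interval_s: int = 10, probes: int = 5) -> None:
    """Turn on TCP keep-alive probes for `sock`.

    idle_s: idle time before the first probe; interval_s: delay between probes;
    probes: unanswered probes before the connection is considered dead.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):        # Linux and others
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    elif hasattr(socket, "TCP_KEEPALIVE"):     # macOS name for the idle time
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPALIVE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
```

With these values, on Linux a silent dead peer should be detected after roughly idle_s + interval_s × probes = 60 + 10 × 5 = 110 seconds without traffic. These probes stay inside the TCP stack, so the LWM2M layer only notices the result as an error on the socket.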

Note that if the server were allowed to initiate a TCP connection, this ☝️ would no longer be an issue, but we currently don't know whether this is an acceptable solution.