Add configuration to disable reuse of tokens on blockwise transfer. #2088

Merged
merged 1 commit into from Nov 21, 2022

Conversation

boaks
Contributor

@boaks boaks commented Nov 17, 2022

Improve protection from delay attacks, if no other means, maybe on application level, are available.

Signed-off-by: Achim Kraus achim.kraus@cloudcoap.net

@boaks
Contributor Author

boaks commented Nov 17, 2022

This PR is changing the behavior of Californium within RFC7959.

Californium was using the same token for blockwise transfers in order to ease traceability.

attacks-on-coap shows the downside of this. Even if for ACKs (piggy-backed-responses) Californium also uses the MID, using different tokens is the most common practice.

If someone detects trouble with it, it's possible to switch it back using

"COAP.BLOCKWISE_REUSE_TOKEN=true"

@SeppoTakalo

I am not a CoAP expert, but I think this change has introduced problems without fixing the potential replay attack.

In summary:

I claim that this change broke blockwise transfers that might require you to maintain state. I think this would be the case if the targeted resource might change between queries. If the token is not kept the same, you cannot ensure the integrity of the content, as you cannot distinguish between queries. The ETag option would ensure that you have received the same representation of the content, but you cannot really use it in the GET query, as it is generated by the server.

Reasoning

I believe this change is meant to fix potential Response Delay and Mismatch Attack described in Attacks on CoAP page.

However, I think the approach to fix the issue is wrong. As explained in the link above, the attack requires that the attacker can guess the reuse of the token in order to replay. To fix the issue, you should use a better random source to generate tokens, which I believe is not an issue here. But this change did not do anything about token generation. So if there was an issue, it is still there. (I'm not suggesting there was a problem.)

In a blockwise transfer, if the server is implemented in a stateless manner, this approach works. However, each query for the next block now has no relationship to the previous query. They are completely independent. So even if the server is implemented to maintain state, it cannot work anymore. I'll explain why this is a problem in a bit.

In RFC7959: Section 3.4 there are a few examples of the usage of tokens. If you follow the example after the "Retrieval of remaining blocks" note in the first sequence diagram, you notice that the GET queries for the remaining blocks keep using the same token. It is not very clear in the RFC, but my assumption here is that as long as you are requesting pieces of the same block, you should retain the same token.

If you use the same token for the whole blockwise transmission, it does not expose you to the replay attack, because that token exists only once. You only query one block once using the same token. Then the next block uses the same token, but a different block number. If your token algorithm works correctly, you are never going to use the same token when requesting the same block of the same resource. If you do replay a packet during the blockwise transaction, the requesting client will just treat that block as a duplicate.

I have an example in mind of why this token rotation is problematic during blockwise transfers. Let's imagine that we have a resource called "/camera". Every time you do a GET request for it, it gives you a still image. But every request gives a new still image from a live feed. So how can you do a blockwise transfer of an image if you cannot tell which requests belong together? Every block would be a block from a new image, not from the one that started the transaction. So this token rotation only works with stateless servers. It breaks things if there is state involved.
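The "/camera" scenario can be sketched as a toy model. This is not Californium or Leshan code: `Camera`, `stateless_get_block`, `SnapshotServer` and the 16-byte block size are names and values made up for illustration. The sketch only shows that a server which regenerates the representation on every block request hands out blocks from different images, while a server that keeps the payload created for block 0 does not. How a real server decides which stored payload a request belongs to is exactly what the rest of this thread debates.

```python
import itertools

BLOCK_SIZE = 16  # hypothetical block size, corresponds to SZX=0 in RFC 7959


class Camera:
    """Toy resource (assumption): every capture yields a different 'image'."""

    def __init__(self):
        self._frames = itertools.count(1)

    def capture(self) -> bytes:
        n = next(self._frames)
        return (f"frame{n:02d}-" * 6).encode()  # 48 bytes, i.e. 3 blocks


def stateless_get_block(camera: Camera, num: int) -> bytes:
    # Stateless server: the payload is regenerated for every block request.
    payload = camera.capture()
    return payload[num * BLOCK_SIZE:(num + 1) * BLOCK_SIZE]


class SnapshotServer:
    """Stateful server: keeps the payload created when block 0 was requested."""

    def __init__(self, camera: Camera):
        self._camera = camera
        self._snapshot = b""

    def get_block(self, num: int) -> bytes:
        if num == 0:
            self._snapshot = self._camera.capture()
        return self._snapshot[num * BLOCK_SIZE:(num + 1) * BLOCK_SIZE]


def download(get_block, blocks: int = 3) -> bytes:
    # Client side: request block 0, 1, 2 in order and concatenate them.
    return b"".join(get_block(num) for num in range(blocks))


stateless_camera = Camera()
mixed = download(lambda num: stateless_get_block(stateless_camera, num))
consistent = download(SnapshotServer(Camera()).get_block)
```

Here `mixed` contains pieces of three different frames, whereas `consistent` is a single frame.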

The problem I'm facing is LwM2M related. I'm using a Leshan server, which uses this library, and after an update things started to break. In LwM2M, you can send GET requests that might result in more than one resource being packed into a payload. If, for example, TLV or SenML JSON/CBOR is used, the payload might contain lots of resources. This means that when the first request arrives, the payload is formed and kept in memory until all of it is sent. Now if we do this statelessly, it means that on every request we form a new payload, split it, and send one block of it. How can you ensure that all the blocks are from the same payload? For example, if one of the resources is a "timestamp" that changes value every second (or ms), then every time you form a payload, it will be different from last time.

Last problem, which clearly broke Leshan, is that after this change, LwM2M Composite-Read does not work anymore. In Composite-Read, the payload of the GET request contains a list of resources that should be put into the response payload. In the next query, Leshan is not sending the payload anymore, just the indicator to get Block N=1. But if your token changes, there is no indication of where your request should be targeted.
Here is the example of the first request:
Screenshot from 2023-08-25 13-32-30
Now here is the second request:
Screenshot from 2023-08-25 13-33-32
As you can see, there is no payload, and the token is different. How am I supposed to tell that this request is supposed to continue the previous request?

@boaks
Contributor Author

boaks commented Aug 25, 2023

Maybe just as very first information:

  • this PR makes the reuse of the token in a blockwise transfer configurable. You may configure the old behavior or the new one, see BLOCKWISE_REUSE_TOKEN. The PR changes the default behavior, but it's possible to override that default per application, if that's required.

  • this PR doesn't fix the vulnerabilities of plain CoAP. Whether vulnerabilities remain for encrypted CoAP is then a question which may get different answers. It may be that implementing RFC 9175 improves that, but for now, I don't see that happening soon.

I will try to provide an answer for the points in your issue, but I will need some time.
In the past, several implementations try to use "protocol artifacts" to make things possible, which are just excluded by RFC7959, including using etags or reuse tokens.
For more general discussion about CoAP (RFC 7252, 7641, 7959 ...) our Wiki contains a couple of useful links, here the IETF core mailing-list and IETF Constrained Application Protocol (CoAP): Corrections and Clarifications will be a good choice.

@boaks
Contributor Author

boaks commented Aug 25, 2023

I claim that this change broke blockwise transfers that might require you to maintain state. I think this would be the case if the targeted resource might change between queries. If the token is not kept the same, you cannot ensure the integrity of the content, as you cannot distinguish between queries. The ETag option would ensure that you have received the same representation of the content, but you cannot really use it in the GET query, as it is generated by the server.

OK, that beams back ... as I already wrote, over the years a lot of approaches have been implemented, but they change the trade-offs chosen in RFC7959. The same approaches have also been applied to RFC7641, when some wanted to use CON notifies to achieve a "value stream without gaps". In sum: these may all be very nice ideas, but the place to discuss them is the IETF core mailing-list. Alternatively, we added such stuff with a configuration flag, e.g. BLOCKWISE_STRICT_BLOCK1_OPTION and BLOCKWISE_STRICT_BLOCK2_OPTION.

To your claim:

RFC7959 - 2.4 Using the Block2 Option bottom of the page:

The Block2 Option provides no way for a single endpoint to perform
multiple concurrently proceeding block-wise response payload transfer
(e.g., GET) operations to the same resource. This is rarely a
requirement, but as a workaround, a client may vary the cache key
(e.g., by using one of several URIs accessing resources with the same
semantics, or by varying a proxy-safe elective option).

Basically, that means if the resource is changing, the ETag is changing. Old transfers are detected by the changed ETag and are canceled, and the new representation of the resource becomes available for download. That also affects RFC7641. Alternative definitions would cause the server to allocate (much?) more memory. If I remember well, it's now a couple of years since Californium (mis)used the ETag for "multiple concurrently". We stopped that (mis)use. There is no configuration value available to enable it again.
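The ETag-based cancellation described here can be sketched from the client's point of view. This is a toy model under stated assumptions, not Californium's implementation: `ChangingServer`, `etag_of` and `download` are hypothetical names, the ETag is simply a truncated hash of the payload, and the "server" switches representation after a fixed number of requests to simulate a resource that changes mid-transfer.

```python
import hashlib

BLOCK_SIZE = 16  # hypothetical block size (SZX=0)


def etag_of(payload: bytes) -> bytes:
    # Assumption: the ETag is derived from the representation itself.
    return hashlib.sha256(payload).digest()[:4]


class ChangingServer:
    """Serves `old` for the first `switch_after` requests, then `new`."""

    def __init__(self, old: bytes, new: bytes, switch_after: int):
        self.old, self.new, self.switch_after = old, new, switch_after
        self.requests = 0

    def get_block(self, num: int):
        self.requests += 1
        payload = self.old if self.requests <= self.switch_after else self.new
        block = payload[num * BLOCK_SIZE:(num + 1) * BLOCK_SIZE]
        more = (num + 1) * BLOCK_SIZE < len(payload)
        return etag_of(payload), block, more


def download(server, max_restarts: int = 3) -> bytes:
    for _ in range(max_restarts + 1):
        etag, body, num = None, b"", 0
        while True:
            block_etag, block, more = server.get_block(num)
            if etag is None:
                etag = block_etag          # remember the ETag of block 0
            elif block_etag != etag:
                break                      # representation changed: restart
            body += block
            if not more:
                return body
            num += 1
    raise RuntimeError("resource kept changing during the transfer")


server = ChangingServer(old=b"A" * 40, new=b"B" * 40, switch_after=2)
result = download(server)
```

In this run the client drops the two blocks it already had from the old representation and downloads the new one from block 0, so the result never mixes both.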

@boaks
Contributor Author

boaks commented Aug 25, 2023

In RFC7959: Section 3.4 there are a few examples of the usage of tokens. If you follow the example after the "Retrieval of remaining blocks" note in the first sequence diagram, you notice that the GET queries for the remaining blocks keep using the same token. It is not very clear in the RFC, but my assumption here is that as long as you are requesting pieces of the same block, you should retain the same token.

RFC7959: Section 3.4

This does not show a GET block2 transfer; instead it shows an observe & GET.

The GET uses token 0xfb, and so does the first response (etag 6f00f38e). The next response/notify (etag 6f00f392) also uses that token 0xfb, as defined by RFC7641. But the follow-up requests for that notify then use a different token, 0xfc. Finally the RFC mentions:

(Note that the choice of token 0xfc in this example is arbitrary;
tokens are just shown in this example to illustrate that the requests
for additional blocks cannot make use of the token of the Observation
relationship. As a general comment on tokens, there is no other
mention of tokens in this document, as block-wise transfers handle
tokens like any other CoAP exchange. As usual, the client is free to
choose tokens for each exchange as it likes.)

If the client is free to choose tokens, a server can't bind state to it!

@boaks
Contributor Author

boaks commented Aug 25, 2023

The problem I'm facing, is LwM2M related. I'm using Leshan server which uses this library and after an update, things started to break.

Just to ensure, the token is the root-cause, could you please test that with COAP.BLOCKWISE_REUSE_TOKEN=true?

Did you already open an issue in Leshan (I haven't found that), maybe changing the default value specific for the lwm2m application helps.

But if your token changes, there is no indication that where your request should be targeted?

That's not defined by the token; that's defined by the client's identity and the CoAP options, mainly the URI and URI query. There is a second PR, #2161, which also adds the message code to that.
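A minimal sketch of such a lookup key, assuming a simplified request model: `Request` and `blockwise_key` are hypothetical names and do not reflect Californium's actual classes. The point is only that the token and the block number are deliberately left out, so follow-up block requests with fresh tokens still map to the same transfer.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Request:
    peer: str                    # client identity, e.g. address or DTLS principal
    code: str                    # message code, e.g. "GET" or "FETCH"
    uri_path: str
    uri_query: Tuple[str, ...]
    token: bytes
    block_num: int


def blockwise_key(request: Request) -> tuple:
    # Token and block number are NOT part of the key (illustrative choice).
    return (request.peer, request.code, request.uri_path, request.uri_query)


first = Request("client-1", "GET", "/3/0", (), token=b"\x01", block_num=0)
follow_up = Request("client-1", "GET", "/3/0", (), token=b"\x7f", block_num=1)
other_peer = Request("client-2", "GET", "/3/0", (), token=b"\x01", block_num=1)
```

With this key, `first` and `follow_up` belong to the same transfer although their tokens differ, while `other_peer` does not.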

@jvermillard
Contributor

@SeppoTakalo I confirm this change in behavior breaks Zephyr LWM2M clients and Leshan, I solved it 5 months ago by configuring BLOCKWISE_REUSE_TOKEN to true on my servers. I should have opened a zephyr ticket or sent a discord message

@boaks
Contributor Author

boaks commented Aug 25, 2023

@jvermillard

I'm wondering. As I wrote in the leshan issue:
If californium is the CoAP client, then this may have caused the failure.
If californium is the CoAP server, then I don't get it, why that should have caused an issue.

Do you remember the case, when you observed the error?

@sbernard31
Contributor

I just took a quick look at this, so I could have missed a lot.

But concerning :

Last problem, which clearly broke Leshan, is that after this change, LwM2M Composite-Read does not work anymore. In Composite-Read, the payload of the GET request contains a list of resources that should be put into the response payload. In the next query, Leshan is not sending the payload anymore, just the indicator to get Block N=1. But if your token changes, there is no indication of where your request should be targeted.

Note that Composite-Read is a FETCH on / (I mean all Composite-Read requests target the same URI, which is /).
"What should be returned in the response" is driven by the FETCH payload (which is always the case with FETCH, not specific to LWM2M).

So there is maybe a potential issue here.
Not even sure to know what should be the right behavior, I mean :

  • should a FETCH using block2 send the payload for each block request?
  • If yes, what happens when you use block1 and block2 with a FETCH request?

And for more fun, FETCH can also be used with observe. (see : OpenMobileAlliance/OMA_LwM2M_for_Developers#528)

@boaks
Contributor Author

boaks commented Aug 25, 2023

Yeep, some of those questions are already pending on IETF Constrained Application Protocol (CoAP): Corrections and Clarifications.

If you like, I will create a PR for leshan to change the default for the token reuse in order to keep the old behavior until the IETF Core clarifies the usage of the token.

@jvermillard
Contributor

FYI openthread is not affected: openthread/openthread#7976

@sbernard31
Contributor

My guess: we found some issues, mainly with FETCH and blockwise, which could take a long time to be solved.
So this is not only a LWM2M / Leshan issue, this is also a CoAP/Californium issue.

So I guess the question are :

  1. does BLOCKWISE_REUSE_TOKEN=false really help to solve the Response Delay and Mismatch Attack?
    (Personally, I don't know, but I see that at least @SeppoTakalo seems not to be sure of that)

  2. IF we think it does not OR we are not very sure, then it may be better to also move Californium back to BLOCKWISE_REUSE_TOKEN=true as default.

  3. ELSE IF we think it helps to solve this attack, let's keep BLOCKWISE_REUSE_TOKEN=false as the default behavior. But in this case, all users who are using FETCH with blockwise should use BLOCKWISE_REUSE_TOKEN=true and so they will be vulnerable to the Response Delay and Mismatch Attack, which does not sound so good either.

(changing or not the default behavior in Leshan depends a lot on the answers to the questions above ☝️, so we can look at that afterwards 🙂 )

@boaks
Contributor Author

boaks commented Aug 25, 2023

I'm aware of those corrclar issues.

FMPOV, if the client is defined to be free to choose the token, then no server state can be related to it.

If some newer RFC seems to be underspecified, I would rather wait for its clarification before declaring that a particular choice of token is required in order to get some of the new stuff working. I think it doesn't help if that is done in advance of the answer. I will try to contact Jon from libcoap in order to get his opinion, maybe also Olaf (libcoap).

@cabo

cabo commented Aug 25, 2023

I have been alerted to the existence of this discussion.
If I understand the problem correctly (correlating requests), the solution is RFC 9175 Request-Tag.
I don't think the approach "we don't know when that will be implemented, so instead we invent our own protocol" (giving the token semantics it does not have) will help.
But maybe I don't fully understand the discussion yet.
Instead of everybody scrambling to implement the unspecified deviant protocol, wouldn't it be easier if everyone implemented RFC 9175 Request-Tag?

@boaks
Contributor Author

boaks commented Aug 25, 2023

If RFC 9175 solves the issue, I think LwM2M needs to consider that.

@boaks
Contributor Author

boaks commented Aug 25, 2023

wouldn't it be easier if everyone implemented RFC 9175 Request-Tag?

FMPOV, the question is, who needs it, and who will implement it.

The misuse of the token, caused by the behavior of Californium before 3.8 is already implemented.
That makes the idea of documenting the misuse as an interim solution attractive.

@sbernard31
Contributor

@cabo,

I don't think the approach "we don't know when that will be implemented, so instead we invent our own protocol" (giving the token semantics it does not have) will help.

Before 3.8 californium was reusing the token. (This is not mandatory but I guess this is not forbidden too)

It seems some implementations decided to use that token to identify the "block exchange".
I guess for several reasons:

  • maybe they didn't read the spec carefully (not so good reason)
  • they were misled by some examples that reuse the token (not so good reason)
  • they just saw Californium reuse the token and so presumed it was OK to rely on it (not good reason)
  • or they didn't find a way to implement FETCH + block2 (especially as Californium doesn't send the payload in block2 requests with FETCH, @boaks could you confirm ?) (maybe a good reason ? to be investigated)

But since Californium 3.8, users can choose whether to reuse the token or not, using the BLOCKWISE_REUSE_TOKEN option, and the default behavior changed to "not reuse token".

And visibly this breaks some devices.

So the question is not really about "implementing our own protocol".
It is more about what is the best default behavior (between 2 RFC compliant behaviors).
And so, to decide, I try to understand whether BLOCKWISE_REUSE_TOKEN=false solves a real issue or not.
And to clarify my opinion, I'm not against breaking compatibility with devices which do not follow the specification, but only if we can provide a better and working alternative.

So maybe before deciding whether we should change the default behavior or not, we should identify the real problem with BLOCKWISE_REUSE_TOKEN=false and clarify the benefits too.
Because at least for me this is still not perfectly clear.

@boaks
Contributor Author

boaks commented Aug 28, 2023

I would prefer to move the discussion to a separate issue to have it more transparent for others.

Before 3.8 californium was reusing the token. (This is not mandatory but I guess this is not forbidden too)

Yes, but it should always be clear, that a client is free to do so. If a coap-server implementation breaks using different tokens it's not a compliant implementation.

And visibly this breaks some devices.

FMPOV, it breaks only non-compliant devices. Keeping non-compliant devices running would make it hard to change anything. What I usually try is to add a configuration, to switch back to the old behavior, as here.

we should identify the real problem

agreed. Unfortunately, I'm familiar with neither FETCH nor the Zephyr LwM2M client.

could you confirm ?

Californium sends the payload of the FETCH only with the first request. For the non-Block1 case it will not be too hard to change that. But that will increase the data volume. So, if it goes in that direction, I guess we need the next configuration flag ;-):

As the discussion "send payload in block2 request" already pointed out, there is a conflict between "stateless FETCH" and "reducing the data volume". And with block1 it gets even more complicated.

So, is the Zephyr implementation really a stateless one? Are the users there really willing to spend more data? Because if the Zephyr implementation isn't stateless and the users aren't willing to spend more data, this will not be a solution either.

@sbernard31
Contributor

a client is free to do so. If a coap-server implementation breaks using different tokens it's not a compliant implementation.

I understand that too.

it breaks only non-compliant devices.

FMPOV, this is what we need to clarify. If this is true, there is nothing to change in Californium OR Leshan.

Keeping non-compliant devices running would make it hard to change anything.

I agree

What I usually try is to add a configuration, to switch back to the old behavior, as here.

Some thought about it, from Leshan Project : https://github.com/eclipse-leshan/leshan/wiki/How-Leshan-should-behave-with-Non-Compliant-Implementations-%3F

@cabo

cabo commented Aug 28, 2023 via email

@boaks
Contributor Author

boaks commented Aug 29, 2023

support several concurrent ReadComposite request

If that is a block2 blockwise transfer, the RFC7959 already states, that this is not supported.

As I cited above:

RFC7959 - 2.4 Using the Block2 Option bottom of the page:

The Block2 Option provides no way for a single endpoint to perform
multiple concurrently proceeding block-wise response payload transfer
(e.g., GET) operations to the same resource.

So, if the ReadComposites are addressing the same resource, then concurrent operation is not supported. The term resource reflects here in my interpretation mainly the resource path.

@sbernard31
Contributor

The term resource reflects here in my interpretation mainly the resource path.

I have some doubt that maybe this could be interpreted as URI + payload.

So reading RFC8132§2. FETCH Method a bit more , I guess that same resource probably also means same URI (but not crystal clear to me), e.g. :

The CoAP FETCH method is used to obtain a representation of a
resource, specified by a number of request parameters. Unlike the
CoAP GET method, which requests that a server return a representation
of the resource identified by the effective request URI (as defined
by [RFC7252]), the FETCH method is used by a client to ask the server
to produce a representation as described by the request parameters
(including the request options and the payload) based on the resource
specified by the effective request URI. The payload returned in
response to a FETCH cannot be assumed to be a complete representation
of the resource identified by the effective request URI, i.e., it
cannot be used by a cache as a payload to be returned by a GET
request.

@boaks
Contributor Author

boaks commented Aug 29, 2023

The term is from RFC7959, and that's the basis for the implementation decision.

RFC8132 came later and was never really considered. We mainly just added the message codes and some methods to override the fetch operation. FMPOV, it's still under discussion. Once that discussion has concluded, the implementation will be adapted.

@mrdeep1

mrdeep1 commented Aug 29, 2023

I think we have 5 underlying activities here which are causing some confusion when overlapped.

Tokens

RFC 7252 5.3.1 Token

   An endpoint receiving a token it did not generate MUST treat the
   token as opaque and make no assumptions about its content or
   structure.

It appears that the (non-compliant) server is effectively including the Token in its 'cache-key' to determine which response data to send back when there are concurrent requests. I think it is dangerous for a server to do (assume) that as we have seen with a client that varies the Token.

Having the same token to get all of the body for the multiple Block2 payloads can be useful diagnosing Block2 issues and is valid. libcoap does this with "Base Token + left shifted block NUM" to keep the tokens unique across a multiple Block2 transfer.
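As a rough illustration of the "Base Token + left shifted block NUM" idea: the sketch below is not libcoap's actual code, and the bit layout (a 32-bit random base with the block number shifted into the upper half of an 8-byte token) is an assumption chosen only to show why such tokens stay unique within one transfer while still sharing a recognizable base for diagnostics.

```python
import secrets

TOKEN_BITS = 64  # CoAP tokens are at most 8 bytes long (RFC 7252)
NUM_SHIFT = 32   # assumption: block NUM is shifted into the upper 32 bits


def new_base_token() -> int:
    # Random base, drawn once per blockwise transfer.
    return secrets.randbits(NUM_SHIFT)


def block_token(base: int, num: int) -> bytes:
    # Base token plus left-shifted block number, truncated to 8 bytes.
    value = (base + (num << NUM_SHIFT)) % (1 << TOKEN_BITS)
    return value.to_bytes(TOKEN_BITS // 8, "big")


base = new_base_token()
tokens = [block_token(base, num) for num in range(8)]
```

Every token in `tokens` is distinct, yet the lower four bytes always equal the base, which keeps a transfer traceable in a capture.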

Request-Tag RFC9175

With this sent in a request (different between each new request to a resource), as it is a part of the server's 'cache-key', concurrent responses are supported and the server can differentiate between which resource response is required, rather than the unsafe method of using the Token as part of the 'cache-key'.

There are discussions about whether Request-Tag should be sent in all requests or not. core-wg/corrclar#28

FETCH RFC8132 Block2 only

This adds in an extra level of complexity when looking at how to handle Block2 responses when compared to GET as the data payload of the request dictates what the server responds with. Two effectively concurrent requests to the same resource with different data payloads which gives 2 different responses starts to get messy when Block2 responses are needed.

There is debate over whether the data payload should be included with each FETCH request for the next Block2 response. core-wg/corrclar#27

FETCH RFC8132 Block1 and Block2

The main debate here is whether the entire Block1 sequence needs to be repeated when requesting the next Block2 from the server. core-wg/corrclar#28

ETag

If ETag is used in a response and the response changes over time, then the ETag needs to be different whenever the response changes. The client can then detect the change and process the new response accordingly (if the ETag changes during a Block2 transfer, it is likely that the client will need to re-request the information to get the new updated body).

ETag cannot be used in a request as a differentiator between concurrent responses as it is likely to solicit a 2.03 response instead of a 2.05 response.

Request-Tag may fail here when a client tries to retrieve remaining blocks of a Block2 transfer accessing data that has now changed, unless the server has cached the previous data.

@sbernard31
Contributor

@mrdeep1 thx for that clarification.

@boaks
Contributor Author

boaks commented Aug 29, 2023

PFC 2252 = RFC 7252?

@boaks
Contributor Author

boaks commented Aug 29, 2023

Usage of RFC9175:
The pain is not only CoAP related, the issue caused an impact in LwM2M.
If the solution would be RFC 9175, then this would require to update LwM2M as well.
I'm not sure, who is still working on IETF/core and LwM2M. Hannes has changed the job, so I'm afraid, he will not be able to care.

@SeppoTakalo

Are you still interested? Could you provide the infos about the zephyr implementation?

@sbernard31
Contributor

Usage of RFC9175 :

It solves the concurrent access issue.
But this doesn't solve the stateless implementation support because using Request-Tag means that you need to store a state at server side, right ?

@cabo

cabo commented Aug 30, 2023

Usage of RFC9175 :

It solves the concurrent access issue. But this doesn't solve the stateless implementation support because using Request-Tag means that you need to store a state at server side, right ?

Indeed, if we want to enable clients to do a FETCH without sending the entire request body again for every block, the server needs to keep state, corrclar 27/28. (This gets ridiculous when Block1 is used with Block2, but still is an issue for Block2 only.)

Blockwise tries to be open both to stateless servers and servers that want to keep state; so if the client behavior needs to depend on which of these is the case, we'd need additional signaling.

@sbernard31
Contributor

@cabo,

This gets ridiculous when Block1 is used with Block2,

Just to be sure, you mean this is ridiculous to re-send all payload using block1 for each block2 request, right ? (core-wg/corrclar#28)

@sbernard31
Contributor

Blockwise tries to be open both to stateless servers and servers that want to keep state; so if the client behavior needs to depend on which of these is the case, we'd need additional signaling.

This additional signaling should ideally be part of RFC8132, right ? (I mean we should not need additional RFC ?)

@mrdeep1

mrdeep1 commented Aug 30, 2023

Just to be sure, you mean this is ridiculous to re-send all payload using block1 for each block2 request, right ?

For stateful servers, subsequent FETCH requests for the next Block2 do not need the FETCH data as the server can build a cache-key that contains the cachable options that points to the appropriate data that needs a Block2 slice returned. BUT, if there is a chance that there are multiple concurrent FETCH to the same resource, but with different FETCH data, then the request will need a Request-Tag per different FETCH data so that the server can include this in the cache-key to differentiate which data-set should be used for getting the appropriate slice.

For non-stateful servers, the entire FETCH request needs to be repeated so the server can re-generate the data and then send back the appropriate slice of the data based on the block size and requested block number. This includes any Block1 set of transfers handling the FETCH data. The server will need to maintain some sort of state during the Block1 transfers to be able to assemble the entire FETCH data (note that if there are Block1 transfers and there is any chance of concurrent FETCH, then you must use Request-Tag to differentiate between the discrete FETCHes so server can correctly assemble the Block1s during the transitional state assemble phase).

Note that if Observe is being used, the server will need to maintain something to be able to generate any unsolicited responses, and that if the client wants to explicitly de-register the Observe, this has to be the original register FETCH request (including data), with just the original Observe option updated to de-register.
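The stateful and stateless FETCH cases described above can be contrasted in a small model. Everything here is hypothetical: `render`, `StatefulFetchServer` and `stateless_fetch` are illustration names, the Request-Tag is modeled as a plain byte string in the cache key, and Block1 and Observe are left out entirely.

```python
BLOCK_SIZE = 16  # hypothetical block size (SZX=0)


def render(fetch_data: bytes) -> bytes:
    # Stand-in for "build the response body from the FETCH payload".
    return fetch_data * 3


def block_slice(body: bytes, num: int) -> bytes:
    return body[num * BLOCK_SIZE:(num + 1) * BLOCK_SIZE]


class StatefulFetchServer:
    """Keeps the rendered body per (peer, uri, request_tag) cache key."""

    def __init__(self):
        self._cache = {}

    def fetch(self, peer, uri, num, request_tag=b"", fetch_data=None):
        key = (peer, uri, request_tag)
        if fetch_data is not None:
            self._cache[key] = render(fetch_data)
        body = self._cache.get(key)
        if body is None:
            raise LookupError("no state for this request; FETCH data required")
        return block_slice(body, num)


def stateless_fetch(uri, num, fetch_data: bytes) -> bytes:
    # Stateless server: the complete FETCH data must accompany every block request.
    return block_slice(render(fetch_data), num)


data_a, data_b = b"a" * 12, b"b" * 12

tagged = StatefulFetchServer()
a0 = tagged.fetch("client", "/", 0, request_tag=b"\x01", fetch_data=data_a)
b0 = tagged.fetch("client", "/", 0, request_tag=b"\x02", fetch_data=data_b)
a1 = tagged.fetch("client", "/", 1, request_tag=b"\x01")  # no data resent

untagged = StatefulFetchServer()
untagged.fetch("client", "/", 0, fetch_data=data_a)
untagged.fetch("client", "/", 0, fetch_data=data_b)   # overwrites the state of A
mixed_up = untagged.fetch("client", "/", 1)           # meant for A, served from B
```

With distinct Request-Tags the two concurrent FETCHes stay apart; without them the second request replaces the first one's state, and block 1 of the first transfer is served from the wrong body.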

@mrdeep1

mrdeep1 commented Aug 30, 2023

Blockwise tries to be open both to stateless servers and servers that want to keep state; so if the client behavior needs to depend on which of these is the case, we'd need additional signaling.

This additional signaling should ideally be part of RFC8132, right ? (I mean we should not need additional RFC ?)

Agreed. The client has no way of knowing whether the server is stateless or not unless there is some sort of out-of-band knowledge, or the server signals (new/bis RFC) its capabilities.

@cabo

cabo commented Aug 30, 2023

This additional signaling should ideally be part of RFC8132, right ? (I mean we should not need additional RFC ?)

Well, RFC 8132 is published, so unless we can find the signaling in there or in another published RFC, we'll need to do something. I'll bring this up in today's CoRE WG Interim meeting.

@chrysn

chrysn commented Aug 30, 2023

I don't want to dogpile on here (as cabo already made the good points, and I hope that this option stays off by default, so that breakage is not silent), but I'd like to weigh in on the Zephyr side, providing more evidence points that their implementation does break when interacting with other implementations. I failed to find a reference to Zephyr's issue tracker, where the reliance on tokens in block-wise is discussed. Could you give me a pointer?

@mrdeep1

mrdeep1 commented Aug 30, 2023

In the same way that Extended Tokens (RFC8974 Section 2.2 Discovering Support) and Q-Block (RFC9177 Section 4.1 Properties of the Q-Block1 and Q-Block2 Options) test for functionality support in the server, I guess we could suggest something like (for stateless FETCH support) on the client

-> FETCH + Block2 option (num = 0, szx = 0) + data against resource URI client is going to do FETCH against
<- check response - is resource valid for FETCH etc.  If OK continue.
-> FETCH + Block2 option (num = 1, szx = 0)  with no data against same resource URI
<- check response  - ok - supports stateful, failure is stateless (or perhaps something else?)
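The probe sequence above could look roughly like this on the client. It is a sketch under assumptions: `send` is a hypothetical transport callback that returns the response payload or `None` on an error response, and the two fake servers exist only to exercise the probe. The Block2 value encoding (NUM, M, SZX packed as `NUM << 4 | M << 3 | SZX`) follows RFC 7959.

```python
def encode_block2(num: int, more: bool, szx: int) -> int:
    # RFC 7959 option value: NUM in the upper bits, then M, then 3 bits of SZX.
    return (num << 4) | (int(more) << 3) | szx


def block_size(szx: int) -> int:
    return 1 << (szx + 4)  # SZX=0 means 16-byte blocks


def probe_fetch_support(send, uri: str, data: bytes) -> str:
    first = send(uri, encode_block2(0, False, 0), data)
    if first is None:
        return "fetch-failed"
    second = send(uri, encode_block2(1, False, 0), None)  # no data this time
    return "stateful" if second is not None else "stateless-or-other"


def make_stateful_server():
    bodies = {}

    def send(uri, block2, data):
        if data is not None:
            bodies[uri] = data * 4          # toy response body
        body = bodies.get(uri)
        if body is None:
            return None
        num, size = block2 >> 4, block_size(block2 & 0x7)
        return body[num * size:(num + 1) * size]

    return send


def stateless_server(uri, block2, data):
    if data is None:
        return None                          # cannot rebuild the body without data
    num, size = block2 >> 4, block_size(block2 & 0x7)
    return (data * 4)[num * size:(num + 1) * size]


stateful_result = probe_fetch_support(make_stateful_server(), "/", b"0123456789")
stateless_result = probe_fetch_support(stateless_server, "/", b"0123456789")
```

As noted in the sequence itself, a failure on the second request does not prove statelessness; it only rules out the stateful behavior the probe was looking for.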

@sbernard31
Contributor

I guess we could suggest something like (for stateless FETCH support) on the client

That sounds a good possible way.

@chrysn

chrysn commented Aug 30, 2023

As Request-Tag has been thrown around here as a solution to what apparently was done with token matching before: A Request-Tag will only ever distinguish two requests. If two requests don't match in their block key (eg. have a different Uri-Path), the request tags will not magically make them match.

@boaks
Contributor Author

boaks commented Aug 31, 2023

magically

Yep.
For me the outcome of this discussion is:

  • I don't know, what is specified in LwM2M nor what is exactly implemented in the Zephyr lwm2m client.
  • without that, the discussion is more theoretical than it could help here with this specific "unsharp" issue.
  • the feature this PR added, to stop reusing the token in a blockwise transfer, is in line with the specification. It harms mainly non-compliant implementations. As a short-term workaround, someone may configure Californium to reuse the token again, but in the mid and long term that kind of token usage MUST be omitted.

Everything else, details of the RFC 8132 or contribution of RFC9175 to a solution of this issue, is for me out of the scope of discussing the re-use of tokens in a blockwise transfer. If that stuff requires attention, then I think, a separate issue (as #2168) makes that work easier.

Thanks to all, who contributed their knowledge, experience and opinions.

@sbernard31
Contributor

I don't know, what is specified in LwM2M nor what is exactly implemented in the Zephyr lwm2m client.
without that, the discussion is more theoretical, ...

@boaks, about LWM2M (not Zephyr), I tried to answer over the various discussions we had about it, but I can try to summarize here:

If you want to know more, there is not too much to read about it in LWM2M specification :

If you don't know much about LWM2M, you should at least try to understand:

  • the LWM2M object tree (maybe reading this or this can help you)
  • and also that for FETCH, a LWM2M client acts as CoAP server and vice versa.

For people who know a bit about LWM2M, this looks like:

You send a FETCH on / with a payload containing LWM2M nodes you want to read, e.g using ct=SENML-JSON:

[{"n":"/3/0/0"},
{"n":"/3/0/9"},
{"n":"/1/0/1"}]

Then you can get an answer containing the LWM2M node values in the payload, e.g. using ct=SENML-JSON:

[{"n":"/3/0/0", "vs":"Open Mobile Alliance"},
{"n":"/3/0/9", "v":95},
{"n":"/1/0/1", "v":86400}]

In this example, we only ask for LWM2M single resources, but you can ask for an object, an object instance, any kind of resource, a resource instance or even the whole LWM2M object tree (using `"n":"/"`).
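For illustration, the request and response payloads above could be built and read like this. This is a rough sketch using only the standard `json` module; the function names are hypothetical, and SenML features not shown in the example (base names, other value types) are ignored.

```python
import json
from typing import Dict, List, Union


def build_composite_read(paths: List[str]) -> str:
    """Build the SenML-JSON FETCH payload listing the nodes to read."""
    return json.dumps([{"n": p} for p in paths], separators=(",", ":"))


def parse_composite_response(payload: str) -> Dict[str, Union[int, float, str]]:
    """Map each returned node path to its value."""
    values = {}
    for record in json.loads(payload):
        # "v" carries numeric values and "vs" string values; anything
        # else is skipped in this sketch.
        if "v" in record:
            values[record["n"]] = record["v"]
        elif "vs" in record:
            values[record["n"]] = record["vs"]
    return values


request_payload = build_composite_read(["/3/0/0", "/3/0/9", "/1/0/1"])
response_values = parse_composite_response(
    '[{"n":"/3/0/0", "vs":"Open Mobile Alliance"},'
    '{"n":"/3/0/9", "v":95},'
    '{"n":"/1/0/1", "v":86400}]'
)
```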

About Request-Tag, I just see that since LWM2M v1.1.x, the specification says :

The CoAP Request-Tag Option [CoAP_ERT] SHOULD be used to detect interchange of blocks between different blockwise requests to the same resource over unreliable transport.

@boaks
Contributor Author

boaks commented Aug 31, 2023

I tried to answer over the various discussions

Sure, my point is more:
The use of the token in CoAP seems to be clear now. So at least the discussion for this PR is done for me.
If there is more to discuss, that should be done under a better title, but not here in this PR. This PR is about
being able to not reuse the token in a blockwise transfer.
And that's done for me.
And because this turned into a long discussion even without more comments by the original reporter, I don't plan to spend more time here.

@RomainPelletant

RomainPelletant commented Sep 6, 2023

For information, the related Zephyr PR is here
This discussion was really instructive so thanks for your time.

@SeppoTakalo

I have been away for a week, so I did not find time to comment. Sorry about that.

Thank you for all the valuable information here. It definitely looks like this falls in between not so properly defined behaviour of CoAP.
As pointed out, the Request-Tag seems to fix the FETCH/PATCH queries.

However, I still don't think its use is properly defined, and I'm mostly considering the LwM2M use case here, where the response payload might be generated, and each time it is generated it might be different. So in most cases, I need to maintain a state so I can ensure the integrity of the payload.
The RFC 9175 says:

The Request-Tag option MUST NOT be present in response messages.

So clearly whoever initiates a GET query must already know that the response might be a block-wise transfer and append the Request-Tag option. Or (worst case and against recommendation) append a Request-Tag to all queries. Otherwise, if you send a normal GET and the server splits the response into multiple blocks and sends BLOCK N=0, then what do you do for the next block query? Do you generate a Request-Tag or not? If you generate a Request-Tag, this is actually a new transfer and you need to start again from block N=0.

When does Californium add the Request-Tag into queries? Or is it implemented?

As I'm one of the contributors to Zephyr's LwM2M client, I can alter the behavior here. But going through this long discussion has not really helped. You seem to have concluded that re-using the token is wrong, but what should I then use to match the following GET query to a previous one, if I need to maintain a state?

@mrdeep1

mrdeep1 commented Sep 7, 2023

To accurately determine which specific body of data you want to get from the server, the request has to (at a minimum) include the Uri-Path and Request-Tag options so that the server can use the Request-Tag to differentiate requests to the same URI (resource) and use the Block2 option to get the appropriate slice.

When or when not to add in the Request-Tag option on the request (or a way of signalling that a Request-Tag is required) is currently a subject of debate. Certainly if the request has Block1 or Block2 options, then a Request-Tag option needs to be added. Otherwise, currently, you have to send a Request-Tag with every request, or re-request using Request-Tag on detecting a response has used Block2.
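The options described above could be sketched as a small client-side policy. This is only an illustration: `TagPolicy`, `request_options` and `must_restart_with_tag` are hypothetical names, not part of any CoAP library, and the 2-byte random tag is an arbitrary choice.

```python
import os
from enum import Enum
from typing import Callable, Dict


class TagPolicy(Enum):
    ALWAYS = "always"           # send a Request-Tag with every request
    BLOCK_ONLY = "block-only"   # only when Block1/Block2 is in the request


def request_options(has_block1: bool, has_block2: bool, policy: TagPolicy,
                    new_tag: Callable[[], bytes] = lambda: os.urandom(2)
                    ) -> Dict[str, bytes]:
    """Return the extra options to add to an outgoing request."""
    options = {}
    if policy is TagPolicy.ALWAYS or has_block1 or has_block2:
        options["Request-Tag"] = new_tag()
    return options


def must_restart_with_tag(sent_options: Dict[str, bytes],
                          response_has_block2: bool) -> bool:
    """True if a request sent without a tag got a block-wise response and
    should be re-requested with a Request-Tag, starting at block 0 again."""
    return response_has_block2 and "Request-Tag" not in sent_options
```

With `BLOCK_ONLY`, a plain GET goes out without a tag, and `must_restart_with_tag` signals the re-request case if the response turns out to use Block2.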

@boaks
Contributor Author

boaks commented Sep 7, 2023

I'm still not sure why it is assumed that a lwm2m-server uses concurrent GET/FETCH operations (Composite-Read) for a single lwm2m-client. From my point of view, there is not that much benefit in that. If the server finishes the blockwise transfer before starting the next, the current RFC 7959 works pretty well. Using ETag also ensures that, if the resource is changing during the transfer, the client gets notified about that.
It may be a topic, how the lwm2m-server "serializes" the requests, but I would not move then that question into concurrent blockwise operations.
(By the way PR #2161 enables to have concurrent FETCH and GET operations, even if I would serialize them also.)

@sbernard31
Contributor

My understanding,

You seem to have concluded that re-using the token is wrong, but what should I then use to match the following GET query to a previous one, if I need to maintain a state?

If no Request-Tag is used, you cannot have concurrent block transfers for the same resource and the same peer.
So you should match by peer identity / resource URI.
If you face a concurrent transfer, cancel the previous one (I think this is the Californium choice) or reject the second one.
As explained :
"As it has always been, a server that can only serve a limited number of block-wise operations at the same time can delay the start of the operation by replying with 5.03 (Service Unavailable) and a Max-Age indicating how long it expects the existing operation to go on, or it can forget about the state established with the older operation and respond with 4.08 (Request Entity Incomplete) to later blocks on the first operation."

If request-tag is used you can match by peer Identity / resource URI / request-tag and you can support concurrent block transfer for same resource of same peer.
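As a rough sketch of this matching, assuming a hypothetical `BlockwiseStates` table on the server side and the "cancel the previous one" policy:

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, Optional[bytes]]   # (peer identity, resource URI, Request-Tag)


class BlockwiseStates:
    """Hypothetical server-side table of ongoing block-wise transfers."""

    def __init__(self) -> None:
        self._bodies: Dict[Key, bytes] = {}

    def start(self, peer: str, uri: str, body: bytes,
              request_tag: Optional[bytes] = None) -> bool:
        """Store the body of a new transfer.

        Returns True if an older transfer with the same key was cancelled.
        """
        key = (peer, uri, request_tag)
        replaced = key in self._bodies
        self._bodies[key] = body
        return replaced

    def block(self, peer: str, uri: str, num: int, size: int,
              request_tag: Optional[bytes] = None) -> Optional[bytes]:
        """Return block `num` of the matching transfer, or None if unknown."""
        body = self._bodies.get((peer, uri, request_tag))
        if body is None:
            return None
        chunk = body[num * size:(num + 1) * size]
        return chunk or None
```

Without a Request-Tag, a second transfer from the same peer to the same URI replaces the first; with distinct Request-Tags, both are kept side by side.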

(In all cases, FETCH doesn't work with a stateless implementation under the current state of the RFCs.)

So unless I missed something request-tag will not solve your issue unless you really want to support concurrent block transfer.

When does Californium add the Request-Tag into queries? Or is it implemented?

Not implemented, and it seems not to be planned: #2174

Or (worst case and against recommendation) append a Request-Tag into all queries.

This seems to be the simple choice and the choice made by libcoap :

"For libcoap, I took the decision that by default, Request-Tag is sent with every request (even if Block1 is not being used and Block2 was not defined) "just in case" there is a Block2 sized response. The client application can however disable the CoAP stack doing this, only sending (done by CoAP stack) Request-Tag if Block1 or Block2 were defined."

But I agree this sounds not recommended by the RFC :

  • "The Request-Tag option is only used in the request messages of block-wise operations."
  • "... this means sending messages without Request-Tag options whenever possible "

Even if that should work because :
"Note that Request-Tag options can be present in request messages that carry no Block options (for example, because a proxy unaware of Request-Tag reassembled them)."

Otherwise, if you send a normal GET and the server splits it into multiple blocks and sends BLOCK N=0, then what do you do for the next block query? Do you generate a Request-Tag or not? If you generate a Request-Tag, this is actually a new transfer and you need to start again from block N=0.

I guess this is another working possible option, but I can understand that at first sight this could be considered as not ideal.

@boaks
Contributor Author

boaks commented Sep 7, 2023

When does Californium add the Request-Tag into queries? Or is it implemented?

It's not implemented. For me, concurrent blockwise transfers have no general benefit.
There may be a benefit assuming that the blockwise resources are changing fast, but that is not too frequently the case. And also there you need to decide if you really want to spend the bandwidth on transferring an old resource representation.

At least for now, Californium is unfortunately not an AI which implements the stuff by importing RFCs on its own ;-).

So it requires some contribution.

In past years I had a paid job for doing so; for a year now this has no longer been the case. I currently try to limit the free time invested in open source to 4h a week. That's eaten up by answering questions and providing bugfixes.
I'm not sure if that changes. At least, I don't see that.
There is even one more consequence of that time shortage:
Contribution will require also to do some "quality work", because this would otherwise also fall into that 4h.

@chrysn

chrysn commented Sep 7, 2023

However, I still don't think its use is properly defined, and I'm mostly considering LwM2M usecase here, when the response payload might be generated, and each time it is generated it might be different. So on most cases, I need to maintain a state so I can ensure the integrity of the payload.

(Trying to only add what has not been said)

All the CoAP server can do to protect the integrity of the payload is to set an ETag (eg. a hash of the full body). It is the CoAP client that must then verify that all the ETags match, and will thus verify the integrity.
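A minimal sketch of both sides of that check, assuming the server derives the ETag from a hash of the full body (the function names and the choice of a truncated SHA-256 are illustrative only):

```python
import hashlib
from typing import List, Tuple


def etag_for(body: bytes) -> bytes:
    """Server side: derive an ETag from the full body.

    A truncated SHA-256 is an arbitrary choice here; CoAP ETags are
    opaque values of 1 to 8 bytes.
    """
    return hashlib.sha256(body).digest()[:8]


def reassemble(blocks: List[Tuple[bytes, bytes]]) -> bytes:
    """Client side: join (etag, payload) pairs given in block order.

    Raises ValueError if the blocks do not all carry the same ETag, i.e.
    if they may belong to different representations of the resource.
    """
    etags = {etag for etag, _ in blocks}
    if len(etags) != 1:
        raise ValueError("ETag mismatch or no blocks: restart the transfer")
    return b"".join(payload for _, payload in blocks)
```

If the check fails, the client cannot tell which blocks are stale, so the sketch simply asks for the whole transfer to be restarted.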

Unless there are concurrent requests (in which case the client will set a Request-Tag), the server that receives a GET or FETCH request for Block2:1/-/.. (i.e. a non-initial one) does not know which "instance" of the larger request this comes from. But it doesn't have to: It will just pick the latest one it has (for the given client and the given set of non-block options), trusting that the client really doesn't do concurrent requests. If the client did try concurrent requests, it hopefully at least fails when checking the ETags -- there's only so much the server can do here.

How long the server keeps that "instance" around will depend on its capabilities, but at any rate, the lookup criteria are the client's address and the relevant options. (7959 had no precise definition of "relevant" options; 9175 calls the criterion being "matchable").
