Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LwM2mNodeSenMLDecoder sometimes fails trying to convert from double to long #1170

Closed
sonny82 opened this issue Dec 2, 2021 · 26 comments · Fixed by #1174
Closed

LwM2mNodeSenMLDecoder sometimes fails trying to convert from double to long #1170

sonny82 opened this issue Dec 2, 2021 · 26 comments · Fixed by #1174
Labels
core Impact core of Leshan enhancement Improvement of existing features

Comments

@sonny82
Copy link

sonny82 commented Dec 2, 2021

Hi,

We are using Leshan 2.0.0-M5.
In the log files I sometimes see that when reading currentTime resource from device object (3/0/13) decoding sometimes fails with the following exception:

08:50:47 [CoapServer(main)#27] ERROR c.n.s.d.s.s.d.ManagerImpl - Endpoint 94193A0407000000: Read request for path 3 failed with exception: {}
org.eclipse.leshan.core.request.exception.InvalidResponseException: Unable to decode response payload of request [ReadRequest [path=/3 format=null]] from client [94193A0407000000]
at org.eclipse.leshan.server.californium.request.LwM2mResponseBuilder.decodeCoapResponse(LwM2mResponseBuilder.java:458)
at org.eclipse.leshan.server.californium.request.LwM2mResponseBuilder.visit(LwM2mResponseBuilder.java:122)
at org.eclipse.leshan.core.request.ReadRequest.accept(ReadRequest.java:187)
at org.eclipse.leshan.server.californium.request.RequestSender$2.buildResponse(RequestSender.java:226)
at org.eclipse.leshan.core.californium.AsyncRequestObserver$1.onResponse(AsyncRequestObserver.java:60)
at org.eclipse.leshan.core.californium.CoapAsyncRequestObserver.onResponse(CoapAsyncRequestObserver.java:99)
at org.eclipse.californium.core.coap.Request.setResponse(Request.java:931)
at org.eclipse.californium.core.server.ServerMessageDeliverer.deliverResponse(ServerMessageDeliverer.java:258)
at org.eclipse.californium.core.network.stack.BaseCoapStack$StackTopAdapter.receiveResponse(BaseCoapStack.java:217)
at org.eclipse.californium.core.network.stack.AbstractLayer.receiveResponse(AbstractLayer.java:89)
at org.eclipse.californium.core.network.stack.ExchangeCleanupLayer.receiveResponse(ExchangeCleanupLayer.java:105)
at org.eclipse.californium.core.network.stack.ObserveLayer.receiveResponse(ObserveLayer.java:139)
at org.eclipse.californium.core.network.stack.BlockwiseLayer.receiveResponse(BlockwiseLayer.java:776)
at org.eclipse.californium.core.network.stack.ReliabilityLayer.receiveResponse(ReliabilityLayer.java:308)
at org.eclipse.californium.core.network.stack.AbstractLayer.receiveResponse(AbstractLayer.java:89)
at org.eclipse.californium.core.network.stack.BaseCoapStack.receiveResponse(BaseCoapStack.java:146)
at org.eclipse.californium.core.network.CoapEndpoint$1.receiveResponse(CoapEndpoint.java:314)
at org.eclipse.californium.core.network.UdpMatcher$4.run(UdpMatcher.java:457)
at org.eclipse.californium.elements.util.SerialExecutor$1.run(SerialExecutor.java:290)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.eclipse.leshan.core.node.codec.CodecException: Invalid content [1.638435E9] for type TIME for path /3/0/13
at org.eclipse.leshan.core.node.codec.senml.LwM2mNodeSenMLDecoder.parseResourceValue(LwM2mNodeSenMLDecoder.java:504)
at org.eclipse.leshan.core.node.codec.senml.LwM2mNodeSenMLDecoder.extractLwM2mResources(LwM2mNodeSenMLDecoder.java:426)
at org.eclipse.leshan.core.node.codec.senml.LwM2mNodeSenMLDecoder.parseRecords(LwM2mNodeSenMLDecoder.java:193)
at org.eclipse.leshan.core.node.codec.senml.LwM2mNodeSenMLDecoder.decode(LwM2mNodeSenMLDecoder.java:97)
at org.eclipse.leshan.core.node.codec.DefaultLwM2mDecoder.decode(DefaultLwM2mDecoder.java:156)
at org.eclipse.leshan.core.node.codec.DefaultLwM2mDecoder.decode(DefaultLwM2mDecoder.java:138)
at org.eclipse.leshan.server.californium.request.LwM2mResponseBuilder.decodeCoapResponse(LwM2mResponseBuilder.java:450)
... 21 common frames omitted
Caused by: java.lang.IllegalStateException: Can not convert 1.638435E9 to long safely : Unsupported number java.lang.Double
at org.eclipse.leshan.core.util.datatype.NumberUtil.numberToLong(NumberUtil.java:59)
at org.eclipse.leshan.core.node.codec.senml.LwM2mNodeSenMLDecoder.numberToLong(LwM2mNodeSenMLDecoder.java:527)
at org.eclipse.leshan.core.node.codec.senml.LwM2mNodeSenMLDecoder.parseResourceValue(LwM2mNodeSenMLDecoder.java:493)
... 27 common frames omitted

Best regards,
Sonny

@sbernard31
Copy link
Contributor

Are you using senml-json or senml-cbor ?

@sbernard31
Copy link
Contributor

sbernard31 commented Dec 2, 2021

could you share the exact payload (byte for cbor / string for json) sent by the client ?
(or a wireshark capture)

@sonny82
Copy link
Author

sonny82 commented Dec 2, 2021

Hmm, I don't set a specific ContentFormat in ReadRequest, so I'm not sure which one is used by default.
I'll try to capture the exact payload but the problem is this happens rarely (maybe once or several times per day).

Note: I only 4 devices for development/testing, we're not in production yet.

@sbernard31
Copy link
Contributor

Hmm, I don't set a specific ContentFormat in ReadRequest, so I'm not sure which one is used by default.

You can put in the ReadRequest the format you would like to be used for the response.
If you set nothing, the client will chose its preferred format but in all case the format used to encode the response payload is in the response.

Eg. :

Frame 218: 64 bytes on wire (512 bits), 64 bytes captured (512 bits) on interface lo, id 0
Ethernet II, Src: 00:00:00_00:00:00 (00:00:00:00:00:00), Dst: 00:00:00_00:00:00 (00:00:00:00:00:00)
Internet Protocol Version 4, Src: 127.0.0.1, Dst: 127.0.0.1
User Datagram Protocol, Src Port: 52954, Dst Port: 5683
Constrained Application Protocol, Acknowledgement, 2.05 Content, MID:32713
    01.. .... = Version: 1
    ..10 .... = Type: Acknowledgement (2)
    .... 1000 = Token Length: 8
    Code: 2.05 Content (69)
    Message ID: 32713
    Token: 7c515cfbdc858367
    Opt Name: #1: Content-Format: application/vnd.oma.lwm2m+tlv
        Opt Desc: Type 12, Elective, Safe
        1100 .... = Opt Delta: 12
        .... 0010 = Opt Length: 2
        Content-type: application/vnd.oma.lwm2m+tlv
    End of options marker: 255
    Payload: Payload Content-Format: application/vnd.oma.lwm2m+tlv, Length: 6
        Payload Desc: application/vnd.oma.lwm2m+tlv
        [Payload Length: 6]
    [Uri-Path: /3/0/13]
    [Request In: 217]
    [Response Time: 0.014677148 seconds]
Lightweight M2M TLV, Device (1 element)

See in my example : Content-type: application/vnd.oma.lwm2m+tlv

@sbernard31
Copy link
Contributor

Just speculation,

I'll try to capture the exact payload but the problem is this happens rarely (maybe once or several times per day).

It seems at some point we get a double (a floating point number) in Leshan where we expect a long (an integer).
In cbor number format is pretty clear but in json this is more fuzzy.
So I will bet that your application rather use senml+json ?

My guess is when the timestamp ends with several 0, your client will use the exponential format.
E.g. in your stacktrace : 1.638435E9 for 1638435000 (for 2021-12-02T08:50:00.000Z)

I also guess our json library (jackson) convert a timestamp like 1.638435E9 in double and 1638435000 in long.

Why sometime it works ? because when timestamp does ends with many 0 maybe you client use the "normal notation" like 1638451878

Anyway, this is too many guess and I would like to be sure so I wait for more input from you.

@sbernard31
Copy link
Contributor

Any news about this ? I'm really curious to know what happens as maybe we should be more flexible here. 🤔

@sonny82
Copy link
Author

sonny82 commented Dec 3, 2021

Not yet. I tried to record the messages using Wireshark during the night to catch this error but system crashed somewhere at midnight due to a lack of memory..
I'll post Wireshark capture as soon as I'll have it.

@sonny82
Copy link
Author

sonny82 commented Dec 3, 2021

Content format is "Content-Format":"application/senml+json".

@sbernard31
Copy link
Contributor

OK so I will try to test how Leshan behave with JSON number exponential notation.

@sbernard31
Copy link
Contributor

I'm able to reproduce it : #1174.

I will try to add a safe conversion in NumberUtils.

@sbernard31
Copy link
Contributor

@sonny82 #1174 should fix your issue.

@sbernard31 sbernard31 added core Impact core of Leshan enhancement Improvement of existing features labels Dec 6, 2021
@sonny82
Copy link
Author

sonny82 commented Dec 6, 2021

Great, thanks.

@sonny82 sonny82 closed this as completed Dec 6, 2021
@sbernard31
Copy link
Contributor

This is not yet available in master
If you can test the PR and tell me if it works for you, this would be great.
If you can not test, let me know this way I don't wait for you before to merge. 😉

@sonny82
Copy link
Author

sonny82 commented Dec 6, 2021

I'm using senml-cbor now. In order to test this properly I would have to let it run for at least one day, but we are solving some other issues now, so I'm not sure when I'll be able to do it. Please, don't wait with the merge (regarding the fact hat you are able to reproduce this issue in a unit test). In any case, I'll do some more testing when I'll have time and I'll also test using senml-json. BR

@sbernard31
Copy link
Contributor

@sonny82 could I ask how you are using Leshan ?
If it makes sense maybe you could consider to comment at : #830

@sonny82
Copy link
Author

sonny82 commented Dec 6, 2021

Sure.
We are working as an outsourcing partner for a company that has an existing generic backend for managing different kinds of devices (electric meters, water meters, heat meters, ect). On this specific project they have to integrate and manage around 13.000 heat meters, where each meter has an attached NB-IoT module that supports LWM2M 1.1.
Meter data will be sent using MQTT-SN protocol, LWM2M (Leshan 2.0) will be used for bootstrapping (LWM2M DM server + MQTT-SN server) and for management of the NB-IoT module (FW upgrades, tracking battery and signal strength information, configuration of readout and report intervals, etc.).
Devices will be battery powered and will use queue mode.

In the first version we will not use Leshan cluster (not sure if it's really needed as the number of devices is not that big).
We will use DTLS. Due to security reasons we are not allowed to use existing Redis or file based security stores.
We will retrieve keys from a secure key store on demand when needed and then use in-memory security and configuration store.
Communication with the backend will be done using the REST API. We developed a simple API using async servlets.
End customer doesn't need GUI (backend has it's own GUI), however, demo-GUI will also be deployed for easier admin-management.
We are not in production yet.
I'll also write a comment to #830.

@sonny82
Copy link
Author

sonny82 commented Dec 8, 2021

hi,

Now we found a similar problem using SENM_CBOR: Can not convert 3000.0 to long safely

[CoapServer(main)#6] ERROR c.n.s.d.s.s.d.ModuleManagerImpl - Endpoint 94193A04070000EA: Read request for path 3 failed with exception: {}
org.eclipse.leshan.core.request.exception.InvalidResponseException: Unable to decode response payload of request [ReadRequest [path=/3 format=SENML_CBOR(112)]] from client [94193A04070000EA]
at org.eclipse.leshan.server.californium.request.LwM2mResponseBuilder.decodeCoapResponse(LwM2mResponseBuilder.java:458)
at org.eclipse.leshan.server.californium.request.LwM2mResponseBuilder.visit(LwM2mResponseBuilder.java:122)
at org.eclipse.leshan.core.request.ReadRequest.accept(ReadRequest.java:187)
at org.eclipse.leshan.server.californium.request.RequestSender$2.buildResponse(RequestSender.java:226)
at org.eclipse.leshan.core.californium.AsyncRequestObserver$1.onResponse(AsyncRequestObserver.java:60)
at org.eclipse.leshan.core.californium.CoapAsyncRequestObserver.onResponse(CoapAsyncRequestObserver.java:99)
at org.eclipse.californium.core.coap.Request.setResponse(Request.java:931)
at org.eclipse.californium.core.server.ServerMessageDeliverer.deliverResponse(ServerMessageDeliverer.java:258)
at org.eclipse.californium.core.network.stack.BaseCoapStack$StackTopAdapter.receiveResponse(BaseCoapStack.java:217)
at org.eclipse.californium.core.network.stack.AbstractLayer.receiveResponse(AbstractLayer.java:89)
at org.eclipse.californium.core.network.stack.ExchangeCleanupLayer.receiveResponse(ExchangeCleanupLayer.java:105)
at org.eclipse.californium.core.network.stack.ObserveLayer.receiveResponse(ObserveLayer.java:139)
at org.eclipse.californium.core.network.stack.BlockwiseLayer.receiveResponse(BlockwiseLayer.java:776)
at org.eclipse.californium.core.network.stack.ReliabilityLayer.receiveResponse(ReliabilityLayer.java:308)
at org.eclipse.californium.core.network.stack.AbstractLayer.receiveResponse(AbstractLayer.java:89)
at org.eclipse.californium.core.network.stack.BaseCoapStack.receiveResponse(BaseCoapStack.java:146)
at org.eclipse.californium.core.network.CoapEndpoint$1.receiveResponse(CoapEndpoint.java:314)
at org.eclipse.californium.core.network.UdpMatcher$4.run(UdpMatcher.java:457)
at org.eclipse.californium.elements.util.SerialExecutor$1.run(SerialExecutor.java:290)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.eclipse.leshan.core.node.codec.CodecException: Invalid content [3000.0] for type INTEGER for path /3/0/7
at org.eclipse.leshan.core.node.codec.senml.LwM2mNodeSenMLDecoder.parseResourceValue(LwM2mNodeSenMLDecoder.java:504)
at org.eclipse.leshan.core.node.codec.senml.LwM2mNodeSenMLDecoder.extractLwM2mResources(LwM2mNodeSenMLDecoder.java:452)
at org.eclipse.leshan.core.node.codec.senml.LwM2mNodeSenMLDecoder.parseRecords(LwM2mNodeSenMLDecoder.java:193)
at org.eclipse.leshan.core.node.codec.senml.LwM2mNodeSenMLDecoder.decode(LwM2mNodeSenMLDecoder.java:97)
at org.eclipse.leshan.core.node.codec.DefaultLwM2mDecoder.decode(DefaultLwM2mDecoder.java:156)
at org.eclipse.leshan.core.node.codec.DefaultLwM2mDecoder.decode(DefaultLwM2mDecoder.java:138)
at org.eclipse.leshan.server.californium.request.LwM2mResponseBuilder.decodeCoapResponse(LwM2mResponseBuilder.java:450)
... 21 common frames omitted
Caused by: java.lang.IllegalStateException: Can not convert 3000.0 to long safely : Unsupported number java.lang.Double
at org.eclipse.leshan.core.util.datatype.NumberUtil.numberToLong(NumberUtil.java:59)
at org.eclipse.leshan.core.node.codec.senml.LwM2mNodeSenMLDecoder.numberToLong(LwM2mNodeSenMLDecoder.java:527)
at org.eclipse.leshan.core.node.codec.senml.LwM2mNodeSenMLDecoder.parseResourceValue(LwM2mNodeSenMLDecoder.java:485)
... 27 common frames omitted

@sbernard31
Copy link
Contributor

sbernard31 commented Dec 8, 2021

#1174 should fix this kind of problem for senML+cbor too. Which version are you using ?

Note that : even if I add this more flexible behavior which tolerates to convert Double in Long safely.
I feel that your device behaves strangely ... maybe you should report it ?

For JSON, there is no clear way to know the type of a number.
A common way is to consider that :

  • 3000.0 is a floating point number (float/double in java)
  • 3E3 is a floating point number (float/double in java)
  • 3000 is a integer (short/integer/long in java) .
    So OK maybe it's OK to return 1.638435E9 insteadof 1638435000 but I feel this is not the best move and so other implementation could have issue too.

For CBOR, I feel this is even more wrong as the LWM2M specify that integer representation must be used for integer.
See http://www.openmobilealliance.org/release/LightweightM2M/V1_1_1-20190617-A/HTML-Version/OMA-TS-LightweightM2M_Core-V1_1_1-20190617-A.html#Table-C-2-Data-Type-Mapping

Reading the specification again, I'm wondering if Leshan should tolerate this by default for senml+cbor. I feel it should not. (and so I will maybe change this again)

@sonny82
Copy link
Author

sonny82 commented Dec 8, 2021

We don't use your fix yet.
Yes, I think I will contact the manufacturer regarding this issue.

@sbernard31
Copy link
Contributor

I create an issue about this : #1176

@sonny82
Copy link
Author

sonny82 commented Dec 14, 2021

I contacted the manufacturer regarding the senml-cbor decoding issue.
They think that the problem might be on the Leshan side in the following bits of code:

https://github.com/eclipse/leshan/blob/da6d32e5079e393e58cba6379223d7d721e6e549/leshan-core/src/main/java/org/eclipse/leshan/senml/cbor/upokecenter/SenMLCborPackSerDes.java#L83

Their comment: "Regardless of BigNum being Integer, it is set as float."

@sbernard31
Copy link
Contributor

sbernard31 commented Dec 14, 2021

I think their are wrong. 🤔

  1. If this was true, why sometime it works and sometime it does not.
  2. record.setFloatValue method name is misleading but we don't set a float here but a Number which could be an integer. (it should probably be renamed)
  3. there is a simple way to verify. Instead of looking at Leshan code, just capture the traffic and look at the payload send by the device if this is a float number at cbor point of view. I feel this is an issue at client side. For now without more information, I bet on this.

@sonny82
Copy link
Author

sonny82 commented Jan 4, 2022

Hi,

I'm attaching Wireshark captures for both senml+json and senml+cbor (I'm sorry it took so long).
It fails when the integer value I'm trying to read is a thousand-round value (in this case 3000).
If I change the value to 2999 decoding works.
Note: this has been reproduced with the 2.0.0-M5 version (#1174 not applied yet).

If I read these captures correctly I see that also if I set content format in request to senml+cbor, I'm getting senml+json response..

wireshark_captures.zip

@sbernard31
Copy link
Contributor

About the float number SenML-JSON issue :
Captures confirm what I explained in : #1170 (comment)
3e3 notation is used and so is considered as a Float by jackson.
Even if I don't recommend to use exponential notation in JSON for Integer. We can not consider it as a bug at client side.
#1174 should fix the problem at server side. So I think we are good here.

Not related to this issue but looking your capture, I feel there is maybe some issue at client side :

1) too many answer about DTLS retransmission ?
See failure_senml-json.pcapng :

1	0.000000	2022-01-04 07:27:36,129189	10.1.0.5	95.159.202.153	5683	9180	CoAP	67	CON, MID:49225, GET, TKN:00 fc ee 4e 86 1a cb 5c, /33906/0/35	Error	2022-01-04 07:27:36,129189
2	2.030676	2022-01-04 07:27:38,159865	10.1.0.5	95.159.202.153	5683	9180	CoAP	67	CON, MID:49225, GET, TKN:00 fc ee 4e 86 1a cb 5c, /33906/0/35 [Retransmission]	Note	2022-01-04 07:27:38,159865
3	6.092597	2022-01-04 07:27:42,221786	10.1.0.5	95.159.202.153	5683	9180	CoAP	67	CON, MID:49225, GET, TKN:00 fc ee 4e 86 1a cb 5c, /33906/0/35 [Retransmission]	Note	2022-01-04 07:27:42,221786
4	14.213182	2022-01-04 07:27:50,342371	10.1.0.5	95.159.202.153	5683	9180	CoAP	67	CON, MID:49225, GET, TKN:00 fc ee 4e 86 1a cb 5c, /33906/0/35 [Retransmission]	Note	2022-01-04 07:27:50,342371
5	21.520511	2022-01-04 07:27:57,649700	95.159.202.153	10.1.0.5	9180	5683	CoAP	86	ACK, MID:49225, 2.05 Content, TKN:00 fc ee 4e 86 1a cb 5c, /33906/0/35		2022-01-04 07:27:57,649700
6	21.534478	2022-01-04 07:27:57,663667	95.159.202.153	10.1.0.5	9180	5683	CoAP	86	ACK, MID:49225, 2.05 Content, TKN:00 fc ee 4e 86 1a cb 5c, /33906/0/35		2022-01-04 07:27:57,663667
7	21.551604	2022-01-04 07:27:57,680793	95.159.202.153	10.1.0.5	9180	5683	CoAP	86	ACK, MID:49225, 2.05 Content, TKN:00 fc ee 4e 86 1a cb 5c, /33906/0/35		2022-01-04 07:27:57,680793
8	22.800684	2022-01-04 07:27:58,929873	95.159.202.153	10.1.0.5	9180	5683	CoAP	86	ACK, MID:49225, 2.05 Content, TKN:00 fc ee 4e 86 1a cb 5c, /33906/0/35		2022-01-04 07:27:58,929873

There is a lot of retransmission because it seems that device takes long time too answer. (1 request + 3 retransmission)
Then we get 4 answers in a short frame time.
Maybe this is a network issue but here I would expect only 1 answer from the device (maybe 2 because of network latency) but 4 sounds a bit strange.
(Anyway this is not really a bug at client side but something which could maybe be improved)

2) request where senml-cbor is asked and senml-json is returned
I think this behavior does not follow the LWM2M specification.
The specification says :

An LwM2M Server data request MAY contain an option specifying the data formats the Server would prefer to receive for the payload; if this data format is not accepted by the LwM2M Client, the request is rejected;

And

Operations Available CoAP Response Codes Reason Phrase
Read 4.06 Not Acceptable None of the preferred Content-Formats can be returned

(Table: 6.7.-3 Response Codes: Device Management and Service Enablement Interface)

So either the device should return payload using senml+cbor or it should return 4.06 error.
(At least this is my understanding)

sbernard31 added a commit that referenced this issue Jan 6, 2022
@sonny82
Copy link
Author

sonny82 commented Jan 10, 2022

Thank you.
I can confirm that applying #1174 solves this issue.

@sbernard31
Copy link
Contributor

Thx again for taking time to reporting this. That's help a lot 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Impact core of Leshan enhancement Improvement of existing features
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants