Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] An MQTT OTA request made by the library to AWS using CBOR encoding results in the Invalid response on the /rejected topic #490

Closed
arkhipenko opened this issue Jun 2, 2023 · 25 comments
Labels
bug Something isn't working

Comments

@arkhipenko
Copy link

arkhipenko commented Jun 2, 2023

Describe the bug
An MQTT OTA request made by the library to AWS usinf CBOR encoding results in the Invalid response on the /rejected topic

Host

  • Host OS: esp32
  • Version: latest library code as is on github

To Reproduce

  • Initiate OTA update
  • Run on esp32 and observe all CBOR requests failing

Expected behavior
I was expecting to start receiving OTA file chunks on MQTT topics.

Additional context
Using the latest tinycbor version 0.6.0 from intel.

Initial request is made successfully

00:00:00:05.252 T: prvMqttPublish()
{"clientToken":":840d8ee419b8"}
7b 22 63 6c 69 65 6e 74 54 6f 6b 65 6e 22 3a 22 3a 38 34 30 64 38 65 65 34 31 39 62 38 22 7d 

AWS IOT responds with the job document as expected and a few ota methods are invoked as expected:

00:00:00:05.345 V: subscribeCB: Data
	topic: $aws/things/840d8ee419b8/jobs/$next/get/accepted
	payload: {"clientToken":":840d8ee419b8","timestamp":1685747465,"execution":{"jobId":"AFR_OTA-OTA_TEST-230602-1828","status":"QUEUED","queuedAt":1685747355,"lastUpdatedAt":1685747355,"versionNumber":1,"executionNumber":1,"jobDocument":{"afr_ota":{"protocols":["MQTT"],"streamname":"AFR_OTA-93e1bb19-0dcd-4aeb-9946-5df0260dc063","files":[{"filepath":"/ota/file","filesize":127762,"fileid":0,"certfile":"/pem","fileType":0,"sig-sha1-rsa":"na"}]}}}}
00:00:00:05.397 V: subscribeCB: handling JOB_ACCEPTED_RESPONSE_TOPIC
00:00:00:05.407 T: otaPal_GetPlatformImageState()
00:00:00:05.409 T: otaPal_CreateFileForRx()
00:00:01:26.993 T: otaPal_GetPlatformImageState()
00:00:01:27.000 V: prvMqttSubscribe: subscribed to topic $aws/things/840d8ee419b8/streams/AFR_OTA-93e1bb19-0dcd-4aeb-9946-5df0260dc063/data/cbor

then a request for the first chunk is published:

00:00:01:27.005 T: prvMqttPublish()
¦accrdyaf al�� ao abDÿÿÿÿan�
a6 61 63 63 72 64 79 61 66 00 61 6c 19 10 00 61 6f 00 61 62 44 ff ff ff ff 61 6e 01 
00:00:01:27.028 V: publishMessage: topic   : $aws/things/840d8ee419b8/streams/AFR_OTA-93e1bb19-0dcd-4aeb-9946-5df0260dc063/get/cbor

this message decoded is:
{"c": "rdy", "f": 0, "l": 4096, "o": 0, "b": h'FFFFFFFF', "n": 1}
according to this: https://cbor.me/

A6             # map(6)
   61          # text(1)
      63       # "c"
   63          # text(3)
      726479   # "rdy"
   61          # text(1)
      66       # "f"
   00          # unsigned(0)
   61          # text(1)
      6C       # "l"
   19 1000     # unsigned(4096)
   61          # text(1)
      6F       # "o"
   00          # unsigned(0)
   61          # text(1)
      62       # "b"
   44          # bytes(4)
      FFFFFFFF # "\xFF\xFF\xFF\xFF"
   61          # text(1)
      6E       # "n"
   01          # unsigned(1)

the response is:

00:00:01:27.149 V: subscribeCB: Message arrived
00:00:01:27.150 V: subscribeCB: Data
	topic: $aws/things/840d8ee419b8/streams/AFR_OTA-93e1bb19-0dcd-4aeb-9946-5df0260dc063/rejected/cbor
	payload: ¿dcodekInvalidCboraokInvalidCborgmessagetInvalid CBOR messageamtInvalid CBOR messageÿ

or 

{"code": "InvalidCbor", "o": "InvalidCbor", "message": "Invalid CBOR message", "m": "Invalid CBOR message"}

BF                                      # map(*)
   64                                   # text(4)
      636F6465                          # "code"
   6B                                   # text(11)
      496E76616C696443626F72            # "InvalidCbor"
   61                                   # text(1)
      6F                                # "o"
   6B                                   # text(11)
      496E76616C696443626F72            # "InvalidCbor"
   67                                   # text(7)
      6D657373616765                    # "message"
   74                                   # text(20)
      496E76616C69642043424F52206D657373616765 # "Invalid CBOR message"
   61                                   # text(1)
      6D                                # "m"
   74                                   # text(20)
      496E76616C69642043424F52206D657373616765 # "Invalid CBOR message"
   FF                                   # primitive(*)

This is consistent with the failure I reported for the MQTT download agent library here:
aws-samples/aws-iot-mqtt-download-agent#3

Can someone please look into this? The CBOR message seems to produce the bitmap that is rejected by the backend!

@arkhipenko arkhipenko added the bug Something isn't working label Jun 2, 2023
@arkhipenko arkhipenko changed the title [BUG] [BUG] An MQTT OTA request made by the library to AWS usinf CBOR encoding results in the Invalid response on the /rejected topic Jun 2, 2023
@arkhipenko arkhipenko changed the title [BUG] An MQTT OTA request made by the library to AWS usinf CBOR encoding results in the Invalid response on the /rejected topic [BUG] An MQTT OTA request made by the library to AWS using CBOR encoding results in the Invalid response on the /rejected topic Jun 2, 2023
@gmtt
Copy link

gmtt commented Jun 4, 2023

Thank you for bringing this to our attention! We will be looking further into this issue as soon as we can, thank you for your patience

@arkhipenko
Copy link
Author

@gmtt - thank you - do you have an estimate of when we can expect some answer/resolution? We are in the middle of implementation and were hoping that the AWS library will work with the AWS backend "as-is". This is quite unexpected honestly.
We need to make a decision on whether to continue using the AWS OTA library or move to some other solution (which would be very unfortunate to be honest).

@gmtt
Copy link

gmtt commented Jun 7, 2023

Hi @arkhipenko, the encoded hex appears to be correct. That makes us suspect that the issue may lie in the cloud part. Could you please enable debug logging to obtain more information about this? Additionally, could you try using the tinycbor version that we tested here: https://github.com/aws/ota-for-aws-iot-embedded-sdk/tree/main/source/dependency/3rdparty instead of the latest version? As we haven't tested the latest version of this library, it's possible that an incompatible library is causing the problem.

@arkhipenko
Copy link
Author

Hi @gmtt
Thank you for getting back to me.
Based on the observed behavior, I do also suspect that the problem is with the cloud end of the process.

Meanwhile:
I have downgraded CBOR library to the version you advised me to work with (0.5.4)
I have enabled all logging and attached is the interaction log from the last session (a few minutes ago) with the same unfortunate outcome.
ota-poc.log

@arkhipenko
Copy link
Author

Hi @gmtt
Just checking about the status of this - we are on hold until we hear back from you as to whether to expect a fix and when.
Our assumption with choosing of this library was that it should "just work" b/c it is a library provided by AWS communicating with the AWS backend. It is quite surprising to discover that this is not the case. I would much prefer it was a bug in our code since that is easy to fix. Please advise!

@gmtt
Copy link

gmtt commented Jun 9, 2023

Hi @arkhipenko
Sorry for the delayed response. We are currently investigating the issue and communicating with the cloud team to identify the problem. Please note that it may take some time due to our bandwidth limitations. We appreciate your patience.

@gmtt
Copy link

gmtt commented Jun 14, 2023

Hi @arkhipenko
We have used our other tool to perform the same task and encountered the same error, indicating a strong likelihood that the problem lies in the cloud component. We have already informed our cloud team about the issue and requested an investigation. We will provide you with an update as soon as we have identified the cause.

@arkhipenko
Copy link
Author

Thank you @gmtt
What is surprising to me is that there were no 1000s of reports of MQTT based OTA not working. I was expecting this to be used a lot!

@AniruddhaKanhere
Copy link
Member

Hello @arkhipenko,

Firstly, apologies that it is taking a bit longer than expected.

I tried with the older release version of OTA library and my firmware upgrade worked as expected.
Can you also try using the last released stable version of the library as here: https://github.com/aws/ota-for-aws-iot-embedded-sdk/releases/tag/v3.4.0?

We are working on figuring out what is causing the issue you are seeing.
This experiment would let us know whether there is anything wrong in the configuration that you are using to build/compile the library.

Thank you,
Aniruddha

@arkhipenko
Copy link
Author

Hi @AniruddhaKanhere - thank you for getting back to me.

I tried with the older release version of OTA library and my firmware upgrade worked as expected.

Could you please let me know which version of the library did work? I will try that as well,
So far, using v3.4.0 produced exact same result - Invalid CBOR

=~=~=~=~=~=~=~=~=~=~=~= PuTTY log 2023.06.16 09:26:36 =~=~=~=~=~=~=~=~=~=~=~=
ets Jun  8 2016 00:22:57

rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0030,len:1184
load:0x40078000,len:13220
ho 0 tail 12 room 4
load:0x40080400,len:3028
entry 0x400805e4
E (373) esp_core_dump_flash: core dump partition found!
E (373) esp_core_dump_flash: No core dump partition found!


ZEF MQTT FD POC
Version: 0.1.1
ESP Chip ID: esp32-840d8ee419b8
In case of deep sleep, reset was not caused by exit from deep sleep

00:00:00:01.496 T: setupLogging()
00:00:00:01.497 T: setup()
00:00:00:01.497 N: setup: wake up reason: 0
00:00:00:01.498 V: after: setupLogging
00:00:00:01.508 V: total heap = 274528
00:00:00:01.509 V: free  heap = 246120
00:00:00:01.509 V: lfree heap = 240308
00:00:00:01.509 V: max   heap = 110580
00:00:00:01.520 T: setupFS()
00:00:00:01.531 V: Found certificate partition
00:00:00:01.537 V: Certs partition address mapped to 3F430000
00:00:00:01.538 T: setupOTA()
00:00:00:01.538 V: setupOTA: appVersion = zef-0.1.1
00:00:00:01.539 V: after: setupOTA
00:00:00:01.549 V: total heap = 274408
00:00:00:01.549 V: free  heap = 241740
00:00:00:01.550 V: lfree heap = 235976
00:00:00:01.560 V: max   heap = 110580
00:00:00:01.561 T: setupTS()
00:00:00:01.561 T: tsCB()
00:00:00:01.561 V: Created TS job
00:00:00:01.572 V: after: setupTS
00:00:00:01.572 V: total heap = 274408
00:00:00:01.573 V: free  heap = 241740
00:00:00:01.573 V: lfree heap = 235976
00:00:00:01.584 V: max   heap = 110580
00:00:00:01.584 T: setupWiFi()
00:00:00:01.584 V: after: setupWiFi
00:00:00:01.595 V: total heap = 274400
00:00:00:01.595 V: free  heap = 241724
00:00:00:01.595 V: lfree heap = 235880
00:00:00:01.596 V: max   heap = 110580
00:00:00:01.606 T: setupMQTT()
00:00:00:01.607 V: setupMQTT: clientID: 840d8ee419b8
00:00:00:01.607 T: loadMQTTCertificates()
00:00:00:01.618 V: loadMQTTCertificates: Loading Amazon Root CA
00:00:00:01.619 T: populateFlash()
00:00:00:01.629 V: loadMQTTCertificates: Loading Amazon OTA Root CA
00:00:00:01.630 T: populateFlash()
00:00:00:01.640 V: loadMQTTCertificates: Loading thing certificate
00:00:00:01.641 T: populateFlash()
00:00:00:01.651 V: loadMQTTCertificates: Loading thing private key
00:00:00:01.652 T: populateFlash()
00:00:00:01.662 T: tsMqtt()
00:00:00:01.663 T: startMqttClient()
00:00:00:01.663 V: tsMQTT suspended
00:00:00:01.664 V: after: setupMQTT
00:00:00:01.664 V: total heap = 274344
00:00:00:01.674 V: free  heap = 241544
00:00:00:01.675 V: lfree heap = 235436
00:00:00:01.675 V: max   heap = 110580
00:00:00:01.686 N: startOTA()
00:00:00:01.686 T: tsOTA()
00:00:00:01.686 T: prvSetOtaInterfaces()
00:00:00:01.687 V: after: startOTA
00:00:00:01.697 V: total heap = 274296
00:00:00:01.698 V: free  heap = 232892
00:00:00:01.698 V: lfree heap = 227128
00:00:00:01.709 V: max   heap = 110580
00:00:00:01.709 T: setupCore()
00:00:00:01.709 T: startWiFi()
00:00:00:01.710 N: resumed MQTT job
00:00:00:01.721 V: tsMQTT resumed
00:00:00:01.724 T: startMqttClient()
00:00:00:01.730 T: WiFiEvent()
00:00:00:01.730 T: SYSTEM_EVENT_WIFI_READY
00:00:00:01.821 T: WiFiEvent()
00:00:00:01.821 T: SYSTEM_EVENT_STA_START
00:00:00:02.032 V: after: setupCore
00:00:00:02.032 V: total heap = 271928
00:00:00:02.032 V: free  heap = 186464
00:00:00:02.033 V: lfree heap = 184156
00:00:00:02.043 V: max   heap = 110580
00:00:00:02.044 T: setup: Setup complete
00:00:00:02.727 T: startMqttClient()
00:00:00:03.231 T: WiFiEvent()
00:00:00:03.231 T: SYSTEM_EVENT_STA_CONNECTED
00:00:00:03.242 T: WiFiEvent()
00:00:00:03.242 V: Obtained IP address: 192.168.122.219
00:00:00:03.727 T: startMqttClient()
00:00:00:03.728 V: tsMqtt: Starting esp32 mqtt client with:
00:00:00:03.729 V: tsMqtt: mqtt_cfg.uri = mqtts://a255ntx17b1o8v-ats.iot.us-east-1.amazonaws.com
00:00:00:03.740 V: tsMqtt: mqtt_cfg.client_id = 840d8ee419b8
00:00:00:03.741 V: MQTT_EVENT_OTHER
00:00:00:05.192 N: MQTT_EVENT_CONNECTED
00:00:00:05.193 T: mqttSubscribe()
00:00:00:05.200 V:  Received: 0   Queued: 0   Processed: 0   Dropped: 0
00:00:00:05.201 T: prvOTAAgentTask()
00:00:00:05.202 T: otaPal_GetPlatformImageState()
00:00:00:05.203 T: prvMqttSubscribe()
00:00:00:05.218 V: prvMqttSubscribe: subscribed to topic $aws/things/840d8ee419b8/jobs/notify-next
00:00:00:05.220 T: prvMqttPublish()
{"clientToken":":840d8ee419b8"}
7b 22 63 6c 69 65 6e 74 54 6f 6b 65 6e 22 3a 22 3a 38 34 30 64 38 65 65 34 31 39 62 38 22 7d 
00:00:00:05.231 T: publishMessage()
00:00:00:05.244 V: publishMessage: Queueing for publishing:
00:00:00:05.246 V: publishMessage: topic   : $aws/things/840d8ee419b8/jobs/$next/get
00:00:00:05.257 V: tsMqtt: Publishing to broker:
00:00:00:05.257 V: tsMqtt: topic   : $aws/things/840d8ee419b8/jobs/$next/get
00:00:00:05.271 V: tsMqtt: published successfully!
00:00:00:05.284 V: MQTT_EVENT_SUBSCRIBED
00:00:00:05.308 V: MQTT_EVENT_DATA
00:00:00:05.308 V: mqtt_event_handler: using app handler
00:00:00:05.309 T: subscribeCB()
00:00:00:05.309 V: Lengths - topic: 48, payload: 436
00:00:00:05.320 V: subscribeCB: Message arrived
00:00:00:05.321 V: subscribeCB: Data
topic: $aws/things/840d8ee419b8/jobs/$next/get/accepted
payload: {"clientToken":":840d8ee419b8","timestamp":1686922001,"execution":{"jobId":"AFR_OTA-OTA_TEST-230602-2229","status":"QUEUED","queuedAt":1685759430,"lastUpdatedAt":1685759430,"versionNumber":1,"executionNumber":1,"jobDocument":{"afr_ota":{"protocols":["MQTT"],"streamname":"AFR_OTA-2ee1fa35-479b-4210-8b1b-547f8765a3dd","files":[{"filepath":"/ota/file","filesize":127762,"fileid":0,"certfile":"/pem","fileType":0,"sig-sha1-rsa":"na"}]}}}}
7b 22 63 6c 69 65 6e 74 54 6f 6b 65 6e 22 3a 22 3a 38 34 30 64 38 65 65 34 31 39 62 38 22 2c 22 74 69 6d 65 73 74 61 6d 70 22 3a 31 36 38 36 39 32 32 30 30 31 2c 22 65 78 65 63 75 74 69 6f 6e 22 3a 7b 22 6a 6f 62 49 64 22 3a 22 41 46 52 5f 4f 54 41 2d 4f 54 41 5f 54 45 53 54 2d 32 33 30 36 30 32 2d 32 32 32 39 22 2c 22 73 74 61 74 75 73 22 3a 22 51 55 45 55 45 44 22 2c 22 71 75 65 75 65 64 41 74 22 3a 31 36 38 35 37 35 39 34 33 30 2c 22 6c 61 73 74 55 70 64 61 74 65 64 41 74 22 3a 31 36 38 35 37 35 39 34 33 30 2c 22 76 65 72 73 69 6f 6e 4e 75 6d 62 65 72 22 3a 31 2c 22 65 78 65 63 75 74 69 6f 6e 4e 75 6d 62 65 72 22 3a 31 2c 22 6a 6f 62 44 6f 63 75 6d 65 6e 74 22 3a 7b 22 61 66 72 5f 6f 74 61 22 3a 7b 22 70 72 6f 74 6f 63 6f 6c 73 22 3a 5b 22 4d 51 54 54 22 5d 2c 22 73 74 72 65 61 6d 6e 61 6d 65 22 3a 22 41 46 52 5f 4f 54 41 2d 32 65 65 31 66 61 33 35 2d 34 37 39 62 2d 34 32 31 30 2d 38 62 31 62 2d 35 34 37 66 38 37 36 35 61 33 64 64 22 2c 22 66 69 6c 65 73 22 3a 5b 7b 22 66 69 6c 65 70 61 74 68 22 3a 22 2f 6f 74 61 2f 66 69 6c 65 22 2c 22 66 69 6c 65 73 69 7a 65 22 3a 31 32 37 37 36 32 2c 22 66 69 6c 65 69 64 22 3a 30 2c 22 63 65 72 74 66 69 6c 65 22 3a 22 2f 70 65 6d 22 2c 22 66 69 6c 65 54 79 70 65 22 3a 30 2c 22 73 69 67 2d 73 68 61 31 2d 72 73 61 22 3a 22 6e 61 22 7d 5d 7d 7d 7d 7d 
00:00:00:05.486 V: subscribeCB: handling JOB_ACCEPTED_RESPONSE_TOPIC
00:00:00:05.496 T: otaPal_GetPlatformImageState()
00:00:00:05.498 T: otaPal_CreateFileForRx()
00:00:00:06.201 V:  Received: 0   Queued: 0   Processed: 0   Dropped: 0
00:00:00:07.202 V:  Received: 0   Queued: 0   Processed: 0   Dropped: 0
00:00:00:08.203 V:  Received: 0   Queued: 0   Processed: 0   Dropped: 0
00:00:00:09.204 V:  Received: 0   Queued: 0   Processed: 0   Dropped: 0
00:00:00:10.205 V:  Received: 0   Queued: 0   Processed: 0   Dropped: 0
00:00:00:11.229 V:  Received: 0   Queued: 0   Processed: 0   Dropped: 0
00:00:00:12.230 V:  Received: 0   Queued: 0   Processed: 0   Dropped: 0
00:00:00:13.231 V:  Received: 0   Queued: 0   Processed: 0   Dropped: 0
00:00:00:14.232 V:  Received: 0   Queued: 0   Processed: 0   Dropped: 0
00:00:00:15.233 V:  Received: 0   Queued: 0   Processed: 0   Dropped: 0
00:00:00:16.234 V:  Received: 0   Queued: 0   Processed: 0   Dropped: 0
00:00:00:17.235 V:  Received: 0   Queued: 0   Processed: 0   Dropped: 0
00:00:00:18.237 V:  Received: 0   Queued: 0   Processed: 0   Dropped: 0
00:00:00:19.238 V:  Received: 0   Queued: 0   Processed: 0   Dropped: 0
00:00:00:20.239 V:  Received: 0   Queued: 0   Processed: 0   Dropped: 0
00:00:00:21.260 V:  Received: 0   Queued: 0   Processed: 0   Dropped: 0
00:00:00:22.261 V:  Received: 0   Queued: 0   Processed: 0   Dropped: 0
00:00:00:23.262 V:  Received: 0   Queued: 0   Processed: 0   Dropped: 0
00:00:00:23.725 T: otaPal_GetPlatformImageState()
00:00:00:23.726 T: prvMqttSubscribe()
00:00:00:23.732 V: prvMqttSubscribe: subscribed to topic $aws/things/840d8ee419b8/streams/AFR_OTA-2ee1fa35-479b-4210-8b1b-547f8765a3dd/data/cbor
00:00:00:23.737 T: prvMqttPublish()
accrdyafalaoabDan
a6 61 63 63 72 64 79 61 66 00 61 6c 19 10 00 61 6f 00 61 62 44 ff ff ff ff 61 6e 01 
00:00:00:23.750 T: publishMessage()
00:00:00:23.761 V: publishMessage: Queueing for publishing:
00:00:00:23.763 V: publishMessage: topic   : $aws/things/840d8ee419b8/streams/AFR_OTA-2ee1fa35-479b-4210-8b1b-547f8765a3dd/get/cbor
00:00:00:23.774 V: tsMqtt: Publishing to broker:
00:00:00:23.774 V: tsMqtt: topic   : $aws/things/840d8ee419b8/streams/AFR_OTA-2ee1fa35-479b-4210-8b1b-547f8765a3dd/get/cbor
00:00:00:23.787 V: tsMqtt: published successfully!
00:00:00:23.877 V: MQTT_EVENT_SUBSCRIBED
00:00:00:23.925 V: MQTT_EVENT_DATA
00:00:00:23.925 V: mqtt_event_handler: using app handler
00:00:00:23.926 T: subscribeCB()
00:00:00:23.926 V: Lengths - topic: 91, payload: 85
00:00:00:23.937 V: subscribeCB: Message arrived
00:00:00:23.938 V: subscribeCB: Data
topic: $aws/things/840d8ee419b8/streams/AFR_OTA-2ee1fa35-479b-4210-8b1b-547f8765a3dd/rejected/cbor
payload: dcodekInvalidCboraokInvalidCborgmessagetInvalid CBOR messageamtInvalid CBOR message
bf 64 63 6f 64 65 6b 49 6e 76 61 6c 69 64 43 62 6f 72 61 6f 6b 49 6e 76 61 6c 69 64 43 62 6f 72 67 6d 65 73 73 61 67 65 74 49 6e 76 61 6c 69 64 20 43 42 4f 52 20 6d 65 73 73 61 67 65 61 6d 74 49 6e 76 61 6c 69 64 20 43 42 4f 52 20 6d 65 73 73 61 67 65 ff 
00:00:00:23.982 T: handleMessage()
00:00:00:24.263 V:  Received: 0   Queued: 0   Processed: 0   Dropped: 0
00:00:00:25.264 V:  Received: 0   Queued: 0   Processed: 0   Dropped: 0
00:00:00:26.265 V:  Received: 0   Queued: 0   Processed: 0   Dropped: 0
00:00:00:27.266 V:  Received: 0   Queued: 0   Processed: 0   Dropped: 0
00:00:00:28.267 V:  Received: 0   Queued: 0   Processed: 0   Dropped: 0
00:00:00:29.268 V:  Received: 0   Queued: 0   Processed: 0   Dropped: 0
00:00:00:30.269 V:  Received: 0   Queued: 0   Processed: 0   Dropped: 0
00:00:00:31.270 V:  Received: 0   Queued: 0   Processed: 0   Dropped: 0
00:00:00:32.271 V:  Received: 0   Queued: 0   Processed: 0   Dropped: 0
00:00:00:33.272 V:  Received: 0   Queued: 0   Processed: 0   Dropped: 0
00:00:00:33.738 T: prvMqttPublish()
accrdyafalaoabDan
a6 61 63 63 72 64 79 61 66 00 61 6c 19 10 00 61 6f 00 61 62 44 ff ff ff ff 61 6e 01 
00:00:00:33.740 T: publishMessage()
00:00:00:33.752 V: publishMessage: Queueing for publishing:
00:00:00:33.754 V: publishMessage: topic   : $aws/things/840d8ee419b8/streams/AFR_OTA-2ee1fa35-479b-4210-8b1b-547f8765a3dd/get/cbor
00:00:00:33.765 V: tsMqtt: Publishing to broker:
00:00:00:33.765 V: tsMqtt: topic   : $aws/things/840d8ee419b8/streams/AFR_OTA-2ee1fa35-479b-4210-8b1b-547f8765a3dd/get/cbor
00:00:00:33.779 V: tsMqtt: published successfully!
00:00:00:33.914 V: MQTT_EVENT_DATA
00:00:00:33.915 V: mqtt_event_handler: using app handler
00:00:00:33.916 T: subscribeCB()
00:00:00:33.916 V: Lengths - topic: 91, payload: 85
00:00:00:33.927 V: subscribeCB: Message arrived
00:00:00:33.928 V: subscribeCB: Data
topic: $aws/things/840d8ee419b8/streams/AFR_OTA-2ee1fa35-479b-4210-8b1b-547f8765a3dd/rejected/cbor
payload: dcodekInvalidCboraokInvalidCborgmessagetInvalid CBOR messageamtInvalid CBOR message
bf 64 63 6f 64 65 6b 49 6e 76 61 6c 69 64 43 62 6f 72 61 6f 6b 49 6e 76 61 6c 69 64 43 62 6f 72 67 6d 65 73 73 61 67 65 74 49 6e 76 61 6c 69 64 20 43 42 4f 52 20 6d 65 73 73 61 67 65 61 6d 74 49 6e 76 61 6c 69 64 20 43 42 4f 52 20 6d 65 73 73 61 67 65 ff 
00:00:00:33.972 T: handleMessage()
00:00:00:34.273 V:  Received: 0   Queued: 0   Processed: 0   Dropped: 0
00:00:00:35.274 V:  Received: 0   Queued: 0   Processed: 0   Dropped: 0
00:00:00:36.275 V:  Received: 0   Queued: 0   Processed: 0   Dropped: 0
00:00:00:37.276 V:  Received: 0   Queued: 0   Processed: 0   Dropped: 0

@arkhipenko
Copy link
Author

I tried compiling with releases 2.0.0 and 3.0.0, but there are too many binary API mismatches - it would take a long time to adjust.

@AniruddhaKanhere
Copy link
Member

Hey @arkhipenko, I was trying the projects in one of our reference integration projects in this repository: https://github.com/FreeRTOS/iot-reference-nxp-rt1060

It uses the OTA lib submoduled here: https://github.com/FreeRTOS/iot-reference-nxp-rt1060/tree/main/Middleware/AWS. The version in use is v3.4.0.

One of my colleagues also mentioned that it worked for them when they tried it on Tuesday (13th June).

That being said, we are looking into the cloud side of things as well to make sure that there is no breaking change there.

Thank you for your patience.

@arkhipenko
Copy link
Author

@AniruddhaKanhere - could you please confirm that your colleague tested MQTT-based OTA, and not HTTPS-based (which works fine for me as well)? I struggle to understand how the same code base produces different results - I do not modify the library code in any way!

I would also very much like to see the CBOR message sent to the cloud from that test on June 13th, and the response from the cloud. Would your colleague have those by any chance?
Thank you!

@arkhipenko
Copy link
Author

@AniruddhaKanhere - just tried the exact version that you mentioned with the same result:

00:00:01:27.954 T: prvMqttPublish()
accrdyafalaoabDan
a6 61 63 63 72 64 79 61 66 00 61 6c 19 10 00 61 6f 00 61 62 44 ff ff ff ff 61 6e 01 

followed by the same

00:00:01:28.056 V: subscribeCB: Message arrived
00:00:01:28.057 V: subscribeCB: Data
topic: $aws/things/840d8ee419b8/streams/AFR_OTA-2ee1fa35-479b-4210-8b1b-547f8765a3dd/rejected/cbor
payload: dcodekInvalidCboraokInvalidCborgmessagetInvalid CBOR messageamtInvalid CBOR message
bf 64 63 6f 64 65 6b 49 6e 76 61 6c 69 64 43 62 6f 72 61 6f 6b 49 6e 76 61 6c 69 64 43 62 6f 72 67 6d 65 73 73 61 67 65 74 49 6e 76 61 6c 69 64 20 43 42 4f 52 20 6d 65 73 73 61 67 65 61 6d 74 49 6e 76 61 6c 69 64 20 43 42 4f 52 20 6d 65 73 73 61 67 65 ff 
00:00:01:28.101 T: handleMessage()

@AniruddhaKanhere
Copy link
Member

Hey @arkhipenko,

Yes, my colleague did test MQTT based OTA. And it succeeded.

However, the following is the CBOR that his OTA sent (just maybe a configuration issue). Not sure why are there so many Fs in the bitmap though.

A6                                      # map(6)
   61                                   # text(1)
      63                                # "c"
   63                                   # text(3)
      726479                            # "rdy"
   61                                   # text(1)
      66                                # "f"
   00                                   # unsigned(0)
   61                                   # text(1)
      6C                                # "l"
   19 1000                              # unsigned(4096)
   61                                   # text(1)
      6F                                # "o"
   00                                   # unsigned(0)
   61                                   # text(1)
      62                                # "b"
   54                                   # bytes(20)
      000000FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF7F # "\u0000\u0000\u0000\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\u007F"
   61                                   # text(1)
      6E                                # "n"
   04                                   # unsigned(4)

That being said, we also sent a valid CBOR to the IoT core and it was rejected. That is what is causing so much confusion to me. I am not sure how one message succeeds and the other doesn't. We are looking into the cloud side as well.

Again, thank you for your patience Anatoli as we work on finding the root cause.

Cheers,
Aniruddha

@arkhipenko
Copy link
Author

arkhipenko commented Jun 20, 2023

@AniruddhaKanhere - it is quite interesting.
I had another look at the payload and as far as I can understand the bitmapping, every bit of the bitmap represents a block of data. What was your colleague's file size?
According to my stream definition: [{"filepath":"/ota/file","filesize":127762 the file size is 127762 bytes.
That size represents 31.19 blocks 4096 bytes each., so a 32 bit bitmap (or 4 bytes) should be sufficient, which explains the number of 'F's in my 'b' field CBOR message:

   44          # bytes(4)
      FFFFFFFF # "\xFF\xFF\xFF\xFF"

a 20-byte 'b' field would suggest a file 655360 or so bytes long.
Regardless, it is still perplexing why his message is accepted and mine is rejected.

@AniruddhaKanhere
Copy link
Member

AniruddhaKanhere commented Jun 21, 2023

I had another look at the payload and as far as I can understand the bitmapping, every bit of the bitmap represents a block of data. What was your colleague's file size?

Yes, that is what the bitmap means. But I don't think that the size of the file should matter because we are still requesting the file and while doing that, we do not know the size of the new file. For all it matters, the new file can be double the size and may not fit in the bitmap that we sent at all. I am not sure though so don't quote me on this :). But, yes, the image is quite big as you predicted: 34 28770 [OTAAgent] [INFO] Extracted parameter: [key: value]=[filesize: 647324].

That being said, the bitmap has a 7F at the end in his case which I find a bit odd.

By the way, we were able to figure out that there is nothing wrong with the cloud side of things. We were just sending wrong CBOR. But this is good news because it now pinpoints that there might be something wrong on the application side of things. And we can figure out what is wrong in your application. :)

Thanks,
Aniruddha

@arkhipenko
Copy link
Author

When you say "we are sending the wrong CBOR", what does it mean exactly?
What would the right CBOR be for that scenario? (Given that you have the exact file stream information).
@AniruddhaKanhere

@AniruddhaKanhere
Copy link
Member

Ah yes, let me clarify.

If we were supposed to send A6 F1 (assume it to be valid CBOR), then we were actually sending the binary for ASCII A (decimal 65) then binary for ASCII 6(decimal 54) then binary for ASCII F(decimal 70) and then binary for ASCII 1 (decimal 49).
We were supposed to send 0xA6 (decimal 166) and 0xF1 (decimal 241).

This happened because we assumed that the test setup we were using was aware of the hexadecimal numbers. So when we sent 0xF1, we assumed that it would send 241 in decimal notation. But, that test setup wasn't hexadecimal system aware and it actually sent 0..x..F...1.

But, when we did send the valid CBOR message, it was accepted by the cloud.

Following up on your issue though; would it be possible for you to upload your code (only the parts we need to understand this issue) to github and share with us? That way we can replicate your results.

@arkhipenko
Copy link
Author

arkhipenko commented Jun 22, 2023

@AniruddhaKanhere
Thank you for your message.
I fully appreciate how the CBOR messaging works. What surprises me is that I literally use CBOR library 0.5.4 as-is, no modifications, AWS OTA library code 3.4.0 as-is, and the results are rejected. I cannot explain this.
The library does not produce '0x' prefixed bitmap, it produces an 'h' prefixed bitmap, which seems to be rejected by the backend. I am yet to see a reasonable explanation why that is. There is barely any code in want you refer to as 'my code' - it just facilitates the MQTT messaging via standard esp32 library. The rest is 100% unmodified AWS, Core JSON and CBOR code. If the 'h' prefixe before the bitmap is a problem, how do I make CBOR not prefix it ?

Can you actually pinpoint what is wrong with my CBOR message?:

A6             # map(6)
   61          # text(1)
      63       # "c"
   63          # text(3)
      726479   # "rdy"
   61          # text(1)
      66       # "f"
   00          # unsigned(0)
   61          # text(1)
      6C       # "l"
   19 1000     # unsigned(4096)
   61          # text(1)
      6F       # "o"
   00          # unsigned(0)
   61          # text(1)
      62       # "b"
   44          # bytes(4)
      FFFFFFFF # "\xFF\xFF\xFF\xFF"
   61          # text(1)
      6E       # "n"
   01          # unsigned(1)

@arkhipenko
Copy link
Author

arkhipenko commented Jun 22, 2023

@AniruddhaKanhere - would it be possible for me to add you to a private Git repo for code sharing?
Also, this is written for esp32 - do you have an esp32 dev board you can run it on?
(there is no special equipment needed - just any esp32 dev board)

@AniruddhaKanhere
Copy link
Member

Hey @arkhipenko,

I am not saying that anything is wrong with your CBOR as it is, maybe we are missing something.

Can you try the steps mentioned here in this repository: https://github.com/FreeRTOS/iot-reference-esp32c3?
This is one of our reference integration repositories and the OTA on that should just work if you follow the instructions in the getting started guide that I linked above. Can you please try that and let us know what happens?

I suggest using everything new for this test (including the 'thing' and corresponding certs/keys and policies). All the process should be outlined in the above linked GSG.

would it be possible for me to add you to a private Git repo for code sharing?

I think it should be fine if you think there is nothing proprietary in there which I should not see.

@arkhipenko
Copy link
Author

@AniruddhaKanhere - after careful re-examination of my code I found a nasty little bug in the way outgoing MQTT messages were handled corrupting the CBOR message at the very last step of the process. Corrected and I seem to be getting the MQTT file payloads as expected now.
I apologize for tormenting you with what indeed ended up being an issue on my end, albeit not trivial.
Thank you for your help.
I will close this issue now and open a new one if there are other problems.
Thank you again!

@AniruddhaKanhere
Copy link
Member

It is great news that you found the bug @arkhipenko!
Thank you for informing us about it. Would you mind elaborating what was happening in your MQTT sending logic? I ask because if this is common enough, then it would benefit all readers of this issue :)

Thanks again for informing us about the issue and the fix,
Aniruddha

@arkhipenko
Copy link
Author

The original MQTT code assumed payload was always an ASCII string, so the very final buffer copy was done with strncpy() method, which of course is not suitable for true binary payload, as it stops at the first zero.
Replacing that with memcpy() solved the problem.
So everything looked correct and was correct right up to the point of sending to the backend.
It's embarrassing to admit, but those are the bugs you don't think about...
Again, thanks for your help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants