Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Domoticz: Since v10.0.0 devices randomly get corrupted timezone (+104) and MQTT gets disabled #13700

Closed
7 tasks
sammyke007 opened this issue Nov 16, 2021 · 164 comments
Closed
7 tasks
Assignees
Labels
bug Type - Confirmated Bug fixed Result - The work on the issue has ended

Comments

@sammyke007
Copy link

sammyke007 commented Nov 16, 2021

PROBLEM DESCRIPTION

My devices (10 Sonoff Mini, MiniR2 and BasicR3) randomly get their MQTT disabled. The console spams

04:13:27.880 RSL: STATE = {"Time":"2021-11-21T04:13:27","Uptime":"5T02:48:59","UptimeSec":442139,"Heap":16,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"MqttCount":8,"POWER":"ON","Wifi":{"AP":1,"SSId":"Wifi13","BSSId":"04:D4:C4:B4:A2:98","Channel":1,"Mode":"11n","RSSI":62,"Signal":-69,"LinkCount":6,"Downtime":"0T00:01:40"}}
04:13:27.885 RSL: SENSOR = {"Time":"2021-11-21T04:13:27","Switch1":"ON"}

every second. Timezone99 was my setting, but upon checking, the time is off and timezone gives result +104 instead of 99. As you can see below in the requested information, time says 04:14 instead of 21:14 (UTC+1 in Belgium)...

The following commands work OK, but after some time, the problem acts up again...

Reset 2

Backlog template {"NAME":"Sonoff Basic R3","GPIO":[32,1,1,1,1,0,0,0,224,320,1,0,160,0],"FLAG":0,"BASE":1}; module 0

Backlog Timezone 99; Latitude 51.212935; Longitude 4.284476; DzIdx1 43; ipaddress1 192.16.1.143; FriendlyName Tasmota badkamer spiegel; DeviceName Tasmota badkamer spiegel .143; MqttHost 192.168.1.*****; MqttUser ********; MqttPassword *****

REQUESTED INFORMATION

Make sure your have performed every step and checked the applicable boxes before submitting your issue. Thank you!

  • [ *] Read the Contributing Guide and Policy and the Code of Conduct
  • [ *] Searched the problem in issues
  • [ *] Searched the problem in discussions
  • [ *] Searched the problem in the docs
  • [ *] Searched the problem in the chat
  • [ *] Device used (e.g., Sonoff Basic): 10 Sonoff Mini, MiniR2 and BasicR3
  • Tasmota binary firmware version number used: 10.0.0
    • [* ] Pre-compiled
    • Self-compiled
  • Flashing tools used: OTA
  • Provide the output of command: Backlog Template; Module; GPIO 255:
  Configuration output here:

04:14:38.327 RSL: RESULT = {"NAME":"Sonoff Basic R3","GPIO":[57344,57599,57599,57599,255,0,0,57344,16384,57345,255,40960,0,0],"FLAG":0,"BASE":1}
04:14:38.338 RSL: RESULT = {"Module":{"0":"Sonoff Basic R3"}}
04:14:38.403 RSL: RESULT = {"GPIO0":{"57344":""},"GPIO1":{"57599":""},"GPIO2":{"57599":""},"GPIO3":{"57599":""},"GPIO4":{"255":"Relay32"},"GPIO5":{"0":"None"},"GPIO9":{"0":"None"},"GPIO10":{"57344":""},"GPIO12":{"16384":""},"GPIO13":{"57345":""},"GPIO14":{"255":"Relay32"},"GPIO15":{"40960":""},"GPIO16":{"0":"None"},"GPIO17":{"0":"None"}}
  • If using rules, provide the output of this command: Backlog Rule1; Rule2; Rule3:
  Rules output here:

04:17:08.185 RSL: RESULT = {"Rule1":{"State":"OFF","Once":"OFF","StopOnError":"OFF","Length":105,"Free":406,"Rules":"on button1#state==3 do publish domoticz/in {\"command\":\"switchlight\",\"idx\":146,\"switchcmd\":\"Toggle\"} endon"}}
04:17:08.192 RSL: RESULT = {"Rule2":{"State":"OFF","Once":"OFF","StopOnError":"OFF","Length":0,"Free":511,"Rules":""}}
04:17:08.243 RSL: RESULT = {"Rule3":{"State":"OFF","Once":"OFF","StopOnError":"OFF","Length":0,"Free":511,"Rules":""}}
  • Provide the output of this command: Status 0:
  STATUS 0 output here:

04:17:38.741 RSL: STATUS = {"Status":{"Module":0,"DeviceName":"Tasmota badkamer spiegel .143","FriendlyName":["Tasmota badkamer spiegel"],"Topic":"tasmota_BadkamerSpiegel","ButtonTopic":"0","Power":1,"PowerOnState":255,"LedState":20,"LedMask":"FFFF","SaveData":25344,"SaveState":0,"SwitchTopic":"0","SwitchMode":[5,0,0,0,0,0,0,0],"ButtonRetain":0,"SwitchRetain":1,"SensorRetain":0,"PowerRetain":0,"InfoRetain":0,"StateRetain":0}}
04:17:38.747 RSL: STATUS1 = {"StatusPRM":{"Baudrate":115200,"SerialConfig":"8N1","GroupTopic":"tasmotas","OtaUrl":"ttp://ota.tasmota.com/tasmota/release/tasmota.bin.gz","RestartReason":"Software/System restart","Uptime":"5T02:53:10","StartupUTC":"2021-11-11T17:24:28","Sleep":50,"CfgHolder":18,"BootCount":20992,"BCResetTime":"2020-08-13T23:47:04","SaveCount":8,"SaveAddress":"F7000"}}
04:17:38.752 RSL: STATUS2 = {"StatusFWR":{"Version":"10.0.0(tasmota)","BuildDateTime":"2021-10-19T08:23:49","Boot":31,"Core":"2_7_4_9","SDK":"2.2.2-dev(38a443e)","CpuFrequency":80,"Hardware":"ESP8285","CR":"410/699"}}
04:17:38.757 RSL: STATUS3 = {"StatusLOG":{"SerialLog":2,"WebLog":2,"MqttLog":0,"SysLog":0,"LogHost":"","LogPort":514,"SSId":["Wifi13",""],"TelePeriod":1,"Resolution":"558180C0","SetOption":["01000080","05C8000100060000005A0A0000000000003C","05000000","00006000","00004000"]}}
04:17:38.766 RSL: STATUS4 = {"StatusMEM":{"ProgramSize":612,"Free":388,"Heap":16,"ProgramFlashSize":1024,"FlashSize":1024,"FlashChipId":"144051","FlashFrequency":40,"FlashMode":3,"Features":["00000809","8FDAC787","04368001","000000CF","010013C0","C000F981","00004004","00001000","00000020"],"Drivers":"1,2,3,4,5,6,7,8,9,10,12,16,18,19,20,21,22,24,26,27,29,30,35,37,45","Sensors":"1,2,3,4,5,6"}}
04:17:38.772 RSL: STATUS5 = {"StatusNET":{"Hostname":"tasmota-BadkamerSpiegel-2441","IPAddress":"192.168.1.143","Gateway":"192.168.1.1","Subnetmask":"255.255.255.0","DNSServer1":"192.168.1.1","DNSServer2":"0.0.0.0","Mac":"DC:4F:22:AB:C9:89","Webserver":2,"HTTP_API":1,"WifiConfig":4,"WifiPower":17.0}}
04:17:38.777 RSL: STATUS7 = {"StatusTIM":{"UTC":"2021-11-16T20:17:38","Local":"2021-11-21T04:17:38","StartDST":"2021-01-30T14:00:00","EndDST":"2021-01-23T06:00:00","Timezone":"+104:0","Sunrise":"15:09","Sunset":"23:46"}}
04:17:38.782 RSL: STATUS10 = {"StatusSNS":{"Time":"2021-11-21T04:17:38","Switch1":"ON"}}
04:17:38.787 RSL: STATUS11 = {"StatusSTS":{"Time":"2021-11-21T04:17:38","Uptime":"5T02:53:10","UptimeSec":442390,"Heap":16,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"MqttCount":8,"POWER":"ON","Wifi":{"AP":1,"SSId":"Wifi13","BSSId":"04:D4:C4:B4:A2:98","Channel":1,"Mode":"11n","RSSI":58,"Signal":-71,"LinkCount":6,"Downtime":"0T00:01:40"}}}
  • Set weblog to 4 and then, when you experience your issue, provide the output of the Console log:
  Console output here:

TO REPRODUCE

Happens randomly after half a day, 2 days, ...

EXPECTED BEHAVIOUR

Keep MQTT enabled and keep correct timezone. (I guess time difference gets the MQTT disabled?)

SCREENSHOTS

/

ADDITIONAL CONTEXT

These 10 devices have been running rock solid on 9.5.0 and before. Never had this problem before and it started giving this problem right after updating to v10.0.0.
I already tried a complete "Reset 2" and configuring my device as new, but after some days it start's again (MQTT suddenly disabled, timezone off and console message spam every second).

(Please, remember to close the issue when the problem has been addressed)

@sfromis
Copy link
Contributor

sfromis commented Nov 16, 2021

I've got many devices running version 10 without any resetting of Timezone from my standard setting of 99. I'd be surprised if time difference would be a reason for MQTT getting disabled.

It looks like the settings of your device is broken, not just the weird "Timezone":"+104:0" but especially also the template with very strange component numbers. Not sure what could cause this. Losing MQTT is probably just a result of broken settings structure.

@sammyke007
Copy link
Author

sammyke007 commented Nov 16, 2021

I tried a full reset offcourse before creating an issue. The "Reset 2" + the 2 above backlogs is how I configure the device after reset. Nothing else. After some time, boom, bootcount totally off, timezone wrong, MQTT disabled and spam in the console. MQTT disabled is how I get alerted, because Domoticz can't control it anymore. It happens on all of these 10 devices as long as they are on V10. No issues on 9.5.0. That's the weird thing.

@Jason2866
Copy link
Collaborator

Can you erase the flash from a device and flash v.10. via serial?

@barbudor
Copy link
Contributor

barbudor commented Nov 16, 2021

You should track this bootcount & possible reboots
It may means that your devices are rebooting a lot.
This plus strange values could be a problem with the flash chips.
If possible, I would revert a few units back to 9.5.0 as a control group

My ESP8266 devices were still on 9.5.0. (only ESP32 were on 10.0.0.x).
I just move a few of them to 10.0.0.3 to see that makes any difference

@sammyke007
Copy link
Author

sammyke007 commented Nov 16, 2021

Can you erase the flash from a device and flash v.10. via serial?

Most of them are build behind wall switches or behind a closet. I'll flash a new basic R3 with 10.0.0 and see what it gives.
They were all flashed initially with the DiY tool, from there on always OTA upgrades. As said, all of these 10 devices worked rock solid on 9.5.0. MQTT never got disabled by itself.

@sammyke007
Copy link
Author

You should track this bootcount & possible reboots
It may means that your devices are rebooting a lot.
This plus strange values could be a problem with the flash chips.
If possible, I would revert a few units back to 9.5.0 as a control group

My ESP8266 devices were still on 9.5.0. (only ESP32 were on 10.0.0.x).
I just move a few of them to 10.0.0.3 to see that makes any difference

The very high boot count was also corrupt above. After a restart it's back normal and it was 31 or so. The device was online for 5 days btw.

@ascillato2 ascillato2 added the troubleshooting Type - Troubleshooting label Nov 16, 2021
@sammyke007
Copy link
Author

And another one of my devices got this crazy corruption. Bootcount crazy, MQTT disabled, console message spam every second and other corrupted settings. I flashed 5 out of 10 devices back to 9.5.0 and they seem to be stable again (except some Exception 9 reboots once in a while, but they reboot and work fine, nothing gets corrupted).

I installed 10.0.0.3 on a test sonoffMiniR2 and I'll check what it gives.

@sfromis
Copy link
Contributor

sfromis commented Nov 17, 2021

Speculatively, I'm wondering if your issue is somehow related to using Domoticz, which AFAIK is not a very commonly used backend for Tasmota.

@sammyke007
Copy link
Author

Speculatively, I'm wondering if your issue is somehow related to using Domoticz, which AFAIK is not a very commonly used backend for Tasmota.

My testdevice isn't configured with domoticz parameters. I'll see what it gives (also against the exception 9 error they all give once a day).

@sfromis
Copy link
Contributor

sfromis commented Nov 17, 2021

Your posted config includes Domoticz: DzIdx1 43

@sammyke007
Copy link
Author

I know, but on my testdevice I installed this morning, I left that out

@ascillato2
Copy link
Collaborator

Hi, any news on this?

Have you tried to erase and flash again with Tasmotizer?

@ascillato2 ascillato2 added the awaiting feedback Action - Waiting for response or more information label Nov 19, 2021
@sammyke007
Copy link
Author

Test device is still OK and MQTT still enabled, but just an hour ago, a Wemos D1 with 10.0.0 got MQTT disabled. Timezone 104, bootcount 55k +, ...

@ascillato
Copy link
Contributor

ascillato commented Nov 19, 2021

how are you flashing these devices? are you using Tasmotizer?

Can you try with latest version 10.0.0.3? ( http://ota.tasmota.com/tasmota/tasmota.bin.gz )

@sammyke007
Copy link
Author

sammyke007 commented Nov 19, 2021

For the easy accessible ones (serial) (Wemos, NodeMCU, ...), I indeed always use the latest Tasmotizer. For the Sonoff's, I use the DIY tool. But both seem to be affected.
My test device (MiniR2) is already on 10.0.0.3 and it's been running fine for +- 2 days. I'll update the Wemos that got corrupt too with 10.0.0.3 too, just to make sure. As this happens on one of my 10 (+1 test) devices once every +- 24h, it's not always easy to pinpoint any other exact reason. Only thing that I'm sure off, is that it hasn't been doing this on 9.5.0.

Edit: my test device now runs on 10.0.0.3 instead of 10.0.0. Syslog set to 4 and being logged to my NAS. I hope it corrupts soon :-D

@Jason2866
Copy link
Collaborator

Do you have new devices to test too? Looking at your BootCount there is maybe the flash chip defect. If new devices does work without a problem you have a hardware issue. Have you googled with the Flash Chip Id from the Information page which manufactorer your flash chip is? There are some with very bad lifetimes.

`

@sammyke007
Copy link
Author

sammyke007 commented Nov 19, 2021

The bootcount is perfectly fine on all devices. It only gets corrupt / very high when it goes crazy. Like I said, the following things occur:

  • Boot count very high
  • Weblog spams RSL messages every second
  • Time is off (Timezone +104 and can't be changed from console)
  • MQTT gets disabled
  • Other parameters in the status 0 seem way off

Restarting the module from console fixes it immediately.

I've got Sonoff MiniR2's that are max. 3 months old, but also Wemos D1, NodeMCU and other Sonoff's that are older (1 year or older). They all seem to show this very weird crash / behavior.

@barbudor
Copy link
Contributor

Just to confirm that none of the units I moved to 10.0.0.3 earlier this week have shown the problem. Still connected on MQTT and expected Timezone 99

@ascillato
Copy link
Contributor

They all seem to show this very weird crash / behavior.

This is super weird. Do you have any magnetic interference close to your devices like antennae or similar?

@sammyke007
Copy link
Author

sammyke007 commented Nov 19, 2021

I know...

I have nothing special in my home. I do run however a ZigBee network of +- 30 devices, but my 2,4 WiFi is on channel 1 and ZigBee is on the highest band. Like I said, never had this behaviour before on previous firmwares. Nothing changed in my house at the time of upgrading to 10.0.0

I'll keep logging as much as I can. I have 4 devices on 9.5.0, 3 on 10.0.0 and 2 + 1 test device on 10.0.0.3

@ascillato
Copy link
Contributor

ascillato commented Nov 19, 2021

you said:

Weblog spams RSL messages every second

what messages do you have? can you paste here a log of several of those? Can you tried using WEBLOG 4 to catch debug messages too?

and what is the Flash Chip Id in the Information menu in Tasmota WebUI ?

@sammyke007
Copy link
Author

sammyke007 commented Nov 19, 2021

you said:

Weblog spams RSL messages every second

what messages do you have?

and what is the Flash Chip Id in the Information menu in Tasmota WebUI ?

The RSL and STATE message in the first message above. (04:13:27....)

Do you want all 11 IDs?
My Wemos D1 that crashed today is Flash Chip Id0x164020

@ascillato
Copy link
Contributor

can you paste here a log of several of those? Can you tried using WEBLOG 4 to catch debug messages too?

and what is the Flash Chip Id in the Information menu in Tasmota WebUI ?

@sammyke007
Copy link
Author

sammyke007 commented Nov 19, 2021

Syslog 4 is now active on 3 devices. Waiting for the error / crash to show up...

@ascillato2
Copy link
Collaborator

ascillato2 commented Nov 19, 2021

What is the FLASH CHIP ID from the Information page on the failing devices?

this value:

image

@sammyke007
Copy link
Author

sammyke007 commented Nov 19, 2021

Wemos d1 on 10.0.0 that went crazy today:

ESP Chip Id6837606 (ESP8266EX)
Flash Chip Id0x164020

Sonoff Basic R3 on 10.0.0 that went crazy 2 days ago:

ESP Chip Id11258249 (ESP8285)
Flash Chip Id0x144051

@Jason2866
Copy link
Collaborator

The 0x164020 is a XMC flash chip. In our experience it is a troublemaker and do fail more often than other flash chips very early.
The ESP8285 has embedded flash and until now we do not had users with higher fail rates.
Have you clicked erase flash in Tasmotizer when you are flashing?
Which USB - serial adapter do you use?

@sammyke007
Copy link
Author

No, I don't think I've erased it before. But like I said, everything was "rock solid" before jumping to 10.0.0.
I use a USB to TTL CP2102 module to flash the Wemos and NodeMCU devices. Itead DIY tool for the Sonoff devices.

@sammyke007
Copy link
Author

Everything is back on 9.5.0 since 20 days here. Very rare some exceptions (and they happen at the same moment on 2 or more devices) but not a single crash / mqtt bug. Most devices with uptimes of 20 days.

@arendst
Copy link
Owner

arendst commented Dec 27, 2021

@Eddie-BS Thx for the info.

I'm still investigating the issue. From the original OPs status 0 I can now conclude that part of the Settings area has been shifted down by one byte (8-bits) instead of overwritten. This results in CfgHolder:18, SaveData:25344, Bootcount:20992 and missing first character of OtaUrl.

If you see it happen again pls report status 0 allowing me to further look for similarities.

@Eddie-BS
Copy link

@arendst Thx for your effort so far, I will let this version run and hope te catch it next time with the webinterface still accessible to show you a status 0

@arendst
Copy link
Owner

arendst commented Dec 27, 2021

Just for my information the issue occured "out of the blue", there were no configuration changes executed?

@Eddie-BS
Copy link

indeed, the devices are just running, no changes made to them. they all send data over mqtt to domoticz.
Normally I don't change anything on them besides upgrading, wich I do not now ... They are all running the version @Jason2866 created above here in this thread ...
at some random moment, the devices lose there config and I need to reconfigure them. Often I can reach them on the webinterface until it is rebooted. It returns on 192.168.4.1 with an "empty" config ..

@Eddie-BS
Copy link

Eddie-BS commented Dec 28, 2021

Too Bad my S5 was gone again this morning. It was rebooted and in 192.168.4.1 mode ...
I needed to reenter the wifi credentials before I could enter this status 0
below the output but I doubt if that is usefull ...

00:00:00.001 HDW: ESP8266EX
00:00:00.042 CFG: Loaded from flash at F6, Count 6
00:00:00.049 QPC: Count 1
00:00:00.059 Project tasmota - Tasmota Version 10.1.0.1(tasmota)-STAGE(2021-12-20T22:04:45)
00:00:00.505 WIF: Connecting to AP1 Xxxxxxx in mode 11n as tasmota-DAAD37-3383...
00:00:05.003 WIF: Connected
00:00:05.254 HTP: Web server active on tasmota-DAAD37-3383 with IP address 192.168.26.165
11:42:01.303 RSL: INFO1 = {"Info1":{"Module":"Sonoff Basic","Version":"10.1.0.1(tasmota)","FallbackTopic":"cmnd/DVES_DAAD37_fb/","GroupTopic":"cmnd/tasmotas/"}}
11:42:01.308 RSL: INFO2 = {"Info2":{"WebServerMode":"Admin","Hostname":"tasmota-DAAD37-3383","IPAddress":"192.168.26.165"}}
11:42:01.319 RSL: INFO3 = {"Info3":{"RestartReason":"Software/System restart"}}
11:42:01.326 RSL: RESULT = {"POWER":"OFF"}
11:42:01.329 RSL: POWER = OFF
11:42:01.712 QPC: Reset
11:42:07.770 RSL: STATE = {"Time":"2021-12-28T11:42:07","Uptime":"0T00:00:14","UptimeSec":14,"Heap":27,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":249,"MqttCount":0,"POWER":"OFF","Wifi":{"AP":1,"SSId":"Xxxxxxxx","BSSId":"6C:3B:6B:18:A4:6F","Channel":1,"Mode":"11n","RSSI":72,"Signal":-64,"LinkCount":1,"Downtime":"0T00:00:06"}}
11:42:14.859 CMD: Status 0
11:42:14.864 RSL: STATUS = {"Status":{"Module":1,"DeviceName":"Tasmota","FriendlyName":["Tasmota"],"Topic":"tasmota_DAAD37","ButtonTopic":"0","Power":0,"PowerOnState":3,"LedState":1,"LedMask":"FFFF","SaveData":1,"SaveState":1,"SwitchTopic":"0","SwitchMode":[0,0,0,0,0,0,0,0],"ButtonRetain":0,"SwitchRetain":0,"SensorRetain":0,"PowerRetain":0,"InfoRetain":0,"StateRetain":0}}
11:42:14.890 RSL: STATUS1 = {"StatusPRM":{"Baudrate":115200,"SerialConfig":"8N1","GroupTopic":"tasmotas","OtaUrl":"http://ota.tasmota.com/tasmota/release/tasmota.bin.gz","RestartReason":"Software/System restart","Uptime":"0T00:00:21","StartupUTC":"2021-12-28T10:41:53","Sleep":50,"CfgHolder":4617,"BootCount":3,"BCResetTime":"2021-12-28T11:42:00","SaveCount":8,"SaveAddress":"F4000"}}
11:42:14.923 RSL: STATUS2 = {"StatusFWR":{"Version":"10.1.0.1(tasmota)","BuildDateTime":"2021-12-20T22:04:45","Boot":31,"Core":"STAGE","SDK":"2.2.2-dev(38a443e)","CpuFrequency":80,"Hardware":"ESP8266EX","CR":"354/699"}}
11:42:14.943 RSL: STATUS3 = {"StatusLOG":{"SerialLog":2,"WebLog":2,"MqttLog":0,"SysLog":0,"LogHost":"","LogPort":514,"SSId":["Xxxxxxxx",""],"TelePeriod":300,"Resolution":"558180C0","SetOption":["00008009","2805C80001000600003C5A0A000000000000","00000080","00006000","00004000"]}}
11:42:14.974 RSL: STATUS4 = {"StatusMEM":{"ProgramSize":619,"Free":384,"Heap":25,"ProgramFlashSize":1024,"FlashSize":1024,"FlashChipId":"146085","FlashFrequency":40,"FlashMode":3,"Features":["00000809","8FDAC787","04368001","000000CF","010013C0","C000F981","00004004","00001000","00000020"],"Drivers":"1,2,3,4,5,6,7,8,9,10,12,16,18,19,20,21,22,24,26,27,29,30,35,37,45","Sensors":"1,2,3,4,5,6"}}
11:42:14.002 RSL: STATUS5 = {"StatusNET":{"Hostname":"tasmota-DAAD37-3383","IPAddress":"192.168.26.165","Gateway":"192.168.26.254","Subnetmask":"255.255.255.0","DNSServer1":"192.168.26.246","DNSServer2":"0.0.0.0","Mac":"A4:CF:12:DA:AD:37","Webserver":2,"HTTP_API":1,"WifiConfig":4,"WifiPower":17.0}}
11:42:15.021 RSL: STATUS6 = {"StatusMQT":{"MqttHost":"","MqttPort":1883,"MqttClientMask":"DVES_%06X","MqttClient":"DVES_DAAD37","MqttUser":"DVES_USER","MqttCount":0,"MAX_PACKET_SIZE":1200,"KEEPALIVE":30,"SOCKET_TIMEOUT":4}}
11:42:15.043 RSL: STATUS7 = {"StatusTIM":{"UTC":"2021-12-28T10:42:15","Local":"2021-12-28T11:42:15","StartDST":"2021-03-28T02:00:00","EndDST":"2021-10-31T03:00:00","Timezone":"+01:00","Sunrise":"08:43","Sunset":"16:59"}}
11:42:15.061 RSL: STATUS10 = {"StatusSNS":{"Time":"2021-12-28T11:42:15"}}
11:42:15.068 RSL: STATUS11 = {"StatusSTS":{"Time":"2021-12-28T11:42:15","Uptime":"0T00:00:22","UptimeSec":22,"Heap":25,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"MqttCount":0,"POWER":"OFF","Wifi":{"AP":1,"SSId":"Xxxxxxxx","BSSId":"6C:3B:6B:18:A4:6F","Channel":1,"Mode":"11n","RSSI":72,"Signal":-64,"LinkCount":1,"Downtime":"0T00:00:06"}}}

going to restore the config on it now ...

arendst added a commit that referenced this issue Dec 28, 2021
Fix possible heap corruption due to invalid PubSubClient memmove parameters (#13700)
@arendst
Copy link
Owner

arendst commented Dec 28, 2021

I found at least one possible cause of observed heap corruption.

The latest development branch contains an updated PubSubClient library. Pls try this on one of your devices and report back.

@Eddie-BS
Copy link

tnx @arendst
I just updated my s5 device ... Hope this is the current develop release ...

7:16:58.219 CMD: status 0
17:16:58.226 MQT: stat/s5/STATUS = {"Status":{"Module":1,"DeviceName":"s5","FriendlyName":["s5 licht kerstboom"],"Topic":"s5","ButtonTopic":"0","Power":1,"PowerOnState":3,"LedState":1,"LedMask":"FFFF","SaveData":1,"SaveState":1,"SwitchTopic":"0","SwitchMode":[0,0,0,0,0,0,0,0],"ButtonRetain":0,"SwitchRetain":0,"SensorRetain":0,"PowerRetain":0,"InfoRetain":0,"StateRetain":0}}
17:16:58.254 MQT: stat/s5/STATUS1 = {"StatusPRM":{"Baudrate":115200,"SerialConfig":"8N1","GroupTopic":"tasmotas","OtaUrl":"http://ota.tasmota.com/tasmota/tasmota.bin.gz","RestartReason":"Software/System restart","Uptime":"0T00:01:24","StartupUTC":"2021-12-28T16:15:34","Sleep":50,"CfgHolder":4617,"BootCount":1,"BCResetTime":"2021-12-28T17:13:18","SaveCount":19,"SaveAddress":"FA000"}}
17:16:58.289 MQT: stat/s5/STATUS2 = {"StatusFWR":{"Version":"10.1.0.3(tasmota)","BuildDateTime":"2021-12-28T13:36:21","Boot":31,"Core":"2_7_4_9","SDK":"2.2.2-dev(38a443e)","CpuFrequency":80,"Hardware":"ESP8266EX","CR":"372/699"}}
17:16:58.307 MQT: stat/s5/STATUS3 = {"StatusLOG":{"SerialLog":2,"WebLog":2,"MqttLog":0,"SysLog":2,"LogHost":"192.168.26.241","LogPort":1514,"SSId":["Xxxxxxxxx","Recovery"],"TelePeriod":300,"Resolution":"558180C0","SetOption":["00008009","2805C80000000600001E5A0A000000000000","000000C0","00006000","00004000"]}}
17:16:58.342 MQT: stat/s5/STATUS4 = {"StatusMEM":{"ProgramSize":616,"Free":384,"Heap":19,"ProgramFlashSize":1024,"FlashSize":1024,"FlashChipId":"146085","FlashFrequency":40,"FlashMode":3,"Features":["00000809","8FDAC787","04368001","000000CF","010013C0","C000F981","00004004","00001000","00000020"],"Drivers":"1,2,3,4,5,6,7,8,9,10,12,16,18,19,20,21,22,24,26,27,29,30,35,37,45","Sensors":"1,2,3,4,5,6"}}
17:16:58.376 MQT: stat/s5/STATUS5 = {"StatusNET":{"Hostname":"s5-3383","IPAddress":"192.168.26.165","Gateway":"192.168.26.254","Subnetmask":"255.255.255.0","DNSServer1":"192.168.26.246","DNSServer2":"0.0.0.0","Mac":"A4:CF:12:DA:AD:37","Webserver":2,"HTTP_API":1,"WifiConfig":4,"WifiPower":17.0}}
17:16:58.399 MQT: stat/s5/STATUS6 = {"StatusMQT":{"MqttHost":"192.168.26.241","MqttPort":1883,"MqttClientMask":"DVES_%06X","MqttClient":"DVES_DAAD37","MqttUser":"DVES_USER","MqttCount":1,"MAX_PACKET_SIZE":1200,"KEEPALIVE":30,"SOCKET_TIMEOUT":4}}
17:16:58.422 MQT: stat/s5/STATUS7 = {"StatusTIM":{"UTC":"2021-12-28T16:16:58","Local":"2021-12-28T17:16:58","StartDST":"2021-03-28T02:00:00","EndDST":"2021-10-31T03:00:00","Timezone":"+01:00","Sunrise":"08:43","Sunset":"16:59"}}
17:16:58.441 MQT: stat/s5/STATUS10 = {"StatusSNS":{"Time":"2021-12-28T17:16:58"}}
17:16:58.450 MQT: stat/s5/STATUS11 = {"StatusSTS":{"Time":"2021-12-28T17:16:58","Uptime":"0T00:01:24","UptimeSec":84,"Heap":19,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"MqttCount":1,"POWER":"ON","Wifi":{"AP":1,"SSId":"Xxxxxxxx","BSSId":"6C:3B:6B:18:A4:6F","Channel":1,"Mode":"11n","RSSI":82,"Signal":-59,"LinkCount":1,"Downtime":"0T00:00:07"}}}

@arendst
Copy link
Owner

arendst commented Dec 28, 2021

You were just a few minutes too fast. Pls redo and look for a compile time of 15:06:38 visible at boottime and in the information webpage.

@Eddie-BS
Copy link

like this one ;-)

17:39:46.652 MQT: stat/s5/STATUS = {"Status":{"Module":1,"DeviceName":"s5","FriendlyName":["s5 licht kerstboom"],"Topic":"s5","ButtonTopic":"0","Power":1,"PowerOnState":3,"LedState":1,"LedMask":"FFFF","SaveData":1,"SaveState":1,"SwitchTopic":"0","SwitchMode":[0,0,0,0,0,0,0,0],"ButtonRetain":0,"SwitchRetain":0,"SensorRetain":0,"PowerRetain":0,"InfoRetain":0,"StateRetain":0}}
17:39:46.679 MQT: stat/s5/STATUS1 = {"StatusPRM":{"Baudrate":115200,"SerialConfig":"8N1","GroupTopic":"tasmotas","OtaUrl":"http://ota.tasmota.com/tasmota/tasmota.bin.gz","RestartReason":"Software/System restart","Uptime":"0T00:00:36","StartupUTC":"2021-12-28T16:39:10","Sleep":50,"CfgHolder":4617,"BootCount":2,"BCResetTime":"2021-12-28T17:13:18","SaveCount":21,"SaveAddress":"FA000"}}
17:39:46.712 MQT: stat/s5/STATUS2 = {"StatusFWR":{"Version":"10.1.0.3(tasmota)","BuildDateTime":"2021-12-28T15:06:38","Boot":31,"Core":"2_7_4_9","SDK":"2.2.2-dev(38a443e)","CpuFrequency":80,"Hardware":"ESP8266EX","CR":"372/699"}}

@Eddie-BS
Copy link

Eddie-BS commented Dec 30, 2021

S5 up for 48 hours, no exceptions, no hardware watchdog, no reboot, looks fine for now ...
upgraded 1 other device to latest ...

@arendst arendst added bug Type - Confirmated Bug fixed Result - The work on the issue has ended and removed troubleshooting Type - Troubleshooting labels Jan 1, 2022
@arendst
Copy link
Owner

arendst commented Jan 1, 2022

I've upgraded my "exceptiontional" devices too and see indeed no more unexplainable exceptions.

What you will see is an increase in MqttCount representing MQTT re-connection due to badly received MQTT data in the PubSubClient library. Where it used to move this bad data without checking it's boundaries it now disconnects from the server.

Tasmota will automaticcaly re-connect in that case.

This PubSubClient issue will more likely pop-up with users using Domoticz as Domoticz sends all it's sensor updates to the MQTT broker in addition to Domoticz commands. Tasmota needs to receive this data and decide on it's index number if it needs to do something with the Domoticz command.

@sammyke007
Copy link
Author

Today I probably can't test any further, but I'll flash my devices with the latest dev (2022.01.1 ?) ASAP to try it out too. Very curious! Tnx Theo for looking into this!

BTW: best wishes for everyone!

@Eddie-BS
Copy link

Eddie-BS commented Jan 1, 2022

Hi @arendst , in the mean time I updated 4 of my devices, each time after it had a corruption.
and yes, on all of the updated devices I don't see any exceptions anymore ...

my S5 is running for almost 4 days without exception or reboot.
S1 is running for 43 hours, and S4 and T5 for 4 hours now.

Only S1 shows a lot of MQTT reconnects (72) while the others are low (<5)

I will let you know in a couple of days, looks promising.
tnx again for your effort !

@sammyke007
Copy link
Author

Is 2022.01.1 OK?

@arendst
Copy link
Owner

arendst commented Jan 1, 2022

Yes.

@sammyke007
Copy link
Author

I'll OTA update a device or three (coming from 9.5.0) with it and I'll report back.

@sammyke007
Copy link
Author

First 3 devices on 2022.01.1 up for 20h. No exceptions, no corruptions. I've updated 5 more devices this morning. Only 2 left on 9.5.0.
I'll keep reporting in the coming days. Fingers crossed, but looks very promising!

@sammyke007
Copy link
Author

sammyke007 commented Jan 3, 2022

image

Daily report ;-)
The 2 top ones (28 days online) are still one 9.5.0. These are the only two devices that also don't seem to have rare exceptions on 9.5.0. I'll update them too if everything stays stable for some more days.

@Eddie-BS
Copy link

Eddie-BS commented Jan 3, 2022

Screenshot 2022-01-03 at 13 50 31

Report from me too ;-) as you can see, no exceptions and reboots ... Just increasing MqttCount. S5 was the 1st device I upped with this version. Looks pretty good to me. only the device tas test had 1 exception, that is the only nodemcu and not a sonoff ...

@arendst
Copy link
Owner

arendst commented Jan 3, 2022

OK. Thanks for the feedback.

That's in line with my obervations of previous "exceptional" devices too. I think it's safe to close this issue now.

@arendst arendst closed this as completed Jan 3, 2022
@sammyke007
Copy link
Author

image

HERO!

@arendst , somewhere I can donate you some coffee's?

@arendst
Copy link
Owner

arendst commented Jan 7, 2022

See the link in README.md

@mcmellow
Copy link

Hi to all.
At this moment my devices are on FW version 2022-01-08T18:01:27. I have no execptions in TDM or any odd on the console of the devices. All seems to be OK. However on the Domoticz site I am receiving the following error msg.

2022-01-15 18:21:57.315 Error: MQTT: Exception (GetValueFromTemplate): in Json::Value::resolveReference(key, end): requires objectValue! (Template: value_json[ENERGY][ApparentPower][0]) 2022-01-15 18:21:57.316 Error: MQTT: Exception (GetValueFromTemplate): in Json::Value::resolveReference(key, end): requires objectValue! (Template: value_json[ENERGY][ApparentPower][1]) 2022-01-15 18:21:57.316 Error: MQTT: Exception (GetValueFromTemplate): in Json::Value::resolveReference(key, end): requires objectValue! (Template: value_json[ENERGY][Current][0]) 2022-01-15 18:21:57.317 Error: MQTT: Exception (GetValueFromTemplate): in Json::Value::resolveReference(key, end): requires objectValue! (Template: value_json[ENERGY][Current][1]) 2022-01-15 18:21:57.317 Error: MQTT: Exception (GetValueFromTemplate): in Json::Value::resolveReference(key, end): requires objectValue! (Template: value_json[ENERGY][Factor][0]) 2022-01-15 18:21:57.317 Error: MQTT: Exception (GetValueFromTemplate): in Json::Value::resolveReference(key, end): requires objectValue! (Template: value_json[ENERGY][Factor][1]) 2022-01-15 18:21:57.317 Error: MQTT: Exception (GetValueFromTemplate): in Json::Value::resolveReference(key, end): requires objectValue! (Template: value_json[ENERGY][Power][0]) 2022-01-15 18:21:57.318 Error: MQTT: Exception (GetValueFromTemplate): in Json::Value::resolveReference(key, end): requires objectValue! (Template: value_json[ENERGY][Power][1]) 2022-01-15 18:21:57.318 Error: MQTT: Exception (GetValueFromTemplate): in Json::Value::resolveReference(key, end): requires objectValue! (Template: value_json[ENERGY][ReactivePower][0]) 2022-01-15 18:21:57.318 Error: MQTT: Exception (GetValueFromTemplate): in Json::Value::resolveReference(key, end): requires objectValue! (Template: value_json[ENERGY][ReactivePower][1])

My question. Is this Domoticz or Tasmota related?

@sammyke007
Copy link
Author

I think it's best to open a new issue as this doesn't seem related. All 10 devices still up and running on the 2022.01.1 release. Domoticz 2021.1 beta 13872. I don't use the MQTT AD (autodiscovery) however, I use dummy devices.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Type - Confirmated Bug fixed Result - The work on the issue has ended
Projects
None yet
Development

No branches or pull requests