Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Permanently MQTT Reconnects #5146

Closed
3 of 9 tasks
helmar74 opened this issue Feb 7, 2019 · 30 comments
Closed
3 of 9 tasks

Permanently MQTT Reconnects #5146

helmar74 opened this issue Feb 7, 2019 · 30 comments
Labels
awaiting feedback Action - Waiting for response or more information duplicated Result - Duplicated Issue troubleshooting Type - Troubleshooting

Comments

@helmar74
Copy link

helmar74 commented Feb 7, 2019

Hi,

I do have permanent MQTT Disconnects/Connects in my mosquitto MQTT Broker log:
1549537348: New client connected from 192.168.1.XXX as S20Steckdose5 (c1, k10, u'DVES_USER'). 1549537348: New connection from 192.168.1.YYY on port 1883. 1549537348: New client connected from 192.168.1.YYY as S20Steckdose4 (c1, k10, u'DVES_USER'). 1549537349: New connection from 192.168.1.ZZZ on port 1883. 1549537349: New client connected from 192.168.1.ZZZ as S20Steckdose7 (c1, k10, u'DVES_USER'). 1549537577: Saving in-memory database to /var/lib/mosquitto/mosquitto.db.

I had this issue when using CORE Library 2.4.2. I updated to CORE Library 2.5.0, but the issue still persists.
Does anyone know, how to fix this issue?

I also found this interessting articles:
https://community.home-assistant.io/t/constant-mqtt-devices-disconnections-socket-error/83477/50

He fixed it like that:
image

He raised KEEPALIVE and TIMEOUT Parameter for MQTT connection.
Is this "recommend" to fix the issue, or is there an other solution?

Thank's a lot

Helmar

IMPORTANT NOTICE
If you do not complete the template below it is likely that your issue will not be addressed. When providing information about your issue please be as extensive as possible so that it can be solved by as little as possible responses.

FAILURE TO COMPLETE THE REQUESTED INFORMATION WILL RESULT IN YOUR ISSUE BEING CLOSED

Make sure these boxes are checked [x] before submitting your issue - Thank you!

19:20:54 CMD: status 0
19:20:54 MQT: stat/sonoffS20_5/STATUS = {"Status":{"Module":8,"FriendlyName":["S20Steckdose5"],"Topic":"sonoffS20_5","ButtonTopic":"0","Power":0,"PowerOnState":3,"LedState":1,"SaveData":1,"SaveState":1,"SwitchTopic":"0","SwitchMode":[0,0,0,0,0,0,0,0],"ButtonRetain":0,"SwitchRetain":0,"SensorRetain":0,"PowerRetain":0}}
19:20:54 MQT: stat/sonoffS20_5/STATUS1 = {"StatusPRM":{"Baudrate":115200,"GroupTopic":"sonoffS20StandardGruppe","OtaUrl":"http://192.168.1.XXX:5555/data/firmwares/sonoff.bin","RestartReason":"Software/System restart","Uptime":"0T00:18:34","StartupUTC":"2019-02-07T11:02:20","Sleep":50,"BootCount":38,"SaveCount":448,"SaveAddress":"FA000"}}
19:20:54 MQT: stat/sonoffS20_5/STATUS2 = {"StatusFWR":{"Version":"6.4.1(sonoff)","BuildDateTime":"2019.02.07 11:56:51","Boot":31,"Core":"2_5_0","SDK":"3.0.0-dev(c0f7b44)"}}
19:20:54 MQT: stat/sonoffS20_5/STATUS3 = {"StatusLOG":{"SerialLog":2,"WebLog":2,"SysLog":1,"LogHost":"192.168.1.4","LogPort":514,"SSId":["MYSSID","sonoffap"],"TelePeriod":300,"SetOption":["00000009","558180C0","00000001"]}}
19:20:54 MQT: stat/sonoffS20_5/STATUS4 = {"StatusMEM":{"ProgramSize":588,"Free":412,"Heap":13,"ProgramFlashSize":1024,"FlashSize":1024,"FlashChipId":"14405E","FlashMode":3,"Features":["00000407","0FFAE78C","00038FA0","23BE1FCE","00003BC0"]}}
19:20:54 MQT: stat/sonoffS20_5/STATUS5 = {"StatusNET":{"Hostname":"S20Steckdose5","IPAddress":"192.168.1.XXX53","Gateway":"192.168.1.XXX","Subnetmask":"255.255.255.0","DNSServer":"192.168.1.XXX","Mac":"DC:4F:22:08:45:60","Webserver":2,"WifiConfig":3}}
19:20:54 MQT: stat/sonoffS20_5/STATUS6 = {"StatusMQT":{"MqttHost":"192.168.1.XXX","MqttPort":1883,"MqttClientMask":"S20Steckdose5","MqttClient":"S20Steckdose5","MqttUser":"DVES_USER","MqttType":1,"MAX_PACKET_SIZE":1000,"KEEPALIVE":15}}
19:20:54 MQT: stat/sonoffS20_5/STATUS7 = {"StatusTIM":{"UTC":"Thu Feb 07 18:20:54 2019","Local":"Thu Feb 07 19:20:54 2019","StartDST":"Sun Mar 31 02:00:00 2019","EndDST":"Sun Oct 27 03:00:00 2019","Timezone":99,"Sunrise":"07:30","Sunset":"17:16"}}
19:20:54 MQT: stat/sonoffS20_5/STATUS10 = {"StatusSNS":{"Time":"2019-02-07T19:20:54"}}
19:20:54 MQT: stat/sonoffS20_5/STATUS11 = {"StatusSTS":{"Time":"2019-02-07T19:20:54","Uptime":"0T00:18:34","Vcc":3.475,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"POWER":"OFF","Wifi":{"AP":1,"SSId":"MYSSID","BSSId":"9C:C7:A6:F7:72:A2","Channel":6,"RSSI":88}}}

(Please, remember to close the issue when the problem has been addressed)

@ascillato
Copy link
Contributor

Hi, the real fix was that he uses core 2.3.0.

Some routers are better for core 2.3.0 and some others for newer cores.
You need to test it for your case.

@helmar74
Copy link
Author

helmar74 commented Feb 7, 2019

OK, I'm running a Fritzbox 7590 wit 1750E Accesspoints in MESH Configuration.
Is someone here, who have the same setup and is not suffering this problem?

The link above shows for CORE 2.3.0:
Some Routers of brands Ubiquity and Fritzbox produces conflicts and disconnections

As I'm running a Fritzbox Core 2.3.0 shouldn't be a solution....

@ascillato
Copy link
Contributor

If you look into issues you will see that fritzbox has issues with the Arduino core.
Please, try core 2.3.0. The precompiled bins with core 2.3.0 are already compiled for you.

@ascillato2 ascillato2 added duplicated Result - Duplicated Issue awaiting feedback Action - Waiting for response or more information troubleshooting Type - Troubleshooting labels Feb 7, 2019
@helmar74
Copy link
Author

helmar74 commented Feb 7, 2019

If the Fritzbox is the cause, it's more a WIfI issue, than an MQTT issue?
(I also was thinking of changing MQTT Library to MQTT_TASMOTAMQTT)
Currently I'm using MQTT_PUBSUBCLIENT

I also turned off mDNS, until now the issue doesn't occured on that devices any more..
Have to wait longer, will provide feedback...

@ascillato2
Copy link
Collaborator

ascillato2 commented Feb 7, 2019

The mqtt reconnections are for socket error. Sockets are managed by the core. The core 2.3.0 may give you the solution. Please please test it. You can just do an OTA with the precompiled bin.

TasmotaMQTT is deprecated and no longer supported. It will not solve your issue. In fact it will give you under some circunstances software watchdogs restarts.

mDNS is unrelated to mqtt and sockets.

@helmar74
Copy link
Author

helmar74 commented Feb 7, 2019

After studying
https://github.com/arendst/Sonoff-Tasmota/wiki/Troubleshooting#wifi-issues-arduino-core-versions-and-expressif-sdk

My "plan" is like that:

  1. If the disconnects occurs again, I'll try sleep 0
    image

  2. If this doesn't work I'll try Core 2.3.0 Library

I keep you informed...

@helmar74
Copy link
Author

helmar74 commented Feb 7, 2019

Setting sleep 0 isn't really an option as the Load grows dramatically.
So I decided to "downgrade" to Core 2.3.0 all my devices.
Will provide feedback.
Thanks for your help so far!

@AndyPuettmann
Copy link

AndyPuettmann commented Feb 7, 2019

Great!
Will confirm/test the fix too then.
I am also running into the issues with mqtt disconnects.
I am having a fritz box 7490. Could probably be the issue. Was suspecting node-red as culprit, but this could also explain it.

Downgraded to:
Program Version 6.4.1(release-sonoff)
Build Date & Time 2018-12-24T15:51:34
Core/SDK Version 2_3_0/1.5.3(aec24ac9)

for now.
Will give feedback in a day or two.
Should be easy to spot. As I had disconnects every 20-40 minutes

@AndyPuettmann
Copy link

AndyPuettmann commented Feb 7, 2019

Looks like I did not get a single disconnect since downgrading.
I noticed the Vcc reading changed a bit.

image

Should be a measurement change, not some actual change, right?
(I used the Vcc reported from the telemetry)

I'll leave it this way until tomorrow.
I will then switch back to the latest version and check if it starts again.

  • Andy

@ascillato
Copy link
Contributor

The change in VCC measurement is a known issue of the old core. Don't worry. It is ok.

@helmar74
Copy link
Author

helmar74 commented Feb 7, 2019

I updated 27 devices with core 2.3.0. Still having MQTT reconnects on 3 of them
it is one Sonoff S20, one Wemos Mini D (with WS 8212 connected) and one Shelly2...
The other Devices are working fine. Not quite sure if Core 2.3.0 really fixes the issue...

1549566494: New client connected from 192.168.1.180 as sonoffShelly2-1 (c1, k10, u'DVES_USER'). 1549566520: Client sonoffShelly2-1 has exceeded timeout, disconnecting. 1549566520: Socket error on client sonoffShelly2-1, disconnecting. 1549566541: New connection from 192.168.1.180 on port 1883.

1549561198: Client S20Steckdose3 has exceeded timeout, disconnecting. 1549561198: Socket error on client S20Steckdose3, disconnecting. 1549561199: New connection from 192.168.1.151 on port 1883. 1549561199: New client connected from 192.168.1.151 as S20Steckdose3 (c1, k10, u'DVES_USER').
Client sonoffWemosMini-7-GRBW has exceeded timeout, disconnecting. 1549564518: Socket error on client sonoffWemosMini-7-GRBW, disconnecting. 1549564527: New connection from 192.168.1.181 on port 1883. 1549564527: New client connected from 192.168.1.181 as sonoffWemosMini-7-GRBW (c1, k10, u'DVES_USER').

When the Shelly2 starts reconnecting, it happened very often:
image

@helmar74
Copy link
Author

helmar74 commented Feb 7, 2019

Disconnected Shelly2, giving again Feedback

@AndyPuettmann
Copy link

AndyPuettmann commented Feb 7, 2019

I had a single disconnect since installing the Core 2.3.0 version.
However, this was when the Fritz-Box switched the wireless channel to a different one for performance reasons. (My device count is only 3, so it cannot be compared to your setup.)

Were the now disconnecting devices having issues before?

@ascillato2
Copy link
Collaborator

ascillato2 commented Feb 7, 2019

@helmar74 can you config your router to use a fixed channel?

@helmar74
Copy link
Author

helmar74 commented Feb 8, 2019

@AndyPuettmann : Yes the Devices, which are disconnecting now, had also trouble before
@ascillato2 : I configured all of my Accesspoints (and the AVM MESH WIFI Network) to fixed channels.

Will provide feedback.........

@DavidFW1960
Copy link

I have a FB 7590 and 7490 (mesh repeater) so 2 strikes for me but the 2.3.0 core is 100% stable here. 2.4.2 and 2.5.0(beta 2 or 3) were an absolute nightmare! 2.3.0 is rock solid. I was also previously using auto for wifi channel but did fix that to a static channel... haven’t tried 2.3.0 core with auto channel..

@AndyPuettmann
Copy link

Okay. Well.
Not a single disconnect since switching version.

I the cause known exactly or pinned down specific parts of the Fritz-Box?
I am playing with the idea to get an old router with openwrt in access point mode for the sonoffs. Also possibly the raspberry pis with wlan to lan bridge via hostapd so they serve as access points in a transparent mode. Would this fix the issue?
I.e. is the direct connection to the fritzbox the problem or the fritzbox as dhcp server?

@helmar74
Copy link
Author

helmar74 commented Feb 8, 2019

Hi everbody,

what I've done so far:
I switched my Wifi completely to fixed channels.
I "upgraded" 9 of my S20 Sonoffs to Tasmota 6.4.1 with Core Version 2.5.0 (non BETA)
The FW was compiled by myself with some personal settings.
I was used to disable the mDNS usage in my_user_config.h, as I'm getting DNS failed messages
every second on a few of the S20's after a while. (not after startup, it took ~60 minutes)

After about 90 minutes the first MQTT reconnects occured.
When the MQTT reconnect occurs, I also see, that there has been a WIFI reconnect on the Access Point:
image

The Tasmota Device didn't logged a Wifi reconnect in his webconsole log. (I raised webconsole loglevel to debug)
The Device I was looking on is in the same room as the wifi access point.

Until now there were two reconnects of one device within 3 hours.

All other Tasmota Devices, which are running on a 2.3.0 Core based Firmware aren't reconnecting.... at the moment

I'll give regular feedback in this post.

Regards

helmar

@ascillato2
Copy link
Collaborator

Nice to know that fixing the wifi channel in the router and using core 2.3.0, the reconnection issue has been improved. Anyway, this is a known issue of the core that it is outside the control of Tasmota.

Closing this issue as it is not a Tasmota bug, but anyway, feel free to ask and comment on this trying to find a better way to workaround this core problem with fritzbox routers.

@helmar74
Copy link
Author

helmar74 commented Feb 8, 2019

OK, will do so ;-)

Here's the weblog of a 2.5.0 CORE Device, when MQTT Reconnect occurs:
17:25:23 MQT: tele/sonoffS20_7/STATE = {"Time":"2019-02-08T17:25:23","Uptime":"0T03:40:16","Vcc":3.552,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"POWER":"OFF","Wifi":{"AP":1,"SSId":"SSID","BSSId":"E8:DF:70:67:B9:00","Channel":11,"RSSI":68}} 17:28:14 WIF: verbinden mit AP1 SSID in Modus 11N wie S20Steckdose7... 17:28:18 WIF: verbunden 17:28:19 MQT: Verbindungsversuch... 17:28:19 MQT: verbunden 17:28:19 MQT: tele/sonoffS20_7/LWT = Online (beibehalten) 17:28:19 MQT: cmnd/sonoffS20_7/POWER = 17:30:24 MQT: tele/sonoffS20_7/STATE = {"Time":"2019-02-08T17:30:24","Uptime":"0T03:45:17","Vcc":3.540,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"POWER":"OFF","Wifi":{"AP":1,"SSId":"SSID","BSSId":"E8:DF:70:67:B9:00","Channel":11,"RSSI":72}}

@helmar74
Copy link
Author

helmar74 commented Feb 8, 2019

I increased weblog level to 4. Now I'm getting the following messages in webconsole:
17:33:08 WIF: Prüfe Verbindung... 17:33:08 WIF: verbunden 17:33:28 WIF: Prüfe Verbindung... 17:33:28 WIF: verbunden 17:33:48 WIF: Prüfe Verbindung... 17:33:48 WIF: verbunden

Will post update, when the reconnect occurs.....

@helmar74
Copy link
Author

helmar74 commented Feb 8, 2019

After about 3:30 hours it seems that the reconnects occurs more often.
Now I have reconnects of more and more 2.5.0 devices....

@meingraham
Copy link
Collaborator

Helmar,

Do you have a system where you can set up a syslog server? If so, it may be prudent to do so and to configure your devices to send their logs there so you don't miss any details. YOu can set the same level of logging (SysLog 4) and perhaps back down your WebLog level since it will be redundant once the detailed information is being sent to the server.

Mike

@helmar74
Copy link
Author

helmar74 commented Feb 8, 2019

When the MQTT Reconnect occurs, the 2.5.0 Device is logging this:

20:17:26 WIF: Prüfe Verbindung... 20:17:26 WIF: verbunden 20:17:44 MQT: Verbindungsversuch... 20:17:45 MQT: verbunden 20:17:45 MQT: tele/sonoffS20_8/LWT = Online (beibehalten) 20:17:45 MQT: cmnd/sonoffS20_8/POWER = 20:17:45 MQT: abonniere cmnd/sonoffS20_8/# 20:17:45 MQT: abonniere cmnd/sonoffStandardGruppe/# 20:17:45 MQT: abonniere cmnd/S20Steckdose8_fb/# 20:17:47 WIF: Prüfe Verbindung... 20:17:47 WIF: verbunden 20:18:07 WIF: Prüfe Verbindung...

@helmar74
Copy link
Author

helmar74 commented Feb 9, 2019

During the night (about 14 hours) all Devices on CORE 2.3.0 hadn't any disconnects.
now I tried to flash CORE 2.5.0 on 2 devices again, with only 1 Accesspoint configured.
One of the devices i do a reset 5 after flashing and I reconfigured new.

Waiting for any results... will provide Feedback.
I think, this will be my last try.

@helmar74
Copy link
Author

helmar74 commented Feb 9, 2019

Had reconnects again.
Also tried sleep 0, (with core 2.5.0) as suggested here:
https://github.com/arendst/Sonoff-Tasmota/wiki/Troubleshooting#disconnects-and-reconnects

Now (of course) the load get's higher.
Is it a Problem?

15:18:29 MQT: tele/sonoffS26_1/STATE = {"Time":"2019-02-09T15:18:29","Uptime":"0T06:51:53","Vcc":3.532,"SleepMode":"Dynamic","Sleep":0,"LoadAvg":677,"POWER":"OFF","Wifi":{"AP":1,"SSId":"SSID","BSSId":"9C:C7:A6:F7:72:A2","Channel":11,"RSSI":80}}

@helmar74
Copy link
Author

I've done the following with one of my core 2.5.0 S20s which was having permanent MQTT reconnects:

Erase the complete flash with command
esptool.py --port COM5 erase_flash
Load Tasmota into a device with command
esptool.py --port COM5 write_flash -fs 1MB -fm dout 0x0 sonoff.bin

Will provide feedback....

@AndyPuettmann
Copy link

Sleep 0 prevents the cpu from going to sleep.
So it does only increase the power consumption from something around 0,6W up to around 1.1W.

This may make a difference on your bill for electricity if running a ton of those devices.
But should not have any effect other than this.
(Could possibly get warmer if its inside an air-tight spot, should examine it after a day or two.

@helmar74
Copy link
Author

helmar74 commented Feb 12, 2019

I flashed a few S20 by wire. First I erased flash with esptool like suggested here:
https://github.com/arendst/Sonoff-Tasmota/wiki/Troubleshooting#disconnects-and-reconnects

After a while a few of the S20 reconnected, others didn't.
What I tried next:

  1. One S20 is connected to a 1750E, which is part of a FB7590 WLAN (Both running mesh on WIFI CHANNEL 6)
  2. One S20 is connected to separat Fritzbox 7390 (WIFI CHANNEL 1)
  3. As said, the Fritzbox is set to fixed channel, not autochannel.
  4. The 1750E is connected with Wire to the 7590.
  5. Both S20 are placed in the same room as the wifi components. (only 50cm between S20 and 1750E/7390)

Will see, what happens.... Provide again feedback

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting feedback Action - Waiting for response or more information duplicated Result - Duplicated Issue troubleshooting Type - Troubleshooting
Projects
None yet
Development

No branches or pull requests

6 participants