All devices gone after power outage #11759
Comments
If you run a production system with lights, use real Zigbee light switches and bind them to the Zigbee light groups, so they keep working even if the host system or the internal network has a problem. The reason you need to re-power the devices with joining enabled is that the coordinator's frame counter for the network key is out of sync, or the system restored an old backup of it after coming back from the power problem. As a result, all devices in the network block the coordinator because they think it is a replay attack (normal Zigbee security). |
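The replay protection described above can be illustrated with a toy model. This is a simplified sketch, not the actual Zigbee stack: the `Receiver` class and its method names are hypothetical, and real devices do this bookkeeping in firmware.

```python
class Receiver:
    """Toy model of a Zigbee device's replay check (hypothetical names)."""

    def __init__(self) -> None:
        # Highest frame counter accepted so far, per sender address.
        self.last_counter = {}

    def accept(self, sender: str, frame_counter: int) -> bool:
        """Accept a frame only if its counter is newer than the last one seen."""
        last = self.last_counter.get(sender, -1)
        if frame_counter <= last:
            return False  # looks like a replayed (old) frame, so drop it
        self.last_counter[sender] = frame_counter
        return True


lamp = Receiver()
coordinator = "0x0000"

# Normal operation: the coordinator's counter keeps increasing.
results_before = [lamp.accept(coordinator, c) for c in (1000, 1001, 1002)]

# After restoring an old backup, the coordinator resumes from a stale counter.
results_after = [lamp.accept(coordinator, c) for c in (500, 501)]

print(results_before)  # [True, True, True]
print(results_after)   # [False, False]
```

In this model the lamp keeps rejecting the coordinator until its stored counter is cleared, which is roughly what re-pairing a device achieves.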
But how come if I manually cut the power to the host, everything works fine? I've never had that issue before when my host loses and restores power. And also, HA does not really have anything to do with this, as the Z2M docker is running completely separately and it was the Z2M docker that had the issue. Edit: Added some more information in the reply. |
I had something similar happen to my docker-based instance 3 days ago after rebooting my server. |
Had the same issue today :( It's really a pity that the system is not self-healing.
|
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days |
Same issue today. Power loss on both Z2M and the lights/devices, and a lot of devices were gone when Z2M rebooted. I have manually cut the power before without consequences. |
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days |
is this really stale? |
just had the same issue. Almost all devices gone after power outage, only 2 were present in z2m, both Hue motion sensors, the rest of 20 devices gone. I managed to get most of them back by resetting them but I have 4 Philips Hue outdoor lights that don't want to rejoin by themselves and I can't reset them. I'll need to remove them from the wall, get the serial number, add them to Philips Hue app and I think I'll keep them there, don't want to get the ladder out if a power outage happens again. Using zzh stick. Also tried updating to latest coordinator firmware and keeping "Allow join" on all the time with the hope that some devices will rejoin by themselves but those 4 lights never did. Some IKEA buttons did rejoin when I pressed them but not all. |
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days |
AFAIK not stale |
What do you mean with "gone"?
|
Both |
Yep, for me it was the same. Nearly all devices were neither visible in the Z2M frontend nor controllable in Home Assistant. Maybe there is some unwritten state in Z2M which creates an inconsistency when the Z2M system is killed abruptly? |
Did something special happen around the time of the crash? Z2M won't just empty its |
Hmm, I can't remember anything special except the power outage. I just checked what database.db is, and AFAIK it is just a JSON file? Since no proper database is used, database.db lacks corruption prevention. Furthermore, last_seen is also stored in this file, so it gets a lot of write operations, which could lead to corruption during a power outage. |
Hi, sorry for the lack of clarity in my last response. The devices were present in the Z2M frontend but unreachable (all of them, 91 devices). None of the devices or groups were controllable. The coordinator was reflashed with the same version, but with no effect. Re-pairing all devices was necessary. For your information, 80 of the 91 devices are mains powered (not battery devices), in case that helps. I'm going to check whether I still have the database.db so I can share it in this conversation. The workaround I have deployed is to connect the Zigbee2MQTT machine to a UPS 🤣. At this point I think Z2M was writing at just the wrong moment and the file got corrupted. Only if I find the file will we know. Thank you for your time. |
There is some corruption prevention: the db is first written to a temp path and then renamed (https://github.com/Koenkk/zigbee-herdsman/blob/f1c6a3887e9d7a763e9ec981543881716c75c5ff/src/controller/database.ts#L75). I agree that SQLite may be a better option, but it's also more complicated (and we are not sure yet that this causes the issue).
Not every last_seen update rewrites the db; this is only done occasionally. @adelaiglesia what did you see in the log when sending messages to the devices? |
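The write-to-temp-then-rename approach mentioned above can be sketched as follows. This is a minimal Python illustration, not the zigbee-herdsman code: `save_atomically` is a hypothetical name, and the `fsync` call is an added assumption about what would make the pattern more robust against power loss.

```python
import json
import os
import tempfile


def save_atomically(path, entries):
    """Write entries as one JSON object per line, then swap the file in place."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry) + "\n")
        f.flush()
        # Assumption: force the data to disk before renaming, so a power cut
        # is less likely to leave a renamed but empty file behind.
        os.fsync(f.fileno())
    # os.replace is atomic on POSIX when both paths are on the same filesystem.
    os.replace(tmp_path, path)


with tempfile.TemporaryDirectory() as data_dir:
    db_path = os.path.join(data_dir, "database.db")
    save_atomically(db_path, [{"id": 1, "type": "Coordinator"}])
    save_atomically(db_path, [{"id": 1, "type": "Coordinator"},
                              {"id": 2, "type": "Group", "groupID": 1}])
    with open(db_path, encoding="utf-8") as f:
        line_count = len(f.read().splitlines())
    leftover_tmp = os.path.exists(db_path + ".tmp")

print(line_count)    # 2
print(leftover_tmp)  # False
```

Without a sync step, some filesystems may persist the rename before the file contents, which is one possible way to end up with a truncated file after power loss. Whether that is what happened here is not established in this thread.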
I don't know if it's the same, but here is my situation: after a (maybe ungraceful) stop of the container, all sensors are still paired but they receive no signal. |
I just had the same problem after a short power outage. Z2M no longer showed any devices in the web interface; however, the log looked normal.
{"id":1,"type":"Coordinator","ieeeAddr":"0x...." ... }
{"id":2,"type":"Group","groupID":1,"members":[],"meta":{}}
{"id":3,"type":"Group","groupID":2,"members":[],"meta":{}} All devices were gone. |
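Since database.db holds one JSON object per line, as in the excerpt above, a quick check can tell whether the file still contains any devices. A rough sketch in Python: `count_entries` is a hypothetical helper, the sample data is made up, and treating every type other than Coordinator and Group as a device is an assumption.

```python
import json
from collections import Counter


def count_entries(lines):
    """Count database entries by their "type" field, flagging unreadable lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            counts["<corrupt>"] += 1
            continue
        if not isinstance(entry, dict):
            counts["<corrupt>"] += 1
            continue
        counts[entry.get("type", "<unknown>")] += 1
    return counts


# Made-up sample resembling a database that lost its devices.
sample = [
    '{"id":1,"type":"Coordinator"}',
    '{"id":2,"type":"Group","groupID":1,"members":[],"meta":{}}',
    '{"id":3,"type":"Group","groupID":2,"members":[],"meta":{}}',
]

counts = count_entries(sample)
# Assumption: anything that is not the coordinator or a group is a device.
device_count = sum(n for t, n in counts.items() if t not in ("Coordinator", "Group"))
print(dict(counts))   # {'Coordinator': 1, 'Group': 2}
print(device_count)   # 0
```

To check a real file, the lines could be read from data/database.db and passed to `count_entries`; a device count of zero on a network that had paired devices would point at the database rather than at the devices.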
@tripplet okay, that makes it clearer that the problem seems to be related to database.db, since restoring it fixes the issue. @Koenkk maybe as a quick fix database.db could be duplicated on every save? For example, the old version could simply be renamed to .bak or something like that? |
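The .bak idea suggested above could look roughly like this. It is a sketch only, not what Z2M implements: `save_with_backup` is a hypothetical name, and the rule of never replacing the backup with an empty file is an added assumption.

```python
import os
import shutil
import tempfile


def save_with_backup(path, content):
    """Keep the previous file as <path>.bak before writing the new content."""
    # Assumption: only back up a non-empty file, so a wiped database
    # never overwrites the last good backup.
    if os.path.exists(path) and os.path.getsize(path) > 0:
        shutil.copy2(path, path + ".bak")
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)


with tempfile.TemporaryDirectory() as data_dir:
    db_path = os.path.join(data_dir, "database.db")
    save_with_backup(db_path, "version 1\n")
    save_with_backup(db_path, "")             # simulate a wiped database
    save_with_backup(db_path, "version 2\n")  # backup must still be version 1
    with open(db_path + ".bak", encoding="utf-8") as f:
        backup_content = f.read()

print(repr(backup_content))  # 'version 1\n'
```

The size check matters because a backup taken right after the database was emptied would otherwise replace the only good copy.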
I will check if I can come up with an easy recovery solution. Something like: on save of db:
|
@Koenkk sounds like a good solution. |
Had the same issue the other night. Short power outage made my server reboot and only my Philips Hue motion sensors appeared, after they had detected motion. Rolled back VM snapshot and everything was back to normal and I could finally turn off all the lights that turned on when the power came back. 😵‍💫 |
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days |
not stale AFAIK |
Just got the issue after a power outage too. So stressful; luckily I had a backup from a few days back. |
Note: my database.db.backup was only 4 kB, while my backed-up database.db was 43 kB! |
Happened yet again. Sigh. |
Just had a power outage too. In my case the database was ok, but all z2m devices were offline and couldn't be controlled. |
Happened to me now twice too. Database was corrupted, and configuration.yaml isn't used to restore anything. |
@skinkie can you provide me an example of how the data/database.db looked? |
For me, every time it looked brand new! |
I've added the Changes will be available in the dev branch in a few hours from now. (https://www.zigbee2mqtt.io/advanced/more/switch-to-dev-branch.html) |
I saw this behaviour with 1.28.4 yesterday. At one point, I had around 5 devices paired and backend crashed - when restarted, the devices were gone. |
I had a very similar situation yesterday and while I had no power outage as far as I'm aware, I start to wonder if it's not somehow related : #15868 |
Hi, I just ran into a similar issue on 1.28.1 running in docker. I restarted the container and the web UI was then empty of all devices and most other info (version numbers in About, map not working, etc.). Looking at the log, though, it appears that devices are still paired and transmitting state and commands correctly. Here's what I tried, to no avail:
|
Current version of Z2M: 1.31.2 commit: 21f51258. After a restart (requested through either the web interface or a systemd restart of the Z2M process). 44 devices: 31 on mains, 12 on battery, and one that never bothered to report either battery or mains (it does run on battery). Is it possible that a single status request for some device gets lost in the traffic during startup, somehow causing the device to be disabled?
|
Additional: database.db entry for such a device that stubbornly doesn't come back...
|
Given the lack of new reports I think the fix works and this can be closed? |
I avoided all updates until now, i will check next weekend, report or close whatever is appropriate. |
No power outage... After the update (and waiting half an hour): re-pairing some devices has issues (of 5 lamps, 4 would reconnect after a reset; the 5th would not). |
Plugs/Switches are reported to be online, still giving errors:
This is equivalent for all plugs. |
I experienced several (8 or so) power failures last week, and some devices were lost (about 1 each time, of different types). I consider this solved. It can be closed. (I cannot do it.) |
It still happens from time to time. I'm making periodic backups so I can restore it if something happens. |
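Periodic backups like this could be scripted. A minimal sketch, assuming a hypothetical helper `backup_database` that is run from a scheduler such as cron; the file naming and the retention count are illustrative choices, not Z2M features.

```python
import os
import shutil
import tempfile
import time


def backup_database(db_path, backup_dir, keep=7):
    """Copy db_path into backup_dir under a timestamped name, pruning old copies."""
    os.makedirs(backup_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = os.path.join(backup_dir, f"database.db.{stamp}")
    shutil.copy2(db_path, target)
    # Timestamped names sort chronologically, so the oldest come first.
    backups = sorted(os.listdir(backup_dir))
    for old in backups[:-keep]:
        os.remove(os.path.join(backup_dir, old))
    return target


with tempfile.TemporaryDirectory() as root:
    db_path = os.path.join(root, "database.db")
    with open(db_path, "w", encoding="utf-8") as f:
        f.write('{"id":1,"type":"Coordinator"}\n')
    backup_dir = os.path.join(root, "backups")
    os.makedirs(backup_dir)
    # Pretend three older backups already exist.
    for day in ("20200101", "20200102", "20200103"):
        shutil.copy2(db_path, os.path.join(backup_dir, f"database.db.{day}-000000"))
    newest = backup_database(db_path, backup_dir, keep=2)
    remaining = sorted(os.listdir(backup_dir))

print(len(remaining))  # 2
```

Keeping several generations rather than a single copy helps when the corruption is only noticed after the next backup has already run.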
Using 1.33.2 with an ezsp adapter (sonoff dongle-e) I had setup z2m for the first time and connected 22 devices. I then clicked the restart button on the z2m web ui, and when it came back every device was gone. The |
I think this issue is still present. See the following bug: #19988 |
I had the same issue yesterday (1.35.3-1). A circuit breaker tripped on the circuit that also has the socket my server is on. I lost all 32 devices after everything was back up and running again. I had to manually add everything back to Z2M, as sadly I had no good old backup. |
What happened?
After waking up from a night's sleep, I noticed in HA that some devices were unavailable. After some more looking around, I found all my Z2M devices were unavailable.
Heading over to my Z2M docker container running on an Unraid host, I saw none of my devices were listed anywhere, and "Devices", "Dashboard" and "Map" were all blank, although my settings were still the same.
I do suspect that there was a power outage while I was sleeping, as all the lights had turned on and after booting my PC, there were some indications that it had lost power.
After power cycling a device I had nearby and setting Z2M to allow joining, the device appeared as previously configured without needing to change anything manually.
Now I have power cycled/reset all my devices, and they are all fully functional as they were before. What is weird is that I have, on several occasions, unintentionally, removed power from my running Unraid host without any issues at all.
This leads me to my questions:
I am lucky to only have about 30 devices on my network, and resetting them doesn't take that long, although it is still boring and tedious. I could only imagine how it would be for someone with 100s of devices on their network.
What did you expect to happen?
Not losing all my devices after a power outage.
How to reproduce it (minimal and precise)
No idea.
As mentioned my host has lost power several times earlier without issues.
Zigbee2MQTT version
1.23.0-dev commit: afe94a7
Adapter firmware version
0x26720700
Adapter
ConBee2
Debug log
07MAR22 11:20:18.txt
08MAR22 08:30:21.txt
The first log is dated 07MAR22 11:20:18 and I assume the last line is right before the power is lost. (I can't find any useful information in this)
The second log is dated 08MAR22 08:30:21 and I assume it would be from when the power came back. The shutdown at the end of this log is me trying to restart it.