All devices gone after power outage #11759

Kodemikkel · 2022-03-08T11:31:54Z

What happened?

After waking up from a night's sleep, I noticed in HA that some devices were unavailable. After some more looking around, I found all my Z2M devices were unavailable.

Heading over to my Z2M docker container running on an Unraid host, I saw none of my devices were listed anywhere, and "Devices", "Dashboard" and "Map" were all blank, although my settings were still the same.
I do suspect that there was a power outage while I was sleeping, as all the lights had turned on and after booting my PC, there were some indications that it had lost power.
After power cycling a device I had nearby and setting Z2M to allow joining, the device appeared as previously configured without needing to change anything manually.

I have now power cycled/reset all my devices, and they are all fully functional as they were before. What is weird is that I have, on several occasions, unintentionally removed power from my running Unraid host without any issues at all.
This leads me to my questions:

Does anyone know why this happened, especially when the host has lost power on several occasions earlier without any issues?
Has anyone else experienced this issue before?
And if so, how did you fix it?

I am lucky to only have about 30 devices on my network, and resetting them doesn't take that long, although it is still boring and tedious. I could only imagine how it would be for someone with 100s of devices on their network.

What did you expect to happen?

Not losing all my devices after a power outage.

How to reproduce it (minimal and precise)

No idea.
As mentioned my host has lost power several times earlier without issues.

Zigbee2MQTT version

1.23.0-dev commit: afe94a7

Adapter firmware version

0x26720700

Adapter

ConBee2

Debug log

07MAR22 11:20:18.txt
08MAR22 08:30:21.txt

The first log is dated 07MAR22 11:20:18 and I assume the last line is right before the power is lost. (I can't find any useful information in this)
The second log is dated 08MAR22 08:30:21 and I assume it would be from when the power came back. The shutdown at the end of this log is me trying to restart it.


MattWestb · 2022-03-08T19:58:34Z

If you are running a production system with lights, use real Zigbee light switches and bind them to Zigbee light groups, so that they always work even if the host system or the internal network has a problem.
If you implement lighting the "HA way", you can always run into problems where nothing works; with light groups, only one device fails rather than 100% of the system.

The reason you need to re-power the devices with joining enabled is that the coordinator's frame counter for the network key is out of sync, or the system restored an old backup of it after coming back from the power problem => all devices in the network block the coordinator because they think it is a replay attack (normal Zigbee security).
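A minimal sketch of the replay protection described above, to show why a rolled-back frame counter gets the coordinator ignored. This is a toy model, not the actual Zigbee stack logic; `FrameCounterFilter` and its method are hypothetical names.

```python
class FrameCounterFilter:
    """Toy model of a receiver's per-sender frame counter check (illustrative only)."""

    def __init__(self) -> None:
        # Highest frame counter accepted so far, keyed by sender.
        self._last_seen: dict[str, int] = {}

    def accept(self, sender: str, frame_counter: int) -> bool:
        last = self._last_seen.get(sender, -1)
        if frame_counter <= last:
            # Looks like a replayed (or rolled-back) frame: drop it.
            return False
        self._last_seen[sender] = frame_counter
        return True


device = FrameCounterFilter()

# Normal operation: the coordinator's counter keeps increasing.
before_outage = [device.accept("coordinator", c) for c in (1000, 1001, 1002)]

# After an old backup is restored, the coordinator resumes from a stale counter,
# so its frames are rejected until the counter overtakes the remembered value.
after_restore = [device.accept("coordinator", c) for c in (500, 501)]
```

In this model the device only starts listening again once the coordinator's counter passes the remembered value, or once the device forgets it, which would be consistent with devices recovering after a reset and rejoin.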

Kodemikkel · 2022-03-08T20:05:25Z

But how come if I manually cut the power to the host, everything works fine? I've never had that issue before when my host loses and restores power.

And also, HA does not really have anything to do with this, as the Z2M docker is running completely separately and it was the Z2M docker that had the issue.

Edit: Added some more information in the reply.

twsl · 2022-03-11T10:40:06Z

I had something similar happen to my docker-based instance 3 days ago after rebooting my server.
Zigbee2MQTT version 1.24.0-dev (commit #c49f546)
zigbee-herdsman (0.14.20)

eloo · 2022-03-14T12:17:12Z

Had the same issue today :(

Really a pity that the system is not self-healing.

Zigbee2MQTT version: 1.24.0-dev commit: [f7c6207](https://github.com/Koenkk/zigbee2mqtt/commit/f7c6207)
Coordinator type: ConBee2/RaspBee2
Coordinator revision: 0x26720700
Coordinator IEEE Address: 0x00212effff0656b9
Frontend version: 0.6.77

github-actions · 2022-04-14T00:02:05Z

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days

adelaiglesia · 2022-04-14T01:54:25Z

Same issue today. Power was lost on both Z2M and the lights/devices, and a lot of devices were gone when Z2M rebooted. I have manually cut the power before without consequences.

github-actions · 2022-05-15T00:02:23Z

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days

eloo · 2022-05-16T15:46:59Z

is this really stale?
afaik there was no fix yet?

mihaiblaga89 · 2022-06-03T22:20:17Z

Just had the same issue. Almost all devices were gone after a power outage; only 2 were present in z2m, both Hue motion sensors, and the other 20 devices were gone. I managed to get most of them back by resetting them, but I have 4 Philips Hue outdoor lights that don't want to rejoin by themselves and that I can't reset. I'll need to remove them from the wall, get the serial number, and add them to the Philips Hue app, and I think I'll keep them there; I don't want to get the ladder out if a power outage happens again.

Using a zzh stick. I also tried updating to the latest coordinator firmware and keeping "Allow join" on all the time in the hope that some devices would rejoin by themselves, but those 4 lights never did. Some IKEA buttons did rejoin when I pressed them, but not all.

github-actions · 2022-07-04T00:40:14Z

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days

eloo · 2022-07-04T08:52:17Z

AFAIK not stale

Koenkk · 2022-07-04T15:10:09Z

What do you mean by "gone"?

Are the devices missing from the z2m frontend OR
Are none of the devices controllable?

Kodemikkel · 2022-07-04T15:26:34Z

> What do you mean by "gone"?
>
> Are the devices missing from the z2m frontend OR
> Are none of the devices controllable?

Both

eloo · 2022-07-05T07:31:51Z

Yep, for me it was the same: nearly all devices were neither visible in the Z2M frontend nor controllable in Home Assistant.
I also had to re-pair them.

Maybe there is some unwritten state in Z2M which creates an inconsistency when the Z2M system is killed hard?

Koenkk · 2022-07-05T14:51:00Z

Did something special happen around the time of the crash? Z2M won't just empty its data/database.db file by itself.

eloo · 2022-07-05T18:22:48Z

Hmm, I can't remember anything special except the power outage.

I just checked what the database.db is, and AFAIK it is just a JSON file?
Maybe this file just gets corrupted from time to time?

As no proper database is used, the "database.db" lacks corruption prevention.
Maybe it would make sense to use something more robust here, like an SQLite database?

Furthermore, last_seen is also stored in this file, so it is going to see a lot of write operations, which could lead to corruption during a power outage.

adelaiglesia · 2022-07-05T18:31:35Z

> What do you mean by "gone"?
>
> Are the devices missing from the z2m frontend OR
> Are none of the devices controllable?

Hi, sorry for the lack of clarity in my last response. The devices were present in the z2m frontend but unreachable (all of them, 91 devices). None of the devices or groups were controllable. The coordinator was reflashed with the same version, but with no effect. Re-pairing all devices was necessary.

Just for your info, 80 of the 91 devices are mains-powered (not battery devices), in case that helps. I'm going to check whether I still have the database.db so I can share it in this conversation.

The workaround I have deployed is to connect the Zigbee2MQTT machine to a UPS 🤣. At this point I think that z2m was writing at exactly that moment and the file got corrupted. We will only know if I find the file.

Thank you for your time

Koenkk · 2022-07-06T15:15:01Z

@eloo

> as no proper database is used the "database.db" is lacking corruption prevention.

There is some corruption prevention: the db is first written to a temp path and then renamed (https://github.com/Koenkk/zigbee-herdsman/blob/f1c6a3887e9d7a763e9ec981543881716c75c5ff/src/controller/database.ts#L75). I agree that sqlite may be a better option, but it's also more complicated (and we are not sure yet that this causes the issue).

> further the last_seen is also stored in this file. so this file is going to have a lot write operations which could lead to a corruption while a power outage

Not every last_seen update rewrites the db; this is done occasionally.

@adelaiglesia what did you see in the log when sending messages to the devices?
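The temp-file-then-rename pattern mentioned above can be sketched as follows. This is a generic Python illustration, not the zigbee-herdsman code (which is TypeScript); `save_atomically` and the file names are hypothetical.

```python
import json
import os
import tempfile


def save_atomically(path: str, entries: list[dict]) -> None:
    """Write one JSON object per line to a temp file, then rename it over `path`."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry) + "\n")
    # os.replace is an atomic rename when both paths are on the same filesystem,
    # so a reader sees either the old file or the new one, never a mix of both.
    os.replace(tmp_path, path)


workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "database.db")
save_atomically(db_path, [{"id": 1, "type": "Coordinator"}])
save_atomically(db_path, [{"id": 1, "type": "Coordinator"}, {"id": 2, "type": "Router"}])

with open(db_path, encoding="utf-8") as f:
    saved = [json.loads(line) for line in f]
```

The rename protects against a crash of the writing process. On its own it does not guarantee that the new contents have reached the disk before a power cut.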

hitokiri8x · 2022-07-26T20:29:50Z

I don't know if it's the same, but here is my situation: after a stop (maybe ungraceful) of the container, all sensors are still paired but they receive no signal.
I only have Aqara devices: window, temperature and water sensors.
Only the window sensors work again when toggled; for the temperature/water sensors to work again I need to press the button (not re-pair).

tripplet · 2022-08-17T12:58:18Z

I just had the same problem: after a short power outage, Z2M no longer showed any devices in the web interface, although the log looked normal.
The new database.db only contained 3 lines, 1 for the coordinator and 2 empty groups:

{"id":1,"type":"Coordinator","ieeeAddr":"0x...." ... }
{"id":2,"type":"Group","groupID":1,"members":[],"meta":{}}
{"id":3,"type":"Group","groupID":2,"members":[],"meta":{}}

All devices were gone.
Luckily I was able to restore a backup from Home Assistant which contained the database.db.
After restoring it and restarting the addon, everything worked again, with no need for a lengthy re-pairing of all devices.
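A quick way to check whether a database.db has lost its devices is to count entries by type. The sketch below assumes the one-JSON-object-per-line layout shown above; `count_entry_types` is a hypothetical helper, and the sample lines (including the all-zero address) are placeholders rather than real data.

```python
import json
from collections import Counter


def count_entry_types(lines: list[str]) -> Counter:
    """Count database entries by their "type" field, skipping blank lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if line:
            counts[json.loads(line)["type"]] += 1
    return counts


# Stand-in for the contents of a database.db that lost its devices.
sample = [
    '{"id":1,"type":"Coordinator","ieeeAddr":"0x0000000000000000"}',
    '{"id":2,"type":"Group","groupID":1,"members":[],"meta":{}}',
    '{"id":3,"type":"Group","groupID":2,"members":[],"meta":{}}',
]

counts = count_entry_types(sample)
# Anything that is neither the coordinator nor a group is treated as a device.
device_count = sum(n for t, n in counts.items() if t not in ("Coordinator", "Group"))
```

A `device_count` of zero on a network that had paired devices would suggest restoring a backup before re-pairing anything.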

eloo · 2022-08-17T13:18:38Z

@tripplet okay, that makes it clearer that the problem seems to be related to the database.db, as restoring it fixes the issue.

@Koenkk maybe, as a quick fix, the database.db could be duplicated on every write? So the old version would just be renamed with .bak or something like that,
and we could easily restore it every time.

Koenkk · 2022-08-17T16:37:33Z

I will check if I can come up with an easy recovery solution. Something like:

On save of the db:

1. Copy the old db to something like database.db.bak, as you suggested.
2. Write the db to a temp file with a closing mark at the end.
3. Copy the db from the temp file to database.db (if this only completes partially we get this issue).

When z2m starts next time, it will check whether it can find the closing mark, so that it knows the db is complete; if not, it will use the database.db.bak file if present.
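The recovery scheme outlined above could look roughly like this. It is a sketch in Python under stated assumptions, not what was eventually implemented: the `#END` closing mark, the `.bak` and `.tmp` suffixes and all function names are hypothetical.

```python
import json
import os
import shutil
import tempfile
from typing import Optional

END_MARK = "#END"  # hypothetical closing mark; not part of any real file format


def save_db(path: str, entries: list[dict]) -> None:
    if os.path.exists(path):
        shutil.copyfile(path, path + ".bak")  # keep the previous version
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:  # temp file with closing mark
        for entry in entries:
            f.write(json.dumps(entry) + "\n")
        f.write(END_MARK + "\n")
    shutil.copyfile(tmp_path, path)  # copy into place
    os.remove(tmp_path)


def read_complete(path: str) -> Optional[list[dict]]:
    """Return the entries if the file ends with the closing mark, else None."""
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    if not lines or lines[-1] != END_MARK:
        return None
    return [json.loads(line) for line in lines[:-1]]


def load_db(path: str) -> list[dict]:
    entries = read_complete(path)
    if entries is None:
        entries = read_complete(path + ".bak")  # fall back to the backup
    return entries if entries is not None else []


workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "database.db")
save_db(db_path, [{"id": 1, "type": "Coordinator"}])
save_db(db_path, [{"id": 1, "type": "Coordinator"}, {"id": 2, "type": "Router"}])

# Simulate a power cut that left database.db truncated mid-write.
with open(db_path, "w", encoding="utf-8") as f:
    f.write('{"id":1,"type":"Coord')

recovered = load_db(db_path)  # comes from database.db.bak
```

One limitation of this sketch: the backup is always one save behind, so a device paired just before the outage would still be missing after recovery.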

eloo · 2022-08-17T19:07:57Z

@Koenkk sounds like a good solution.
i also like the idea of the self healing check 👍

xit · 2022-09-17T05:35:04Z

Had the same issue the other night. Short power outage made my server reboot and only my Philips Hue motion sensors appeared, after they had detected motion.

Rolled back VM snapshot and everything was back to normal and I could finally turn off all the lights that turned on when the power came back. 😵‍💫

github-actions · 2022-10-18T00:04:50Z

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days

eloo · 2022-10-18T06:46:46Z

not stale AFAIK

mrwiwi · 2022-10-26T20:47:07Z

Just got the issue after a power outage too. So stressful; luckily I had a backup from a few days back.

mrwiwi · 2022-10-26T20:51:31Z

Note: my database.db.backup was only 4 KB, while my backed-up database.db was 43 KB!

xit · 2022-11-03T18:54:47Z

Happened yet again. Sigh.

drhirn · 2022-11-18T07:55:52Z

Just had a power outage too. In my case the database was ok, but all z2m devices were offline and couldn't be controlled.

skinkie · 2022-12-10T19:38:52Z

Happened to me now twice too. Database was corrupted, and configuration.yaml isn't used to restore anything.

Koenkk · 2022-12-11T08:21:42Z

@skinkie can you provide me an example of how the data/database.db looked?

mrwiwi · 2022-12-11T08:27:10Z

> @skinkie can you provide me an example of how the data/database.db looked?

For me, every time it looked brand new!

Koenkk · 2022-12-11T09:32:31Z

I've added the fsync call before the rename as suggested by @tripplet. Let's see if it still occurs after this.

Changes will be available in the dev branch in a few hours from now. (https://www.zigbee2mqtt.io/advanced/more/switch-to-dev-branch.html)
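To illustrate why an fsync before the rename matters: on some filesystems the rename can reach the disk before the file's data does, so a power cut in between may leave a new but empty or truncated file. The following is a generic Python sketch of the idea, not the actual zigbee-herdsman change; `write_durably` is a hypothetical name.

```python
import os
import tempfile


def write_durably(path: str, data: str) -> None:
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        f.write(data)
        f.flush()             # push Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to write the file contents to disk
    # Only rename once the data has been flushed.
    os.replace(tmp_path, path)


workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "database.db")
write_durably(db_path, '{"id":1,"type":"Coordinator"}\n')

with open(db_path, encoding="utf-8") as f:
    content = f.read()
```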

jjarven · 2022-12-18T07:29:44Z

I saw this behaviour with 1.28.4 yesterday.
I migrated from ZHA and had issues: zigbee2mqtt crashed many times during device pairing (the web frontend stopped acknowledging pairing attempts, and I finally noticed the backend was down).

At one point, I had around 5 devices paired and the backend crashed; when it was restarted, the devices were gone.
The service's automatic restart function is not working either; I had to start it manually.

maxime1992 · 2023-01-02T08:44:41Z

I had a very similar situation yesterday, and while I had no power outage as far as I'm aware, I'm starting to wonder whether it's somehow related: #15868

scottrhoyt · 2023-02-06T16:50:55Z

Hi, I just ran into a similar issue on 1.28.1 running in docker. I restarted the container and the WebUI was now depopulated of all devices and most other info (version numbers in About, map not working, etc.). Looking at the log, though, it appears that devices are still paired and transmitting state and commands correctly. Here's what I tried, to no avail:

Restart container
Rollback container data to known good state
Update Z2M to 1.30.0

noci2012 · 2023-06-27T23:20:02Z

Current version of Z2M: 1.31.2 commit: 21f51258
Conbee II, Firmware: 0x26580700

After a restart (requested through either the web interface or a systemd restart of the Z2M process), most devices are offline: not forgotten, just offline, and they need to be re-paired.
Devices that report are more likely to return than others.
Devices that were unavailable (mains devices that were offline because they were switched off at the mains) also have a better chance of returning.

44 devices, 31 using mains, 12 on battery, one never bothered to report either battery/mains (does run on battery).

Is it possible that a single status request for some device gets lost in the traffic during startup, somehow causing the device to be disabled?
Also observed (noticed once, no complete track record): there is a failed poll in the log files BEFORE the stick has been registered.

info  2023-06-27 10:29:17: Logging to console and directory: '/opt/zigbee2mqtt/data/log/2023-06-27.10-29-17' filename: log.txt
warn  2023-06-27 10:29:17: Failed to ping 'innr_plug1' (attempt 1/1, Read 0x18fc260000051121/1 genBasic(["zclVersion"], {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":true,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (no response received (17)))
info  2023-06-27 10:29:17: Starting Zigbee2MQTT version 1.31.2 (commit #21f51258)
info  2023-06-27 10:29:17: Starting zigbee-herdsman (0.14.117)
info  2023-06-27 10:29:17: zigbee-herdsman started (resumed)
info  2023-06-27 10:29:17: Coordinator firmware version: '{"meta":{"maintrel":0,"majorrel":38,"minorrel":88,"product":0,"revision":"0x26580700","transportrev":0},"type":"ConBee2/RaspBee2"}'

noci2012 · 2023-06-29T16:54:43Z

Additional: the database.db entry for one such device that stubbornly doesn't come back:

{
  "id": 22,
  "type": "Router",
  "ieeeAddr": "0x847127fffea9ccab",
  "nwkAddr": 10923,
  "manufId": 4644,
  "manufName": "ROBB smarrt",
  "powerSource": "Mains (single phase)",
  "modelId": "ROB_200-004-0",
  "epList": [
    1,
    242
  ],
  "endpoints": {
    "1": {
      "profId": 260,
      "epId": 1,
      "devId": 257,
      "inClusterList": [
        0,
        3,
        4,
        5,
        6,
        8,
        2821,
        4096
      ],
      "outClusterList": [
        25
      ],
      "clusters": {
        "genBasic": {
          "attributes": {
            "modelId": "ROB_200-004-0",
            "manufacturerName": "ROBB smarrt",
            "powerSource": 1,
            "zclVersion": 3,
            "appVersion": 0,
            "stackVersion": 0,
            "hwVersion": 1,
            "dateCode": "NULL",
            "swBuildId": "2.5.3_r51"
          }
        },
        "genOta": {
          "attributes": {
            "currentFileVersion": 51
          }
        },
        "genOnOff": {
          "attributes": {
            "onOff": 0
          }
        },
        "genLevelCtrl": {
          "attributes": {
            "currentLevel": 69,
            "onLevel": 255
          }
        }
      },
      "binds": [
        {
          "cluster": 6,
          "type": "endpoint",
          "deviceIeeeAddress": "0x00212effff06747e",
          "endpointID": 1
        },
        {
          "cluster": 8,
          "type": "endpoint",
          "deviceIeeeAddress": "0x00212effff06747e",
          "endpointID": 1
        }
      ],
      "configuredReportings": [
        {
          "cluster": 6,
          "attrId": 0,
          "minRepIntval": 0,
          "maxRepIntval": 3600,
          "repChange": 0
        }
      ],
      "meta": {}
    },
    "242": {
      "profId": 41440,
      "epId": 242,
      "devId": 102,
      "inClusterList": [
        33
      ],
      "outClusterList": [
        33
      ],
      "clusters": {},
      "binds": [],
      "configuredReportings": [],
      "meta": {}
    }
  },
  "appVersion": 0,
  "stackVersion": 0,
  "hwVersion": 1,
  "dateCode": "NULL",
  "swBuildId": "2.5.3_r51",
  "zclVersion": 3,
  "interviewCompleted": true,
  "meta": {
    "configured": 1461352984
  },
  "lastSeen": 1687851057212,
  "defaultSendRequestWhen": "immediate"
}
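When reading an entry like the one above, it can help to turn `lastSeen` into a readable time and compare it with the restart time in the log. The sketch assumes `lastSeen` is a Unix timestamp in milliseconds, which its magnitude suggests but the thread does not confirm; `last_seen_utc` is a hypothetical helper.

```python
from datetime import datetime, timezone


def last_seen_utc(last_seen_ms: int) -> datetime:
    """Convert a millisecond Unix timestamp (assumed format) to an aware UTC datetime."""
    return datetime.fromtimestamp(last_seen_ms / 1000, tz=timezone.utc)


seen = last_seen_utc(1687851057212)
print(seen.strftime("%Y-%m-%d %H:%M:%S"))  # 2023-06-27 07:30:57 (UTC)
```

Note that the log timestamps in the earlier comment are presumably in local time, so the time zone offset has to be taken into account when comparing the two.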

tripplet · 2023-09-28T19:40:13Z

Given the lack of new reports I think the fix works and this can be closed?

noci2012 · 2023-09-28T21:16:24Z

I have avoided all updates until now. I will check next weekend and then report or close, whichever is appropriate.

noci2012 · 2023-10-01T21:04:41Z

No power outage....,
update through update.sh

After update: (& waiting half an hour):
All devices that send measurements (temperature, power usage, motion) are Online
All switchable devices (lamps, relays) are gone (switches that give a power reading are present).

Re-pairing some devices has issues (of 5 lamps, 4 would reconnect after a reset; the 5th would not).
One hour later no change.

noci2012 · 2023-10-02T12:07:30Z

Plugs/Switches are reported to be online, still giving errors:

Failed to read state of 'frients_plug1' after reconnect (Read 0x0015bc002f013410/2 genOnOff(["onOff"], {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (no response received (48)))

This is equivalent for all plugs.

noci2012 · 2023-11-06T20:07:35Z

I have experienced several (8 or so) power failures in the last week, and some devices were lost (about 1 each time), of different types, etc.
The non-connecting lamp probably died during reset attempts: it will not reset anymore, so it is now just an expensive dumb bulb. Not sure what genius thought turning a device off and on 6 times in a short time was a sane reset method.

I consider this solved. It can be closed. (I cannot do it).

joaquinvacas · 2023-11-06T20:24:07Z

> I have experienced several (8 or so) power failures last week, some devices were lost. (about 1 each time). Different types, etc. The non-connecting lamp probably died during reset attempts... it will not reset anymore, it now just is an expensive dumb bulb. Not sure what genius thought turning off/on a device 6 times in a short time is a sane reset method.
>
> I consider this solved. It can be closed. (I cannot do it).

It still happens from time to time. I'm making periodic backups so I can restore it if something happens.
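A periodic backup like the one mentioned above can be as simple as a timestamped copy that keeps only the newest few files. This is a minimal sketch: `backup_file`, the directory layout and the retention count are all hypothetical, and the function would need to be run from a scheduler such as cron.

```python
import os
import shutil
import tempfile
import time


def backup_file(src: str, backup_dir: str, keep: int = 7) -> str:
    """Copy `src` into `backup_dir` under a timestamped name; keep the newest `keep` (>= 1)."""
    os.makedirs(backup_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = os.path.join(backup_dir, f"{os.path.basename(src)}.{stamp}")
    shutil.copy2(src, dest)
    # Timestamped names sort chronologically, so the oldest backups come first.
    backups = sorted(os.listdir(backup_dir))
    for old in backups[:-keep]:
        os.remove(os.path.join(backup_dir, old))
    return dest


workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "database.db")
with open(src, "w", encoding="utf-8") as f:
    f.write('{"id":1,"type":"Coordinator"}\n')

dest = backup_file(src, os.path.join(workdir, "backups"))
```

A copy taken while the file is being rewritten could itself be incomplete, so it is worth keeping several generations rather than only the latest one.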

mmerickel · 2023-11-23T03:12:37Z

Using 1.33.2 with an ezsp adapter (Sonoff dongle-e), I had set up z2m for the first time and connected 22 devices. I then clicked the restart button in the z2m web UI, and when it came back every device was gone. The database.db.backup was there and contained all of the devices. I put it back in place and z2m started showing all of the devices, but the network was non-functional and the devices were gone from HA. I had to re-pair everything to re-establish the network. I did not see any errors in the logs.txt files from before/after the restart. This is hugely concerning as a first-time user of z2m, trying it after never seeing an issue like this with ZHA over about 1.5 years.

gorstj · 2023-12-30T12:15:47Z

I think this issue is still present. See the following bug: #19988

Brachterbaek · 2024-02-28T16:13:29Z

I had the same issue yesterday (1.35.3-1). A circuit breaker tripped on the circuit that also feeds the socket my server is on. I lost all 32 devices after everything was back up and running again. I had to manually add everything back to Z2M; sadly, I had no good old backup.
The second problem that occurred is that InfluxDB and Grafana no longer include new data by device ID. The IDs of my devices haven't changed, of course, but new data is only added under the device name and no longer under the identity ID.

Kodemikkel added the problem Something isn't working label Mar 8, 2022

github-actions bot added the stale Stale issues label Apr 14, 2022

github-actions bot removed the stale Stale issues label Apr 15, 2022

github-actions bot added the stale Stale issues label May 15, 2022

github-actions bot removed the stale Stale issues label May 17, 2022

github-actions bot added the stale Stale issues label Jul 4, 2022

github-actions bot removed the stale Stale issues label Jul 5, 2022

github-actions bot added the stale Stale issues label Oct 18, 2022

Koenkk added dont-stale and removed stale Stale issues labels Oct 18, 2022

Koenkk added a commit to Koenkk/zigbee-herdsman that referenced this issue Dec 11, 2022

Attempt to fix empty database writes. Koenkk/zigbee2mqtt#11759

a8daebf

reneklootwijk mentioned this issue Jan 1, 2023

Z-Stack 1.2 Backup & Restore Koenkk/zigbee-herdsman#395

Merged
