deCONZ Linux HA with docker multiple gateways #3210

Closed
hre1 opened this issue Aug 30, 2020 · 15 comments

Comments

@hre1

hre1 commented Aug 30, 2020

Describe the question or issue you are having

I have a Linux HA cluster consisting of two servers with two ConBee II USB sticks (one per server), configured with a docker resource that uses the marthoc/deconz docker image. My aim is that if one server fails, the other server takes over the deCONZ service. I configured the main server with all my Zigbee devices, and there are no problems with the deCONZ service there. But if I migrate the deCONZ service to the backup server, all connections are lost. Both ConBee sticks have the same configuration/firmware and are recognized by the Phoscon app on both server nodes. I followed the instructions for "Network lost issues" on the backup server so that the PAN ID, channel and network ID have the same values as on the main server. It makes no difference: the Zigbee devices are not recognized on the backup server. If I unmigrate the Linux docker resource (switch back to the main server), everything is fine. Are there any points that I missed, or is it impossible to reach a workable solution this way?

Environment

  • Host system: Intel Server S1200v3RPO
  • Running method: (Ubuntu 16.04.7 LTS with Linux HA and Marthoc Docker container on a VMware Virtual Machine)
  • Firmware version: 26580700
  • deCONZ version: 2.05.80 / 14.8.2020
  • Device: ConBee II
  • Do you use an USB extension cable: yes

deCONZ Logs

[marthoc/deconz] Starting deCONZ...
[marthoc/deconz] Current deCONZ version: 2.05.80
[marthoc/deconz] Web UI port: 8000
[marthoc/deconz] Websockets port: 8043
[marthoc/deconz] VNC port: 5900
Killing Xtigervnc process ID 24... which was already dead
Cleaning stale pidfile '/root/.vnc/ha-node2.hre:0.pid'!
Cleaning stale x11 lock '/tmp/.X0-lock'!
Cleaning stale x11 lock '/tmp/.X11-unix/X0'!

New 'ha-node2.hre:0 (root)' desktop at :0 on machine ha-node2.hre

Starting applications specified in /etc/X11/Xvnc-session
Log file is /root/.vnc/ha-node2.hre:0.log

Use xtigervncviewer -SecurityTypes VncAuth,TLSVnc -passwd /root/.vnc/passwd ha-node2.hre:0 to connect to the VNC server.

libEGL warning: DRI2: failed to open swrast (search paths /usr/lib/x86_64-linux-gnu/dri:${ORIGIN}/dri:/usr/lib/dri)
libEGL warning: DRI2: failed to open swrast (search paths /usr/lib/x86_64-linux-gnu/dri:${ORIGIN}/dri:/usr/lib/dri)
libpng warning: iCCP: known incorrect sRGB profile
21:59:15:179 HTTP Server listen on address 0.0.0.0, port: 8000, root: /usr/share/deCONZ/webapp/
21:59:15:195 CTRL. 3.16.221:59:15:321 COM: --dev: /dev/ConBeeII (RaspBee)
21:59:15:321 ZCLDB init file /root/.local/share/dresden-elektronik/deCONZ/zcldb.txt
21:59:15:590 parent process /bin/sh
21:59:15:590 gw run mode: docker
21:59:15:590 GW sd-card image version file does not exist: /root/.local/share/dresden-elektronik/deCONZ/gw-version
21:59:15:591 DB sqlite version 3.16.2
21:59:15:599 DB PRAGMA page_count: 39
21:59:15:599 DB PRAGMA page_size: 4096
21:59:15:599 DB PRAGMA freelist_count: 1
21:59:15:599 DB file size 159744 bytes, free pages 1
21:59:15:599 DB PRAGMA user_version: 6
21:59:15:599 DB cleanup
21:59:15:600 DB create temporary views
21:59:15:615 started websocket server at port 8043
21:59:15:623 found node plugin: libde_rest_plugin.so - REST API Plugin
21:59:15:626 found node plugin: libde_signal_plugin.so - Signal Monitor Plugin
21:59:15:638 found node plugin: libstd_otau_plugin.so - STD OTAU Plugin
21:59:15:679 COM: --dev: /dev/ConBeeII (RaspBee)
21:59:15:809 New websocket 10.100.100.200:62468 (state: 3)
21:59:21:125 Announced to internet http://dresden-light.appspot.com/discover
21:59:25:343 COM: --dev: /dev/ConBeeII (RaspBee)
21:59:25:457 Device firmware version 0x26580700
21:59:25:462 unlocked max nodes: 200
21:59:25:562 Device protocol version: 0x010B
21:59:25:575 new node - ext: 0x00212effff0595ec, nwk: 0x0000
21:59:25:758 LightNode 4: Deckenlampe_Leuchte1 SZ added
21:59:25:777 LightNode 6: Extended color light 6 added
21:59:25:787 LightNode 7: Deckenlampe_Leuchte2 SZ added
21:59:25:799 LightNode 8: Deckenlampe_Leuchte3 SZ added
21:59:25:808 LightNode 9: Bettunterlicht SZ added
21:59:25:816 LightNode 10: Sideboard WZ added
21:59:25:830 LightNode 11: Wandregal oben WZ added
21:59:25:848 LightNode 12: Wandregal unten WZ added
21:59:25:858 LightNode 13: Deckenlampe Küche added
21:59:25:879 LightNode 15: Bettlampe rechts added
21:59:25:918 LightNode 16: Bettlampe links added
21:59:25:938 LightNode 17: Deckenlampe KZ added
21:59:25:955 SensorNode 2 set node 0x00158d0004877740
21:59:25:955 SensorNode 3 set node 0x00158d0004877740
21:59:25:980 LightNode 18: Schranklampe KZ added
21:59:26:177 Current channel 15
21:59:26:195 CTRL ANT_CTRL 0x03
21:59:26:219 Device protocol version: 0x010B
21:59:26:269 Current channel 15
21:59:26:286 CTRL ANT_CTRL 0x03
21:59:30:368 GW update firmware found: /usr/share/deCONZ/firmware/deCONZ_ConBeeII_0x26580700.bin.GCF
21:59:30:368 GW firmware version: 0x26580700
21:59:30:368 GW firmware version is up to date: 0x26580700
21:59:35:611 New websocket 10.10.10.12:46826 (state: 3)
21:59:35:806 LightNode 1: Configuration tool 1 added
21:59:42:405 Websocket disconnected 10.100.100.200:62468 (state: 0)
21:59:43:991 New websocket 10.100.100.200:62516 (state: 3)
21:59:58:057 0x00178801067ABA34 error APSDE-DATA.confirm: 0xD0 on task
22:00:08:963 0x00178801049ADCA7 error APSDE-DATA.confirm: 0xD0 on task
22:00:15:330 Current channel 15
22:00:15:338 Device TTL 6546 s flags: 0x7
22:00:19:857 0x00178801062F2006 error APSDE-DATA.confirm: 0xD0 on task
22:00:30:657 0x00178801062F3673 error APSDE-DATA.confirm: 0xD0 on task
22:00:41:553 0x001788010448D987 error APSDE-DATA.confirm: 0xD0 on task
22:00:52:460 0x0017880104045870 error APSDE-DATA.confirm: 0xD0 on task
22:01:03:256 0x0017880104045878 error APSDE-DATA.confirm: 0xD0 on task
22:01:14:063 0x00178801048C6124 error APSDE-DATA.confirm: 0xD0 on task
22:01:15:329 Current channel 15
22:01:15:337 Device TTL 6486 s flags: 0x7
22:01:24:970 0x00178801067AB7C9 error APSDE-DATA.confirm: 0xD0 on task
22:01:35:857 0x0017880108855DE9 error APSDE-DATA.confirm: 0xD0 on task
22:01:46:674 0x00178801088557B0 error APSDE-DATA.confirm: 0xD0 on task
22:01:57:622 0x0017880106ED7188 error APSDE-DATA.confirm: 0xD0 on task
22:02:08:556 0x001788010883D4AF error APSDE-DATA.confirm: 0xD0 on task
22:02:15:332 Current channel 15
22:02:15:340 Device TTL 6426 s flags: 0x7
22:02:20:252 0x00178801067ABA34 error APSDE-DATA.confirm: 0xD0 on task
22:02:31:056 0x00178801049ADCA7 error APSDE-DATA.confirm: 0xD0 on task
22:02:41:963 0x00178801062F2006 error APSDE-DATA.confirm: 0xD0 on task
22:02:52:862 0x00178801062F3673 error APSDE-DATA.confirm: 0xD0 on task
22:03:03:655 0x001788010448D987 error APSDE-DATA.confirm: 0xD0 on task
22:03:14:564 0x0017880104045870 error APSDE-DATA.confirm: 0xD0 on task
22:03:15:330 Current channel 15
22:03:15:338 Device TTL 6366 s flags: 0x7
22:03:25:490 0x0017880104045878 error APSDE-DATA.confirm: 0xD0 on task
22:03:30:340 GW firmware version: 0x26580700
22:03:30:341 GW firmware version is up to date: 0x26580700
22:03:36:356 0x00178801048C6124 error APSDE-DATA.confirm: 0xD0 on task
22:03:47:163 0x00178801067AB7C9 error APSDE-DATA.confirm: 0xD0 on task
22:03:58:059 0x0017880108855DE9 error APSDE-DATA.confirm: 0xD0 on task
22:04:08:956 0x00178801088557B0 error APSDE-DATA.confirm: 0xD0 on task
22:04:15:329 Current channel 15
22:04:15:337 Device TTL 6306 s flags: 0x7
22:04:19:763 0x0017880106ED7188 error APSDE-DATA.confirm: 0xD0 on task
22:04:30:559 0x001788010883D4AF error APSDE-DATA.confirm: 0xD0 on task
22:04:42:256 0x00178801067ABA34 error APSDE-DATA.confirm: 0xD0 on task
22:04:53:054 0x00178801049ADCA7 error APSDE-DATA.confirm: 0xD0 on task
22:05:03:960 0x00178801062F2006 error APSDE-DATA.confirm: 0xD0 on task
22:05:14:757 0x00178801062F3673 error APSDE-DATA.confirm: 0xD0 on task
22:05:15:331 Current channel 15
22:05:15:339 Device TTL 6246 s flags: 0x7
22:05:25:551 0x001788010448D987 error APSDE-DATA.confirm: 0xD0 on task
22:05:25:551 max transmit errors for node 0x001788010448D987, last seen by neighbors 359 s
22:05:28:910 0x9E53 seems to be a zombie recv errors 6
22:05:28:910 LightNode removed 0x001788010448d987
22:05:28:911 Node zombie state changed 0x001788010448d987
22:05:31:231 0x0017880104045870 error APSDE-DATA.confirm: 0xD0 on task
22:05:34:306 0x66EA seems to be a zombie recv errors 6
22:05:34:306 LightNode removed 0x00178801067aba34
22:05:34:306 Node zombie state changed 0x00178801067aba34
22:05:36:936 Unhandled node key 16777251
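
To see which nodes stop answering after the switch, the repeated `APSDE-DATA.confirm: 0xD0` lines above can be tallied per extended address. The following is a minimal sketch and not part of deCONZ; the function name `count_confirm_errors` and the regular expression are assumptions based only on the line format visible in this log.

```python
import re
from collections import Counter

# Matches lines such as:
#   21:59:58:057 0x00178801067ABA34 error APSDE-DATA.confirm: 0xD0 on task
# The pattern is inferred from the log excerpt in this issue (assumption).
CONFIRM_ERROR = re.compile(
    r"^\d{2}:\d{2}:\d{2}:\d{3} (0x[0-9A-Fa-f]{16}) "
    r"error APSDE-DATA\.confirm: (0x[0-9A-Fa-f]{2}) on task"
)


def count_confirm_errors(log_text):
    """Return a Counter mapping (extended address, status) to occurrences."""
    counts = Counter()
    for line in log_text.splitlines():
        match = CONFIRM_ERROR.match(line.strip())
        if match:
            address, status = match.groups()
            # Normalise the address case so 0x...ABA34 and 0x...aba34 agree.
            counts[(address.lower(), status)] += 1
    return counts


# A few lines copied from the log above.
sample = """\
21:59:58:057 0x00178801067ABA34 error APSDE-DATA.confirm: 0xD0 on task
22:00:08:963 0x00178801049ADCA7 error APSDE-DATA.confirm: 0xD0 on task
22:00:15:330 Current channel 15
22:02:20:252 0x00178801067ABA34 error APSDE-DATA.confirm: 0xD0 on task
"""

for (address, status), n in sorted(count_confirm_errors(sample).items()):
    print(address, status, n)
```

If every known node shows up with the same status, as it does in this log, the problem is more likely on the coordinator side than with individual devices.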

@Mimiix
Collaborator

Mimiix commented Aug 31, 2020

Yes, because the MAC address is different in this case, and that is what the devices are looking for. That also explains the 0xD0 on task errors: the devices are known, but they can't find their coordinator.

This would probably only work if the same MAC ID is used on both devices, and afaik you can't do that.

@Smanar
Collaborator

Smanar commented Aug 31, 2020

You can set a MAC address yourself, but from memory it is in the backup too. So if you use the backup/restore feature, you will also have the same MAC address on the two gateways, no?

@Mimiix
Collaborator

Mimiix commented Aug 31, 2020

@Smanar I doubt that the end devices would take this.

@Smanar
Collaborator

Smanar commented Aug 31, 2020

Are you talking about the MAC address of the gateway?

@Mimiix
Collaborator

Mimiix commented Aug 31, 2020

Yes. The end devices/routers expect gateway X to answer. Now they suddenly have gateway Y, presented to them as if it were gateway X.

@Mimiix Mimiix changed the title deCONZ Linux HA with docker deCONZ Linux HA with docker multiple gateways Aug 31, 2020
@Smanar
Collaborator

Smanar commented Aug 31, 2020

But this setting is in the backup file too. Otherwise, when you change your gateway, the backup would never work.

@Mimiix
Collaborator

Mimiix commented Aug 31, 2020

Hmm, that might be true. Could it be that the range is too bad, or that there is interference?

@hre1 Are you using extension cables on both?

@hre1
Author

hre1 commented Aug 31, 2020

Thank you for the fast reply. Yes, I use extension cables on both ConBee II sticks. The distance between the two devices is approximately 1 m. Do you mean this could be a problem?
The deCONZ application has a network settings dialog for using a custom MAC address. I changed the value to the IEEE address of the backup ConBee II stick (and the TC address too) and restarted the docker container on the backup server. But the devices do not connect to the backup ConBee II stick. Is there a way to force the Zigbee devices to use the new IEEE address of the backup coordinator?

@Smanar
Collaborator

Smanar commented Aug 31, 2020

You are not using the 2 ConBees at the same time?

@SwoopX
Collaborator

SwoopX commented Aug 31, 2020

Yes, and with the same MAC address (at least for the time of testing).

It's like the Highlander: there can only be one. A device must have exactly the same settings as the other to take over, and that only works if one is powered down, as the MAC address is taken over as well.

@hre1
Author

hre1 commented Aug 31, 2020

@Smanar
Yes, this is what I did. I only switched the deCONZ application between the two Linux HA nodes. But when I switched the docker container with the deCONZ application to the backup node, the main node remained powered on and still supplied the ConBee's USB port with power. So I do not know what exactly happened in this case...

This is the scenario:

  1. Bought a ConBee II and configured all my devices (IEEE -> 0x00212effff0595ec, PANID -> 1529, Channel -> 15).
  2. Bought a second ConBee II, plugged it into the backup server and switched the deCONZ docker service to the backup server (in the deCONZ application a second configurator box appears with Name -> 0x000, IEEE -> 0x00212effff05e828; Phoscon now shows Network ID -> 67A8, Channel -> 25 with no connections) --> the main server with its ConBee is still powered on.
  3. Followed the instructions for "Network lost issues" with the deCONZ application running on the backup server; after that Phoscon shows Network ID -> 1529, Channel -> 15.
  4. Switched the docker container between the two nodes several times -> all Zigbee devices working, but the configurator box in the deCONZ application with IEEE -> 0x00212effff05e828 never had connections to the Zigbee devices.
  5. In the deCONZ application network settings I changed the values for the IEEE address and TC address to 0x00212effff05e828, because I thought this was the backup server's ConBee stick, and restarted the docker container -> all Zigbee devices working, but the configurator box in the deCONZ application with IEEE -> 0x00212effff05e828 never had connections to the Zigbee devices.
  6. Removed the main ConBee USB stick and plugged it into the backup server -> all Zigbee devices working, but the configurator box in the deCONZ application with IEEE -> 0x00212effff05e828 never had connections to the Zigbee devices.
  7. Plugged the second ConBee back into the backup server, left the first ConBee unplugged and restarted the VM -> total confusion: the configurator box in the deCONZ application with IEEE -> 0x00212effff05e828 shows connections to the Zigbee devices, but only one per Zigbee device, and some devices work while others do not; the configurator box in the deCONZ application with IEEE -> 0x00212effff0595ec shows no connections.
  8. Restored the last backup via Phoscon, followed the instructions for "Network lost issues" again and restarted the VM -> all Zigbee devices working, but the configurator box in the deCONZ application with IEEE -> 0x00212effff05e828 never had connections to the Zigbee devices.

BUT -----> the configurator box in the deCONZ application with IEEE -> 0x00212effff0595ec has all the connections to the Zigbee devices.

So it seems that the second ConBee USB stick, which remained plugged in, gets the MAC address 0x00212effff0595ec from the deCONZ application and does not have a burned-in MAC address of 0x00212effff05e828, as a network card would, for instance. In that case I do not know what happens with the other ConBee stick, which is plugged into my main server. Do the MAC address and the coordinator function survive the docker migrations until the USB stick is unplugged? If so, the same MAC address and coordinator function coexist on both ConBee sticks at the same time, and the deCONZ application does not take that behaviour into account. I thought that when the deCONZ application shuts down, the ConBee stick would also be deconfigured by the application?

Sorry for the long text, but I'm not a Zigbee expert... ;-))))

@ebaauw
Collaborator

ebaauw commented Sep 1, 2020

Short answer: what you want cannot be done. The long answer is below, but it requires more understanding of Zigbee and of deCONZ. See deCONZ for dummies as a start.

You can only have one coordinator (blue node with NWK address 0x0000) on a Zigbee network. Each device on a network needs to have a unique mac address and NWK address. All devices need the same PANID, extended PANID, network key, channel, and network update ID. These parameters are stored, in non-volatile memory, on the device.
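
The rules in the paragraph above can be written down as a small consistency check. This is only an illustrative sketch, not anything deCONZ provides: the `ZigbeeDevice` class, `check_network`, the router NWK address, the extended PANID and the network key are all hypothetical placeholders. The MAC addresses and channel come from this thread, and the PANID `1529` is read as hexadecimal, which is an assumption.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ZigbeeDevice:
    mac: int            # IEEE (extended) address, must be unique
    nwk: int            # 16-bit NWK address, 0x0000 is the coordinator
    pan_id: int
    ext_pan_id: int
    network_key: bytes
    channel: int
    nwk_update_id: int


# Parameters every device on the network has to share.
SHARED_FIELDS = ("pan_id", "ext_pan_id", "network_key", "channel", "nwk_update_id")


def check_network(devices):
    """Return a list of rule violations (empty if the set is consistent)."""
    problems = []
    coordinators = [d for d in devices if d.nwk == 0x0000]
    if len(coordinators) != 1:
        problems.append(f"expected exactly 1 coordinator, found {len(coordinators)}")
    for field in ("mac", "nwk"):
        values = [getattr(d, field) for d in devices]
        if len(values) != len(set(values)):
            problems.append(f"duplicate {field} address")
    for field in SHARED_FIELDS:
        if len({getattr(d, field) for d in devices}) > 1:
            problems.append(f"{field} differs between devices")
    return problems


main_stick = ZigbeeDevice(
    mac=0x00212EFFFF0595EC,          # from the thread
    nwk=0x0000,
    pan_id=0x1529,                   # from the thread, read as hex (assumption)
    ext_pan_id=0xDDDDDDDDDDDDDDDD,   # placeholder
    network_key=bytes(16),           # placeholder
    channel=15,                      # from the thread
    nwk_update_id=0,                 # placeholder
)

# Second stick restored from the same backup while the first is still powered.
cloned_stick = replace(main_stick)

# Second stick joined as a router instead (NWK address is a placeholder).
router_stick = replace(main_stick, mac=0x00212EFFFF05E828, nwk=0x1234)

print(check_network([main_stick, cloned_stick]))
print(check_network([main_stick, router_stick]))
```

Under these assumptions the cloned pair violates three rules at once (two coordinators, duplicate MAC address, duplicate NWK address), while the coordinator plus router pair passes.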

The backup/restore function allows a new device to take over the identity of a broken device, by configuring the MAC address of the old device onto the new device. After that, you can never again power up the old device in range of the new device. This function is not intended for setting up redundant gateways.

It’s perfectly possible to have multiple gateways on the same Zigbee network, and run two instances of deCONZ, each using its own device. As said, only one device can be the coordinator; the other needs to be configured as a router (yellow node, other NWK address), and join the network of the coordinator. Unfortunately, there’s something broken in this function, so it might be challenging to set up, see e.g. #2788, #21. Make sure to leave the network, write the changes, read them back, and join the network; the changes won’t stick while the device is in the network.

It will be challenging to use two REST API plugins on the same network. Each will try and configure devices, setting up bindings for attribute reporting. This will likely require more bindings than a device supports. Each instance has its own database, managing its own REST resources. You need to pair each device twice (without resetting it) for the resources to be created on each gateway. You probably need to patch the database and change the resource IDs, so both gateways expose the same device using the same resource.
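
The binding concern can be illustrated with simple arithmetic. This is a rough sketch under a simplifying assumption: each gateway creates one binding per reporting cluster on a device. The function names and the numbers (5 clusters, a table of 8 entries) are hypothetical and do not describe any particular product.

```python
def bindings_needed(gateways, reported_clusters):
    """Bindings a device needs if every gateway binds every reporting cluster."""
    return gateways * reported_clusters


def fits_binding_table(gateways, reported_clusters, table_size):
    """True if the device's binding table can hold all requested bindings."""
    return bindings_needed(gateways, reported_clusters) <= table_size


# Hypothetical device: 5 reporting clusters, binding table with 8 entries.
for gateways in (1, 2):
    needed = bindings_needed(gateways, 5)
    print(gateways, "gateway(s):", needed, "bindings, fits:",
          fits_binding_table(gateways, 5, 8))
```

With these made-up numbers one gateway fits and two do not, which is the situation described above.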

@stale

stale bot commented Sep 26, 2020

As there hasn't been any response in 21 days, this issue has been automatically marked as stale. At OP: Please either close this issue or keep it active It will be closed in 7 days if no further activity occurs.

@stale

stale bot commented Oct 17, 2020

As there hasn't been any response in 21 days, this issue has been automatically marked as stale. At OP: Please either close this issue or keep it active It will be closed in 7 days if no further activity occurs.

@stale stale bot added the stale label Oct 17, 2020
@stale

stale bot commented Oct 25, 2020

As there hasn't been any response in 28 days, this issue will be closed. @ OP: If this issue is solved post what fixed it for you. If it isn't solved, request to get this opened again.

@stale stale bot closed this as completed Oct 25, 2020

5 participants