NCP_UHW_MG1B232_678_PA0-PA1-PB11_PA5-PA4.gbl results in all sorts of errors in my log and disfunction zigbee network #7

Open
gribouk opened this issue Mar 21, 2021 · 34 comments

Comments

@gribouk

gribouk commented Mar 21, 2021

Hello!

You've done a nice job, but it needs some more work to fix the issues.

Currently, after updating to the latest suggested firmware (NCP_UHW_MG1B232_678_PA0-PA1-PB11_PA5-PA4.gbl),
I am having all sorts of trouble running my Zigbee network. It is a big one (~200 devices), but even in a reduced state (~50 devices, all bulbs) it fails instantly. I am getting the following errors in the log of my HA instance, which runs only the ZHA integration for test purposes:

  1. NWK conflict is reported for 0x1f46 - for almost every device I have on the network
Logger: homeassistant.components.websocket_api.http.connection
Source: components/zha/light.py:234
Integration: Home Assistant WebSocket API (documentation, issues)
First occurred: 20:11:09 (7 occurrences)
Last logged: 20:12:15

[2771035904] duplicate 2 TSN
[2771035904] duplicate 42 TSN
[2771035904] duplicate 142 TSN
[2771035904] duplicate 150 TSN
[2771035904] duplicate 92 TSN
AssertionError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/websocket_api/commands.py", line 136, in handle_call_service
    await hass.services.async_call(
  File "/usr/src/homeassistant/homeassistant/core.py", line 1455, in async_call
    task.result()
  File "/usr/src/homeassistant/homeassistant/core.py", line 1490, in _execute_service
    await handler.job.target(service_call)
  File "/usr/src/homeassistant/homeassistant/helpers/entity_component.py", line 204, in handle_service
    await self.hass.helpers.service.entity_service_call(
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 595, in entity_service_call
    future.result()  # pop exception if have
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 664, in async_request_call
    await coro
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 632, in _handle_entity_call
    await result
  File "/usr/src/homeassistant/homeassistant/components/light/__init__.py", line 233, in async_handle_light_on_service
    await light.async_turn_on(**params)
  File "/usr/src/homeassistant/homeassistant/components/zha/light.py", line 546, in async_turn_on
    await super().async_turn_on(**kwargs)
  File "/usr/src/homeassistant/homeassistant/components/zha/light.py", line 234, in async_turn_on
    result = await self._on_off_channel.on()
  File "/usr/local/lib/python3.8/site-packages/zigpy/group.py", line 44, in request
    res = await self.application.mrequest(
  File "/usr/local/lib/python3.8/site-packages/bellows/zigbee/application.py", line 415, in mrequest
    with self._pending.new(message_tag) as req:
  File "/usr/local/lib/python3.8/site-packages/zigpy/util.py", line 262, in new
    raise ControllerException(f"duplicate {sequence} TSN") from AssertionError
zigpy.exceptions.ControllerException: duplicate 242 TSN

Logger: zigpy.device
Source: /usr/local/lib/python3.8/site-packages/zigpy/device.py:127
First occurred: 18:34:21 (10 occurrences)
Last logged: 19:25:31

[0xb057] Failed to discover active endpoints
[0x656e] Failed to discover active endpoints
[0x7fde] Failed to discover active endpoints
[0x96b3] Failed to discover active endpoints
[0xf1a0] Failed to discover active endpoints
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/zigpy/device.py", line 119, in _initialize
    status, _, endpoints = await self.zdo.Active_EP_req(
  File "/usr/local/lib/python3.8/site-packages/zigpy/util.py", line 110, in retry
    r = await func()
  File "/usr/local/lib/python3.8/site-packages/zigpy/device.py", line 214, in request
    raise zigpy.exceptions.DeliveryError(
zigpy.exceptions.DeliveryError: [0xad68:0:0x0005]: Message send failure

Logger: homeassistant.components.zha.core.gateway
Source: components/zha/core/gateway.py:157
Integration: Zigbee Home Automation (documentation, issues)
First occurred: 17:17:05 (55 occurrences)
Last logged: 18:28:02

Couldn't start EZSP = Silicon Labs EmberZNet protocol: Elelabs, HUSBZB-1, Telegesis coordinator
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/serial/urlhandler/protocol_socket.py", line 63, in open
    self._socket = socket.create_connection(self.from_url(self.portstr), timeout=POLL_TIMEOUT)
  File "/usr/local/lib/python3.8/socket.py", line 808, in create_connection
    raise err
  File "/usr/local/lib/python3.8/socket.py", line 796, in create_connection
    sock.connect(sa)
OSError: [Errno 113] Host is unreachable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/zha/core/gateway.py", line 157, in async_initialize
    self.application_controller = await app_controller_cls.new(
  File "/usr/local/lib/python3.8/site-packages/zigpy/application.py", line 69, in new
    await app.startup(auto_form)
  File "/usr/local/lib/python3.8/site-packages/bellows/zigbee/application.py", line 108, in startup
    self._ezsp = await bellows.ezsp.EZSP.initialize(self.config)
  File "/usr/local/lib/python3.8/site-packages/bellows/ezsp/__init__.py", line 78, in initialize
    await ezsp.connect()
  File "/usr/local/lib/python3.8/site-packages/bellows/ezsp/__init__.py", line 88, in connect
    self._gw = await bellows.uart.connect(self._config, self)
  File "/usr/local/lib/python3.8/site-packages/bellows/uart.py", line 352, in connect
    protocol, connection_done = await thread.run_coroutine_threadsafe(
  File "/usr/local/lib/python3.8/site-packages/bellows/uart.py", line 330, in _connect
    transport, protocol = await serial_asyncio.create_serial_connection(
  File "/usr/local/lib/python3.8/site-packages/serial_asyncio/__init__.py", line 445, in create_serial_connection
    serial_instance = serial.serial_for_url(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/serial/__init__.py", line 90, in serial_for_url
    instance.open()
  File "/usr/local/lib/python3.8/site-packages/serial/urlhandler/protocol_socket.py", line 66, in open
    raise SerialException("Could not open port {}: {}".format(self.portstr, msg))
serial.serialutil.SerialException: Could not open port socket://192.168.1.149:8888: [Errno 113] Host is unreachable

Often the coordinator just gets stuck and even a hard reboot does not help. I have to delete the configuration and reinstall ZHA from scratch to bring it back to life... for a little while...

That is the state of things at the moment...
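
For anyone reading the first traceback: `duplicate 242 TSN` is raised in `zigpy/util.py` when a request is registered under a sequence number that is still waiting for a reply. The sketch below is not zigpy's code, only a minimal illustration of that idea. The `PendingRequests` class, the `send_many` helper and the 8-bit wrap-around counter are assumptions made for the example. It shows one way such errors can pile up: if replies stop arriving, the counter eventually wraps onto a number that is still pending.

```python
class DuplicateTSN(Exception):
    """Raised when a sequence number is reused while it is still pending."""


class PendingRequests:
    """Hypothetical registry of in-flight requests, keyed by sequence number."""

    def __init__(self):
        self._pending = {}

    def new(self, tsn):
        if tsn in self._pending:
            raise DuplicateTSN(f"duplicate {tsn} TSN")
        self._pending[tsn] = "waiting"

    def done(self, tsn):
        self._pending.pop(tsn, None)


def send_many(count, replies_arrive):
    """Register up to `count` requests; return how many were accepted."""
    pending = PendingRequests()
    accepted = 0
    for i in range(count):
        tsn = i % 256  # assumed 8-bit sequence number that wraps around
        try:
            pending.new(tsn)
        except DuplicateTSN:
            break
        accepted += 1
        if replies_arrive:
            pending.done(tsn)  # a reply frees the sequence number again
    return accepted


print(send_many(1000, replies_arrive=True))
print(send_many(1000, replies_arrive=False))
```

With replies arriving, all 1000 requests are accepted. Without replies, the registry fills up and the first reused number is rejected, which would be consistent with a coordinator that has stopped answering.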

@MattWestb

MattWestb commented Mar 21, 2021

Use three backticks on a line of their own to start the code section, and three more on their own line to end it.

Then write the normal text after that, so it is easier to read. Finish one block before starting the next so the code blocks don't get out of sync :-))

GREAT !!!!! :-))

Formatting is a PITA on GitHub !!!

@gribouk
Author

gribouk commented Mar 21, 2021

That's what I did, but I had "2. ''' " and the parser decided my text was code...

@MattWestb

It looks like you have done a "hot swap" with another coordinator, and the system can't get everything working because the network addresses are a mess.

The last error is very likely because the coordinator has no time to talk to ZHA when it has too much to fight with in its network.

But to come back to point one: I think it is easiest to kill the network and start a new one, so the coordinator can add the devices properly and you don't end up with all devices having the wrong NWK address in the network.

A "hot swap" is not possible on a Zigbee 3 network, because the network key, the trust center link keys, the APS keys and the frame counters must all be in the same state. Otherwise the network does not work at all, or works very badly, because everything has the wrong encryption on the links between the devices.

As you have written "test purposes", it is best to have a clean start and add the devices one by one so the mesh can build itself out nicely.

I have run the original, the 6.7.8.0 and the 6.5.1.0 firmware, and all look stable. I have not migrated my production network to the Tuya ZBGW, because for the moment I am running on a Pi with GPIO UART. The test network has only around 10 devices, but it runs the same firmware version I have in the production network.

@MattWestb

I did a firmware upgrade on my production network from 6.7.7.0 to 6.7.8.0 and it worked OK.

Some Tuya ZBGW users have reported problems after upgrading, so it may be that going from 6.5.0 to 6.7.8.0 corrupts the NVRAM in the coordinator.

Silabs recommends doing a flash erase before flashing an NCP image so that all old data is deleted (but a normal user can't flash with SWD).

Doing a backup and restore on EZSP should be possible, but I have not tried it, and you must burn a new IEEE address if you change the chip (only possible once without an SWD flasher).

@MattWestb

MattWestb commented Mar 21, 2021

I have found the cause of your problem.
NVM3 was implemented in EZSP 6.5.2 and needs a tool for converting from NVM1 and NVM2, which is very likely not included in the 6.7.8.0 NCP firmware.

So you need to destroy your network and form a new one to get it working.

1.3 Secure EZSP Protocol Security Key Storage on the NCP

@gribouk
Author

gribouk commented Mar 21, 2021

"
As you have written "test purposes", it is best to have a clean start and add the devices one by one so the mesh can build itself out nicely."

  • That is the whole point: I have 18 bulbs on an 8 m ceiling, which means I have to build a tower to reach them and add them one by one... And if the network fails, doing it all over is not something I want to repeat again and again... The only thing I can do is control them through the switch: turn it on and off 5 times and they all appear in the integration simultaneously.

@gribouk
Author

gribouk commented Mar 21, 2021

"So you need to destroy your network and form a new one to get it working."

What is the proper way to "destroy" the network? Is there something I should know?

@MattWestb

I think ZHA has a limit on how many devices it pairs at one time, but I think it is possible to do several in a row.
If I were you, I would try flipping the circuit breaker 6 times to see if it is possible to reset them all at once and then pair them in one go.

Or, if it is available to you, you could reset them with touchlink, although the distance is a bit long (I once reset some GU10 spots in my kitchen by mistake with deCONZ and a RaspBee I, 6 meters away from the coordinator).

@MattWestb

I think running bellows from a command line in your Docker container, leaving the network and then forming a new one should do that (I have not tried it).

@MattWestb

The new bellows CLI on my test system:

bash-5.0#  bellows --help
Usage: bellows [OPTIONS] COMMAND [ARGS]...

Options:
  -v, --verbosity LVL     Either CRITICAL, ERROR, WARNING, INFO or DEBUG
  -d, --device TEXT       [required]
  -b, --baudrate INTEGER
  --help                  Show this message and exit.

Commands:
  backup           Backup NCP config to stdio.
  bootloader       Start bootloader
  config           Get/set configuration on the NCP
  devices          Show device database
  dump             Capture frames on CHANNEL and write to FILE in tcpdump...
  form             Form a new ZigBee network
  info             Get NCP information
  join             Join an existing ZigBee network as an end device
  leave            Leave the ZigBee network
  permit           Allow devices to join this ZigBee network
  permit-with-key  Allow devices to join this ZigBee network using an...
  restore          Backup NCP config to stdio.
  scan             Scan for networks or radio interference
  zcl              Perform ZCL operations against a device
  zdo              Perform ZDO operations against a device
bash-5.0# 

The leave is no problem, but I think you need the database for forming a network.

@gribouk
Author

gribouk commented Mar 21, 2021

"
The leave is no problem, but I think you need the database for forming a network.
"
I think I know what you mean: I need to delete zigbee.db and zha.storage in the .storage folder of the HA config dir?

If so, then I am doing exactly that every time I rebuild the network. And yet I am having the problems described above...

Since you know the Zigbee protocol at a low level, can you give me your opinion on the following solution I am trying to implement:

As long as the number of devices on the network is relatively small, let's say ~40-50, the network remains stable. So I can build several Zigbee networks: as of today I run 3 of them with ~60 devices each, and I am considering splitting further. My question is about electromagnetic interference: how well do different Zigbee networks coexist in a confined environment? After my 200-device mesh collapsed last night, I split it into parts today. All 3 coordinators are in 2 neighboring rooms, ~3-5 m away from each other. I even have 2 meshes that work with devices in the same room... So my concern is whether electromagnetic interference will allow me to build several meshes instead of one unstable one.

@MattWestb

MattWestb commented Mar 21, 2021

First, zigbee.db is a database where ZHA saves information about the devices.
The coordinator has all link keys and counters for all devices saved on the chip in a table, and that is the problem in your case: the table is not OK (very likely corrupted when going from 6.5.0.0 to 6.7.8.0), and the coordinator can't control the network if it can't read, write and update its key table. The ZHA zigbee.db is easy: just delete it and ZHA makes a new (very) clean one.

I have 5 networks running at the moment, but 3 are not critical. My main production network is kept away from my WiFi network, but the others share its channels.

Zigbee is pretty well behaved and listens before sending, but if you have a strong WiFi signal on the same frequency it gets killed 100%. Zigbee networks are made to live together with other Zigbee networks; only the shared "space in the air" can cause bandwidth problems and delays.

If you have 2 large but not mega networks (mega like yours) on the same channel, and neither is using its bandwidth heavily (not updating the firmware of many devices), there should be no large problems, but I think there can be latency when doing firmware updates.

If you run multiple networks, be sure the PAN ID, extended PAN ID and network key are not the same across the networks, or you will get collisions and very bad network breakdowns.

I think when you go up to 200 or more devices with lights you get problems, because there are many broadcasts "running around" that can more or less block the mesh.

If you make 4 networks with 50 devices each, then one broadcast (a light switch sending an on command) is relayed to all 50 devices (not the sleeping end devices), and if you have 200 devices in the network it is relayed 200 times to reach the whole network.
And if one device blocks the network, you only have a problem in 1/4 of your lights, and it is less likely to happen.

If you make 4 or more networks, keep them away from strong WiFi channels and, if possible, don't put all Zigbee networks on the same channel. Sharing a channel is still better than fighting with WiFi networks, so perhaps 2 Zigbee networks on 20 and 25, and WiFi on the lower channels.
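
The advice about keeping Zigbee away from WiFi can be checked with a little arithmetic. The sketch below uses the standard 2.4 GHz center frequencies (Zigbee channels 11-26 at 2405 + 5·(ch − 11) MHz, WiFi channels 1-13 at 2407 + 5·ch MHz). The nominal widths of 22 MHz for WiFi and 2 MHz for Zigbee, and the function names, are assumptions for illustration; real interference also depends on signal strength and distance.

```python
def zigbee_center_mhz(channel):
    """Center frequency of a 2.4 GHz Zigbee (IEEE 802.15.4) channel, 11-26."""
    return 2405 + 5 * (channel - 11)


def wifi_center_mhz(channel):
    """Center frequency of a 2.4 GHz WiFi channel, 1-13."""
    return 2407 + 5 * channel


def overlaps(zigbee_channel, wifi_channel, wifi_width_mhz=22, zigbee_width_mhz=2):
    """True if the two channels overlap, using assumed nominal widths."""
    distance = abs(zigbee_center_mhz(zigbee_channel) - wifi_center_mhz(wifi_channel))
    return distance < (wifi_width_mhz + zigbee_width_mhz) / 2


def clear_zigbee_channels(wifi_channels):
    """Zigbee channels that overlap none of the given WiFi channels."""
    return [
        z for z in range(11, 27)
        if not any(overlaps(z, w) for w in wifi_channels)
    ]


print(clear_zigbee_channels([1]))
print(clear_zigbee_channels([1, 6, 11]))
```

Under these assumptions, WiFi on channel 1 alone leaves Zigbee channels 15 and up clear, which matches the suggestion to keep WiFi on the lower channels and Zigbee on 20 and 25.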

@gribouk
Author

gribouk commented Mar 21, 2021

"
If you run multiple networks, be sure the PAN ID, extended PAN ID
"
Would you provide some details on what exactly I have to do in the ZHA integration setup (maybe you have an example)?
At the moment I pick the channel through the config:

zha:
  zigpy_config:
    network:
      channel: 25             # What channel the radio should try to use.
      channels: [24, 25, 26]  # Channel mask    

Does this thing count?

Brief summary:

  1. Multiple Zigbee networks, even sharing the same channel, are OK as long as they are not big and do not pollute the air.
  2. WiFi set to channel 1 or moved to 5 GHz
  3. Bluetooth?

@MattWestb

I have found the missing config parameters for ZHA, but I have not tested whether they work:

zha:
  zigpy_config:
    network:
      key: [16,15,14,15,14,13,12,10,16,15,14,15,14,13,12,10]   ## 16 bytes of network key
      channel: 25
      channels: [11, 15, 20, 25]
      pan_id: 0x12AB
      extended_pan_id: "DF:DE:DD:DC:DF:DE:DD:DC"

"channels" is the list of primary channels used for pairing and network scanning. It should be set as I have done, or you can get problems when devices join and rejoin your network.

"channel" is the channel your network is formed on.

The config is only used when you form a new network, so changing it does nothing until you do a leave and form a new network, because all "working parameters" are saved in the coordinator's NVM, not in ZHA's local files.
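
When running several networks side by side, the values that must differ are the `pan_id`, `extended_pan_id` and `key` entries from the config above. A quick way to catch accidental reuse is to compare them pairwise before forming the networks. This is only a sketch: `net_a` copies the example values above, while `net_b`, `net_c` and the `find_conflicts` helper are hypothetical.

```python
from itertools import combinations

# net_a uses the example values from the config above; net_b and net_c are
# made-up values for illustration only.
networks = {
    "net_a": {
        "pan_id": 0x12AB,
        "extended_pan_id": "DF:DE:DD:DC:DF:DE:DD:DC",
        "key": [16, 15, 14, 15, 14, 13, 12, 10, 16, 15, 14, 15, 14, 13, 12, 10],
    },
    "net_b": {
        "pan_id": 0x34CD,
        "extended_pan_id": "11:22:33:44:55:66:77:88",
        "key": list(range(16)),
    },
    "net_c": {
        "pan_id": 0x12AB,  # deliberately the same PAN ID as net_a
        "extended_pan_id": "AA:BB:CC:DD:EE:FF:00:11",
        "key": list(range(16, 32)),
    },
}


def find_conflicts(networks):
    """Return (network, network, field) for every value shared by two networks."""
    conflicts = []
    for (name_a, a), (name_b, b) in combinations(networks.items(), 2):
        for field in ("pan_id", "extended_pan_id", "key"):
            if a[field] == b[field]:
                conflicts.append((name_a, name_b, field))
    return conflicts


for name_a, name_b, field in find_conflicts(networks):
    print(f"{name_a} and {name_b} share the same {field}")
```

Since the parameters only take effect when a network is formed, a check like this is most useful before the leave and re-form step.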

The best way to see your "active" config is bellows -d socket://192.168.x.x:8888 info in the HA container CLI.

I have not tried changing network parameters on a formed network; I have only done a channel change without re-pairing all devices.

Try doing a leave in the CLI and then starting HA with your new parameters in the config. I think ZHA will set up the new network with the parameters from your ZHA config (I have not tried that). Then verify that the new parameters are being used.
You can trick ZHA into not locking your (network) COM port (while it is locked you can't run bellows in the CLI), but I think it is easiest to delete the ZHA integration in HA, run bellows in the container CLI, and then install ZHA again.

Bluetooth uses the same frequencies, but it does frequency hopping and does not stay on one frequency, and it is low power, so it is not so dangerous. Underneath it is a similar kind of low-power 2.4 GHz radio, although it has its own radio layer rather than the IEEE 802.15.4 one that Zigbee uses.

If you can, move all WiFi to 5 GHz, but I think that is not realistic without throwing 90% of all WiFi devices away :-((

An unfinished wiki page about WiFi / Zigbee channels: https://github.com/zigpy/zigpy/wiki/Zigbee---Changing-channel

Sorry, it is a little late so my head is not so good at writing :-(

@gribouk
Author

gribouk commented Mar 21, 2021

Surprise - surprise...
There is something I am failing to understand. Maybe you can help me with that.
I run 3 Zigbee networks under HA instances: 2 from Docker and 1 from hassos.
So 192.168.1.149 and 192.168.1.148 are the coordinators for the Docker-run HA instances, and 192.168.1.147 is for hassos.
The channel 25 setup was for coordinator 192.168.1.147, yet the CLI command says it is on channel 11. Meanwhile 192.168.1.149 has no zha setup in the config at all and should run on the default channel (15), but it runs on 25!
How is that possible?
I set the channel choice to 25 long ago, before I formed my last network... Why does it say it is on channel 11?

bash-5.0# bellows -d socket://192.168.1.149:8888 info
[58:8e:81:ff:fe:c4:e1:0d]
[0x0000]
[<EmberNetworkStatus.JOINED_NETWORK: 2>]
[<EmberStatus.SUCCESS: 0>, <EmberNodeType.COORDINATOR: 1>, EmberNetworkParameters(extendedPanId=95:fe:4e:27:16:3a:ac:56, panId=0x7a6b, radioTxPower=13, radioChannel=25, joinMethod=<EmberJoinMethod.USE_MAC_ASSOCIATION: 0>, nwkManagerId=0x0000, nwkUpdateId=0, channels=<Channels.ALL_CHANNELS: 134215680>)]
[<EmberStatus.SUCCESS: 0>, EmberCurrentSecurityState(bitmask=<EmberCurrentSecurityBitmask.TRUST_CENTER_USES_HASHED_LINK_KEY|64|32|HAVE_TRUST_CENTER_LINK_KEY|GLOBAL_LINK_KEY: 244>, trustCenterLongAddress=58:8e:81:ff:fe:c4:e1:0d)]
Manufacturer: 
Board name: 
EmberZNet version: 6.7.8.0 build 373
bash-5.0# bellows -d socket://192.168.1.148:8888 info
[60:a4:23:ff:fe:09:0f:c1]
[0x0000]
[<EmberNetworkStatus.JOINED_NETWORK: 2>]
[<EmberStatus.SUCCESS: 0>, <EmberNodeType.COORDINATOR: 1>, EmberNetworkParameters(extendedPanId=cc:cc:cc:cc:e2:ab:f4:98, panId=0x3498, radioTxPower=20, radioChannel=11, joinMethod=<EmberJoinMethod.USE_MAC_ASSOCIATION: 0>, nwkManagerId=0x0000, nwkUpdateId=0, channels=<Channels.ALL_CHANNELS: 134215680>)]
[<EmberStatus.SUCCESS: 0>, EmberCurrentSecurityState(bitmask=<EmberCurrentSecurityBitmask.64|32|HAVE_TRUST_CENTER_LINK_KEY|8|GLOBAL_LINK_KEY: 124>, trustCenterLongAddress=60:a4:23:ff:fe:09:0f:c1)]
Manufacturer: 
Board name: 
EmberZNet version: 6.7.8.0 build 373
bash-5.0# bellows -d socket://192.168.1.147:8888 info
[60:a4:23:ff:fe:45:c1:44]
[0x0000]
[<EmberNetworkStatus.JOINED_NETWORK: 2>]
[<EmberStatus.SUCCESS: 0>, <EmberNodeType.COORDINATOR: 1>, EmberNetworkParameters(extendedPanId=cc:cc:cc:cc:d7:12:cf:a4, panId=0x0fa4, radioTxPower=20, radioChannel=11, joinMethod=<EmberJoinMethod.USE_MAC_ASSOCIATION: 0>, nwkManagerId=0x0000, nwkUpdateId=0, channels=<Channels.ALL_CHANNELS: 134215680>)]
[<EmberStatus.SUCCESS: 0>, EmberCurrentSecurityState(bitmask=<EmberCurrentSecurityBitmask.64|32|HAVE_TRUST_CENTER_LINK_KEY|8|GLOBAL_LINK_KEY: 124>, trustCenterLongAddress=60:a4:23:ff:fe:45:c1:44)]
Manufacturer: 
Board name: 
EmberZNet version: 6.7.8.0 build 373

@MattWestb

Very interesting !!

The network on 149 was formed by ZHA/bellows, because it has TRUST_CENTER_USES_HASHED_LINK_KEY. That is a must for having many Zigbee 3 devices (without it you can't store that many device link keys), and its absence was very likely the reason for your first problem.
Tuya stores one key and the frame counter for each device, but 6.7.8.0 has different settings and ZHA doesn't know the maximum number of keys in the firmware, so it must use the trick of "hashing" them so you can use as many as you like without support in the firmware.

So on 147 and 148 you need to leave and re-form the network to get the right working mode (verify with bellows info as you have done).

Try doing a "bellows leave" with the bellows CLI and then starting HA/ZHA with the right config. I think it will form a new network with the ZHA config in HA.

You can see the network parameters in the HA log, so there is no need to stop HA and use the CLI (home-assistant.log in your HA config folder, "./config/").

Once you have the right "working mode" it is possible to change the channel from HA with services. I have to look up how I did it before.
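
To compare several coordinators without reading the long `bellows info` lines by eye, the interesting fields can be pulled out with a couple of regular expressions. The sketch below works on abridged excerpts of the output posted above for 149 and 148; the `summarize` helper is hypothetical, and the text format it relies on is simply what this bellows version printed, so it may differ in other versions.

```python
import re

# Abridged excerpts of the `bellows info` output posted above.
sample_149 = (
    "EmberNetworkParameters(extendedPanId=95:fe:4e:27:16:3a:ac:56, panId=0x7a6b, "
    "radioTxPower=13, radioChannel=25)\n"
    "EmberCurrentSecurityBitmask.TRUST_CENTER_USES_HASHED_LINK_KEY|64|32|"
    "HAVE_TRUST_CENTER_LINK_KEY|GLOBAL_LINK_KEY: 244"
)
sample_148 = (
    "EmberNetworkParameters(extendedPanId=cc:cc:cc:cc:e2:ab:f4:98, panId=0x3498, "
    "radioTxPower=20, radioChannel=11)\n"
    "EmberCurrentSecurityBitmask.64|32|HAVE_TRUST_CENTER_LINK_KEY|8|GLOBAL_LINK_KEY: 124"
)


def summarize(info_output):
    """Extract channel, PAN ID and the hashed-link-key flag from the text."""
    channel = re.search(r"radioChannel=(\d+)", info_output)
    pan_id = re.search(r"panId=(0x[0-9a-fA-F]+)", info_output)
    return {
        "channel": int(channel.group(1)) if channel else None,
        "pan_id": pan_id.group(1) if pan_id else None,
        "hashed_link_key": "TRUST_CENTER_USES_HASHED_LINK_KEY" in info_output,
    }


for name, text in (("149", sample_149), ("148", sample_148)):
    print(name, summarize(text))
```

A coordinator whose summary shows `hashed_link_key` as false would be a candidate for the leave and re-form step described above.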

@MattWestb

MattWestb commented Mar 22, 2021

I have found how I changed the channel on the fly without needing to re-pair all devices (not all devices like it, but most "normal" ones do it).
These are standard Zigbee network commands and should work, but Xiaomi and some others are not very standards-compliant and may not work.

zigpy/zigpy-znp#31 (comment)

You need to install the utility https://github.com/Adminiuga/zha_custom, but that is only a matter of copying it into the config folder and restarting HA.

Edit: An issue about changing the channel with EZSP: zigpy/bellows#191

@MattWestb

It should be possible to form the network with config parameters from the HA config:
zigpy/bellows#266 (comment)

And I have found the parameters that can be set: zigpy/zigpy#154 (comment)

@MattWestb

By the way @banksy-git, is it OK that we are "spamming" your repo with this problem, or should we move somewhere else??

@MattWestb

I have put in a request to implement a leave command in ZHA_custom (Adminiuga/zha_custom#4) to make it easier to force re-forming of the network.

@gribouk
Author

gribouk commented Mar 22, 2021

"
You can see the network parameters in the HA log, so there is no need to stop HA and use the CLI (home-assistant.log in your HA config folder, "./config/").
"

How would I do that? There is a bunch of unmarked text. I tried to search for "bellows/zigpy" but there was no channel mentioned nearby...

@MattWestb

From my test system, so you can search for the keywords:

2021-03-21 16:45:22 INFO (MainThread) [bellows.zigbee.application] EZSP Radio manufacturer: 
2021-03-21 16:45:22 INFO (MainThread) [bellows.zigbee.application] EZSP Radio board name: 
2021-03-21 16:45:22 INFO (MainThread) [bellows.zigbee.application] EmberZNet version: 6.7.8.0 build 373
2021-03-21 16:45:22 DEBUG (MainThread) [bellows.ezsp.protocol] Send command networkInit: ()
2021-03-21 16:45:22 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame 23 (networkInit) received: b'00'
2021-03-21 16:45:22 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame 25 (stackStatusHandler) received: b'90'
2021-03-21 16:45:22 DEBUG (MainThread) [bellows.ezsp.protocol] Send command getNetworkParameters: ()
2021-03-21 16:45:22 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame 40 (getNetworkParameters) received: b'000105b565551525f5e537d30d0f0000000000f8ff07'
2021-03-21 16:45:22 INFO (MainThread) [bellows.zigbee.application] Node type: EmberNodeType.COORDINATOR, Network parameters: EmberNetworkParameters(extendedPanId=e5:f5:25:15:55:65:b5:05, panId=0xd337, radioTxPower=13, radioChannel=15, joinMethod=<EmberJoinMethod.USE_MAC_ASSOCIATION: 0>, nwkManagerId=0x0000, nwkUpdateId=0, channels=<Channels.ALL_CHANNELS: 134215680>)

Very likely you need to turn on debug logging to see everything: https://github.com/zigpy/zigpy/wiki/Zigpy-Advanced-Configuration
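
The raw `getNetworkParameters` frame in the debug log carries the same values as the decoded `EmberNetworkParameters` line below it. The sketch that follows unpacks the hex payload with Python's `struct` module. The field order and sizes are an assumption inferred from the decoded log line (little-endian, no padding), not taken from the EZSP specification; they do reproduce the logged values. It also expands the `channels` mask 134215680 into channel numbers.

```python
import struct

# Payload of the getNetworkParameters reply, copied from the debug log above.
frame = bytes.fromhex("000105b565551525f5e537d30d0f0000000000f8ff07")

# Assumed layout: status, node type, 8-byte extended PAN ID, PAN ID,
# TX power (signed), channel, join method, manager ID, update ID, channel mask.
(status, node_type, ext_pan_id, pan_id, tx_power, channel,
 join_method, nwk_manager_id, nwk_update_id, channel_mask) = struct.unpack(
    "<BB8sHbBBHBI", frame
)


def channels_in_mask(mask):
    """List the channel numbers whose bit is set in a channel mask."""
    return [bit for bit in range(32) if mask & (1 << bit)]


# The log prints the extended PAN ID with the byte order reversed.
ext_pan_text = ":".join(f"{byte:02x}" for byte in reversed(ext_pan_id))

print("extendedPanId:", ext_pan_text)
print("panId:", hex(pan_id))
print("radioTxPower:", tx_power)
print("radioChannel:", channel)
print("channels:", channels_in_mask(channel_mask))
```

The mask decodes to channels 11 through 26, which is why the log labels it `ALL_CHANNELS`; the channel the network actually runs on is the separate `radioChannel` field.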

@MattWestb

I have come across a very interesting thought about very large Zigbee networks and what can crash them: zigpy/zigpy#635 (reply in thread)
I think it is better not to put all your networks on the same channel.
If you have 4 rooms in a row and one network in each, try not to use the same channel in 2 rooms that are direct neighbors, if possible.

Example:
Room:    1 2 3 4
Channel: 1 2 3 1 (if you only have 3 channels)
The two networks on channel 1 are hopefully out of range of each other.
It is also better to use secondary channels (the primary ones are 11, 15, 20 and 25; all others are secondary), if your devices work well with them (some strange devices don't like using all channels, which is non-standard), than to use the same channel for all networks.

Example: you only have the range 20-25 because WiFi is blocking the lower channels.
Then spreading the networks over 20, 21, 22, 23, 24 and 25, with about 50 devices on each of 5 different channels, is better than putting 100+ devices on 20 and 100+ on 25.
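
The room example above amounts to cycling through the available channels so that direct neighbors never share one. A minimal sketch of that rule, assuming the rooms sit in a single row; the `assign_channels` helper and the channel list 20, 22, 25 are made up for illustration:

```python
def assign_channels(room_count, channels):
    """Cycle through `channels` so rooms next to each other differ.

    `channels` is assumed to hold distinct values; at least two are needed.
    """
    if len(set(channels)) < 2:
        raise ValueError("need at least two different channels")
    return [channels[i % len(channels)] for i in range(room_count)]


plan = assign_channels(4, [20, 22, 25])
for room, channel in enumerate(plan, start=1):
    print(f"room {room}: channel {channel}")
```

With three channels, room 4 reuses the channel of room 1, exactly as in the 1 2 3 1 example, and the two networks sharing a channel end up as far apart as the row allows.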

I hope you get your network(s) up and running, including the high-mounted ones !!

@gribouk
Author

gribouk commented Mar 24, 2021

Thank you, my dear friend!

Your help is very valuable.

Since 3 out of 4 of my routers have a WiFi interface (Sonoff ZBBridge), and I am pretty determined to eliminate 2.4 GHz WiFi from my network, I have ordered some new equipment to replace them. Once it arrives I'll start rebuilding the whole thing from scratch. With WiFi present, I see that it is useless to expect a device (I mean the Sonoff Zigbee bridge) whose WiFi and Zigbee transmitters sit within 10 mm of each other to function properly... At the moment I have only one coordinator with an Ethernet interface (TYZB-01), which, by the way, has recorded no errors in 3 days of operation (it has 55 devices on its network).

@MattWestb

Great to hear that you have not thrown 200 Zigbee devices away :-))

The Sonoff has a much better EFR32MG21, but the WiFi makes it not very stable.....
And the quality doesn't look very high, since many users have hardware problems with their ZBBs (Tasmota devs have also hardware-bricked some).

The Tuya ZBGW has a slower but more robust EFR32MG1, and my short experience with it is very positive (as long as you don't corrupt the config so that you can't log in to it, as I did on one).

I think 2 or 3 (or 4) Tuya ZBGWs with EZSP 6.7.8.0 would be a good solution, if you get the configuration working OK and don't get bad interference from the surroundings.
Perhaps it is possible to run all of your devices on one, but I think it is better to do it with more coordinators to play it safe.

Have you looked into making a backup of one coordinator and restoring it on a new one?
It should work, but it requires a one-time (if you don't have an SWD flasher) burn of a new EUI64 (MAC address) into the chip: https://github.com/zigpy/zigpy/wiki/Coordinator-Backup-and-Migration. I think it has not been tested with WiFi / Ethernet coordinators, but if you have devices that are very tricky to reset it can be worth a try !!

I can ping the bellows dev if you need help with it, and I think he is more in your time zone ;-)

@diizzyy

diizzyy commented May 19, 2021

Since we're somewhat on the topic, what's the procedure to upgrade the firmware?
Just run https://github.com/banksy-git/lidl-gateway-freedom/blob/master/scripts/upgrade_ncp.py as is, or do you need to use bellows first in some way?

@MattWestb

From what I can see, you must use bellows to put the NCP in bootloader mode.
But I have not tested it ;-))
There are 2 more methods for updating the firmware on the NCP in #5, with links to the commands that can be sent in the CLI if you don't like the scripts.

@diizzyy

diizzyy commented May 19, 2021

That's what's confusing me as it will only print if you ignore warnings?
@banksy-git Can you please clarify how it's supposed to work?

@challs

challs commented May 22, 2021

I was able to use the upgrade script to successfully upgrade my Aldi gateway from 6.5.0.0 to 6.7.8.0.

According to the recommended firmware versions page of the zigpy project, recommended version is 6.7.8.0, which can be downloaded from this page

The first time I tried, the upgrade script hung:

lidl-gateway-freedom/scripts/upgrade_ncp.py --port 7777 192.168.0.100
[...]
Attempting upgrade...
Entering upload mode...
Traceback (most recent call last):
  File "/home/chris/dev/smart-home/lidl-gateway-freedom/scripts/upgrade_ncp.py", line 91, in <module>
    upgrade_ncp(args.ip, args.port, args.firmware)
  File "/home/chris/dev/smart-home/lidl-gateway-freedom/scripts/upgrade_ncp.py", line 26, in upgrade_ncp
    while s.recv(1) != b"\x00":
socket.timeout: timed out

I had to reboot the gateway to get it back into a mode that would respond to HA again.

I could see it was waiting for the xmodem transfer to start. So I tried restarting the serial gateway in software flow control mode:

# /tuya/serialgateway -p 7777 -f
serialgateway Release-1.2: port 7777, serial=/dev/ttyS1, baud=115200, flow=sw

This time it worked and the upgrade procedure was successful:

$ pip3 install bellows
$ bellows -d socket://192.168.0.100:7777 bootloader
Manufacturer: 
Board name: 
Current EmberZNet version: 6.5.0.0 build 188
bootloader version: 0x0108, nodePlat: 0x04, nodeMicro: 0x18, nodePhy: 0x0f
bootloader launched successfully

$ python3 upgrade_ncp.py --port 7777 192.168.0.100 NCP_UHW_MG1B232_678_PA0-PA1-PB11_PA5-PA4.gbl

WARNING: THIS TOOL COMES WITH NO WARRANTY. USE AT YOUR OWN RISK.

This will replace the firmware on your ZigBee Network Co-Processor

 * Ensure your firmware file comes from a trusted and tested source.

 * Failed upgrades may require special hardware tools to recover.

 * After this operation you should not use this gateway on the original cloud
   service.

To begin, first place device in bootloader mode using Bellows, e.g.:

    bellows -d socket://192.168.0.100:7777 bootloader

And then enter the word: UPGRADE below
>UPGRADE
Attempting upgrade...
Entering upload mode...
Waiting for XMODEM...
Uploading firmware...
Upload complete. Starting new firmware...
Done

In Homeassistant:

[bellows.zigbee.application] EmberZNet version: 6.7.8.0 build 373

@MattWestb

I have made a "copy and paste" update of the upgrade script that reboots the TBGW and then uploads the GBL file (the same way I have done it on the ESPHome ZBB).
It is very ugly, since I am not a programmer and can't do advanced programming, but it works, and it is possible to change the protocol from V8 all the way down to V4 if needed by editing the script.
#6 (comment)

(NCP) PS C:\msys32\Portainer\NCP> python.exe .\ncp_blm.py --port 6638 --no-confirm  192.168.2.121 .\NCP_690.gbl
Attempting rebooting in to the bootloader...
NCP reseted
EZSP Configuration Version X sent
DATA ACK response sent
Reboot in to bootloader command sent
Entering upload mode...
Waiting for XMODEM...
Uploading firmware...
Upload complete. Starting new firmware...
Done

This shows rebooting into the bootloader and updating the NCP to EZSP 6.9.2.0 on an "IKEA Billy EZSP" through a WeMos D1 Mini running ESPHome with a serial server on alternate COM pins.

@diizzyy

diizzyy commented May 22, 2021

@challs
Thanks for posting the logs, they helped a lot and it worked for me too! :-)

@wangeris

@challs Thanks for your comment, really helped me out a lot.
Had to run everything on port 8888 instead of 7777 tho
and before running
/tuya/serialgateway -p 7777 -f
had to run command
killall serialgateway
It finally worked after 2 hours of head-bashing.

@sekt1953

sekt1953 commented Oct 25, 2022 via email

@MathiasEveraerts

m@server:/updatezigbee$ bellows -d socket://192.168.2.12:7777 bootloader
Manufacturer: None
Board name: None
Current EmberZNet version: 6.5.0.0 build 188
bootloader version: 0x0108, nodePlat: 0x04, nodeMicro: 0x18, nodePhy: 0x0f
bootloader launched successfully
m@server:/updatezigbee$ python3 upgrade_ncp.py 192.168.2.12 --p 7777 NCP_UHW_MG1B232_678_PA0-PA1-PB11_PA5-PA4.gbl

WARNING: THIS TOOL COMES WITH NO WARRANTY. USE AT YOUR OWN RISK.

This will replace the firmware on your ZigBee Network Co-Processor

  • Ensure your firmware file comes from a trusted and tested source.

  • Failed upgrades may require special hardware tools to recover.

  • After this operation you should not use this gateway on the original cloud
    service.

To begin, first place device in bootloader mode using Bellows, e.g.:

bellows -d socket://192.168.2.12:7777 bootloader

And then enter the word: UPGRADE below

UPGRADE
Attempting upgrade...
Entering upload mode...
Waiting for XMODEM...
Uploading firmware...
send error: expected ACK; got b'\x18' for block 1
send error: expected ACK; got b'\x18' for block 1
send error: expected ACK; got b'\x18' for block 1
send error: expected ACK; got b'\r' for block 1
send error: expected ACK; got b'\n' for block 1
send error: expected ACK; got b'S' for block 1
send error: expected ACK; got b'e' for block 1
send error: expected ACK; got b'r' for block 1
send error: expected ACK; got b'i' for block 1
send error: expected ACK; got b'a' for block 1
send error: expected ACK; got b'l' for block 1
send error: expected ACK; got b' ' for block 1
send error: expected ACK; got b'u' for block 1
send error: expected ACK; got b'p' for block 1
send error: expected ACK; got b'l' for block 1
send error: expected ACK; got b'o' for block 1
send error: expected ACK; got b'a' for block 1
send error: NAK received 17 times, aborting.
Upload complete. Starting new firmware...
Done
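
The `send error` lines above are easier to read once the received bytes are put back together. In XMODEM, `0x06` is ACK, `0x15` is NAK and `0x18` is CAN (cancel). The sketch below, with a hypothetical `describe` helper, splits the 17 logged bytes into control codes and plain text.

```python
# Standard XMODEM control bytes.
CONTROL = {0x01: "SOH", 0x04: "EOT", 0x06: "ACK", 0x15: "NAK", 0x18: "CAN"}

# The bytes reported in the "expected ACK; got ..." lines above, in order.
received = [
    b"\x18", b"\x18", b"\x18", b"\r", b"\n", b"S", b"e", b"r", b"i",
    b"a", b"l", b" ", b"u", b"p", b"l", b"o", b"a",
]


def describe(chunks):
    """Split single-byte chunks into control-code names and readable text."""
    controls = []
    text = []
    for chunk in chunks:
        name = CONTROL.get(chunk[0])
        if name is not None:
            controls.append(name)
        else:
            text.append(chunk.decode("ascii"))
    return controls, "".join(text)


controls, text = describe(received)
print(controls)
print(repr(text))
```

The result is three CAN bytes followed by the start of a text message, `Serial uploa`. That suggests the other side cancelled the transfer and printed a message instead of acknowledging block 1, so the `Upload complete` line that follows in this log probably should not be taken as proof that the firmware was written.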
