Nodes not detected correctly in OZW1.6 #2036
Please upgrade (or ask the Zwave2MQTT guys to update) to the latest version, which is 1.6-992-g76e21d80. Also, please do not post snippets from OZW_Log.txt or any other log; instead, zip the complete OZW_Log.txt and drag & drop it here, because the issue might be somewhere else in the log. I see you've already started discussing this on the other project, which is good. I am posting this to add a reference:
Forgot to mention: before creating and sending the OZW_Log.txt log, please delete the ozwcache* cache file, to force OZW to interview all devices from scratch. That means you'll have to wake up all battery nodes (manually, or by waiting until the wake-up interval of the device passes); they will remain unknown until you do that. This does not apply to the Eurotronic Spirit, because although it has batteries, it behaves (almost) like a mains powered device: it is a FLiRS (Frequently Listening Routing Slave). I did a quick search on "Popp Solar Powered Outdoor Siren" and that is probably a FLiRS too; the OZW_Log.txt will tell... You might guess that two issues with FLiRS devices would mean a problem with OpenZWave, but I recently tested a Z-Uno FLiRS and did not spot any problems.
Hi @petergebruers. I'm not sure that's a problem with FLiRS, as some of my mains powered devices show the same behaviour, for example Fibaro Dimmers. I pasted the Eurotronic TRVs as an example because those are odd: half of them show up and the other half do not.
Probably not
That should not happen, but there was a change in the interview so that's why I'd like to see what happens with 1.6-992-g76e21d80
Thanks. Please be patient, I am slow at this and I am a volunteer so it depends on available time...
Please only do that for testing purposes, the whole idea of the "cache file" is that OZW progresses through the interview per node, and as soon as one node finishes (certain stages), that data gets written to that file. Next time OZW starts, it will skip a lot of request/answer commands. This reduces startup time.
If it is, it will say so in the log; it is a very important property of the device. It will also show in the ozwcache file: see the properties listening and frequentListening, e.g.
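As a rough illustration of checking those two properties, here is a minimal Python sketch. The attribute names listening and frequentListening come from the comment above; the element name `Node`, the `id` attribute, and the sample XML are assumptions made up for this example and are not copied from a real ozwcache file.

```python
import xml.etree.ElementTree as ET

# Made-up sample for illustration only; NOT taken from a real ozwcache file.
# The element name "Node" and the "id" attribute are assumptions.
SAMPLE = """<Driver>
  <Node id="2" listening="true" frequentListening="false" />
  <Node id="7" listening="false" frequentListening="true" />
  <Node id="9" listening="false" frequentListening="false" />
</Driver>"""


def classify_nodes(xml_text):
    """Map node id -> 'always listening', 'FLiRS' or 'sleeping'."""
    root = ET.fromstring(xml_text)
    result = {}
    for elem in root.iter():
        # Drop a namespace prefix such as "{uri}Node" if one is present.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag != "Node":
            continue
        if elem.get("listening") == "true":
            kind = "always listening"
        elif elem.get("frequentListening") == "true":
            kind = "FLiRS"
        else:
            kind = "sleeping"
        result[int(elem.get("id"))] = kind
    return result


if __name__ == "__main__":
    for node_id, kind in sorted(classify_nodes(SAMPLE).items()):
        print(node_id, kind)
```

To try it on a real file, read the file contents into a string and pass that to `classify_nodes`; adjust the element and attribute names if your cache file differs.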
Okay, your log has plenty of these and that is not a good sign
I count hundreds of them. That usually indicates a hardware issue, a driver issue, or something else using the same serial port. You say HASS (so 1.4) is OK and Zwave2MQTT is not, so I am guessing they are not on the same system?
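For anyone who wants to do the same count on their own log, here is a small Python sketch. It assumes the lines in question are the "CAN received...triggering resend" lines quoted later in this thread, and that every log line starts with a `YYYY-MM-DD HH:MM:SS.mmm` timestamp; the function name is made up for this example.

```python
from collections import Counter

# Marker text as it appears in the OZW_Log.txt excerpts later in this thread.
CAN_MARKER = "CAN received"


def count_can_per_minute(lines):
    """Count CAN lines, grouped by the 'YYYY-MM-DD HH:MM' timestamp prefix."""
    counts = Counter()
    for line in lines:
        if CAN_MARKER in line:
            counts[line[:16]] += 1
    return counts


# Two CAN lines copied from the log excerpt in this thread, plus one other line.
sample = [
    "2019-12-25 08:48:05.803 Detail, contrlr, CAN received...triggering resend",
    "2019-12-25 08:48:05.803 Info, contrlr, Encrypted Flag is 0",
    "2019-12-25 08:48:05.805 Detail, contrlr, CAN received...triggering resend",
]

if __name__ == "__main__":
    for minute, n in sorted(count_can_per_minute(sample).items()):
        print(minute, n)
```

On a real log you would pass an open file instead of `sample`, for example `open("OZW_Log.txt", errors="replace")`. A healthy network should show very few hits; hundreds point to the kind of problem discussed here.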
@petergebruers it is the same system. I'm just swapping Docker containers: stopping one, deleting zwcfg, restarting the system (it seems that Zwave2MQTT does not free up all the resources when stopping the container) and starting a new container.
@Fishwaldo I need a second opinion on this. The 1.4 log is also full of CAN... Not sure what to try next.
I’m traveling right now so can’t see the logs, but CAN = collisions. Could be a faulty node. I’d try powering off everything and progressively powering on devices until the CANs start again. Either that or you have a node spamming the network with updates (power meter?)
Btw, OZW 1.4 uses zwcfg files and OZW 1.6 uses ozwcache files.
Thanks @Fishwaldo. I haven't had time to look into this in great detail, but I have gone through the first few nodes in the log. Node 6 seems unreachable. Node 11 and 12 are sleeping. Node 13 is mains and alive. Node 14 and 15 are sleeping. The log at mains Node 16 gets interesting: Node 56 is "chatty" and sends data while OZW is doing NoOp(s)... For example:
There is something odd about this sequence of events imho...
Then the next node gets interrupted by "node 56" but this time the NOP test succeeds...
This time there is no confusion about Callback ID 11. The next device is Node 21:
So again Node 056 interferes, but the callback is correct and has the status flag "ZW_SEND_DATA failed", which seems correct... But that is not good news, because it is a mains powered node and so should not fail... I stopped examining nodes in detail after Node 21 and scrolled to Node 56.
That is not looking good either. But that might be caused by something else happening when we see the first CAN, which is a bit before that log snippet:
@Dinth Can you power down node 56 (hex 0x38), remove ozwcache (so please try 1.6, because we do not support 1.4), restart, and see if the interview gets any better? I think @Fishwaldo may be on the right track when he said: "... or you have a node spamming the network with updates". What about Node 6 and Node 21? They don't respond to a ping test; are they powered and reachable?
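One rough way to look for a "chatty" node is to count how many frames the log shows as received from each node. The sketch below is only a heuristic: it counts every `NodeNNN, Received:` line (the format seen in the log excerpts in this thread), so replies to OZW's own requests are counted too. The function name and the regular expression are made up for this example.

```python
import re
from collections import Counter

# Matches e.g. "Detail, Node012, Received: 0x01, ..." as seen in OZW_Log.txt.
NODE_RX = re.compile(r"\bNode(\d{3}), Received:")


def received_per_node(lines):
    """Count 'Received:' log lines per node id (rough chattiness indicator)."""
    counts = Counter()
    for line in lines:
        match = NODE_RX.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Lines copied from the log excerpt further down in this thread.
sample = [
    "2019-12-25 08:48:05.802 Detail, Node012, Received: 0x01, 0x0f, 0x00, 0x04, 0x00, 0x0c, 0x07, 0x60, 0x0d, 0x00, 0x01, 0x25, 0x03, 0xff, 0xdd, 0x00, 0x97",
    "2019-12-25 08:48:05.803 Detail, contrlr, CAN received...triggering resend",
    "2019-12-25 08:48:05.804 Detail, Node012, Received: 0x01, 0x0f, 0x00, 0x04, 0x00, 0x0c, 0x07, 0x60, 0x0d, 0x00, 0x01, 0x25, 0x03, 0x00, 0xdd, 0x00, 0x68",
]

if __name__ == "__main__":
    for node_id, n in received_per_node(sample).most_common():
        # Show the id in decimal and hex, since both notations are used here.
        print(f"node {node_id} ({node_id:#04x}): {n} frames")
```

The nodes at the top of the list on a real log are candidates for powering down first when trying to isolate the source of the CANs.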
Again, thanks to both of you for looking into this, I really do appreciate it!
Right... Might be a "red herring" then; I expected those to be mains devices. Devices known to me to send quite a few updates, sometimes 5 per minute, are power reporting devices like a Fibaro Wall Plug with default settings and certain fluctuating nodes (that's just an example). BTW, iirc you had quite a few nodes in "secure" mode. May I suggest that if you re-include them, you only do that for devices that can open/close doors, gates or blinds? The S0 protocol has a lot of overhead (instead of a single TX it can be 3-4 and sometimes more) and slows down your network.
Okay, the CAN issue starts at node 50, so that leaves 27 -> 49 to investigate. I'll try to do that tomorrow evening.
@petergebruers I took out the batteries from nodes 52, 56 and 57 as advised. This time Z2M only discovered the "Product" of my first node (the controller).
Please disregard the log pasted above.
As long as you see plenty of CANs, nothing is going to work reliably... I was beginning to doubt myself, so I started my 10-node network and saved the OZW_Log.txt... No CAN or resend issues, and the few timeouts you'll find can be explained. That is OpenZWave version 1.6-992-g76e21d80 on a Raspberry Pi 2 (two!) using a ZME USB1 dongle. I think it is time for you to try something radically different. For example, install Domoticz Beta (which is based on a recent OZW 1.6 version) on a different system, e.g. a Windows PC.
Thanks for all the help @petergebruers. Hopefully I will have some days off work over Christmas, so I will try to fix the CAN problems. I will try a different machine and different software, and if that doesn't help I will try excluding all the nodes I can access.
I've had a similar experience. I previously used Home Assistant and it was able to identify node details (e.g., manufacturer), but using OZW 1.6 built from UPDATE: just built the |
@petergebruers I have tried Windows Domoticz beta, but I cannot find the OZW logs anywhere, to the point that I asked on the Domoticz forums and nobody could help me. I'm also trying to troubleshoot those CAN errors, but I couldn't really find an explanation online of why they happen.
CAN = collisions at the RF level. It means two nodes are trying to send at the same time. (In this case it’s the stick and one other node.) Yes, removing nodes to try to isolate CAN issues can help. (Start with any non-Z-Wave Plus devices.)
On the hardware tab, go to the OpenZWave hardware and click setup. You then get a screen listing all your devices (nodes). Click on the controller and you'll find an option to enable debug logging. You'll find OZW_Log.txt and ozwcache* in the "Config" folder of your Domoticz installation.
On paper it can cause issues, in real life not so much. No, I own 45 mains powered nodes and most of them are mounted in pairs, very close to each other (7 - 10 cm between them), and I have checked performance with Zniffer. It is not an issue.
That is because up to now that part of the protocol was not published as a "public" specification. Silabs has said very recently that they will transfer all info to "The Alliance" and make Z-Wave a truly open spec. If you want details about SOF/ACK/NAK/CAN you can still obtain the relevant document if you register online. At the moment, I cannot post a copy of that document here yet... INS12350-17 - Serial API Host Appl. Prg. Guide.pdf.pdf I have to warn you: most people think SOF/ACK/NAK/CAN are the response of a Z-Wave device, but that is only indirectly the case! The confusion arises because those four packets regulate the flow between the controller and OpenZWave, not between a device and OpenZWave! Many online resources say Z-Wave has an "ACK" packet on the radio level, which is true, but it is unrelated to the SOF/ACK/NAK/CAN on the serial level. The "ACK" on the device level is known as the ZW_SEND_DATA "status". If you send data to a device... What users commonly understand as "ack" or "no ack" is
Sorry, pressed "enter" too soon... If you send data to a device... What users commonly understand as "ack" or "no ack" is everything related to ZW_SEND_DATA. For example, if you see this in the OZW log:
The "delivered" means the dongle has accepted the packet for transmission (and will transmit it soon).
The "ZW_SEND_DATA failed. No ACK received" means the dongle has reported an error.
Technically speaking, although we use the word "ACK" for that, it is the result of the ZW_SEND_DATA callback:
if (_data[3] != TRANSMIT_COMPLETE_OK)
And the most common failure is TRANSMIT_COMPLETE_NO_ACK. This is the complete list:
/* Transmit complete codes */
#define TRANSMIT_COMPLETE_OK 0x00
#define TRANSMIT_COMPLETE_NO_ACK 0x01 /* retransmission error */
#define TRANSMIT_COMPLETE_FAIL 0x02 /* transmit error */
#define TRANSMIT_ROUTING_NOT_IDLE 0x03 /* transmit error */
/* Assign route transmit complete but no routes was found */
#define TRANSMIT_COMPLETE_NOROUTE 0x04 /* no route found in assignroute */
It comes as no surprise that the names are confusing. Anyway... If you see NAK/CAN, as @Fishwaldo points out, one of the possible causes is that OpenZWave and the controller (dongle) were sending data at the same time; because the controller sends data to OpenZWave when it receives data from a device, that is a possible cause. But it is also possible that for some reason the "serial protocol" lost sync. So far we have been able to identify these other causes of (lots of) CAN/NAK:
On Windows, you get an exclusive lock on the COM port; that's one of the reasons why I've mentioned running Domoticz on Windows. It is not impossible to have a CAN or NAK on a healthy network, but the number should be very low.
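To make the transmit-complete codes listed above a bit easier to read in a script, here is a minimal Python sketch that maps the status byte to its name. The values and names are taken from the `#define` list in this thread; the function name is made up for this example, and extracting the status byte from the callback data (the `_data[3]` check quoted above) is left to the caller.

```python
# Values and names copied from the "Transmit complete codes" list above.
TRANSMIT_STATUS = {
    0x00: "TRANSMIT_COMPLETE_OK",
    0x01: "TRANSMIT_COMPLETE_NO_ACK",
    0x02: "TRANSMIT_COMPLETE_FAIL",
    0x03: "TRANSMIT_ROUTING_NOT_IDLE",
    0x04: "TRANSMIT_COMPLETE_NOROUTE",
}


def describe_send_data_status(status):
    """Return (name, ok) for a ZW_SEND_DATA callback status byte.

    'ok' is True only for TRANSMIT_COMPLETE_OK, mirroring the
    '!= TRANSMIT_COMPLETE_OK' check quoted in the comment above.
    """
    name = TRANSMIT_STATUS.get(status, "UNKNOWN(0x%02x)" % status)
    return name, status == 0x00


if __name__ == "__main__":
    for code in (0x00, 0x01, 0x7F):
        print(code, describe_send_data_status(code))
```

Note that this status is about delivery to the device over the radio; it is separate from the ACK/NAK/CAN bytes exchanged on the serial link between the dongle and OpenZWave.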
On a more practical level... If you don't own and use a Zniffer, you might never be able to fully understand what is going on in your network. OZW_Log is good but it is no substitute: a single line can turn into many Z-Wave packets, and there is nothing we can do about that. The controller hides the details. Your best bet may be: properly delete (exclude) all your nodes... Reset your controller... Start from scratch. This means: start by adding your Z-Wave Plus mains devices first, and add them so that they are spread around your controller, because those are likely to form a good backbone. Do a heal so they are very happy nodes. Then add your battery devices. Go slowly and be careful when changing parameters on your devices. To be honest... I don't fully understand what causes the behavior seen in your OZW_Log file. And although I have tried to replicate that, I don't get such behavior...
Hi. I finally managed to get the OZW log from Windows Domoticz. Kind regards
I simulated a busy network and observed:
2019-12-25 08:48:05.800 Info, contrlr, Sending (Command) message (Callback ID=0x00, Expected Reply=0x15) - FUNC_ID_ZW_GET_VERSION: 0x01, 0x03, 0x00, 0x15, 0xe9
2019-12-25 08:48:05.800 Info, contrlr, Encrypted Flag is 0
2019-12-25 08:48:05.801 Detail, Unsolicited message received while waiting for ACK.
2019-12-25 08:48:05.802 Detail, Node012, Received: 0x01, 0x0f, 0x00, 0x04, 0x00, 0x0c, 0x07, 0x60, 0x0d, 0x00, 0x01, 0x25, 0x03, 0xff, 0xdd, 0x00, 0x97
2019-12-25 08:48:05.802 Detail,
2019-12-25 08:48:05.803 Detail, contrlr, CAN received...triggering resend
2019-12-25 08:48:05.803 Detail,
2019-12-25 08:48:05.803 Info, contrlr, Sending (Command) message (Attempt 2, Callback ID=0x00, Expected Reply=0x15) - FUNC_ID_ZW_GET_VERSION: 0x01, 0x03, 0x00, 0x15, 0xe9
2019-12-25 08:48:05.803 Info, contrlr, Encrypted Flag is 0
2019-12-25 08:48:05.803 Detail, Unsolicited message received while waiting for ACK.
2019-12-25 08:48:05.804 Detail, Node012, Received: 0x01, 0x0f, 0x00, 0x04, 0x00, 0x0c, 0x07, 0x60, 0x0d, 0x00, 0x01, 0x25, 0x03, 0x00, 0xdd, 0x00, 0x68
2019-12-25 08:48:05.804 Detail,
2019-12-25 08:48:05.805 Detail, contrlr, CAN received...triggering resend
2019-12-25 08:48:05.805 Detail,
Before this commit, a CAN would lead to an immediate resend of the command, without handling the incoming data. After this commit, the "threadprocloop" will run once, allowing the handling of data, flushing the buffer and clearing the CAN condition.
2019-12-25 09:27:53.484 Info, contrlr, Sending (Command) message (Callback ID=0x00, Expected Reply=0x15) - FUNC_ID_ZW_GET_VERSION: 0x01, 0x03, 0x00, 0x15, 0xe9
2019-12-25 09:27:53.484 Info, contrlr, Encrypted Flag is 0
2019-12-25 09:27:53.485 Detail, Unsolicited message received while waiting for ACK.
2019-12-25 09:27:53.486 Detail, Node012, Received: 0x01, 0x0f, 0x00, 0x04, 0x00, 0x0c, 0x07, 0x60, 0x0d, 0x00, 0x01, 0x25, 0x03, 0xff, 0xd0, 0x00, 0x9a
2019-12-25 09:27:53.486 Detail,
2019-12-25 09:27:53.487 Detail, contrlr, CAN received...triggering resend
2019-12-25 09:27:53.487 Detail, Unsolicited message received while waiting for ACK.
2019-12-25 09:27:53.488 Detail, Node012, Received: 0x01, 0x0f, 0x00, 0x04, 0x00, 0x0c, 0x07, 0x60, 0x0d, 0x00, 0x01, 0x25, 0x03, 0x00, 0xda, 0x00, 0x6f
2019-12-25 09:27:53.488 Detail,
2019-12-25 09:27:54.492 Detail,
2019-12-25 09:27:54.492 Info, contrlr, Sending (Command) message (Attempt 2, Callback ID=0x00, Expected Reply=0x15) - FUNC_ID_ZW_GET_VERSION: 0x01, 0x03, 0x00, 0x15, 0xe9
2019-12-25 09:27:54.492 Info, contrlr, Encrypted Flag is 0
2019-12-25 09:27:54.492 Detail, contrlr, Notification: Notification - TimeOut
2019-12-25 09:27:54.495 Detail, contrlr, Received: 0x01, 0x10, 0x01, 0x15, 0x5a, 0x2d, 0x57, 0x61, 0x76, 0x65, 0x20, 0x34, 0x2e, 0x36, 0x31, 0x00, 0x01, 0x95
2019-12-25 09:27:54.495 Detail,
So the key change is that before this commit, "CAN received...triggering resend" immediately leads to outbound data, "Sending (Command) message (Attempt 2". After this PR, "CAN received...triggering resend" will handle the incoming data, "Unsolicited message received while waiting for ACK."
Should improve OpenZWave#2036 "Nodes not detected correctly in OZW1.6" by reducing the number of CAN loops.
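For readers who want to inspect the byte dumps in the log above, here is a small Python sketch of the serial framing. It relies on the usual Serial API conventions: control bytes SOF = 0x01, ACK = 0x06, NAK = 0x15, CAN = 0x18, a length byte that counts everything after SOF except the checksum, and a checksum that is 0xFF XORed with every byte between SOF and the checksum. The helper names are made up, and the layout assumed for the version reply (a NUL-terminated ASCII string starting at byte 4) is inferred from the bytes in this log rather than taken from a specification.

```python
# Single-byte control values on the serial link between host and dongle.
SOF, ACK, NAK, CAN = 0x01, 0x06, 0x15, 0x18


def checksum(frame_without_checksum):
    """0xFF XORed with every byte after SOF (length, type, function, data)."""
    value = 0xFF
    for byte in frame_without_checksum[1:]:
        value ^= byte
    return value


def is_valid(frame):
    """Check SOF, the length byte and the trailing checksum of a data frame."""
    return (
        len(frame) >= 5
        and frame[0] == SOF
        and frame[1] == len(frame) - 2
        and checksum(frame[:-1]) == frame[-1]
    )


def version_string(reply):
    """Decode the ASCII text of the version reply (layout assumed from the log)."""
    return bytes(reply[4:-2]).split(b"\x00")[0].decode("ascii")


# Frames copied from the log above.
GET_VERSION = [0x01, 0x03, 0x00, 0x15, 0xE9]
VERSION_REPLY = [0x01, 0x10, 0x01, 0x15, 0x5A, 0x2D, 0x57, 0x61, 0x76, 0x65,
                 0x20, 0x34, 0x2E, 0x36, 0x31, 0x00, 0x01, 0x95]
NODE12_FRAME = [0x01, 0x0F, 0x00, 0x04, 0x00, 0x0C, 0x07, 0x60, 0x0D, 0x00,
                0x01, 0x25, 0x03, 0xFF, 0xDD, 0x00, 0x97]

if __name__ == "__main__":
    print(is_valid(GET_VERSION), is_valid(VERSION_REPLY), is_valid(NODE12_FRAME))
    print(version_string(VERSION_REPLY))
```

The check only validates framing on the serial side; it says nothing about whether a device acknowledged a packet over the radio.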
@Dinth thank you for trying that and posting the log. Indeed, there are still too many CANs, and I am still convinced it has to do with your "busy" network (meaning e.g. you have lots of sensors or modules reporting power), but I also noticed OZW may stay too long in CAN handling... I've opened a PR for that; if you know how to use your own OZW build with Zwave2MQTT, you can try the branch on my PR (see #2049).
I just merged the commit. Please test and let us know.
This issue has been automatically closed because there has been no response to our request for more information from the original author. With only the information that is currently in the issue, we don't have enough information to take action. Please reach out if you have or find the answers we need so that we can investigate further.
Hey. I'm trying to migrate to OZW1.6 (build:1.6-962-gcbe2b60f-dirty) using the Zwave2MQTT 2.0.6-dev docker image. Unfortunately, I found that it doesn't detect many of my devices correctly.
For example, while all my Eurotronic Spirit-Z thermostats are detected correctly in OZW1.4 (Zwave2MQTT 2.0.6, not 2.0.6-dev), in OZW1.6 only half of them are detected correctly and the other half are detected as "Generic Thermostat V2".
Here is a log from running Refresh Node info on one of the thermostats which doesn't get discovered properly on OZW1.6:
The same goes for my Popp outdoor solar powered siren. In OZW1.4 it is detected as "Popp Solar Powered Outdoor Siren", but in OZW1.6 it's just detected as "Siren". Here's a log of running Refresh node info: