MQTT client timeout #140

Open
eriklindqvist opened this issue Jul 29, 2020 · 50 comments

Comments

@eriklindqvist

I am running docker images ozwdaemon:latest and eclipse-mosquitto:latest as of yesterday on a Raspberry Pi together with Home Assistant 0.113.1. I have about 90 devices and over 500 entities.

As long as I don't really do anything, it's running pretty stable. However, using OZWAdmin-0.1.74, if I push the z-wave network a bit too far, such as healing and/or refreshing too many nodes at the same time, basically clogging the network with messages, the ozwdaemon mqtt client doesn't seem to be able to keep up. All entities in Home Assistant become unavailable,
and I see the following in the mosquitto logs:

Client qt-openzwave-1 has exceeded timeout, disconnecting.

So I restart the ozwdaemon docker instance, and I see the following in the Mosquitto logs:

  New connection from 172.18.0.2 on port 1883.
  Socket error on client <unknown>, disconnecting.
  New connection from 172.18.0.2 on port 1883.
  New client connected from 172.18.0.2 as qt-openzwave-1 (p2, c1, k60).

Then it works for just about under two minutes before it gets disconnected again:

  Client qt-openzwave-1 has exceeded timeout, disconnecting.

From what I understand (please, correct me if I'm wrong) that "k60"-part in the Mosquitto logs means
"keepalive = 60", i.e. the MQTT client tells the broker when connecting that it will stay in touch
with a ping message at least once every minute, and if that doesn't happen, the client will be disconnected.
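
To make the keep-alive arithmetic concrete, here is a minimal sketch. It assumes the broker follows the MQTT 3.1.1 rule of dropping a client once one and a half times the keep-alive interval has passed without any packet from it. Both function names are hypothetical and are not part of Mosquitto or ozwdaemon.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: seconds of client silence after which a broker that
// applies the MQTT 3.1.1 rule (1.5 x keep-alive) may drop the connection.
constexpr std::uint32_t broker_deadline_seconds(std::uint16_t keep_alive_seconds) {
    return (static_cast<std::uint32_t>(keep_alive_seconds) * 3u) / 2u;
}

// Hypothetical helper: true if a client that has sent nothing for
// `silent_seconds` would be considered timed out by such a broker.
// A keep-alive of 0 switches the mechanism off, so it never times out.
constexpr bool would_time_out(std::uint16_t keep_alive_seconds,
                              std::uint32_t silent_seconds) {
    return keep_alive_seconds != 0 &&
           silent_seconds > broker_deadline_seconds(keep_alive_seconds);
}
```

Under that assumption, k60 gives the client 90 seconds of silence before the broker gives up, which is roughly in line with the disconnect showing up a little under two minutes after the restart.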

I increased logging in mosquitto (by setting "log_type all" in mosquitto.conf) and also started ozwdaemon with
-e QT_LOGGING_RULES="*.debug=false;ozw.mqtt.publisher.debug=true"

and I can see in the mosquitto logs

  ...
  Received PUBLISH from qt-openzwave-1 (d0, q0, r1, m0, 'OpenZWave/1/node/1/instance/1/commandclass/32/value/562949970722835/', ... (478 bytes))
  Sending PUBLISH to auto-3E2A0E60-FB05-A814-00F1-5DE1DEFD51A0 (d0, q0, r0, m0, 'OpenZWave/1/node/1/instance/1/commandclass/32/value/562949970722835/', ... (478 bytes))
  Sending PUBLISH to qt-openzwave-1 (d0, q0, r0, m0, 'OpenZWave/1/node/1/instance/1/commandclass/32/value/562949970722835/', ... (478 bytes))
  Received PINGREQ from auto-3E2A0E60-FB05-A814-00F1-5DE1DEFD51A0
  Sending PINGRESP to auto-3E2A0E60-FB05-A814-00F1-5DE1DEFD51A0
  Client qt-openzwave-1 has exceeded timeout, disconnecting.

while ozwdaemon continues to print out hundreds of rows such as

  ...
  [ozw.mqtt.publisher] [debug]: Publishing Event valueAdded: 562952802893846
  ...

for several minutes until it realizes that the connection is gone:

  [ozw.mqtt.publisher] [debug]: Publishing Event valueRefreshed: 562950595969074
  [ozw.mqtt.publisher] [debug]: Publishing Event valueRefreshed: 72057594680475696
  [ozw.mqtt.publisher] [debug]: MQTT State Change "Disconnected" 
  [ozw.mqtt.publisher] [warning]: Exiting on Failure
  [ozw.mqtt.publisher] [warning]: MQTT Client Disconnnected
  [ozw.mqtt.publisher] [warning]: MQTT Client Error "Transport Invalid"

The only way I can get it to stay up is to remove/rename the ozwcache_0xf7b52c8f.xml file and restart, but it doesn't feel like a good solution.

Any ideas on what's going on?

@Fishwaldo
Member

[ozw.mqtt.publisher] [debug]: Publishing Event valueAdded: 562952802893846

I gather that across the hundreds of these messages, the number is changing?

If so, yes, I have an idea. We are not yielding to allow the actual network processing to happen. If you can confirm the above, the fix should be simple.

@eriklindqvist
Author

Yes, exactly. The numbers are changing for every row.
Also, it's not only valueAdded, it can be all sorts of different z-wave stuff, such as nodeGroupChanged, valueRefreshed etc.

@Olen

Olen commented Aug 20, 2020

Same issue here. It also happens sometimes on startup of the qt-ozw-container.
Tried to adjust the keepalive_timeout in mosquitto.conf, but mosquitto did not like that, and would not start.

It seems like when the mqtt-disconnect happens, the ozwdaemon is stuck using 100% CPU. I thought it would exit (and restart the container?)

@sirfooey

sirfooey commented Aug 25, 2020

I am also experiencing this issue, occurring when the ozw container is starting up. Resulting in a client timeout and disconnection.

In the previous zw1.4, ozwlog showed that I have a very chatty network, so I assume this is causing too many messages and ozw is delayed in sending a keepalive to mqtt.

@sirfooey

As with OP, the only way for the ozwd to remain connected to mqtt (not getting timeout) is to trash the ozwcache file.

@jlengq

jlengq commented Oct 15, 2020

Any progress on this one? I have a large network as well and can't get past the initialization step without this timeout and the eventual shutdown of ozwdaemon.

1602753420: Client qt-openzwave-1 has exceeded timeout, disconnecting.
@sirfooey I can't find any ozwcache file, where is it located?

@sirfooey

@jlengq Check your OZW container volume location, it should be right there.

@jlengq

jlengq commented Oct 16, 2020

Thanks, I found it! Unfortunately, it does not solve my problem.

I have 100+ nodes and the interview process launched at startup seems to choke the MQTT network somehow, resulting in the timeout. Deleting the ozwcache only seems to start the whole process over again?

@sirfooey

@jlengq yes, trashing the file starts the whole discovery process again (no loss of any node data); but sometimes that's what is needed to get it fully operational.
Some people have reported better stability with Build 150 (docker pull openzwave/ozwdaemon:allinone-build-150), might be worth trying that as well.

@abmantis

I've now moved from qt-openzwave to zwave2mqtt. I noticed that during some operations while the controller is waiting for replies and they take a long time to come (and timeout, usually), z2m also shows as "disconnected", but after a while it reconnects.
This is probably what is making z2m more reliable: it reconnects to mqtt.
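
As a rough illustration of what a reconnecting client involves, here is a minimal sketch of a retry-delay policy. Exponential backoff with a cap is my own assumption for the example; I have not checked how z2m actually schedules its reconnects, and reconnect_delay_ms is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical policy: delay before reconnect attempt number `attempt`
// (0-based). The delay starts at `base_ms`, doubles on every further
// attempt and never exceeds `max_ms`.
inline std::uint64_t reconnect_delay_ms(unsigned attempt,
                                        std::uint64_t base_ms,
                                        std::uint64_t max_ms) {
    std::uint64_t delay = base_ms;
    // Stop doubling as soon as the cap is reached to avoid overflow.
    for (unsigned i = 0; i < attempt && delay < max_ms; ++i) {
        delay *= 2;
    }
    return delay < max_ms ? delay : max_ms;
}
```

Starting at one second with a one-minute cap, the waits would be 1 s, 2 s, 4 s and so on up to 60 s, so a broker that is briefly unreachable is retried quickly without being hammered.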

@kpine
Contributor

kpine commented Oct 30, 2020

Has anyone tried increasing the MQTT client timeout in ozwd to see if that works around the problem? Maybe increasing the timeout would allow ozwd to finish whatever it's doing before disconnecting (assuming it's not in an infinite loop).

I think it should be as simple as adding a call to this->m_client->setKeepAlive(360); in the code below. That would set the timeout to 6 minutes (360 seconds) instead of the default 1 minute. Adjust as necessary or, preferably, set it via an environment variable.

mqttpublisher::mqttpublisher(QSettings *settings, QObject *parent) :
    QObject(parent),
    m_ready(false),
    m_uncleanshutdown(false)
{
    this->settings = settings;
    this->m_client = new QMqttClient(this);
    this->m_client->setHostname(settings->value("MQTTServer", "127.0.0.1").toString());
    this->m_client->setPort(static_cast<quint16>(settings->value("MQTTPort", 1883).toInt()));
    this->m_client->setClientId(QString("qt-openzwave-%1").arg(settings->value("Instance", 1).toInt()));

Not sure if there's any downside to increasing the time besides not reacting as quickly for real timeouts.

@sirfooey

So far, with limited testing, I can confirm that with @kpine's suggestion ozw is able to start up with a pre-existing ozwcache file, whereas in the past I would get timeout disconnects 100% of the time until I deleted the ozwcache file.

Olen added a commit to Olen/qt-openzwave that referenced this issue Nov 10, 2020
This is a workaround for OpenZWave#140 to ensure the daemon is able to start up while we are waiting for a proper fix
@Olen

Olen commented Nov 10, 2020

Just added a PR for that. I still think there are better fixes to be made, but at least it will make the daemon start up.

@brett19

brett19 commented Nov 28, 2020

I too have encountered this issue. I have rebuilt my entire HA setup around using Docker such that I can test if turning off logging and whatnot improves the situation (which it did, but it still fails to talk to MQTT sometimes).

I also took a look at how the ping/pong works and from what I can tell it's all built into qtmqtt using a timer, the only reason I can imagine that the timer wouldn't fire is if the event loop was blocked.

@m3ki

m3ki commented Nov 29, 2020

I am having the same issue, 100+ nodes ozw daemon won't stay up.

@renlor16

I have just added additional devices to my zwave network. I am now having the same issue. ozw daemon goes offline during startup.

@brett19

brett19 commented Nov 30, 2020

So I spent an ungodly amount of time figuring out how to get a local debug build of ozwdaemon running on my MBP. I can't exactly reproduce the issue on my MBP (presumably because it's too fast to trigger the issue), but what I do see is that the MQTT timer for pings is being invoked as expected (I also set up a 1s timer which triggers precisely on time). I also confirmed that blocking the event loop for a period of time definitely causes the MQTT timer not to be invoked.

After digging into the code a bit, it looks like OZWNotification schedules its events to be processed by the main thread using Qt::QueuedConnection. I have a suspicion that what's going on is that Open-Zwave is generating events so quickly that the Pi cannot keep up on the main thread. This causes the queue of queued signals to become saturated with queued events from OZW, which leads to the timer not being able to get in to fire. Something else that makes this seem likely is that I did some profiling and a huge amount of time is spent serializing events for MQTT, which is likely what is making each event take so long to be processed on the main thread.

Assuming this is an accurate assessment, I see a couple of paths forward.

  1. Use Qt::BlockingQueuedConnection to put some back-pressure onto OZW such that it doesn't saturate us with events during startup. This does require that OZW itself has internal back-pressuring mechanisms that can ensure that the thread that's communicating with the dongle doesn't saturate any internal OZW queues (this could cause the dongle to time out).
  2. Refactor the MQTT handling to automatically reconnect after a disconnect. This is certainly something that should occur anyway, but I think it's sort of hiding the underlying issue. Saturating the event loop with stuff can cause all sorts of havoc.
  3. Move processing off the main thread. This is definitely the best option out of all of them. Having the event-processing for QTOZW and the network-handling for MQTT on their own threads would enable them to flow events between each other without interfering with ordering or maintenance work that they need to be performing.
  4. Accept something like pull request "Increase keepalive" #185, which adjusts the MQTT timeout so that it is no longer triggered while the main thread is being saturated during startup (although it should probably be something like 10m). There isn't a whole lot of downside to doing this since taking longer to discover a 'lost client' is mostly irrelevant. Similar to 2 above, this is sort of hiding the issue, though it's quick and easy.
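
The suspected starvation can be sketched with a toy model. This is a deliberately simplified, standard-library-only simulation and not how Qt's event dispatcher really works: it assumes one thread, strict FIFO handling, and a fixed made-up cost per notification. Event, ping_start_time_ms and flooded_queue are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Simplified model of a single-threaded event loop: events are handled
// strictly in FIFO order and each one occupies the thread for `cost_ms`.
struct Event {
    bool is_ping;          // true for the keep-alive timer callback
    std::uint64_t cost_ms; // time the handler keeps the thread busy
};

// Returns the virtual time (in ms) at which the first ping handler starts
// running, or -1 if the queue contains no ping at all.
inline std::int64_t ping_start_time_ms(std::deque<Event> queue) {
    std::uint64_t now_ms = 0;
    while (!queue.empty()) {
        const Event ev = queue.front();
        queue.pop_front();
        if (ev.is_ping) {
            return static_cast<std::int64_t>(now_ms);
        }
        now_ms += ev.cost_ms; // nothing else can run in the meantime
    }
    return -1;
}

// Builds a queue of `backlog` notifications followed by a single ping.
inline std::deque<Event> flooded_queue(std::size_t backlog, std::uint64_t cost_ms) {
    std::deque<Event> queue(backlog, Event{false, cost_ms});
    queue.push_back(Event{true, 0});
    return queue;
}
```

With an illustrative backlog of 5000 notifications at 20 ms each (numbers chosen for the example, not measured), the ping handler would not get to run until 100 seconds in. That is already past the 90 seconds a k60 client gets if the broker allows one and a half times the keep-alive.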

What do you think @Fishwaldo ?

@Olen

Olen commented Nov 30, 2020

I totally agree that #185 is just a workaround, and for me, your option 3 seems like the best one. Unfortunately I don't have enough experience with C++ and Qt to help.

Regarding option 2, there has been some discussion in another issue (could not find it here and now), and a problem is apparently that it is hard to keep track of the states (and what messages in either direction that might be lost) if you just do a MQTT-reconnect without restarting OZW at the same time.

I really hope fishwaldo is well, and will be back from his involuntary break soon...

@m3ki

m3ki commented Nov 30, 2020

So I spent an ungodly amount of time figuring out how to get a local debug build of ozwdaemon running on my MBP. I can't exactly reproduce the issue on my MBP

This is where I am at right now :)

docker buildx build --platform linux/arm -f Docker/Dockerfile -t qt-ozw-allinone-timeout .
WARN[0000] invalid non-bool value for BUILDX_NO_DEFAULT_LOAD:
WARN[0000] No output specified for docker-container driver. Build result will only remain in the build cache. To push result image into registry use --push or to load image into docker use --load
[+] Building 854.4s (16/44)

quickly that the Pi cannot keep up on the main thread. This causes the queue of queued signals to become saturated with queued events from OZW, which leads to the timer not being able to get in to fire.

So the classic CS problem of thread starvation?

Man I am rusty with C++, but having non-working lights is making me want to pick it back up.

@psgcooldog

Until there's an actual fix for this issue, is there any way to take CPU resources from the core-openzwave docker container to artificially slow it down so that it doesn't overwhelm mqtt? Perhaps Portainer has something useful. I'll check this idea out tomorrow, because otherwise I'm just dead in the water.

@m3ki

m3ki commented Dec 2, 2020

What I did on my end as a workaround was to install mqtt locally on my Pi and then bridge it to my main mqtt server. It seems to fix the issue, at least for me. I also tried recompiling ozwd to extend the timeout, but that didn't quite solve the issue completely.

Try this docker compose:

version: '3'
services:
  mqtt:
    image: eclipse-mosquitto
    container_name: "mqtt-bridge"
    volumes:
      - ./mqtt:/mosquitto
      - ./mqtt/data:/mosquitto/data
      - ./mqtt/log:/mosquitto/log
    ports:
     - "1883:1883"
     - "9001:9001"
    restart: always
  ozwd:
    image: openzwave/ozwdaemon:latest
    container_name: "ozwd"
    depends_on:
      - "mqtt"
    security_opt:
      - seccomp:unconfined
    devices:
      - "/dev/serial/by-id/usb-xxx"
    volumes:
      - ./ozw:/opt/ozw/config
    ports:
      - "1983:1983"
      - "5901:5901"
      - "7800:7800"
    environment:
      MQTT_SERVER: "pi.local.net"
      MQTT_USERNAME: "[redacted]"
      MQTT_PASSWORD: "[redacted]"
      USB_PATH: "/dev/serial/by-id/usb-xxx"
      OZW_INSTANCE: "1"
      OZW_NETWORK_KEY: "[redacted]"
    restart: always

Here is mqtt config

persistence true
persistence_location /mosquitto/data/

log_dest file /mosquitto/log/mosquitto.log

password_file /mosquitto/config/passwd
allow_anonymous false

# External MQTT Broker
connection zpie01
address hassio.local.net
# 1 is the id of one of many of my instances; update as needed
topic OpenZWave/1/# both
remote_username [redacted]
remote_password [redacted]

@brett19

brett19 commented Dec 2, 2020 via email

Depending on the MQTT server you have, the performance of your device and the size of the network, you could improve things with those kinds of changes. With my network of 58 nodes, even with a local MQTT and Pi4, it couldn't complete quickly enough. I'll try to remember to push my docker image with Olen's workaround (which is perfectly reasonable to use in "production") tomorrow morning.
Cheers, Brett

@psgcooldog

It looks like adding "--cpu-shares 512" might be a good start. Would any of you folks here know how I can get that into the command line that HA is using to start up the addon?

@m3ki

m3ki commented Dec 2, 2020

For me, I have 4 Pis with 50-100 nodes or so and nothing would work. Even on the test ozwd instance where there were no nodes (what gives!?), OZWD would just disconnect, with or without the keepalive modification in the code, if and only if the mqtt server was external to hassio (eclipse/mosquitto on docker on a beefy VM).

I have a feeling now that there is something going on with the network connection between the Pi and the mqtt server. Things worked better when the Pi connected to the internal mqtt addon on the hass server.

If you need to make sure startup completes, you can wipe out the ozwd cache on your Pi, which might help. Just don't reset your zwave stick and all nodes will come back.

For me my network has been rock solid as of this morning with a setup I mention in my previous comment.
PI (running ozwd and mqtt container) --->bridged to hassio mqtt.

@m3ki

m3ki commented Dec 2, 2020

By network connection, I meant there is something going on with how ozwd handles traffic and/or congestion. Even with the keepalive increased, the network wouldn't be stable. At least in my case.

@psgcooldog

I'm running a supervised install on Ubuntu 20.04 on an RPi4, with the Openzwave addon and the MQTT addon. I've been working to get everything back to normal after switching from the old all-in-one Zwave integration, and I thought I had finally got things right when this issue cropped up.

@m3ki

m3ki commented Dec 2, 2020

I'm running a supervised install on Ubuntu 20.04 on an RPi4, with the Openzwave addon and the MQTT addon. I've been working to get everything back to normal after switching from the old all-in-one Zwave integration, and I thought I had finally got things right when this issue cropped up.

A friend did this build yesterday when we were trying to troubleshoot this issue. You can try his docker image:
https://hub.docker.com/r/firstof9/qt-ozwdaemon it has an increased timeout.

Keep in mind it kinda worked for me but I would still experience intermittent disconnects every hour or so or when network got busy.

I am now back to the original docker though, with mqtt running on same host and bridging mqtt to mqtt with hassio on a separate VM. So far so good.

You can also wipe out the ozwd cache file of your ozwdaemon and see if you can get your network back up.

@renzor16

renzor16 commented Dec 2, 2020

Adding several more devices to my network caused this problem for me last weekend. I'm around 60 devices now with more to add. I'm running using VirtualBox on an Intel NUC so it doesn't appear easy to make any temporary changes to work around this issue. Last night I switched to Zwave2MQTT and have all my devices connected this morning. I'll have to spend some time renaming devices/entities but at least this will get me going again.

@psgcooldog

I really need a fix for this. The system fails too frequently as I am trying to add some door sensors and deadbolts. Deleting the cache file will let it restart, but many nodes lose their names, and some disappear. And it takes forever, anyway.

I tried Zwave2MQTT last week, and I switched when I ran into some issue (can't remember what it was now, lol). I may switch back and try it again. I think one issue was that it was clear from what I read that it was an orphaned project, and that the OpenZwave add-on with the OpenZwave integration was the road forward.

Is there any way to get the developers' attention quickly, or should I assume that this state of affairs will persist for a while?

@brett19

brett19 commented Dec 3, 2020

Hey @psgcooldog,
I am not a developer of qt-openzwave, but I am a C++ developer. I spent a bunch of time looking into this issue and I am reasonably confident that the fixed timeout should be sufficient for most networks. I have deployed that fix to my own network and have not seen any issues since. If you're still running into issues, even after building and deploying the pull request's fix, it's likely there is another issue at play.
Cheers, Brett

@m3ki

m3ki commented Dec 4, 2020

Hey @brett19
Would this solution be better than simply setting an arbitrary timeout? This way anyone can adjust the timeout and tweak it as needed.

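    // Read the keep-alive (in seconds) from the environment, if set.
    const QByteArray mqtt_keep_alive = qgetenv("MQTT_KEEP_ALIVE");
    bool ok = false;
    const uint seconds = mqtt_keep_alive.toUInt(&ok);
    // setKeepAlive() takes a quint16, so skip unparsable or out-of-range values.
    if (ok && seconds > 0 && seconds <= 65535) {
        this->m_client->setKeepAlive(static_cast<quint16>(seconds));
    }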
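
Outside of Qt, the same idea can be sketched with the standard library only. This is a minimal illustration, assuming the value is a plain decimal number of seconds that has to fit the 16-bit keep-alive field MQTT uses. parse_keep_alive and keep_alive_from_env are hypothetical helpers, not something that exists in qt-openzwave.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical helper: parses a keep-alive given in seconds.
// Returns the value (1..65535), or -1 for a null pointer, an empty string,
// anything containing non-digits, zero, or a value that is too large.
inline int parse_keep_alive(const char *raw) {
    if (raw == nullptr) {
        return -1;
    }
    const std::string text(raw);
    if (text.empty() || text.size() > 5 ||
        text.find_first_not_of("0123456789") != std::string::npos) {
        return -1;
    }
    const unsigned long seconds = std::stoul(text); // at most 5 digits, cannot overflow
    if (seconds == 0 || seconds > 65535) {
        return -1;
    }
    return static_cast<int>(seconds);
}

// Reads MQTT_KEEP_ALIVE from the environment and falls back to
// `fallback_seconds` when the variable is unset or invalid.
inline std::uint16_t keep_alive_from_env(std::uint16_t fallback_seconds) {
    const int parsed = parse_keep_alive(std::getenv("MQTT_KEEP_ALIVE"));
    return parsed < 0 ? fallback_seconds : static_cast<std::uint16_t>(parsed);
}
```

Rejecting bad input and keeping the default means a typo in the container environment cannot silently turn the keep-alive into 0, which would switch the mechanism off altogether.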

I am having a hard time setting up a crosscompilation buildchain. How did you get it to work?

@brett19

brett19 commented Dec 4, 2020

Sorry for the delay, forgot to push this the other day. Here are some armhf (32-bit ARM) images including fix-185.
https://hub.docker.com/r/brett19/ozwdaemon

@psgcooldog

I decided to punt, and converted everything over to ZWave2MQTT. It was quite the time-consuming process, but this particular problem is no longer an issue for me.

@karl-gustav

@psgcooldog I also jumped ship for z2m, but ozw is far superior when it comes to handling scene events from switches. It actually comes into HA as a scene event and not a regular state change. And z2m sends 4 state changes per button press, so you need to go deep into the event to figure out if it really is a new scene event.

tl;dr: would prefer to use ozw but had to switch to z2m because ozw can't handle more than ~20-25 devices before it breaks down...

@renlor16

renlor16 commented Dec 5, 2020

@karl-gustav I was just trying to figure scenes out with z2m. I have Inovelli red dimmers that allow multi-tapping to create scenes. Had it working fine with ozw, but no luck with z2m. I'll probably give up for now and make the switch back to ozw when this bug is fixed. At least all my regular light automations are working again.

@brett19

brett19 commented Dec 5, 2020

Hey everyone, can you confirm whether you still have issues with the image I posted above containing fix 185?

If you can upload logs, that would help track down your specific issue beyond what we've already discovered.

Cheers, Brett

@Olen

Olen commented Dec 5, 2020

tl;dr: would prefer to use ozw but had to switch to z2m because ozw can't handle more than ~20-25 devices before it breaks down...

FWIW, I run ozw with 61 devices, and it has been running solid for 7 weeks. But I have only added a few new devices during that time, not removed any, and not done any network refreshes or other tricks.
Restarting the container, on the other hand, is usually causing trouble. But as soon as it starts up, it seems pretty stable.

@renlor16

renlor16 commented Dec 5, 2020

@brett19 I'm not that familiar with Docker. Is it possible to use Portainer to replace my current ozwdaemon with the one you created?

@brett19

brett19 commented Dec 6, 2020

@brett19 I'm not that familiar with Docker. Is it possible to use Portainer to replace my current ozwdaemon with the one you created?

You would need to shut down the container and spin up a new one with the same configuration but different image (as far as I know). I personally use docker-compose to make it easier to do that.

@renlor16

renlor16 commented Dec 6, 2020

Maybe I’ll have some time over the holidays to figure that out. For now z2m is running well. The only thing I can’t figure out is scenes from my dimmers. Is @Fishwaldo the only person that can release a new version of the addon? Given that the HA roadmap seemed to be heading down the ozw path, it seems pretty risky if there is only one person that can release bug fixes/workarounds.

@m3ki

m3ki commented Dec 9, 2020

A friend compiled "this" fix and added my fix to add MQTT_KEEP_ALIVE environment variable to change timeout as needed
docker here https://hub.docker.com/r/firstof9/qt-ozwdaemon

Before this fix my setup would still restart if I did "Refresh node"

MQTT_KEEP_ALIVE: "360" 

My config is here (keep in mind I am using a local mqtt on the Pi that bridges to a main mqtt server, to make sure ozw doesn't restart if the HASS instance is restarted):

version: '3'
services:
  mqtt:
    image: eclipse-mosquitto
    container_name: "mqtt-bridge"
    volumes:
      - ./mqtt:/mosquitto
      - ./mqtt/data:/mosquitto/data
      - ./mqtt/log:/mosquitto/log
    ports:
     - "1883:1883"
     - "9001:9001"
    restart: always
  ozwd:
    #image: openzwave/ozwdaemon:allinone-latest
    image: firstof9/qt-ozwdaemon:latest
    container_name: "ozwd"
    depends_on:
      - "mqtt"
    security_opt:
      - seccomp:unconfined
    devices:
      - "/dev/serial/by-id/usb-0658_0200-if00"
    volumes:
      - ./ozw:/opt/ozw/config
    ports:
      - "1983:1983"
      - "5901:5901"
      - "7800:7800"
    environment:
      MQTT_SERVER: "localhost.mydomain.net"
      MQTT_USERNAME: "[redacted]"
      MQTT_PASSWORD: "[redacted]"
      MQTT_KEEP_ALIVE: "360"    # <-- add keep alive like so
      USB_PATH: "/dev/serial/by-id/usb-0658_0200-if00"
      OZW_INSTANCE: "3"
      OZW_NETWORK_KEY: "[redacted]"
    restart: always

@genome-prime

Hi, I have the same issues. Are there any solutions in sight? Or at least a temporary workaround for people like me, who run the official image on a raspberry pi?

I've tried switching to zwave2mqtt but I couldn't figure out how to add the devices and entities to HomeAssistant. Auto discover didn't work either. And I really don't want to have to do everything manually.

Is it correct that everyone who has a sufficiently large network using the OpenZWave Plugin is experiencing this issue?

Oh and also I'm quite new to all of this. So I'm JUST learning how to access certain files in the docker containers, getting OS SSH access etc. This is how I've managed to delete the ozwcache file at least, so I don't have to install the whole thing from scratch. But every time I "reset", start up the OpenZWave Plugin and let my network run for a bit, it seems to randomly miss some entities.

Is there a way to restore the ozwcache from an old file without running into the timeout issue, maybe?

I'm desperate at this point.. so I'm thankful for any help :)

@renlor16

@genome-prime I was unable to figure out how to get the workaround installed on my setup. I ended up switching to z2m and was able to get auto discover to work. It did require that I re-enter all the entity names, which was a bit time consuming. So far it has been very stable. I have about 80 zwave devices. I had trouble getting the central scene figured out with z2m. I'm now looking at Node-Red with an MQTT node to grab the central scene info from there.

@genome-prime

genome-prime commented Dec 15, 2020

I'm giving up in frustration... I gave z2m another shot. This time my devices were auto detected and showed up in Home Assistant.

Unfortunately

  • some devices are missing random entities and even after "Refresh node info" not all entities are listed (this was occurring much less with OZW)
  • some devices (all of my thermostats) are always missing entities (like "mode") that were definitely showing up with OZW. The strange thing is I can see those entities in MQTT Explorer and in the config sections, but not anywhere else.
  • some devices are entirely the wrong type
  • some entities always report "unknown" like my Fibaro Motion Sensors (this was also definitely working with OZW)
  • ...

I know there are customizations but I couldn't figure out where the config files are being stored and how to apply them and honestly I don't wanna have to customize anything, since OZW was detecting everything fine on its own (except my Fibaro Button which I can live without for now)

I know this might not be the right or best place for it but I need to get this off my chest:
It would be much more fun if there were official, step-by-step, up-to-date Tutorials out there for setting up the different ways of a core feature (Z-Wave) of HomeAssistant.

I guess I should at least say some positive things too:

zwave2mqtt:

  • The no restart thing is nice
  • The UI is nice and not so buggy compared to OZW-Admin

OpenZWave:

  • It actually works! (for a limited time...)
  • Better compatibility out of the box

@blhoward2

Hey Everyone, can you confirm that you still have issues with the image I posted above containing fix 185.

If you can upload logs, that would help track down your specific issue beyond what we've already discovered.

Cheers, Brett

Your docker image seemed to fix the issue for me!

@TheArcturian

TheArcturian commented Dec 25, 2020

I have a fresh installation of Home Assistant on a Raspberry Pi as of 25th December 2020 and have the same problem:
"Client qt-openzwave-1 has exceeded timeout, disconnecting."

Should this bug even show up on a totally clean install?

@TheArcturian

Seems like it is the Aeotec Z-Wave Gen5 stick that doesn't work on Raspberry Pi4. At least 3 (of 4) hardware versions of the stick:
https://community.home-assistant.io/t/sticky-aeotec-z-stick-gen5-raspberry-pi4/218405/25

@m3ki

m3ki commented Dec 25, 2020

Seems like it is the Aeotec Z-Wave Gen5 stick that doesn't work on Raspberry Pi4. At least 3 (of 4) hardware versions of the stick:
https://community.home-assistant.io/t/sticky-aeotec-z-stick-gen5-raspberry-pi4/218405/25

Did you try this solution?

@TheArcturian

TheArcturian commented Dec 26, 2020

No, but I will get a cheap unpowered USB 2.0 hub. That will do the trick. Also it makes the stick further away from the Raspberry which is supposed to reduce interference.

@sstratoti

@m3ki - thank you. That docker image seems to have done the trick for getting my network back online. It was super frustrating - I could see through the admin gui that everything was up and running, could see it posting into MQTT, but for some reason HA kept saying that everything was "unavailable"...
