Help with Poller #62
My use case is fairly simple. I want to query my two heat pumps periodically (every 5 seconds or so) and see what state they are in (heating/cooling/standby) and what the physical characteristics are of the units and the zones they are supporting. There are a total of two S30 thermostats plus an additional slave thermostat, giving a total of three zones. I do not want to control or set anything, and I want to do all of this locally (not via the cloud).

It would appear at first glance that the poller would meet my needs, but I am having some issues getting the results I expect and could use some guidance.

Log:

```
2023-07-16 10:42:58,793 [MainThread ] [DEBUG] serverConnect - Entering
2023-07-16 10:42:58,793 [MainThread ] [DEBUG] Closing Session
2023-07-16 10:42:58,793 [MainThread ] [DEBUG] Creating Session
2023-07-16 10:42:58,794 [MainThread ] [DEBUG] authenticate - Enter
2023-07-16 10:42:58,794 [MainThread ] [DEBUG] login - Enter
2023-07-16 10:42:58,942 [MainThread ] [INFO ] Creating lennox_home homeId [local]
2023-07-16 10:42:58,942 [MainThread ] [INFO ] Updating lennox_home homeIdx [0] homeId [local] homeName [local]
2023-07-16 10:42:58,943 [MainThread ] [INFO ] Creating lennox_system sysId [LCC]
2023-07-16 10:42:58,943 [MainThread ] [INFO ] Update lennox_system idx [0] sysId [LCC]
2023-07-16 10:42:58,943 [MainThread ] [INFO ] login Success homes [1] systems [1]
2023-07-16 10:42:58,943 [MainThread ] [DEBUG] Negotiate - Enter
2023-07-16 10:42:58,944 [MainThread ] [DEBUG] serverConnect - Complete
lsystem LCC
lsystem LCC
lsystem LCC
lsystem LCC
```
Great set of questions. That API sample is really out of date; I'd take a pull request with your updated sample. You will need to run the message pump, as the configuration is delivered asynchronously in messages. There is no "polling" interface to the equipment; fundamentally it's a pub/sub model. The "cloud_message_pump_task" is mislabeled; it should be called message_pump_task (initially the API only supported the cloud connection).

So basically the way it works is: as the message pump gets messages, it updates its in-memory view of the system. The API bootstraps by requesting subscriptions to a set of topics. The initial messages from the S30 contain the full information for the topic (for example, a list of zones). Subsequent messages provide updates (for example, the temperature in a zone). The poller task just periodically reads that in-memory representation.

To set up two connections you create two instances of the API. If you do that, use a different APP_ID for each connection, as the S30s do talk to each other and sharing an APP_ID will cause them to get confused.

```python
s30api = s30api_async("none", "none", APP_ID, IP_ADDRESS)
```

What is done in the Home Assistant integration is that the zone temperature sensors set up a callback, and that callback gets called whenever a requested property changes. So that's an approach also.
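To make that concrete, a minimal two-connection bootstrap might look like the sketch below. The constructor arguments follow the line above; serverConnect(), getSystems(), subscribe(), and messagePump() are names taken from this thread and the repo sample, but treat the exact method names and the one-second pump cadence as assumptions to verify against the current API.

```python
import asyncio

from lennoxs30api import s30api_async

async def run_system(app_id: str, ip_address: str) -> None:
    # "none"/"none" for user/password selects the local LAN connection
    api = s30api_async("none", "none", app_id, ip_address)
    await api.serverConnect()
    for system in api.getSystems():
        await api.subscribe(system)
    while True:
        await api.messagePump()  # updates the API's in-memory view
        await asyncio.sleep(1)   # a separate poller task reads that view

async def main() -> None:
    # One API instance per S30, each with its own APP_ID so the
    # thermostats don't get confused.
    await asyncio.gather(
        run_system("my_app_id_1", "192.168.1.50"),
        run_system("my_app_id_2", "192.168.1.51"),
    )

if __name__ == "__main__":
    asyncio.run(main())
```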
Thanks for the info. I will attempt to play with it a bit more with that knowledge. Note that I am not using HA. My plan is to write to an InfluxDB database and then read it later with Grafana, so whatever I create will most likely end up as a Docker container that feeds the DB.
Do you see anything wrong with this approach to defining two systems? It seems to work OK.

```python
async def multiple_tasks(s30api, s30api2):
    ...

def main():
    ...
```

Console output (Zone 1 is a separate system):
That looks great. A case to handle will be when the connection fails: the message pump task will generate an exception, at which point the connection will need to be re-established. You can simulate this by rebooting your S30. Or there is a simulator in the repository; in VS Code there are debug configs that start a simulator on localhost on a configurable port (8081, 8082, etc.). You can then point the API at localhost:8081, and there's a parameter in the constructor to use HTTP instead of the default HTTPS. I do most dev work against the simulators. There's also a rich set of diagnostic data that can be obtained, for example the heat pump inverter current, which is useful for energy monitoring.
> A case to handle will be when the connection fails - the message pump task will generate an exception at which point the connection will need to be reestablished.

This seems to be the case at initialization, but then it recovers. Is the recovery not automatic?
The API has no recovery/retry coded into it. That said, the S30 does keep state information regarding the APP_ID subscription and message queue, so it is possible that some scenarios do recover if this state info has not been removed. For example, a temporary network glitch would cause the pump to generate an exception, but when the network is restored it should pick up where it left off. I'm not sure of the rules for when the S30 removes the subscriber; I've assumed there must be a timeout and a maximum of queued messages, and that it gets cleared on restart, but I've never done a detailed analysis. Early on there were indications that failure to disconnect would cause the message queue to grow indefinitely and the device to automatically reboot.
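Since the API has no retry logic of its own, the caller has to supply it. A minimal sketch, assuming serverConnect() can simply be called again to rebuild the session (the HASS integration linked later in this thread is the battle-tested version of this idea):

```python
import asyncio
import logging

async def pump_with_retry(api, retry_delay: float = 30.0) -> None:
    """Re-run connect/subscribe/pump whenever the message pump raises."""
    while True:
        try:
            await api.serverConnect()        # re-establish the session
            for system in api.getSystems():
                await api.subscribe(system)  # resubscribe to topics
            while True:
                await api.messagePump()      # raises on connection loss
                await asyncio.sleep(1)
        except asyncio.CancelledError:
            raise                            # let task cancellation through
        except Exception:
            logging.exception("connection lost; retrying in %.0fs", retry_delay)
            await asyncio.sleep(retry_delay)
```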
Just to be sure, are we talking about this exception in the sample?
From your comment it sounds like you expect some communication errors to occur, but that it should keep running and respond when the connection resumes? Are you saying that may not be the case? Would some sort of timer, followed by a reconnection request (await s30api.serverConnect) if it is exceeded, be what you have in mind? The timer value would get reinitialized after every successful read? Could it be as simple as this?

Any idea what happens to the buffered data when you close the connection? I suspect it is wiped out, so if I were too aggressive with this I could inadvertently lose data. Maybe a number of minutes to wait before a reconnection is called? I see that when you connect you force a close_session first during the process. Should I add a close_session for the keyboard abort too?

In my normal use case this will run forever, but it could be shut down via Docker. So I guess I will need to check into a way to trap the container shutting down and terminate the connection then too.
> There's also a rich set of diagnostic data that can be obtained - for example the heat pump inverter current - useful for energy monitoring.

Unfortunately this only appears to be available during or after a diagnostic test. I tried it on the unit I tested yesterday and got the same values, and the other S30, which has not had a diagnostic test run on it (at least since a reboot), had None as its values. This would be exactly what I would like to see and plot, though. Any chance there is something lurking somewhere that is real-time for voltage and current?
@PeteRager Any thoughts on how I can tell if a system is currently using the strip heaters but is not set to Emergency Heat? I looked for Aux Heat or stages of heating. One issue is that the variable-speed units are not fixed-stage, but at some point they will go into aux heating. I thought Demand would have done it, but this appears to be a value for the airflow.
On error handling, the test program is overly optimistic and just catches errors and continues. That will not work reliably; at some point the code will need to go back through the initialization process, reconnect and resubscribe, and then start the message pump again. When building the HASS integration, I put all the retry logic into the integration, and it took a while to get it right. That logic is in this file; it is convoluted, but it has been very reliable: https://github.com/PeteRager/lennoxs30/blob/master/custom_components/lennoxs30/__init__.py

Regarding the last two questions on diagnostic data, call this to turn diags on:
After that you'll get a stream of diagnostic data forever (or until the S30 reboots or you set it back to zero). Read this document, as there are some potential stability issues that could arise. The electric heat data is part of the diagnostic data; there is a diagnostic element called #_of_electric_heat_sections_on on the indoor unit. This test shows how to access them:
This is what you are looking for:
Looks like diagnostic 13 on equipment 2. Yes, demand is the CFM of airflow.
Wow. Great stuff. This will keep me busy for a bit.
Quick question. Re-reading your setup instructions for HA, under Emergency Heat I see this statement:

> If the Lennox Auxiliary Heat is running, the aux attribute in the HA Climate entity will be set to True and the HA Climate Entity will show Heating

This seems to imply that you can detect if the heat strips have been applied to normal heating (i.e. 2nd-stage or aux heating). Is this true? This may be all I need for my purposes to monitor excessive heat-strip use. Somewhere/somehow I can keep track of how long this is going on per operation/day and make a judgement call on whether it is excessive or not. It's a high-level use case to catch a broken system before I run up a huge electric bill (don't ask me why this is a concern). BTW, why are the attributes defrost, outdoor_temperature, and aux in the zone vs. the system?
In a system with a heat pump and a gas furnace, these indicators are used when the heat pump is locked out because the ambient temperature is too low and the gas furnace is running. Whether they work for the heat strips also, I do not know; either way I'll update the docs based on what you find. Outdoor temperature is on the system object. The other two are on the zone because that's the way the S30 models the data. In general, the zone and system properties map 1:1 with a JSON attribute in the zone or system messages.

What could be helpful is to set up HASS in Docker and enable the integration. This will provide a GUI with all the data. Turn on message logging and it'll dump the messages to a file; then, as the strips come on, we can see what is in the messages. It is possible there is a parameter the API is not processing. The API also has the ability to log to a file. Either way, getting the messages will be helpful to see what changes when.
The heat strip experiment will have to wait a bit; I'm fighting 100F+ temps right now. I'll look into the HASS Docker thing if I get some time later in the week. The test Python stuff spits out a lot of data and it's easy to add to it on the fly.
Had a few minutes free, so I installed HA via Docker. It all seems to be OK, but when I went to add Lennox S30 as an integration, nothing shows up when I search for len or lennox. Thoughts?
The integration is a community add-on and so needs to be installed. The simplest way is to download the latest release, unzip it, and put the contents of custom_components/lennoxs30 into the /config/custom_components/lennoxs30 folder, where /config is the volume used by Docker. Then restart HASS. Alternatively, install HACS (Home Assistant Community Store) https://hacs.xyz/ and follow the instructions here on how to add the integration: https://github.com/PeteRager/lennoxs30#hacs
OK. Up and running. Both systems added. Now where do I find these log files?
This section describes how to enable message logging (ignore the debug logging). The files will appear in the /config directory.
Just to let you know, I did enable logging and verified it was working. It seems to be about the same detail as I was getting in the test scripts (minus the prints from the zones), but I did not do a detailed comparison. NOTE: I did not enable the diagnostic feature, as I am not ready for that test, so maybe more details appear then. For now I will return to working on my simple usage-collection server (I need to embed your recovery logic). Also interesting to note: one of my systems occasionally has WiFi issues. This came up during some of my tests today, and in HASS I could see the state going from connected to disconnected to connecting, so it showed the value of your recovery routines.
I have been studying the file and seem to understand much of its logic. I will have to gently remove the HASS stuff, as I will be running it under Docker, just feeding an InfluxDB database for use in a Grafana dashboard. I must admit it's rather complex, but I think I get the gist of it. But now my question is much more basic.
It's a good question. Have you looked at the InfluxDB connector in HASS? I think you can configure it to send any entity you want to Influx, so this may be a no-code solution. I have had a couple of requests to create an MQTT server, as this would allow integration with lots of systems, so that model would be something like this. Or jump into the code (i.e. start the processes).
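To make the MQTT idea concrete, a bridge along these lines could sit between the API's in-memory view and any MQTT-speaking consumer. This is only a sketch: paho-mqtt is one possible client library, and the topic layout and zone attribute names are hypothetical.

```python
import json

import paho.mqtt.client as mqtt

client = mqtt.Client()
client.connect("localhost", 1883)

def publish_zone(system_id: str, zone) -> None:
    # Hypothetical topic layout: lennox/<system>/<zone>/state
    topic = f"lennox/{system_id}/{zone.name}/state"
    payload = json.dumps({
        "temperature": zone.temperature,  # attribute names assumed
        "humidity": zone.humidity,
    })
    client.publish(topic, payload, retain=True)
```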
Thanks for getting back so quickly. Sorry for not being more complete in my questions. I am good on understanding how to inject my data into InfluxDB via Docker, and also on when to trigger the shutdown event. My questions were more about what the calls would look like into the init example you built for HASS. Clearly I don't need all the HASS setup functionality and will replace that with my own variable-setting routines. But after that, how do I kick off the processes to poll the S30? For example, it looks like I would start by calling async_setup_entry. Would this then cascade to all the other routines buried in the manager class? And for stopping the processes, call async_unload_entry? I also notice that the value returned is the result of the message routine. What do I need to do to subscribe to this so I can parse the message stream in my code? Sorry, but I am a Python newbie.
@PeteRager Just pinging you in case you did not see my last post.
Yes, your statements regarding async_setup_entry are correct: calling that should kick off the process, and likewise async_unload_entry should stop it. I have often thought that untangling the manager class from HASS and having it in the API would make the API more complete and would make the HASS integration simpler. There is no callback handler to get the JSON messages directly, so our options are to add one to the API or to derive a class from the API and instantiate that instead. This is Python pseudo-code and likely won't compile, but it should get you going in the right direction.
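A sketch of that idea follows; treating processMessage() as the method the pump routes each JSON message through is an assumption, so check the API source for the actual method name and signature.

```python
from lennoxs30api import s30api_async

def my_message_callback(message) -> None:
    # Your code here: parse the JSON message, write to Influx, etc.
    print(message)

class MyS30Api(s30api_async):
    def processMessage(self, message):
        my_message_callback(message)            # intercept the raw message
        return super().processMessage(message)  # then update the in-memory model
```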
And then you'd instantiate this class instead of s30api_async.
Outstanding. Give me a few days to stitch something together and I will report back my findings.
> I have often thought that untangling the manager class from HASS and having it in the API would make the API more complete and would make the HASS integration simpler.

Looking over the complexity of stripping HASS and overriding the message handler, I have decided to wait and see if you end up doing this. In the meantime I will see if I can hack the original sample above with some very basic recovery logic. My brain hurts. :-(
OK, I am back at this again and have a few questions.

> Call super.init() with the parameters

Confused about this. My plan was to create a new file based on __init__.py after stripping all the HASS stuff, and then just call async_setup_entry to start it. I could never find out how you got started in HASS with this. One issue I am having with my conversion is how to define the config object; is that what CONFIG_SCHEMA is doing? I cannot find the source for that. I was also in the process of creating a stripped-down ConfigType that just supports the class structure without all the HASS functions embedded.
One of the challenges with that starting point is unwinding and understanding the HASS objects. The main element of the config entry is a dictionary of the configuration values. There are some examples in the test directory where I set up a fake config entry to test those functions; that could provide a good starting point.

Another alternative could be to go back to the command-line program and get that executing so that data is flowing into Influx. A simple way to handle errors is to have the program exit and then have the Docker container auto-restart, so essentially a failure stops it and it restarts from scratch. Simple may be better. That HASS integration was my first attempt at Python programming, and while it works very reliably it's not very modular, and unwrapping it may be a lot of work.

Myself, I'd likely just put this into the HASS configuration.yaml and be done with it. This will send all the data collected by the integration to Influx:
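The snippet itself didn't survive here; Home Assistant's stock influxdb integration is configured roughly as below (InfluxDB 1.x style; the host, database, and entity filters are placeholders to adapt, and the schema should be checked against the HA influxdb docs for your InfluxDB version).

```yaml
# configuration.yaml (values are placeholders)
influxdb:
  host: 192.168.1.10
  port: 8086
  database: home_assistant
  username: homeassistant
  password: !secret influxdb_password
  include:
    entity_globs:
      - sensor.lennox*
      - climate.*
```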
Good tip on the Docker restart; I'll set it to automatic. I was heading in the direction of simple, using the base of the command-line examples, and then got caught up in thinking the HASS one was more complete. Sometimes complexity looks like elegance when you really don't understand everything. :-) I'll reset and take another stab at it. I have all the parts, so it should be fairly easy to crank something out now that you've put me back on the rails.
Quick question: if running as a Docker container, is it necessary to shut down the s30api_async object before the container terminates, or will this happen when the container is gone? I don't want to leave connections hanging around unnecessarily. Or is it NBD, especially if I use the same app_id on a restart?
I would try to log out, as it will prevent messages from accumulating for that app_id. I don't know how long they accumulate for, or the side effects of it running out of space. That said, if it's in trouble it usually reboots on its own.
OK. It looks like the best way to detect a container shutdown is via signals (SIGTERM)? It also looks like we have 10 seconds to gracefully quiesce the container before it's shut down. Any experience or best practices here?
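For reference, the usual asyncio pattern fits Docker's roughly 10-second SIGTERM grace window. In this sketch, close_session() is the cleanup call mentioned earlier in this thread, pump_with_retry() is the hypothetical retry loop sketched above, and the rest is an assumption to adapt.

```python
import asyncio
import signal

async def main(api) -> None:
    stop = asyncio.Event()
    loop = asyncio.get_running_loop()
    # Docker sends SIGTERM on "docker stop", then SIGKILL ~10s later
    loop.add_signal_handler(signal.SIGTERM, stop.set)
    loop.add_signal_handler(signal.SIGINT, stop.set)

    pump = asyncio.create_task(pump_with_retry(api))
    await stop.wait()           # run until a shutdown signal arrives
    pump.cancel()
    await api.close_session()   # log out so the S30 stops queueing messages
```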
As an experiment, I wanted to determine where and how to do the shutdown. It looked like I would do this in the message_pump_task routine, so for now I just put in a counter to allow a few executions and then terminate the program:
This resulted in what I think is a smooth exit without an error:
Before I put in the shutdown I would get:
That looks great!
OK. Here is my first hack at collecting zonal information once every minute and storing it in an InfluxDB database. I will use this with https://github.com/jasonacox/Powerwall-Dashboard, so I will be placing it in the same DB. I am using the unique_id for each zone as the measurement, so each zone has its own data set. This does not have a lot of error recovery built in, so I will see how it goes after running it for a while. I will make it a Docker container once it seems stable enough. Next up will be to create a Grafana dashboard to display the data, with some of it emulating the monthly summaries that come from Lennox. #62 (comment)
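For readers, the core of such a write loop might look like the sketch below, using the InfluxDB 1.x Python client to match Powerwall-Dashboard's database. The measurement-per-zone keyed by unique_id follows the comment above; the zone attribute names are assumptions to check against the API.

```python
from influxdb import InfluxDBClient

client = InfluxDBClient(host="localhost", port=8086, database="powerwall")

def write_zone_sample(zone) -> None:
    # One measurement per zone, keyed by its unique_id (per the comment above)
    point = {
        "measurement": zone.unique_id,
        "fields": {
            "temperature": float(zone.temperature),  # attribute names assumed
            "humidity": float(zone.humidity),
        },
    }
    client.write_points([point])
```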
Well, my attempt to get it running in Docker has had limited success. When I run it I get this error in the log:
So it cannot find a module. Any thoughts on what that may be? There were no errors during the build.
Small update: still struggling here. For some reason Docker is not finding the packages that have been installed into site-packages. I verified that the path was good, and pip also says it's valid. :-(

EDIT: Making good progress. Past all the module prereqs and the config file. Now trying to solve why it can't write to the log file that is there and has rw permissions for everyone.

EDIT 2: Success. I had to redo the way logging is done in Docker, as it would not write in the format I used first. I'll post another sample after I run this for a few days and it seems to work OK. I have just barely started on the dashboard in Grafana.
The system has been working well, and I was making slow progress on the Grafana dashboard until this happened last night: #65
@PeteRager OK, I have hit a wall. For some reason one of my S30s just stops updating the message pump data. This can happen after a few minutes or a few hours. It does not throw any errors, so there is nothing to trap in order to restart it. The message queue just has the same values in it, and my app will just keep pumping away. Thoughts on what this is and how to detect it? BTW, my Grafana dashboard is complete; all I need now is a reliable data feed.
When it's in that state, no messages are being processed, meaning the get is not returning any messages? The only time I've seen something like that is when I've had two applications running using the same app_id, but typically this would just cause each application to get a subset of the messages. Are you using different app_ids for the two S30s? Are the two S30s running the same firmware version?
I am using the sample api_poller_task. So I guess it just keeps looping, looking at what was dumped in the queue? And if await s30api.subscribe(lsystem) does not update the values, I just process the same values over and over. Yes, different app_ids for the two S30s. I could have sworn that before, when there was an issue with subscribe, I would get an error; that is what I programmed for.
As a way to detect no messages: there is a metrics object attached to the API at api.metrics. In that object are a last-received time and a message count, so these could be used to detect the connection not sending data. Typically there are multiple messages a minute, so 10 minutes of no messages could be the diagnostic to detect this.
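Turned into a watchdog task, that could look like the sketch below. The last_message_time attribute is named later in this thread; its exact type and timezone behavior (UTC vs. local, which trips up the date math later in the thread) are assumptions to verify.

```python
import asyncio
from datetime import datetime, timedelta

async def watchdog(api, reconnect, limit=timedelta(minutes=10)) -> None:
    """Trigger a reconnect when api.metrics shows no traffic for too long."""
    while True:
        await asyncio.sleep(60)
        last = api.metrics.last_message_time  # attribute named in this thread
        # Note: verify whether this timestamp is UTC or local time.
        if last is not None and datetime.now() - last > limit:
            await reconnect()                 # caller-supplied reconnect coroutine
```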
That is probably what I need; I will code that up. Since I am on a LAN, I would think no messages in a minute or so would mean a problem?
Maybe log that message count to Grafana, see how often it typically changes, then go 3x or so around that. Is that the single-zone S30 or the multi-zone one? Restarting the S30 may help, but I've only ever had to do that one time in two years.
Good idea about putting it in Grafana; easier than hunting down through Docker Compose. It's the zoned one; the single-zone one seems to be stable. I rebooted both of them last week, once I "thought" I had my code stable. :-(
Take a look at the firmware version. Mine are at 3.81.213.
Is the WiFi signal strength the same on both of them? If the WiFi were weak and dropped out occasionally, I could see this situation arising. The devices do send the RSSI in messages periodically; it's not captured by the API but should be in the message logs. The other value to look at is sysUpTime on the system object, which tells how long in seconds the system has been running. If it resets to zero, that means the S30 has restarted, but if that happened I'd expect communication errors to be reported.
@PeteRager Here are the two metrics from both systems after they were established and had been reporting messages for three minutes (1-minute polling time). Note that Downstairs has two zones vs. Upstairs having one.
Are these the metrics you described to check? The other time-related ones are last_metric_time, last_message_time, last_reconnect_time, and last_send_time. Seeing these, what kinds of deltas would indicate a problem? My S30 firmware is from July 2022. My WiFi signal has 5 bars, but I have seen the device randomly disconnect before, though rarely.
Looks good. If you don't get any messages for 5 minutes, that would mean there's an issue and the code could try to reconnect.
I think I have an experiment running now. I tested it immediately and it did the disconnect and then restarted, so I am hopeful. Now, if I did not screw up my date math, I have a chance; I did it with a 3-minute lag. My Python date math needs testing, as some calls give back UTC and others local time, so I have to fix that.
Small update, but making progress. Once I have this resolved, I am done.