Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spike in temperature data across all devices at the same (periodic) time #722

Closed
jum0n opened this issue Feb 17, 2022 · 45 comments
Closed
Labels
bug Something isn't working

Comments

@jum0n
Copy link

jum0n commented Feb 17, 2022

I get periodic spikes in the temperature data coming from multiple sensors all at the same time. Not sure what is causing this, but it throws off my tracking (and triggers events) for that period of time. Please let me know if I can provide any other data. Images attached of 3 temperature data history graphs showing these spikes.

temp1
temp2
temp3

ble_monitor version: 7.7.1
HA version: core-2022.2.8
Home Assistant OS: 7.4

Also, just want to express my appreciation for this integration, its really awesome!

@Magalex2x14
Copy link
Collaborator

What kind of sensors are these?

@jum0n
Copy link
Author

jum0n commented Feb 17, 2022

What kind of sensors are these?

These are all Govee temp/humidty sensors (each is a different model, though).

@Magalex2x14
Copy link
Collaborator

Please, show attributes of any of this sensors entities.

@jum0n
Copy link
Author

jum0n commented Feb 17, 2022

Please, show attributes of any of this sensors entities.

Does this work for that?
attribute

@Magalex2x14
Copy link
Collaborator

If these peaks are one measurement period long, then for now, to filter them, you can enable the use of the median instead of the mean in the integration settings.

@jum0n
Copy link
Author

jum0n commented Feb 17, 2022

Yes, they are usually just one measurement. Just now, though, there was one where it was two measurements long for some reason. I'll enable that, thanks!

@Magalex2x14
Copy link
Collaborator

Magalex2x14 commented Feb 17, 2022

It is very embarrassing that these peaks appear at the same time. Have you tried linking it to some "external" event?

In general, you need to catch the contents of these erroneous packets in order to understand the nature of what is happening. To do this, turn on debugging as described here, wait for the peak to appear, and upload here a fragment of the HA log that contains the time the peak appeared (+- couple of minutes).

@Magalex2x14
Copy link
Collaborator

Magalex2x14 commented Feb 17, 2022

Oh! Also you need to revert back "use median" option, otherwise you will most likely not see these peaks anymore.

@jum0n
Copy link
Author

jum0n commented Feb 17, 2022

I was just looking for an event to correlate as well. I added the extra logging for this you referenced, then restarted Core HA, the spike appears (and median is still enabled). So it does seem like its linked to restarting the Core. For the future, I will watch for any spike that does not correlate with restarting HA as well. Is there already a known issue with data like this when restarting?

@Magalex2x14
Copy link
Collaborator

No, I don't remember that.
When restarting, during the first period, the data is updated immediately when the first packets arrive, so the median may not work, yes. Turn back the use of the mean.

Let's summon the Govee master ) @Thrilleratplay have you ever experienced something like this with the Govee sensors?

@jum0n
Copy link
Author

jum0n commented Feb 17, 2022

I might have a possible answer after looking through the debug logs. It seems after the reboot, the integration might be restoring the state of the sensors but it classifies the unit of measurement as Celsius instead of Fahrenheit (which is my local measurement) if I'm reading this correctly. The calculated difference seems to account for the spike measurement which then returns to Fahrenheit later.

2022-02-17 11:55:22 DEBUG (MainThread) [custom_components.ble_monitor.sensor] Sensor device with uuid e2c56db5-dffb-48d2-b060-d0f5a71096e0 has the following settings. Name: E2C56DB5DFFB48D2B060D0F5A71096E0. Temperature unit: ..C. Decimals: 1. Use Median: True. Restore state: Tru
2022-02-17 11:55:22 DEBUG (MainThread) [custom_components.ble_monitor.sensor] Sensor device with uuid e2c56db5-dffb-48d2-b060-d0f5a71096e0 has the following settings. Name: E2C56DB5DFFB48D2B060D0F5A71096E0. Temperature unit: ..C. Decimals: 1. Use Median: True. Restore state: Tru
2022-02-17 11:55:22 DEBUG (MainThread) [custom_components.ble_monitor.sensor] Sensor device with uuid e2c56db5-dffb-48d2-b060-d0f5a71096e0 has the following settings. Name: E2C56DB5DFFB48D2B060D0F5A71096E0. Temperature unit: ..C. Decimals: 1. Use Median: True. Restore state: Tru
2022-02-17 11:55:22 DEBUG (MainThread) [custom_components.ble_monitor.sensor] Sensor device with uuid e2c56db5-dffb-48d2-b060-d0f5a71096e0 has the following settings. Name: E2C56DB5DFFB48D2B060D0F5A71096E0. Temperature unit: ..C. Decimals: 1. Use Median: True. Restore state: Tru

@Magalex2x14
Copy link
Collaborator

Hmm... If my memory serves me, @Ernst79 struggled with something like this. I'll try to take a look too.

@jum0n
Copy link
Author

jum0n commented Feb 17, 2022

I turned off "Restore state after a restart" and then bounced HA (also went back to "mean"). The spike did not happen. I will leave it like this for a while and see if that was the cause.

@Magalex2x14
Copy link
Collaborator

From what I can see, the sensor gives data in degrees Celsius.
Do you have imperial units enabled in your HA settings?
In the integration settings, in the device settings submenu, do you have Celsius degrees selected (by default)?

@Magalex2x14
Copy link
Collaborator

That's why I'm asking - #601. It looks like we have a double conversion when restoring the entity.

@jum0n
Copy link
Author

jum0n commented Feb 17, 2022

Do you have imperial units enabled in your HA settings?

Yes, it is shown on the menu here from when I setup initially.
setting

In the integration settings, in the device settings submenu, do you have Celsius degrees selected (by default)?

Yes, each device is set to Celsius. If I change it to Fahrenheit, the readings are incorrect.

@Magalex2x14
Copy link
Collaborator

OK, so settings are correct. I will look further.

@Thrilleratplay
Copy link
Collaborator

@jum0n @Magalex2x14 This happens with some of the Govee sensors. One of mine did this and it seems to have been corrected by keeping it out of direct sunlight.

@jum0n
Copy link
Author

jum0n commented Feb 17, 2022

@jum0n @Magalex2x14 This happens with some of the Govee sensors. One of mine did this and it seems to have been corrected by keeping it out of direct sunlight.

Mine are well protected, even inside. But it happens on all of them at the same time, so it is most likely not the sensors themselves but some other event. I think its linked to the HA restart and the C to F conversion/restoration activity.

If I had to map it out, I think that perhaps the restoration value is being stored in its "converted state" which is Fahrenheit for me. Then, when the restoration happens at reboot, the data is fed into the sequence as if the sensor provided it, its then converted to from Celsius to Fahrenheit (even though its already Fahrenheit) and that is the first reading until the sensor itself reports in with the correct temp.

I've turned the "Restore state after a restart" off and that fixes it for me but not for everyone that might have an issue with it in the future.

@Magalex2x14
Copy link
Collaborator

If I had to map it out, I think that perhaps the restoration value is being stored in its "converted state" which is Fahrenheit for me. Then, when the restoration happens at reboot, the data is fed into the sequence as if the sensor provided it, its then converted to from Celsius to Fahrenheit (even though its already Fahrenheit) and that is the first reading until the sensor itself reports in with the correct temp.

Yes, I am also thinking in this direction. The first thought was to simply check if the global HA setting matches the unit of measure of the sensor entity, and if not, then do the inverse conversion. After a little thought, I begin to come to the conclusion that the matter may be in the order of calling the first update of the entity, that is, improper handling of the recovery mechanism. It is possible that the update does not need to be requested at all, since it entails a unit conversion.

In any case, I need to reproduce the problem on a test system and check above assumption. It will take some time.

@Magalex2x14
Copy link
Collaborator

Magalex2x14 commented Feb 18, 2022

So... It seems like I found a bug. The root of evil was buried in the recovery procedure (BaseSensor.async_added_to_hass()):

try:
    self._attr_native_unit_of_measurement = old_state.unit_of_measurement
except AttributeError:
    pass

This shouldn't be done, because when restoring an entity, old_state.unit_of_measurement contains a unit already converted by the HA core (in our case, F, while the sensor sends C data - look at unit_of_measurement in the first line of log below). And self._attr_native_unit_of_measurement is already filled with entity _init_ method:

2022-02-18 22:11:33 WARNING (MainThread) [custom_components.ble_monitor.sensor] Oldstate <state sensor.ble_temperature_582d34310acb=67.3; state_class=measurement, sensor type=LYWSDCGQ, mac address=58:2D:34:31:0A:CB, median=19.6, mean=19.6, last mean of=1, rssi=-84, firmware=Xiaomi (MiBeacon V2), last packet id=138, battery_level=100, uuid=58:2D:34:31:0A:CB, unit_of_measurement=°F, device_class=temperature, friendly_name=ble temperature 582D34310ACB @ 2022-02-18T22:10:57.094435+03:00>
2022-02-18 22:11:33 WARNING (MainThread) [custom_components.ble_monitor.sensor] Native unit °C
2022-02-18 22:11:33 WARNING (MainThread) [custom_components.ble_monitor.sensor] State will be restored to 67.3

This log fragment is a result of the corrected code (the mentioned above sensor.py code lines are commented out, these lines of code resulted in a re-conversion as Fahrenheit fell into native units).

@Ernst79 please, look at this, maybe these lines was added for something, what i'm missing. But I think it's some kind of "old lost tail"...

@Ernst79
Copy link
Collaborator

Ernst79 commented Feb 18, 2022

I'm not sure why this was exactly added. The only thing I remember is that you sometimes can have these messages about changed unit of measurements, which messes with the statistics stuff. I also remember that we restore a lot of other attributes, as they won't be restored automatically, but that's probably only for the "non-standard" attributes.

I would propose that @jum0n comments out these lines in sensor.py and gives it a try. If it works, we include it in a next release.

@Magalex2x14
Copy link
Collaborator

Searching for commits doesn't give any results.
I'm sure everything will be fine 🤞 But let all the responsibility be on the topicstarter, yes 😁

@Ernst79
Copy link
Collaborator

Ernst79 commented Feb 19, 2022

Please give it a try with 7.8.1-beta

@Ernst79 Ernst79 added fixed in beta This issue is fixed in the current beta release bug Something isn't working labels Feb 19, 2022
@Ernst79
Copy link
Collaborator

Ernst79 commented Feb 20, 2022

Fix is not yet added in 7.8.2. Please test it first with 7.8.1-beta

@jum0n
Copy link
Author

jum0n commented Feb 22, 2022

If I've already gone to 7.8.2 do I just download this beta and over-write the ble_monitor folder under custom_components? And, if so, do I need to comment anything out once I do?

@Magalex2x14
Copy link
Collaborator

No, you don't need to do anything extra except install 7.8.1-beta (manually overwrite ble_monitor folder or use HACS redownload feature).

@Ernst79
Copy link
Collaborator

Ernst79 commented Feb 22, 2022

I have created a new beta with the same fix (7.9.0-beta). Please let us know if it fixes your issue.

@jum0n
Copy link
Author

jum0n commented Feb 22, 2022

I enabled the "restore" option, installed 7.9.0-beta, restarted HA Core and the spike appeared. Does 7.9.0-beta already have to be "running" for the fix or should it handle it as it starts? I then restarted a second time and the issue did not happen. I will continue to monitor a few more restarts to see if it stays that way, but so far it seems fixed after 7.9.0-beta is already running.

@Magalex2x14
Copy link
Collaborator

Magalex2x14 commented Feb 22, 2022

Yes, the first restart (after the fix is installed) will still produce a spike, since the entity saved for recovery still has Fahrenheit as native units. On subsequent restarts, there will already be actual units (celsius in your case), so there will be no more double conversion spikes.

@jum0n
Copy link
Author

jum0n commented Feb 22, 2022

Ok, I may have reported in too quickly. After looking at the data again, all 3 restarts I have the spike. Restarts 2 and 3 have the beta running.
Capture1

@Ernst79 Ernst79 removed the fixed in beta This issue is fixed in the current beta release label Feb 22, 2022
@Magalex2x14
Copy link
Collaborator

Hmm. Can you try, please, to disable restore option, restart, and enable restore again?

Tomorrow I will try to reproduce this again and will make for you extended debugging probably.

@jum0n
Copy link
Author

jum0n commented Feb 22, 2022

Ok, I've completed this and the spikes still show up after enabling. You can see the blank gap in the chart with no spike after it where it was disabled, then another data gap and spikes when enabled again.

image

@Magalex2x14
Copy link
Collaborator

I must say, I'm at a loss, because it is quite obvious that the problem has not gone away. However, I am absolutely sure that I saw a positive result. Now I'm not sure anymore, but I think I just used an old version of HA. Shit happens and we have to move on.

OK. Here's what I was able to find:

Apparently, there were some changes in the second half of last year. A more intelligible mechanism for working in various measurement systems has appeared https://developers.home-assistant.io/docs/core/entity/sensor/
Changes with the advent of native units of measure and native value instead of state appeared on August 23, 2021.
In addition, a curious text appeared in the documentation on February 9 this year.

image

At the moment, it looks like the problem with double conversion is that we are restoring the state and returning it as a native value, which is not correct:

    @property
    def native_value(self):
        """Return the state of the sensor."""
        return self._state

So far I don't have a complete solution. We need to dig further...

@jum0n
Copy link
Author

jum0n commented Feb 24, 2022

I'm happy to help however I can. For me, I'm satisfied that I can just turn off the restore feature and it no longer affects me but I'd guess someone out there might have a use for that.

@Magalex2x14
Copy link
Collaborator

Magalex2x14 commented Feb 24, 2022

Another observation - the code that we removed earlier never runs for me :)
Because old_state does not contain the unit_of_measurement attribute... That is, the following code:

        old_state = await self.async_get_last_state()
        _LOGGER.warning("Old state %s", old_state)
        if not old_state:
            self.ready_for_update = True
            return

        try:
            _LOGGER.warning("Native unit will be set to %s", old_state.unit_of_measurement)
            self._attr_native_unit_of_measurement = old_state.unit_of_measurement
        except AttributeError:
            pass

        try:
            _LOGGER.warning("Native unit %s", self._attr_native_unit_of_measurement)
        except AttributeError:
            pass

        self._state = old_state.state

Outputs only this:

2022-02-24 03:39:54 WARNING (MainThread) [custom_components.ble_monitor.sensor] Old state <state sensor.ble_temperature_a4c138d16a8d=unknown; state_class=measurement, sensor_type=LYWSD03MMC, mac_address=A4:C1:38:D1:6A:8D, unit_of_measurement=°F, device_class=temperature, friendly_name=ble temperature A4C138D16A8D @ 2022-02-24T03:19:08.439030+03:00>
2022-02-24 03:39:54 WARNING (MainThread) [custom_components.ble_monitor.sensor] Native unit °C

@Magalex2x14
Copy link
Collaborator

Magalex2x14 commented Feb 24, 2022

So, with the current entity code, the problem is solved.
Working code in the Fix-temperature-unit-restore branch.
Works like this:

  • initialization of the temperature sensor entity sets the correct native unit of measurement from the device settings
  • when restoring, since we are restoring the state and not the native_value (to which we do not have access), the native unit of measure is set to the unit obtained from the old state, that is, the wrong one, converted
  • first entity update is called with "wrong" native unit of measurement, so there is no conversion and no spikes
  • at the stage of data collection, the native unit of measure is restored to the correct one from the device settings
  • subsequent entity update is called with correct native unit of measurement

@Ernst79 , look at this to be aware.
The code looks working, but it looks like we need to bring the entity code in line with the current documentation (extend RestoreSensor class instead of RestoreEntity and use self.async_get_last_sensor_data() instead of self.async_get_last_state(), but then we need to think about the problem of backward compatibility with older HA versions... Or not? How do you think?

@Ernst79
Copy link
Collaborator

Ernst79 commented Feb 24, 2022

Ill have a look tonight. About backwards compatibility, my impression is that we don’t have to keep backward compatibility for very old versions. If one decides to use an old version of HA, he can always decide to use an old version of BLE monitor.

You can set the minimum compatible HA version here, HACS won’t show an update (I think ) if your HA isn’t compatible.

https://github.com/custom-components/ble_monitor/blob/master/hacs.json

@Ernst79
Copy link
Collaborator

Ernst79 commented Feb 24, 2022

@Magalex2x14 I was thinking that it is better to not set temperature unit in the collect part, as it will run this on every advertisement. Instead, we can just check it in the init, if the old temperature unit is not the same as the device setting, we just convert the value of old_state.state to the unit of measurement of the device setting. In that way, we only have to run this part of the code once and we can remove it from the collect function.

Note: I didn't convert the "old compatibility code", I added a warning that it will be deprecated soon.

        try:
            self._attr_native_unit_of_measurement = old_state.unit_of_measurement
            _LOGGER.warning("Future updates of BLE monitor will lose compatibility with this version of Home Assistant. Please update Home Assistant.")
        except AttributeError:
            pass

        self.convert_temperature = False
        try:
            if old_state.attributes["unit_of_measurement"] in [TEMP_CELSIUS, TEMP_FAHRENHEIT]:
                self._attr_native_unit_of_measurement == self._device_settings["temperature unit"]
                # Check if the temperature unit in the registry is the same as the device setting
                if old_state.attributes["unit_of_measurement"] == self._device_settings["temperature unit"]:
                    self.convert_temperature = False
                else:
                    self.convert_temperature = True
            else:
                self._attr_native_unit_of_measurement = old_state.attributes["unit_of_measurement"]
        except AttributeError:
            pass
        except IndexError:
            pass
        
        # convert temperature is device setting is not the same as the temperature unit in the registry
        if self.convert_temperature:
            if old_state.attributes["unit_of_measurement"] == TEMP_CELSIUS:
                try:
                    self._state = float(old_state.state) * 9 / 5 + 32
                    _LOGGER.debug("old state is in Celcius, but device temperature unit is set to Fahrenheit. Convert °C to F")
                except ValueError:
                    self._state = old_state.state
            else:
                try:
                    self._state = (float(old_state.state) - 32) * 5 / 9
                    _LOGGER.debug("old state is in Fahrenheit, but device temperature unit is set to Celcius. Convert F to °C")
                except ValueError:
                    self._state = old_state.state
        else:
            self._state = old_state.state

@Magalex2x14
Copy link
Collaborator

Yes, I had about the same idea before. Maybe you're right, and it's not that bad, but for some reason I don't like it... On the other hand, in any case, in the near future, there will be a change in the entity classes used, and we are good at restoring and updating only native_unit_of_measurement and native_value, and not so much matters is what the current temporary solution will be.

@Ernst79
Copy link
Collaborator

Ernst79 commented Feb 25, 2022

I managed to make it a lot shorter, by using HA's homeassistant.util.temperature function convert.

It now works as follows.

  • if the old unit of measurement is a temperature unit, it sets the unit of measurement to the device setting and it converts the temperature value from the old unit of measurement to the device setting. It doesn't matter if it is the same or not, it will just convert it. (so if both are the same (e.g. Celsius), the convert function will return the same value).
  • if it isn't a temperature unit, it will work as before
        # Restore the old state and unit of measurement
        try:
            if old_state.attributes["unit_of_measurement"] in [TEMP_CELSIUS, TEMP_FAHRENHEIT]:
                # Convert old state temperature to a temperature in the device setting temperature unit
                self._attr_native_unit_of_measurement = self._device_settings["temperature unit"]
                self._state = convert_temp(
                    float(old_state.state),
                    old_state.attributes["unit_of_measurement"],
                    self._device_settings["temperature unit"]
                )
            else:
                self._attr_native_unit_of_measurement = old_state.attributes["unit_of_measurement"]
                self._state = old_state.state
        except (KeyError, ValueError):
            self._state = old_state.state

I'll create a beta to test.

@Ernst79
Copy link
Collaborator

Ernst79 commented Feb 25, 2022

Released in 7.9.1-beta 7.9.2-beta.

@jum0n Could you try it and let us know if this version works.

@Ernst79 Ernst79 added the fixed in beta This issue is fixed in the current beta release label Feb 25, 2022
@jum0n
Copy link
Author

jum0n commented Feb 26, 2022

I updated to 7.9.2 and enabled the "restore" function. I then restarted Core and, perhaps as expected, saw the spike in data. I let it gather a few more data points then restarted again and did NOT see the spike. After a small wait, I checked that the "restore" function was still on, it was, and rebooted one more time and still have NOT seen the spike. I'd call it fixed at this point but I will definitely follow up if I see anything happen over the next few days. Thanks @Ernst79 and @Magalex2x14 !!
Capture1

@Ernst79
Copy link
Collaborator

Ernst79 commented Feb 26, 2022

That first spike is because you had not restarted core yet. It had still the old version in memory

@Ernst79 Ernst79 removed the fixed in beta This issue is fixed in the current beta release label Feb 26, 2022
@Ernst79
Copy link
Collaborator

Ernst79 commented Feb 26, 2022

Released as final in 7.9.2

@Ernst79 Ernst79 closed this as completed Feb 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants