Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(Qubino ZMNHYD1 .. and also others) crash on first time SetValue(bool) after include #1981

Closed
gizmocuz opened this issue Oct 18, 2019 · 32 comments

Comments

@gizmocuz
Copy link
Contributor

I have received some reports about OZW crashing the first time when a (boolean) value is set after a node got included.
This time i can replicate this with a Qubino ZMNHYD1)

It does not matter if it is included secure or not.

I also found where it goes wrong:

bool SwitchBinary::SetState(uint8 const _instance, bool const _state)

And it goes wrong here:

				if (GetVersion() >= 2)
				{
					Internal::VC::ValueByte* durationValue = static_cast<Internal::VC::ValueByte*>(GetValue(_instance, ValueID_Index_SwitchBinary::Duration));

At this point durationValue = NULL, and is used right below:

uint8 duration = durationValue->GetValue();

I have no idea why this happens, but I would propose a fix like this:

					uint8 duration = 0xff;

					Internal::VC::ValueByte* durationValue = static_cast<Internal::VC::ValueByte*>(GetValue(_instance, ValueID_Index_SwitchBinary::Duration));
					if (durationValue != nullptr)
					{
						uint8 duration = durationValue->GetValue();
						durationValue->Release();
						if (duration == 0xff)
						{
							Log::Write(LogLevel_Info, GetNodeId(), "  Duration: Default");
						}
						else if (duration >= 0x80)
						{
							Log::Write(LogLevel_Info, GetNodeId(), "  Duration: %d minutes", duration - 0x7f);
						}
						else
						{
							Log::Write(LogLevel_Info, GetNodeId(), "  Duration: %d seconds", duration);
						}
					}

This way we always use the default duration (0xff) when we can not retrieve the (durationValue)

Hope that's OK, i will create a PR for this. (It is also good to check against NULL pointer results)

@Fishwaldo
Copy link
Member

Known issue. This is caused by the interview order that I’ve started working on in Dev branch. No need for a PR.

@gizmocuz
Copy link
Contributor Author

@Fishwaldo , sorry, I just noticed your response.
Please feel free to delete/close/merge, but checking against a nullptr is always good right ?

@petergebruers
Copy link
Collaborator

@gizmocuz it affects more devices and CCs and the problem is more fundamental, that is why your patch is good but I am not inclined to merge it because it works around the problem, it does not fundamentally fix the cause.

The problem only happens right after interview (inclusion, refresh node, delete cache). As a workaround, you should be able to run an unmodified version of Domoticz/OZW, but do not use the device right after inclusion, refresh node or delete cache, instead reboot. BTW it is related to the version of the command class (and interview order).

@gizmocuz
Copy link
Contributor Author

Maybe I could make a PR to check all return pointers against nullptr ? (And of course make a Log mesage if this happens)

@petergebruers
Copy link
Collaborator

Granted, this bug is very annoying when you try to debug another issue because we"ll ask "delete your cache or refresh node info" then the user will report "oh no, it segv's" but for another reason...

@petergebruers
Copy link
Collaborator

BTW the issue isn't new indeed and we did a previous attempt at fixing it, but due to a failed merge that introduced a bug. Then the merge was reverted!

I tried to warn people that their "ozwcache" would get deleted, which is still true if they come from an "old" domoticz version.

That warning on your side is this issue:

"Warning! Domoticz 4.10988 and higher incorporate an important OpenZWave update and re-interviews devices"

domoticz/domoticz#3530

@gizmocuz
Copy link
Contributor Author

Do you want me to make a general PR that checks against all return pointer and add a Log message ?

@Fishwaldo
Copy link
Member

image

Fishwaldo added a commit that referenced this issue Feb 1, 2020
@Fishwaldo
Copy link
Member

Fishwaldo commented Feb 1, 2020

  • Version CommandClass should only be on Root Endpoint
  • Basic CC is creating ValueID's too Early.
  • QueryStage_ManufacturerSpecific2 can be dropped
  • Audit Wakeup CC to check if we are storing values collected during init
  • A bunch of CC's are calling CreateVars internally.
  • Audit StaticRequest etc calls
  • Type_EssentialNodeQueriesComplete - Check its being called correctly and always :)

@Fishwaldo
Copy link
Member

Just pending some proper testing with Secure Devices...

Fishwaldo added a commit that referenced this issue Feb 7, 2020
Fishwaldo added a commit that referenced this issue Feb 7, 2020
* Start work on Redoing Interview Order - Issue #1981 - Not Testing with Security or Sleeping Devices

* Flag Some CC's that don't support MultiChannel

* Fix Loading From Cache. We need NoOp and ManufacturerSpecific CC's loaded when Node is Created for ReadXML Calls from Cache

* Fix up CreateVars on Basic CC (Issue #1981)

* Fix up Wakeup CC so it stores the Interval retrieved during initialization (Issue #1981)

* Tidy up Wakeup CC a bit

* Update Logging in some CC's to ensure we write something when processing a message from a device

* Fix up the CreateVars call in ThermostateMode CC

* Some Tidy Ups and Fixes for #2085

* Testing Secured Devices... #1981
@Fishwaldo
Copy link
Member

Fishwaldo commented Feb 7, 2020

I've just pushed the fix to the master branch. I did as much testing as possible, but despite this, as its such a large and involved change, I expect roadbumps. Please feedback on this Issue if you encounter anything.
@robertsLando @gizmocuz @petergebruers @marcelveldt @cgarwood @balloob

@robertsLando
Copy link
Member

I could release a new version of z2m to docker using this? In any case if something isn't working I can revert to previous zwave version

@Fishwaldo
Copy link
Member

@robertsLando Fingers crossed its ok. Its fixing a crash, but it had to rework the interview process for devices, so adding new devices etc might be a bit bumpy if there are bugs :)

@robertsLando
Copy link
Member

mmmmm, will ask users to test it for a week so and then decide :)

@robertsLando
Copy link
Member

@Fishwaldo I have about 4 users that are testing. Will keep you updated

@petergebruers
Copy link
Collaborator

HASS with native integration, upgraded to 1.6-1028-g5904a725 (aka the interview order fix)... Note that this is an unsupported setup but it used to work for me.

Production system with 13 devices, mostly old stuff, quick test seems OK except thermostat. Clean start without ozwcache*

No "CC V2" Dimmer on that network, so not really testing the purpose of this fix.

"collateral damage" to Old qubino thermostat, OZW accepts setpoint, but has issues with mode setting.

2020-02-07 16:30:42.322 Warning, Invalid Index Set on ValueList: vector::_M_range_check: __n (which is 0) >= this->size() (which is 0)
2020-02-07 16:30:42.322 Warning, ValueList returned a NULL value for GetValueListSelection: Mode

I might investigate this tomorrow.

@petergebruers
Copy link
Collaborator

I cannot work on this now but I was able to do a quick test with ozwcp and the "Mode" pull down menu is empty (explains the 0 in the logs...)

<CommandClass id="64" name="COMMAND_CLASS_THERMOSTAT_MODE">
	<Compatibility />
	<State>
		<CCVersion>2</CCVersion>
		<InNif>true</InNif>
		<StaticRequests>3</StaticRequests>
	</State>
	<Instance index="1" />
	<Value type="list" genre="user" instance="1" index="0" label="Mode" units="" read_only="false" write_only="false" verify_changes="false" poll_intensity="0" min="0" max="0" vindex="0" size="1">
		<Help>Set the Thermostat Mode</Help>
	</Value>
	<SupportedModes>
		<Mode index="0" label="Off" />
		<Mode index="3" label="Auto" />
	</SupportedModes>
</CommandClass>

Mmm... @Fishwaldo min="0" max="0" that is a bit suspicious?
The Off/Auto is probably correct, this is a ZMNHIA2 Flush on/off thermostat

Sorry, have to go now...

@Fishwaldo
Copy link
Member

ThermostatFanMode was one CC that had some ugly hacks. I had to change it but can’t test as I don’t own a thermostat.

Looks like we get the nodes just we don’t assign them correctly (or in time?) to the Value.

@Fishwaldo
Copy link
Member

Just pushed a small change to master to increase logging around Secure Inclusion. @robertsLando if OpenZWave/Zwave2Mqtt#245 turns out to be on the very latest version, can you redo your image with the latest so I can get some additional debugging?

@petergebruers
Copy link
Collaborator

Just pushed a small change to master to increase logging around Secure Inclusion.

As I've noted in OpenZWave/Zwave2Mqtt#245 the oldest model of the Fibaro Wall Plug very likely does not support security.

@petergebruers
Copy link
Collaborator

According to an old Zniffer log, the first generation FGWP does not list CC security... So we need to know the exact model...

@Fishwaldo
Copy link
Member

@petergebruers glad your the Fibaro expert. I was almost going to stay up and debug this. :) haha

@petergebruers
Copy link
Collaborator

@Fishwaldo

ThermostatFanMode was one CC that had some ugly hacks. I had to change it but can’t test as I don’t own a thermostat.

Looks like we get the nodes just we don’t assign them correctly (or in time?) to the Value.

It is COMMAND_CLASS_THERMOSTAT_MODE.

I'll try to get some more information tomorrow.

It is my only Z-Wave thermostat and if it does not work then it will get replaced by something completely different. But right now I seem to be able to set the temperature, that is good enough.

@Fishwaldo
Copy link
Member

It is COMMAND_CLASS_THERMOSTAT_MODE.

Thats what I meant.... sorry, it was like 3am when I wrote that :)

I think I see whats happening... CreateVars is being called at the same time as QueryStage_Static is started. We don't get the response to ThermostatModeCmd_SupportedReport before the Var is created... Could affect other CC's as well.

Can you try this patch for me?
CreateVars.txt

@Fishwaldo
Copy link
Member

Actually - Forget that patch. It will break other things...

thinking...

@Fishwaldo
Copy link
Member

ThermostatMode.txt

This - A brief audit of other CC, this should be the right way.

@petergebruers
Copy link
Collaborator

This - A brief audit of other CC, this should be the right way.

Thanks. Yes, that patch restores "mode" selection of my thermostat in ozwcp. HASS looks OK too. (After a refresh node of course).

<CommandClass id="64" name="COMMAND_CLASS_THERMOSTAT_MODE">
	<Compatibility />
	<State>
		<CCVersion>2</CCVersion>
		<InNif>true</InNif>
		<StaticRequests>3</StaticRequests>
	</State>
	<Instance index="1" />
	<Value type="list" genre="user" instance="1" index="0" label="Mode" units="" read_only="false" write_only="false" verify_changes="false" poll_intensity="0" min="0" max="0" vindex="1" size="1">
		<Help>Set the Thermostat Mode</Help>
		<Item label="Off" value="0" />
		<Item label="Auto" value="3" />
	</Value>
	<SupportedModes>
		<Mode index="0" label="Off" />
		<Mode index="3" label="Auto" />
	</SupportedModes>
</CommandClass>

That's better. That max="0" is ok? Never looked into that...

@GravityRZ
Copy link

could it be that the change in interview changed things regarding the fgwpf device

see this issue
#2152

i think it is related since the fgwpf-102 ZW5 device is not rebuild as it should in the cache thus causing problems with groups

@Fishwaldo
Copy link
Member

@GravityRZ not related.

@gameofyou
Copy link

gameofyou commented Mar 23, 2020

I have had this problem on of my Fibaro Zwave switches as well, and came up with a similar fix to above.

 diff --git a/cpp/src/command_classes/SwitchMultilevel.cpp b/cpp/src/command_classes/SwitchMultilevel.cpp
 index b02ef5c7..04a8926e 100644
 --- a/cpp/src/command_classes/SwitchMultilevel.cpp
 +++ b/cpp/src/command_classes/SwitchMultilevel.cpp
 @@ -398,8 +398,12 @@ namespace OpenZWave
                                 if (GetVersion() >= 2)
                                 {
                                         Internal::VC::ValueByte* durationValue = static_cast<Internal::VC::ValueByte*>(GetValue(_instance, ValueID_Index_SwitchMultiLevel::Duration));
 -                                       uint8 duration = durationValue->GetValue();
 -                                       durationValue->Release();
 +                                       uint8 duration = 0xff;
 +                                       if (durationValue)
 +                                       {
 +                                               duration = durationValue->GetValue();
 +                                               durationValue->Release();
 +                                       }
                                         if (duration == 0xff)
                                         {
                                                 Log::Write(LogLevel_Info, GetNodeId(), "  Duration: Default");

Sadly gizmocuz's code contains a bug, he redefines the variable inside the braces, so a value other than default is never used. I know this problem is due to the nodes not being fully initialized however this bug has been around for months and has been breaking my domoticz for some time now. Could a temp fix be push to git?

@Fishwaldo
Copy link
Member

This was fixed by the Interview Reordering we implemented already.

domoticz should pull the latest version of OZW

@gizmocuz
Copy link
Contributor Author

@gameofyou , which was already implemented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants