LIFX : fix #2191, #2193 and other improvements #2201
Conversation
@kgoderis thanks for the quick response! I'll put the swarm to work with your updated code and report back on how well it all works. With 4 retries I already know that a bulb only very occasionally goes offline. #2191 does not occur frequently. I'll try to reproduce it by adding/removing the bulbs a lot of times. I think when your code works with my setup it should work with everyone's. I'm still running everything on my fast development PC instead of a less capable Raspberry Pi or the like, so this setup is a concurrency nightmare for your code. ;-) |
@kgoderis I still ran into a deadlock. I'll add some review comments where I see room for improvement. This is the thread dump I now got of the deadlock
|
@@ -25,8 +27,8 @@
     public final static long PACKET_INTERVAL = 50;

-    private static ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<String, ReentrantLock>();
-    private static ConcurrentHashMap<String, Long> timestamps = new ConcurrentHashMap<String, Long>();
+    private static Map<String, ReentrantLock> locks = Collections.synchronizedMap(new HashMap<String, ReentrantLock>());
This data structure still does not solve the issue that as the Map
grows it starts returning the locks in a different order when values()
is called. All threads should really acquire the locks in the same order to get rid of the deadlock.
You can clearly see it happen with a simple test that puts Strings as map values. It even kicks in earlier than I thought. Especially when there are a lot of bulbs, the order starts changing a lot each time a new lock is added:
Map<String, String> keyValueMap = Collections.synchronizedMap(new HashMap<String, String>());
@Test
public void valuesTest()
{
String values = "";
for (int i = 0; i < 40; i++)
{
keyValueMap.put("key" + i, "value" + i);
values = StringUtils.join(keyValueMap.values(), ",");
System.out.println(values);
}
}
The output of the test is then:
value0
value1,value0
value1,value2,value0
value1,value2,value0,value3
value1,value2,value0,value3,value4
value1,value2,value0,value5,value3,value4
value1,value2,value0,value5,value6,value3,value4
value1,value2,value0,value5,value6,value3,value4,value7
value1,value2,value0,value5,value6,value3,value4,value7,value8
value1,value2,value0,value5,value6,value3,value4,value9,value7,value8
value1,value2,value0,value5,value6,value3,value4,value9,value7,value8,value10
value1,value2,value0,value5,value6,value3,value4,value9,value7,value8,value11,value10
value1,value2,value0,value5,value6,value3,value4,value9,value7,value8,value11,value10,value12
value1,value2,value0,value5,value6,value3,value4,value9,value7,value8,value11,value10,value13,value12
value1,value2,value0,value5,value6,value3,value4,value9,value7,value8,value11,value10,value14,value13,value12
value1,value2,value0,value5,value6,value3,value4,value9,value7,value8,value11,value10,value15,value14,value13,value12
value1,value2,value0,value5,value6,value3,value4,value9,value7,value8,value11,value10,value15,value14,value13,value12,value16
value1,value2,value0,value5,value6,value3,value4,value9,value7,value8,value11,value10,value15,value14,value13,value12,value17,value16
value1,value2,value0,value5,value6,value3,value4,value9,value7,value8,value11,value10,value15,value14,value13,value12,value18,value17,value16
value1,value2,value0,value5,value6,value3,value4,value9,value7,value8,value11,value10,value15,value14,value13,value12,value19,value18,value17,value16
value1,value2,value0,value5,value6,value3,value4,value9,value7,value8,value11,value10,value20,value15,value14,value13,value12,value19,value18,value17,value16
value1,value2,value0,value5,value6,value3,value4,value9,value7,value8,value11,value10,value21,value20,value15,value14,value13,value12,value19,value18,value17,value16
value1,value2,value0,value5,value6,value3,value4,value9,value7,value8,value11,value10,value22,value21,value20,value15,value14,value13,value12,value19,value18,value17,value16
value1,value2,value0,value5,value6,value3,value4,value9,value7,value8,value11,value10,value22,value21,value20,value15,value14,value13,value12,value19,value18,value17,value16,value23
value1,value2,value0,value5,value6,value3,value4,value9,value7,value8,value11,value10,value22,value21,value20,value15,value14,value13,value12,value19,value18,value17,value16,value24,value23
value1,value2,value0,value5,value6,value3,value4,value9,value7,value8,value11,value10,value22,value21,value20,value15,value14,value13,value12,value19,value18,value17,value16,value25,value24,value23
value1,value2,value0,value5,value6,value3,value4,value9,value7,value8,value11,value10,value22,value21,value20,value15,value14,value13,value12,value19,value18,value17,value16,value26,value25,value24,value23
value1,value2,value0,value5,value6,value3,value4,value9,value7,value8,value11,value10,value22,value21,value20,value15,value14,value13,value12,value19,value18,value17,value16,value26,value25,value24,value23,value27
value1,value2,value0,value5,value6,value3,value4,value9,value7,value8,value11,value10,value22,value21,value20,value15,value14,value13,value12,value19,value18,value17,value16,value26,value25,value24,value23,value28,value27
value1,value2,value0,value5,value6,value3,value4,value9,value7,value8,value11,value10,value22,value21,value20,value15,value14,value13,value12,value19,value18,value17,value16,value26,value25,value24,value23,value29,value28,value27
value1,value2,value0,value5,value6,value3,value4,value9,value7,value8,value11,value10,value22,value21,value20,value15,value14,value13,value12,value19,value18,value17,value16,value30,value26,value25,value24,value23,value29,value28,value27
value1,value2,value0,value5,value6,value3,value4,value9,value7,value8,value11,value10,value22,value21,value20,value15,value14,value13,value12,value19,value18,value17,value16,value31,value30,value26,value25,value24,value23,value29,value28,value27
value1,value2,value0,value5,value6,value3,value4,value9,value7,value8,value11,value10,value22,value21,value20,value15,value14,value13,value12,value19,value18,value17,value16,value32,value31,value30,value26,value25,value24,value23,value29,value28,value27
value1,value2,value0,value5,value6,value3,value4,value9,value7,value8,value11,value10,value22,value21,value20,value15,value14,value13,value12,value19,value18,value17,value16,value33,value32,value31,value30,value26,value25,value24,value23,value29,value28,value27
value1,value2,value0,value5,value6,value3,value4,value9,value7,value8,value34,value11,value10,value22,value21,value20,value15,value14,value13,value12,value19,value18,value17,value16,value33,value32,value31,value30,value26,value25,value24,value23,value29,value28,value27
value1,value2,value0,value5,value6,value3,value4,value9,value7,value8,value35,value34,value11,value10,value22,value21,value20,value15,value14,value13,value12,value19,value18,value17,value16,value33,value32,value31,value30,value26,value25,value24,value23,value29,value28,value27
value1,value2,value0,value5,value6,value3,value4,value9,value7,value8,value36,value35,value34,value11,value10,value22,value21,value20,value15,value14,value13,value12,value19,value18,value17,value16,value33,value32,value31,value30,value26,value25,value24,value23,value29,value28,value27
value1,value2,value0,value5,value6,value3,value4,value9,value7,value8,value37,value36,value35,value34,value11,value10,value22,value21,value20,value15,value14,value13,value12,value19,value18,value17,value16,value33,value32,value31,value30,value26,value25,value24,value23,value29,value28,value27
value1,value2,value0,value5,value6,value3,value4,value9,value7,value8,value37,value36,value35,value34,value38,value11,value10,value22,value21,value20,value15,value14,value13,value12,value19,value18,value17,value16,value33,value32,value31,value30,value26,value25,value24,value23,value29,value28,value27
value1,value2,value0,value5,value6,value3,value4,value9,value7,value8,value37,value36,value35,value34,value39,value38,value11,value10,value22,value21,value20,value15,value14,value13,value12,value19,value18,value17,value16,value33,value32,value31,value30,value26,value25,value24,value23,value29,value28,value27
The Javadoc of synchronizedMap(..) suggests it might be a good idea to synchronize on the Map itself when iterating over it. I might give that a try tomorrow and see what happens.
* It is imperative that the user manually synchronize on the returned
* map when iterating over any of its collection views:
* <pre>
* Map m = Collections.synchronizedMap(new HashMap());
* ...
* Set s = m.keySet(); // Needn't be in synchronized block
* ...
* synchronized (m) { // Synchronizing on m, not s!
* Iterator i = s.iterator(); // Must be in synchronized block
* while (i.hasNext())
* foo(i.next());
* }
* </pre>
* Failure to follow this advice may result in non-deterministic behavior.
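For illustration, a rough sketch of what locking in a fixed order could look like (the lockAll() name and body are made up, not the binding's actual code): take a snapshot of the key set while holding the map's monitor, sort it, and acquire the locks in that sorted order:

// Hypothetical sketch: every thread that needs all locks acquires them
// in the same (sorted) key order, so circular waits cannot occur.
public static void lockAll() {
    List<String> keys;
    synchronized (locks) { // per the synchronizedMap Javadoc above
        keys = new ArrayList<String>(locks.keySet());
    }
    Collections.sort(keys); // fixed, global ordering
    for (String key : keys) {
        ReentrantLock lock = locks.get(key);
        if (lock != null) {
            lock.lock();
        }
    }
}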
-    private static ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<String, ReentrantLock>();
-    private static ConcurrentHashMap<String, Long> timestamps = new ConcurrentHashMap<String, Long>();
+    private static Map<String, ReentrantLock> locks = Collections.synchronizedMap(new HashMap<String, ReentrantLock>());
+    private static Map<String, Long> timestamps = Collections.synchronizedMap(new HashMap<String, Long>());

     public static void lock(String key) {
         if (!locks.containsKey(key)) {
I don't think it is thread safe to first inspect the map and then add the key in the next statement. Most of the time it will work though. ;-) But in theory two threads may inspect the map, conclude that the key is not there, and then one thread would overwrite the other's value.
OK, but I now see it also gets the key again from the map in the next statement, so it should be less of an issue.
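For completeness, a minimal sketch of the atomic variant I have in mind (assuming the map stays a ConcurrentHashMap; this is illustrative, not the binding's actual code):

// putIfAbsent() inserts the new lock only when no lock exists for the key yet;
// the following get() then always returns the single winning instance, so two
// racing threads can no longer overwrite each other's lock.
public static void lock(String key) {
    locks.putIfAbsent(key, new ReentrantLock());
    locks.get(key).lock();
}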
mmh... my understanding was that synchronizedMap() does lock on both read and write operations, in contrast to the ConcurrentHashMap. I fail to see why there is a deadlock in the first place. Does the order of the .values() really matter? So, the deadlock occurs when a thread calls .lock(somestring), and then another thread does a .lock() on all locks (e.g. broadcast). Imagine that order:
What are we missing here? Putting a synchronized() on the map itself is a solution, but I am wondering what performance impact that could have on a large number of bulbs |
@kgoderis I refactored the Also I've introduced an additional The bulbs do stay online much better again with the 4 retries and there are only occasionally messages about bulbs that are missing in action. 👍 |
I too found some documentation on ConcurrentHashMap vs synchronizedMap performance; it should be about a factor 4 difference. Ok - just do a PR on my repo so that I can integrate it all in a clean PR on the main repo |
I think the root cause of the deadlock is that the
|
I also think your |
Yeah, I know, but that is on a single serial port, and normally there is no more than 1 thread accessing the throttler. |
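For context, this is roughly how I understand the throttler to work (a sketch built around the PACKET_INTERVAL and timestamps fields shown in the diff above; the throttle() method and its exact logic are assumptions, not the binding's code):

// Sketch: delay the current thread until at least PACKET_INTERVAL ms have
// passed since the last packet sent for this key, then record the new send time.
private static void throttle(String key) throws InterruptedException {
    Long last = timestamps.get(key);
    if (last != null) {
        long elapsed = System.currentTimeMillis() - last;
        if (elapsed < PACKET_INTERVAL) {
            Thread.sleep(PACKET_INTERVAL - elapsed);
        }
    }
    timestamps.put(key, System.currentTimeMillis());
}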
@wborn Any further observations ? |
The deadlock no longer occurred after restarting OH dozens of times with my refactored version of the There is still a
It is easy to reproduce. Just disconnect power from a bulb. Wait for it to go offline. Then restore power to the bulb. When it goes online the exception occurs. Also I wonder why in I still see some |
No particular reason but you are right ;-) |
Wouter, I was unable to reproduce this. Apart from that, given that a lock is acquired in the run(), I am wondering how that CME is possible in the first place. Did you investigate it further? |
I have fewer of these since we fixed the error you found. Now, the sequence number is between 1 and 255, and when a lot of messages are sent to a bulb, the sequence number runs over 255 and starts again at 1. If there are previously sent messages still stuck in the sent queue (because there was no answer to them due to whatever problem), then the warning is produced. To overcome this annoying message/situation, we could opt for a mechanism whereby we use the timestamp of the sent packets in the sent queue (a stamp is created when the class instance is instantiated), and retire these packets after a given amount of time so that the queue is cleared over time. For example, do not keep sent packets older than 60 seconds? |
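Something along these lines, as a rough sketch (PendingPacket, sentQueue and getTimestamp() are made-up names for whatever holds the sent packets and their creation stamp; this is not the binding's actual code):

// Rough sketch: drop packets from the sent queue once they are older than
// MAX_PACKET_AGE, so a wrapped-around sequence number can no longer match a
// long-forgotten packet. Assumes the queue's iterator supports removal.
private static final long MAX_PACKET_AGE = 60 * 1000; // 60 seconds

private void expireOldPackets(Queue<PendingPacket> sentQueue) {
    long now = System.currentTimeMillis();
    Iterator<PendingPacket> it = sentQueue.iterator();
    while (it.hasNext()) {
        if (now - it.next().getTimestamp() > MAX_PACKET_AGE) {
            it.remove();
        }
    }
}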
I have not yet investigated this issue but I can give it a try. I did have a look at:
After tweaking the logging and filtering requests/responses per bulb, my conclusion is that these are not false positives but indeed packets that never got a response.
I think we should address the issue of packets that get lost in another (new) issue. For developers working on the binding it is useful information to know that there were indeed packets that got lost. If we were to do something about it in this issue, I think changing the log level to debug would be the simplest short-term solution for now. End users will most of the time just be nagged by the warning. Also, it really depends on how bad it is that a packet got lost. Most of the packets that are sent to bulbs just continuously query their current state (GetRequest/GetServiceRequest/GetLightPowerRequest). When there is no response there will be another request, several seconds later, that will most likely get a response. If not, then the bulb will most likely be offline and the end user will see the bulb go from When a packet gets lost that is sent via Even with So I think we are done with issues #2191 and #2193 when the |
OK, I again had some trouble reproducing it consistently. So I reprogrammed the binding to automatically throw a bulb offline each time. This increased the chance the
This is actually a method called from the iteration over the Now I just need to come up with a solution. |
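The usual pattern to avoid that (a hedged sketch of the general technique, not necessarily the fix that ends up in the PR) is to iterate over a snapshot, so the original collection can be modified while the loop runs:

// Sketch: copy the collection before iterating; a concurrent add/remove on the
// original no longer throws a ConcurrentModificationException for this loop.
// 'pendingPackets' and 'handle' are illustrative names.
for (Packet packet : new ArrayList<Packet>(pendingPackets)) {
    handle(packet); // may itself add or remove entries on pendingPackets
}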
Karel, I put in a new PR on your repo that should fix the CME. Also I made some minor logging improvements and used the If you want, you can do something about the nagging warnings in case of lost packets. I'm OK with putting it on debug or not doing anything about it for now. I would rather see effort go into a structural solution, something that could look like what I outlined in one of my comments above. |
… cheaper getHex instead of more expensive getAsLabel when sending packets
LIFX : Fix eclipse-archived#2191
LIFX : Fix eclipse-archived#2193
Also-by: Wouter Born <eclipse@maindrain.net>
Signed-off-by: Karel Goderis <karel.goderis@me.com>
@kaikreuzer This LGTM and contains necessary improvements for the LIFX binding. Thanks to @wborn for providing insights and fixes. |
Thank you both! |
This should fix some concurrency issues.
@wborn It would be helpful if you could test this in your LIFX "swarm". I have set the number of retries for polling to 4. I prefer not to make that a configurable parameter, as it would clutter the simple and clean configuration interface of the bulbs; instead I propose to set it to the value that works for you, since you have the biggest collection of LIFX bulbs I know of ;-)
Signed-off-by: Karel Goderis karel.goderis@me.com