Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with DHD 52/SX #42

Closed
radio42 opened this issue Apr 5, 2017 · 14 comments
Closed

Issue with DHD 52/SX #42

radio42 opened this issue Apr 5, 2017 · 14 comments
Assignees
Milestone

Comments

@radio42
Copy link

radio42 commented Apr 5, 2017

Hi,

I have a customer using a DHD 52/SX.
When using the old legacy Ember+ Viewer 1.6.2 from Lawo (which used https://github.com/Lawo/ember-plus), he can see all nodes and parameters just fine!

Now we are trying to connect to the DHD with THIS library, but this always generates errors. The client actually can connect fine, but the retrieval of the consumer always fails at a certain point... see below.

So it seems, that there is a fundamental different behavior between THIS lib and https://github.com/Lawo/ember-plus.
This is especially annoying, as I now have build a whole set of products around THIS lib,
DHD and the customer now insist, that all is fine on their side, as they have proven via the legacy Ember+ Viewer 1.6.2 from Lawo, that all is fine!
So where is the effective issue within THIS lib?
Why can all nodes be retrieved fine with the Ember Viewer 1.6.2, but not with this lib?

  1. We tried the standard approach to sync the database at once:
    _consumer = await Consumer<MyRoot>.CreateAsync(_client, 20000);
    This however resulted in a TimeoutException:
    "The provider failed to send the children for the element with the path /Device/Channels/Channel 58"
    I have attached a full Glow Payload Message Log, so hopefully you can spot anything:
    DHD 52SX payload.txt

  2. Then we tried the dynamic mode (using ChildrenRetrievalPolicy.DirectOnly):
    So we retrieved the nodes 1 by 1 and traversed down the tree.
    In this mode, we could get all nodes, up until the path: /Device/Channels/Channel 58
    As soon as we tried to retrieve that node, again we got a TimeoutException!
    From then on all was broken and we could not get any further nodes, e.g. /Device/Channels/Channel 59 or /Device/Channels/Channel 60 etc.

Any idea what is causing this issue?
Please note, that this issue is highly critical for me.

Thanks in advance,
Bernd

@andreashuber-lawo
Copy link
Contributor

Hi Bernd

Could you please use GlowAnalyzerProxy to produce the log and attach the file unaltered? GlowAnalyzerProxy has the advantage that it also logs decoded payload besides everything else.

@radio42
Copy link
Author

radio42 commented Apr 5, 2017

I assume you mean the provided tool which comes with this lib?!
Can you quickly explain what options to set?
I assume, I need to:

  • specify as the 'Listening Port' any free local port, e.g. 8999
  • specify as the 'Provider Host and Port' the DHD address
  • specify a log folder
  • then connect from my app to the local GlowAnalyzerProxy instead of the DHD...

@andreashuber-lawo
Copy link
Contributor

  • specify as the 'Listening Port' any free local port, e.g. 8999
  • specify as the 'Provider Host and Port' the DHD address
  • specify a log folder
  • then connect from my app to the local GlowAnalyzerProxy instead of the DHD...

Yes.

@radio42
Copy link
Author

radio42 commented Apr 5, 2017

Thanks, I have a remote session at 19:00 where we can perform further tests...using the Proxy.

@radio42
Copy link
Author

radio42 commented Apr 5, 2017

Here comes the results of the GlowAnalyzerProxy ... see attached zip.
We performed 3 tests (as such LOG1..3 are in the zip)

LOG1: Using your lib in the classic 'get all' mode, as explained above.
Same result, we got an exception:
"The provider failed to send the children for the element with the path /Device/Channels/Channel 58"

LOG2: Using your lib with the ChildrenRetrievalPolicy.DirectOnly:
Same result all Channels are visible, but starting with Channel 58, there are no Children reported anymore!
I also made a screenshot, its attached in the zip.

LOG3: Using the legacy EmberViewer 1.6.2:
All fine, everything is visible. A screenshot, is also attached in the zip.

I also inspected with the EmberViewer 1.6.2 the different nodes, to see, if there is anything different, but no all channel nodes look 100% the same, all attributes and properties are identical.
So I assume it is NOT a DHD issue.
The different behavior must be within your lib and the legacy lib - which is rather odd, as it means, so far it is not reliable and usable - at least not with DHD devices.
So help is urgently needed ;-)

Many Thanks in advance!
Bernd

LOG.zip

P.S:: I already used your latest version from this morning incl. the fix for #40.

@andreashuber-lawo
Copy link
Contributor

In the file LOG1_20170405_170024_YourLibGetAll.xml if you search for "Channel 57", you'll find that it has the path 0.3.56 and was announced in provider message number 6. If you search for elements with the same path further down, you'll see that a GetDirectory request was sent in consumer message number 4 and its children were subsequently announced in provider message number 35. So far so good.

Now, if you do the same for "Channel 58", you'll find that both the initial provider announcement and the consumer GetDirectory request were made in the same respective messages yet the provider never announces the children. In fact, all GetDirectory requests after 0.3.56 sent in consumer message number 4 remain unanswered.

There's almost no material difference between LOG1_20170405_170024_YourLibGetAll.xml and LOG2_20170405_170248_YourLibDynamic.xml. It appears you've set ChildrenRetrievalPolicy on multiple nodes without awaiting SendAsync() in between, as described in #39.

Given the above evidence, it seems that the provider runs into some kind of overflow, after which it "forgets" to answer still pending GetDirectory requests. I would further speculate that if you retrieved children as described in #39 (and implemented in Bug40Test) you'd get all of them.

BTW, doesn't your viewer also implement a mode where it only gets from the provider those nodes that the user expands?

@radio42
Copy link
Author

radio42 commented Apr 5, 2017

I am not fully sure I understand you are saying, except, that it looks like a DHD issue.
Note, that in your lib I resp. the user cannot control in what message what request is made or not made - this is under the control of your lib I am afraid.
I can only tell, that DHD claims; we are complient, as we tested all with the EmberViewer 1.6.2 and all is fine!
I now looks you are telling me, that DHD is forgetting to send outstanding replys ?!

Sounds like a typical deadlock to me. So who is resolving the deadlock?

However, here is the source code for both scenarios:

  1. Get All:
    _consumer = await Consumer<MyRoot>.CreateAsync(_client, 10000);
    Nothing more to add.

Why this is failing is still unclear to me?
Is it because ember messages are unfortunately constructed in 'unlucky' message numbers?
Or is it because DHD didn't implement all fully correctly?
Let us assume it is a DHD fault and DHD expects single messages?
What can I do?
Is there any option to deal with 'imperfect' Ember+ implementations?

Note, that I am writing a generic software which should be able to 'talk' with any Ember+ device, not only DHD! So I need a generic solution - and I had the impression, that the goal of Ember+ was to be generic?!

  1. Get Dynamically:
    Initially I call:
    _consumer = await Consumer<MyRoot>.CreateAsync(_client, 10000, ChildrenRetrievalPolicy.DirectOnly);
    and then right after:
foreach (var child in _root.Children)
{
    if (child is INode && ((INode)child).Children.Count == 0)
         ((INode)child).ChildrenRetrievalPolicy = ChildrenRetrievalPolicy.DirectOnly;
}
await _consumer.SendAsync();

Then for each retrieved child node (when it is expanded resp. displayed - the 'element' below) I do call again::

if (element is INode)
{
    foreach (var child in ((INode)element).Children)
    {
        if (child is INode && ((INode)child).Children.Count == 0)
              ((INode)child).ChildrenRetrievalPolicy = ChildrenRetrievalPolicy.DirectOnly;
    }
    await _consumer.SendAsync();
}

So the I always retrieve all childs of a nodes at once...this is needed in order to display all sub-nodes of the tree.
But as you said: also this is also not working.
Being honest, I still have no idea why this is ?!
But I believe it is again because some messages are 'unluckily' constructed or send resp. becasue DHD forgets something it should not forget.

And yes, as written I was using the very latest version incl. the #40 fix.

Error in short:
The DHD implementation is not fully compliant, as it forgets to send certain replies!
Is this is clear bug on the DHD side?

Conclusion:
This library currently cannot handle various device implementations (e.g. the ones from DHD), which are out there and deployed to customers.
In the end I don't care who's fault it is - I need a working solution.
Lawo defined Ember+. If certain vendors are not compliant, I guess you either need to talk to those vendors yourself, establish a clear certification process and make sure they are compliant.
Or you have to live with an imperfect world, but then your libs must do the same.

I would be glad, if you can offer me a clear and generic solution, which ALWAYS works with any Ember+ device, so that I can build a reliable interface to it.
As I don't want to implement a specific solution for each vendor - since then Ember+ is useless.

I hope you understand my point and frustration at this time.
But lets cross the fingers, we find a good working solution.

If you need a contact at DHD, please let me now.

Many Thanks in advance,
Bernd

@andreashuber-lawo
Copy link
Contributor

Note, that in your lib I resp. the user cannot control in what message what request is made or not made - this is under the control of your lib I am afraid.

Incorrect, if you use INode.ChildrenRetrievalPolicy in combination with Consumer.SendAync, you can control to a very fine degree what is sent in the consumer messages.

So the I always retrieve all childs of a nodes at once...

Why don't you try to call SendAsync within the loop, as described in #39? I bet that it'll work then.

Being honest, I still have no idea why this is ?!

Have you had a closer look at the log files? Did you see that the provider fails to answer many GetDirectory requests? Do you dispute that the log files are accurate?

I would be glad, if you can offer me a clear and generic solution, which ALWAYS works with any Ember+ device, so that I can build a reliable interface to it.

Have you had a look at the other Ember+ libraries? They only provide serialization and deserialization of Ember+ messages but leave networking and protocol implementation to client code. If client code fails to answer requests or goofs up otherwise, what do you propose should I do?

In this specific situation however, as I've already told you, please follow the process outlined in #39.

@radio42
Copy link
Author

radio42 commented Apr 5, 2017

I am sorry, if I sounded frustrated, but...
...it seems, that the 1st approach (get all at once) will ALWAYS fail, because of an error on the DHD side.
How should we deal with this?
I can only go back to DHD and tell them: sorry guys your Ember+ implementation is buggy, this is proven here and if you want, please start the discussion with Lawo.
In the end, they might be happy, since we discovered this bug!

And yes, this leaves me with the only option to always call for each and every single element:
await _consumer.SendAsync();
This doesn't sound right to me, as it:
a) might span really a lot of threads
b) is the absolute slowest solution
c) utilizes the network with a maximum number of messages
But is probably the only way to currently go.

So let us turn all this into something good, as we have learned a lot:

  • Lawo has implemented a great Ember+ standard with a lot of options
  • unfortunately many vendors have not implemented this correctly or only partially
  • this now leads to issues in the field, if someone expects a standard, which was never one
  • the legacy Ember+ Viewer 1.6.2 cannot be used to validate full Ember+ compliance
  • there is no real quality assurance or Ember+ certification process in place for vendors

In essence there is great room for improvements and I suggest, that Lawo is getting in contact with DHD to overcome the current shortcomings in their implementation.

I can even believe in a simple test tool which we can provide to make a fully automated test, if a device is really 100% Ember+ compliant or a greater certification process.
See, if vendors are allowed to always only implement a bit of the standard, you will end up in a huge mess of vendors all implementing pieces of Ember+, but in the end not being able to 'talk' to each other. And if this happens, this is against all your marketing effort.

Many Greets,
Bernd

@andreashuber-lawo
Copy link
Contributor

And yes, this leaves me with the only option to always call for each and every single element:
await _consumer.SendAsync();

No, that would only be a first step. If that makes the DHD provider work correctly, then you could return to requesting children of multiple nodes, e.g. 20 at a time and see whether that still works. E.g. in your viewer you could make this a setting.

a) might span really a lot of threads

The Ember+ Sharp Library doesn't create any threads whatsoever. Everything is done on the thread that calls Consumer.CreateAsync.

@radio42
Copy link
Author

radio42 commented Apr 6, 2017

Thanks!
This is exactly what I already did: I changed the 'Dynamic Load' parameter to an integer, which now specifies how many elements might be combined (0=no dynamic load, get all at once; 1=get each single element separately; 2=combine up to 2 elements; 3=combine up to 3 elements; etc.)
A new version is available here:
www.proppfrexx.radio42.com/download.html#EmberViewer

I also informed DHD about their bug and pointed them to this thread.

@andreashuber-lawo
Copy link
Contributor

Thanks!

@radio42
Copy link
Author

radio42 commented Apr 6, 2017

I am afraid, we must re-open this issue. After I had the chance to test it again against the DHD...

I changed the code to retrieve all nodes with the following code:

_consumer = await Consumer<MyRoot>.CreateAsync(_client, 10000, ChildrenRetrievalPolicy.DirectOnly);
await RetrieveAllChildrenAsync(_consumer);
...
public static async Task RetrieveAllChildrenAsync(Consumer<MyRoot> consumer)
{
    INode root = consumer.Root;
    List<INode> nodes = new List<INode> { root };

    while (nodes.Count != 0)
    {
        INode node = nodes[0];
        nodes.RemoveAt(0);

        node.ChildrenRetrievalPolicy = ChildrenRetrievalPolicy.DirectOnly;
        await consumer.SendAsync();

        foreach (IElement child in node.Children)
        {
            if (child is INode)
            {
                nodes.Add((INode)child);
            }
        }
    }
}

Please find attached the new GlowAnalyzerProxy log files:
NewLOG.zip

This time it looks, like the 'Channel' nodes children are never received.
Meaning all channel nodes cannot be expanded. From the logs I cannot even see, that those are requested from the Provider - I have no idea why?!

Testing the same code against the 'TinyEmberPlus.exe', e.g. using the Saphire.EmBER; or the 'TinyEmberPlusRouter.exe' works just fine!

So could it somehow be, that once 'await consumer.SendAsync();' returns, the 'node.Children' are NOT populated?
If yes, why?
Else I cannot explain and would have no clue how to get the entire DHD ember tree with all nodes and all children.
Any idea?

Many Thanks in advance,
Bernd

@DHDaudio-support
Copy link

Hi,

DHD was able to reproduce the reported Ember+ protocol issue with the ProppFrexx Ember+Viewer.
The issue only occurs when a (multi-packet) message with a size near to the allowed maximum (1024 byte Ember+ payload) is recieved.
The last GetDirectory requests within such packets got lost due to a wrong RxBufferSize and remained unanswered.

Thanks for the provided GlowProxy logs that helped to reproduce und find the issue here.
Next firmware release will contain a fix for it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants