RDM full discovery sometimes misses connected fixtures or hangs. #1396
Is that the same four fixtures in both runs? I'm guessing so, in which case the question is why this happened:
Which means that either there's something up with one or more of your fixtures, or they're somehow generating a collision which appears as a complete and valid response to the Enttec. To progress this, it's probably a case of adding some more debugging to BranchComplete and/or capturing the raw RDM data on the line using an analyser/logic sniffer etc, e.g. see https://www.openlighting.org/rdm-tools/rdm-analyzers/ .
So in data terms:
Which I guess isn't a huge stretch looking at the data. But with checksums as follows:
A clean collision for them seems less likely to me.
Correct, those are the same four fixtures in both runs. I can try to get a bigger log with more fixtures before the end of the week if need be.
There are two questions here really:
Although fundamentally, the way the standard is designed, (1) shouldn't be possible, otherwise all bets are off in terms of discovery. The full discovery run logs for success and failure would help; both should start with something like "DUB 0000:00000000 - ffff:ffffffff", or a line or two above that.
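For anyone following along, that DUB line is the top of the binary search over the whole UID space. Below is a simplified Python simulation of how full discovery branches (an illustration of the E1.20 scheme, ignoring timing and corrupted replies; not OLA's DiscoveryAgent code):

```python
# Simplified simulation of RDM full discovery's binary search over the UID
# space. A DUB (DISC_UNIQUE_BRANCH) asks every unmuted responder whose UID
# falls in [lower, upper] to answer at once: silence means an empty branch,
# one reply identifies a responder, and a garbled reply means a collision.

def dub(fixtures, muted, lower, upper):
    """Return the unmuted fixtures that would answer this DUB."""
    return [u for u in fixtures if lower <= u <= upper and u not in muted]

def discover(fixtures):
    found, muted = [], set()
    stack = [(0x0, 0xFFFFFFFFFFFF)]  # the full range, as in the olad logs
    while stack:
        lower, upper = stack.pop()
        responders = dub(fixtures, muted, lower, upper)
        if not responders:
            continue                         # silence: nothing unmuted here
        if len(responders) == 1:             # clean reply: mute and record it
            muted.add(responders[0])
            found.append(responders[0])
            stack.append((lower, upper))     # re-DUB in case others were hidden
        else:                                # collision: split the range
            mid = (lower + upper) // 2
            stack += [(lower, mid), (mid + 1, upper)]
    return sorted(found)

print([hex(u) for u in discover([1, 2, 3, 4])])  # -> ['0x1', '0x2', '0x3', '0x4']
```

The catch, as this thread goes on to show, is the single-responder case: a real collision can masquerade as one clean reply.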
It's very weird. Say we have 11 connected fixtures. I'll have to run the discovery multiple times before all 11 fixtures are discovered. It usually goes like this... Discovery 1: 4 Fixtures
Does running a full, then incrementals improve that? When it fails to find everything, does it log "didn't respond to MUTE, marking as bad" against some non-existent UIDs?
I haven't tried running a full then incremental; hopefully I can give it a shot tomorrow. Nope, I've never seen that message. It just outputs a list of the found UIDs.
Prior to that, in the olad -l 4 logging, it should have that message (not in ola_rdm_discover, sorry), e.g.:
Ahh gotcha. I won't have access to the manufacturing plant until Monday, so I'm tied up until next week.
Was able to go into the plant yesterday. Ran incremental discoveries and didn't run into any issues. However, this was only with 4 fixtures. Once we get more fixtures I'll really try to recreate it.
Hi @ltd9938, did you make any progress with this?
Hi Peter, my newborn responders (LED fixtures) all passed the OLA responder tests successfully (Passed 367, Not Run 59, Total 426). I tried 12 LED fixtures (same model ID, 12 sequential DIDs from 1 to 12) and patched them using the OLA admin. I have to press the "run full discovery" button several times before all devices are discovered. Is there an upper limit on discovery in OLA?
Hi @majcit, there shouldn't be a limit, and I've successfully discovered large numbers of responders with it, such as the LabPack (lots of responders in a box, https://www.goddarddesign.com/rdm-lab-pack.html and there's a bigger version too). Firstly, I assume this is repeatable, i.e. it happens each time you do it? I'd suggest using ola_rdm_discover as it's probably more repeatable, and you can explicitly try full and incremental discoveries. To have any chance of finding the source of this bug and potentially fixing or working around it, we'll need a lot more information please, including olad -l 4 logs of any test runs. If you could capture the raw RDM data on the line using an analyser/logic sniffer, that would be amazing, e.g. see https://www.openlighting.org/rdm-tools/rdm-analyzers/ . I'm going to throw out a load of things to consider, and hopefully you can do some testing your end to come up with the minimum needed to reproduce the fault, which might give me a chance of seeing it too and hence greatly increase the chance of fixing it. For starters, what controller/interface are you using? Do you have access to another of the same, or ideally a different, type? What about an RDM compatible lighting desk? @ltd9938 if you could answer any of the above for your issue too, that would also help.
Hi @peternewman, sorry for not updating my situation sooner. My problem has been solved: my software was triggering multiple discoveries at a time, which was causing the complications. All has been fixed. I'm going to close this issue; please reopen if you see the need to.
Hmm, thanks for coming back @ltd9938. That sounds like a potential bug in OLA still, as I'm not sure we should allow the second discovery until the first has completed, given it would cause issues like you've seen. Do you have a basic sample of your code you could share with the bug, so I can try to reproduce and fix it? Which API were you using: C++, Python, JSON/HTTP? @majcit are you sure you aren't having the same issue? Can you try with the CLI client to make sure it's only being run once at a time, ideally after waiting some time for the initial discovery that's run on startup to complete.
Ok Peter. With both ENTTEC devices + OLA 10.3 (on Raspbian) I can run the Enttec sniffer and report the sniffer messages soon.
Thanks @majcit. But if you don't press any buttons and wait for say 2-3 mins, then hit full discovery once, it still generally fails to find all 12? The Enttec software works very differently to us; there is some oddity when discovering https://www.enttec.com/products/controls/led/din-led4px/ with OLA that works with their system, which I haven't had a chance to get to the bottom of yet to find out where the issue lies. Enttec sniffer logs would be excellent, either from their software or using ours. Likewise, as mentioned, a few tests to see if the number of devices is special, or if it's intermittent with just one or two devices. In terms of gathering olad debug logs, see here:
I did new trials by adding fixtures one by one and observed new facts, as below: when N >= 4 and the DIDs are not sequential, OLA always discovers successfully, even if pressed immediately.
I tried several times; there is no difference whether I wait 1 s, 1 min or 5 min.
Thanks @majcit, I'll reopen this, as that certainly sounds like a bug. @ltd9938 are you sure you also aren't seeing the same issue? It sounds VERY similar! @majcit I assume this fails with both the Enttec Pro and the Pro Mk 2? I think we can now just concentrate on four sequential fixtures for the moment; some RDM sniffer and/or olad -l 4 logs will be the next step now, I think.
Yes, it occurs with both the Pro and the Pro Mk2.
I am your man! Attachments are sniffer logs for fixtures with DID = 1, 2, 3 and 4:
3 fixtures, discovered all 3 successfully:
4 fixtures, discovered only 1:
4 fixtures, discovered only 2:
4 fixtures, discovered all 4 successfully:
Are .txt files ok, or do you need the .bin files?
I wrote a quality assurance station for our manufacturing team using Flask. I had a "Refresh Fixtures" button on the homepage that, when clicked, would trigger a discovery. After going through my code I realized I stupidly had the discovery start twice. After removing the second discovery initiation we haven't had an issue since (except when our fixtures aren't daisy-chained correctly, which may have also played a part months ago).
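For anyone hitting the same double-trigger problem, here is a minimal sketch of one way to serialise discovery behind a Flask route. It assumes OLA's Python client (ola.ClientWrapper) and its RunRDMDiscovery call; the route, names and callback signature are illustrative assumptions, not ltd9938's actual code:

```python
# Hypothetical sketch of serialising discovery behind a Flask route, so a
# second button press can't start a new discovery while one is running.
# Assumes OLA's Python client; names and callback signature are illustrative.
import threading
from flask import Flask, jsonify

app = Flask(__name__)
discovery_lock = threading.Lock()

def run_discovery(universe=1):
    from ola.ClientWrapper import ClientWrapper
    uids = []
    wrapper = ClientWrapper()

    def on_complete(status, found):  # signature assumed from OLA's examples
        if status.Succeeded():
            uids.extend(str(u) for u in found)
        wrapper.Stop()

    wrapper.Client().RunRDMDiscovery(universe, True, on_complete)
    wrapper.Run()                    # blocks this request until complete
    return uids

@app.route('/refresh-fixtures')
def refresh_fixtures():
    if not discovery_lock.acquire(blocking=False):
        return jsonify(error='discovery already in progress'), 409
    try:
        return jsonify(uids=run_discovery())
    finally:
        discovery_lock.release()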
Strange, that log only shows two Good Checksum lines, for 02ac:00000002 and 02ac:00000001! Are they actually your UIDs? They don't match what I assume was supposed to be the device part of the IDs listed earlier. However, it does match the DEVICE_LABEL get response, and indeed your comment just above, so I assume that's correct. Actually, looking at the log it's already muted 02ac:00000003, so I assume it found that outside of the text log you sent. Other strange things here:
02ac:00000003
So I think this is the worst case, only finding half: the 7s all appear when DUBing 000000000000-7FFFFFFFFFFF
The sniffing can't be great, as it also shows this line:
Text is fine, and indeed easier! We've got an EUID to UID converter here: I'd again suggest OLA's RDM sniffer code, which should work with your Enttec and may produce nicer output: Other than that, it's probably a case of using the Enttec software or your XMT-350 to do a successful discovery and sniffing that, to see if it also has the valid discovery of non-existent UIDs.
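For reference, the EUID decode itself is simple: per E1.20, each EUID byte pair is transmitted as (b | 0xAA, b | 0x55), so AND-ing the pair recovers b. A quick sketch (an illustration, not the linked OLA tool):

```python
# Quick EUID -> UID sketch per E1.20 (not the linked OLA tool): each EUID
# byte pair is (b | 0xAA, b | 0x55), so AND-ing the pair recovers b.

def euid_to_uid(euid):
    assert len(euid) == 12
    raw = [euid[i] & euid[i + 1] for i in range(0, 12, 2)]
    return '%02x%02x:%02x%02x%02x%02x' % tuple(raw)

# the EUID for 02ac:00000001 as it would appear in a sniffer capture:
print(euid_to_uid([0xAA, 0x57, 0xAE, 0xFD, 0xAA, 0x55,
                   0xAA, 0x55, 0xAA, 0x55, 0xAB, 0x55]))  # -> 02ac:00000001
```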
Okay, thanks for confirming @ltd9938, although it's odd how much it mirrors your issue. I was trying to find where I made this comment, then realised it was regarding you:
The 4 fixtures' full UIDs are: 02ac:00000001, 02ac:00000002, 02ac:00000003 and 02ac:00000004. The strange UIDs 6 and 7: there are no such UIDs. The controller DISC_MUTEs them, but they don't appear in the final list; maybe they are a consequence of collisions. I am not sure if it is normal or not, but it also happens with the XMT-350. Here is the sniffer log for 4 fixtures, UIDs 1, 2, 3 and 4:
I tried to execute
So the XMT-350 log also finds 6, mutes it once and then continues DUBing. So I think this is down to how different things respond to a collision that generates a good checksum. I suspect OLA is being a bit too defensive and assuming the device just doesn't respond to mute properly, whereas it seems we should branch/DUB a bit more first before writing it off as a bad device. I think some olad -l 4 logs are the next step, just to confirm my guesswork from the RDM captures.
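Reading that diagnosis as code, the change amounts to something like the sketch below (an interpretation of the comment, not the actual DiscoveryAgent*.cpp change):

```python
# A reading of the fix described above, as a sketch (not the actual
# DiscoveryAgent*.cpp change): when a DUB yields a single valid-looking UID
# but DISC_MUTE is never acked, treat it as a probable phantom from a
# collision and keep branching, rather than abandoning the whole branch.

MUTE_ATTEMPTS = 3  # illustrative retry count, not OLA's actual constant

def on_clean_dub_reply(uid, lower, upper, mute, branch_queue, bad_uids):
    for _ in range(MUTE_ATTEMPTS):
        if mute(uid):               # acked: a real responder, carry on
            return
    if lower < upper:               # no ack: probably a phantom (e.g. UID 6
        mid = (lower + upper) // 2  # from 2|4), so DUB the halves instead of
        branch_queue += [(lower, mid), (mid + 1, upper)]   # giving up
    else:
        bad_uids.add(uid)           # a single-UID range that truly won't mute

# e.g. phantom UID 6 in the 0..7 branch: no mute ack, so we re-branch
queue, bad = [], set()
on_clean_dub_reply(6, 0, 7, lambda uid: False, queue, bad)
print(queue)  # -> [(0, 3), (4, 7)]
```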
I think you want /dev/ttyUSB0; I believe the other format is Mac style.
I succeeded in capturing:
4 fixtures, discovered 2 of them: 1, 2
4 fixtures, discovered 2 of them: 1, 3
4 fixtures, discovered 1 of them: 3
4 fixtures, discovered all of them: 1, 2, 3, 4
Hi @peternewman, I just observed a new fact: the issue does not happen for other, new sequential UIDs. I executed OLA discovery several times: for 4 devices it never skipped any device and found all 4 successfully; then I repeated for 12 devices, and again it never skipped any device and found all 12 successfully. Here is the sniffed .txt for the new UIDs, 12 devices: I did the test for the old issued UIDs (
I'd still be curious if this works, if you've got five minutes to test it; I can then update the docs to clarify too. Thanks @majcit, sorry I'd not got back to you. So I've found the source of the issue: it's down to how we behave when we get a response from a phantom UID.
We were seeing 6, failing to mute it, and giving up on that whole branch. Our test code was also broken, so although our tests passed, they didn't actually test this particular issue; fixing the test code made the tests fail, and I've then been able to fix and test the actual discovery code. If you add the changes in DiscoveryAgent*.cpp from here and recompile, your issue should be fixed: Although there's a slightly more optimised fix I'm working on too...
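Stepping back, the reason a handful of sequential low UIDs is the stressful case may be worth spelling out: they differ only in their lowest bits, so they sit in the same half of the UID space at almost every level of the binary search, and every shared level is a colliding DUB, i.e. another chance for an OR'd reply that happens to look valid. A quick illustration (simplified: ignores muting and re-DUBs):

```python
# Devices 1-4 differ only in their lowest bits, so they share the same half
# of the UID space at almost every level of the binary search; each shared
# level is another colliding DUB. Counting those shared levels:
uids = (1, 2, 3, 4)
lower, upper, collisions = 0x0, 0xFFFFFFFFFFFF, 0
while sum(lower <= u <= upper for u in uids) > 1:
    collisions += 1                # this DUB gets >1 reply: a collision
    upper = (lower + upper) // 2   # all four UIDs fall in the lower half
print(collisions)  # -> 47 colliding DUBs before the UIDs finally separate
```

Well-spread UIDs separate after only a few levels, which fits the observation that non-sequential UIDs always discovered cleanly.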
Sure, I will do it as soon as I get to the workplace.
Glad to hear the issue is traced, thanks.
I would kindly request 2 more things; if possible, please consider them for future releases.
Thanks.
I think it's the reverse: the pull-up wins. Certainly the OR generates the DUB reply your packet capture includes.
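To make the wired-OR point concrete, here is a minimal sketch using the standard E1.20 DUB reply encoding (an illustration, not OLA code): the replies from 02ac:00000002 and 02ac:00000004 OR together into a clean, valid-looking reply from the phantom 02ac:00000006 seen in the logs.

```python
# Sketch of the E1.20 DUB reply encoding. Each UID byte b is sent twice, as
# (b | 0xAA) then (b | 0x55), and the 16-bit checksum of those 12 bytes is
# encoded the same way. OR-ing two replies therefore yields the *valid*
# encoding of the OR of the two UIDs; only the checksum bytes can give the
# collision away, and here they happen to line up too.

def encode_dub_reply(uid):
    euid = []
    for b in uid:
        euid += [b | 0xAA, b | 0x55]
    checksum = sum(euid) & 0xFFFF
    for b in (checksum >> 8, checksum & 0xFF):
        euid += [b | 0xAA, b | 0x55]
    return euid  # 0xFE/0xAA preamble omitted for brevity

def decode_dub_reply(frame):
    data = [frame[i] & frame[i + 1] for i in range(0, 16, 2)]  # AND the pairs
    valid = ((data[6] << 8) | data[7]) == (sum(frame[:12]) & 0xFFFF)
    uid = '%04x:%08x' % ((data[0] << 8) | data[1],
                         int.from_bytes(bytes(data[2:6]), 'big'))
    return uid, valid

uid2 = [0x02, 0xAC, 0x00, 0x00, 0x00, 0x02]
uid4 = [0x02, 0xAC, 0x00, 0x00, 0x00, 0x04]
r2, r4 = encode_dub_reply(uid2), encode_dub_reply(uid4)
collision = [a | b for a, b in zip(r2, r4)]  # the pull-up ORs the two replies
print(decode_dub_reply(collision))  # -> ('02ac:00000006', True): a phantom!
```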
Do you mean in the olad log, or the output of ola_rdm_discover?
There is already auto-patch on the web UI (see the wand button). The code for this is here, but currently it only sorts by footprint: I'm not sure patching by UID is relevant; aside from maybe a large architectural install, the chances of getting devices with UIDs in any logical order are fairly slim, and even in that scenario, fitting them to the building in the correct order will still be quite a hassle; identifying using RDM and addressing appropriately may be just as quick. For both requests, you're probably better off starting new issues with a bit more detail anyway.
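For reference, footprint-sorted auto-patching amounts to something like this sketch (simplified and illustrative: the sort order, packing and footprint values here are assumptions, not the actual web UI code):

```python
# Simplified sketch of footprint-based auto-patching (not the actual UI
# code): sort responders by DMX footprint and pack them into consecutive
# start addresses.

def auto_patch(footprints_by_uid, universe_size=512):
    """footprints_by_uid: {uid: dmx footprint}; returns {uid: start address}."""
    addresses, next_addr = {}, 1
    for uid, footprint in sorted(footprints_by_uid.items(),
                                 key=lambda kv: kv[1], reverse=True):
        if next_addr + footprint - 1 > universe_size:
            break  # out of channels; leave the rest unpatched
        addresses[uid] = next_addr
        next_addr += footprint
    return addresses

print(auto_patch({'02ac:00000001': 16, '02ac:00000002': 16,
                  '02ac:00000003': 4, '02ac:00000004': 4}))
```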
Ok, I am trying to save a sniffer log file, but unsuccessfully :/ Both the ENTTEC Pro and ENTTEC Pro Mk2 are connected to the Raspberry Pi. I added only the Pro as a port for discovery in the OLA admin. 4 fixtures (02ac:2ee109a1, 02ac:2ee109a2, 02ac:2ee109a13, 02ac:2ee109a4) are connected to Pro Mk2 DMX1 (female). When I run discovery, OLA discovers the fixtures + the Pro Mk2 as expected. I ran the command to start logging: I pressed "run full RDM discovery" on the :9090 form. I ran the command to see the logged file: but nothing is logged! The file is empty. Did I do it right?
I get the feeling there is probably a command to put the newer devices into sniffer mode, which perhaps we're not sending; the old ones used to require a different firmware. If you've got a bit of time, I'd be curious to try with the setup reversed, so the Mk2 as controller and the Pro as sniffer, and, perhaps more likely to show something, stop olad and use the Enttec software on another machine to run the discovery, so the Pi is just doing sniffing (again, if you could try both models taking it in turns to sniff). Back on the original bug front, this is the more optimised version of my fix, if you'd like to give it a go:
Sure. I have got 2 Raspberry Pis, so I stopped after running full discovery on Pi #2 a few times; discovery successfully found 4 fixtures + the Pro Mk2.
And the following is the reverse: sniffer on Pi #1 with the Pro.
I can't find anything in the Enttec API docs either, so we probably need some USB packet sniffing (e.g. Wireshark) to find out what's going on, using a machine with the Enttec software, but I'm guessing there is a way to switch the Enttec into RDM mode. I'm not clear if the USB licence dongle is actually required to unlock the Enttec, or only to unlock the software. I've also just rediscovered this, so it won't work on the Mk2 without some more work (I suspect you need an actual Enttec RDM Pro for our code to work): I think even if you add --display-dmx to your command, you won't see DMX traffic as it's not in sniffer mode, which our code expects.
Our device is closed source. I am the designer and programmer of the device; I would be delighted for my design to be examined by OLA engineers! :)
I was talking in the IRC channel last night with @peternewman and I was finally able to grab some logs before I left the office for the day.
The RDM full discovery OFTEN misses a handful of connected fixtures, and we will have to run the discovery multiple times until all fixtures appear. Once in a while a discovery will hang and will render our quality assurance software frozen.
Here is an olad -l 4 log where it misses 2 out of the 4 connected fixtures: https://gist.github.com/ltd9938/b502c5395b9fab2231a97a2087b02400
Here is an olad -l 4 log where it picks up all of the connected devices: https://gist.github.com/ltd9938/6e7391310fba51db07161b4c7919459c
I was unable to grab a log of the discovery hanging, but as soon as I do I will update this post.
We are running an ENTTEC RDM USB PRO with firmware 2.4 (RDM enabled) on the latest version of OLA on Ubuntu 12.04.