RDM full discovery sometimes misses connected fixtures or will sometimes hang. #1396

ghost opened this issue Mar 28, 2018 · 36 comments

@ghost

ghost commented Mar 28, 2018

I was talking in the IRC channel last night with @peternewman and I was finally able to grab some logs before I left the office for the day.

The RDM full discovery OFTEN misses a handful of connected fixtures, and we have to run the discovery multiple times until all fixtures appear. Once in a while a discovery will hang and freeze our quality assurance software.

Here is an olad -l 4 log where it misses 2 out of the 4 connected fixtures.
https://gist.github.com/ltd9938/b502c5395b9fab2231a97a2087b02400

Here is an olad -l 4 log where it picks up all of the connected devices.
https://gist.github.com/ltd9938/6e7391310fba51db07161b4c7919459c

I was unable to grab a log of the discovery hanging, but as soon as I do I will update this post.

We are running an ENTTEC RDM USB PRO with firmware 2.4 (RDM enabled) on the latest version of OLA on Ubuntu 12.04.

@peternewman
Member

peternewman commented Mar 28, 2018

Is that the same four fixtures in both runs?

I'm guessing so based on
plugins/usbpro/EnttecUsbProWidget.cpp:586: Enttec Pro discovery complete: 4151:0200013a,4151:0200014e,4151:020002a3,4151:020002a5

And
plugins/usbpro/EnttecUsbProWidget.cpp:586: Enttec Pro discovery complete: 4151:0200013a,4151:0200014e

In which case, the question is why this happened:

plugins/usbpro/EnttecUsbProWidget.cpp:330: Sending DUB packet: 4151:02000200 - 4151:020003ff
plugins/usbpro/EnttecUsbProWidget.cpp:854: TX: 11, length 38
common/io/EPoller.cpp:306: ss process time was 0.000001
plugins/usbpro/EnttecUsbProWidget.cpp:865: RX: 5, length 25
plugins/usbpro/EnttecUsbProWidget.cpp:865: RX: 12, length 0
common/rdm/DiscoveryAgent.cpp:217: BranchComplete, got 24
common/rdm/DiscoveryAgent.cpp:321: Muting 4151:020002a7

Which means that either there's something up with one or more of your fixtures, or they're somehow generating a collision which appears as a complete and valid response to the Enttec.

To progress this, it's probably a case of adding some more debugging to BranchComplete and/or capturing the raw RDM data on the line using an analyser/logic sniffer etc, e.g. see https://www.openlighting.org/rdm-tools/rdm-analyzers/ .
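
When comparing a failing run with a passing one, it can help to pull the relevant UIDs out of the olad -l 4 output mechanically. The following is a minimal sketch, assuming log lines shaped like the "Muting ..." and "discovery complete: ..." excerpts quoted above. The function name `summarise` is made up for illustration, and the sample is stitched together from two of those excerpts rather than copied from one contiguous log.

```python
import re

UID = r"[0-9a-f]{4}:[0-9a-f]{8}"
MUTING = re.compile(rf"Muting ({UID})")
COMPLETE = re.compile(r"discovery complete: (.*)$")


def summarise(log_text):
    """Collect UIDs that were muted and UIDs reported in the final list."""
    muted, discovered = set(), set()
    for line in log_text.splitlines():
        if m := MUTING.search(line):
            muted.add(m.group(1))
        elif m := COMPLETE.search(line):
            # Keep the most recent "discovery complete" list seen in the log.
            discovered = set(re.findall(UID, m.group(1)))
    return {
        "discovered": sorted(discovered),
        # Muted during the run but absent from the final list: worth a look.
        "muted_but_missing": sorted(muted - discovered),
    }


sample = """\
common/rdm/DiscoveryAgent.cpp:321: Muting 4151:020002a7
plugins/usbpro/EnttecUsbProWidget.cpp:586: Enttec Pro discovery complete: 4151:0200013a,4151:0200014e
"""
print(summarise(sample))
```

UIDs that show up under `muted_but_missing` are candidates for closer inspection with a sniffer.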

@peternewman
Member

So in data terms:

4151:020002a5 = eb 55 fb 55 aa 57 aa 55 aa 57 af f5 af 57 bf 75
4151:020002a3 = eb 55 fb 55 aa 57 aa 55 aa 57 ab f7 af 57 bb 77
4151:020002a7 = eb 55 fb 55 aa 57 aa 55 aa 57 af f7 af 57 bf 77

Which I guess isn't a huge stretch looking at the data.
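
One way to see why it "isn't a huge stretch" is to combine the first two rows byte by byte. The sketch below assumes, purely for illustration, that two overlapping replies merge as a bitwise OR (the model used further down this thread); whether that is what happened on the wire here is not established at this point.

```python
rows = {
    "4151:020002a5": "eb 55 fb 55 aa 57 aa 55 aa 57 af f5 af 57 bf 75",
    "4151:020002a3": "eb 55 fb 55 aa 57 aa 55 aa 57 ab f7 af 57 bb 77",
    "4151:020002a7": "eb 55 fb 55 aa 57 aa 55 aa 57 af f7 af 57 bf 77",
}
a5, a3, a7 = (bytes.fromhex(text) for text in rows.values())

# OR the two real responders' rows together, one byte at a time.
merged = bytes(x | y for x, y in zip(a5, a3))

print(merged.hex(" "))
print("matches the 4151:020002a7 row:", merged == a7)
```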

But with checksums as follows:

AB 5D EF 7F
AB 5D EB 7F
AB 5D FB 77

A clean collision for them seems less likely to me.

@ghost
Author

ghost commented Mar 28, 2018

Correct, those are the same four fixtures in both runs. I can try and get a bigger log with more fixtures before the end of the week if need be.

@peternewman
Member

There are two questions here really:

  1. From your side, how come you're getting a perfect collision which generates a valid DUB response?
  2. From the OLA side, when we find a bad UID, should we actually behave a bit like a collision and keep branching down either side of the bad UID? I think that may work around your issue.

Although fundamentally, the way the standard is designed, 1 shouldn't be possible, otherwise all bets are off in terms of discovery.

The full discovery run log for success and failure would help, both should start with something like "DUB 0000:00000000 - ffff:ffffffff" or a line or two above that.

@ghost
Author

ghost commented Mar 29, 2018

It's very weird. Say we have 11 connected fixtures. I'll have to run the discovery multiple times before all 11 fixtures are discovered. It usually goes like this...

Discovery 1: 4 Fixtures
Discovery 2: 7 Fixtures
Discovery 3: 4 Fixtures
Discovery 4: 9 Fixtures
Discovery 5: 8 Fixtures
Discovery 6: 11 Fixtures

@peternewman
Member

Does running a full, then incrementals improve that?

When it fails to find everything, does it log "didn't respond to MUTE, marking as bad" against some non-existent UIDs?

@ghost
Author

ghost commented Mar 29, 2018

I haven't tried running a full then incremental. Hopefully I can give it a shot tomorrow.

Nope, I've never seen that message. It just outputs a list of the found UIDs.

@peternewman
Member

Prior to that, in the olad -l 4 logging, it should have that message (not in ola_rdm_discover sorry), e.g.:
https://gist.github.com/ltd9938/b502c5395b9fab2231a97a2087b02400#file-fail-olad-l-4-L211

@ghost
Author

ghost commented Mar 29, 2018

Ahh gotcha. I won't have access to the manufacturing plant until Monday, so my hands are tied until next week.

@ghost
Author

ghost commented Apr 4, 2018

Was able to go into the plant yesterday. Ran incremental discoveries and didn't run into any issues. However, this was only with 4 fixtures. Once we get more fixtures I'll really try and recreate it.

@peternewman
Member

Hi @ltd9938 did you make any progress with this?

@majcit

majcit commented Dec 1, 2018

Hi Peter,
Apologies for breaking in here. I decided not to start a new thread because I have a very similar problem.

My newly developed responders (LED fixtures) all passed the OLA responder tests successfully (Passed 367, Not Run 59, Total 426).

I tried to patch 12 LED fixtures (same model ID, 12 sequential device IDs from 1 to 12) using OLA admin. I have to press the "Run full discovery" button several times before all devices are discovered.
attempt 1 : 6 devices
attempt 2 : 3 devices
attempt 3 : 4 devices
attempt 4 : 4 devices
...

Is there an upper limit on the number of devices OLA can discover?

@peternewman
Member

Hi @majcit ,

There shouldn't be a limit and I've successfully discovered large numbers of responders with it such as the LabPack (lots of responders in a box https://www.goddarddesign.com/rdm-lab-pack.html and there's a bigger version too).

Firstly I assume this is repeatable, i.e. it happens each time you do it? I'd suggest using ola_rdm_discover as it's probably more repeatable, and you can explicitly try full and incremental discoveries:
https://docs.openlighting.org/ola/man/man1/ola_rdm_discover.1.html

To have any chance of finding the source of this bug and potentially fixing or working around it, we'll need a lot more information please, including olad -l 4 logs of any test runs. If you can, capturing the raw RDM data on the line using an analyser/logic sniffer etc would be amazing, e.g. see https://www.openlighting.org/rdm-tools/rdm-analyzers/ .

I'm going to throw out a load of things to consider and hopefully you can do some testing your end to come up with the minimum needed to reproduce the fault, which might give me a chance of seeing it too and hence greatly increase the chance of fixing it.

For starters, what controller/interface are you using? Do you have access to another of the same or ideally a different type? What about an RDM compatible lighting desk?
Does it happen with fewer fixtures? E.g. binary chop to six and repeat, what's the minimum number required?
If it happens with fewer than twelve, or you have access to more than twelve or can change their UIDs, does it happen if they aren't sequentially numbered? As discussed above, the closer the numbers, the greater the chance of a collision generating a valid packet (although still unlikely).
Does incremental discovery behave better or worse?
Does the fixture pass the responder tests fine if it's tested with all the other fixtures on the line too?
I assume these are all just connected daisy-chained? Have you got an RDM compatible splitter, does it still happen through that?
Is it always that sequence of discovery successes or does it vary?
I assume your responder is closed source? Would you be able to loan me some, or at least the control board guts/minimum bit to do RDM responding, (in London), if you don't have the RDM analyser/logic sniffer kit?

@ltd9938 if you can answer any of the above for your issue too, that would also help.

@ghost
Author

ghost commented Dec 3, 2018

Hi @peternewman,

Sorry for not updating my situation sooner. My problem has been solved. My software was triggering multiple discoveries at a time which was causing the complications.

All has been fixed. I'm going to close this issue, please reopen if you see the need to.

ghost closed this as completed Dec 3, 2018
@peternewman
Member

Hmm, thanks for coming back @ltd9938 , that sounds like a potential bug in OLA still, as I'm not sure we should allow the second discovery until the first has completed, given it would cause issues like you've seen.

Do you have a basic sample of your code you could share with the bug so I can try and reproduce and fix it? Which API were you using: C++, Python, JSON/HTTP?

@majcit are you sure you aren't having the same issue? Can you try with the CLI client to make sure it's only being run once at a time, ideally after waiting some time for the initial discovery that's run on startup to complete.

@majcit

majcit commented Dec 3, 2018

OK Peter,
My setup (12 fixtures) is successfully discovered by:
XMT-350
ENTTEC DMX USB Pro + Enttec software
ENTTEC DMX USB Mk2 + Enttec software
However many times I press "Full Discovery", all 12 devices are discovered every time.

With either ENTTEC interface + OLA 10.3 (on Raspbian):
It may find any number of devices between 1 and 12, seemingly at random.
It rarely discovers all 12 on the first try.
I didn't use any special numbers or any special order of device IDs.

I can run the Enttec sniffer and report the sniffer messages soon.
For the olad log I need to experiment more, as I have never used it before.

@peternewman
Member

Thanks @majcit .

But if you don't press any buttons and wait for say 2-3 mins, then hit full discovery once, it still generally fails to find all 12?

The Enttec software works very differently to us, as there is some oddity when discovering https://www.enttec.com/products/controls/led/din-led4px/ with OLA that works with their system which I haven't had a chance to get to the bottom of yet and find out where the issue lies.

Enttec sniffer logs would be excellent, either from their software or using ours:
http://docs.openlighting.org/ola/man/man1/rdmpro_sniffer.1.html

Likewise as mentioned, a few tests to see if the number of devices is special, or if it's intermittent with just one or two devices.

In terms of gathering olad debug logs, see here:
https://www.openlighting.org/ola/get-help/ola-faq/#How_do_I_get_olad_-l_4_logs

@majcit

majcit commented Dec 4, 2018

I ran new trials, adding fixtures one by one, and observed the following:
N: number of fixtures
There is no problem when N<4; OLA always discovers all fixtures.
When N>=4 and the device IDs are sequential, OLA finds a different, seemingly random number of fixtures (between 1 and N) each time discovery is pressed.
For example, 6 sequential fixtures (all sequential):
2ee109a1-2ee109a2-2ee109a3-2ee109a4-2ee109a5-2ee109a6

When N>=4 and the device IDs are not sequential, OLA always discovers all fixtures, even if discovery is pressed again immediately.
For example, 15 non-sequential fixtures (5 groups, each group only 3 sequential):
2ee109a1-2ee109a2-2ee109a3
2ee10aa4-2ee10aa5-2ee10aa6
2ee10ba7-2ee10ba8-2ee10ba9
2ee10caa-2ee10cab-2ee10cac
2e94be67-2e94be68-2e94be69

> But if you don't press any buttons and wait for say 2-3 mins, then hit full discovery once, it still generally fails to find all 12?

I tried several times; it makes no difference whether I wait 1 s, 1 min or 5 min.

@peternewman
Member

Thanks @majcit , I'll reopen this, as that certainly sounds like a bug. @ltd9938 are you sure you also aren't seeing the same issue, it sounds VERY similar!

@majcit I assume this fails with both the Enttec Pro and the Pro Mk 2?

I think we can now just concentrate on four sequential fixtures for the moment, some RDM sniffer and/or olad -l 4 logs will be the next step now I think.

peternewman reopened this Dec 4, 2018
@majcit

majcit commented Dec 4, 2018

> @majcit I assume this fails with both the Enttec Pro and the Pro Mk 2?

Yes, it occurs with both the PRO and the PRO Mk2.

> I think we can now just concentrate on four sequential fixtures for the moment, some RDM sniffer and/or olad -l 4 logs will be the next step now I think.

I am your man!
I need to sort this out because it may be a hidden bug in my devices, one that the ENTTEC PRO, ENTTEC PRO Mk2 and XMT-350 somehow tolerate but OLA does not.

Attached are sniffer logs for 4 fixtures with device IDs 1, 2, 3 and 4.
OLA discovery was run through the PRO and sniffed by the PRO Mk2.

3 fixtures, discovered all 3 successfully :
FIX=3 DISC 3 (DID=1,2,3).txt

4 fixtures, discovered only 1 :
FIX=4 DISC 1 (DID=3).txt

4 fixtures, discovered only 2:
FIX=4 DISC 2 (DID=1,2).txt

4 fixtures, discovered all 4 successfully :
FIX=4 DISC 4 (DID=1,2,3,4).txt

Are .txt files OK, or do you need a .bin file?

@ghost
Author

ghost commented Dec 4, 2018

@peternewman

I wrote a quality assurance station for our manufacturing team using Flask. I had a "Refresh Fixtures" button on the homepage that when clicked would trigger ola_rdm_discover -f -u 1.

After going through my code I realized I stupidly had the discovery start twice. After removing the second discovery initiation we haven't had an issue since (except when our fixtures aren't daisy chained correctly, which may have also played a part months ago)
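
For a web front end like the one described, one way to guarantee that a button press can never start a second discovery while one is still running is to guard the call with a non-blocking lock. This is only a sketch under assumptions: `run_full_discovery` and the injectable `runner` parameter are hypothetical names, the command line is the one quoted above, and the output is assumed to be whitespace-separated UIDs.

```python
import subprocess
import threading

_discovery_lock = threading.Lock()


def run_full_discovery(universe=1, runner=subprocess.run):
    """Run one full discovery; return None if one is already in progress."""
    if not _discovery_lock.acquire(blocking=False):
        return None  # another request is already discovering, do not start a second
    try:
        result = runner(
            ["ola_rdm_discover", "-f", "-u", str(universe)],
            capture_output=True,
            text=True,
            check=False,
        )
        # Assumption: the tool prints the discovered UIDs separated by whitespace.
        return result.stdout.split()
    finally:
        _discovery_lock.release()
```

A request handler can then treat a `None` result as "discovery already running" and tell the user to wait, rather than queueing a second run. Note that a lock like this only covers a single process.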

@peternewman
Member

peternewman commented Dec 4, 2018

> > @majcit I assume this fails with both the Enttec Pro and the Pro Mk 2?
>
> Yes, it occurs with both the PRO and the PRO Mk2.
>
> > I think we can now just concentrate on four sequential fixtures for the moment, some RDM sniffer and/or olad -l 4 logs will be the next step now I think.
>
> I am your man!
> I need to sort this out because it may be a hidden bug in my devices, one that the ENTTEC PRO, ENTTEC PRO Mk2 and XMT-350 somehow tolerate but OLA does not.
>
> Attached are sniffer logs for 4 fixtures with device IDs 1, 2, 3 and 4.
> OLA discovery was run through the PRO and sniffed by the PRO Mk2.
>
> 3 fixtures, discovered all 3 successfully:
> FIX=3 DISC 3 (DID=1,2,3).txt

Strange, that log only shows two Good Checksum lines, for 02ac:00000002 and 02ac:00000001! Are they actually your UIDs? They don't match what I assume was supposed to be the device part of the IDs listed earlier. However, they do match the DEVICE_LABEL get response, and indeed your comment just above, so I assume that's correct.

Actually, looking at the log, it's already muted 02ac:00000003, so I assume it found that outside of the text log you sent.

Other strange things here:
We DUB 000000000000-7FFFFFFFFFFF twice and don't get collisions either time, although that might just be a timing thing.

> 4 fixtures, discovered only 1:
> FIX=4 DISC 1 (DID=3).txt

02ac:00000003

> 4 fixtures, discovered only 2:
> FIX=4 DISC 2 (DID=1,2).txt

So I think this is the worst case, only finding half:
Finds 02ac:00000002
Finds 02ac:00000001
Finds 02ac:00000006!
Finds 02ac:00000007!
Finds 02ac:00000007!
Finds 02ac:00000007!

The 7's are all when DUBing 000000000000-7FFFFFFFFFFF

> 4 fixtures, discovered all 4 successfully:
> FIX=4 DISC 4 (DID=1,2,3,4).txt

The sniffing can't be great, as it also shows this line:
39362263,RDM Discovery Response, , , , , , , Good Checksum , 8 ,FC FF FF FF FF FF FF BA
This still found 02ac:00000000 (twice) and 02ac:00000006!

> Are .txt files OK, or do you need a .bin file?

Text is fine, and indeed easier!

We've got an EUID to UID converter here:
http://rdm.openlighting.org/tools/uid-converter

I'd again suggest OLA's RDM sniffer code (which should work with your Enttec), and may produce nicer output:
https://www.openlighting.org/rdm-tools/rdm-analyzers/enttec-sniffer/

Other than that, it's probably a case of using the Enttec software or your XMT-350 to do a successful discovery and sniffing that to see if it also has the valid discovery of non-existent UIDs.

@peternewman
Member

> @peternewman
>
> I wrote a quality assurance station for our manufacturing team using Flask. I had a "Refresh Fixtures" button on the homepage that when clicked would trigger ola_rdm_discover -f -u 1.
>
> After going through my code I realized I stupidly had the discovery start twice. After removing the second discovery initiation we haven't had an issue since (except when our fixtures aren't daisy chained correctly, which may have also played a part months ago)

Okay thanks for confirming @ltd9938 , although it's odd how much it mirrors your issue, I was trying to find where I made this comment, then realised it was regarding you:
#1396 (comment)

@majcit

majcit commented Dec 5, 2018

> Strange, that log only shows two Good Checksum lines! For 02ac:00000002 and 02ac:00000001. Are they actually your UIDs, they don't match what I assume was supposed to be the device part of the IDs listed earlier. However it does match the DEVICE_LABEL get response, and indeed your comment just above, so I assume that's correct.
>
> Actually looking at the log it's already muted 02ac:00000003, so I assume it found that outside of the text log you sent.
>
> Other strange things here:
> We DUB 000000000000-7FFFFFFFFFFF twice and don't get collisions either time, although that might just be a timing thing.

The full UIDs of the 4 fixtures are 02ac:00000001, 02ac:00000002, 02ac:00000003 and 02ac:00000004.
For brevity I refer to them as 1, 2, 3 and 4.

The strange UIDs 6 and 7 do not exist. The controller sends DISC_MUTE to them, but they don't appear in the final list; maybe they are a consequence of collisions. I am not sure whether this is normal, but it also happens with the XMT-350. Here is the sniffer log for 4 fixtures with UIDs 1, 2, 3 and 4:
XMT-350 FIX=4 DISC 4 (DID=1,2,3,4).txt

> I'd again suggest OLA's RDM sniffer code (which should work with your Enttec), and may produce nicer output:
> https://www.openlighting.org/rdm-tools/rdm-analyzers/enttec-sniffer/

I tried to run rdmpro_sniffer [ options ] <usb-device-path> but I can't work out what <usb-device-path> should be.
The example says rdmpro_sniffer -r /dev/tty.usbserial-00001014
On my Raspberry Pi there is no such file or anything similar:

root@raspberrypi:/dev# ls
autofs           loop7               ram6     tty20  tty46      urandom
block            loop-control        ram7     tty21  tty47      vchiq
btrfs-control    mapper              ram8     tty22  tty48      vcio
bus              mem                 ram9     tty23  tty49      vc-mem
cachefiles       memory_bandwidth    random   tty24  tty5       vcs
char             mmcblk0             raw      tty25  tty50      vcs1
console          mmcblk0p1           rfkill   tty26  tty51      vcs2
cpu_dma_latency  mmcblk0p2           serial   tty27  tty52      vcs3
cuse             mqueue              serial0  tty28  tty53      vcs4
disk             net                 shm      tty29  tty54      vcs5
fb0              network_latency     snd      tty3   tty55      vcs6
fd               network_throughput  stderr   tty30  tty56      vcsa
full             null                stdin    tty31  tty57      vcsa1
fuse             ppp                 stdout   tty32  tty58      vcsa2
gpiochip0        ptmx                tty      tty33  tty59      vcsa3
gpiomem          pts                 tty0     tty34  tty6       vcsa4
hwrng            ram0                tty1     tty35  tty60      vcsa5
initctl          ram1                tty10    tty36  tty61      vcsa6
input            ram10               tty11    tty37  tty62      vcsm
kmsg             ram11               tty12    tty38  tty63      vhci
log              ram12               tty13    tty39  tty7       watchdog
loop0            ram13               tty14    tty4   tty8       watchdog0
loop1            ram14               tty15    tty40  tty9       zero
loop2            ram15               tty16    tty41  ttyAMA0
loop3            ram2                tty17    tty42  ttyprintk
loop4            ram3                tty18    tty43  ttyUSB0
loop5            ram4                tty19    tty44  uhid
loop6            ram5                tty2     tty45  uinput

peternewman added the bug label Dec 5, 2018
peternewman added this to the 0.11.0 milestone Dec 5, 2018
@peternewman
Member

> > Strange, that log only shows two Good Checksum lines! For 02ac:00000002 and 02ac:00000001. Are they actually your UIDs, they don't match what I assume was supposed to be the device part of the IDs listed earlier. However it does match the DEVICE_LABEL get response, and indeed your comment just above, so I assume that's correct.
> > Actually looking at the log it's already muted 02ac:00000003, so I assume it found that outside of the text log you sent.
> > Other strange things here:
> > We DUB 000000000000-7FFFFFFFFFFF twice and don't get collisions either time, although that might just be a timing thing.
>
> The full UIDs of the 4 fixtures are 02ac:00000001, 02ac:00000002, 02ac:00000003 and 02ac:00000004.
> For brevity I refer to them as 1, 2, 3 and 4.
>
> The strange UIDs 6 and 7 do not exist. The controller sends DISC_MUTE to them, but they don't appear in the final list; maybe they are a consequence of collisions. I am not sure whether this is normal, but it also happens with the XMT-350. Here is the sniffer log for 4 fixtures with UIDs 1, 2, 3 and 4:
> XMT-350 FIX=4 DISC 4 (DID=1,2,3,4).txt

So the XMT-350 log also finds 6, mutes it once and then continues DUBing. So I think this is down to how different things respond to a collision that generates a good checksum. I suspect OLA is being a bit too defensive and assuming the device just doesn't respond to mute properly, whereas it seems we should branch/DUB a bit more first before writing it off as a bad device. I think some olad -l 4 logs are the next step just to confirm my guesswork from the RDM captures.
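
The difference between the two strategies can be illustrated with a toy simulation. The sketch below is not OLA's DiscoveryAgent code: `Line`, `discover` and the `split_on_phantom` flag are hypothetical names, the line is modelled as a perfect bitwise OR of every unmuted responder's reply (the model arrived at later in this thread), and timing effects are ignored.

```python
from functools import reduce
from operator import or_


def encode(uid):
    """12 EUID bytes plus 4 encoded checksum bytes for a 48-bit UID."""
    euid = bytes(x for b in uid.to_bytes(6, "big") for x in (b | 0xAA, b | 0x55))
    hi, lo = divmod(sum(euid) & 0xFFFF, 256)
    return euid + bytes([hi | 0xAA, hi | 0x55, lo | 0xAA, lo | 0x55])


def decode(frame):
    """Return the UID if the checksum matches, else None (a visible collision)."""
    euid, ecs = frame[:12], frame[12:]
    checksum = ((ecs[0] & ecs[1]) << 8) | (ecs[2] & ecs[3])
    if checksum != (sum(euid) & 0xFFFF):
        return None
    return int.from_bytes(bytes(euid[i] & euid[i + 1] for i in range(0, 12, 2)), "big")


class Line:
    """Toy RDM line: every unmuted responder in range replies, ORed together."""

    def __init__(self, uids):
        self.present = set(uids)
        self.unmuted = set(uids)

    def dub(self, lo, hi):
        frames = [encode(u) for u in self.unmuted if lo <= u <= hi]
        if not frames:
            return None
        return bytes(reduce(or_, column) for column in zip(*frames))

    def mute(self, uid):
        self.unmuted.discard(uid)
        return uid in self.present  # a phantom UID never acknowledges


def discover(line, lo, hi, split_on_phantom, found):
    while True:
        frame = line.dub(lo, hi)
        if frame is None:
            return found
        uid = decode(frame)
        if uid is not None and uid not in found and line.mute(uid):
            found.add(uid)
            continue
        if uid is not None and not split_on_phantom:
            return found  # defensive strategy: write the branch off
        if lo == hi:
            return found
        mid = (lo + hi) // 2
        discover(line, lo, mid, split_on_phantom, found)
        discover(line, mid + 1, hi, split_on_phantom, found)
        return found


uids = [0x02AC00000001 + i for i in range(4)]
for split in (False, True):
    result = discover(Line(uids), 0, 0xFFFFFFFFFFFF, split, set())
    print("split_on_phantom =", split, sorted(f"{u:012x}" for u in result))
```

With the four sequential UIDs from this thread, the defensive strategy finds nothing in this idealised model, because the very first DUB already decodes to the phantom 02ac:00000007, while the splitting strategy finds all four. Real lines are presumably less clean, which would fit the varying partial results reported above.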

> > I'd again suggest OLA's RDM sniffer code (which should work with your Enttec), and may produce nicer output:
> > https://www.openlighting.org/rdm-tools/rdm-analyzers/enttec-sniffer/
>
> I tried to run rdmpro_sniffer [ options ] <usb-device-path> but I can't work out what <usb-device-path> should be.
> The example says rdmpro_sniffer -r /dev/tty.usbserial-00001014
> On my Raspberry Pi there is no such file or anything similar:

I think you want /dev/ttyUSB0, I believe the other format is Mac style.
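
A quick way to narrow down the candidates without reading the whole /dev listing is to glob for the two naming styles mentioned here. This is a small sketch; `candidate_serial_ports` is a made-up helper, and the two patterns are simply the Linux-style and Mac-style names seen in this thread, not an exhaustive list.

```python
import glob


def candidate_serial_ports(dev_dir="/dev"):
    """List device nodes that look like USB serial adaptors."""
    patterns = ("ttyUSB*", "tty.usbserial-*")  # Linux style, Mac style
    return sorted(
        path for pattern in patterns for path in glob.glob(f"{dev_dir}/{pattern}")
    )


print(candidate_serial_ports())
```

On the Raspberry Pi listing above this would be expected to report /dev/ttyUSB0, which can then be passed to rdmpro_sniffer as the device path.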

@majcit

majcit commented Dec 5, 2018

> I think some olad -l 4 logs are the next step just to confirm my guesswork from the RDM captures.

I succeeded in capturing the olad -l 4 log stream:

4 fixtures, discovered 2 of them: 1,2
olad log, FIX=4 DISC 2 (DID=1,2).txt

4 fixtures, discovered 2 of them: 1, 3
olad log, FIX=4 DISC 2 (DID=1,3).txt

4 fixtures, discovered 1 of them: 3
olad log, FIX=4 DISC 1 (DID=3).txt

4 fixtures, discovered all of them: 1,2,3,4
olad log, FIX=4 DISC 4 (DID=1,2,3,4).txt

@majcit

majcit commented Dec 7, 2018

> I ran new trials, adding fixtures one by one, and observed the following:
> N: number of fixtures
> There is no problem when N<4; OLA always discovers all fixtures.
> When N>=4 and the device IDs are sequential, OLA finds a different, seemingly random number of fixtures (between 1 and N) each time discovery is pressed.
> For example, 6 sequential fixtures (all sequential):
> 2ee109a1-2ee109a2-2ee109a3-2ee109a4-2ee109a5-2ee109a6
>
> When N>=4 and the device IDs are not sequential, OLA always discovers all fixtures, even if discovery is pressed again immediately.
> For example, 15 non-sequential fixtures (5 groups, each group only 3 sequential):
> 2ee109a1-2ee109a2-2ee109a3
> 2ee10aa4-2ee10aa5-2ee10aa6
> 2ee10ba7-2ee10ba8-2ee10ba9
> 2ee10caa-2ee10cab-2ee10cac
> 2e94be67-2e94be68-2e94be69

Hi @peternewman, I have just observed something new.

The issue does not happen with other, new sequential UIDs.
I ran new tests with new UIDs, also all sequential, and the issue does not occur for these UIDs:
02ac2ea58b8d-02ac2ea58b8e-02ac2ea58b8f-02ac2ea58b90-02ac2ea58b91-02ac2ea58b92-02ac2ea58b93-02ac2ea58b94-02ac2ea58b95-02ac2ea58b96-02ac2ea58b97-02ac2ea58b98

I ran OLA discovery several times for 4 devices; it never skipped any device and found all 4 successfully. I then repeated this for 12 devices; again it never skipped any device and found all 12 successfully.

Here is the sniffer .txt for the 12 devices with the new UIDs:
FIX=12, DISC=12 (all successful).txt

I ran the test for the old, affected UIDs (2ee109a1-2ee109a2-2ee109a3-2ee109a4) again on exactly the same hardware and firmware, and the issue still exists for the old UIDs as before.

@peternewman
Member

> > I tried to run rdmpro_sniffer [ options ] <usb-device-path> but I can't work out what <usb-device-path> should be.
> > The example says rdmpro_sniffer -r /dev/tty.usbserial-00001014
> > On my Raspberry Pi there is no such file or anything similar:
>
> I think you want /dev/ttyUSB0, I believe the other format is Mac style.

I'd still be curious if this works if you've got five minutes to test it. I can then update the docs to clarify too.

Thanks @majcit , sorry I'd not got back to you. So I've found the source of the issue, it's down to how we behave when we get a response from a phantom UID.

02ac:00000002 = aa 57 ae fd aa 55 aa 55 aa 55 aa 57 ae 57 aa ff
02ac:00000004 = aa 57 ae fd aa 55 aa 55 aa 55 ae 55 ae 57 ae fd
bitwise or gives:
aa 57 ae fd aa 55 aa 55 aa 55 ae 57 ae 57 ae ff
Which if you decode the EUID is:
02ac:00000006
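
The arithmetic above can be reproduced in a few lines. This is a sketch assuming the usual DUB reply encoding, in which each UID byte b is sent as the pair (b | 0xAA, b | 0x55), followed by the 16-bit sum of those twelve bytes encoded the same way, and assuming a collision behaves as a bitwise OR. The helper names are made up for illustration.

```python
from functools import reduce
from operator import or_


def encode_dub(uid_text):
    """Build the 16 bytes shown above: 12 EUID bytes plus 4 checksum bytes."""
    raw = bytes.fromhex(uid_text.replace(":", ""))
    euid = bytes(x for b in raw for x in (b | 0xAA, b | 0x55))
    checksum = sum(euid) & 0xFFFF
    hi, lo = checksum >> 8, checksum & 0xFF
    return euid + bytes([hi | 0xAA, hi | 0x55, lo | 0xAA, lo | 0x55])


def decode_dub(frame):
    """Return (uid_text, checksum_ok) for a 16-byte EUID plus checksum frame."""
    euid, ecs = frame[:12], frame[12:16]
    raw = bytes(euid[i] & euid[i + 1] for i in range(0, 12, 2))
    checksum = ((ecs[0] & ecs[1]) << 8) | (ecs[2] & ecs[3])
    uid_text = f"{raw[:2].hex()}:{raw[2:].hex()}"
    return uid_text, checksum == (sum(euid) & 0xFFFF)


def collide(*frames):
    """Model overlapping replies as a byte-wise OR."""
    return bytes(reduce(or_, column) for column in zip(*frames))


two = encode_dub("02ac:00000002")
four = encode_dub("02ac:00000004")
merged = collide(two, four)
print(merged.hex(" "))
print(decode_dub(merged))
```

Under these assumptions the merged frame decodes to 02ac:00000006 and its checksum also verifies, so a controller cannot tell it apart from a genuine single reply without a follow-up mute.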

We were seeing 6, failing to mute it, and giving up on that whole branch. Our test code was also broken, so although our tests passed, they didn't actually test this particular issue. Fixing the test code made the tests fail, so I've then been able to fix and test the actual discovery code.

If you add the changes in DiscoveryAgent*.cpp from here and recompile, your issue should be fixed:
#1520

Although there's a slightly more optimised fix I'm working on too...

@majcit

majcit commented Dec 7, 2018

> I'd still be curious if this works if you've got five minutes to test it. I can then update the docs to clarify too.

Sure, I will do it as soon as I get to my workplace.

> So I've found the source of the issue, it's down to how we behave when we get a response from a phantom UID.
> 02ac:00000002 = aa 57 ae fd aa 55 aa 55 aa 55 aa 57 ae 57 aa ff
> 02ac:00000004 = aa 57 ae fd aa 55 aa 55 aa 55 ae 55 ae 57 ae fd
> bitwise or gives:
> aa 57 ae fd aa 55 aa 55 aa 55 ae 57 ae 57 ae ff
> Which if you decode the EUID is:
> 02ac:00000006

Glad to hear the issue is traced, thanks.
Actually, I had also simulated the collision manually by bitwise ANDing 2 and 4, which resulted in 2 with a coincidentally valid checksum, since I assumed 0 is dominant (sinking 1 to 0). I didn't think of OR!

> Although there's a slightly more optimised fix I'm working on too...

I would kindly request two more things; if possible, please consider them for future releases:
1. After running discovery, a brief message stating how many devices were found would be very useful for verifying that discovery was successful.
2. An automatic patching option for UIDs sorted in ascending/descending order could speed up patching for sequentially mounted devices.

@peternewman
Member

> > I'd still be curious if this works if you've got five minutes to test it. I can then update the docs to clarify too.
>
> Sure, I will do it as soon as I get to my workplace.

Thanks.

> Glad to hear the issue is traced, thanks.
> Actually, I had also simulated the collision manually by bitwise ANDing 2 and 4, which resulted in 2 with a coincidentally valid checksum, since I assumed 0 is dominant (sinking 1 to 0). I didn't think of OR!

I think it's the reverse, the pull up wins. Certainly the OR generates the DUB reply your packet capture includes.

> > Although there's a slightly more optimised fix I'm working on too...
>
> I would kindly request two more things; if possible, please consider them for future releases:
> 1. After running discovery, a brief message stating how many devices were found would be very useful for verifying that discovery was successful.

Do you mean in the olad log, or the output of ola_rdm_discover?

> An automatic patching option for UIDs sorted in ascending/descending order could speed up patching for sequentially mounted devices.

There is already auto-patch on the web UI (see the wand button on the UI). The code for this is here, but currently it only sorts by footprint:
https://github.com/OpenLightingProject/ola/blob/master/javascript/ola/full/rdm_patcher.js

I'm not sure patching by UID is relevant, aside from maybe a large architectural install, the chances of getting devices with UIDs in any logical order is fairly slim and even in that scenario, fitting them to the building in the correct order will still be quite a hassle; identifying using RDM and addressing appropriately may be just as quick.

For both requests, you're probably better off starting new issues with a bit more detail anyway.
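
To make the UID-ordered patching request concrete, here is a minimal sketch of what such an ordering could look like. It is not the web UI's auto-patch code (which, as noted, sorts by footprint); `patch_by_uid` is a hypothetical helper, the footprint of 4 slots is invented for the example, and UIDs are assumed to be fixed-width lowercase strings so that plain string sorting matches numeric order.

```python
DMX_SLOTS = 512


def patch_by_uid(devices, first_address=1, descending=False):
    """Assign consecutive start addresses to (uid, footprint) pairs in UID order."""
    patch = {}
    address = first_address
    for uid, footprint in sorted(devices, key=lambda d: d[0], reverse=descending):
        if address + footprint - 1 > DMX_SLOTS:
            raise ValueError(f"{uid} does not fit in the universe")
        patch[uid] = address
        address += footprint
    return patch


fixtures = [("02ac:00000003", 4), ("02ac:00000001", 4), ("02ac:00000002", 4)]
print(patch_by_uid(fixtures))
```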

@majcit

majcit commented Dec 10, 2018

> > I'd still be curious if this works if you've got five minutes to test it. I can then update the docs to clarify too.
>
> Sure, I will do it as soon as I get to my workplace.

OK, I am trying to save a sniffer log file, but without success :/
This is what I did:

Both the ENTTEC PRO and the ENTTEC PRO Mk2 are connected to the Raspberry Pi.
OLA Admin (:9090) can detect both the PRO and the PRO Mk2.

I added only the PRO as a port for discovery in OLA Admin.
I did not add the PRO Mk2.

4 fixtures (02ac:2ee109a1-02ac:2ee109a2-02ac:2ee109a3-02ac:2ee109a4) are connected to ProMk2:DMX1(Female).
ProMk2:DMX1(Male) is connected to Pro:DMXOUT.

When I run discovery, OLA discovers the fixtures + PRO Mk2 as expected.

I started logging with:
root@raspberrypi:~# rdmpro_sniffer -w /tmp/log /dev/ttyUSB0

I pressed "run full RDM discovery" from :9090.

I then tried to view the logged file:
root@raspberrypi:~# rdmpro_sniffer -p /tmp/log

But nothing is logged! The file is empty.

Did I do this right?

@peternewman
Member

> But nothing is logged! The file is empty.
>
> Did I do this right?

I get the feeling there is probably a command to put the newer devices into sniffer mode, which perhaps we're not sending. The old ones used to require a different firmware.

If you've got a bit of time, I'd be curious to try with the setup reversed, so Mk2 as controller and Pro as sniffer. Perhaps more likely to work: stop olad and use the Enttec software on another machine to run the discovery, so the Pi is just doing sniffing (again, if you could try both models taking it in turns to sniff).

Back on the original bug front, this is the more optimised version of my fix if you'd like to give it a go:
#1522

@majcit

majcit commented Dec 10, 2018

> If you've got a bit of time, I'd be curious to try with the setup reversed, so Mk2 as controller and Pro as sniffer. Perhaps more likely to work: stop olad and use the Enttec software on another machine to run the discovery, so the Pi is just doing sniffing (again, if you could try both models taking it in turns to sniff).

Sure. After stopping olad, it seems to be working:

I have 2 Raspberry Pis, so I stopped olad and ran rdmpro_sniffer on Pi #1 with the Mk2,
then I ran full discovery on Pi #2 with the PRO.
4 fixtures are connected to Mk2:DMX1(female).
Mk2:DMX1(male) is connected to PRO:DMX OUT(female).

After running full discovery on Pi #2 a few times, discovery successfully found the 4 fixtures + PRO Mk2.
The log file content follows:

root@raspberrypi:~# rdmpro_sniffer -w /tmp/log /dev/ttyUSB0
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
^Z
[1]+  Stopped                 rdmpro_sniffer -w /tmp/log /dev/ttyUSB0
root@raspberrypi:~# rdmpro_sniffer -p /tmp/log
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
root@raspberrypi:~#

@majcit

majcit commented Dec 10, 2018

And here is the reverse:

Sniffer on Pi #1 with the PRO
Discovery on Pi #2 with the Mk2

root@raspberrypi:~# rdmpro_sniffer -w /tmp/log /dev/ttyUSB0
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
^Z
[1]+  Stopped                 rdmpro_sniffer -w /tmp/log /dev/ttyUSB0
root@raspberrypi:~# rdmpro_sniffer -p /tmp/log
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
root@raspberrypi:~#
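
To compare runs like the two above without counting lines by eye, the rejected packet labels can be tallied from the captured output. This is only a small sketch: `count_rejected_labels` is a hypothetical helper, and it assumes the output was saved as text in exactly the `Not a SNIFFER_PACKET, was N` form shown in these logs.

```python
import re
from collections import Counter

# Matches the message printed by tools/rdmpro/rdm-sniffer.cpp in the logs above.
LOG_LINE = re.compile(r"Not a SNIFFER_PACKET, was (\d+)")


def count_rejected_labels(log_text: str) -> Counter:
    """Count how often each non-sniffer label appears in captured output."""
    return Counter(int(match.group(1)) for match in LOG_LINE.finditer(log_text))


if __name__ == "__main__":
    captured = (
        "tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5\n"
        "tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5\n"
    )
    for label, count in count_rejected_labels(captured).items():
        print(f"label {label}: {count} message(s)")
```

Run against the two captures, this would make the difference between the Mk2 and PRO sniffing sessions (2 versus 20 rejected messages per command) easy to see at a glance.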

@peternewman
Member

If you've got a bit of time, I'd be curious to try with the setup reversed, so Mk2 as controller, Pro as sniffer and perhaps more likely to do things, stop olad and use the Enttec software on another machine to run the discovery, so the Pi is just doing sniffing (again if you could try both models taking it in turns to sniff).

Sure. After stopping olad, it seems to be working:

I have two Raspberry Pis, so I stopped olad and ran rdmpro_sniffer on Pi #1 with the Mk2,
then ran full discovery on Pi #2 with the PRO.
4 fixtures are connected to Mk2:DMX1 (female).
Mk2:DMX1 (male) is connected to PRO:DMX OUT (female).

After running full discovery on Pi #2 a few times, discovery successfully found the 4 fixtures + the PRO Mk2.
The log file content follows:

root@raspberrypi:~# rdmpro_sniffer -w /tmp/log /dev/ttyUSB0
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
^Z
[1]+  Stopped                 rdmpro_sniffer -w /tmp/log /dev/ttyUSB0
root@raspberrypi:~# rdmpro_sniffer -p /tmp/log
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
tools/rdmpro/rdm-sniffer.cpp:208: Not a SNIFFER_PACKET, was 5
root@raspberrypi:~#

I can't find anything in the Enttec API docs either, so we probably need some USB packet sniffing (e.g. Wireshark) to find out what's going on using a machine with the Enttec software, but I'm guessing there is a way to switch the Enttec into RDM mode. I'm not clear if the USB licence dongle is actually required to unlock the Enttec, or only to unlock the software.

I've also just rediscovered this, so it won't work on the Mk2 without some more work (I suspect you need an actual Enttec RDM Pro for our code to work):
https://github.com/OpenLightingProject/ola/blob/master/tools/rdmpro/README.md

I think even if you add --display-dmx to your command, you won't see DMX traffic as it's not in sniffer mode, which our code expects.
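
For context on the "Not a SNIFFER_PACKET, was 5" lines: the Enttec USB Pro serial protocol wraps each message as 0x7E, a label byte, a two-byte little-endian length, the payload, then 0xE7, and label 5 is the API's "Received DMX Packet" message. Below is a minimal Python sketch of that framing, not OLA's implementation. `parse_frames` and `is_sniffer_packet` are hypothetical helper names, and the 0x81 sniffer label is an assumption that should be checked against tools/rdmpro/rdm-sniffer.cpp.

```python
START_OF_MESSAGE = 0x7E
END_OF_MESSAGE = 0xE7
RECEIVED_DMX_LABEL = 5        # "Received DMX Packet" in the Enttec USB Pro API
SNIFFER_PACKET_LABEL = 0x81   # assumption: verify against rdm-sniffer.cpp


def parse_frames(buf: bytes):
    """Yield (label, payload) for each complete frame found in buf."""
    i = 0
    # The smallest possible frame is 5 bytes: start, label, 2 length bytes, end.
    while i + 5 <= len(buf):
        if buf[i] != START_OF_MESSAGE:
            i += 1  # resynchronise on the next possible start byte
            continue
        label = buf[i + 1]
        length = buf[i + 2] | (buf[i + 3] << 8)  # LSB first, then MSB
        end = i + 4 + length                      # index of the end delimiter
        if end >= len(buf):
            break  # frame is incomplete; wait for more data
        if buf[end] != END_OF_MESSAGE:
            i += 1  # not a valid frame at this offset, keep scanning
            continue
        yield label, buf[i + 4:end]
        i = end + 1


def is_sniffer_packet(label: int) -> bool:
    """True if the frame label is the (assumed) sniffer packet label."""
    return label == SNIFFER_PACKET_LABEL


if __name__ == "__main__":
    # One label-5 frame with a two-byte payload, as a widget in normal
    # (non-sniffer) mode might send.
    data = bytes([0x7E, 5, 2, 0, 0x00, 0x00, 0xE7])
    for label, payload in parse_frames(data):
        if not is_sniffer_packet(label):
            print(f"Not a sniffer packet, label was {label}")
```

If the widget only ever emits label 5 frames, a tool that accepts only sniffer-labelled frames will reject all of them, which matches the output above.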

@majcit

majcit commented Dec 12, 2018

I assume your responder is closed source? Would you be able to loan me some, or at least the control board guts/minimum bit to do RDM responding, (in London), if you don't have the RDM analyser/logic sniffer kit?

Our device is closed source. I am the designer and programmer of the device, and I would be delighted to have my design examined by OLA engineers! :)
However, I have limited authority to send devices out, and I need to obtain permission from an executive.
I am ready to run any tests you would like remotely, as soon as possible. Please don't hesitate to ask.
Regards.

@peternewman peternewman modified the milestones: 0.11.0, 0.future Aug 13, 2023