You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We're noticing a ~5-10s read time for get_new_corr in the hera_snap_redis_monitor.
I've attached the relevant slack conversation too.
Slack Convo
Matthew Kolopanis 8 days ago
dave and I narrowed down the snap monitor delay a bit. it's taking at least 10s for the corr.get_new_corr(steam, steam) call to complete for each ADC lane for each snap.
🐢
1
🐌
1
🦥
1
🕸️
1
🐳
1
Lindsay Berkhout 8 days ago
That function has been very inconsistent for me in my ASU setup. Sometimes it takes no time at all and sometimes it takes quite long to get data. I've never figured out what conditions cause what.
Matthew Kolopanis 8 days ago
fun. It seems to wait twice for the register named acc_cnt to increment before reading the dout register
Matthew Kolopanis 8 days ago
I was timing this acc_cnt incrementing to ~5s per time (so twice is 10s obviously)
Matthew Kolopanis 8 days ago
just saw one that took 16s
Bryna Hazelton 8 days ago
It must not always have been this slow
Matthew Kolopanis 8 days ago
Dave and I came to the same conclusion. I remember this script used to move through the entire array pretty quickly.
Bryna Hazelton 8 days ago
do we know what increments the acc_cnt?
Jonathon Kocz 8 days ago
It should be based on a specified integration time for the onboard xcorr block. I'll take a look and see if I can work out what it's set too and if/why it's slow.
Matthew Kolopanis 8 days ago
I can tell you the default acc_len for the Corr block objects is 3815 if that helps off the top of your head.
Jonathon Kocz 7 days ago
that's an interesting number... do we know why/how it was set originally? - Looking at the code, it interprets acc_len in clock cycles (not spectra) and it takes 1024 clock cycles to get 1 spectra, so 3815 is a weird number to have... (I'm wondering if I'm missing something here...) (edited)
Jonathon Kocz 7 days ago
It's further multiplied by 8 (there are 8 inputs to cycle through) in the code before being used. So a value of 3815 becomes 30520.
(With on spectra every 1024 clocks x 8 inputs, it takes 8192 clock cycles to set through a full set of spectra from each input.) (edited)
Jonathon Kocz 7 days ago
So to return one element of a get_new_corr with an acc_len of 3815, it's asking for an average of ~3.7 spectra, which should take ~ 122us to complete.
Jonathon Kocz 7 days ago
So I'm pretty sure I'm missing something - e.g. why 3815, but that's the initial calculation.
Dave MacMahon 7 days ago
I'm pretty sure the code writes 1024acc_len to the identically named acc_len register, so using the default of acc_len=3815 would write 10243815 == 3906560 to the acc_len register. I think this means that it takes just over 1/8 of a second.
Matthew Kolopanis 7 days ago
but either way 122microsecond, 125miliseconds is a lot faster than 5s
Danny Jacobs 5 days ago
Bumping this because its slowed our telemetry update period from 1m to 30m. Its really important for observers to be able to see when things change. @Jonathon what would be a reasonable acc_len?
Dave MacMahon 5 days ago
One thing that could help is to parallelize this over multiple snaps, but I think that was not done because that makes (made?) communications with the snaps less reliable. One thing that could alleviate that problem would be to manually/administratively pre-populate the ARP table on hera-snap-head. That should minimize if not eliminate the broadcast traffic to the snaps that (I think) is the cause of instability during parallel access to the snaps.
Danny Jacobs 4 days ago
But is there a reason it takes 5s to read out a few ms long integration?
Dave MacMahon 4 days ago
That is it's own issue that is independent of serialized-vs-parallelized reading of the snaps. The actual readout happens very quickly. It's the waiting for acc_cnt to increment that is taking longer than expected.
Jonathon Kocz 2 days ago
It should certainly be safe to parallelize this now (the comms problems we had should be fixed) so that would solve one aspect. I'm still not sure why it's taking so long for acc_cnt to increment. - Where is the monitoring code that does the polling? I was wondering if it's switching up the inputs too quickly and it delays the acc_cnt? (Though if it's just capturing one at a time then incrementing, I'm not sure why that would be the case).
Matthew Kolopanis 2 days ago
hera_corr_f/control_software/scripts/hera_snap_redis_monitor.py
also regarding the comms issues should be fixed. we were still seeing tftp errors when dave and I were looking at this last week. Does that imply the snaps are getting initialized/programed with an old fpg file?
Jonathon Kocz 2 days ago
Mostly - there should now only be tftp errors if a snap is dead / needs a power cycle.
Jonathon Kocz 2 days ago
The only other time there migth be an error is during the programming.
Jonathon Kocz 2 days ago
(as the "golden image" is still the old code).
Matthew Kolopanis 2 days ago
okay, makes some sense :thinking_face:
Jonathon Kocz 2 days ago
Certainly from what I've seen from the ADC comms etc which are all in parallel now, there are only errors when a snap has got into a bad state and needs resetting.
Jonathon Kocz 2 days ago
Confirmed also as Dave said the code writes 1024*acc_len, which is then further *8 on the FPGA, so it's 31,252,480 clock cycles per integration, so ~125ms per spectra.
I'll check the monitor, but if we're reading the full matrix of 6 inputs, that's 21 combinations, which would be ~2.63s, so x2 would give 5.25, so I could see how it would be a ~ 5 second wait (for a full set / snap). If we're just reading the 3 autocorrs, obviously that's not the reason.
Matthew Kolopanis 2 days ago
but it is taking 5 seconds per input
Matthew Kolopanis 2 days ago
maybe i'm missing something here
Jonathon Kocz 2 days ago
probably not :)
Jonathon Kocz 2 days ago
if it's 5 seconds / input, then something is very broken, I'm just not sure what yet.
The text was updated successfully, but these errors were encountered:
We're noticing a ~5-10s read time for
get_new_corr
in thehera_snap_redis_monitor
.I've attached the relevant slack conversation too.
Slack Convo
Matthew Kolopanis 8 days ago
dave and I narrowed down the snap monitor delay a bit. it's taking at least 10s for the corr.get_new_corr(steam, steam) call to complete for each ADC lane for each snap.
🐢
1
🐌
1
🦥
1
🕸️
1
🐳
1
Lindsay Berkhout 8 days ago
That function has been very inconsistent for me in my ASU setup. Sometimes it takes no time at all and sometimes it takes quite long to get data. I've never figured out what conditions cause what.
Matthew Kolopanis 8 days ago
fun. It seems to wait twice for the register named acc_cnt to increment before reading the dout register
Matthew Kolopanis 8 days ago
I was timing this acc_cnt incrementing to ~5s per time (so twice is 10s obviously)
Matthew Kolopanis 8 days ago
just saw one that took 16s
Bryna Hazelton 8 days ago
It must not always have been this slow
Matthew Kolopanis 8 days ago
Dave and I came to the same conclusion. I remember this script used to move through the entire array pretty quickly.
Bryna Hazelton 8 days ago
do we know what increments the acc_cnt?
Matthew Kolopanis 8 days ago
@Jonathon?
Jonathon Kocz 8 days ago
It should be based on a specified integration time for the onboard xcorr block. I'll take a look and see if I can work out what it's set too and if/why it's slow.
Matthew Kolopanis 8 days ago
I can tell you the default acc_len for the Corr block objects is 3815 if that helps off the top of your head.
Jonathon Kocz 7 days ago
that's an interesting number... do we know why/how it was set originally? - Looking at the code, it interprets acc_len in clock cycles (not spectra) and it takes 1024 clock cycles to get 1 spectra, so 3815 is a weird number to have... (I'm wondering if I'm missing something here...) (edited)
Jonathon Kocz 7 days ago
It's further multiplied by 8 (there are 8 inputs to cycle through) in the code before being used. So a value of 3815 becomes 30520.
(With on spectra every 1024 clocks x 8 inputs, it takes 8192 clock cycles to set through a full set of spectra from each input.) (edited)
Jonathon Kocz 7 days ago
So to return one element of a get_new_corr with an acc_len of 3815, it's asking for an average of ~3.7 spectra, which should take ~ 122us to complete.
Jonathon Kocz 7 days ago
So I'm pretty sure I'm missing something - e.g. why 3815, but that's the initial calculation.
Dave MacMahon 7 days ago
I'm pretty sure the code writes 1024acc_len to the identically named acc_len register, so using the default of acc_len=3815 would write 10243815 == 3906560 to the acc_len register. I think this means that it takes just over 1/8 of a second.
Matthew Kolopanis 7 days ago
but either way 122microsecond, 125miliseconds is a lot faster than 5s
Danny Jacobs 5 days ago
Bumping this because its slowed our telemetry update period from 1m to 30m. Its really important for observers to be able to see when things change. @Jonathon what would be a reasonable acc_len?
Dave MacMahon 5 days ago
One thing that could help is to parallelize this over multiple snaps, but I think that was not done because that makes (made?) communications with the snaps less reliable. One thing that could alleviate that problem would be to manually/administratively pre-populate the ARP table on hera-snap-head. That should minimize if not eliminate the broadcast traffic to the snaps that (I think) is the cause of instability during parallel access to the snaps.
Danny Jacobs 4 days ago
But is there a reason it takes 5s to read out a few ms long integration?
Dave MacMahon 4 days ago
That is it's own issue that is independent of serialized-vs-parallelized reading of the snaps. The actual readout happens very quickly. It's the waiting for acc_cnt to increment that is taking longer than expected.
Jonathon Kocz 2 days ago
It should certainly be safe to parallelize this now (the comms problems we had should be fixed) so that would solve one aspect. I'm still not sure why it's taking so long for acc_cnt to increment. - Where is the monitoring code that does the polling? I was wondering if it's switching up the inputs too quickly and it delays the acc_cnt? (Though if it's just capturing one at a time then incrementing, I'm not sure why that would be the case).
Matthew Kolopanis 2 days ago
hera_corr_f/control_software/scripts/hera_snap_redis_monitor.py
also regarding the comms issues should be fixed. we were still seeing tftp errors when dave and I were looking at this last week. Does that imply the snaps are getting initialized/programed with an old fpg file?
Jonathon Kocz 2 days ago
Mostly - there should now only be tftp errors if a snap is dead / needs a power cycle.
Jonathon Kocz 2 days ago
The only other time there migth be an error is during the programming.
Jonathon Kocz 2 days ago
(as the "golden image" is still the old code).
Matthew Kolopanis 2 days ago
okay, makes some sense :thinking_face:
Jonathon Kocz 2 days ago
Certainly from what I've seen from the ADC comms etc which are all in parallel now, there are only errors when a snap has got into a bad state and needs resetting.
Jonathon Kocz 2 days ago
Confirmed also as Dave said the code writes 1024*acc_len, which is then further *8 on the FPGA, so it's 31,252,480 clock cycles per integration, so ~125ms per spectra.
I'll check the monitor, but if we're reading the full matrix of 6 inputs, that's 21 combinations, which would be ~2.63s, so x2 would give 5.25, so I could see how it would be a ~ 5 second wait (for a full set / snap). If we're just reading the 3 autocorrs, obviously that's not the reason.
Matthew Kolopanis 2 days ago
but it is taking 5 seconds per input
Matthew Kolopanis 2 days ago
maybe i'm missing something here
Jonathon Kocz 2 days ago
probably not :)
Jonathon Kocz 2 days ago
if it's 5 seconds / input, then something is very broken, I'm just not sure what yet.
The text was updated successfully, but these errors were encountered: