Improve adjustment.interpolate_to_reference #310

Open
r-pascua opened this issue Apr 30, 2024 · 3 comments
@r-pascua (Contributor)

We came across a silent failure mode while running the validation pipeline, which should at least be fixed to break loudly. The gist of it: the validation simulations were made on an LST grid that starts at 0 hours LST and ends after one day, but the real data often starts somewhere before the phase wrap at $2\pi$ and continues past it. The `adjustment.interpolate_to_reference` function currently assumes that if there's a phase wrap, then the time array is contiguous, but the validation simulations break this assumption (i.e., the data at an LST close to, but less than, $2\pi$ has a later time stamp than data at an LST close to, but greater than, zero); see the toy sketch below.

The right way to deal with this isn't immediately obvious to me, unless we can assume that the data is always sorted in baseline-time order, or we have a method of reliably obtaining the ordering used for the time axis. As far as I know, the first is probably true for most of our use cases (but not in general), and I have vague recollections of trying to get the `blt_order` for other data to no avail (i.e., it was set to `None`).
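To make the broken assumption concrete, here is a minimal sketch (the LST/time values are made up for illustration):

```python
import numpy as np

# LSTs in observing order, spanning the phase wrap at 2*pi (toy values).
lsts = np.array([6.20, 6.27, 0.01, 0.08])

# Real data: time stamps increase across the wrap (contiguous), so
# sorting by time preserves the observing order.
times_real = np.array([0.10, 0.20, 0.30, 0.40])
assert np.array_equal(lsts[np.argsort(times_real)], lsts)

# Validation sims: the grid starts at 0 hours LST, so the pre-wrap
# samples carry *later* time stamps than the post-wrap samples.
times_sim = np.array([0.90, 0.95, 0.10, 0.20])

# Sorting by time now scrambles the observing order, and nothing errors.
print(lsts[np.argsort(times_sim)])  # [0.01 0.08 6.2  6.27]
```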

r-pascua self-assigned this Apr 30, 2024
@steven-murray (Contributor)

Hmmm, I'm trying to digest what you're saying the error is. In the `interpolate_to_reference` function, you just want to go from LST1 to LST2, right? Is the problem that the LSTs aren't ordered in the data? You seem to be saying the problem has to do with the times themselves, not the LSTs... but why is that a problem?

@r-pascua (Contributor, Author) commented Apr 30, 2024

This is a problem because I had been using the time axis to sort the LST axis as a way to avoid phase wraps causing the sorting to go sideways. I do something like this:

```python
import numpy as np

# Unique times come back sorted; use the same indices to pull the
# LSTs into matching (time-sorted) order.
time, inds = np.unique(uvdata.time_array, return_index=True)
lsts = uvdata.lst_array[inds]
```

Here's an MWE of a situation where `np.unique` will fail when used on either the `time_array` or the `lst_array`:

```python
# LSTs in observing order: just below the 2*pi wrap, then just past zero...
lst_array = [6.24, 0.01]
# ...while the (simulation) time stamps are not contiguous across the wrap.
time_array = [0.99, 0.01]
```
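Running the time-sort heuristic on this toy input shows the silent failure:

```python
import numpy as np

lst_array = np.array([6.24, 0.01])
time_array = np.array([0.99, 0.01])

# Sorting by time puts LST 0.01 first, even though it was observed last.
_, inds = np.unique(time_array, return_index=True)
print(lst_array[inds])       # [0.01 6.24] -- observing order was [6.24, 0.01]

# Sorting the LSTs directly hits the same problem at the phase wrap.
print(np.unique(lst_array))  # [0.01 6.24]
```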

One option is to unwrap the LST array before sorting. I think this should generally be a viable option, though it will fail in edge cases where the observation run covers more than a day. (It will also fail in obscure cases where the time array isn't sorted properly, but I think it should fail there since the user isn't setting things up properly.) Here's the proposed change:

```python
# Unwrap: anything "before" the first LST must have wrapped past 2*pi.
lst_array = uvdata.lst_array.copy()
lst_array[lst_array < lst_array[0]] += 2 * np.pi
lsts = np.unique(lst_array)
```
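Applied to the toy arrays above, the unwrap restores the observing order:

```python
import numpy as np

lst_array = np.array([6.24, 0.01])
lst_array[lst_array < lst_array[0]] += 2 * np.pi  # becomes [6.24, 6.2931...]
print(np.unique(lst_array))  # [6.24 6.2931...] -- observing order preserved
```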

There's one other edge case I can think of that could cause a hiccup here, and it's relevant to the validation simulations. Suppose I have a set of data files where the first LST is at 0 hours and the last LST is at 24 hours (minus epsilon), and the first time is smaller than the last time. Now suppose I want to read in files that start at 22 hours LST and end at 6 hours LST. If I read the data via `uvdata = UVData.from_file(file_list)`, the above fix will fail, since pyuvdata will sort the data so that the earliest time comes first. The proposed solution only works in this case if I force pyuvdata to skip its checks and concatenate along the blt axis, assuming the files were provided in the right order (a toy version of this failure is below). It's a weird corner to be in, but this is precisely the edge case that broke things and led to this issue.
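Here is that failure with made-up values (22 hours LST is roughly 5.76 rad). After pyuvdata's time sort, the first element is the global minimum, so the unwrap never triggers:

```python
import numpy as np

# Observing order was 22h -> 24h LST, then 0h -> 6h LST, but pyuvdata's
# time sort (sim times start at 0 hours LST) swaps the two blocks.
lst_array = np.array([0.5, 1.0, 1.5, 5.8, 6.0, 6.2])  # radians, time-sorted

# The unwrap keys off the first element, which is now the smallest LST,
# so nothing gets shifted by 2*pi and the inferred order stays wrong.
lst_array[lst_array < lst_array[0]] += 2 * np.pi
print(np.unique(lst_array))  # [0.5 1.  1.5 5.8 6.  6.2]
```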

Aside from this, I'm having trouble thinking up other edge cases, so maybe this is the obvious solution that I missed when I wrote this code years ago. Curious to hear your thoughts.

@steven-murray (Contributor)

@r-pascua is the assumption that each baseline has the same unique times associated with it? In that case, you could do something like

```python
# Grab the blt indices for one antpair, then read off its times.
inds = uvd.antpair2ind(uvd.get_antpairs()[0])
unique_times = uvd.time_array[inds]
```

However, I'm still not sure I fully understand the problem, so maybe this isn't quite what you're looking for. If what you need is all the times/LSTs associated with each antpair (regardless of whether they're sorted), then I think the above is it.

That said, if you're ordering the files as you want, then you should turn off the re-ordering in the pyuvdata reader; that's both faster and gives you what you want. A sketch of what I mean is below.
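A minimal sketch of what that might look like, assuming pyuvdata's multi-file read options behave as documented (`axis="blt"` for fast concatenation in the given order, `run_check=False` to skip the consistency checks):

```python
from pyuvdata import UVData

file_list = [...]  # your files, already in observing order

uvd = UVData()
# Fast-concatenate along the blt axis instead of letting pyuvdata
# re-sort by time, and skip the consistency checks.
uvd.read(file_list, axis="blt", run_check=False)
```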

I hope I'm understanding the problem :-P
