Improve adjustment.interpolate_to_reference #310

Open
r-pascua opened this issue Apr 30, 2024 · 3 comments
@r-pascua (Contributor)

We came across a silent failure mode while running the validation pipeline, which should at least be fixed to break loudly. The gist of it: the validation simulations were made on an LST grid that starts at 0 hours LST and ends after one day, but the real data often starts somewhere before the phase wrap at $2\pi$ and continues past it. The `adjustment.interpolate_to_reference` function currently assumes that if there's a phase wrap, then the time array is contiguous, but the validation simulations break this assumption (i.e., the data at an LST close to, but less than, $2\pi$ has a later time stamp than data at an LST close to, but greater than, zero); see the toy sketch below.

The right way to deal with this isn't immediately obvious to me, unless we can assume that the data is always sorted in baseline-time order, or we have a method of reliably obtaining the ordering used for the time axis. As far as I know, the first is probably true for most of our use cases (but not in general), and I have vague recollections of trying to get the `blt_order` for other data to no avail (i.e., it was set to `None`).
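To make the broken assumption concrete, here is a minimal sketch (the LST/time values are made up for illustration):

```python
import numpy as np

# LSTs in observing order, spanning the phase wrap at 2*pi (toy values).
lsts = np.array([6.20, 6.27, 0.01, 0.08])

# Real data: time stamps increase across the wrap (contiguous), so
# sorting by time preserves the observing order.
times_real = np.array([0.10, 0.20, 0.30, 0.40])
assert np.array_equal(lsts[np.argsort(times_real)], lsts)

# Validation sims: the grid starts at 0 hours LST, so the pre-wrap
# samples carry *later* time stamps than the post-wrap samples.
times_sim = np.array([0.90, 0.95, 0.10, 0.20])

# Sorting by time now scrambles the observing order, and nothing errors.
print(lsts[np.argsort(times_sim)])  # [0.01 0.08 6.2  6.27]
```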

r-pascua self-assigned this Apr 30, 2024
@steven-murray (Contributor)

Hmmm, I'm trying to digest what you're saying the error is. In the `interpolate_to_reference` function, you just want to go from LST1 to LST2, right? Is the problem that the LSTs aren't ordered in the data? You seem to be saying the problem has to do with the times themselves, not the LSTs... but why is that a problem?

@r-pascua (Contributor, Author) commented Apr 30, 2024

This is a problem because I had been using the time axis to sort the LST axis as a way to avoid phase wraps causing the sorting to go sideways. I do something like this:

```python
import numpy as np

# Unique times come back sorted; use the same indices to pull the
# LSTs into matching (time-sorted) order.
time, inds = np.unique(uvdata.time_array, return_index=True)
lsts = uvdata.lst_array[inds]
```

Here's an MWE of a situation where `np.unique` will fail when used on either the `time_array` or the `lst_array`:

```python
# LSTs in observing order: just below the 2*pi wrap, then just past zero...
lst_array = [6.24, 0.01]
# ...while the (simulation) time stamps are not contiguous across the wrap.
time_array = [0.99, 0.01]
```
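Running the time-sort heuristic on this toy input shows the silent failure:

```python
import numpy as np

lst_array = np.array([6.24, 0.01])
time_array = np.array([0.99, 0.01])

# Sorting by time puts LST 0.01 first, even though it was observed last.
_, inds = np.unique(time_array, return_index=True)
print(lst_array[inds])       # [0.01 6.24] -- observing order was [6.24, 0.01]

# Sorting the LSTs directly hits the same problem at the phase wrap.
print(np.unique(lst_array))  # [0.01 6.24]
```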

One option is to unwrap the LST array before sorting. I think this should generally be a viable option, though it will fail in edge cases where the observation run covers more than a day. (It will also fail in obscure cases where the time array isn't sorted properly, but I think it should fail there since the user isn't setting things up properly.) Here's the proposed change:

```python
# Unwrap: anything "before" the first LST must have wrapped past 2*pi.
lst_array = uvdata.lst_array.copy()
lst_array[lst_array < lst_array[0]] += 2 * np.pi
lsts = np.unique(lst_array)
```
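Applied to the toy arrays above, the unwrap restores the observing order:

```python
import numpy as np

lst_array = np.array([6.24, 0.01])
lst_array[lst_array < lst_array[0]] += 2 * np.pi  # becomes [6.24, 6.2931...]
print(np.unique(lst_array))  # [6.24 6.2931...] -- observing order preserved
```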

There's one other edge case I can think of that could cause a hiccup here, and it's relevant to the validation simulations. Suppose I have a set of data files where the first LST is at 0 hours and the last LST is at 24 hours (minus epsilon), and the first time is smaller than the last time. Now suppose I want to read in files that start at 22 hours LST and end at 6 hours LST. If I read the data via `uvdata = UVData.from_file(file_list)`, the above fix will fail, since pyuvdata will sort the data so that the earliest time comes first. The proposed solution only works in this case if I force pyuvdata to skip its checks and concatenate along the blt axis, assuming the files were provided in the right order (a toy version of this failure is below). It's a weird corner to be in, but this is precisely the edge case that broke things and led to this issue.
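Here is that failure with made-up values (22 hours LST is roughly 5.76 rad). After pyuvdata's time sort, the first element is the global minimum, so the unwrap never triggers:

```python
import numpy as np

# Observing order was 22h -> 24h LST, then 0h -> 6h LST, but pyuvdata's
# time sort (sim times start at 0 hours LST) swaps the two blocks.
lst_array = np.array([0.5, 1.0, 1.5, 5.8, 6.0, 6.2])  # radians, time-sorted

# The unwrap keys off the first element, which is now the smallest LST,
# so nothing gets shifted by 2*pi and the inferred order stays wrong.
lst_array[lst_array < lst_array[0]] += 2 * np.pi
print(np.unique(lst_array))  # [0.5 1.  1.5 5.8 6.  6.2]
```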

Aside from this, I'm having trouble thinking up other edge cases, so maybe this is the obvious solution that I missed when I wrote this code years ago. Curious to hear your thoughts.

@steven-murray (Contributor)

@r-pascua is the assumption that each baseline has the same unique times associated with it? In that case, you could do something like

```python
# Grab the blt indices for one antpair, then read off its times.
inds = uvd.antpair2ind(uvd.get_antpairs()[0])
unique_times = uvd.time_array[inds]
```

However, I'm still not sure I fully understand the problem, so maybe this isn't quite what you're looking for. If what you need is all the times/LSTs associated with each antpair (regardless of whether they're sorted), then I think the above is it.

That said, if you're ordering the files as you want, then you should turn off the re-ordering in the pyuvdata reader; that's both faster and gives you what you want. A sketch of what I mean is below.
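A minimal sketch of what that might look like, assuming pyuvdata's multi-file read options behave as documented (`axis="blt"` for fast concatenation in the given order, `run_check=False` to skip the consistency checks):

```python
from pyuvdata import UVData

file_list = [...]  # your files, already in observing order

uvd = UVData()
# Fast-concatenate along the blt axis instead of letting pyuvdata
# re-sort by time, and skip the consistency checks.
uvd.read(file_list, axis="blt", run_check=False)
```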

I hope I'm understanding the problem :-P
