
Frequency concatenation yields wrong data #1102

Closed
wfarah opened this issue Dec 9, 2021 · 9 comments · Fixed by #1105

wfarah commented Dec 9, 2021

Using pyuvdata==2.2.4

I’m trying to concatenate uvh5 files in frequency; I’m simply doing uv = uv1 + uv2 + uv3 + ..., then writing out a .ms file and opening it in CASA. This has always worked well for me, but now I’ve started seeing some weirdness: the phase of the autocorrelation for some of the subbands appears to be non-zero, which is not expected.
What I’m doing is the following:

uv1.read("subband1.uvh5")
uv2.read("subband2.uvh5")
...
uv = uv1 + uv2 + ...
uv.write_ms("concatenated.ms")
uv.write_uvh5("concatenated.uvh5")

I then read the ant_1_names and ant_2_names from all the uv files, and I could see that the first entries in the concatenated file are not the autocorrelations, unlike in the subband files:

subband1.uvh5
[ 3  5  7 ... 33 33 35] #ant_1_names
[ 3  5  7 ... 35 38 38] #ant_2_names

subband2.uvh5
[ 3  5  7 ... 33 33 35]
[ 3  5  7 ... 35 38 38]

...

concatenated.uvh5
[ 3  3  3 ... 33 33 35] #something is wrong here
[ 3  5  7 ... 35 38 38]
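
For anyone reproducing this, a tiny helper (hypothetical, not part of pyuvdata) makes the ordering comparison above mechanical; the arrays below mimic the first three entries shown:

```python
import numpy as np

def same_blt_order(ant1_a, ant2_a, ant1_b, ant2_b):
    """True when two files list their (ant_1, ant_2) pairs in the same order."""
    return bool(np.array_equal(ant1_a, ant1_b)
                and np.array_equal(ant2_a, ant2_b))

# Mimicking the arrays above: subband files start with autos (3,3), (5,5),
# (7,7), but the concatenated file starts with (3,3), (3,5), (3,7).
sub1 = (np.array([3, 5, 7]), np.array([3, 5, 7]))
sub2 = (np.array([3, 5, 7]), np.array([3, 5, 7]))
cat = (np.array([3, 3, 3]), np.array([3, 5, 7]))

print(same_blt_order(*sub1, *sub2))  # True
print(same_blt_order(*sub1, *cat))   # False
```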

Below are a cross-correlation and an autocorrelation, respectively, viewed in CASA. I also extracted the visibilities straight from the uvh5 files, and I see the same thing.

[Two screenshots, Dec 9, 2021: cross- and autocorrelation plots in CASA]


wfarah commented Dec 9, 2021

Data are available on google drive:

https://drive.google.com/drive/folders/1BzNo8fa2yeS8kbO7MoNSPY8pY1FS_j8U

kartographer (Contributor) commented

So after a bit of poking around with the data, I was able to replicate the issue and, I think, get to the heart of what's going on. The long and short of this particular case is that there was a mix of concats across both the time-baseline and frequency axes, and after concatenating across the baseline axis, the ordering of the baselines was not necessarily the same as it was prior to adding the two UVData objects together.

This actually points to a more significant issue: I would have expected UVData.__add__ to check the baseline ordering prior to combining the data, but it does not appear to do so -- it seems only to check that the baseline and time values overlap and that, where those overlaps occur, various metadata agree, but not that ant_1_array, ant_2_array, or time_array are in the same order. Looking at the code further, I think there's a similar issue with freq_array when there is frequency-channel overlap, and with polarization_array when there is polarization overlap. This little snippet of code demonstrates the problem better than I can describe it:

import os

from pyuvdata import UVData
from pyuvdata.data import DATA_PATH
import numpy as np

filepath = os.path.join(DATA_PATH, "day2_TDEM0003_10s_norx_1src_1spw.uvfits")
uv = UVData.from_file(filepath)

uv1 = uv.select(polarizations=["ll"], inplace=False)
uv1.reorder_blts("time", minor_order="ant1")
uv2 = uv.select(polarizations=["rr"], inplace=False)
uv2.reorder_blts("time", minor_order="ant2")

assert np.any(uv1.ant_1_array != uv2.ant_1_array)
assert np.any(uv1.ant_2_array != uv2.ant_2_array)

uv3 = uv1 + uv2

# Note that if uv1 and uv2 are differently ordered, one of these
# asserts should fail if everything is working as expected.
assert np.all(uv3.data_array[:, :, :, 0] == uv2.data_array[:, :, :, 0])
assert np.all(uv3.data_array[:, :, :, 1] == uv1.data_array[:, :, :, 0])
assert np.any(uv1.ant_1_array != uv2.ant_1_array)
assert np.any(uv1.ant_2_array != uv2.ant_2_array)

I think adding a check for this would be relatively simple: it could throw an error when the ordering of these items differs. One alternative would be to make __add__ a little bit smarter and sort out the ordering on concatenation, although that may require a bit more work, and I don't know how rapidly this needs to get fixed.
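
As a rough illustration of the first option, a check along these lines could run before combining (a sketch only, not pyuvdata's actual implementation; only the attribute names are taken from UVData):

```python
import numpy as np

def check_blt_ordering(this, other):
    """Sketch of the proposed pre-add check: when both objects cover the
    same set of baseline-times, require that the per-blt metadata arrays
    agree element-wise, and raise instead of silently combining
    misordered data."""
    for attr in ("ant_1_array", "ant_2_array", "time_array"):
        a = np.asarray(getattr(this, attr))
        b = np.asarray(getattr(other, attr))
        # Same length but different element order -> refuse to combine.
        if a.shape == b.shape and not np.array_equal(a, b):
            raise ValueError(
                f"{attr} is ordered differently on the two objects; "
                "reorder the blt axis before adding them."
            )
```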

david-macmahon commented

Does the baseline ordering vary between input datasets, or are the time_arrays from each input dataset not identical? AFAIK these input datasets should have all the same metadata except for the freq_arrays (which should even be contiguous in frequency), but it's not my data, so I can't say for sure whether that's all true.

kartographer (Contributor) commented

@david-macmahon -- the ordering is in fact not different between them, but some files contain different subsets of times or baselines (my guess being the former), which causes __add__ to effectively reorder them after concatenation. The resultant UVData object therefore has a different ordering, which is why the subsequent adds effectively corrupt the data set.

david-macmahon commented

@kartographer, your guess is exactly right. The first clue was that the file sizes are significantly different. Further examination of the two node1 UVH5 files shows that the *_0.uvh5 file contains 59 integrations and the *_1.uvh5 file contains 54, based on the Header/Ntimes value in each file. The final time_array entries differ by 5 integration times (i.e. 59 - 54), which nicely corroborates the mismatched time dimension (and therefore the mismatched Nblts dimension).
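
For reference, the Ntimes header can be inspected without loading the data; this sketch assumes only the UVH5 layout (a Header group holding scalar datasets) and h5py:

```python
import h5py

def read_ntimes(path):
    """Read the number of integrations from a UVH5 file's header without
    loading the visibility data (UVH5 stores header scalars as HDF5
    datasets under the Header group)."""
    with h5py.File(path, "r") as f:
        return int(f["Header/Ntimes"][()])
```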

Not sure if this was originally caused by a dropped/missing-data problem in the data capture code or by mismatched scan lengths between files. If the former, then the software correlator code should be made more robust to dropped/missing data (and the data capture code improved, of course!). In any case, it would be good for pyuvdata to either "do the right thing" (whatever that is in this case?) or refuse to combine these mismatched datasets. Either would be preferable to outputting invalid datasets.
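
One possible pre-combination workaround (an untested sketch; the UVData calls appear only in comments) is to keep just the integrations common to all files, then put every object in the same canonical order before adding:

```python
import numpy as np

# Times from two hypothetical subband files mimicking this issue:
# file 0 has 59 integrations, file 1 dropped the last 5 (54 total).
t0 = np.arange(59) * 10.0
t1 = np.arange(54) * 10.0

# Integrations present in both files; only these are safe to combine.
common = np.intersect1d(t0, t1)
print(common.size)  # 54

# On real UVData objects one would then (sketch, not verified here):
#   uv.select(times=common)                # drop unshared integrations
#   uv.reorder_blts("time", minor_order="baseline")  # canonical order
```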

bhazelton (Member) commented

Thanks for this report @wfarah, we definitely need to fix this. @kartographer, I think we should add the check you're suggesting, and we should also check whether this problem is present in the add methods of the other objects, which were based on this code.

kartographer (Contributor) commented

@bhazelton -- sounds good, I'll have a PR for someone to review in just a few minutes...

kartographer (Contributor) commented

@bhazelton -- taking a brief look through UVFlag, UVBeam, and UVCal, I think the answer to your question is "yes", although the code that's in UVFlag.__add__ looks less mature and doesn't seem to do nearly as rigorous a job checking metadata as that for the other objects.

bhazelton (Member) commented

Ok, I'll make issues for those objects so they can be tracked separately.
