Fix reconstitute interleaving features #886

aewallwi · 2023-04-09T07:40:14Z

Modify vis_clean.time_chunk_from_baseline_chunks so that all interleaves fall into the same final labeled time bin in makeflow, where the bins are set by the LSTs every stride_length. Without this fix, the time averages for different "interleaves" can land in different makeflow file indices when we undo the cornerturn after time averaging.

codecov · 2023-04-10T05:44:35Z

Codecov Report

Patch coverage: 95.83% and project coverage change: +0.04 🎉

Comparison is base (e712061) 97.14% compared to head (b6ecdfb) 97.18%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #886      +/-   ##
==========================================
+ Coverage   97.14%   97.18%   +0.04%     
==========================================
  Files          21       21              
  Lines        9354     9392      +38     
==========================================
+ Hits         9087     9128      +41     
+ Misses        267      264       -3

Impacted Files	Coverage Δ
hera_cal/vis_clean.py	`97.62% <95.83%> (-0.16%)`	⬇️

... and 5 files with indirect coverage changes


jsdillon

Mostly looks good. Just a few questions and clarifications.

jsdillon · 2023-05-01T17:35:55Z

hera_cal/vis_clean.py

@@ -2023,7 +2022,9 @@ def time_chunk_from_baseline_chunks(time_chunk_template, baseline_chunk_files, o
        output will trim the extra frequencies in the time_chunk and write out trimmed freqs. The same is true
        for polarizations.
    baseline_chunk_files : list of strings
-        list of paths to baseline-chunk files to select time-chunk file from.
+        list of paths to baseline-chunk files to select time-chunk file from. If the files have "_interleave" in their title then


Is there any concern that this will be triggered accidentally? Do we want to have some kind of boolean trigger (default false) for this behavior or do you think that's too rare of an edge case to be worried about?

jsdillon · 2023-05-01T17:37:17Z

hera_cal/vis_clean.py

+           "interleave" in fname and not interleave_mode:
+            raise ValueError("must not have a subset of files with 'interleave' in name.")
+    if interleave_mode:
+        interleave_indices = np.unique([int(re.findall("interleave_[0-9]{1,10}", fname)[0][11:]) for fname in baseline_chunk_files])


This feels rather hard-coded

At minimum, the precise expectation for the file name format should be documented


jsdillon · 2023-05-01T17:39:44Z

hera_cal/vis_clean.py

+           "interleave" in fname and not interleave_mode:
+            raise ValueError("must not have a subset of files with 'interleave' in name.")
+    if interleave_mode:
+        interleave_indices = np.unique([int(re.findall("interleave_[0-9]{1,10}", fname)[0][11:]) for fname in baseline_chunk_files])


This feels like a function that should probably be broken out for future use.

steven-murray

Thanks @aewallwi -- I have a few comments that I think will make this a little more future-robust, but otherwise it seems good.

steven-murray · 2023-04-27T18:43:02Z

hera_cal/tests/test_vis_clean.py

+        tmp_path = tmpdir.strpath
+        cdir = tmp_path + "/cache_temp"
+        os.mkdir(cdir)


In new code, I'd heartily suggest using the pathlib module for path operations. You can use the pytest fixture tmp_path_factory, then do cdir = tmp_path_factory.mktemp('test_time_chunk_from_baseline_chunks').
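A minimal sketch of what this suggestion could look like in the test, assuming pytest's built-in tmp_path_factory fixture, whose directory-creating method is mktemp and which returns a pathlib.Path. The helper make_cache_dir is hypothetical and exists only to keep the example self-contained; it is not part of hera_cal.

```python
from pathlib import Path


def make_cache_dir(base_dir: Path) -> Path:
    """Create and return a 'cache_temp' directory under base_dir (hypothetical helper)."""
    cdir = base_dir / "cache_temp"
    cdir.mkdir()
    return cdir


def test_time_chunk_from_baseline_chunks(tmp_path_factory):
    # pytest injects tmp_path_factory; mktemp returns a fresh pathlib.Path.
    tmp_path = tmp_path_factory.mktemp("test_time_chunk_from_baseline_chunks")
    cdir = make_cache_dir(tmp_path)
    assert cdir.is_dir()
```

Compared with tmpdir.strpath plus string concatenation and os.mkdir, the path stays a Path object throughout, so joining and directory creation do not depend on a hard-coded "/" separator.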

steven-murray · 2023-05-02T23:15:54Z

hera_cal/tests/test_vis_clean.py

        # reconstitute the filtered data
        for filenum, file in enumerate(datafiles):
            # reconstitute
+            # AEW -- 5-10-2023 -- I AM HERE!


Good to know!

steven-murray · 2023-05-02T23:16:55Z

hera_cal/vis_clean.py

@@ -794,8 +794,7 @@ def vis_clean(self, keys=None, x=None, data=None, flags=None, wgts=None,
            # get filter properties
            mfrate = max_frate[k] if max_frate is not None else None
            filter_centers, filter_half_widths = gen_filter_properties(ax=ax, horizon=horizon,
-                                                                       standoff=standoff, min_dly=min_dly,
-                                                                       bl_len=self.bllens[k[:2]], max_frate=mfrate)
+            standoff=standoff, min_dly=min_dly,                                                                    bl_len=self.bllens[k[:2]], max_frate=mfrate)


I can't tell what's happening on this line, but it seems weird

steven-murray · 2023-05-02T23:20:59Z

hera_cal/vis_clean.py

-        list of paths to baseline-chunk files to select time-chunk file from.
+        list of paths to baseline-chunk files to select time-chunk file from. If the files have "_interleave" in their title then
+        the method will automatically identify the number of unique interleaves, chunk the file list up into interleaved sets and
+        retrieve integrations based on the time in the first file. 


For me, just having read this docstring, I'm still not clear on exactly what's happening here. Can you maybe provide a more clear example in the docstring?
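As an illustration of the kind of example the docstring could carry, here is a rough sketch of "chunking the file list up into interleaved sets". The file names are made up, and group_by_interleave is a hypothetical helper rather than the code in this PR; it only assumes that each name contains interleave_<integer>.

```python
import re
from collections import defaultdict

# Hypothetical baseline-chunk file names: two interleaves, two chunks each.
baseline_chunk_files = [
    "chunk.0.interleave_0.uvh5",
    "chunk.0.interleave_1.uvh5",
    "chunk.1.interleave_0.uvh5",
    "chunk.1.interleave_1.uvh5",
]


def group_by_interleave(fnames):
    """Split a file list into one sub-list per interleave index."""
    sets = defaultdict(list)
    for fname in fnames:
        match = re.search(r"interleave_(\d+)", fname)
        if match is None:
            raise ValueError(f"no interleave index found in {fname!r}")
        sets[int(match.group(1))].append(fname)
    # Every interleave should contribute the same number of files.
    if len({len(files) for files in sets.values()}) > 1:
        raise ValueError("interleave sets have different numbers of files")
    return dict(sorted(sets.items()))


interleave_sets = group_by_interleave(baseline_chunk_files)
```

Here interleave_sets maps each interleave index to its own ordered list of files, which is the structure the docstring describes before integrations are retrieved for each set.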

steven-murray · 2023-05-02T23:21:30Z

hera_cal/vis_clean.py

@@ -2040,11 +2041,38 @@ def time_chunk_from_baseline_chunks(time_chunk_template, baseline_chunk_files, o
    -------
        Nothing
    """
+    # check whether "interleave" is in baseline chunk filenames. If so, make sure that they all have "interleave",
+    # split them into sets, and make sure that the sets all have the same number of files.
+    interleave_mode = "interleave" in baseline_chunk_files[0]


this is not consistent with the docstring, which states that if "_interleave" is in the filename, it'll do the interleave mode

steven-murray · 2023-05-02T23:22:35Z

hera_cal/vis_clean.py

+    for fname in baseline_chunk_files:
+        if "interleave" not in fname and interleave_mode or \
+           "interleave" in fname and not interleave_mode:
+            raise ValueError("must not have a subset of files with 'interleave' in name.")


I think a nicer error here would be "Cannot have some baseline_chunk_files with "_interleave" in their name, while others don't"

Furthermore, add this info to the docstring (i.e. say that all or none of the files must have "_interleave" in the name)
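A possible shape for the all-or-none check, as a sketch only. check_interleave_consistency and its tag parameter are hypothetical names; using a single tag value for both the test and the message would also keep the code and the docstring in agreement about "interleave" versus "_interleave".

```python
def check_interleave_consistency(baseline_chunk_files, tag="_interleave"):
    """Return True if every file name contains tag, False if none does.

    Raises ValueError if only a subset of the names contains tag.
    """
    has_tag = [tag in fname for fname in baseline_chunk_files]
    if any(has_tag) and not all(has_tag):
        raise ValueError(
            f'Cannot have some baseline_chunk_files with "{tag}" in their name '
            "while others don't."
        )
    # At this point either all names contain the tag or none do.
    return any(has_tag)


interleave_mode = check_interleave_consistency(
    ["a_interleave_0.uvh5", "a_interleave_1.uvh5"]
)
```

Building the list of booleans once and comparing any() with all() avoids the chained and/or condition inside the loop, whose precedence is easy to misread.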

steven-murray · 2023-05-02T23:25:10Z

hera_cal/vis_clean.py

+           "interleave" in fname and not interleave_mode:
+            raise ValueError("must not have a subset of files with 'interleave' in name.")
+    if interleave_mode:
+        interleave_indices = np.unique([int(re.findall("interleave_[0-9]{1,10}", fname)[0][11:]) for fname in baseline_chunk_files])


Agreed. I actually think a much more flexible and future-proof way of doing this would be to have a new argument to the function, like interleave_regex, which if provided, will do the above (it could be the above by default, but probably better to define it with a capture group instead).
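To make the capture-group idea concrete, here is a sketch of a broken-out helper. get_interleave_indices, DEFAULT_INTERLEAVE_REGEX and the interleave_regex argument are hypothetical names following the suggestion above, not existing hera_cal API. The default pattern is the one from the diff with the digits wrapped in a group, so the hard-coded [11:] slice is no longer needed.

```python
import re

# Same pattern as in the diff, but the digits sit in a capture group.
DEFAULT_INTERLEAVE_REGEX = r"interleave_([0-9]{1,10})"


def get_interleave_indices(fnames, interleave_regex=DEFAULT_INTERLEAVE_REGEX):
    """Return the sorted unique interleave indices found in fnames.

    interleave_regex must contain exactly one capture group that matches
    the integer interleave index.
    """
    pattern = re.compile(interleave_regex)
    if pattern.groups != 1:
        raise ValueError("interleave_regex must define exactly one capture group")
    indices = set()
    for fname in fnames:
        match = pattern.search(fname)
        if match is None:
            raise ValueError(f"{fname!r} does not match {interleave_regex!r}")
        indices.add(int(match.group(1)))
    return sorted(indices)


files = ["a.interleave_0.uvh5", "a.interleave_1.uvh5", "b.interleave_0.uvh5"]
indices = get_interleave_indices(files)

# A different (made-up) naming scheme only needs a different pattern.
custom = get_interleave_indices(["x.il3.uvh5"], interleave_regex=r"\.il(\d+)\.")
```

With the pattern passed in as an argument, the expected file name format is documented in one place (the default value) and a caller with a different convention does not need to modify the function.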

aewallwi and others added 4 commits April 8, 2023 15:33

started on interleave mode for reconstitution.

3170aed

added interleave support.

26deca1

Merge branch 'main' into fix_reconstitute_interleaving_features

3a3f1b1

fix existing unittests.

2090e55

aewallwi and others added 3 commits April 23, 2023 21:39

unittests running again.

ea99ef9

Merge branch 'main' into fix_reconstitute_interleaving_features

2fd7546

fix bug where wrong times are read from interleave.

b6ecdfb

aewallwi requested a review from steven-murray April 24, 2023 06:07

aewallwi marked this pull request as ready for review May 1, 2023 04:58

aewallwi requested review from jsdillon and removed request for steven-murray May 1, 2023 04:58

jsdillon requested changes May 2, 2023


steven-murray reviewed May 2, 2023

