Fix reconstitute interleaving features #886

Open
wants to merge 7 commits into
base: main
from

Conversation

aewallwi
Collaborator

@aewallwi aewallwi commented Apr 9, 2023

Modify vis_clean.time_chunk_from_baseline_chunks so that all interleaves fall into the same final labeled time bin in makeflow. These bins are set by the LSTs every stride_length. Without this fix, the time averages for different "interleaves" can fall into different makeflow file indices when we undo the corner turn after performing time averaging.

@codecov

codecov bot commented Apr 10, 2023

Codecov Report

Patch coverage: 95.83% and project coverage change: +0.04 🎉

Comparison is base (e712061) 97.14% compared to head (b6ecdfb) 97.18%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #886      +/-   ##
==========================================
+ Coverage   97.14%   97.18%   +0.04%     
==========================================
  Files          21       21              
  Lines        9354     9392      +38     
==========================================
+ Hits         9087     9128      +41     
+ Misses        267      264       -3     
Impacted Files Coverage Δ
hera_cal/vis_clean.py 97.62% <95.83%> (-0.16%) ⬇️

... and 5 files with indirect coverage changes


@aewallwi aewallwi marked this pull request as ready for review May 1, 2023 04:58
@aewallwi aewallwi requested review from jsdillon and removed request for steven-murray May 1, 2023 04:58
Member

@jsdillon jsdillon left a comment

Mostly looks good. Just a few questions and clarifications.

@@ -2023,7 +2022,9 @@ def time_chunk_from_baseline_chunks(time_chunk_template, baseline_chunk_files, o
output will trim the extra frequencies in the time_chunk and write out trimmed freqs. The same is true
for polarizations.
baseline_chunk_files : list of strings
- list of paths to baseline-chunk files to select time-chunk file from.
+ list of paths to baseline-chunk files to select time-chunk file from. If the files have "_interleave" in their title then
Member

Is there any concern that this will be triggered accidentally? Do we want some kind of boolean trigger (default false) for this behavior, or do you think that's too rare an edge case to worry about?

"interleave" in fname and not interleave_mode:
raise ValueError("must not have a subset of files with 'interleave' in name.")
if interleave_mode:
interleave_indices = np.unique([int(re.findall("interleave_[0-9]{1,10}", fname)[0][11:]) for fname in baseline_chunk_files])
Member

This feels rather hard-coded

Member

At minimum, the precise expectation for the file name format should be documented

Contributor

Agreed. I actually think a much more flexible and future-proof way of doing this would be to have a new argument to the function, like interleave_regex, which if provided, will do the above (it could be the above by default, but probably better to define it with a capture group instead).
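
A minimal sketch of the capture-group idea, for illustration only. The argument name interleave_regex comes from the comment above; the helper parse_interleave_index, the default pattern, and the file names are assumptions and are not part of the PR or of hera_cal.

```python
import re

# Hypothetical default pattern. The capture group isolates the integer index,
# so no string slicing such as [11:] is needed.
DEFAULT_INTERLEAVE_REGEX = r"interleave_([0-9]+)"


def parse_interleave_index(fname, interleave_regex=DEFAULT_INTERLEAVE_REGEX):
    """Return the interleave index encoded in fname, or None if nothing matches."""
    match = re.search(interleave_regex, fname)
    return int(match.group(1)) if match else None


# Illustrative file names only; the real naming scheme is not specified here.
print(parse_interleave_index("zen.2459000.baseline_chunk.interleave_3.uvh5"))  # 3
print(parse_interleave_index("zen.2459000.baseline_chunk.uvh5"))  # None
```

Passing a different pattern with one capture group would support other naming schemes without touching the parsing logic.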

"interleave" in fname and not interleave_mode:
raise ValueError("must not have a subset of files with 'interleave' in name.")
if interleave_mode:
interleave_indices = np.unique([int(re.findall("interleave_[0-9]{1,10}", fname)[0][11:]) for fname in baseline_chunk_files])
Member

This feels like a function that should probably be broken out for future use.
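
One way the index parsing and set splitting might be broken out is sketched below. The helper name group_files_by_interleave and its signature are hypothetical, and the file names are made up. It only mirrors the checks described in the diff's comments (every file carries an index, and all sets have the same number of files).

```python
import re
from collections import defaultdict


def group_files_by_interleave(files, interleave_regex=r"interleave_([0-9]+)"):
    """Group file paths by interleave index (hypothetical helper, not hera_cal API).

    Returns a dict mapping index -> list of files, preserving input order.
    Raises ValueError if a file has no index or the sets differ in size.
    """
    groups = defaultdict(list)
    for fname in files:
        match = re.search(interleave_regex, fname)
        if match is None:
            raise ValueError(f"No interleave index found in {fname!r}")
        groups[int(match.group(1))].append(fname)
    # All interleave sets are expected to contain the same number of files.
    sizes = {len(group) for group in groups.values()}
    if len(sizes) > 1:
        raise ValueError("Interleave sets must all contain the same number of files.")
    return dict(sorted(groups.items()))


files = [
    "chunk_0.interleave_0.uvh5",
    "chunk_0.interleave_1.uvh5",
    "chunk_1.interleave_0.uvh5",
    "chunk_1.interleave_1.uvh5",
]
groups = group_files_by_interleave(files)
print(sorted(groups))  # [0, 1]
```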

Contributor

@steven-murray steven-murray left a comment

Thanks @aewallwi -- I have a few comments that I think will make this a little more future-robust, but otherwise it seems good.

Comment on lines +1143 to +1145
tmp_path = tmpdir.strpath
cdir = tmp_path + "/cache_temp"
os.mkdir(cdir)
Contributor

In new code, I'd heartily suggest using the pathlib module for path operations. You can use the pytest fixture tmp_path_factory, then do cdir = tmp_path_factory.mktemp('test_time_chunk_from_baseline_chunks').
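
A short sketch of what the pathlib-based setup could look like. The test name is hypothetical and the body only recreates the cache directory from the three lines under review; note that the directory-creating method on pytest's tmp_path_factory fixture is mktemp.

```python
from pathlib import Path


def test_cache_dir_is_created(tmp_path_factory):
    # mktemp returns a pathlib.Path pointing at a fresh temporary directory.
    tmp_path = tmp_path_factory.mktemp("test_time_chunk_from_baseline_chunks")
    # The "/" operator joins path components; no string concatenation needed.
    cdir = tmp_path / "cache_temp"
    cdir.mkdir()
    assert isinstance(cdir, Path)
    assert cdir.is_dir()
```

Functions that still expect a string path can be given str(cdir).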

# reconstitute the filtered data
for filenum, file in enumerate(datafiles):
# reconstitute
# AEW -- 5-10-2023 -- I AM HERE!
Contributor

Good to know!

@@ -794,8 +794,7 @@ def vis_clean(self, keys=None, x=None, data=None, flags=None, wgts=None,
# get filter properties
mfrate = max_frate[k] if max_frate is not None else None
filter_centers, filter_half_widths = gen_filter_properties(ax=ax, horizon=horizon,
- standoff=standoff, min_dly=min_dly,
- bl_len=self.bllens[k[:2]], max_frate=mfrate)
+ standoff=standoff, min_dly=min_dly, bl_len=self.bllens[k[:2]], max_frate=mfrate)
Contributor

I can't tell what's happening on this line, but it seems weird

- list of paths to baseline-chunk files to select time-chunk file from.
+ list of paths to baseline-chunk files to select time-chunk file from. If the files have "_interleave" in their title then
+ the method will automatically identify the number of unique interleaves, chunk the file list up into interleaved sets and
+ retrieve integrations based on the time in the first file.
Contributor

For me, just having read this docstring, I'm still not clear on exactly what's happening here. Can you maybe provide a more clear example in the docstring?

@@ -2040,11 +2041,38 @@ def time_chunk_from_baseline_chunks(time_chunk_template, baseline_chunk_files, o
-------
Nothing
"""
# check whether "interleave" is in baseline chunk filenames. If so, make sure that they all have "interleave",
# split them into sets, and make sure that the sets all have the same number of files.
interleave_mode = "interleave" in baseline_chunk_files[0]
Contributor

this is not consistent with the docstring, which states that if "_interleave" is in the filename, it'll do the interleave mode

for fname in baseline_chunk_files:
if "interleave" not in fname and interleave_mode or \
"interleave" in fname and not interleave_mode:
raise ValueError("must not have a subset of files with 'interleave' in name.")
Contributor

I think a nicer error here would be 'Cannot have some baseline_chunk_files with "_interleave" in their name, while others don't'

Contributor

Furthermore, add this info to the docstring (i.e. say that all or none of the files must have "_interleave" in the name)
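
The all-or-none rule and the suggested error message could be sketched as follows. The helper name check_interleave_naming and its tag parameter are hypothetical. Using the single tag "_interleave" for both the check and the message is an assumption that follows the docstring wording rather than the "interleave" substring tested in the diff.

```python
def check_interleave_naming(baseline_chunk_files, tag="_interleave"):
    """Return True if every file name contains tag, False if none do.

    Raises ValueError when only some of the names contain tag.
    Hypothetical helper for illustration; not part of hera_cal.
    """
    has_tag = [tag in fname for fname in baseline_chunk_files]
    if any(has_tag) and not all(has_tag):
        raise ValueError(
            f'Cannot have some baseline_chunk_files with "{tag}" in their name '
            "while others don't."
        )
    # An empty file list is treated as non-interleaved.
    return bool(has_tag) and all(has_tag)


# Illustrative file names only.
print(check_interleave_naming(["a_interleave_0.uvh5", "b_interleave_1.uvh5"]))  # True
print(check_interleave_naming(["a.uvh5", "b.uvh5"]))  # False
```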
