Add an option to be able to ignore N-sample while LST-binning #932

Kai-FengChen · 2024-01-31T22:47:12Z

Add an option weight_by_nsamples (default True). When it is turned off, the binner weights only by the flagging pattern but still propagates nsamples.

(Minor concern: is it bad practice to give a flag a default of True? When modifying the arg_parser, it feels a bit weird to have an argument with action="store_true" that also defaults to True. In my defence, weight_by_nsamples seems more straightforward than not_weight_by_nsamples.)

codecov · 2024-01-31T22:59:12Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.18%. Comparing base (cc0a13d) to head (b1288c5).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #932      +/-   ##
==========================================
- Coverage   97.18%   97.18%   -0.01%     
==========================================
  Files          30       30              
  Lines       10733    10727       -6     
==========================================
- Hits        10431    10425       -6     
  Misses        302      302

Flag	Coverage Δ
unittests	`97.18% <ø> (-0.01%)`	⬇️

steven-murray

Only a few comments on the code here, and it should do what you're saying it should do. However, I'm not quite sure of the motivation here. Is the assumption that in-painted data coming in will have Nsamples=0 but be un-flagged? So this would give you a way to distinguish between "true" and "inpainted" data?

steven-murray · 2024-02-01T11:04:33Z

hera_cal/tests/test_lstbin.py

+        # test weighted_by_nsamples, nsamples are propagated but data is not weighted by nsamples if set to False
+        output1 = lstbin.lst_bin(self.data_list, self.lst_list, dlst=dlst,
+                                 flags_list=self.flgs_list, nsamples_list=self.nsmp_list,
+                                 weight_by_nsamples=True)
+
+        nsmps1 = copy.deepcopy(self.nsmps1)
+        nsmps1[(24, 25, 'ee')][:, 32] = 0
+        nsmps2 = copy.deepcopy(self.nsmps2)
+        nsmps2[(24, 25, 'ee')][:, 32] = 0
+        nsmps3 = copy.deepcopy(self.nsmps3)
+        nsmps3[(24, 25, 'ee')][:, 32] = 0
+        nsmps_list = [nsmps1, nsmps2, nsmps3]
+        output = lstbin.lst_bin(self.data_list, self.lst_list, dlst=dlst,
+                                flags_list=self.flgs_list, nsamples_list=nsmps_list,
+                                weight_by_nsamples=True)
+        # Check Nsamples are all 0
+        assert np.allclose(output[-1][(24, 25, 'ee')].real[:, 32], 0)
+        # Check data got weighted sum to 0
+        assert np.allclose(output[1][(24, 25, 'ee')].real[100, 32], 0)
+        output = lstbin.lst_bin(self.data_list, self.lst_list, dlst=dlst,
+                                flags_list=self.flgs_list, nsamples_list=nsmps_list,
+                                weight_by_nsamples=False)
+        # Check Nsamples are all 0
+        assert np.allclose(output[-1][(24, 25, 'ee')].real[:, 32], 0)
+        # Check data is the same as before
+        assert np.allclose(output[1][(24, 25, 'ee')].real, output1[1][(24, 25, 'ee')].real)
+


Can we please extract this into its own test?

steven-murray · 2024-02-01T11:04:56Z

hera_cal/tests/test_lstbin.py

+        # test weight_by_nsamples
+        lstbin.lst_bin_files(self.data_files, ntimes_per_file=250, outdir="./", overwrite=True,
+                             verbose=False, rephase=True, weight_by_nsamples=False, file_ext=file_ext)
+        output_lst_file = "./zen.ee.LST.0.20124.uvh5"
+        output_std_file = "./zen.ee.STD.0.20124.uvh5"
+        assert os.path.exists(output_lst_file)
+        assert os.path.exists(output_std_file)
+        os.remove(output_lst_file)
+        os.remove(output_std_file)
+


Please also extract this into its own test

For this one I did not check any values; I only tested that the new option runs without error, so it falls into the "# basic execution" category, like the tests above it (for example, the one for the rephase option). Does it still need to be a separate test?

steven-murray · 2024-02-01T11:09:12Z

hera_cal/lstbin.py

@@ -542,6 +549,7 @@ def lst_bin_arg_parser():
    a.add_argument("--outdir", default=None, type=str, help="directory for writing output")
    a.add_argument("--overwrite", default=False, action='store_true', help="overwrite output files")
    a.add_argument("--lst_start", type=float, default=None, help="starting LST for binner as it sweeps across 2pi LST. Default is first LST of first file.")
+    a.add_argument("--weight_by_nsamples", default=True, action='store_true', help="Weight by nsamples during LST binning. If set to False, weight by flags only. Default True.")


I don't think this will work. As far as I know, there's no way to set the flag to False on the command line. So I think you'll need to use something like --weight-by-flags-only

Gotcha, thanks! I will switch all the options to weight_by_flags_only.
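
The argparse behaviour behind this comment can be checked with a minimal standalone sketch. It does not use hera_cal; `build_parser` is a hypothetical helper, and the inverted flag name `--weight_only_by_flags` is taken from the later commit titles in this PR purely for illustration.

```python
import argparse


def build_parser(flag_name, default):
    """Hypothetical helper: a parser holding one store_true flag."""
    parser = argparse.ArgumentParser()
    parser.add_argument(flag_name, default=default, action="store_true")
    return parser


# store_true with default=True: the value is True whether or not the flag
# is passed, so the command line can never turn it off.
p_bad = build_parser("--weight_by_nsamples", default=True)
always_true = [
    p_bad.parse_args([]).weight_by_nsamples,
    p_bad.parse_args(["--weight_by_nsamples"]).weight_by_nsamples,
]

# Inverted flag with default=False: omitting it keeps the old behaviour,
# passing it selects the new one.
p_good = build_parser("--weight_only_by_flags", default=False)
switchable = [
    p_good.parse_args([]).weight_only_by_flags,
    p_good.parse_args(["--weight_only_by_flags"]).weight_only_by_flags,
]
```

Here `always_true` holds the same value for both invocations, while `switchable` differs between them. On Python 3.9 and later, `argparse.BooleanOptionalAction` is another way to get a default-True option that can be negated with a `--no-...` form.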

Kai-FengChen · 2024-02-02T00:07:26Z

Yes. The idea is that, because we now keep track of nsamples, channels that were originally flagged and then inpainted still have nsamples of zero. With the original lstbin routine, those inpainted channels would be weighted by 0 during LST binning. With the option to weight only by flags, we can average inpainted data together with real data during LST binning.
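
A toy NumPy sketch of the two weighting modes described above. This is not hera_cal's implementation: `lst_average`, its signature, and the example numbers are assumptions made only to illustrate the idea of an unflagged sample with zero nsamples.

```python
import numpy as np


def lst_average(data, flags, nsamples, weight_only_by_flags=False):
    """Toy average over the night axis (axis 0); illustrative only."""
    unflagged = (~flags).astype(float)
    # Either weight every unflagged sample equally, or scale by nsamples.
    weights = unflagged if weight_only_by_flags else unflagged * nsamples
    wsum = weights.sum(axis=0)
    # Clipping avoids division by zero; a bin with no weight averages to 0.
    avg = (data * weights).sum(axis=0) / np.clip(wsum, 1e-10, None)
    # nsamples is propagated identically in both modes.
    nsamp_out = (nsamples * unflagged).sum(axis=0)
    return avg, nsamp_out


# Three nights, two channels. Channel 1 on night 0 stands in for inpainted
# data: it is unflagged but carries nsamples = 0.
data = np.array([[1.0, 10.0], [2.0, 4.0], [3.0, 4.0]])
flags = np.zeros((3, 2), dtype=bool)
nsamples = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 1.0]])

avg_ns, n_ns = lst_average(data, flags, nsamples)
avg_fl, n_fl = lst_average(data, flags, nsamples, weight_only_by_flags=True)
```

With nsamples weighting, the inpainted value 10.0 drops out of the channel 1 average (giving 4.0); with flag-only weighting it is included (giving 6.0). The propagated nsamples are the same in both cases.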

Kai-FengChen · 2024-05-27T18:54:23Z

After a quick discussion with @jsdillon, I removed the test file test_lstbin.py for the old LST binner and resolved some conflicts so that this PR is ready to be merged.

As a reminder, this PR is for the H4C re-run. It lets us propagate nsamples properly (i.e., treat inpainted channels as having nsamples of 0) while still using inpainted data during LST binning (i.e., weighting data by the flagging pattern instead of by nsamples).

jsdillon

looks OK to me

Kai-FengChen added 3 commits January 31, 2024 13:50

Add a switch in lst_bin to allow for weight_by_nsamples or not

2d62541

Add tests related to weight_by_nsamples

68f65da

Add weight_by_nsamples in argparser

2f59937

Kai-FengChen requested review from steven-murray and jsdillon January 31, 2024 22:47

steven-murray reviewed Feb 1, 2024

Kai-FengChen added 2 commits February 1, 2024 17:38

Change from weight_by_nsamples to weight_only_by_flags

3f2f1f6

Modify tests after changing weight_by_nsamples to weight_only_by_flags

592911b

Kai-FengChen mentioned this pull request Feb 2, 2024

H4C Nsamples Changes HERA-Team/hera_pipelines#24

Merged

Kai-FengChen requested a review from steven-murray February 5, 2024 14:56

steven-murray approved these changes Feb 5, 2024

Resolve merge conflict for the old lst binner.

b1288c5

jsdillon approved these changes May 27, 2024

jsdillon merged commit f79a4b9 into main May 27, 2024
9 of 11 checks passed

jsdillon deleted the lstbin_ignore_nsample branch May 27, 2024 19:10
