Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add flagging functions for power spectra #157

Closed
wants to merge 9 commits into from
Closed

Conversation

shaunakmodak
Copy link
Collaborator

Functions for generating greedy flagging masks, consolidating LST binned NSamples & Flag data into numpy arrays, and plotting visualizations of the flagging.

@ghost ghost assigned shaunakmodak Jul 13, 2018
@ghost ghost added the in progress label Jul 13, 2018
Copy link
Collaborator

@philbull philbull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review of in-progress PR.

greedy : bool
greedy flagging is used if true (default is False)

axis : int
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be better off as a string, and maybe with a name change, e.g. first='row'.

integer array with number of samples available for each frequency channel at a given LST angle

flags : numpy.ndarray
binary array with 1 representing flagged, 0 representing unflagged
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the same as the output of UVData.get_flags(), or does that need to be modified in some way before passing it to this function?


mask_generator(nsamples, flags, n_threshold, greedy=False, axis, greedy_threshold, retain_flags=True):
"""
Generates a greedy flags mask from input flags and nsamples arrays
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worth briefly explaining how the algorithm works here.

n_threshold : int
minimum number of samples needed for a point to remain unflagged

greedy : bool
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if it's False?

@@ -0,0 +1,83 @@
import numpy as np

mask_generator(nsamples, flags, n_threshold, greedy=False, axis, greedy_threshold, retain_flags=True):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a more descriptive function name, e.g. construct_factorizable_mask?

if greedy=True, the threshold used to flag rows or columns if axis=1 or 0, respectively

retain_flags : bool
LST-Bin Flags are left flagged even if thresholds are not met (default is True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The data going into this are not necessarily LST binned.


# comparing the number of samples to the threshold

for i in range(shape[0]):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this nested for loop perhaps be replaced by a couple of strategic calls to np.where? We can discuss this.

@nkern
Copy link
Member

nkern commented Jul 15, 2018

Have either of you seen the development in PSpecData.broadcast_dset_flags? Seems like there is some possible overlap in what these two are trying to accomplish. Perhaps some of the code in broadcast_dset_flags should be put in flags.py and broadcast_dset_flags should then call routines in flags.py...

@ghost ghost assigned plaplant Sep 25, 2018
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.3%) to 96.205% when pulling eedee0f on flag_module into b129d47 on master.

@ghost ghost assigned philbull Mar 6, 2019
@philbull
Copy link
Collaborator

I've moved most of the code from this PR into hera_stats, where it will be very useful for testing different ways of flagging the data. The relevant PR is here: HERA-Team/hera_stats#22

This code could have lived in hera_pspec too, but I think it's worth trying to limit the number of peripheral features in pspec and keep it more or less focused on its core function (it's complicated enough). hera_stats is supposed to be a bigger tent, covering many different aspects of the analysis, so I think it's appropriate to move this there.

@philbull philbull closed this Jul 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants