pspecdata.I now accounts for flags in data by taking time avg #94

nkern · 2018-05-07T00:34:31Z

previously, pspecdata.I() was not accounting for data flags. this PR accounts for data flags by taking the time average of the data weights and multiplying that into the diagonal of identity.

addresses #72 and partially addresses #55

coveralls · 2018-05-07T00:42:20Z

Coverage increased (+0.04%) to 92.405% when pulling 9f209ed on oqe_weight_data_by_flags into 19db6ce on master.

philbull

Just checking: @acliu: Should the weights really be included here? They are used in the empirical covariance calculation because the estimator for C_emp uses the (inverse variance-)weighted data by definition. The definition of I is just I though, regardless of what the data look like.
In general: Should our covariance methods return w^T C w, or just C?

acliu · 2018-05-08T02:44:33Z

Ok, I just checked some of the math, to see if having the non-trivial weights messes up anything with our normalization. It turns out that it does, but it can be accommodated within our current architecture. I can do this on a separate branch, or we can do it on this one if you're not in a hurry to get this done. I'll include the revised mathematical expressions in the notes. (It makes sense that there's a difference---consider a situation where the first half of the band is all flagged. Then you've effectively halved the bandwidth, and the normalization needs to know about that. For more complicated flagging/weights it's not as simple as just changing the bandwidth, but the expression can still be written down).

@nkern @philbull Are the weights here purely binary weights to indicate flagged vs. not flagged? Or are we intending to use this for more general weights as well? My preference is that C=I really is a pure identity matrix. We can put in a warning if the user decides to use C=I if there's flagged data. Essentially, my thinking is that using non-uniform weights is an example of C \neq I. We should be explicit about this. After all, if there's a dataset that happens to have no channels completely flagged after LST binning (as was the case in Zaki's paper, for instance), I can imagine situations where we want to compare a pure C=I treatment with one where we pretend certain channels are flagged to see what happens.

If the weights are intended to only be binary, I could be receptive to having them incorporated into C=I. But then maybe "weights" isn't the best name. Essentially, the philosophical argument is whether flagged channels should be considered normal channels that just happen to have infinite noise variance, or if flags are special and should be treated as such.

nkern · 2018-05-08T05:20:04Z

@acliu to answer your first question: I think this is a must-have feature but not totally time critical, so we should wait on this PR until we figure out how exactly we want to incorporate data flags. If you want to push your edits to this branch you should feel free to do so.

To answer your next question: I think weights should be binary, so that we can think of them as flags. I agree we should think of covariance matrices as either C or I, but those can be different than a weighting matrix, R, which I think should be the place where we take into account the data-flags. However, in the code we treat R as just simply C^-1 (or I) and it was therefore the only place I could put it. One solution is to make the R calls access the covariance matrix C (or I) and then multiply that with the data flags on-the-fly.

pspecdata.I now accounts for flags in data by taking time avg

9f209ed

nkern requested a review from philbull May 7, 2018 00:42

nkern self-assigned this May 7, 2018

nkern added the bug label May 7, 2018

philbull reviewed May 7, 2018

View reviewed changes

nkern closed this Jun 18, 2018

nkern deleted the oqe_weight_data_by_flags branch July 2, 2018 01:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

pspecdata.I now accounts for flags in data by taking time avg #94

pspecdata.I now accounts for flags in data by taking time avg #94

nkern commented May 7, 2018 •

edited

Loading

coveralls commented May 7, 2018

philbull left a comment

acliu commented May 8, 2018

nkern commented May 8, 2018 •

edited

Loading

pspecdata.I now accounts for flags in data by taking time avg #94

pspecdata.I now accounts for flags in data by taking time avg #94

Conversation

nkern commented May 7, 2018 • edited Loading

coveralls commented May 7, 2018

philbull left a comment

Choose a reason for hiding this comment

acliu commented May 8, 2018

nkern commented May 8, 2018 • edited Loading

nkern commented May 7, 2018 •

edited

Loading

nkern commented May 8, 2018 •

edited

Loading