Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pspecdata.I now accounts for flags in data by taking time avg #94

Closed
wants to merge 1 commit into from

Conversation

nkern
Copy link
Member

@nkern nkern commented May 7, 2018

previously, pspecdata.I() was not accounting for data flags. this PR accounts for data flags by taking the time average of the data weights and multiplying that into the diagonal of identity.

addresses #72 and partially addresses #55

@coveralls
Copy link

Coverage Status

Coverage increased (+0.04%) to 92.405% when pulling 9f209ed on oqe_weight_data_by_flags into 19db6ce on master.

@nkern nkern requested a review from philbull May 7, 2018 00:42
@nkern nkern self-assigned this May 7, 2018
@nkern nkern added the bug label May 7, 2018
Copy link
Collaborator

@philbull philbull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checking: @acliu: Should the weights really be included here? They are used in the empirical covariance calculation because the estimator for C_emp uses the (inverse variance-)weighted data by definition. The definition of I is just I though, regardless of what the data look like.
In general: Should our covariance methods return w^T C w, or just C?

@acliu
Copy link
Contributor

acliu commented May 8, 2018

Ok, I just checked some of the math, to see if having the non-trivial weights messes up anything with our normalization. It turns out that it does, but it can be accommodated within our current architecture. I can do this on a separate branch, or we can do it on this one if you're not in a hurry to get this done. I'll include the revised mathematical expressions in the notes. (It makes sense that there's a difference---consider a situation where the first half of the band is all flagged. Then you've effectively halved the bandwidth, and the normalization needs to know about that. For more complicated flagging/weights it's not as simple as just changing the bandwidth, but the expression can still be written down).

@nkern @philbull Are the weights here purely binary weights to indicate flagged vs. not flagged? Or are we intending to use this for more general weights as well? My preference is that C=I really is a pure identity matrix. We can put in a warning if the user decides to use C=I if there's flagged data. Essentially, my thinking is that using non-uniform weights is an example of C \neq I. We should be explicit about this. After all, if there's a dataset that happens to have no channels completely flagged after LST binning (as was the case in Zaki's paper, for instance), I can imagine situations where we want to compare a pure C=I treatment with one where we pretend certain channels are flagged to see what happens.

If the weights are intended to only be binary, I could be receptive to having them incorporated into C=I. But then maybe "weights" isn't the best name. Essentially, the philosophical argument is whether flagged channels should be considered normal channels that just happen to have infinite noise variance, or if flags are special and should be treated as such.

@nkern
Copy link
Member Author

nkern commented May 8, 2018

@acliu to answer your first question: I think this is a must-have feature but not totally time critical, so we should wait on this PR until we figure out how exactly we want to incorporate data flags. If you want to push your edits to this branch you should feel free to do so.

To answer your next question: I think weights should be binary, so that we can think of them as flags. I agree we should think of covariance matrices as either C or I, but those can be different than a weighting matrix, R, which I think should be the place where we take into account the data-flags. However, in the code we treat R as just simply C^-1 (or I) and it was therefore the only place I could put it. One solution is to make the R calls access the covariance matrix C (or I) and then multiply that with the data flags on-the-fly.

@nkern nkern closed this Jun 18, 2018
@nkern nkern deleted the oqe_weight_data_by_flags branch July 2, 2018 01:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants