
Lazy covariance #179

Merged
merged 38 commits into from
Aug 6, 2019
Conversation

aewallwi
Collaborator

@aewallwi aewallwi commented Nov 9, 2018

Added some new weighting modes.

@ghost ghost assigned aewallwi Nov 9, 2018
@ghost ghost added the in progress label Nov 9, 2018
@coveralls

coveralls commented Nov 9, 2018

Coverage Status

Coverage increased (+0.1%) to 96.817% when pulling 6a10cdb on lazy_covariance into 58d860b on master.

(Resolved inline review comments on hera_pspec/pspecdata.py and hera_pspec/utils.py)
Member

@nkern nkern left a comment

Can you also update the docstring of PSpecData.R() to note that the K matrix can be I, iC, or sinc_downweight, depending on the value of self.data_weighting?

(Resolved inline review comments on hera_pspec/pspecdata.py and hera_pspec/utils.py)
@nkern
Member

nkern commented May 14, 2019

It seems like what you've done in sinc_downweight is to apply a Fourier filter, implemented as a discrete convolution whose functional form is a top-hat in the Fourier domain, i.e. a sinc function in real (frequency) space. I'm just curious whether this is something we want to have replace the K matrix in our R weighting, or keep as a separate step in our QE chain. In other words, this matrix isn't really a covariance matrix; it's a Toeplitz matrix used for a discrete convolution, so I'm not sure it's fair to put it in the slot in the QE that would normally be used for covariance matrices. Perhaps we can discuss this in person or on the next pspec telecon.
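
To make the connection concrete: a Toeplitz matrix built from a sinc kernel really does act as a discrete convolution, i.e. a top-hat filter in the delay domain. Here is a minimal standalone sketch in plain numpy; it does not call utils.sinc_downweight_mat_inv, and the channel count, widths, and kernel normalization are illustrative only.

import numpy as np
from scipy.linalg import toeplitz

# toy sketch; not utils.sinc_downweight_mat_inv
nchan, df = 32, 100e3                 # channels, channel width [Hz]
half_width = 1500e-9                  # top-hat half-width in delay [s]

# sinc kernel = inverse FT of a delay-space top-hat of the given half-width,
# sampled on the frequency grid (np.sinc(x) = sin(pi*x)/(pi*x))
nu = np.arange(nchan) * df
kernel = 2.0 * half_width * df * np.sinc(2.0 * half_width * nu)

# Toeplitz matrix <=> stationary discrete convolution with that kernel
T = toeplitz(kernel)

# a tone at 500 ns (inside the top-hat) passes nearly unchanged;
# a tone at 4000 ns (outside) is strongly suppressed
freqs = np.arange(nchan) * df
tone_in = np.exp(2j * np.pi * 500e-9 * freqs)
tone_out = np.exp(2j * np.pi * 4000e-9 * freqs)
print(np.linalg.norm(T @ tone_in) / np.linalg.norm(tone_in))    # close to 1 (edge effects aside)
print(np.linalg.norm(T @ tone_out) / np.linalg.norm(tone_out))  # much less than 1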

@nkern
Member

nkern commented May 14, 2019

@anguta Here is a neat example for you to test out.

import numpy as np
from hera_pspec import utils
from scipy import stats

# generate a bunch of white noise vectors
x = np.array([stats.norm.rvs(0, 1, 32) for i in range(100)]).T

# generate a sinc downweight matrix
weights = np.ones(32)
cmat1 = utils.sinc_downweight_mat_inv(32, 100e3, weights, filter_centers = 0., filter_widths = 1500e-9, filter_factors = 1e-9)

# apply to data
xfilt = cmat1.dot(x)

# get covariances of data, filtered data, and then dot C with sinc matrix
C = np.cov(x)
CF = np.cov(xfilt)
CF2 = cmat1.dot(C).dot(cmat1.T)

You'll find that CF and CF2 are nearly identical, which I think suggests that this is not necessarily a replacement for covariance weighting.

@aewallwi
Collaborator Author

It seems like what you've done in sinc_downweight is to apply a Fourier filter, implemented as a discrete convolution whose functional form is a top-hat in the Fourier domain, i.e. a sinc function in real (frequency) space. I'm just curious whether this is something we want to have replace the K matrix in our R weighting, or keep as a separate step in our QE chain. In other words, this matrix isn't really a covariance matrix; it's a Toeplitz matrix used for a discrete convolution, so I'm not sure it's fair to put it in the slot in the QE that would normally be used for covariance matrices. Perhaps we can discuss this in person or on the next pspec telecon.

I think that the main reason to include this matrix in our power-spectrum stage is that its action is most easily incorporated into the calculation of power-spectrum covariances and error bars if it's added as the K matrix. That said, we could propagate it into C either empirically or through direct calculation (though for the direct calculation we'd need to develop some extra machinery outside of pspec).

We should discuss this on the next telecon.
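
For context, the error-bar point follows from standard quadratic-estimator bookkeeping (schematic only; the exact normalization conventions in pspecdata.py may differ): with $\hat q_\alpha = \tfrac{1}{2}\, x_1^\dagger R^\dagger Q_\alpha R\, x_2$, the band-power covariance goes as $\mathrm{Cov}(\hat q_\alpha, \hat q_\beta) \propto \mathrm{tr}\!\left[(R C R^\dagger)\, Q_\alpha\, (R C R^\dagger)\, Q_\beta\right]$, so whatever matrix sits in the R slot is carried into the error bars automatically, whereas a filter applied outside the QE would have to be folded into C by hand.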

@aewallwi
Collaborator Author

Can you also update the doc-string of PSpecData.R() to note that the K matrix can be either I, iC or sinc_downweight based on the value of self.data_weighting?

I've added a note that sinc_downweight can be used as an option.

@aewallwi
Collaborator Author

aewallwi commented May 21, 2019

@anguta Here is a neat example for you to test out.

import numpy as np
from hera_pspec import utils
from scipy import stats

# generate a bunch of white noise vectors
x = np.array([stats.norm.rvs(0, 1, 32) for i in range(100)]).T

# generate a sinc downweight matrix
weights = np.ones(32)
cmat1 = utils.sinc_downweight_mat_inv(32, 100e3, weights, filter_centers = 0., filter_widths = 1500e-9, filter_factors = 1e-9)

# apply to data
xfilt = cmat1.dot(x)

# get covariances of data, filtered data, and then dot C with sinc matrix
C = np.cov(x)
CF = np.cov(xfilt)
CF2 = cmat1.dot(C).dot(cmat1.T)

You'll find that CF and CF2 are nearly identical, which I think suggests that this is not necessarily a replacement for covariance weighting.

Wouldn't you want to filter this by the inverse covariance?

I do think that this weighting is very similar to inverse-covariance weighting.

One simple way to think of inverse-covariance filtering is that you transform into the eigenbasis of the signal (which, in an idealized case of infinite bandwidth and spectrally flat foregrounds, would be delay space), divide the foreground modes (inside the wedge) by their variance, and then transform back. What you've done is suppress the sinusoids in the wedge by a factor of 1/Var, which is essentially what the inverse sinc weighting does (although the suppression factor is set by filter_factor, which could be very different from the actual ratio between the foreground and thermal-noise variances).
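
A toy numerical version of that eigenbasis picture (illustrative numbers only, not hera_pspec code): build a covariance whose first few smooth modes carry a large "foreground" variance on top of unit-variance noise, and check that C^-1 rescales exactly those modes by 1/variance while leaving the noise-dominated modes alone.

import numpy as np

# toy numbers; not hera_pspec code
nchan = 32
noise_var, fg_var = 1.0, 1e4

# orthonormal DCT-II basis: low-k columns are smooth, playing the role of
# low-delay "wedge" modes; high-k columns stand in for noise-dominated modes
n = np.arange(nchan)
basis = np.sqrt(2.0 / nchan) * np.cos(np.pi * np.outer(n + 0.5, n) / nchan)
basis[:, 0] /= np.sqrt(2.0)

variances = np.full(nchan, noise_var)
variances[:3] = fg_var                      # three strong foreground eigenmodes

C = basis @ np.diag(variances) @ basis.T    # covariance with that eigenstructure
R = np.linalg.inv(C)                        # inverse-covariance weighting

# R scales eigenmode k by 1/variances[k]: ~1e-4 for the foreground modes, ~1 for noise modes
for k in (0, 1, 10):
    v = basis[:, k]
    print(k, np.linalg.norm(R @ v) / np.linalg.norm(v))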

@aewallwi aewallwi requested a review from philbull May 29, 2019 02:19
(Resolved inline review comment on hera_pspec/pspecdata.py)
@nkern
Member

nkern commented May 29, 2019

@anguta Could you also link your lazy covariance PDF or Overleaf URL here for posterity? Thanks.

Collaborator

@philbull philbull left a comment

Looks good. Just a few questions about things I'm confused about, and a query about how r_params is specified.

(Resolved inline review comments on hera_pspec/pspecdata.py)
@aewallwi
Collaborator Author

aewallwi commented Jun 11, 2019

@anguta Could you also link your lazy covariance PDF or Overleaf URL here for posterity? Thanks.

I'm working on a tutorial notebook that also describes some of the underlying math and why the filter is effective. I'm going to add it to uvtools in a new pull request there (which will let you use lazy covariance weighting as a general cleaning method for data inspection).

@nkern
Member

nkern commented Jun 11, 2019

@anguta OK, sounds good. Do we want to hold off on this PR so that you can just call the uvtools capabilities instead of duplicating code here? I think that's probably the best approach.

Also, we still need to figure out how to propagate some amount of r_params metadata to the file history without overloading it; perhaps we could make it optional.

@aewallwi
Collaborator Author

aewallwi commented Jun 12, 2019

@anguta OK, sounds good. Do we want to hold off on this PR so that you can just call the uvtools capabilities instead of duplicating code here? I think that's probably the best approach.

Also, we still need to figure out how to propagate some amount of r_params metadata to the file history without overloading it; perhaps we could make it optional.

Yeah, let's wait until the uvtools code is merged in so I can reference it from hera_pspec.

For the history issue, maybe we can use one baseline per redundant group for now, as you suggested.

cheers,

-Aaron

@aewallwi
Collaborator Author

I've added linear cleaning to uvtools and removed the duplicate code here.

Collaborator

@philbull philbull left a comment

Looks good, just a couple of things to clean up.

(Resolved inline review comments on hera_pspec/pspecdata.py and hera_pspec/tests/test_pspecdata.py)
@philbull
Collaborator

@anguta Also: the new notebook examples/PS_estimation_example_inverse_weights.ipynb seems to be a duplicate of the existing examples/PS_estimation_example.ipynb, but with a couple of small changes. I don't think it should duplicate the whole thing, as this will make it hard for users to see what the differences are! So, perhaps it should either focus on showing a specific example of how to use the new weighting scheme, or fully replace the old PS estimation example. (I have a preference for the former; it would be nice to show a simple comparison, e.g. using Nick's example code!)

Collaborator

@philbull philbull left a comment

Apart from a few minor typos in the example notebook, this is ready to go.

@aewallwi
Collaborator Author

aewallwi commented Jul 8, 2019

I've also added a new r_params attribute for UVPSpec objects that stores the r_params as a string (in a compressed format that avoids storing many copies of the same dictionary), along with a decompression method to retrieve the original dictionary.
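
For reference, here is a minimal sketch of the kind of compression described above; the helper names, parameter keys, and string layout are illustrative, not necessarily what the uvpspec implementation actually does. The idea is to de-duplicate the per-key parameter dictionaries, store one JSON copy of each unique set, and map keys onto indices.

import json

# hypothetical helpers, for illustration only (not the uvpspec implementation)
def compress_r_params(r_params):
    """Encode {key: param_dict} as one string without repeating identical dicts."""
    unique, index, mapping = [], {}, {}
    for key, params in r_params.items():
        blob = json.dumps(params, sort_keys=True)
        if blob not in index:              # first time this parameter set is seen
            index[blob] = len(unique)
            unique.append(params)
        mapping[str(key)] = index[blob]    # key -> index of its shared parameter set
    return json.dumps({"unique": unique, "mapping": mapping})

def decompress_r_params(blob):
    """Invert compress_r_params, returning {key: param_dict}."""
    packed = json.loads(blob)
    return {key: packed["unique"][idx] for key, idx in packed["mapping"].items()}

# e.g. 100 baseline keys sharing one filter configuration store that dict only once
r_params = {"bl%d" % i: {"filter_centers": [0.], "filter_widths": [1500e-9],
                         "filter_factors": [1e-9]}
            for i in range(100)}
assert decompress_r_params(compress_r_params(r_params)) == r_params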

Collaborator

@philbull philbull left a comment

Looks good, but a few loose ends to tie up:

  • a couple of docstrings could be a little clearer/more detailed
  • it's not clear how r_params is treated when UVPSpec objects are added together
  • I think the get_r_params test should be a bit more real-worldy

(Resolved inline review comments on hera_pspec/uvpspec.py and hera_pspec/tests/test_uvpspec.py)
@aewallwi
Collaborator Author

@anguta Here is a neat example for you to test out.

import numpy as np
from hera_pspec import utils
from scipy import stats

# generate a bunch of white noise vectors
x = np.array([stats.norm.rvs(0, 1, 32) for i in range(100)]).T

# generate a sinc downweight matrix
weights = np.ones(32)
cmat1 = utils.sinc_downweight_mat_inv(32, 100e3, weights, filter_centers = 0., filter_widths = 1500e-9, filter_factors = 1e-9)

# apply to data
xfilt = cmat1.dot(x)

# get covariances of data, filtered data, and then dot C with sinc matrix
C = np.cov(x)
CF = np.cov(xfilt)
CF2 = cmat1.dot(C).dot(cmat1.T)

You'll find that CF and CF2 are nearly identical, which I think suggests that this is not necessarily a replacement for covariance weighting.

I'm a little bit confused by this example: the filtering matrix is supposed to be the inverse of sinc_downweight_mat_inv. At any rate, I'd naively expect CF2 and CF to be the same, since the first is the covariance of the data after applying a matrix multiplication (basically a change of basis), and the second applies that same transformation matrix to the covariance in the original basis, which is equivalent to calculating the covariance in the new basis.
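
The identity at work here, written out as a standalone check (with a random matrix and random data rather than the sinc matrix and noise draws above): np.cov centers each row, so the sample covariance transforms under any fixed linear operator exactly as the true covariance does, which is why CF and CF2 agree by construction.

import numpy as np

# standalone check, independent of the sinc matrix above
rng = np.random.default_rng(0)
x = rng.normal(size=(32, 100))   # any data matrix (rows = variables)
R = rng.normal(size=(32, 32))    # any fixed linear operator

# centered(R @ x) == R @ centered(x), hence cov(R @ x) == R @ cov(x) @ R.T
assert np.allclose(np.cov(R @ x), R @ np.cov(x) @ R.T)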

Member

@nkern nkern left a comment

looks good, thanks Aaron!

@aewallwi
Collaborator Author

Looks good, but a few loose ends to tie up:

* a couple of docstrings could be a little clearer/more detailed

* it's not clear how `r_params` is treated when UVPSpec objects are added together

* I think the `get_r_params` test should be a bit more real-worldy

I've addressed all of the issues mentioned here.

cheers,

-Aaron

@aewallwi aewallwi closed this Jul 24, 2019
@aewallwi aewallwi reopened this Jul 24, 2019
Collaborator

@philbull philbull left a comment

Looks good. Just minor typos.

(Resolved inline review comments on hera_pspec/uvpspec.py and hera_pspec/uvpspec_utils.py)
@philbull philbull merged commit 7708f92 into master Aug 6, 2019
@philbull philbull deleted the lazy_covariance branch August 6, 2019 07:33