Decorrelation #137
Conversation
vectors apply to data weighting matrix to produce symmetric R
modified: tests/test_pspecdata.py
@acliu This branch is failing to build. It looks like the cause is a time-out, but there are also some 'divide by zero' warnings showing up in the log that we never had before.
OK, I tracked down what was causing the slowdown.
Thanks @philbull for looking into this! This is very interesting. I guess it's giving us a quick glimpse into what it'll be like for a fully time-dependent set of matrices. I think the divide-by-zero errors are OK. I will check this, but if I recall from running the tests on my computer, those come from making sure that singular matrices are properly dealt with. It doesn't actually cause any of the tests to fail. On my computer I also got a failed test for
Yes, I think it would be useful to do a bit of optimisation work to pave the way for time-dependent matrices. We haven't really tried to make things faster yet, but I feel like there could be some relatively simple things we could do to help. Once we hit a limit, though, we might need to offload some of the calculations to C code... It'd be good to get rid of the div/0 errors if we can, as they will worry users (and potentially mask other problems). I think the
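Divide-by-zero warnings of this kind typically come from reciprocating the (near-)zero eigenvalues of a singular matrix. A minimal sketch (hypothetical, not taken from the test suite) of guarding such a pseudoinverse-style step so the warning is suppressed only where it is expected:

```python
import numpy as np

# A deliberately singular matrix: its eigenvalues are 0 and 2.
singular = np.array([[1.0, 1.0],
                     [1.0, 1.0]])
eigvals, _ = np.linalg.eigh(singular)

# Reciprocating the eigenvalues would emit a divide-by-zero warning for
# the zero eigenvalue. np.errstate confines the suppression to this block,
# and np.where substitutes 0.0 for the unusable reciprocal (the usual
# pseudoinverse convention).
with np.errstate(divide='ignore'):
    inv_vals = np.where(np.abs(eigvals) > 1e-12, 1.0 / eigvals, 0.0)

assert np.all(np.isfinite(inv_vals))
```

Scoping the suppression with `np.errstate` keeps any *unexpected* divide-by-zero elsewhere visible in the logs instead of blanket-silencing the warning.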
A few minor comments about default options and function names, plus the usual few comments about docstrings.
subsequent indices specify the baseline index, in _key2inds format.

model : string, optional
    Type of covariance model to calculate, if not cached. options=['empirical']
Do we want to set `empirical` as a default? We already know that it's problematic in various ways, so perhaps it isn't very safe as a default?
We don't have anything better currently, so we'll leave this for now.
hera_pspec/pspecdata.py
Outdated
Returns
-------
cross_covar : array-like
    Cross covariance model for the specified key.
Does this have dimensions `Nfreq x Nfreq`?
Yep! Added to the docstring.
if model == 'empirical':
    covar = utils.cov(self.x(key1), self.w(key1),
                      self.x(key2), self.w(key2),
                      conj_1=conj_1, conj_2=conj_2)
This needs an `else` statement to raise an error if `model != 'empirical'`.
Actually, this is already done a few lines above: `assert model in ['empirical'], "didn't recognize model {}".format(model)`
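For reference, the two guard styles do the same job of failing fast; a minimal sketch (function names hypothetical) contrasting the existing `assert` with the explicit `else`/`raise` suggested above:

```python
def check_model_assert(model):
    # The guard already in the code: unknown models fail immediately
    # with an AssertionError carrying a clear message.
    assert model in ['empirical'], "didn't recognize model {}".format(model)

def check_model_raise(model):
    # Equivalent explicit branch suggested in the review; raises a
    # ValueError instead of an AssertionError.
    if model != 'empirical':
        raise ValueError("didn't recognize model {}".format(model))

check_model_assert('empirical')  # passes silently
check_model_raise('empirical')   # passes silently

for check, err in [(check_model_assert, AssertionError),
                   (check_model_raise, ValueError)]:
    try:
        check('gaussian')
        raised = False
    except err:
        raised = True
    assert raised  # both guards reject an unrecognized model
```

One caveat with the `assert` form: assertions are stripped when Python runs with `-O`, so an explicit `raise` is the safer guard for user-facing input validation.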
@@ -894,30 +937,120 @@ def get_H(self, key1, key2, sampling=False):

        return H / 2.

-    def get_V_gaussian(self, key1, key2):
+    def get_unnormed_E(self,key1,key2):
I feel like we have too many matrix variable names floating around now. Would it be OK to call this `get_unnormed_RQR` instead? It's a bit easier to keep track of.
I'd like to keep this name, because in principle `E` is more general than `RQR`, so in the future we might not have the two equal.
hera_pspec/pspecdata.py
Outdated
R1 = self.R(key1)
R2 = self.R(key2)
for dly_idx in range(self.spw_Ndlys):
    E_matrices[dly_idx] = np.einsum('ij,jk,kl', R1, self.get_Q_alt(dly_idx), R2)
This `einsum` might be slower than just doing 2 dot products. IIRC, R has shape `(Nfreq, Nfreq)`? So we shouldn't have any issues fitting this into memory. (It's only if we have a dimension of size `Nblpairts` that we consume lots of memory.)

(N.B. When I was looking into `einsum`, it turns out that operations involving 2 matrices have lots of optimisations, but operations with 3+ matrices, like this one, are probably brute-forced, and so are slower.)
OK, converted to two applications of `np.dot`.
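The two forms compute the same product; a small self-contained sketch (matrix contents are placeholders standing in for `R1`, `get_Q_alt(dly_idx)`, and `R2`, each assumed to be `(Nfreq, Nfreq)`):

```python
import numpy as np

rng = np.random.default_rng(0)
Nfreq = 8
R1 = rng.standard_normal((Nfreq, Nfreq))
Q = rng.standard_normal((Nfreq, Nfreq))
R2 = rng.standard_normal((Nfreq, Nfreq))

# Three-operand einsum: the contraction order is not guaranteed to be
# optimized pairwise, so it may fall back to a brute-force loop.
E_einsum = np.einsum('ij,jk,kl', R1, Q, R2)

# Two chained dot products: each call dispatches to optimized BLAS GEMM.
E_dot = np.dot(np.dot(R1, Q), R2)

# Both evaluate R1 @ Q @ R2.
assert np.allclose(E_einsum, E_dot)
```

(For the curious: passing `optimize=True` to `np.einsum` also enables pairwise contraction, but the explicit `np.dot` chain makes the intent and cost obvious.)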
hera_pspec/pspecdata.py
Outdated
Parameters
----------
key1, key2 : tuples or lists of tuples
    Tuples containing indices of dataset and baselines for the two
    input datavectors. If a list of tuples is provided, the baselines
    in the list will be combined with inverse noise weights.

model : str
    How the covariances of the input data should be estimated.
Default: 'empirical'
Added!
hera_pspec/pspecdata.py
Outdated
@@ -955,6 +1088,11 @@ def get_MW(self, G, H, mode='I'):
        Definition to use for M. Must be one of the options listed above.
        Default: 'I'.

    band_covar : array_like, optional
        Covariance matrix of the unnormalized bandpowers (i.e., q). Used only
        if requesting the V^-1/2 normalization.
Can you note which function to use to compute the covariance matrix for this, please?
Done!
    raise ValueError("Covariance not supplied for V^-1/2 normalization")
# First find the eigenvectors and eigenvalues of the unnormalized covariance,
# then use them to compute V^-1/2
eigvals, eigvects = np.linalg.eigh(band_covar)
Do the eigenvectors/vals change each time this function is called, or should we cache them somewhere?
The philosophy here was the same as for why `V` is provided as an argument to this function---the idea is to give the user the flexibility to use different matrices, so there isn't an obvious right way to compute them. Caching them seems counter to this.
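Given the eigendecomposition above, the V^-1/2 normalization matrix follows directly; a minimal sketch (the stand-in `band_covar` is a hypothetical symmetric positive-definite matrix, since the real one comes from the unnormalized bandpower covariance):

```python
import numpy as np

# Hypothetical SPD stand-in for band_covar (Cov(q_hat)).
rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
band_covar = A @ A.T + 6.0 * np.eye(6)

# Eigendecomposition of the symmetric covariance: eigh returns real
# eigenvalues and an orthonormal eigenvector matrix U.
eigvals, eigvects = np.linalg.eigh(band_covar)

# V^{-1/2} = U diag(lambda^{-1/2}) U^T  (valid when all eigvals > 0)
V_inv_half = eigvects @ np.diag(eigvals ** -0.5) @ eigvects.T

# Sanity check: V^{-1/2} V V^{-1/2} recovers the identity, which is
# exactly the property that decorrelates the normalized bandpowers.
assert np.allclose(V_inv_half @ band_covar @ V_inv_half, np.eye(6))
```

In the real code, near-zero or negative eigenvalues of a poorly conditioned covariance would need guarding (e.g. truncation) before taking `** -0.5`.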
@@ -1233,7 +1393,7 @@ def delays(self):
        return utils.get_delays(self.freqs[self.spw_range[0]:self.spw_range[1]],
                                n_dlys=self.spw_Ndlys) * 1e9 # convert to ns

-    def scalar(self, pol, little_h=True, num_steps=2000, beam=None):
+    def scalar(self, pol, little_h=True, num_steps=2000, beam=None, taper_override='no_override'):
`taper_override=None` would be a more idiomatic default.
As discussed, we'll keep this in service of being more explicit.
@@ -1710,13 +1896,22 @@ def pspec(self, bls1, bls2, dsets, pols, n_dlys=None, input_data_weight='identit

        # Normalize power spectrum estimate
        if verbose: print("  Normalizing power spectrum...")
        Mv, Wv = self.get_MW(Gv, Hv, mode=norm)
        if norm == 'V^-1/2':
            V_mat = self.get_unnormed_V(key1, key2)
This always uses the default `model` argument, which is `model='empirical'`. So, the user can't change that option. Is this intentional?
For now, I think it's probably a good idea to keep it like this. This way it's symmetric w.r.t. the `iC` weighting of the data, which does not allow the user to set a non-empirical estimate. If we want to extend our covariance estimation capabilities in the future, we can address both of these examples of hard-coding then.
Looks good, ready to merge!
This PR restores decorrelation capability. This includes the `M = H^{-1}` option to give delta-function windows, as well as the `M \sim V^{-1/2}` option that decorrelates the error covariances.

Note that while this PR allows power spectra to be computed using these options for the `M` matrix, I did not include extensions to `uvpspec` to store the bandpower covariances. This is because @anguta has a PR open that contains some of this capability, and I don't want to reinvent the wheel. So once that other PR is done, we can recycle some of that stuff.
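For readers skimming the thread, the roles of the two `M` choices can be summarized in standard quadratic-estimator notation (a sketch of the usual conventions, not copied from the hera_pspec docs):

```latex
% Normalized bandpowers from unnormalized ones:
%   \hat{p} = M \hat{q}, \qquad \langle \hat{q} \rangle = H p, \qquad W = M H
%
% M = H^{-1}:
%   W = I, i.e. delta-function window functions.
%
% M \propto V^{-1/2}, with V = \mathrm{Cov}(\hat{q}):
%   \mathrm{Cov}(\hat{p}) = M V M^{\dagger} \propto I,
%   i.e. the bandpower errors are decorrelated.
```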