
Decorrelation #137

Merged: 19 commits merged into master from decorrelation on Jul 2, 2018

Conversation

acliu
Contributor

@acliu acliu commented Jun 23, 2018

This PR restores decorrelation capability. This includes the M = H^{-1} option to give delta function windows, as well as the M \sim V^{-1/2} option that decorrelates the error covariances.

Note that while this PR allows power spectra to be computed using these options for the M matrix, I did not include extensions to uvpspec to store the bandpower covariances. This is because @anguta has a PR open that contains some of this capability, and I don't want to reinvent the wheel. So once that other PR is done, we can recycle some of that stuff.
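For readers unfamiliar with the two options, here is a minimal numpy sketch of the idea (the function `get_MW_sketch` and its interface are hypothetical stand-ins, not the actual hera_pspec API): with unnormalized bandpowers q, the normalized estimate is p = M q and the window functions are W = M H, so M = H^{-1} gives delta-function windows while M \sim V^{-1/2} whitens the bandpower covariance.

```python
import numpy as np

def get_MW_sketch(H, mode="I", band_covar=None):
    """Hypothetical sketch (not the hera_pspec API) of the normalization
    choices in this PR.  p = M q, and W = M H are the window functions."""
    if mode == "I":
        # Identity-like choice: scale each row of H so its window sums to one
        M = np.diag(1.0 / H.sum(axis=1))
    elif mode == "H^-1":
        # W = H^{-1} H = I, i.e. delta-function window functions
        M = np.linalg.inv(H)
    elif mode == "V^-1/2":
        # Decorrelate the errors: if Cov(q) = V, then Cov(Mq) = M V M^T = I
        eigvals, eigvects = np.linalg.eigh(band_covar)
        M = eigvects @ np.diag(eigvals ** -0.5) @ eigvects.T
    else:
        raise ValueError("unrecognized mode {}".format(mode))
    return M, M @ H
```

The real code of course carries more bookkeeping (spectral windows, weighting, conventions); this is only the linear algebra skeleton.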

@ghost ghost assigned acliu Jun 23, 2018
@ghost ghost added the in progress label Jun 23, 2018
@acliu acliu requested review from philbull and nkern June 23, 2018 03:28
@philbull
Collaborator

@acliu This branch is failing to build. It looks like the cause is a time-out, but there are also some 'divide by zero' warnings showing up in the log that we never had before.

@ghost ghost assigned philbull Jun 27, 2018
@philbull
Collaborator

OK, I tracked down what was causing the slowdown. scalar_delay_adjustment() was calling get_G and get_H every time it was called, instead of using the copies already computed earlier in pspec(). I've changed it to take the precomputed copies by default now, which speeds things up a lot.
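The fix follows a common pattern: make the expensive products optional arguments and only fall back to recomputing them when the caller doesn't pass them in. A toy sketch (class and attribute names here are illustrative stand-ins, not the actual hera_pspec code):

```python
import numpy as np

class Estimator:
    """Toy stand-in demonstrating the precomputed-argument pattern."""
    def __init__(self, n):
        self.n = n
        self.calls = 0  # count of expensive matrix builds

    def get_G(self):
        self.calls += 1  # expensive in the real code
        return np.eye(self.n)

    def get_H(self):
        self.calls += 1
        return 0.5 * np.eye(self.n)

    def scalar_delay_adjustment(self, G=None, H=None):
        # Only recompute when the caller didn't supply precomputed copies
        if G is None:
            G = self.get_G()
        if H is None:
            H = self.get_H()
        return np.trace(G) / np.trace(H)
```

Called with the copies already built in `pspec()`, the method does no redundant work; called bare, it still behaves as before.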

@acliu
Contributor Author

acliu commented Jun 28, 2018

Thanks @philbull for looking into this! This is very interesting. I guess it's giving us a quick glimpse into what it'll be like for a fully time-dependent set of matrices.

I think the divide by zero errors are ok. I will check this, but if I recall from running the tests on my computer, those come from making sure that singular matrices are properly dealt with. It doesn't actually cause any of the tests to fail.

On my computer I also got a failed test for pspec_run, and I didn't know if that was my fault or not. Has pspec_run been doing fine on all other branches? I didn't touch it at all, but maybe there was some indirect change due to the other new methods.

@philbull
Collaborator

Yes, I think it would be useful to do a bit of optimisation work to pave the way for time-dependent matrices. We haven't really tried to make things faster yet, but I feel like there could be some relatively simple things we could do to help. Once we hit a limit, though, we might need to offload some of the calculations to C code...

It'd be good to get rid of the div/0 errors if we can, as it will worry users (and potentially mask other problems).

I think the pspec_run issues are caused by problems elsewhere. Lots of tests are broken right now due to changes in pyuvdata.

@coveralls

coveralls commented Jul 1, 2018

Coverage Status

Coverage increased (+0.03%) to 96.997% when pulling c5d7ff3 on decorrelation into 472be38 on master.

Collaborator

@philbull philbull left a comment

A few minor comments about default options and function names, plus the usual few comments about docstrings.

subsequent indices specify the baseline index, in _key2inds format.

model : string, optional
Type of covariance model to calculate, if not cached. options=['empirical']
Collaborator

Do we want to set empirical as a default? We already know that it's problematic in various ways, so perhaps it isn't very safe as a default?

Contributor Author

We don't have anything better currently, so we'll leave this for now.

Returns
-------
cross_covar : array-like
Cross covariance model for the specified key.
Collaborator

Does this have dimensions Nfreq x Nfreq?

Contributor Author

Yep! Added to the docstring.

if model == 'empirical':
    covar = utils.cov(self.x(key1), self.w(key1),
                      self.x(key2), self.w(key2),
                      conj_1=conj_1, conj_2=conj_2)
Collaborator

This needs an else statement to raise an error if model != empirical

Contributor Author

Actually, this is already done a few lines above: assert model in ['empirical'], "didn't recognize model {}".format(model)

@@ -894,30 +937,120 @@ def get_H(self, key1, key2, sampling=False):

return H / 2.

def get_V_gaussian(self, key1, key2):
def get_unnormed_E(self, key1, key2):
Collaborator

I feel like we have too many matrix variable names floating around now. Would it be OK to call this get_unnormed_RQR instead? It's a bit easier to keep track of.

Contributor Author

I'd like to keep this name, because in principle E is more general than RQR, so in the future we might not have the two equal.

R1 = self.R(key1)
R2 = self.R(key2)
for dly_idx in range(self.spw_Ndlys):
    E_matrices[dly_idx] = np.einsum('ij,jk,kl', R1, self.get_Q_alt(dly_idx), R2)
Collaborator

This einsum might be slower than just doing 2 dot products. IIRC, R has shape (Nfreq, Nfreq)? So we shouldn't have any issues fitting this into memory. (It's only if we have a dimension of size Nblpairts that we consume lots of memory.)

(N.B. When I was looking into einsum, it turns out that operations including 2 matrices have lots of optimisations, but operations with 3+ matrices, like this one, are probably brute-forced, and so are slower.)

Contributor Author

Ok, converted to two applications of np.dot
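The equivalence is easy to check directly. As the review notes, a three-operand `np.einsum` without `optimize=True` may be contracted naively, while the pairwise `np.dot` form is explicitly two matrix products (the matrix size here is just a stand-in for Nfreq):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 64  # stand-in for Nfreq
R1, Q, R2 = (rng.normal(size=(n, n)) for _ in range(3))

# Three-operand einsum: without optimize=True the contraction order
# may be chosen naively, which is slower than pairwise products
E_einsum = np.einsum('ij,jk,kl', R1, Q, R2)

# Equivalent pairwise contraction with two dot products
E_dot = np.dot(np.dot(R1, Q), R2)

assert np.allclose(E_einsum, E_dot)
```

Passing `optimize=True` to `np.einsum` would also let it pick a good contraction path, but the two-`np.dot` form makes the pairwise order explicit.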

Parameters
----------
key1, key2 : tuples or lists of tuples
Tuples containing indices of dataset and baselines for the two
input datavectors. If a list of tuples is provided, the baselines
in the list will be combined with inverse noise weights.

model : str
How the covariances of the input data should be estimated.
Collaborator

Default: 'empirical'

Contributor Author

Added!

@@ -955,6 +1088,11 @@ def get_MW(self, G, H, mode='I'):
Definition to use for M. Must be one of the options listed above.
Default: 'I'.

band_covar : array_like, optional
Covariance matrix of the unnormalized bandpowers (i.e., q). Used only
if requesting the V^-1/2 normalization.
Collaborator

Can you note which function to use to compute the covariance matrix for this please?

Contributor Author

Done!

raise ValueError("Covariance not supplied for V^-1/2 normalization")
# First find the eigenvectors and eigenvalues of the unnormalized covariance
# Then use them to compute V^-1/2
eigvals, eigvects = np.linalg.eigh(band_covar)
Collaborator

Do the eigenvectors/vals change each time this function is called, or should we cache them somewhere?

Contributor Author

The philosophy here is the same as for why V is provided as an argument to this function: the idea is to give the user the flexibility to supply different matrices, so there isn't an obvious single right way to compute them. Caching them seems counter to this.
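For reference, the step the quoted snippet is building up to can be written as a small standalone helper (the name `inv_sqrt_psd` is hypothetical, not part of the codebase): since V is symmetric, V = U diag(s) U^T and V^{-1/2} = U diag(s^{-1/2}) U^T.

```python
import numpy as np

def inv_sqrt_psd(band_covar):
    """Sketch (hypothetical helper): V^{-1/2} for a symmetric positive-definite
    covariance, via V = U diag(s) U^T  =>  V^{-1/2} = U diag(s^{-1/2}) U^T."""
    eigvals, eigvects = np.linalg.eigh(band_covar)
    if np.any(eigvals <= 0.):
        raise ValueError("covariance must be positive definite")
    return eigvects @ np.diag(eigvals ** -0.5) @ eigvects.T
```

By construction the result M satisfies M V M = I, which is exactly the decorrelation property the V^-1/2 normalization relies on.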

@@ -1233,7 +1393,7 @@ def delays(self):
return utils.get_delays(self.freqs[self.spw_range[0]:self.spw_range[1]],
                        n_dlys=self.spw_Ndlys) * 1e9  # convert to ns

def scalar(self, pol, little_h=True, num_steps=2000, beam=None):
def scalar(self, pol, little_h=True, num_steps=2000, beam=None, taper_override='no_override'):
Collaborator

taper_override = None would be a more idiomatic default.

Contributor Author

As discussed, we'll keep this in service of being more explicit.

@@ -1710,13 +1896,22 @@ def pspec(self, bls1, bls2, dsets, pols, n_dlys=None, input_data_weight='identit

# Normalize power spectrum estimate
if verbose: print(" Normalizing power spectrum...")
Mv, Wv = self.get_MW(Gv, Hv, mode=norm)
if norm == 'V^-1/2':
    V_mat = self.get_unnormed_V(key1, key2)
Collaborator

This always uses the default model argument, which is model='empirical'. So, the user can't change that option. Is this intentional?

Contributor Author

For now, I think it's probably a good idea to keep it like this. This way it's symmetric w.r.t. the iC weighting of the data, which does not allow the user to set a non-empirical estimate. If we want to extend our covariance estimation capabilities in the future, we can address both of these examples of hard coding then.

@philbull philbull added this to the Ready for IDR 2.1 analysis milestone Jul 1, 2018
Collaborator

@philbull philbull left a comment

Looks good, ready to merge!

@acliu acliu merged commit f5a8a7d into master Jul 2, 2018
@ghost ghost removed the in progress label Jul 2, 2018
@acliu acliu deleted the decorrelation branch July 2, 2018 23:01
@nkern nkern mentioned this pull request Jul 5, 2018