Implement cross-correlation redshift/rv-fiding #544

ibusko · 2019-10-22T15:15:32Z

This code implements part of the workflow necessary for actually finding the redshift/RV: the cross-correlation computation. The correlation peak finding is supposed to happen elsewhere, maybe in a notebook where the user will have his/hers favorite workflow laid out. Also, any preparation of the spectra required by the actual science (e.g. removal of emission lines, masking of regions, and the like) is assumed to take place before the template_correlate function is called.

ibusko · 2019-10-23T16:03:26Z

@eteq @nmearl @camipacifici can you review this? I am not being able to tag reviewers via the normal access point in the github page. Perhaps because of permissions or credentials or something.

nmearl

Aside from a very minor docstring change, this looks good to me.

specutils/analysis/correlation.py

nmearl

Additionally, this is going to need documentation in the docs.

docs/analysis.rst

nmearl · 2019-11-13T15:17:44Z

docs/analysis.rst

+An example of how to get the cross correlation follows. Note that the observed spectrum must have a rest wavelength
+value set in.
+
+.. code-block:: python


Probably a question for @eteq or @camipacifici: would be want to demonstrate loading in a simple ASCII file with e.g. wavelength/flux values to use as the template? (That is, for someone perusing the docs, would that be more of a helpful example?)

Hmm, I think the more useful one is something that actually uses real templates that are loaded as Spectrum1D objects. That is, a fits file with an actual spectrum.

E.g. take a galaxy template from synphot and convert it to ascii and load in into a Spectrum1D object to use as the template.

There are some templates in the Synphot library (Bruzual-Charlot galaxy templates for example) https://pysynphot.readthedocs.io/en/latest/ $PYSYN_CDBS/grid/bc95/. I'd suggest bc95_g_10E9.fits. I don't think there is a URL available for this single file, but @pllim might know.

In developing a test case which uses a real template spectrum (wave and flux from bc95_g_10E9.fits converted to ASCII format), I realized that we need also an observed spectrum to correlate against this template. Any idea on how to get one of those without resorting to too many manipulations? The goal is to have this in the docs, so I believe it has to be straightforward to get the observed spectrum, as it is to get the template.

@ibusko - For an observed spectrum, I think the easiest thing is to use the SDSS spectrum that's already an example from the specutils docs - see https://specutils.readthedocs.io/en/stable/ (or the index.rst here) for the exact URL. That way we don't have to keep track of multiple examples.

Although @camipacifici pointed out to me out-of-band that she has some actual observed spectra and matched templates for them that is part of the notebook she's working on. Maybe we should use those as the template/observed combo?

Unfortunately I'm not sure it's going to be easy to access these publicly until she gets a chance to put them in a reasonable data location. So it might be best to follow the above just to get this PR in, but with a plan to move to @camipacifici's data once we can get it (and use it also in the chi2-template-fitting section?

The data is served by RedCat, independent of Synphotses code base. Pick your poison at https://ssb.stsci.edu/cdbs/grid/bc95/templates/

I put in place a unit test function in a branch in my fork for people to review and continue the discussion. The goal is to have something that can be put in the docs as an example on how to use the correlation function with real data. I found that it is perhaps trickier and lengthier than it should be, since it involves other aspects of the process besides the correlation itself. The code is in function test_correlation_real_data in ibusko:redshift-measurement-sandbox.

Temporarily I have both the observation and template spectra files added to the tests directory, because my dev environment (in PyCharm) can't run the test (it skips it) if I use the remote file loading code. It runs from the command line tough, so it's not a real problem. More important, remote loading works for the observed SDSS spectrum, but not for the synphot template. This file has to be manually changed into ECSV format to be usable. I think there is a way to use the file as is, but I didn't invest time pursuing it since that's not the goal of this exercise. These are topics to be solved later on. For now, we have to decide how to translate the code in the test function into a doc-usable code snippet.

Note that the test function is not doing all possible steps that are eventually needed to pre-process spectra before they can be cross-correlated, such as continuum subtraction or rectification. Should they be included in the example?

FWIW, I implemented API for synphot to read from and write to Spectrum1D. Maybe it'll make things easier for you. See spacetelescope/synphot_refactor#243

javerbukh

Great documentation in the code! I had a problem running pytest on my machine on your code, however. This could be my conda environment or the config file, I'm not entirely sure. Once you get that fixed everything else looks good!

eteq

Alright, I reviewed this in light of my experimentation for the stellar cross-correlation use case, which you can look at here:
https://github.com/eteq/specutils/blob/redshift-stars/docs/Stellar%20cross-correlation.ipynb

the short version is: it doesn't work 😢. But fortunately I think not for core-algorithm reasons, but rather for lack-of-options reasons. Specifically, the following issues:

There's currently no way to skip the resampling step. That's a deal-breaker because in my stellar x-corr example, I already have done the resampling in a way that's specific to that particular use case and need to be able to skip the internal step.
I could be mis-understanding, but I think the current resampling step is subtly wrong in that it's resampling onto two different spectral_axis grids
There's no way to say "I only want a sub-set of the lags" as far as I can tell. This is a problem because it means the correlation step is very slow when I try to do it at 10x resolution, which is roughly what's needed for the stellar x-corr case. So I would want a way to be able to say something like "only do the correlation for lags that are between -1000 and 1000 km/s", so that it's fast enough to work for this use case.

The first two I think are pretty easily fixed (and I've left comments inline about that). The third is a bit more, but perhaps we should have it be left as a follow-on PR, since this PR still has relevant use cases that do not require that.

EDIT/UPDATE: I realized I was missing one part of the workflow (continuum normalization), and once I did that it is working 👏, although not optimially: all the concerns above still apply because it's still clear that I need at least some level of sub-pixel resolution in the lags. But it is working in that it's getting the right ballpark answer!

eteq · 2020-03-12T16:36:20Z

specutils/analysis/correlation.py

+    # that both spectrum and template share a common spectral axis.
+    resampler = FluxConservingResampler()
+    observed_log_spectrum = _log_resampler(observed_spectrum, resampler)
+    template_log_spectrum = _log_resampler(template_spectrum, resampler)


This is confusing to me. It looks like the _log_resampler is resampling onto log-spaced grids but not the same log-spaced grids. So there's no guarantee they will be on the same grid. So I would have thought instead you would create a log grid based on whichever is the higher resolution or possibly just the observed one (also see my other comment about allowing a way to short-circuit this), and then use that same grid on the template?

This is fixed in branch https://github.com/ibusko/specutils/tree/log_wavelength_2. You're right that the code above is wrong. The reason it did work in the galaxy workflow is that we originally had the resampling to a common wavelength scale done in the notebook. The new code (in the branch pointed above) resamples both spectrum and template into a common log-wavelength array.

eteq · 2020-03-12T16:53:52Z

specutils/analysis/correlation.py

+    return num/denom
+
+
+def template_correlate(observed_spectrum, template_spectrum):


To follow in the same model as the resamplers, this should probably be a class. Right now it would be a trivial class - something like

class TemplateCorrelate: def __init__(self, ... any new arguments added from my comments...): self.newarg = newarg def __call__(self, observed_spectrum, templat_spectrum): ... all the same as this current function, using self.newarg where needed...

that seems more consistent with the resamplers and the consensus from #266

I just watched this video last night Stop Using Classes. This suggestion sets off alarm bells.

That's worth considering, @hcferguson, but we had that discussion in #266 . To be fair I think we said we'd allow simple functions that we might wrap in classes later, so that doesn't have to block this PR, I suppose though.

eteq · 2020-03-12T16:58:38Z

specutils/analysis/correlation.py

+
+    # Resample both spectrum and template to log wavelength. This is assuming
+    # that both spectrum and template share a common spectral axis.
+    resampler = FluxConservingResampler()


I was surprised to see that this always uses a resampler. In my use case in https://github.com/eteq/specutils/blob/redshift-stars/docs/Stellar%20cross-correlation.ipynb, I already explicitly re-sample to put the two spectra on a shared wavelength grid, and the results depend critically on how that is done. So I think there should be a keyword argument (or class attribute, as discussed in my other comment) that makes it possible to skip the resampling phase.

Either way, it is critical that the docstring states that resampling takes place, since I wouldn't have known the data are being transformed without looking at the code (which many users won't do).

It also may not be essential to conserve flux for this application, since it is just trying to measure the shift. A simpler resampler might be faster.

Indeed, easy to add a keyword. As for the simpler resampler, I ended up using the flux conserving one at some point in the development, because it gave better results. But that could have been due to something else as well. Once I got the function+galaxy notebook working, I forgot to revisit the resampler type issue. I will try with the linear type and see how it goes.

@hcferguson indeed, replacing the flux conserving by the linear resampler speeded up the calculation significantly. And didn't result in any visible side effects (as we should expect anyways).

Note this did change the results quite noticeably for the stellar case. Not in a way that significantly affected the final answer, but in a way I strongly suspect matters a lot more for fainter spectra than my test. So that just emphasizes the importance of allowing the user to choose "let me do my own resampling, I know what I'm doing" if they want to.

eteq · 2020-03-12T17:02:39Z

specutils/analysis/correlation.py

+    wave_l = observed_log_spectrum.spectral_axis.value
+    delta_log_wave = np.log10(wave_l[1]) - np.log10(wave_l[0])
+    deltas = (np.array(range(len(corr))) - len(corr)/2 + 0.5) * delta_log_wave
+    lags = (np.power(10., deltas) - 1. ) * const.c.to('km/s')


Maybe the lags should be in "redshift" units instead of km/s?

Or alternatively, perhaps a keyword argument of lag_units that defaults to km/s, but can alternatively be a velocity unit (in which case you'd just replace this line with ...to(lag_units) or the dimensionless_unscaled unit, which is then interpreted as redshift? I.e. this line but without multiplying by c?

This sounds easy to implement as well. The reason I have the lags done in that way is just because the IRAF xcorr task does it in that way. I used xcorr's code to guide me into implementing the steps in the notebook and function.

ibusko · 2020-03-12T19:55:44Z

@eteq as for having just a subset of the lags, it may require a little more thought so I won't be adding it to the next iteration. Lets have the other issues fixed first.

Something that may help in that regard is that the new code I am moving into this PR allows the user to optionally define either, or both, the blue and red wavelength extremes to use in the resampling. These limits are used to define the common wavelength scale that both spectrum and template will be resampled into. They can be either present in the spectrum/template, or be off their limits, in which case the resampled data gets zero-padded. As for the sampling step to use to build the low-wavelength scale, the code right now is using the smaller wavelength spacing it can find in the observed spectrum. We could add that sampling step as a keyword parameter as well. That may allow the kind of oversampling you require in your high resolution work.

ibusko · 2020-03-13T13:40:46Z

I put in place all suggested modifications except two: lags selected by the user (subject of another ticket TBD), and didn't implement the function as a class (following @hcferguson 's alert). Although we can address that in a TBD ticket as well.

I tested this with the galaxy redshift workflow and the unit tests only. @eteq 's workflow still need to be tested with this code.

…trum1D.

…completely outside the range of the observed spectrum.

eteq · 2020-03-18T19:19:08Z

I pushed up a few commits making updates to the changes made such my last review - I don't think we want to write-in all the keyword arguments specifically for the resampling, so I re-worked it into a separate (public) function that does the log-resampling. But following the principle of "sensible defaults", I worked it so that the resampler still works as it did before by default but the user can control how they want the resampler to work or just do it themselves.

I also re-ran this using my notebook in https://github.com/eteq/specutils/blob/redshift-stars/docs/Stellar%20cross-correlation.ipynb and it all seems to be working as expected now, so that's great!

I ran into a couple of other items that I don't have time to fix but will make follow-on issues for those. None are blockers so assuming the tests pass I think all is good to merge.

eteq · 2020-03-18T19:24:32Z

the failures in the last commit are all unrelated due to remote data outages not this PR, so I think it can be considered passing. I'll give this a bit for @nmearl or @rosteen to review my last set of commits, but if they are ok this can be merged.

rosteen · 2020-03-18T23:00:37Z

I looked over the recent commits by @eteq and they look good to me.

nmearl reviewed Oct 28, 2019

View reviewed changes

specutils/analysis/correlation.py Outdated Show resolved Hide resolved

nmearl reviewed Oct 28, 2019

View reviewed changes

nmearl changed the title ~~JDAT-57: Implement cross-correlation redshift/rv-fiding~~ Implement cross-correlation redshift/rv-fiding Nov 11, 2019

nmearl reviewed Nov 13, 2019

View reviewed changes

ibusko requested review from eteq and nmearl February 26, 2020 15:44

javerbukh approved these changes Mar 6, 2020

View reviewed changes

eteq requested changes Mar 12, 2020

View reviewed changes

busko@stsci.edu and others added 19 commits March 18, 2020 11:14

Adding a basic x-correlation function.

d793c23

Computes basic correlation between two spectra. Uses numpy.correlate.

bafc51b

Basic correlation is in place.

e10707f

Check position of correlation peak

30682b6

Lags expressed in km/s. Return object is a tuple and no longer a Spec…

88e0c27

…trum1D.

Remove blank lines at end of file.

70a3298

Add new line at end of file.

e8f4712

Fix doc string.

a389d06

Add doc entry.

eeb2fb7

Minor change in doc.

b528e4e

Fixes in documentation.

b67d4d2

Fix lag computation.

9ad767b

Make this test more realistic and sensitive.

3872887

Add autocorrelation test.

3297b8c

Add test for zero padding

a0f82c6

Stretch a bit the zero-padding test so as to make the reference line …

28369c5

…completely outside the range of the observed spectrum.

Add test with non-correlated features in observed and template spectra.

f34a3ac

Doc standards compliance

43aef0b

Attempt to fix code block in documentation

71d0831

busko@stsci.edu and others added 5 commits March 18, 2020 11:14

Final fix (tests pass and notebooks work)

e2bf798

Fix build errors

90afc3c

Modify as per code review.

34562de

Fix correlation import in doctest

103a2c7

Fix api references in doc building

0748390

ibusko requested a review from eteq March 18, 2020 17:32

eteq added 3 commits March 18, 2020 14:17

comment/docs cleanup

cabc982

change apodization to window keyword

4ea8092

refactor to have template_logwl_resample

c033053

eteq approved these changes Mar 18, 2020

View reviewed changes

This was referenced Mar 18, 2020

Have cross-correlation resampling function use a more robust copying mechanism #611

Open

More thoroughly test cross-correlation #612

Open

eteq merged commit 624a0c1 into astropy:master Mar 18, 2020

eteq mentioned this pull request Apr 2, 2020

Cross-correlation peak-finding functions #638

Open

nmearl mentioned this pull request May 6, 2020

Spectral cross-correlation (as a RV/redshift measurement tool) #257

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement cross-correlation redshift/rv-fiding #544

Implement cross-correlation redshift/rv-fiding #544

ibusko commented Oct 22, 2019

ibusko commented Oct 23, 2019

nmearl left a comment

nmearl left a comment

nmearl Nov 13, 2019

eteq Nov 21, 2019

nmearl Nov 21, 2019

hcferguson Nov 21, 2019

ibusko Nov 21, 2019

eteq Nov 21, 2019

eteq Nov 21, 2019

pllim Nov 22, 2019

ibusko Nov 23, 2019 •

edited

Loading

pllim Feb 13, 2020

javerbukh left a comment •

edited

Loading

eteq left a comment •

edited

Loading

eteq Mar 12, 2020

ibusko Mar 12, 2020

eteq Mar 12, 2020

hcferguson Mar 12, 2020

eteq Mar 18, 2020

eteq Mar 12, 2020

hcferguson Mar 12, 2020

ibusko Mar 12, 2020

ibusko Mar 13, 2020

eteq Mar 18, 2020

eteq Mar 12, 2020

ibusko Mar 12, 2020

ibusko commented Mar 12, 2020

ibusko commented Mar 13, 2020 •

edited

Loading

eteq commented Mar 18, 2020 •

edited

Loading

eteq commented Mar 18, 2020 •

edited

Loading

rosteen commented Mar 18, 2020

		return num/denom


		def template_correlate(observed_spectrum, template_spectrum):

Implement cross-correlation redshift/rv-fiding #544

Implement cross-correlation redshift/rv-fiding #544

Conversation

ibusko commented Oct 22, 2019

ibusko commented Oct 23, 2019

nmearl left a comment

Choose a reason for hiding this comment

nmearl left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ibusko Nov 23, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

javerbukh left a comment • edited Loading

Choose a reason for hiding this comment

eteq left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ibusko commented Mar 12, 2020

ibusko commented Mar 13, 2020 • edited Loading

eteq commented Mar 18, 2020 • edited Loading

eteq commented Mar 18, 2020 • edited Loading

rosteen commented Mar 18, 2020

ibusko Nov 23, 2019 •

edited

Loading

javerbukh left a comment •

edited

Loading

eteq left a comment •

edited

Loading

ibusko commented Mar 13, 2020 •

edited

Loading

eteq commented Mar 18, 2020 •

edited

Loading

eteq commented Mar 18, 2020 •

edited

Loading