New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add the LombScargleMultiband Periodogram Class #14016
Conversation
@mwcraig you requested a ping for reviewing this once it was in. Thanks in advance! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Welcome to Astropy 👋 and congratulations on your first pull request! 🎉
A project member will respond to you as soon as possible; in the meantime, please have a look over the Checklist for Contributed Code and make sure you've addressed as many of the questions there as possible.
If you feel that this pull request has not been responded to in a timely manner, please send a message directly to the development mailing list. If the issue is urgent or sensitive in nature (e.g., a security vulnerability) please send an e-mail directly to the private e-mail feedback@astropy.org.
@pllim @mwcraig @aaryapatil I just wanted to check in on this. Is this still in the plans to be reviewed for v5.3? Let me know if there's anything I can do to assist. Thanks in advance! 🙏 |
The CI fails at multiple places. While I have pinged timeseries maintainers to review, you should rebase and fix the CI. Thanks! |
I will try and review this shortly! |
@dougbrn - one quick question I have before reviewing this - you mentioned that there is no compatibility with timeseries objects, but can one not represent multiband photometry with several different columns in the time series? |
Hi @astrofrog , thanks for taking the time to look at this. You are right in that a user could use a column for each band. At present, lombscargle multiband expects a single flux array, with an accompanying band array that labels each measurement with a specific band label. So going from the band-per-column representation would just look like stacking the columns and adding a label column. However, my hesitance to support the timeseries object came from my understanding that the timeseries object has no standard way of representing multiband information. So while one user could provide column-per-band, another could provide a single flux column, with a band label column. With no standard, it wasn't clear how to best setup lombscargle multiband to handle the timeseries class. I'm open to ideas on how to get these two talking, if you have any! |
fwiw, we have experimented with a multiband class and it works reasonably well out of the box, but got some pushback, or better, some non-response because it is not how some people were constructing ztf light curves (I can't recall any responses to my concerns for the labelling approach). Needless to say, I think a multicolumn class is a much better fit for a broad usage astropy is aiming for, so ideally LS should be able to handle it. |
@bsipocz If it's fair to assume from the timeseries side that multiband will only be implemented as multi-column, then I can look into getting a data loader for that format into multiband_lombscargle, I'll probably need to overload the existing from_timeseries function that is written in BasePeriodogram. Do you envision that timeseries will eventually be built out to more explicitly support this? (And if so does it make sense to wait on this at all?) |
@dougbrn - It's already pretty easy to subclass TimeSeries to support multiband columns. It kind of works out of the box, unlike doing something non-standard, like adding a label in a separate column. My concerns I listed to you in the slack thread still stand. |
@bsipocz Okay, this sounds good. I'll work on a commit that allows a multicolumn timeseries object as an input. |
Yes, I expect it to be reasonably straightforward, basically, it could take a list of column names rather than one column name. |
@bsipocz @astrofrog Latest commit implements a from_timeseries function, with the necessary assumption of band-per-column formatting. Added some accompanying unit tests and updated the documentation. May need to push another commit or two if the CI tests fail (I've been having a finicky experience with astropy's testing of .rst files, and my local testing appears insufficient to predict the result of the CI testing...), but otherwise this should be ready for some feedback. |
88ee3a5
to
f3c24ac
Compare
@astrofrog @bsipocz , this is now passing all CI tests, just in case either of you were holding off until then! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly formatting/punctuation nits because I'm not familiar with the math at all.
astropy/timeseries/periodograms/lombscargle_multiband/implementations/main.py
Outdated
Show resolved
Hide resolved
astropy/timeseries/periodograms/lombscargle_multiband/implementations/main.py
Outdated
Show resolved
Hide resolved
astropy/timeseries/periodograms/lombscargle_multiband/implementations/main.py
Outdated
Show resolved
Hide resolved
astropy/timeseries/periodograms/lombscargle_multiband/implementations/mle.py
Outdated
Show resolved
Hide resolved
NORMALIZATIONS = ["standard", "psd", "log", "model"] | ||
|
||
|
||
@pytest.fixture |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if you put the pytest fixtures in a conftest.py
file in this directory, you don't need the empty __init__
file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We usually always leave the __init__.py
files in this package otherwise it messes with package collection by setuptools.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I apologise for taking so long to review this - this is an impressive amount of work!
I have made a number of comments below, and in addition I have two main requests:
-
Would it be possible to add a test (or expand one of the existing ones) to check that
LombScargleMultiband
with a single band returns results consistent withLombScargle
? -
As this includes code copied from gatspy, we should include a copy of the gatspy license in the
licenses
folder. This was not a problem originally when theLombScargle
class was added because it was added by @jakevdp who is the copyright holder, but we should add it now.
Once you've had a chance to reply/implement my comments, please ping me and I'll make sure we move this forward so that it can be included in the 5.3 release.
astropy/timeseries/periodograms/lombscargle_multiband/__init__.py
Outdated
Show resolved
Hide resolved
from .implementations import available_methods, lombscarglemultiband | ||
from .implementations.mle import construct_regularization, design_matrix, periodic_fit | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Include an __all__
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to allow for it to just be from .implementations.mle import *
, correct? The difficulty I see with that is that we get into trouble with Flake8, (https://www.flake8rules.com/rules/F405.html), where it the imports are undefined. If we don't care about violating Flake8 that's fine, but there does seem to be an argument to be made for keeping these imports explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily to enable using *
but more to also restrict what can be imported from modules. In general we define __all__
in all files - this serves as a way to be explicit about what is 'public' API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding __all__
would prevent users from importing say 'has_units' from this module, because they might do it and claim it was 'public API' because it worked and there were no underscores at the start of the name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks for the clarification! Added it in to the files that were missing it.
import numpy as np | ||
|
||
from astropy import units | ||
from astropy import units as u |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the first units import needed? Can you just use u.
?
) | ||
|
||
self.normalization = normalization | ||
self.Nterms_base = nterms_base |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would avoid capitalizing the attribute names here - better to use the same name as was passed in.
minimum_frequency = minimum_frequency.to(1 / unit) | ||
maximum_frequency = maximum_frequency.to(1 / unit) | ||
|
||
Nf = 1 + int(np.round((maximum_frequency - minimum_frequency) / df)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would avoid using uppercase letters in variables such as this - maybe n_freq would be more appropriate and readable.
interface to handle multiband data (where multiple bands/filters are present). | ||
|
||
The code here is adapted from the `astroml`_ package ([3]_, [4]_) and the | ||
`gatspy`_ package ([5]_, [6]_), but conforms closely to the design paradigms |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here of course it's fine to leave a reference to astroml and gatspy as in the original LombScargle page.
docs/timeseries/lombscarglemb.rst
Outdated
... time_delta=deltas*u.minute) | ||
|
||
>>> # g band fluxes | ||
>>> ts1['g_flux'] = [np.nan] * 180 * u.mJy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does LombScargleMultiband also recognise proper masked values instead of NaN values? We should make sure it does and perhaps the documentation should actually also show this by default. Either way, text should be added to mention that this example table has missing data and how it is indicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LombScargleMultiband treats NaN values the same way LombScargle does, which is to say that both classes don't handle them at all when passing arrays with NaN values directly through. But both check for them in the y and dy columns when ingest from a timeseries object. As for proper masked values, do you mean like from a numpy.masked.array? Under the hood it's just using np.isnan calls, which should work well with numpy's masked arrays, and from then on we treat them like numpy arrays, which I believe masked arrays naturally accommodate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as this is consistent with LombScargle this is fine for now - but yes I mean tables with missing values as shown here:
https://docs.astropy.org/en/stable/table/masking.html
This is the 'preferred' way of indicating missing data as opposed to using NaN values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have time, could you check if masked tables work as well as NaN values? If so, I wonder if a masked table would look nicer in the docs? (but ignore for now if short on time).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just updated the from_timeseries function to support proper masked values, and updated the docs to show a masked table instead of just a table with NaN values. Also updated the test suite to use proper masking. So just to be clear, if a user creates a timeseries object and loads it into LombScargleMultiband, it recognizes proper masking and handles it accordingly. The LombScargleMultiband class itself still doesn't handle NaN values if they are included in array inputs, which is how LombScargle behaves. I would think a change to this behavior should go in a separate PR that also modifies LombScargle to keep the two consistent.
docs/timeseries/lombscarglemb.rst
Outdated
follows: | ||
|
||
>>> ls = LombScargleMultiband.from_timeseries(ts1, signal_column_name=['g_flux', 'r_flux', 'i_flux'], | ||
... uncertainty=['g_err', 'r_err', 'i_err'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This example makes me think this be 'uncertainty_column_name' for consistency? Or maybe it should be signal_column and uncertainty_column to make it a little more concise.
``signal_column_name``. ``uncertainty`` specifies the columns containing the | ||
associated errors per band, while ``band_labels`` provides the labels to use | ||
for each photometric band. From here, | ||
:class:`~astropy.timeseries.LombScargleMultiband` can be worked with as normal. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe say 'as for the LombScargle class' and link to that documentation page.
for each photometric band. From here, | ||
:class:`~astropy.timeseries.LombScargleMultiband` can be worked with as normal. | ||
|
||
>>> frequency,power = ls.autopower() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this an example of how it can be used as normal? If so, prefix with 'for example:'
@astrofrog , thank you for the review! It's much appreciated. I've gone through an implemented nearly all of what you've requested. I've added a test that checks for equivalent output in the single-band case for lombscarglemultiband and lombscargle, and added the gatspy license. Also, I settled on "flexible" as the replacement name for the "gatspy" method. As the method is better able to accommodate different models and input datasets thanks to its use of regularization. If you think we should come up with a different name, just let me know. I replied to a few suggestions above as well, we can talk those through and if that all looks good then I think this is in a good spot with regards to your review. |
The commits need to be squashed. While you are at it, maybe also rebase. Though I am not sure if this PR is too late for v5.3. It has |
We should get this in for the feature freeze, will respond to comments this evening |
eee8265
to
8c63a8e
Compare
@pllim @astrofrog squashed and rebased. There's an allowed failure in the CI that looks unrelated to the contents of this PR. We'd really like to get this into v5.3, and I'm willing to turn around any further needed changes quickly once @astrofrog has had a chance to look over the last few things. |
Yes, remote data failure (timeout) is unrelated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good! Just a few comments, mostly small, below. The only 'big' comment is about checking whether masked tables work and if so using that in the docs instead of NaN but this is not a deal-breaker and we can postpone to a follow-up if needed.
I'll be able to review this again on Thursday and can merge then if the other comments are implemented.
established in :class:`~astropy.timeseries.LombScargle`. | ||
|
||
Two implementations of the Multiband Lomb-Scargle Periodogram are available | ||
within :class:`~astropy.timeseries.LombScargleMultiband`, ``gatspy`` and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and below, should be flexible
instead of gatspy
?
docs/timeseries/lombscarglemb.rst
Outdated
.. EXAMPLE START: Loading from a :class:`~astropy.timeseries.TimeSeries` object | ||
|
||
Consider the following generator code for a | ||
:class:`~astropy.timeseries.TimeSeries` object. Where timeseries data is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Join sentences?
docs/timeseries/lombscarglemb.rst
Outdated
>>> dy_i = 0.01 * (0.5 + rng.random(60)) # uncertainties | ||
>>> y_i += dy_i * rng.standard_normal(60) | ||
>>> ts1['i_flux'].value[120:] = y_i | ||
>>> ts1['i_err'].value[120:] = dy_i |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe worth printing out the table in the last cell to show what it looks like?
docs/timeseries/lombscarglemb.rst
Outdated
... time_delta=deltas*u.minute) | ||
|
||
>>> # g band fluxes | ||
>>> ts1['g_flux'] = [np.nan] * 180 * u.mJy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have time, could you check if masked tables work as well as NaN values? If so, I wonder if a masked table would look nicer in the docs? (but ignore for now if short on time).
docs/whatsnew/5.3.rst
Outdated
established in :class:`~astropy.timeseries.LombScargle`. | ||
|
||
Two implementations of the Multiband Lomb-Scargle Periodogram are available | ||
within :class:`~astropy.timeseries.LombScargleMultiband`, ``gatspy`` and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gatspy -> flexible (also below)
from .implementations import available_methods, lombscarglemultiband | ||
from .implementations.mle import construct_regularization, design_matrix, periodic_fit | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding __all__
would prevent users from importing say 'has_units' from this module, because they might do it and claim it was 'public API' because it worked and there were no underscores at the start of the name.
@pllim - in general I wonder if we should recommend that when squashing we still keep the original commit(s) separate from the post-review ones? (rather than squashing down to one commit). Otherwise it's hard for me to check what has changed after my review, and this can be an issue in a PR this big. |
Errr... I meant to squash out the merge commits. I didn't mean to squash everything into a single commit. Sorry for the misunderstanding! If you really want the old stuff back, maybe checkout the old hashes mentioned above? For example: Also, if we ever enable "squash and merge," then we don't have to worry about such things no more. Please state your thoughts on #14473 |
Sorry about the full squash, didn't realize that this wasn't what was being requested! I'll make sure to leave these subsequent commits without squashing for ease of review 👍 |
It was my bad. My ask was too vague. |
@dougbrn, |
@eerovaher thanks for letting me know, have it set up now and will use it going forward. @astrofrog, I have addressed the last few issues you flagged. Namely, adding all's to the files that were missing them, fixed the doc issues you found, and added support for proper masked values when loading in from a timeseries object. I also responded in-thread where appropriate. I rebased again, as I saw @pllim 's message on slack yesterday indicating that there was need to do so. And finally, I squashed the commits from this review into a single commit, which should keep it easy to take a look at these specific changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for implementing all my comments! In particular I think that in the docs it'll now be much easier for users to understand the table example (I certainly found it a lot easier to visualize myself). This is good to go, let's get it in! 😄
Thanks again for all the work, and apologies for the delays!
only keep first paragraph as the rest already in What's New. Co-authored-by: Simon Conseil <contact@saimon.org>
Description
This pull request is to address the need for Lomb-Scargle Periodograms that are compatible with multiband data.
This PR includes a full implementation, with test suite and documentation, of a Lomb-Scargle Periodogram class that is capable of working with multiband data. This class inherits from the existing LombScargle class, but extends it's functionality to understand bands, which are passed as an additional required parameter. Like the original port of Lomb-Scargle, this class ports the core implementation that exists in Gatspy: https://github.com/astroML/gatspy, two implementations are ready in this PR:
LombScargleMultiband adheres to the conventions and API of LombScargle where possible, and the user experience is very similar. The core functions available at this time are: power, autofrequency, autopower, design_matrix, model_parameters, offset, and model.
The notable omissions from the class are:
Fixes #9982
Checklist for package maintainer(s)
This checklist is meant to remind the package maintainer(s) who will review this pull request of some common things to look for. This list is not exhaustive.
Extra CI
label. Codestyle issues can be fixed by the bot.no-changelog-entry-needed
label. If this is a manual backport, use theskip-changelog-checks
label unless special changelog handling is necessary.astropy-bot
check might be missing; do not let the green checkmark fool you.backport-X.Y.x
label(s) before merge.