Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the LombScargleMultiband Periodogram Class #14016

Merged
merged 2 commits into from Apr 19, 2023

Conversation

dougbrn
Copy link
Contributor

@dougbrn dougbrn commented Nov 16, 2022

Description

This pull request is to address the need for Lomb-Scargle Periodograms that are compatible with multiband data.

This PR includes a full implementation, with test suite and documentation, of a Lomb-Scargle Periodogram class that is capable of working with multiband data. This class inherits from the existing LombScargle class, but extends it's functionality to understand bands, which are passed as an additional required parameter. Like the original port of Lomb-Scargle, this class ports the core implementation that exists in Gatspy: https://github.com/astroML/gatspy, two implementations are ready in this PR:

  • Gatspy: The gatspy multiband floating mean periodogram, using matrix formalism, as outlined in VanderPlas et al. 2015 (https://arxiv.org/pdf/1502.01344.pdf)
  • Fast: A conceptual port of Gatspy's fast algorithm, but leverages LombScargle directly to do band-by-band fits.

LombScargleMultiband adheres to the conventions and API of LombScargle where possible, and the user experience is very similar. The core functions available at this time are: power, autofrequency, autopower, design_matrix, model_parameters, offset, and model.

The notable omissions from the class are:

  • False alarm probabilities: LombScargle's function is not valid for models with complexity beyond a single sinusoid, common for LombScargleMultiband
  • Timeseries object compatibility: Astropy Timeseries does not consider multiple bands, so we did not incorporate direct support for it, although the columns may be easily input as they are just array inputs for t, y, bands, and dy respectively.

Fixes #9982

Checklist for package maintainer(s)

This checklist is meant to remind the package maintainer(s) who will review this pull request of some common things to look for. This list is not exhaustive.

  • Do the proposed changes actually accomplish desired goals?
  • Do the proposed changes follow the Astropy coding guidelines?
  • Are tests added/updated as required? If so, do they follow the Astropy testing guidelines?
  • Are docs added/updated as required? If so, do they follow the Astropy documentation guidelines?
  • Is rebase and/or squash necessary? If so, please provide the author with appropriate instructions. Also see "When to rebase and squash commits".
  • Did the CI pass? If no, are the failures related? If you need to run daily and weekly cron jobs as part of the PR, please apply the Extra CI label. Codestyle issues can be fixed by the bot.
  • Is a change log needed? If yes, did the change log check pass? If no, add the no-changelog-entry-needed label. If this is a manual backport, use the skip-changelog-checks label unless special changelog handling is necessary.
  • Is this a big PR that makes a "What's new?" entry worthwhile and if so, is (1) a "what's new" entry included in this PR and (2) the "whatsnew-needed" label applied?
  • Is a milestone set? Milestone must be set but astropy-bot check might be missing; do not let the green checkmark fool you.
  • At the time of adding the milestone, if the milestone set requires a backport to release branch(es), apply the appropriate backport-X.Y.x label(s) before merge.

@dougbrn
Copy link
Contributor Author

dougbrn commented Nov 16, 2022

@mwcraig you requested a ping for reviewing this once it was in. Thanks in advance!

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Welcome to Astropy 👋 and congratulations on your first pull request! 🎉

A project member will respond to you as soon as possible; in the meantime, please have a look over the Checklist for Contributed Code and make sure you've addressed as many of the questions there as possible.

If you feel that this pull request has not been responded to in a timely manner, please send a message directly to the development mailing list. If the issue is urgent or sensitive in nature (e.g., a security vulnerability) please send an e-mail directly to the private e-mail feedback@astropy.org.

@dougbrn
Copy link
Contributor Author

dougbrn commented Jan 18, 2023

@pllim @mwcraig @aaryapatil I just wanted to check in on this. Is this still in the plans to be reviewed for v5.3? Let me know if there's anything I can do to assist. Thanks in advance! 🙏

@pllim
Copy link
Member

pllim commented Jan 23, 2023

The CI fails at multiple places. While I have pinged timeseries maintainers to review, you should rebase and fix the CI. Thanks!

@astrofrog
Copy link
Member

I will try and review this shortly!

@astrofrog
Copy link
Member

astrofrog commented Feb 4, 2023

@dougbrn - one quick question I have before reviewing this - you mentioned that there is no compatibility with timeseries objects, but can one not represent multiband photometry with several different columns in the time series?

@dougbrn
Copy link
Contributor Author

dougbrn commented Feb 4, 2023

Hi @astrofrog , thanks for taking the time to look at this. You are right in that a user could use a column for each band. At present, lombscargle multiband expects a single flux array, with an accompanying band array that labels each measurement with a specific band label. So going from the band-per-column representation would just look like stacking the columns and adding a label column.

However, my hesitance to support the timeseries object came from my understanding that the timeseries object has no standard way of representing multiband information. So while one user could provide column-per-band, another could provide a single flux column, with a band label column. With no standard, it wasn't clear how to best setup lombscargle multiband to handle the timeseries class. I'm open to ideas on how to get these two talking, if you have any!

@bsipocz
Copy link
Member

bsipocz commented Feb 6, 2023

fwiw, we have experimented with a multiband class and it works reasonably well out of the box, but got some pushback, or better, some non-response because it is not how some people were constructing ztf light curves (I can't recall any responses to my concerns for the labelling approach).

Needless to say, I think a multicolumn class is a much better fit for a broad usage astropy is aiming for, so ideally LS should be able to handle it.

@dougbrn
Copy link
Contributor Author

dougbrn commented Feb 6, 2023

@bsipocz If it's fair to assume from the timeseries side that multiband will only be implemented as multi-column, then I can look into getting a data loader for that format into multiband_lombscargle, I'll probably need to overload the existing from_timeseries function that is written in BasePeriodogram. Do you envision that timeseries will eventually be built out to more explicitly support this? (And if so does it make sense to wait on this at all?)

@bsipocz
Copy link
Member

bsipocz commented Feb 6, 2023

@dougbrn - It's already pretty easy to subclass TimeSeries to support multiband columns. It kind of works out of the box, unlike doing something non-standard, like adding a label in a separate column. My concerns I listed to you in the slack thread still stand.

@dougbrn
Copy link
Contributor Author

dougbrn commented Feb 6, 2023

@bsipocz Okay, this sounds good. I'll work on a commit that allows a multicolumn timeseries object as an input.

@bsipocz
Copy link
Member

bsipocz commented Feb 6, 2023

Yes, I expect it to be reasonably straightforward, basically, it could take a list of column names rather than one column name.

@dougbrn
Copy link
Contributor Author

dougbrn commented Feb 6, 2023

@bsipocz @astrofrog Latest commit implements a from_timeseries function, with the necessary assumption of band-per-column formatting. Added some accompanying unit tests and updated the documentation. May need to push another commit or two if the CI tests fail (I've been having a finicky experience with astropy's testing of .rst files, and my local testing appears insufficient to predict the result of the CI testing...), but otherwise this should be ready for some feedback.

@dougbrn dougbrn force-pushed the multiband_ls branch 2 times, most recently from 88ee3a5 to f3c24ac Compare February 21, 2023 21:48
@dougbrn
Copy link
Contributor Author

dougbrn commented Feb 22, 2023

@astrofrog @bsipocz , this is now passing all CI tests, just in case either of you were holding off until then!

Copy link

@delucchi-cmu delucchi-cmu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly formatting/punctuation nits because I'm not familiar with the math at all.

docs/changes/timeseries/14016.feature.rst Outdated Show resolved Hide resolved
docs/changes/timeseries/14016.feature.rst Outdated Show resolved Hide resolved
docs/timeseries/lombscarglemb.rst Outdated Show resolved Hide resolved
NORMALIZATIONS = ["standard", "psd", "log", "model"]


@pytest.fixture

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if you put the pytest fixtures in a conftest.py file in this directory, you don't need the empty __init__ file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually always leave the __init__.py files in this package otherwise it messes with package collection by setuptools.

Copy link
Member

@astrofrog astrofrog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I apologise for taking so long to review this - this is an impressive amount of work!

I have made a number of comments below, and in addition I have two main requests:

  • Would it be possible to add a test (or expand one of the existing ones) to check that LombScargleMultiband with a single band returns results consistent with LombScargle?

  • As this includes code copied from gatspy, we should include a copy of the gatspy license in the licenses folder. This was not a problem originally when the LombScargle class was added because it was added by @jakevdp who is the copyright holder, but we should add it now.

Once you've had a chance to reply/implement my comments, please ping me and I'll make sure we move this forward so that it can be included in the 5.3 release.

from .implementations import available_methods, lombscarglemultiband
from .implementations.mle import construct_regularization, design_matrix, periodic_fit


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Include an __all__

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to allow for it to just be from .implementations.mle import * , correct? The difficulty I see with that is that we get into trouble with Flake8, (https://www.flake8rules.com/rules/F405.html), where it the imports are undefined. If we don't care about violating Flake8 that's fine, but there does seem to be an argument to be made for keeping these imports explicit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily to enable using * but more to also restrict what can be imported from modules. In general we define __all__ in all files - this serves as a way to be explicit about what is 'public' API.

Copy link
Member

@astrofrog astrofrog Apr 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding __all__ would prevent users from importing say 'has_units' from this module, because they might do it and claim it was 'public API' because it worked and there were no underscores at the start of the name.

Copy link
Contributor Author

@dougbrn dougbrn Apr 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks for the clarification! Added it in to the files that were missing it.

import numpy as np

from astropy import units
from astropy import units as u
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the first units import needed? Can you just use u.?

)

self.normalization = normalization
self.Nterms_base = nterms_base
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would avoid capitalizing the attribute names here - better to use the same name as was passed in.

minimum_frequency = minimum_frequency.to(1 / unit)
maximum_frequency = maximum_frequency.to(1 / unit)

Nf = 1 + int(np.round((maximum_frequency - minimum_frequency) / df))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would avoid using uppercase letters in variables such as this - maybe n_freq would be more appropriate and readable.

interface to handle multiband data (where multiple bands/filters are present).

The code here is adapted from the `astroml`_ package ([3]_, [4]_) and the
`gatspy`_ package ([5]_, [6]_), but conforms closely to the design paradigms
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here of course it's fine to leave a reference to astroml and gatspy as in the original LombScargle page.

... time_delta=deltas*u.minute)

>>> # g band fluxes
>>> ts1['g_flux'] = [np.nan] * 180 * u.mJy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does LombScargleMultiband also recognise proper masked values instead of NaN values? We should make sure it does and perhaps the documentation should actually also show this by default. Either way, text should be added to mention that this example table has missing data and how it is indicated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LombScargleMultiband treats NaN values the same way LombScargle does, which is to say that both classes don't handle them at all when passing arrays with NaN values directly through. But both check for them in the y and dy columns when ingest from a timeseries object. As for proper masked values, do you mean like from a numpy.masked.array? Under the hood it's just using np.isnan calls, which should work well with numpy's masked arrays, and from then on we treat them like numpy arrays, which I believe masked arrays naturally accommodate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as this is consistent with LombScargle this is fine for now - but yes I mean tables with missing values as shown here:

https://docs.astropy.org/en/stable/table/masking.html

This is the 'preferred' way of indicating missing data as opposed to using NaN values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have time, could you check if masked tables work as well as NaN values? If so, I wonder if a masked table would look nicer in the docs? (but ignore for now if short on time).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just updated the from_timeseries function to support proper masked values, and updated the docs to show a masked table instead of just a table with NaN values. Also updated the test suite to use proper masking. So just to be clear, if a user creates a timeseries object and loads it into LombScargleMultiband, it recognizes proper masking and handles it accordingly. The LombScargleMultiband class itself still doesn't handle NaN values if they are included in array inputs, which is how LombScargle behaves. I would think a change to this behavior should go in a separate PR that also modifies LombScargle to keep the two consistent.

follows:

>>> ls = LombScargleMultiband.from_timeseries(ts1, signal_column_name=['g_flux', 'r_flux', 'i_flux'],
... uncertainty=['g_err', 'r_err', 'i_err'],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example makes me think this be 'uncertainty_column_name' for consistency? Or maybe it should be signal_column and uncertainty_column to make it a little more concise.

``signal_column_name``. ``uncertainty`` specifies the columns containing the
associated errors per band, while ``band_labels`` provides the labels to use
for each photometric band. From here,
:class:`~astropy.timeseries.LombScargleMultiband` can be worked with as normal.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe say 'as for the LombScargle class' and link to that documentation page.

for each photometric band. From here,
:class:`~astropy.timeseries.LombScargleMultiband` can be worked with as normal.

>>> frequency,power = ls.autopower()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an example of how it can be used as normal? If so, prefix with 'for example:'

@dougbrn
Copy link
Contributor Author

dougbrn commented Mar 30, 2023

@astrofrog , thank you for the review! It's much appreciated. I've gone through an implemented nearly all of what you've requested. I've added a test that checks for equivalent output in the single-band case for lombscarglemultiband and lombscargle, and added the gatspy license. Also, I settled on "flexible" as the replacement name for the "gatspy" method. As the method is better able to accommodate different models and input datasets thanks to its use of regularization. If you think we should come up with a different name, just let me know. I replied to a few suggestions above as well, we can talk those through and if that all looks good then I think this is in a good spot with regards to your review.

@pllim
Copy link
Member

pllim commented Apr 18, 2023

The commits need to be squashed. While you are at it, maybe also rebase.

Though I am not sure if this PR is too late for v5.3. It has +2,265 −0 and it is only 3 days to feature freeze and @astrofrog requested changes but has yet to respond to the changes.

@astrofrog
Copy link
Member

We should get this in for the feature freeze, will respond to comments this evening

@dougbrn dougbrn force-pushed the multiband_ls branch 2 times, most recently from eee8265 to 8c63a8e Compare April 18, 2023 18:01
@dougbrn
Copy link
Contributor Author

dougbrn commented Apr 18, 2023

@pllim @astrofrog squashed and rebased. There's an allowed failure in the CI that looks unrelated to the contents of this PR. We'd really like to get this into v5.3, and I'm willing to turn around any further needed changes quickly once @astrofrog has had a chance to look over the last few things.

@pllim
Copy link
Member

pllim commented Apr 18, 2023

Yes, remote data failure (timeout) is unrelated.

Copy link
Member

@astrofrog astrofrog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good! Just a few comments, mostly small, below. The only 'big' comment is about checking whether masked tables work and if so using that in the docs instead of NaN but this is not a deal-breaker and we can postpone to a follow-up if needed.

I'll be able to review this again on Thursday and can merge then if the other comments are implemented.

established in :class:`~astropy.timeseries.LombScargle`.

Two implementations of the Multiband Lomb-Scargle Periodogram are available
within :class:`~astropy.timeseries.LombScargleMultiband`, ``gatspy`` and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and below, should be flexible instead of gatspy?

.. EXAMPLE START: Loading from a :class:`~astropy.timeseries.TimeSeries` object

Consider the following generator code for a
:class:`~astropy.timeseries.TimeSeries` object. Where timeseries data is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Join sentences?

>>> dy_i = 0.01 * (0.5 + rng.random(60)) # uncertainties
>>> y_i += dy_i * rng.standard_normal(60)
>>> ts1['i_flux'].value[120:] = y_i
>>> ts1['i_err'].value[120:] = dy_i
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth printing out the table in the last cell to show what it looks like?

... time_delta=deltas*u.minute)

>>> # g band fluxes
>>> ts1['g_flux'] = [np.nan] * 180 * u.mJy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have time, could you check if masked tables work as well as NaN values? If so, I wonder if a masked table would look nicer in the docs? (but ignore for now if short on time).

established in :class:`~astropy.timeseries.LombScargle`.

Two implementations of the Multiband Lomb-Scargle Periodogram are available
within :class:`~astropy.timeseries.LombScargleMultiband`, ``gatspy`` and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gatspy -> flexible (also below)

from .implementations import available_methods, lombscarglemultiband
from .implementations.mle import construct_regularization, design_matrix, periodic_fit


Copy link
Member

@astrofrog astrofrog Apr 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding __all__ would prevent users from importing say 'has_units' from this module, because they might do it and claim it was 'public API' because it worked and there were no underscores at the start of the name.

@astrofrog
Copy link
Member

@pllim - in general I wonder if we should recommend that when squashing we still keep the original commit(s) separate from the post-review ones? (rather than squashing down to one commit). Otherwise it's hard for me to check what has changed after my review, and this can be an issue in a PR this big.

@pllim
Copy link
Member

pllim commented Apr 18, 2023

in general I wonder if we should recommend that when squashing we still keep the original commit(s) separate from the post-review ones?

Errr... I meant to squash out the merge commits. I didn't mean to squash everything into a single commit. Sorry for the misunderstanding!

If you really want the old stuff back, maybe checkout the old hashes mentioned above? For example:

Screenshot 2023-04-18 190517

Also, if we ever enable "squash and merge," then we don't have to worry about such things no more. Please state your thoughts on #14473

@dougbrn
Copy link
Contributor Author

dougbrn commented Apr 18, 2023

Sorry about the full squash, didn't realize that this wasn't what was being requested! I'll make sure to leave these subsequent commits without squashing for ease of review 👍

@pllim
Copy link
Member

pllim commented Apr 19, 2023

Sorry about the full squash

It was my bad. My ask was too vague.

@eerovaher
Copy link
Member

@dougbrn, astropy uses the pre-commit framework which can automatically fix trailing whitespaces and more. See https://docs.astropy.org/en/stable/development/workflow/development_workflow.html#pre-commit for instructions on how to use it.

@dougbrn
Copy link
Contributor Author

dougbrn commented Apr 19, 2023

@eerovaher thanks for letting me know, have it set up now and will use it going forward.

@astrofrog, I have addressed the last few issues you flagged. Namely, adding all's to the files that were missing them, fixed the doc issues you found, and added support for proper masked values when loading in from a timeseries object. I also responded in-thread where appropriate. I rebased again, as I saw @pllim 's message on slack yesterday indicating that there was need to do so. And finally, I squashed the commits from this review into a single commit, which should keep it easy to take a look at these specific changes.

Copy link
Member

@astrofrog astrofrog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for implementing all my comments! In particular I think that in the docs it'll now be much easier for users to understand the table example (I certainly found it a lot easier to visualize myself). This is good to go, let's get it in! 😄

Thanks again for all the work, and apologies for the delays!

@astrofrog astrofrog merged commit a699ebb into astropy:main Apr 19, 2023
24 of 25 checks passed
pllim added a commit to pllim/astropy that referenced this pull request Apr 29, 2023
pllim added a commit to pllim/astropy that referenced this pull request Apr 29, 2023
only keep first paragraph as the rest already in What's New.

Co-authored-by: Simon Conseil <contact@saimon.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move multiband LombScargle from gatspy
6 participants