Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added new astropy.timeseries sub-package #8540

Merged
merged 97 commits into from Apr 19, 2019

Conversation

Projects
None yet
8 participants
@astrofrog
Copy link
Member

commented Apr 1, 2019

Background

This pull request adds a new astropy.timeseries sub-package to the core package. The aim of this sub-package is to provide core classes for representing time series, using QTable as a foundation. The general design of this sub-package has been discussed in APE9 which is not yet accepted, but the hope is to try and bring this to a conclusion before the 3.2 release.

This new sub-package has been developed over the last year in a standalone package called astropy-timeseries - and we announced it was available for testing on the astropy-dev list back in December.

Overall design

Those of you who have used Table/QTable with mix-in columns will know that it's already possible to represent time series with a Table and a mix-in Time column. This PR just provides a sub-class of QTable that does this automatically so that this becomes more user-friendly - but fundamentally TimeSeries still has all the flexibility of Table/QTable.

There are two main classes here, TimeSeries and BinnedTimeSeries - there is a detailed discussion on this in APE9, but the short version is that in most cases people just have some quantity vs time and TimeSeries is sufficient for this, but there are cases where being explicit about the binning (e.g. with binned X-ray time series).

Note that we are subclassing from QTable instead of Table since QTable is the way we would have designed the table class from the start if we could, so since this is a new class we have the opportunity to introduce the 'sensible' behavior from the start in relation to columns with units being quantities.

Documentation

You can find a preview of the documentation at http://astropy-timeseries.readthedocs.org

Path forward

I think it would be nice to aim to merge this before the 3.2 feature freeze. I am opening this well in advance of the feature freeze to allow people to have time to play with it. I recommend trying out the standalone astropy-timeseries package. If you would like to make improvements, you are welcome to make pull requests to https://github.com/aperiosoftware/astropy-timeseries as I have a script to easily convert that package into the branch for this pull request. None of the API is set in stone, so we can still discuss/work on it!

In any case, feel free to start reviewing this now!

Future work

Here are some things that I think we should do, though preferably in a separate pull request

  • Move the periodogram functionality in the core package to astropy.timeseries while keeping backward-compatible imports for now in astropy.stats, and ensure that this functionality can be used directly with the TimeSeries class. This would be good to have done in 3.2 for when we announce the new timeseries subpackage

  • Develop a mix-in class for NDData that can be used to construct Time Series cubes - this came up as a separate way to represent time series in APE 9, and this is equally valid, so I think it would be good to support this. However, it would be good to first have the latest WCSLIB in astropy (#7929) so that we can represent time axes in WCS.

Fixes #2205

bsipocz and others added some commits Jun 6, 2018

Addressed review comments by @pllim
* Added entry in stability page of docs
* Add documentation about missing values in the pandas conversion
* Added Kepler/TESS reader to the API docs
* Use real files to demonstrate I/O
* Clarify example of .loc
* Added a note about duplicates and sorting with vstack
* Fix example on analysis page
* Clean up formatting and docstrings
* Improve example in docs to include units
Address review comments by @eteq
* Add a new time_start argument to TimeSeries for the case where a scalar start time is provided
* Improve the arguments to simple_downsample
* Added information about time scales
* Renamed simple_downsample to aggregate_downsample

@astrofrog astrofrog force-pushed the astrofrog:timeseries branch from 442975e to c7e6200 Apr 17, 2019

@astrofrog

This comment has been minimized.

Copy link
Member Author

commented Apr 17, 2019

Thanks everyone for your reviews! 🙏 😅 😅 😅

I've done my best to try and address everything, and where I couldn't address it now I've opened new issues and linked to them. I've left a few comments as unresolved above so you can see the answer, but if you are happy with it please mark them as resolved.


@pllim - I've accepted all your suggestions and implemented most of your comments (most of the non-suggestion changes are in 29da87b if you want to check)

Regarding the inclusion of the Kepler/TESS readers - I think that in cases where the format is well defined and documented and versioned, it's ok to have readers/writers in the core package? (after all, for io.ascii we have e.g. the IPAC reader which is reasonably specific but well documented). What I wouldn't want to see in the core package are formats that aren't actually documented. Also since Kepler/TESS time series are quite common at the moment, having these readers might entice more users to start using this subpackage. Does that seem reasonable?

Regarding the astropy-benchmarks - yes we should add some there, but that can wait until after feature freeze. It's also too soon indeed for performance tips I think.


@eteq - I've accepted all your suggestions and implemented most of your comments (most of the non-suggestion changes are in c7e6200 if you want to check).

Regarding the periodograms and moving them over to astropy.timeseries, I'll try and do this tomorrow in a separate PR.


@taldcroft - thanks for your detailed comments! Here are some thoughts below:

The current implementation tries to ensure that a TimeSeries has the necessary columns, but this approach leads to somewhat intricate code, e.g. in add_columns. Sometimes intricate code can mean the behavior for users is likewise intricate. There is also opportunity for holes / problems, and right now you can get a malformed TimeSeries by deleting or renaming the time column. So what about lazy checking, naming a property like has_required_columns and a class level attribute required_columns. Then any methods that actually needs the columns can just check that.

I see your point, but I don't really like that idea of being able to have a TimeSeries object without a time column for quite a while until eventually e.g. .fold() or another time series-specific method is used. The issue is that since this is mostly a container class with not much functionality, many callers will actually be third-party functions which will be expecting the time column to be there (and we can't decorate them). At the same time, you are right that currently one can e.g. rename or remove the time column without any warning/error, so this is clearly not ideal. My plan is to try and investigate a little more how we could get around this, but I might run out of time for the 3.2 feature freeze. However, arguably the current behavior of being able to rename or remove the time column is a bug, so I believe this is something I could address in a bug fix release in the worst case.

Related to the above, I'm not so keen on the auto-downgrade to QTable. I think it is cleaner to let the user do what they want, which might include temporary operations where the table does not have the required columns.

I've gone back and forth on this and I do think dropping down to QTable if the user does e.g. ts['flux', 'error'] is the right behavior, since if they really wanted to keep it a time series, they could easily keep the time column. I don't think a common use case is going to be to extract two non-time columns and add a new/different time column. I propose leaving this as-is for now and re-assessing over the next six months once people start using this.

When I saw the docs I immediately thought about a class method like Time.series() that puts all that syntax into its own (independently) useful framework.

I can also see the benefit of this, but now for users creating a time series would be a two-step operation instead of a one-step operation. The class method Time.series is elegant but in my view actually makes things a little less obvious for users. The current API has been discussed for a while in the APE, so I'd rather keep it the way it is currently - it might not be as elegant, but I think it'll be more intuitive for users to just deal with the TimeSeries initializer rather than doing things in two steps.

Likewise, the vector column approach for binned time series is an elegant use of the Time/Table capabilities, but I worry that it's going to also end up being quite confusing for users. It's also not clear how indexing would work in this case.

Having said that, I'm totally on board with being able to use things like np.linspace etc on Time as I can see that being very useful.

Another thought is the fundamental name TimeSeries, which makes me think of a pandas Series, not a Table. But I'm not sure if the obvious TimeTable is more or less confusing for most users.

Even though TimeTable would be more logical, I think most astronomers would think of TimeSeries as meaning the time + all the associated data, as a table, so I think we are ok.


At this point, I'll track down the last failures in CI, but otherwise @pllim and @eteq feel free to take another look and leave more comments if necessary or approve if you are happy with it. I'll be working on moving the periodograms in a separate PR.

@astrofrog astrofrog force-pushed the astrofrog:timeseries branch 3 times, most recently from e0f7244 to 7e5d19f Apr 17, 2019

@astrofrog astrofrog force-pushed the astrofrog:timeseries branch from 7e5d19f to 23da5f3 Apr 18, 2019

@eteq

This comment has been minimized.

Copy link
Member

commented Apr 18, 2019

A few quick follow-ons from above:

this approach leads to somewhat intricate code

I sympathize with @taldcroft's suggestion here. But I'm not sure I'm a fan of doing it "lazy", because it feels as likely to end up inconsistent as the approach @astrofrog is using for un-decorated methods. Perhaps a safer thing to do is instead centralize all the checking in one method and call it on demand as @taldcroft suggests in the lazy way as well as at the end of the TimeSeries initializer? That way it seems like we have our cake and eat it too. Relatedly, I think we would not want the rename example as posed by @taldfroft to work, but rather:

t = Table.read('junk.ecsv')
t.rename_column('col2', 'time')
ts = TimeSeries(t)

Because then the TimeSeries object is always a time series.

Relatedly I also favor TimeSeries over TimeTable purely because that's the terminology used in astro for such things. I think it's more likely to be findable on that basis. And I think the QTable behavior is right for the same reason: it's most consistent with what the user might think that means.

I really like @taldcroft's idea of the Time.series() classmethod as an alternate way to make a TimeSeries, though. I could definitely see myself using that. That is, it would just yield a new TimeSeries with that one column populated (and maybe it would also get kwargs for additional columns although that feels optional. Perhaps that can be a follow-on PR though?

@pllim

This comment has been minimized.

Copy link
Member

commented Apr 18, 2019

Re: TimeSeries vs TimeTable -- I also prefer TimeSeries, because TimeTable makes me think of something I use to schedule my events for the day.

@pllim
Copy link
Member

left a comment

A question about its stability status and some nitpicks below.

API doc for io seems incomplete -- scroll down https://28570-2081289-gh.circle-artifacts.com/0/home/circleci/project/docs/_build/html/timeseries/index.html to see what I mean.

In https://28570-2081289-gh.circle-artifacts.com/0/home/circleci/project/docs/_build/html/timeseries/times.html#times-relative-to-other-times , I am curious if I can format the output of the delta time, say, to '.2f'. Is that possible?

Overall, this is fine by me. As long as API is not expected to change dramatically, we can push remaining stuff to bug fix or next release.

Show resolved Hide resolved astropy/timeseries/binned.py Outdated
Show resolved Hide resolved astropy/timeseries/binned.py Outdated
Show resolved Hide resolved astropy/timeseries/binned.py Outdated
Show resolved Hide resolved astropy/timeseries/binned.py Outdated
Show resolved Hide resolved astropy/timeseries/binned.py Outdated
Show resolved Hide resolved astropy/timeseries/sampled.py Outdated
Show resolved Hide resolved astropy/timeseries/sampled.py Outdated
Show resolved Hide resolved astropy/timeseries/sampled.py Outdated
Show resolved Hide resolved astropy/timeseries/sampled.py Outdated
astropy.timeseries
</td>
<td align='center'>
<span class="mature"></span>

This comment has been minimized.

Copy link
@pllim

pllim Apr 18, 2019

Member

Is "mature" an accurate categorization here? It conflicts with "in heavy development" just below.

This comment has been minimized.

Copy link
@astrofrog

astrofrog Apr 18, 2019

Author Member

woops, copy pasta 🍝

astrofrog and others added some commits Apr 18, 2019

Fixes to docstrings
Co-Authored-By: astrofrog <thomas.robitaille@gmail.com>
@astrofrog

This comment has been minimized.

Copy link
Member Author

commented Apr 18, 2019

@eteq:

I sympathize with @taldcroft's suggestion here. But I'm not sure I'm a fan of doing it "lazy", because it feels as likely to end up inconsistent as the approach @astrofrog is using for un-decorated methods. Perhaps a safer thing to do is instead centralize all the checking in one method and call it on demand as @taldcroft suggests in the lazy way as well as at the end of the TimeSeries initializer? That way it seems like we have our cake and eat it too. Relatedly, I think we would not want the rename example as posed by @taldfroft to work, but rather:

Ok so I've now cleaned up the checking significantly in e7c1807 - I've added a mechanism to decorate any method that adds/removes columns, and I check for consistency after the method is called. There are a few corner cases where we need to relax or disable the checks - for example when using vstack, an empty time series is temporarily created, so we need to account for that. What I've written up now deals with all these corner cases and I've made sure they are well tested so I think things are a lot more robust now.

I really like @taldcroft's idea of the Time.series() classmethod as an alternate way to make a TimeSeries, though. I could definitely see myself using that. That is, it would just yield a new TimeSeries with that one column populated (and maybe it would also get kwargs for additional columns although that feels optional. Perhaps that can be a follow-on PR though?

I definitely won't have time for this PR


@pllim:

API doc for io seems incomplete -- scroll down https://28570-2081289-gh.circle-artifacts.com/0/home/circleci/project/docs/_build/html/timeseries/index.html to see what I mean.

Should be fixed now

In https://28570-2081289-gh.circle-artifacts.com/0/home/circleci/project/docs/_build/html/timeseries/times.html#times-relative-to-other-times , I am curious if I can format the output of the delta time, say, to '.2f'. Is that possible?

No I don't think so. Maybe open an issue for astropy.time if you think this would be important, but here 'format' refers not so much to the precision but the type of string or float representation.

@pllim

pllim approved these changes Apr 18, 2019

Copy link
Member

left a comment

LGTM though I reviewed in the capacity of a "non-domain expert". Anything else like those issues opened as a result of others' reviews could be followed up in a future release. This is a very valuable addition. Thanks, @astrofrog !

@eteq

eteq approved these changes Apr 19, 2019

@eteq

This comment has been minimized.

Copy link
Member

commented Apr 19, 2019

LGTM now too, so I'm going to pull the trigger.

I definitely won't have time for this PR

Fair enough. I might take a pass at it but I'll make an issue now so it could always go in 4.0 if need be.

@eteq eteq merged commit ee12086 into astropy:master Apr 19, 2019

15 checks passed

astropy-bot:changelog Changelog entry consistent with milestone
astropy-bot:milestone This pull request has a milestone set.
ci/circleci: 32bit Your tests passed on CircleCI!
Details
ci/circleci: egg-info-37 Your tests passed on CircleCI!
Details
ci/circleci: html-docs Your tests passed on CircleCI!
Details
ci/circleci: image-tests-mpl202 Your tests passed on CircleCI!
Details
ci/circleci: image-tests-mpl212 Your tests passed on CircleCI!
Details
ci/circleci: image-tests-mpl222 Your tests passed on CircleCI!
Details
ci/circleci: image-tests-mpl300 Your tests passed on CircleCI!
Details
ci/circleci: image-tests-mpldev Your tests passed on CircleCI!
Details
codecov/patch 98.18% of diff hit (target 86.82%)
Details
codecov/project 86.91% (+0.08%) compared to 33db190
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
giles Click details to preview the documentation build
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.